datacleaner.DataCleaner-xml-config.5.4.1.source-code.configuration.xsd
Defines the catalog of datastores that are usable
as input sources for the analysis jobs.
Defines the catalog of reference data, containing
dictionaries, synonyms, etc.
Defines a multi-threaded task runner, enabling
processing of records in parallel.
Defines a single-threaded task runner, which means
all records will be processed sequentially, in the same thread.
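
The difference between the two task runners can be sketched as follows. This is a minimal, language-neutral illustration in Python, not DataCleaner's implementation; `process_record`, `run_single_threaded`, `run_multi_threaded` and the `max_threads` parameter are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def process_record(record):
    # Hypothetical stand-in for the per-record work of an analysis job.
    return record.strip().upper()


def run_single_threaded(records):
    # All records are processed sequentially, in the calling thread.
    return [process_record(r) for r in records]


def run_multi_threaded(records, max_threads=5):
    # Records are handed to a pool of worker threads. Executor.map()
    # yields results in input order even when the work overlaps in time.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(process_record, records))


records = [" alice ", "bob", " carol"]
sequential = run_single_threaded(records)
parallel = run_multi_threaded(records)
```

Both runners produce the same results here; only the scheduling differs. A real job would also have to make any shared state thread-safe before switching to the multi-threaded variant.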
Defines which Java packages should be scanned for
components to use in AnalyzerBeans.
Select this storage provider to combine different storage
technologies based on the storage entity type. This is the
recommended storage provider for typical data profiling needs,
as it allows combining in-memory row annotations (for previewing
in profiling results) with database-backed collections (for
intermediary results).
Select this storage provider to store staging data
and intermediary results in memory. This is by far the
best-performing storage provider, but it also carries the risk of
running out of memory for very large jobs.
Defines a custom component by a class name and property values.
Defines which Java packages should be scanned for
components to use in DataCleaner.
Defines a datastore based on a JDBC database
connection.
Defines a datastore based on an MS Access database
file.
Defines a datastore based on a comma-separated values (CSV)
file.
Defines a datastore based on a Salesforce.com
account.
Defines a datastore based on a SugarCRM system.
Defines a datastore based on an Apache HBase
database.
Defines a datastore based on a MongoDB database.
Defines a datastore based on an ElasticSearch
index.
Defines a datastore based on a Cassandra
database.
Defines a datastore based on an Apache CouchDB
database.
Defines a datastore based on a Neo4j graph
database.
Defines a datastore based on a fixed-width value
file.
Defines a datastore based on a directory of SAS
data sets.
Defines a datastore based on an MS Excel spreadsheet
file.
Defines a datastore based on a JSON file.
Defines a datastore based on a dBase database file.
Defines a datastore based on an OpenOffice.org
database file.
Defines a datastore based on an XML file.
Defines an in-memory datastore based on Plain Old
Java Objects (POJOs).
Defines a composite datastore, which allows several
datastores to be treated virtually as a single datastore.
Defines a custom datastore based on a class
implementing the Datastore interface.
Indicates whether multiple connections (i.e.
connection pooling) may be created. Connection pooling is
preferred for performance reasons, but can safely be
disabled if not desired. The maximum number of connections
cannot be configured, but no more connections than the number
of threads in the task runner should be expected.
The row number (1-based) of the header line. If no
header line is present, use 0.
The index (1-based) of the header line. If no
header line is present, use 0.
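
The header-line convention (1-based, with 0 meaning "no header") can be sketched as below. This is an illustration only: `split_header` is a hypothetical helper, and the choice to skip any lines that precede the header is an assumption of the example rather than something the schema documentation states.

```python
def split_header(lines, header_line_number):
    """Split lines into (header, data_rows) using a 1-based header line number.

    A header_line_number of 0 means the input has no header line.
    """
    if header_line_number < 0:
        raise ValueError("header_line_number must be 0 or greater")
    if header_line_number == 0:
        # No header: every line is data.
        return None, list(lines)
    if header_line_number > len(lines):
        raise ValueError("header_line_number is beyond the end of the input")
    # Convert the 1-based line number to a 0-based list index.
    header = lines[header_line_number - 1]
    # Assumption: lines before the header are skipped, lines after are data.
    data_rows = list(lines[header_line_number:])
    return header, data_rows


lines = ["# exported file", "id,name", "1,Ann", "2,Bo"]
header, rows = split_header(lines, 2)
```

With `header_line_number` set to 2, the header is `"id,name"` and the two lines after it are the data rows; with 0, all four lines are treated as data.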
Defines whether the regex matcher should match the
whole string or if just a subsequence match is sufficient.
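
The distinction between a whole-string match and a subsequence match can be illustrated with Python's `re` module, where `fullmatch` requires the entire string to match and `search` accepts a match anywhere inside it. The wrapper `regex_matches` and its `match_entire_string` flag are hypothetical names for this sketch.

```python
import re


def regex_matches(pattern, value, match_entire_string):
    compiled = re.compile(pattern)
    if match_entire_string:
        # The pattern must account for every character of the value.
        return compiled.fullmatch(value) is not None
    # A match on any subsequence of the value is sufficient.
    return compiled.search(value) is not None


whole = regex_matches(r"\d{4}", "AB1234", match_entire_string=True)
partial = regex_matches(r"\d{4}", "AB1234", match_entire_string=False)
```

Here `whole` is `False` because of the leading letters, while `partial` is `True` because the four digits occur inside the value.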
Sets a threshold on the number of annotated rows to
store in memory. Any additional rows will be discarded, although
the counter will still account for them correctly.
Sets a threshold on the number of sample sets with
annotated rows to store in memory.
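
The row threshold described above can be sketched as a counter that keeps counting while the stored sample is capped. This is a minimal illustration, not DataCleaner's storage code; the class `RowAnnotation` and the name `max_rows_threshold` are assumptions of the example.

```python
class RowAnnotation:
    """Counts every annotated row but stores only a bounded sample in memory."""

    def __init__(self, max_rows_threshold):
        self.max_rows_threshold = max_rows_threshold
        self.row_count = 0      # counts all annotated rows
        self.sample_rows = []   # holds at most max_rows_threshold rows

    def annotate(self, row):
        # The counter is always updated, so totals stay correct.
        self.row_count += 1
        # Rows beyond the threshold are discarded rather than stored.
        if len(self.sample_rows) < self.max_rows_threshold:
            self.sample_rows.append(row)


annotation = RowAnnotation(max_rows_threshold=3)
for row in ["r1", "r2", "r3", "r4", "r5"]:
    annotation.annotate(row)
```

After five rows with a threshold of three, `row_count` is 5 while `sample_rows` holds only the first three rows, which is why results remain accurate even though previews are limited.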
Sets the path for the directory to use for on-disk
storage. This is optional as HSQLDB will otherwise automatically
assign a temporary directory for the purpose.
Sets the path for the directory to use for on-disk
storage. This is optional as H2 will otherwise automatically
assign a temporary directory for the purpose.
Sets the maximum number of threads that the
thread pool may allocate. Do not set this value lower than 5, as
that may cause serious performance penalties from threads waiting
on each other.
Adds a property to this custom type. Properties are
mapped to fields in the corresponding class that are annotated
with the @Configured annotation.
The Java class name of this custom type.
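
The idea of mapping named properties onto marked fields of a custom class can be sketched as follows. This is a Python analogue only: the real mechanism is Java's `@Configured` annotation, and everything here (`configured`, `CustomDatastore`, `apply_properties`, the `host` and `port` fields) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field, fields


def configured():
    # Marks a field as settable from configuration; a stand-in for @Configured.
    return field(default=None, metadata={"configured": True})


@dataclass
class CustomDatastore:
    host: str = configured()
    port: str = configured()
    connected: bool = False  # internal state, not exposed as a property


def apply_properties(instance, properties):
    # Only fields carrying the "configured" marker may receive a property.
    configurable = {f.name for f in fields(instance) if f.metadata.get("configured")}
    for name, value in properties.items():
        if name not in configurable:
            raise KeyError(f"no configured field named {name!r}")
        setattr(instance, name, value)
    return instance


datastore = apply_properties(CustomDatastore(), {"host": "localhost", "port": "9200"})
```

Rejecting properties that do not correspond to a marked field is a design choice of this sketch; it surfaces typos in the configuration early instead of silently ignoring them.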