datacleaner.DataCleaner-xml-config.5.4.2.source-code.configuration.xsd



	
		
			
				
				
					
						
							Defines the catalog of datastores that are usable
							as input sources for the analysis jobs.
						
					
				
				
					
						
							Defines the catalog of reference data, containing
							dictionaries, synonyms, etc.
						
					
				
				
					
						
							
								Defines a multi-threaded task runner, enabling
								processing of records in parallel.
							
						
					
					
						
							
								Defines a single-threaded task runner, which means
								all records will be processed sequentially, in the same thread.
							
						
					
					
				
				
				

				

				
				
					
						
							
								Defines which Java packages should be scanned for
								components to use in AnalyzerBeans.
							
						
					
					
				
			

		
	
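To illustrate how the top-level parts described above fit together, here is a minimal sketch of a configuration file. The element and attribute names (`datastore-catalog`, `reference-data-catalog`, `multithreaded-taskrunner`, `max-threads`, `classpath-scanner`, `package`, `recursive`), the namespace URI, and the package name are recalled from typical DataCleaner `conf.xml` files and are not visible in this listing, so treat them as assumptions and check them against the schema itself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: element names, attributes and namespace are assumptions. -->
<configuration xmlns="http://eobjects.org/analyzerbeans/configuration/1.0">
	<!-- Datastores usable as input sources for analysis jobs -->
	<datastore-catalog>
	</datastore-catalog>
	<!-- Dictionaries, synonyms, etc. -->
	<reference-data-catalog>
	</reference-data-catalog>
	<!-- Process records in parallel; the value is illustrative -->
	<multithreaded-taskrunner max-threads="30" />
	<!-- Java packages to scan for components -->
	<classpath-scanner>
		<package recursive="true">org.datacleaner</package>
	</classpath-scanner>
</configuration>
```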

	
		
			
				
					
						Select this storage provider to combine different
						storage technologies based on the storage entity type. This
						is the recommended storage provider for typical data
						profiling needs, as it allows combining in-memory row
						annotations (for previewing in profiling results) with
						database-backed collections (for intermediary results).
					
				
			
			
				
					
						Select this storage provider to store staging data
						and intermediary results in memory. This is by far the
						best-performing storage provider, but it carries the risk of
						running out of memory for very large jobs.
					
				
			
			
			
		
	

	
		
			
			
			
			
			
			
		
	

    
        
            
                
                    
                        Defines a custom component by a class name and property values.
                    
                
            
            
                
                    
                        Defines which Java packages should be scanned for
                        components to use in DataCleaner.
                    
                
            
        
    

	
		
			
			
				
					
						
							
						
					
				
			
		
	

	
		
			
				
					
						
							Defines a datastore based on a JDBC database
							connection.
						
					
				
				
					
						
							Defines a datastore based on an MS Access database
							file.
						
					
				
				
					
						
							Defines a datastore based on a comma-separated
							values (CSV) file.
						
					
				
				
					
						
							Defines a datastore based on a Salesforce.com
							account.
						
					
				
				
					
						
							Defines a datastore based on a SugarCRM system.
						
					
				
				
					
						
							Defines a datastore based on an Apache HBase
							database.
						
					
				
				
					
						
							Defines a datastore based on a MongoDB database.
						
					
				
				
					
						
							Defines a datastore based on an Elasticsearch
							index.
						
					
				
				
					
						
							Defines a datastore based on a Cassandra
							database.
						
					
				
				
					
						
							Defines a datastore based on an Apache CouchDB
							database.
						
					
				
				
					
						
							Defines a datastore based on a Neo4j graph
							database.
						
					
				
				
					
						
							Defines a datastore based on a fixed width value
							file.
						
					
				
				
					
						
							Defines a datastore based on a directory of SAS
							data sets.
						
					
				
				
					
						
							Defines a datastore based on an MS Excel spreadsheet
							file.
						
					
				
				
					
						
							Defines a datastore based on a JSON file.
						
					
				
				
					
						
							Defines a datastore based on a dBase database file.
						
					
				
				
					
						
							Defines a datastore based on an OpenOffice.org
							database file.
						
					
				
				
					
						
							Defines a datastore based on an XML file.
						
					
				
				
					
						
							Defines an in-memory datastore based on Plain Old
							Java Objects (POJOs).
						
					
				
				
					
						
							Defines a composite datastore, which allows several
							datastores to be treated virtually as a single datastore.
						
					
				
			
			
				
					
						Defines a custom datastore based on a class
						implementing the Datastore interface.
					
				
			
		
	

	
		
		
	

	
		
			
				
					
						
							
							
							
							
							
								
									
										Indicates whether multiple connections (i.e.
										connection pooling) may be created. Connection pooling
										is preferred for performance reasons, but can safely be
										disabled if not desired. The maximum number of
										connections cannot be configured, but no more
										connections than the number of threads in the task
										runner should be expected.
									
								
							
						
						
					
					
						
							
								
							
						
					
					
				
			
		
	
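As an illustration of the JDBC datastore type described above, the following fragment sketches what such a definition might look like inside the datastore catalog. The element names (`jdbc-datastore`, `url`, `driver`, `username`, `password`, `multiple-connections`) are recalled from typical DataCleaner configuration files rather than read from this listing, and the connection details are placeholders, so all of them should be treated as assumptions.

```xml
<!-- Sketch only: element names and all values are assumptions. -->
<jdbc-datastore name="orders">
	<url>jdbc:postgresql://localhost:5432/orders</url>
	<driver>org.postgresql.Driver</driver>
	<username>dc_user</username>
	<password>change-me</password>
	<!-- true permits connection pooling; false restricts the datastore to one connection -->
	<multiple-connections>true</multiple-connections>
</jdbc-datastore>
```

With pooling enabled, the number of open connections is not expected to exceed the number of threads in the task runner.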

	
		
			
			
			
			
			
			
			
			
		
	

	
		
			
				
					
					
					
					
					
					
					
					
						
							
								The row number (1-based) of the header line. If no
								header line is present, use 0.
							
						
					
					
						
							
								
							
						
					
				
			
		
	
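The 1-based header convention can be illustrated with a short fragment. The element names (`csv-datastore`, `filename`, `quote-char`, `separator-char`, `encoding`, `header-line-number`) are assumptions recalled from typical DataCleaner configuration files, the file path is a placeholder, and only a subset of the available settings is shown.

```xml
<!-- Sketch only: element names and all values are assumptions. -->
<csv-datastore name="customers">
	<filename>data/customers.csv</filename>
	<quote-char>"</quote-char>
	<separator-char>,</separator-char>
	<encoding>UTF-8</encoding>
	<!-- 1 means the column names are on the first line; 0 means the file has no header line -->
	<header-line-number>1</header-line-number>
</csv-datastore>
```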

	
		
			
				
					
					
					
					
				
			
		
	

	
		
			
				
					
					
					
				
			
		
	

		
		
			
				
					
					
					
					
				
			
		
	

	
		
			
				
					
					
				
			
		
	

	
		
			
			
				
					
						
							
								
									
									
								
							
						
					
				
			
			
				
					
						
							
								
									
										
									
								
							
						
					
				
			
		
	

	
		
			
				
					
					
					
						
							
								
								
									
										
											
											
											
										
									
								
							
						
					
				
			
		
	

	
		
			
				
					
					
					
					
					
					
						
							
								
								
									
										
											
											
										
									
								
							
						
					
				
			
		
	

	
		
			
				
					
					
					
					
					
						
							
								
								
									
										
											
											
										
									
								
							
						
					
					
					
					
					
					
					
				
			
		
	

	
		
			
				
					
					
					
					
					
					
					
						
							
								
								
									
										
											
											
										
									
								
							
						
					
				
			
		
	

	
		
			
				
					
					
					
					
					
					
						
							
								
								
									
										
											
											
										
									
								
							
						
					
				
			
		
	

	
		
			
				
					
					
						
							
								
								
							
						
					
					
					
						
							
								The index (1-based) of the header line. If no
								header line is present, use 0.
							
						
					
					
					
					
					
						
							
								
							
						
					
				
			
		
	

	
		
			
				
					
				
			
		
	

	
		
			
				
					
					
						
							
								
							
						
					
				
			
		
	

	
		
			
				
					
				
			
		
	

	
		
			
				
					
					
						
							
								
								
							
						
					
				
			
		
	

	
		
			
				
					
				
			
		
	

	
		
			
				
					
				
			
		
	

	
		
			
				
					
				
			
		
	

	
		
			
				
					
				
			
		
	

	
		
			
				
					
						
						
						
						
					
				
			

			
				
					
						
						
						
						
					
				
			

			
				
					
						
						
						
					
				
			
		
	

	
		
			
			
			
			
			
			
			
		
		
		
	

	
		
			
			
				
					
						Defines whether the regex matcher must match the
						whole string, or whether a match on a subsequence is sufficient.
					
				
			
		
		
		
	

	
		
			
		
		
		
	

	
		
			
			
			
		
		
		
	

	
		
			
				
					
						
						
					
				
			
		
		
		
	

	
		
			
			
			
			
		
		
		
	

	
		
			
			
			
		
		
		
	

	
		
			
			
		
		
		
	

	
		
			
			
			
		
		
		
	

	
		
			
				
					Sets a threshold on the number of annotated rows to
					store in memory. Any additional rows will be discarded, although
					the row counter will still count them correctly.
				
			
		
		
			
				
					Sets a threshold on the number of sample sets with
					annotated rows to store in memory.
				
			
		
	

	
		
			
				
					
						Sets the path for the directory to use for on-disk
						storage. This is optional, as HSQLDB will otherwise automatically
						assign a temporary directory for the purpose.
					
				
			
		
	

	
		
			
				
					
						Sets the path for the directory to use for on-disk
						storage. This is optional, as H2 will otherwise automatically
						assign a temporary directory for the purpose.
					
				
			
		
	

	
		
			
			
		
	

	
		
			
		
	

	
		
			
				
					Sets the maximum number of threads that the thread pool
					may assign. Do not set this value lower than 5, as doing so may
					cause serious performance penalties from threads waiting on each
					other.
				
			
		
	

	
	

	
		
			
				
					
						
					
				
			
		
	

	
		
			
				
			
			
				
					
						
					
				
			
			
		
		
		
	

	
		
			
				
					
						Adds a property to this custom type. Properties are
						mapped to fields in the corresponding class that are annotated
						with the @Configured annotation.
					
				
				
					
					
				
			
		
		
			
				
					The Java class name of this custom type.