s.jate.2.0-beta.11.source-code.jate.properties Maven / Gradle / Ivy

Go to download

Show more of this group Show more artifacts with this name
Show all versions of jate Show documentation

JATE is a toolkit for developing and experimenting Automatic Term Extractions/Recognition algorithms in Java. The motivation of this toolkit is to make available several state-of-the-art algorithms of ATE/ATR to developers and users of ATE/ATR, and encourage developers of ATE/ATR methods to develop their methods under a uniform framework to enable comparative studies.

The newest version!

############################################################################
#		Mapping with Solr config/schema for Term Recognition Algorithms	   #
############################################################################

# Value type: string
# Required
# The Solr uniqueKey field encodes the identity semantics of a document.
solr_field_id=id
#fieldname_id

# Value type: string
# Required
# THIS FIELD IS NOT TO BE CONFUSED WITH THE FIELD THAT STORES TERM CANDIDATES
# 
# This field stores n-grams from a corpus. It is configured to OVER-GENERATE terms
# so n-grams that are not necessarily term candidates can be generated. The goal of
# this field is to be used as a lookup source for statistic information of term 
# candidates.
# MUST BE configured to store termVector and termOffset information
solr_field_content_ngrams=jate_ngraminfo

# Value type: string
# Required
# Solr Content/Text Field to index and store candidate terms. 
# MUST BE INDEXED by a TR aware analyser with termVectors and termOffsets set to true
# Refer to "schema.xml" for the example setting
solr_field_content_terms=jate_cterms


# Value type: string
# OPTIONAL
# This is a Solr Content/Text dynamic field mapping document parts by Tika (e.g.,title, links, first paragraph, etc.)
# MUST BE INDEXED by a TR aware analyser
# 
# Mapping document parts with terms provide a way to trace the part of a document where a term is found. 
# Such information can be used by some ATR algorithms. However most ATR
# 	algorithms do not use such information.
# Refer to "schema.xml" for the example setting
# 
# 	and recommend to set meaningful field name "_*" + field type abbv. (e.g., "*_text2Terms") for indexed dynamic fields
#   Please also refer to the example in App* source code in comments about how to use the functionality
solr_field_map_doc_parts=jate_cterms_f*

# Value type: string
# OPTIONAL
# Solr string field to index and store final filtered candidate terms
# By default, the filtered candidate terms (i.e., domain terms) will not be indexed and stored
solr_field_domain_terms=jate_domain_terms

#################################################
#			Performance Tuning			        #
#################################################


# Value type: number
# OPTIONAL
# Performance parameter for performance tuning
# Maximum of data units each thread (worker) of a 
# 	SolrParallelIndexingWorker should commit to solr. 
# When not defined or invalid value is used, default as 500
indexer_max_units_to_commit=500

# Value type: number
# OPTIONAL
# Performance parameter for performance tuning
# Maximum % of CPU cores that parallel processes of JATE can use in.
# When not defined or invalid value is used, default as 1
max_cores=8