
s.jate.2.0-beta.11.source-code.jate.properties Maven / Gradle / Ivy
JATE is a toolkit for developing and experimenting with Automatic Term Extraction/Recognition (ATE/ATR)
algorithms in Java. The toolkit aims to make several state-of-the-art ATE/ATR algorithms available to
developers and users of ATE/ATR, and to encourage developers of ATE/ATR methods to build their methods under a
uniform framework that enables comparative studies.
############################################################################
# Mapping with Solr config/schema for Term Recognition Algorithms #
############################################################################
# Value type: string
# Required
# The Solr uniqueKey field encodes the identity semantics of a document.
solr_field_id=id
#fieldname_id
# Value type: string
# Required
# THIS FIELD IS NOT TO BE CONFUSED WITH THE FIELD THAT STORES TERM CANDIDATES
#
# This field stores n-grams from a corpus. It is configured to OVER-GENERATE n-grams,
# so it may contain n-grams that are not term candidates. Its purpose is to serve
# as a lookup source for statistical information about term candidates.
# MUST BE configured to store termVectors and termOffsets information
solr_field_content_ngrams=jate_ngraminfo
# Value type: string
# Required
# Solr Content/Text field used to index and store candidate terms.
# MUST BE INDEXED by a TR-aware analyser with termVectors and termOffsets set to true
# Refer to "schema.xml" for an example setting
solr_field_content_terms=jate_cterms
# Value type: string
# OPTIONAL
# This is a Solr Content/Text dynamic field that maps document parts extracted by Tika (e.g., title, links, first paragraph)
# MUST BE INDEXED by a TR-aware analyser
#
# Mapping document parts to terms provides a way to trace the part of a document where a term is found.
# Such information can be used by some ATR algorithms, although most ATR
# algorithms do not use it.
# Refer to "schema.xml" for an example setting
#
# We recommend a meaningful field name, "_*" + field type abbv. (e.g., "*_text2Terms"), for indexed dynamic fields
# Please also refer to the comments in the App* source code for an example of how to use this functionality
solr_field_map_doc_parts=jate_cterms_f*
# Value type: string
# OPTIONAL
# Solr string field used to index and store the final filtered candidate terms
# By default, the filtered candidate terms (i.e., domain terms) are neither indexed nor stored
solr_field_domain_terms=jate_domain_terms
#################################################
# Performance Tuning #
#################################################
# Value type: number
# OPTIONAL
# Performance tuning parameter
# Maximum number of data units that each thread (worker) of a
# SolrParallelIndexingWorker commits to Solr.
# Defaults to 500 when undefined or when an invalid value is used
indexer_max_units_to_commit=500
# Value type: number
# OPTIONAL
# Performance tuning parameter
# Maximum % of CPU cores that parallel processes of JATE can use.
# Defaults to 1 when undefined or when an invalid value is used
max_cores=8
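
The fallback rule stated for the two tuning parameters (use the default when the value is undefined or invalid) can be sketched in Java as follows. This is an illustrative sketch only, not JATE's own loading code: the class `JateTuningConfig` and the method `positiveIntOrDefault` are hypothetical names, and treating zero or negative numbers as invalid is an assumption. Only the property keys and the defaults (500 and 1) come from the file above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper for illustration; not part of JATE's API. */
class JateTuningConfig {

    /** Returns the property as a positive int, or the fallback when it is missing or invalid. */
    static int positiveIntOrDefault(Properties props, String key, int fallback) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return fallback; // property not defined
        }
        try {
            int value = Integer.parseInt(raw.trim());
            // Assumption: zero and negative values count as invalid.
            return value > 0 ? value : fallback;
        } catch (NumberFormatException e) {
            return fallback; // value is not a whole number
        }
    }

    public static void main(String[] args) throws IOException {
        // Inline stand-in for jate.properties; max_cores is deliberately invalid here.
        String text = "indexer_max_units_to_commit=500\nmax_cores=eight\n";
        Properties props = new Properties();
        props.load(new StringReader(text));

        int unitsToCommit = positiveIntOrDefault(props, "indexer_max_units_to_commit", 500);
        int maxCores = positiveIntOrDefault(props, "max_cores", 1);
        System.out.println(unitsToCommit + " " + maxCores); // prints "500 1"
    }
}
```

In a real setup the `StringReader` would be replaced by a reader over the actual `jate.properties` file. The per-key fallback keeps a single bad value from invalidating the rest of the configuration.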