Fast, general-purpose grammar-based tokenizers.
- {@link org.apache.lucene.analysis.standard.ClassicTokenizer ClassicTokenizer}:
this class was formerly (prior to Lucene 3.1) named {@code StandardTokenizer}.
(Its tokenization rules are not based on the Unicode Text Segmentation algorithm.)
{@link org.apache.lucene.analysis.standard.ClassicAnalyzer ClassicAnalyzer} includes
{@link org.apache.lucene.analysis.standard.ClassicTokenizer ClassicTokenizer},
{@link org.apache.lucene.analysis.LowerCaseFilter LowerCaseFilter}
and {@link org.apache.lucene.analysis.StopFilter StopFilter}.
(A usage sketch follows this list.)
- {@link org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer UAX29URLEmailTokenizer}:
implements the Word Break rules from the Unicode Text Segmentation
algorithm, as specified in
Unicode Standard Annex #29, except that
URLs and email addresses are also tokenized according to the relevant RFCs.
{@link org.apache.lucene.analysis.standard.UAX29URLEmailAnalyzer UAX29URLEmailAnalyzer} includes
{@link org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer UAX29URLEmailTokenizer},
{@link org.apache.lucene.analysis.LowerCaseFilter LowerCaseFilter}
and {@link org.apache.lucene.analysis.StopFilter StopFilter}.
(A second sketch after this list shows this tokenizer used on its own.)
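
A minimal sketch of consuming tokens from
{@link org.apache.lucene.analysis.standard.ClassicAnalyzer ClassicAnalyzer}, assuming a recent
Lucene release where the no-argument constructor is available; the field name and sample text
are arbitrary placeholders.

<pre>{@code
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.ClassicAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ClassicAnalyzerDemo {
  public static void main(String[] args) throws IOException {
    // Field name "body" and the sample text are placeholders for illustration only.
    try (Analyzer analyzer = new ClassicAnalyzer();
         TokenStream ts = analyzer.tokenStream("body", "The quick brown fox jumped over the lazy dog")) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();                        // required before the first incrementToken()
      while (ts.incrementToken()) {
        System.out.println(term);        // one token per line
      }
      ts.end();                          // finalize end-of-stream state
    }
  }
}
}</pre>

The reset()/incrementToken()/end()/close() sequence is the standard TokenStream consumer
workflow and applies to every analyzer in this package.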
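
A second sketch, using
{@link org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer UAX29URLEmailTokenizer}
directly rather than through an analyzer; the sample text is a placeholder, and the
{@code "<URL>"} / {@code "<EMAIL>"} token types noted in the comment are the types this
tokenizer normally assigns to URLs and email addresses.

<pre>{@code
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public class UAX29URLEmailDemo {
  public static void main(String[] args) throws IOException {
    try (UAX29URLEmailTokenizer tok = new UAX29URLEmailTokenizer()) {
      tok.setReader(new StringReader("Contact dev@lucene.apache.org or see https://lucene.apache.org"));
      CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
      TypeAttribute type = tok.addAttribute(TypeAttribute.class);
      tok.reset();
      while (tok.incrementToken()) {
        // URLs and email addresses come out as single tokens with <URL> / <EMAIL> types.
        System.out.println(term + "\t" + type.type());
      }
      tok.end();
    }
  }
}
}</pre>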
The {@code org.apache.lucene.analysis.standard} package additionally contains
{@code StandardAnalyzer} and {@code StandardTokenizer}; they are not documented here
because they have moved to Lucene Core.
The factories for those components (e.g., as used by Solr) are still part of this module;
a configuration sketch using those factories follows.
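
A sketch of configuring an analyzer through those factories by their SPI names, assuming the
lucene-analysis-common module (which provides {@code CustomAnalyzer} and the factories) is on
the classpath; the names "standard", "lowercase" and "stop" are the usual registration names of
the corresponding factories.

<pre>{@code
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

public class FactoryConfiguredAnalyzer {
  public static void main(String[] args) throws IOException {
    // Wires up a StandardTokenizer-based chain through the factories that
    // remain in this module, looked up by their SPI names.
    try (Analyzer analyzer = CustomAnalyzer.builder()
        .withTokenizer("standard")      // StandardTokenizerFactory
        .addTokenFilter("lowercase")    // LowerCaseFilterFactory
        .addTokenFilter("stop")         // StopFilterFactory with its default stop set
        .build()) {
      // The analyzer can now be handed to an IndexWriterConfig or used directly.
      System.out.println(analyzer.getClass().getSimpleName());
    }
  }
}
}</pre>

The same factory names can be used in Solr schema configuration, which is the typical reason
these factories remain in this module even though the tokenizer itself lives in Lucene Core.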