
org.apache.lucene.analysis.standard.package.html

 Fast, general-purpose grammar-based tokenizers.
 
  • {@link org.apache.lucene.analysis.standard.ClassicTokenizer ClassicTokenizer}: this class was named StandardTokenizer prior to Lucene 3.1. Its tokenization rules are not based on the Unicode Text Segmentation algorithm.
    {@link org.apache.lucene.analysis.standard.ClassicAnalyzer ClassicAnalyzer} includes {@link org.apache.lucene.analysis.standard.ClassicTokenizer ClassicTokenizer}, {@link org.apache.lucene.analysis.LowerCaseFilter LowerCaseFilter} and {@link org.apache.lucene.analysis.StopFilter StopFilter}.
  • {@link org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer UAX29URLEmailTokenizer}: implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29, except that URLs and email addresses are also recognized and emitted as tokens according to the relevant RFCs.
    {@link org.apache.lucene.analysis.standard.UAX29URLEmailAnalyzer UAX29URLEmailAnalyzer} includes {@link org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer UAX29URLEmailTokenizer}, {@link org.apache.lucene.analysis.LowerCaseFilter LowerCaseFilter} and {@link org.apache.lucene.analysis.StopFilter StopFilter}.
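
Both analyzers above follow the same chain: a tokenizer, then lowercasing, then stop-word removal. The following is a minimal sketch of that chain using only the JDK, not Lucene's API. It assumes {@code java.text.BreakIterator} as a stand-in word segmenter; its rules may differ from Lucene's tokenizers, and it does not keep URLs or email addresses together as single tokens. The class name {@code AnalyzerChainSketch} and the stop-word set are hypothetical and chosen only for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerChainSketch {
    // Hypothetical stop-word set for illustration; not Lucene's default list.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "to");

    // Mimics the chain tokenizer -> lowercase filter -> stop filter.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Stand-in tokenizer: JDK word-boundary segmentation (an assumption, not Lucene).
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(text);
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = text.substring(start, end);
            // Skip segments that contain no letter or digit (whitespace, punctuation).
            if (segment.codePoints().noneMatch(Character::isLetterOrDigit)) {
                continue;
            }
            // Lowercase step, then stop-word step.
            String lower = segment.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick Brown Fox")); // prints [quick, brown, fox]
    }
}
```

In Lucene itself the corresponding steps are separate, composable components, which is why the same two filters appear in both analyzers with only the tokenizer swapped.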

This Java package additionally contains {@code StandardAnalyzer} and {@code StandardTokenizer}, which are not visible here because they were moved to Lucene Core. The factories for those components (used, for example, in Solr) are still part of this module.




