Fast, general-purpose grammar-based tokenizers.
- {@link org.apache.lucene.analysis.standard.ClassicTokenizer ClassicTokenizer}:
this class was formerly (prior to Lucene 3.1) named {@code StandardTokenizer}.
(Its tokenization rules are not based on the Unicode Text Segmentation algorithm.)
{@link org.apache.lucene.analysis.standard.ClassicAnalyzer ClassicAnalyzer} includes
{@link org.apache.lucene.analysis.standard.ClassicTokenizer ClassicTokenizer},
{@link org.apache.lucene.analysis.LowerCaseFilter LowerCaseFilter}
and {@link org.apache.lucene.analysis.StopFilter StopFilter}.
(A usage sketch follows this list.)
- {@link org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer UAX29URLEmailTokenizer}:
implements the Word Break rules from the Unicode Text Segmentation
algorithm, as specified in
Unicode Standard Annex #29, except that
URLs and email addresses are also tokenized according to the relevant RFCs.
{@link org.apache.lucene.analysis.standard.UAX29URLEmailAnalyzer UAX29URLEmailAnalyzer} includes
{@link org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer UAX29URLEmailTokenizer},
{@link org.apache.lucene.analysis.LowerCaseFilter LowerCaseFilter}
and {@link org.apache.lucene.analysis.StopFilter StopFilter}.
(A second sketch after this list shows this tokenizer used on its own.)
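
A minimal sketch of consuming tokens from
{@link org.apache.lucene.analysis.standard.ClassicAnalyzer ClassicAnalyzer}, assuming a recent
Lucene release where the no-argument constructor is available; the field name and sample text
are arbitrary placeholders.

<pre>{@code
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.ClassicAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ClassicAnalyzerDemo {
  public static void main(String[] args) throws IOException {
    // Field name "body" and the sample text are placeholders for illustration only.
    try (Analyzer analyzer = new ClassicAnalyzer();
         TokenStream ts = analyzer.tokenStream("body", "The quick brown fox jumped over the lazy dog")) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();                        // required before the first incrementToken()
      while (ts.incrementToken()) {
        System.out.println(term);        // one token per line
      }
      ts.end();                          // finalize end-of-stream state
    }
  }
}
}</pre>

The reset()/incrementToken()/end()/close() sequence is the standard TokenStream consumer
workflow and applies to every analyzer in this package.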
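
A second sketch, using
{@link org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer UAX29URLEmailTokenizer}
directly rather than through an analyzer; the sample text is a placeholder, and the
{@code "<URL>"} / {@code "<EMAIL>"} token types noted in the comment are the types this
tokenizer normally assigns to URLs and email addresses.

<pre>{@code
import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public class UAX29URLEmailDemo {
  public static void main(String[] args) throws IOException {
    try (UAX29URLEmailTokenizer tok = new UAX29URLEmailTokenizer()) {
      tok.setReader(new StringReader("Contact dev@lucene.apache.org or see https://lucene.apache.org"));
      CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
      TypeAttribute type = tok.addAttribute(TypeAttribute.class);
      tok.reset();
      while (tok.incrementToken()) {
        // URLs and email addresses come out as single tokens with <URL> / <EMAIL> types.
        System.out.println(term + "\t" + type.type());
      }
      tok.end();
    }
  }
}
}</pre>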
The {@code org.apache.lucene.analysis.standard} package additionally contains
{@code StandardAnalyzer} and {@code StandardTokenizer}; they are not documented here
because they have moved to Lucene Core.
The factories for those components (e.g., as used by Solr) are still part of this module;
a configuration sketch using those factories follows.
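
A sketch of configuring an analyzer through those factories by their SPI names, assuming the
lucene-analysis-common module (which provides {@code CustomAnalyzer} and the factories) is on
the classpath; the names "standard", "lowercase" and "stop" are the usual registration names of
the corresponding factories.

<pre>{@code
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

public class FactoryConfiguredAnalyzer {
  public static void main(String[] args) throws IOException {
    // Wires up a StandardTokenizer-based chain through the factories that
    // remain in this module, looked up by their SPI names.
    try (Analyzer analyzer = CustomAnalyzer.builder()
        .withTokenizer("standard")      // StandardTokenizerFactory
        .addTokenFilter("lowercase")    // LowerCaseFilterFactory
        .addTokenFilter("stop")         // StopFilterFactory with its default stop set
        .build()) {
      // The analyzer can now be handed to an IndexWriterConfig or used directly.
      System.out.println(analyzer.getClass().getSimpleName());
    }
  }
}
}</pre>

The same factory names can be used in Solr schema configuration, which is the typical reason
these factories remain in this module even though the tokenizer itself lives in Lucene Core.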