edu.stanford.nlp.process.TokenizerFactory
Stanford Parser processes raw text in English, Chinese, German, Arabic, and French, and extracts constituency parse trees.
package edu.stanford.nlp.process;
import edu.stanford.nlp.objectbank.IteratorFromReaderFactory;
import java.io.Reader;
/**
* A TokenizerFactory is used to convert a java.io.Reader into a Tokenizer
* (an extension of Iterator) over objects of type T represented by the text
* in the java.io.Reader. It's mainly a convenience, since you could cast
* down anyway.
*
* IMPORTANT NOTE:
*
* A TokenizerFactory should also provide two static methods:
* {@code public static TokenizerFactory<? extends HasWord> newTokenizerFactory(); }
* {@code public static TokenizerFactory<Word> newWordTokenizerFactory(String options); }
*
* These are expected by certain JavaNLP code (e.g., LexicalizedParser),
* which wants to produce a TokenizerFactory by reflection.
*
* @author Christopher Manning
*
* @param <T> The type of the tokens returned by the Tokenizer
*/
public interface TokenizerFactory<T> extends IteratorFromReaderFactory<T> {
  public Tokenizer<T> getTokenizer(Reader r);
  public Tokenizer<T> getTokenizer(Reader r, String extraOptions);
  public void setOptions(String options);
}
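
The contract described in the Javadoc above can be illustrated with a small sketch. It is not part of the Stanford Parser sources. To compile without the stanford-parser jar, it declares simplified stand-ins for Tokenizer, IteratorFromReaderFactory, and TokenizerFactory (the real interfaces may declare more than is shown here). WhitespaceTokenizerFactory, its "lowercase" option, and TokenizerFactoryDemo are hypothetical names introduced only for this example. The demo looks up the static newTokenizerFactory() method by reflection, which is roughly the pattern the IMPORTANT NOTE says some callers rely on.

```java
import java.io.Reader;
import java.io.StringReader;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

class TokenizerFactoryDemo {
  public static void main(String[] args) throws Exception {
    // Look up the static factory method by name, as a reflective caller might.
    Method m = WhitespaceTokenizerFactory.class.getMethod("newTokenizerFactory");
    @SuppressWarnings("unchecked")
    TokenizerFactory<String> factory = (TokenizerFactory<String>) m.invoke(null);
    factory.setOptions("lowercase");
    // Prints [stanford, parser]
    System.out.println(factory.getTokenizer(new StringReader("Stanford Parser")).tokenize());
  }
}

// Simplified stand-ins (assumptions) for the Stanford interfaces.
interface Tokenizer<T> extends Iterator<T> {
  List<T> tokenize();
}

interface IteratorFromReaderFactory<T> {
  Iterator<T> getIterator(Reader r);
}

interface TokenizerFactory<T> extends IteratorFromReaderFactory<T> {
  Tokenizer<T> getTokenizer(Reader r);
  Tokenizer<T> getTokenizer(Reader r, String extraOptions);
  void setOptions(String options);
}

// Hypothetical factory: splits on whitespace and understands one
// made-up option, "lowercase".
class WhitespaceTokenizerFactory implements TokenizerFactory<String> {
  private String options = "";

  // Static factory method of the kind the Javadoc asks implementations to provide.
  public static TokenizerFactory<String> newTokenizerFactory() {
    return new WhitespaceTokenizerFactory();
  }

  @Override
  public Iterator<String> getIterator(Reader r) {
    return getTokenizer(r);
  }

  @Override
  public Tokenizer<String> getTokenizer(Reader r) {
    return getTokenizer(r, "");
  }

  @Override
  public Tokenizer<String> getTokenizer(Reader r, String extraOptions) {
    // Assumption of this sketch: extra options add to the factory's own options.
    final boolean lower = (options + "," + extraOptions).contains("lowercase");
    final Scanner scanner = new Scanner(r); // default delimiter is whitespace
    return new Tokenizer<String>() {
      @Override public boolean hasNext() { return scanner.hasNext(); }
      @Override public String next() {
        String token = scanner.next();
        return lower ? token.toLowerCase(Locale.ROOT) : token;
      }
      @Override public List<String> tokenize() {
        // Drain the remaining tokens into a list.
        List<String> out = new ArrayList<>();
        while (hasNext()) { out.add(next()); }
        return out;
      }
    };
  }

  @Override
  public void setOptions(String options) {
    this.options = options;
  }
}
```

Because a Tokenizer is itself an Iterator, getIterator can simply delegate to getTokenizer, which is why the Javadoc calls the factory "mainly a convenience".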