edu.stanford.nlp.process.TokenizerFactory

Stanford Parser processes raw text in English, Chinese, German, Arabic, and French, and extracts constituency parse trees.

There is a newer version: 3.9.2

package edu.stanford.nlp.process;

import edu.stanford.nlp.objectbank.IteratorFromReaderFactory;

import java.io.Reader;

/**
 * A TokenizerFactory is used to convert a java.io.Reader into a Tokenizer
 * (an extension of Iterator) over objects of type T represented by the text
 * in the java.io.Reader.  It's mainly a convenience, since you could cast
 * down anyway.
 *
 * IMPORTANT NOTE:
 *
 * A TokenizerFactory should also provide two static methods:
 * {@code public static TokenizerFactory newTokenizerFactory(); }
 * {@code public static TokenizerFactory newWordTokenizerFactory(String options); }
 *
 * These are expected by certain JavaNLP code (e.g., LexicalizedParser),
 * which wants to produce a TokenizerFactory by reflection.
 *
 * @author Christopher Manning
 *
 * @param <T> The type of the tokens returned by the Tokenizer
 */
public interface TokenizerFactory<T> extends IteratorFromReaderFactory<T> {

  public Tokenizer<T> getTokenizer(Reader r);

  public Tokenizer<T> getTokenizer(Reader r, String extraOptions);

  public void setOptions(String options);

}
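
The Javadoc note above says that some callers obtain a factory by reflection through a static newTokenizerFactory() method. The following is a minimal sketch of that lookup pattern, not the library's own code. To keep it runnable without the Stanford jar, it uses a simplified stand-in interface (SimpleTokenizerFactory) that returns a plain Iterator instead of a Tokenizer; that interface, WhitespaceFactory, loadFactory, and tokenize are all hypothetical names introduced for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.StringTokenizer;

class ReflectiveFactoryDemo {

  // Hypothetical, simplified stand-in for TokenizerFactory<T>. The real
  // interface returns a Tokenizer<T> and also has setOptions(String).
  interface SimpleTokenizerFactory<T> {
    Iterator<T> getTokenizer(Reader r);
  }

  // Hypothetical factory whose tokenizers split the input on whitespace.
  public static class WhitespaceFactory implements SimpleTokenizerFactory<String> {

    @Override
    public Iterator<String> getTokenizer(Reader r) {
      List<String> tokens = new ArrayList<>();
      try (BufferedReader in = new BufferedReader(r)) {
        for (String line = in.readLine(); line != null; line = in.readLine()) {
          StringTokenizer st = new StringTokenizer(line);
          while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
          }
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      return tokens.iterator();
    }

    // Static entry point that a reflective caller looks up by name.
    public static SimpleTokenizerFactory<String> newTokenizerFactory() {
      return new WhitespaceFactory();
    }
  }

  // Obtains a factory given only a class name, by invoking the class's
  // public static no-argument newTokenizerFactory() method.
  @SuppressWarnings("unchecked")
  static SimpleTokenizerFactory<String> loadFactory(String className)
      throws ReflectiveOperationException {
    Method m = Class.forName(className).getMethod("newTokenizerFactory");
    return (SimpleTokenizerFactory<String>) m.invoke(null);
  }

  // Convenience helper: load the factory reflectively and collect its tokens.
  static List<String> tokenize(String className, String text)
      throws ReflectiveOperationException {
    Iterator<String> it = loadFactory(className).getTokenizer(new StringReader(text));
    List<String> out = new ArrayList<>();
    it.forEachRemaining(out::add);
    return out;
  }

  public static void main(String[] args) throws Exception {
    // prints [The, quick, brown, fox]
    System.out.println(tokenize(WhitespaceFactory.class.getName(), "The quick  brown fox"));
  }
}
```

Because the caller knows only a class name, the static method cannot be enforced by the interface itself, which is presumably why the Javadoc states the requirement in prose.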