
edu.stanford.nlp.process.TokenizerFactory


Stanford Parser processes raw text in English, Chinese, German, Arabic, and French, and extracts constituency parse trees.

There is a newer version: 3.9.2
package edu.stanford.nlp.process;

import edu.stanford.nlp.objectbank.IteratorFromReaderFactory;

import java.io.Reader;

/**
 * A TokenizerFactory is used to convert a java.io.Reader into a Tokenizer
 * (an extension of Iterator) over objects of type T represented by the text
 * in the java.io.Reader.  It's mainly a convenience, since you could cast
 * down anyway.
 *
 * IMPORTANT NOTE:
 *
 * A TokenizerFactory should also provide two static methods:
 * {@code public static TokenizerFactory newTokenizerFactory(); }
 * {@code public static TokenizerFactory newWordTokenizerFactory(String options); }
 *
 * These are expected by certain JavaNLP code (e.g., LexicalizedParser),
 * which wants to produce a TokenizerFactory by reflection.
 *
 * @author Christopher Manning
 *
 * @param <T> The type of the tokens returned by the Tokenizer
 */
public interface TokenizerFactory<T> extends IteratorFromReaderFactory<T> {
  public Tokenizer<T> getTokenizer(Reader r);
  public Tokenizer<T> getTokenizer(Reader r, String extraOptions);
  public void setOptions(String options);
}
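The Javadoc above notes that some JavaNLP code obtains a TokenizerFactory by reflection through a static newTokenizerFactory() method. The following is a minimal, self-contained sketch of that pattern, not the Stanford implementation. SimpleTokenizerFactory, WhitespaceTokenizerFactory, createByReflection, and the "lowercase" option are all hypothetical names made up for illustration. The stand-in interface returns a plain java.util.Iterator where the real one returns a Tokenizer, so that the example runs without the Stanford jar.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class TokenizerFactoryDemo {

  // Hypothetical stand-in that mirrors the shape of TokenizerFactory<T>.
  // Assumption: Iterator<T> is used in place of the library's Tokenizer<T>.
  public interface SimpleTokenizerFactory<T> {
    Iterator<T> getTokenizer(Reader r);
    Iterator<T> getTokenizer(Reader r, String extraOptions);
    void setOptions(String options);
  }

  // Illustrative factory that splits text on whitespace.
  // The "lowercase" option is invented for this sketch.
  public static class WhitespaceTokenizerFactory implements SimpleTokenizerFactory<String> {
    private String options = "";

    // Static factory method of the kind a reflective caller would look up by name.
    public static SimpleTokenizerFactory<String> newTokenizerFactory() {
      return new WhitespaceTokenizerFactory();
    }

    @Override
    public Iterator<String> getTokenizer(Reader r) {
      return getTokenizer(r, "");
    }

    @Override
    public Iterator<String> getTokenizer(Reader r, String extraOptions) {
      // Factory-level options and per-call options are both honored.
      boolean lower = (options + "," + extraOptions).contains("lowercase");
      List<String> tokens = new ArrayList<>();
      for (String tok : readAll(r).trim().split("\\s+")) {
        if (!tok.isEmpty()) {
          tokens.add(lower ? tok.toLowerCase() : tok);
        }
      }
      return tokens.iterator();
    }

    @Override
    public void setOptions(String options) {
      this.options = (options == null) ? "" : options;
    }

    // Reads the whole Reader into a String.
    private static String readAll(Reader r) {
      StringBuilder sb = new StringBuilder();
      char[] buf = new char[1024];
      try {
        int n;
        while ((n = r.read(buf)) != -1) {
          sb.append(buf, 0, n);
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      return sb.toString();
    }
  }

  // Looks up the static newTokenizerFactory() method by name and invokes it.
  @SuppressWarnings("unchecked")
  public static SimpleTokenizerFactory<String> createByReflection(String className) {
    try {
      Method m = Class.forName(className).getMethod("newTokenizerFactory");
      return (SimpleTokenizerFactory<String>) m.invoke(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("No usable newTokenizerFactory() on " + className, e);
    }
  }

  public static void main(String[] args) {
    SimpleTokenizerFactory<String> factory =
        createByReflection(WhitespaceTokenizerFactory.class.getName());
    factory.setOptions("lowercase");
    Iterator<String> tokens = factory.getTokenizer(new StringReader("Stanford Parser reads text"));
    while (tokens.hasNext()) {
      System.out.println(tokens.next());
    }
  }
}
```

Because the caller only needs a class name and the agreed method name, the tokenizer implementation can be chosen at run time (for example from a configuration string) without a compile-time dependency on the concrete class.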



