edu.stanford.nlp.ling.tokensregex.package-info

Stanford Parser processes raw text in English, Chinese, German, Arabic, and French, and extracts constituency parse trees.

/**
 * This package contains a library, TokensRegex, for matching regular expressions over
 * tokens.  TokensRegex is incorporated into the
 * {@link edu.stanford.nlp.pipeline.TokensRegexAnnotator},
 * the {@link edu.stanford.nlp.pipeline.TokensRegexNERAnnotator},
 * and the SUTime functionality in {@link edu.stanford.nlp.pipeline.NERCombinerAnnotator}.
 *
 * Rules for extracting expressions using TokensRegex
 *
 * TokensRegex provides a language for specifying rules to extract expressions over a token sequence.
 * {@link edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor} and
 * {@link edu.stanford.nlp.ling.tokensregex.SequenceMatchRules} describe
 * the language and how the extraction rules are created.
 *
 * Core classes for token sequence matching using TokensRegex
 *
 * At the core of TokensRegex are the
 * {@link edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher} and
 * {@link edu.stanford.nlp.ling.tokensregex.TokenSequencePattern} classes, which
 * can be used to match patterns over sequences of tokens.
 * Usage follows the paradigm of the Java regular expression library
 * {@code java.util.regex}, except that matching is done
 * over {@code List<CoreMap>} instead of over {@code String}.
 * 
 * Example:
 *  {@code List<CoreMap> tokens = ...;
 * TokenSequencePattern pattern = TokenSequencePattern.compile(...);
 * TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
 * }
 * The classes {@link edu.stanford.nlp.ling.tokensregex.SequenceMatcher} and
 * {@link edu.stanford.nlp.ling.tokensregex.SequencePattern} can be used to build
 * classes for recognizing regular expressions over sequences of arbitrary types.
 *
 * Utility classes
 *
 * TokensRegex also offers a group of utility classes.
 * 
 * {@link edu.stanford.nlp.ling.tokensregex.MultiPatternMatcher} provides utility functions for
 * finding expressions with multiple patterns.
 * For instance, using {@link edu.stanford.nlp.ling.tokensregex.MultiPatternMatcher#findNonOverlapping}
 * you can find all non-overlapping subsequences for a given set of patterns.
 * 
 * To find character offsets of multi-word expressions in a {@code String},
 * you can also use
 * {@link edu.stanford.nlp.ling.tokensregex.MultiWordStringMatcher#findTargetStringOffsets}.
 * 
 *
 * @author Angel Chang
 */
package edu.stanford.nlp.ling.tokensregex;
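
The package comment above describes TokensRegex usage as following the java.util.regex paradigm. As a point of comparison, here is a minimal sketch of that paradigm using only the standard library: compile a pattern once, bind it to one input, then step through the matches with find(). The class name RegexParadigmSketch and the helper findAll are hypothetical names chosen for illustration and are not part of TokensRegex. According to the comment, TokenSequencePattern and TokenSequenceMatcher play the analogous roles over List<CoreMap> rather than String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; uses only java.util.regex.
class RegexParadigmSketch {

    // Returns every match of the regex in the text, in order of occurrence.
    static List<String> findAll(String regex, String text) {
        Pattern pattern = Pattern.compile(regex);  // compile the pattern once
        Matcher matcher = pattern.matcher(text);   // bind it to one input
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {                   // advance to the next match
            matches.add(matcher.group());          // text of the current match
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "She left Stanford University for Brown University.";
        System.out.println(findAll("[A-Z][a-z]+ University", text));
    }
}
```

Running main prints [Stanford University, Brown University]. The same three-step shape (compile, get a matcher, iterate) is what the Example in the package comment shows for token sequences.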