/**
* This package contains a library, TokensRegex, for matching regular expressions over
* tokens. TokensRegex is incorporated into the
* {@link edu.stanford.nlp.pipeline.TokensRegexAnnotator},
* the {@link edu.stanford.nlp.pipeline.TokensRegexNERAnnotator},
* and the SUTime functionality in {@link edu.stanford.nlp.pipeline.NERCombinerAnnotator}.
*
* Rules for extracting expressions using TokensRegex
*
* TokensRegex provides a language for specifying rules to extract expressions over a token sequence.
* {@link edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor} and
* {@link edu.stanford.nlp.ling.tokensregex.SequenceMatchRules} describe
* the language and how the extraction rules are created.
*
* Core classes for token sequence matching using TokensRegex
*
* At the core of TokensRegex are the
* {@link edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher} and
* {@link edu.stanford.nlp.ling.tokensregex.TokenSequencePattern} classes, which
* can be used to match patterns over sequences of tokens.
* Their usage follows the paradigm of the Java regular expression library
* {@code java.util.regex}, except that matching is done
* over a {@code List<CoreMap>} instead of over a {@code String}.
*
* Example:
* {@code List<CoreMap> tokens = ...;
* TokenSequencePattern pattern = TokenSequencePattern.compile(...);
* TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
* }
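*
* As a rough sketch of how matches might then be iterated in the style of
* {@code java.util.regex}: the hand-built tokens, the pattern string, and the class name
* {@code TokenMatchSketch} below are illustrative assumptions, not part of this package.
* {@code
* import java.util.ArrayList;
* import java.util.List;
* import edu.stanford.nlp.ling.CoreLabel;
* import edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher;
* import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
* import edu.stanford.nlp.util.CoreMap;
*
* public class TokenMatchSketch {
*   public static void main(String[] args) {
*     // Hypothetical input: tokens built by hand, carrying only their text.
*     List<CoreLabel> tokens = new ArrayList<>();
*     for (String word : "Stanford University is in California".split(" ")) {
*       CoreLabel token = new CoreLabel();
*       token.setWord(word);
*       tokens.add(token);
*     }
*     // Illustrative pattern: two consecutive tokens matched by their text.
*     TokenSequencePattern pattern = TokenSequencePattern.compile("/Stanford/ /University/");
*     TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
*     // As with java.util.regex.Matcher, find() advances to the next match.
*     while (matcher.find()) {
*       String text = matcher.group();                         // matched tokens as text
*       List<? extends CoreMap> nodes = matcher.groupNodes();  // matched tokens themselves
*       System.out.println(text + " (" + nodes.size() + " tokens)");
*     }
*   }
* }
* }
*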
* The classes {@link edu.stanford.nlp.ling.tokensregex.SequenceMatcher} and
* {@link edu.stanford.nlp.ling.tokensregex.SequencePattern} can be used to build
* classes for recognizing regular expressions over sequences of arbitrary types.
*
* Utility classes
*
* TokensRegex also offers a group of utility classes.
*
* {@link edu.stanford.nlp.ling.tokensregex.MultiPatternMatcher} provides utility functions for
* finding expressions with multiple patterns.
* For instance, using {@link edu.stanford.nlp.ling.tokensregex.MultiPatternMatcher#findNonOverlapping}
* you can find all nonoverlapping subsequences for a given set of patterns.
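*
* A minimal sketch of that call, assuming a matcher obtained through
* {@code TokenSequencePattern.getMultiPatternMatcher}; the tokens, the pattern strings, and the
* class name {@code MultiPatternSketch} are illustrative assumptions.
* {@code
* import java.util.ArrayList;
* import java.util.List;
* import edu.stanford.nlp.ling.CoreLabel;
* import edu.stanford.nlp.ling.tokensregex.MultiPatternMatcher;
* import edu.stanford.nlp.ling.tokensregex.SequenceMatchResult;
* import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
* import edu.stanford.nlp.util.CoreMap;
*
* public class MultiPatternSketch {
*   public static void main(String[] args) {
*     // Hypothetical input: tokens built by hand, carrying only their text.
*     List<CoreLabel> tokens = new ArrayList<>();
*     for (String word : "Stanford University is in California".split(" ")) {
*       CoreLabel token = new CoreLabel();
*       token.setWord(word);
*       tokens.add(token);
*     }
*     // Illustrative patterns, applied together over the same token list.
*     List<TokenSequencePattern> patterns = new ArrayList<>();
*     patterns.add(TokenSequencePattern.compile("/Stanford/ /University/"));
*     patterns.add(TokenSequencePattern.compile("/California/"));
*     MultiPatternMatcher<CoreMap> multiMatcher =
*         TokenSequencePattern.getMultiPatternMatcher(patterns);
*     // Each result is one matched subsequence; results do not overlap.
*     List<SequenceMatchResult<CoreMap>> results = multiMatcher.findNonOverlapping(tokens);
*     for (SequenceMatchResult<CoreMap> result : results) {
*       System.out.println(result.group());
*     }
*   }
* }
* }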
*
* To find character offsets of multi-word expressions in a {@code String},
* you can also use
* {@link edu.stanford.nlp.ling.tokensregex.MultiWordStringMatcher#findTargetStringOffsets}.
*
*
* @author Angel Chang
*/
package edu.stanford.nlp.ling.tokensregex;