edu.stanford.nlp.ling.tokensregex.package-info

/**
 * This package contains a library, TokensRegex, for matching regular expressions over
 * tokens.  TokensRegex is incorporated into the
 * {@link edu.stanford.nlp.pipeline.TokensRegexAnnotator}
 * and {@link edu.stanford.nlp.pipeline.TokensRegexNERAnnotator}.
 *
 * <h3>Rules for extracting expressions using TokensRegex</h3>
 *
 * <p>TokensRegex provides a language for specifying rules to extract expressions over token sequences.
 *
 * <p>{@link edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor} and
 * {@link edu.stanford.nlp.ling.tokensregex.SequenceMatchRules} describe
 * the language and how the extraction rules are created.
 *
 * <h3>Core classes for token sequence matching using TokensRegex</h3>
 *
 * <p>At the core of TokensRegex are the
 * {@link edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher} and
 * {@link edu.stanford.nlp.ling.tokensregex.TokenSequencePattern} classes, which
 * can be used to match patterns over a sequence of tokens.
 * Their usage follows the paradigm of the Java regular expression library,
 * {@code java.util.regex}, except that matches are done
 * over {@code List<CoreMap>} instead of over {@code String}.
 *
 * <p>Example:
 * <pre>{@code
 * List<CoreLabel> tokens = ...;
 * TokenSequencePattern pattern = TokenSequencePattern.compile(...);
 * TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
 * }</pre>
 *
 * <p>The classes {@link edu.stanford.nlp.ling.tokensregex.SequenceMatcher} and
 * {@link edu.stanford.nlp.ling.tokensregex.SequencePattern} can be used to build
 * classes for recognizing regular expressions over sequences of arbitrary types.
 *
 * <h3>Utility classes</h3>
 *
 * <p>TokensRegex also offers a group of utility classes.
 *
 * <p>{@link edu.stanford.nlp.ling.tokensregex.MultiPatternMatcher} provides utility functions for finding expressions with multiple patterns.
 * For instance, using {@link edu.stanford.nlp.ling.tokensregex.MultiPatternMatcher#findNonOverlapping}
 * you can find all non-overlapping subsequences for a given set of patterns.
 *
 * <p>To find character offsets of multi-word expressions in a {@code String},
 * you can also use {@link edu.stanford.nlp.ling.tokensregex.MultiWordStringMatcher#findTargetStringOffsets}.
 *
 * @author Angel Chang ([email protected])
 */
package edu.stanford.nlp.ling.tokensregex;
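
The package documentation describes a compile / matcher / find workflow modeled on java.util.regex, applied to lists of elements rather than to strings. The following is a minimal, self-contained sketch of that paradigm using only the Java standard library. The names SeqMatchSketch, SeqPattern, and SeqMatcher are hypothetical and are not part of TokensRegex. The sketch also assumes a much simpler pattern language (one predicate per position, no quantifiers or groups) than the real library supports.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch only: none of these classes belong to TokensRegex.
public class SeqMatchSketch {

    // "Compiled" pattern: one predicate per consecutive element of the sequence.
    static final class SeqPattern<T> {
        private final List<Predicate<T>> steps;

        private SeqPattern(List<Predicate<T>> steps) {
            this.steps = steps;
        }

        @SafeVarargs
        static <T> SeqPattern<T> compile(Predicate<T>... steps) {
            if (steps.length == 0) {
                throw new IllegalArgumentException("pattern needs at least one step");
            }
            return new SeqPattern<>(new ArrayList<>(Arrays.asList(steps)));
        }

        SeqMatcher<T> matcher(List<T> input) {
            return new SeqMatcher<>(steps, input);
        }
    }

    // Stateful matcher, loosely mirroring find/start/end/group of java.util.regex.Matcher.
    static final class SeqMatcher<T> {
        private final List<Predicate<T>> steps;
        private final List<T> input;
        private int searchFrom = 0; // index where the next find() begins scanning
        private int start = -1;     // start of the current match, or -1 if none
        private int end = -1;       // exclusive end of the current match

        SeqMatcher(List<Predicate<T>> steps, List<T> input) {
            this.steps = steps;
            this.input = input;
        }

        // Finds the next match; successive matches never overlap.
        boolean find() {
            int n = steps.size();
            for (int i = searchFrom; i + n <= input.size(); i++) {
                if (matchesAt(i)) {
                    start = i;
                    end = i + n;
                    searchFrom = end; // resume after this match
                    return true;
                }
            }
            start = -1;
            end = -1;
            searchFrom = input.size();
            return false;
        }

        private boolean matchesAt(int offset) {
            for (int k = 0; k < steps.size(); k++) {
                if (!steps.get(k).test(input.get(offset + k))) {
                    return false;
                }
            }
            return true;
        }

        int start() { requireMatch(); return start; }

        int end() { requireMatch(); return end; }

        // Returns the matched elements as a view of the input list.
        List<T> group() { requireMatch(); return input.subList(start, end); }

        private void requireMatch() {
            if (start < 0) {
                throw new IllegalStateException("no match available");
            }
        }
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("I", "flew", "from", "New", "York", "to", "New", "Delhi");
        // Pattern: the token "New" followed by any capitalized token.
        SeqPattern<String> pattern = SeqPattern.<String>compile(
                t -> t.equals("New"),
                t -> !t.isEmpty() && Character.isUpperCase(t.charAt(0)));
        SeqMatcher<String> matcher = pattern.matcher(tokens);
        while (matcher.find()) {
            System.out.println(matcher.start() + "-" + matcher.end() + " " + matcher.group());
        }
    }
}
```

Running main prints `3-5 [New, York]` and then `6-8 [New, Delhi]`. Because the sketch is generic in the element type, the same two classes work for lists of any type, which is the idea the documentation attributes to SequencePattern and SequenceMatcher. The real classes should be consulted for their actual pattern syntax and semantics.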