
edu.stanford.nlp.ling.tokensregex.package-info


Stanford Parser processes raw text in English, Chinese, German, Arabic, and French, and extracts constituency parse trees.

/**
 * This package contains a library, TokensRegex, for matching regular expressions over
 * tokens.  TokensRegex is incorporated into the
 * {@link edu.stanford.nlp.pipeline.TokensRegexAnnotator},
 * the {@link edu.stanford.nlp.pipeline.TokensRegexNERAnnotator},
 * and the SUTime functionality in {@link edu.stanford.nlp.pipeline.NERCombinerAnnotator}.
 *
 * <h3>Rules for extracting expressions using TokensRegex</h3>
 *
 * <p>TokensRegex provides a language for specifying rules to extract expressions over a token sequence.
 *
 * <p>{@link edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor} and
 * {@link edu.stanford.nlp.ling.tokensregex.SequenceMatchRules} describe
 * the language and how the extraction rules are created.
 *
 * <h3>Core classes for token sequence matching using TokensRegex</h3>
 *
 * <p>At the core of TokensRegex are the
 * {@link edu.stanford.nlp.ling.tokensregex.TokenSequenceMatcher} and
 * {@link edu.stanford.nlp.ling.tokensregex.TokenSequencePattern} classes, which
 * can be used to match patterns over sequences of tokens.
 * Their usage follows the paradigm of the Java regular expression library
 * {@code java.util.regex}, except that matches are made
 * over a {@code List<CoreMap>} instead of over a {@code String}.
 *
 * <p>Example:
 *
 * <pre>{@code
 * List<CoreMap> tokens = ...;
 * TokenSequencePattern pattern = TokenSequencePattern.compile(...);
 * TokenSequenceMatcher matcher = pattern.getMatcher(tokens);
 * }</pre>
 *
 * <p>The classes {@link edu.stanford.nlp.ling.tokensregex.SequenceMatcher} and
 * {@link edu.stanford.nlp.ling.tokensregex.SequencePattern} can be used to build
 * classes for recognizing regular expressions over sequences of arbitrary types.
 *
 * <h3>Utility classes</h3>
 *
 * <p>TokensRegex also offers a group of utility classes.
 *
 * <ul>
 * <li>{@link edu.stanford.nlp.ling.tokensregex.MultiPatternMatcher} provides utility functions for
 * finding expressions with multiple patterns.
 * For instance, using {@link edu.stanford.nlp.ling.tokensregex.MultiPatternMatcher#findNonOverlapping}
 * you can find all nonoverlapping subsequences for a given set of patterns.</li>
 * <li>To find character offsets of multiword expressions in a {@code String},
 * you can also use
 * {@link edu.stanford.nlp.ling.tokensregex.MultiWordStringMatcher#findTargetStringOffsets}.</li>
 * </ul>
 *
 * @author Angel Chang ([email protected])
 */
package edu.stanford.nlp.ling.tokensregex;
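
The package documentation above says that TokenSequencePattern and TokenSequenceMatcher follow the paradigm of java.util.regex. As a point of comparison, here is a minimal sketch of that paradigm using only the standard library: compile a pattern once, bind it to an input, then iterate over the matches. The class name RegexParadigmSketch and the helper findAll are hypothetical names chosen for this illustration and are not part of CoreNLP. The sketch matches over a String; the TokensRegex classes apply the same compile-then-match flow to a List<CoreMap>, which is not shown here because it needs CoreNLP on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; it uses java.util.regex only, not TokensRegex.
public class RegexParadigmSketch {

    // Returns every substring of `text` matched by `regex`, scanning left to right.
    static List<String> findAll(String regex, String text) {
        Pattern pattern = Pattern.compile(regex);   // step 1: compile the pattern once
        Matcher matcher = pattern.matcher(text);    // step 2: bind the pattern to an input
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {                    // step 3: advance to the next match
            matches.add(matcher.group());           // group() is the text of the current match
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "She left Stanford University for Brown University.";
        // prints [Stanford University, Brown University]
        System.out.println(findAll("[A-Z][a-z]+ University", text));
    }
}
```

In the Javadoc example, TokenSequencePattern.compile(...) and pattern.getMatcher(tokens) play the roles of steps 1 and 2, with tokens taking the place of the input string.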



