edu.stanford.nlp.international.spanish.SpanishUnknownWordSignatures

Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases and word dependencies, and indicate which noun phrases refer to the same entities. It provides the foundational building blocks for higher level text understanding applications.

package edu.stanford.nlp.international.spanish;

import java.util.regex.Pattern;

/**
 * Contains patterns for matching certain word types in Spanish, such
 * as common suffixes for nouns, verbs, adjectives and adverbs.
 *
 * These utilities are used to characterize unknown words within the
 * POS tagger and the parser.
 *
 * @see edu.stanford.nlp.tagger.maxent.ExtractorFramesRare
 * @see SpanishUnknownWordModel
 *
 * @author Jon Gauthier
 */
public class SpanishUnknownWordSignatures {

  private static final Pattern pMasculine = Pattern.compile("os?$");
  private static final Pattern pFeminine = Pattern.compile("as?$");

  // The following patterns help to distinguish between verbs in the
  // conditional tense and -er, -ir verbs in the indicative imperfect.
  // Words in these two forms have matching suffixes and are otherwise
  // difficult to distinguish.
  private static final Pattern pConditionalSuffix = Pattern.compile("[aei]ría(?:s|mos|is|n)?$");
  private static final Pattern pImperfectErIrSuffix = Pattern.compile("[^r]ía(?:s|mos|is|n)?$");

  private static final Pattern pImperfect = Pattern.compile(
    "(?:aba(?:[sn]|is)?|ábamos|[^r]ía(?:s|mos|is|n)?)$");
  private static final Pattern pInfinitive = Pattern.compile("[aei]r$");

  private static final Pattern pAdverb = Pattern.compile("mente$");

  // Most of the words disguised as first-person plural verb forms have
  // contrastive stress, so they are easy to match.
  private static final Pattern pVerbFirstPersonPlural = Pattern.compile(
    "(?
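
The listing is cut off at this point. As a rough illustration of how the conditional and -er/-ir imperfect suffix patterns above separate otherwise similar endings, here is a minimal standalone sketch. It copies only the two regular expressions shown in the source. The class name `SuffixSketch`, the helper method names, and the use of `Matcher.find()` are assumptions made for this example, not the way the real class exposes or applies its patterns.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Stanford CoreNLP.
public class SuffixSketch {

  // Same expressions as pConditionalSuffix and pImperfectErIrSuffix in the
  // listing. The accented i is written as a Unicode escape so the file
  // compiles the same way under any source encoding.
  private static final Pattern CONDITIONAL =
      Pattern.compile("[aei]r\u00EDa(?:s|mos|is|n)?$");
  private static final Pattern IMPERFECT_ER_IR =
      Pattern.compile("[^r]\u00EDa(?:s|mos|is|n)?$");

  // Assumption: the patterns are end-anchored, so find() is used rather
  // than matches(), which would require the whole word to match.
  public static boolean looksConditional(String word) {
    return CONDITIONAL.matcher(word).find();
  }

  public static boolean looksImperfectErIr(String word) {
    return IMPERFECT_ER_IR.matcher(word).find();
  }

  public static void main(String[] args) {
    // Conditional and imperfect forms of comer, hablar, and vivir.
    String[] words = {"comer\u00EDa", "com\u00EDa", "hablar\u00EDan", "viv\u00EDamos"};
    for (String w : words) {
      System.out.println(w
          + " conditional=" + looksConditional(w)
          + " imperfectErIr=" + looksImperfectErIr(w));
    }
  }
}
```

The only difference between the two expressions is the character required before the accented "ía": a theme vowel plus "r" for the conditional, and any character other than "r" for the imperfect, so a given ending can satisfy at most one of them.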



