
org.nlpcraft.model.NCElement

There is a newer version: 0.8.2
/*
 * “Commons Clause” License, https://commonsclause.com/
 *
 * The Software is provided to you by the Licensor under the License,
 * as defined below, subject to the following condition.
 *
 * Without limiting other conditions in the License, the grant of rights
 * under the License will not include, and the License does not grant to
 * you, the right to Sell the Software.
 *
 * For purposes of the foregoing, “Sell” means practicing any or all of
 * the rights granted to you under the License to provide to third parties,
 * for a fee or other consideration (including without limitation fees for
 * hosting or consulting/support services related to the Software), a
 * product or service whose value derives, entirely or substantially, from
 * the functionality of the Software. Any license notice or attribution
 * required by the License must also include this Commons Clause License
 * Condition notice.
 *
 * Software:    NLPCraft
 * License:     Apache 2.0, https://www.apache.org/licenses/LICENSE-2.0
 * Licensor:    Copyright (C) 2018 DataLingvo, Inc. https://www.datalingvo.com
 *
 *     _   ____      ______           ______
 *    / | / / /___  / ____/________ _/ __/ /_
 *   /  |/ / / __ \/ /   / ___/ __ `/ /_/ __/
 *  / /|  / / /_/ / /___/ /  / /_/ / __/ /_
 * /_/ |_/_/ .___/\____/_/   \__,_/_/  \__/
 *        /_/
 */

package org.nlpcraft.model;

import java.util.List;
import java.util.regex.Pattern;

/**
 * Semantic model element.
 * <p>
 * An element is the main building block of the semantic model. A semantic element defines an entity
 * that will be automatically recognized in the user input either by one of its synonyms or values,
 * or directly by its ID.
 *
 * <h3>Synonyms</h3>
 * Synonyms are the key building blocks of the semantic element and are used in the following methods:
 * <ul>
 *     <li>{@link #getSynonyms()}</li>
 *     <li>{@link #getExcludedSynonyms()}</li>
 *     <li>{@link #getValues()}</li>
 * </ul>
 * Each model element has one or more synonyms. Note that the element ID is its implicit synonym, so even
 * if no additional synonyms are defined, at least one synonym always exists. Each individual synonym is a
 * whitespace-separated combination of:
 * <ul>
 *     <li>simple word,</li>
 *     <li>regular expression, or</li>
 *     <li>PoS tag</li>
 * </ul>
 * Note that synonym matching for simple words is case-insensitive and is automatically
 * performed on the normalized and stemmed forms of such words, so the model
 * provider doesn't have to account for this in the synonyms themselves.
 *
 * <h3>Macro Expansions</h3>
 * Listing all possible multi-word synonyms for a given element can be a time-consuming task. Macros together
 * with option groups allow for significant simplification of this process. The model provides a list of macros
 * via the {@link NCModel#getMacros()} method. Each macro has a name in the form {@code <X>}, where {@code X} is
 * any string, and a string value. Note that macros can be nested, i.e. a macro value can include references
 * to other macros. When a macro name {@code <X>} is encountered in a synonym, it gets recursively replaced with
 * its value.
 *
 * <h3>Option Groups</h3>
 * Option groups are a simplified form of regular expressions that operate on a single-word basis. The
 * following examples demonstrate how to use option groups. Consider that the following macros are defined:
 * <table>
 *     <tr><th>Macro Name</th><th>Macro Value</th></tr>
 *     <tr><td>{@code <A>}</td><td>{@code aaa}</td></tr>
 *     <tr><td>{@code <B>}</td><td>{@code <A> bbb}</td></tr>
 *     <tr><td>{@code <C>}</td><td>{@code <A> bbb {z|w}}</td></tr>
 * </table>
 * Note that macros {@code <B>} and {@code <C>} are nested. Then the following option group expansions
 * will occur in these examples:
 * <table>
 *     <tr><th>Synonym</th><th>Expanded Synonyms</th></tr>
 *     <tr><td>{@code <A> {b|*} c}</td><td>{@code "aaa b c"}<br>{@code "aaa c"}</td></tr>
 *     <tr><td>{@code <B> {b|*} c}</td><td>{@code "aaa bbb b c"}<br>{@code "aaa bbb c"}</td></tr>
 *     <tr><td>{@code {b|```NN```} c}</td><td>{@code "b c"}<br>{@code "```NN``` c"}</td></tr>
 *     <tr><td>{@code {b|\{\*\}}}</td><td>{@code "b"}<br>{@code "b {*}"}</td></tr>
 *     <tr><td>{@code a {b|*}. c}</td><td>{@code "a b. c"}<br>{@code "a . c"}</td></tr>
 *     <tr><td>{@code a .{b, |*}. c}</td><td>{@code "a .b, . c"}<br>{@code "a .. c"}</td></tr>
 *     <tr><td>{@code a {{b|c}|*}.}</td><td>{@code "a ."}<br>{@code "a b."}<br>{@code "a c."}</td></tr>
 *     <tr><td>{@code a {{{<C>}}|{*}} c}</td><td>{@code "a aaa bbb z c"}<br>{@code "a aaa bbb w c"}<br>{@code "a c"}</td></tr>
 *     <tr><td>{@code {{{a}}} {b||*|{{*}}||*}}</td><td>{@code "a b"}<br>{@code "a"}</td></tr>
 * </table>
 * Specifically:
 * <ul>
 *     <li>{@code {A|B}} denotes either {@code A} or {@code B}.</li>
 *     <li>{@code {A|B|*}} denotes either {@code A} or {@code B} or nothing.</li>
 *     <li>Excessive curly brackets are ignored when it is safe to do so.</li>
 *     <li>Macros cannot be recursive but can be nested.</li>
 *     <li>Option groups can be nested.</li>
 *     <li>
 *         {@code '\'} (backslash) can be used to escape the <code>'{'</code>, <code>'}'</code>, {@code '|'} and
 *         {@code '*'} special symbols used by the option groups.
 *     </li>
 *     <li>Excessive whitespace is trimmed when expanding option groups.</li>
 * </ul>
 *
 * <h3>Regular Expressions</h3>
 * Any individual synonym word that starts and ends with {@code "///"} (three forward slashes) is considered
 * to be a Java regular expression as defined in {@link Pattern}. Note that a regular expression can only span
 * a single word, i.e. only individual words from the user input will be matched against the given regular
 * expression and no whitespace is allowed within the regular expression. Note also that the option group
 * special symbols <code>'{'</code>, <code>'}'</code>, {@code '|'} and {@code '*'} have to be escaped in the
 * regular expression using {@code '\'} (backslash).
 * <p>
 * For example, the synonym {@code {foo|///[bar].+///}} will match the word {@code foo} or any other string
 * that starts with {@code b}, {@code a} or {@code r} followed by at least one more character, as long as this
 * string doesn't contain whitespace. Note that {@code [bar]} is a character class that matches a single one
 * of these letters, not the literal prefix {@code bar}.
 *
 * <h3>PoS Tags</h3>
 * Any individual synonym word that starts and ends with {@code "```"} (three backticks), in the form
 * <code>```XXX```</code>, is considered to be a PoS (Part-of-Speech) tag that will be matched against the
 * PoS tag of the individual word in the user input, where {@code XXX} is one of the
 * Penn Treebank PoS tags.
 * <p>
 * For example, the synonym {@code {foo|{```NN```|```NNS```|```NNP```|```NNPS```}}} will match the word
 * {@code foo} or any form of a noun.
 */
public interface NCElement {
    /**
     * Element's value.
     *
     * @see NCElement#getValues()
     */
    interface NCValue {
        /**
         * Gets value name.
         *
         * @return Value name.
         */
        String getName();

        /**
         * Gets optional list of value's synonyms.
         *
         * @return Potentially empty list of value's synonyms.
         */
        List<String> getSynonyms();
    }

    /**
     * Gets unique ID of this element.
     * <p>
     * This unique ID should be human readable for simpler debugging and testing of the model.
     * Although an element ID can be any arbitrary string, it is highly recommended to use
     * a lower case string starting with some model prefix, followed by a colon and
     * then the element's name. For example, some built-in IDs are: <code>nlp:date</code>,
     * <code>nlp:geo</code>.
     * <p>
     * A few important notes:
     * <ul>
     *     <li>Element IDs starting with <code>nlp:</code> are reserved for built-in system IDs.</li>
     *     <li>
     *         An element ID can be used in the user input directly (i.e. "power user mode") to clearly
     *         disambiguate the element in the input sentence instead of relying on synonyms or other
     *         ways of detection.
     *     </li>
     * </ul>
     *
     * @see NCToken#getId()
     * @return Unique ID of this element.
     */
    String getId();

    /**
     * Gets optional group name this element belongs to.
     * <p>
     * Element groups are an important mechanism in implementing the {@link NCModel#query(NCQueryContext)} method.
     * Defining a proper group for an element is important for proper operation of Short-Term-Memory (STM) in
     * the {@link NCConversationContext conversation context}. Specifically, a user token (i.e. a found model
     * element) with a given group name will be overridden in the conversation by a more recent token from the
     * same group.
     *
     * @return Optional group name, or {@code null} if not specified. Note that a {@code null} group logically
     *      defines a default group.
     * @see NCConversationContext
     */
    String getGroup();

    /**
     * Gets optional user-defined element's metadata. When a {@link NCToken token} for this element
     * is detected in the input, this metadata can be accessed via the {@link NCToken#getElementMetadata()} method.
     *
     * @return Element's metadata.
     */
    NCMetadata getMetadata();

    /**
     * Gets optional element description.
     *
     * @return Optional element description.
     */
    String getDescription();

    /**
     * Gets optional list of {@link NCValue values} for this element.
     * <p>
     * Each element can generally be recognized either by one of its synonyms or values. Elements and their values
     * are analogous to types and instances of that type in programming languages. Each value
     * has a name and an optional set of its own synonyms by which that value, and ultimately its element, can be
     * recognized. Note that the value name itself acts as an implicit synonym even when no additional synonyms
     * are added for that value.
     * <p>
     * Consider this example. A model element {@code x:car} can have:
     * <ul>
     *     <li>
     *         Set of general synonyms:
     *         {@code {transportation|transport|*} {vehicle|car|sedan|auto|automobile|suv|crossover|coupe|truck}}
     *     </li>
     *     <li>Set of values:
     *         <ul>
     *             <li>{@code mercedes} with synonyms {@code (mercedes, mercedes-benz, mb, benz)}</li>
     *             <li>{@code bmw} with synonyms {@code (bmw, bimmer)}</li>
     *             <li>{@code chevrolet} with synonyms {@code (chevy, chevrolet)}</li>
     *         </ul>
     *     </li>
     * </ul>
     * With that setup, the {@code x:car} element will be recognized by any of the following input sub-strings:
     * <ul>
     *     <li>{@code transport car}</li>
     *     <li>{@code benz}</li>
     *     <li>{@code automobile}</li>
     *     <li>{@code transport vehicle}</li>
     *     <li>{@code sedan}</li>
     *     <li>{@code chevy}</li>
     *     <li>{@code bimmer}</li>
     *     <li>{@code x:car}</li>
     * </ul>
     *
     * @return List of values, each with its name and synonyms, or {@code null} if not defined.
     */
    List<NCValue> getValues();

    /**
     * Gets optional ID of the immediate parent element. Parent ID allows elements to form a hierarchy
     * and can be used by the user logic in the {@link NCModel#query(NCQueryContext)} method.
     *
     * @return Optional parent element ID, or {@code null} if not specified.
     */
    String getParentId();

    /**
     * Gets the list of synonyms by which this semantic element will be recognized.
     *
     * @return List of synonyms for this element. List is generally optional since element's ID acts
     *      as an implicit synonym.
     * @see #getExcludedSynonyms()
     */
    List<String> getSynonyms();

    /**
     * Gets the optional list of synonyms to exclude from the list returned by {@link #getSynonyms()}.
     * Can return an empty list or {@code null} to indicate that there are no synonyms to exclude.
     * <p>
     * Note that it is sometimes easier to exclude a specific synonym or a group of synonyms than to create
     * complex rules with macros and option groups for inclusive synonyms.
     *
     * @return Optional list of synonyms to exclude.
     * @see #getSynonyms()
     */
    List<String> getExcludedSynonyms();
}
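
The option group rules described in the NCElement Javadoc (alternatives separated by '|', '*' meaning "nothing", nesting, backslash escapes and whitespace trimming) can be illustrated with a small recursive-descent expander. This is only a sketch and not NLPCraft's actual implementation: the class name `OptionGroupExpander` and its methods are hypothetical, macros are assumed to have been substituted already, and treating an unescaped '*' as "nothing" only inside a group is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper (not part of NLPCraft): expands option groups in a macro-free synonym. */
final class OptionGroupExpander {
    private final String src;
    private int pos;
    private int depth; // > 0 while inside at least one '{...}' group

    private OptionGroupExpander(String src) {
        this.src = src;
    }

    /** Returns all expansions of the synonym with excessive whitespace trimmed. */
    static Set<String> expand(String synonym) {
        OptionGroupExpander p = new OptionGroupExpander(synonym);
        Set<String> raw = p.sequence();
        if (p.pos < synonym.length())
            throw new IllegalArgumentException("Unexpected '" + synonym.charAt(p.pos) + "' at " + p.pos);
        Set<String> out = new LinkedHashSet<>();
        for (String s : raw)
            out.add(s.trim().replaceAll("\\s+", " "));
        return out;
    }

    /** Parses literals and groups until '|', '}' or end of input. */
    private Set<String> sequence() {
        Set<String> acc = Collections.singleton("");
        StringBuilder lit = new StringBuilder();
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == '|' || c == '}')
                break;
            if (c == '{') {
                // Flush the pending literal, then combine with every alternative of the group.
                acc = cross(cross(acc, Collections.singleton(lit.toString())), group());
                lit.setLength(0);
            } else if (c == '\\' && pos + 1 < src.length()) {
                lit.append(src.charAt(pos + 1)); // escaped special symbol is kept literally
                pos += 2;
            } else {
                if (!(c == '*' && depth > 0)) // assumption: unescaped '*' in a group means "nothing"
                    lit.append(c);
                pos++;
            }
        }
        return cross(acc, Collections.singleton(lit.toString()));
    }

    /** Parses '{alt|alt|...}' and returns the union of all alternatives. */
    private Set<String> group() {
        pos++; // consume '{'
        depth++;
        Set<String> alts = new LinkedHashSet<>();
        while (true) {
            alts.addAll(sequence());
            if (pos >= src.length())
                throw new IllegalArgumentException("Missing closing '}'");
            if (src.charAt(pos++) == '}')
                break; // otherwise it was '|': parse the next alternative
        }
        depth--;
        return alts;
    }

    private static Set<String> cross(Set<String> left, Set<String> right) {
        Set<String> out = new LinkedHashSet<>();
        for (String l : left)
            for (String r : right)
                out.add(l + r);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("aaa {b|*} c"));
        System.out.println(expand("a {{b|c}|*}."));
    }
}
```

Returning a set removes the duplicates produced by redundant alternatives such as `{b||*|{{*}}||*}`; the order of the expansions is not significant.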

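
The three kinds of synonym words described in the Javadoc (simple word, regular expression wrapped in three forward slashes, PoS tag wrapped in three backticks) can be told apart by their delimiters. The sketch below is an illustration under stated assumptions, not NLPCraft code: the class `SynonymWord` is hypothetical, the regular expression is assumed to be applied to the whole input word via `Matcher.matches()`, and normalization and stemming of simple words are reduced to a case-insensitive comparison.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of NLPCraft): classifies and matches one synonym word. */
final class SynonymWord {
    enum Kind { WORD, REGEX, POS_TAG }

    static final String REGEX_MARKER = "///";
    static final String POS_MARKER = "\u0060\u0060\u0060"; // three backticks

    final Kind kind;
    final String text;
    private final Pattern pattern; // non-null only for REGEX

    private SynonymWord(Kind kind, String text) {
        this.kind = kind;
        this.text = text;
        this.pattern = kind == Kind.REGEX ? Pattern.compile(text) : null;
    }

    /** Decides the kind of a single whitespace-free synonym word by its delimiters. */
    static SynonymWord parse(String word) {
        if (isWrapped(word, REGEX_MARKER))
            return new SynonymWord(Kind.REGEX, unwrap(word));
        if (isWrapped(word, POS_MARKER))
            return new SynonymWord(Kind.POS_TAG, unwrap(word));
        return new SynonymWord(Kind.WORD, word);
    }

    private static boolean isWrapped(String word, String marker) {
        return word.length() > 2 * marker.length() && word.startsWith(marker) && word.endsWith(marker);
    }

    private static String unwrap(String word) {
        return word.substring(3, word.length() - 3); // both markers are three characters long
    }

    /** Matches one input word and its PoS tag against this synonym word. */
    boolean matches(String inputWord, String inputPosTag) {
        switch (kind) {
            case REGEX:
                return pattern.matcher(inputWord).matches(); // assumption: whole-word match
            case POS_TAG:
                return text.equals(inputPosTag);
            default:
                return text.equalsIgnoreCase(inputWord); // stemming omitted in this sketch
        }
    }
}
```

With this sketch, `SynonymWord.parse("///[bar].+///")` matches `apple` and `bar` but not `foo`, because `[bar]` is a character class rather than a literal prefix.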

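
The `x:car` example from the `getValues()` Javadoc can be checked with a few lines of code that collect every phrase the element should be recognized by: the element ID as the implicit synonym, the expanded general synonyms, and each value's name and synonyms. This is a simplified sketch with a hypothetical class `CarElementSketch`; it compares whole lower-cased phrases and skips the normalization, stemming and sub-string detection that the real matching performs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of the x:car example; not part of NLPCraft. */
final class CarElementSketch {
    /** Collects every phrase by which the x:car element would be recognized. */
    static Set<String> recognizedPhrases() {
        Set<String> phrases = new LinkedHashSet<>();
        phrases.add("x:car"); // element ID is an implicit synonym

        // Manual expansion of {transportation|transport|*} {vehicle|car|...}.
        String[] prefixes = {"transportation", "transport", ""};
        String[] nouns = {"vehicle", "car", "sedan", "auto", "automobile", "suv", "crossover", "coupe", "truck"};
        for (String prefix : prefixes)
            for (String noun : nouns)
                phrases.add((prefix + " " + noun).trim());

        // Values: the value name is an implicit synonym in addition to the listed ones.
        Map<String, List<String>> values = new LinkedHashMap<>();
        values.put("mercedes", Arrays.asList("mercedes", "mercedes-benz", "mb", "benz"));
        values.put("bmw", Arrays.asList("bmw", "bimmer"));
        values.put("chevrolet", Arrays.asList("chevy", "chevrolet"));
        for (Map.Entry<String, List<String>> e : values.entrySet()) {
            phrases.add(e.getKey());
            phrases.addAll(e.getValue());
        }
        return phrases;
    }

    /** Exact, case-insensitive phrase lookup (a simplification of real synonym matching). */
    static boolean recognizes(String input) {
        return recognizedPhrases().contains(input.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"transport car", "benz", "chevy", "x:car", "bicycle"})
            System.out.println(s + " -> " + recognizes(s));
    }
}
```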

