org.nlpcraft.model.NCElement Maven / Gradle / Ivy
An API to convert natural language into actions.
/*
* “Commons Clause” License, https://commonsclause.com/
*
* The Software is provided to you by the Licensor under the License,
* as defined below, subject to the following condition.
*
* Without limiting other conditions in the License, the grant of rights
* under the License will not include, and the License does not grant to
* you, the right to Sell the Software.
*
* For purposes of the foregoing, “Sell” means practicing any or all of
* the rights granted to you under the License to provide to third parties,
* for a fee or other consideration (including without limitation fees for
* hosting or consulting/support services related to the Software), a
* product or service whose value derives, entirely or substantially, from
* the functionality of the Software. Any license notice or attribution
* required by the License must also include this Commons Clause License
* Condition notice.
*
* Software: NLPCraft
* License: Apache 2.0, https://www.apache.org/licenses/LICENSE-2.0
* Licensor: Copyright (C) 2018 DataLingvo, Inc. https://www.datalingvo.com
*
*     _   ____      ______           ______
*    / | / / /___  / ____/________ _/ __/ /_
*   /  |/ / / __ \/ /   / ___/ __ `/ /_/ __/
*  / /|  / / /_/ / /___/ /  / /_/ / __/ /_
* /_/ |_/_/ .___/\____/_/   \__,_/_/  \__/
*        /_/
*/
package org.nlpcraft.model;
import java.util.List;
import java.util.regex.Pattern;
/**
* Semantic model element.
*
* An element is the main building block of the semantic model. A semantic element defines an entity
* that will be automatically recognized in the user input either by one of its synonyms or values, or directly by
* its ID.
*
* Synonyms
* Synonyms are the key building blocks of the semantic element and are used in the following methods:
*
* - {@link #getSynonyms()}
* - {@link #getExcludedSynonyms()}
* - {@link #getValues()}
*
* Each model element has one or more synonyms. Note that the element ID is an implicit synonym, so even if no
* additional synonyms are defined, at least one synonym always exists. Each individual synonym is a
* whitespace-separated combination of:
*
* - simple word,
* - regular expression, or
* - PoS tag
*
* Note that synonym matching for simple words is case-insensitive and is automatically
* performed on the normalized and stemmed forms of each word, so the model
* provider doesn't have to account for this in the synonyms themselves.
*
* Macro Expansions
* Listing all possible multi-word synonyms for a given element can be a time-consuming task. Macros together with
* option groups allow for significant simplification of this process. The model provides a list of macros via the
* {@link NCModel#getMacros()} method. Each macro has a name in the form of {@code <X>}, where {@code X} is
* just any string, and a string value. Note that macros can be nested, i.e. a macro value can include references
* to other macros. When macro name {@code <X>} is encountered in a synonym it gets recursively replaced with
* its value.
*
* Option Groups
* Option groups are a simplified form of regular expressions that operate on a single-word basis. The
* following examples demonstrate how to use option groups. Consider that the following macros are defined:
*
* Macro Name     Macro Value
* {@code <A>}    {@code aaa}
* {@code <B>}    {@code <A> bbb}
* {@code <C>}    {@code <A> bbb {z|w}}
*
* Note that macros {@code <B>} and {@code <C>} are nested. Then the following option group expansions
* will occur in these examples:
*
* Synonym                                Expanded Synonyms
* {@code <A> {b|*} c}                    {@code "aaa b c"}, {@code "aaa c"}
* {@code <B> {b|*} c}                    {@code "aaa bbb b c"}, {@code "aaa bbb c"}
* {@code {b|```NN```} c}                 {@code "b c"}, {@code "```NN``` c"}
* {@code {b|\{\*\}}}                     {@code "b"}, {@code "b {*}"}
* {@code a {b|*}. c}                     {@code "a b. c"}, {@code "a . c"}
* {@code a .{b, |*}. c}                  {@code "a .b, . c"}, {@code "a .. c"}
* {@code a {{b|c}|*}.}                   {@code "a ."}, {@code "a b."}, {@code "a c."}
* {@code a {{{<C>}}|{*}} c}              {@code "a aaa bbb z c"}, {@code "a aaa bbb w c"}, {@code "a c"}
* {@code {{{a}}} {b||*|{{*}}||*}}        {@code "a b"}, {@code "a"}
*
* Specifically:
*
* - {@code {A|B}} denotes either {@code A} or {@code B}.
* - {@code {A|B|*}} denotes either {@code A} or {@code B} or nothing.
* - Redundant curly brackets are ignored when it is safe to do so.
* - Macros cannot be recursive but can be nested.
* - Option groups can be nested.
* - {@code '\'} (backslash) can be used to escape the '{', '}', {@code '|'} and
*   {@code '*'} special symbols used by the option groups.
* - Excess whitespace is trimmed when expanding option groups.
*
*
* Regular Expressions
* Any individual synonym word that starts and ends with {@code "///"} (three forward slashes) is considered to be a Java
* regular expression as defined in {@link Pattern}. Note that a regular expression can only span a single word, i.e.
* only individual words from the user input will be matched against the given regular expression and no whitespace is
* allowed within the regular expression. Note also that the option group special symbols '{', '}',
* {@code '|'} and {@code '*'} have to be escaped in the regular expression using {@code '\'} (backslash).
*
* For example, the synonym {@code {foo|///bar.+///}} will match the word {@code foo} or any other single word
* that starts with {@code bar} (followed by at least one more character); a match can never contain whitespace.
*
* PoS Tags
* Any individual synonym word that starts and ends with {@code "```"} (three back ticks), in the form ```XXX```, is
* considered to be a PoS (Part-of-Speech) tag that will be matched against the PoS tag of the individual word in the
* user input, where {@code XXX} is one of the
* Penn Treebank PoS tags.
*
* For example, the synonym {foo|{```NN```|```NNS```|```NNP```|```NNPS```}} will match the word {@code foo} or any
* form of a noun.
*/
public interface NCElement {
/**
* Element's value.
*
* @see NCElement#getValues()
*/
interface NCValue {
/**
* Gets value name.
*
* @return Value name.
*/
String getName();
/**
* Gets optional list of value's synonyms.
*
* @return Potentially empty list of value's synonyms.
*/
List<String> getSynonyms();
}
/**
* Gets unique ID of this element.
*
* This unique ID should be human-readable for simpler debugging and testing of the model.
* Although an element ID could be any arbitrary string, it is highly recommended to have
* the element ID as a lower case string starting with some model prefix, followed by a colon and
* then the element's name. For example, some built-in IDs are: {@code nlp:date}, {@code nlp:geo}.
*
* A few important notes:
*
* - Element IDs starting with {@code nlp:} are reserved for built-in system IDs.
* - Element ID can be used in the user input directly (i.e. "power user mode") to clearly
*   disambiguate the element in the input sentence instead of relying on synonyms or other
*   ways of detection.
*
*
* @see NCToken#getId()
* @return Unique ID of this element.
*/
String getId();
/**
* Gets optional group name this element belongs to.
*
* Element groups are an important mechanism in implementing the {@link NCModel#query(NCQueryContext)} method.
* Defining a proper group for an element is important for proper operation of Short-Term-Memory (STM) in the
* {@link NCConversationContext conversation context}. Specifically, a user token (i.e. found model element)
* with a given group name will be overridden in the conversation by the more recent token from the same group.
*
* @return Optional group name, or {@code null} if not specified. Note that {@code null} group logically
* defines a default group.
* @see NCConversationContext
*/
String getGroup();
/**
* Gets optional user-defined element's metadata. When a {@link NCToken token} for this element
* is detected in the input this metadata can be accessed via {@link NCToken#getElementMetadata()} method.
*
* @return Element's metadata.
*/
NCMetadata getMetadata();
/**
* Gets optional element description.
*
* @return Optional element description.
*/
String getDescription();
/**
* Gets optional list of {@link NCValue values} for this element.
*
* Each element can generally be recognized either by one of its synonyms or values. Elements and their values
* are analogous to types and instances of that type in programming languages. Each value
* has a name and an optional set of its own synonyms by which that value, and ultimately its element, can be
* recognized. Note that the value name itself acts as an implicit synonym even when no additional synonyms are
* added for that value.
*
* Consider this example. A model element {@code x:car} can have:
*
* - Set of general synonyms:
*   {@code {transportation|transport|*} {vehicle|car|sedan|auto|automobile|suv|crossover|coupe|truck}}
*
* - Set of values:
*
* - {@code mercedes} with synonyms {@code (mercedes, mercedes-benz, mb, benz)}
* - {@code bmw} with synonyms {@code (bmw, bimmer)}
* - {@code chevrolet} with synonyms {@code (chevy, chevrolet)}
*
*
*
* With that setup the {@code x:car} element will be recognized by any of the following input sub-strings:
*
* - {@code transport car}
* - {@code benz}
* - {@code automobile}
* - {@code transport vehicle}
* - {@code sedan}
* - {@code chevy}
* - {@code bimmer}
* - {@code x:car}
*
*
* @return List of values, each with its name and synonyms, or {@code null} if not defined.
*/
List<NCValue> getValues();
/**
* Gets optional ID of the immediate parent element. Parent IDs allow elements to form a hierarchy
* that can be used by the user logic in the {@link NCModel#query(NCQueryContext)} method.
*
* @return Optional parent element ID, or {@code null} if not specified.
*/
String getParentId();
/**
* Gets the list of synonyms by which this semantic element will be recognized.
*
* @return List of synonyms for this element. List is generally optional since element's ID acts
* as an implicit synonym.
* @see #getExcludedSynonyms()
*/
List<String> getSynonyms();
/**
* Gets the optional list of synonyms to exclude from the list returned by {@link #getSynonyms()}.
* Can return empty list or {@code null} to indicate that there are no synonyms to exclude.
*
* Note that it is sometimes easier to exclude a specific synonym or a group of synonyms than creating
* complex rules with macros and option groups for inclusive synonyms.
*
* @return Optional list of synonyms to exclude.
* @see #getSynonyms()
*/
List<String> getExcludedSynonyms();
}
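The macro and option group rules described in the Javadoc above can be illustrated with a small stand-alone sketch. This is not NLPCraft's actual implementation: the class name `OptionGroupSketch` and the methods `expandMacros` and `expand` are hypothetical, and the code assumes that macros are substituted as plain text before option groups are parsed, that an alternative consisting only of `*` (or an empty alternative) means "nothing", and that whitespace is collapsed at the end. The order of the returned synonyms is an artifact of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of NLPCraft: expands macros and option groups of one synonym.
final class OptionGroupSketch {
    private final String src;
    private int pos;

    private OptionGroupSketch(String src) { this.src = src; }

    // Substitutes macro names with their values until nothing changes (handles nested macros).
    static String expandMacros(String synonym, Map<String, String> macros) {
        String out = synonym;
        for (int pass = 0; pass <= macros.size(); pass++) {
            String before = out;
            for (Map.Entry<String, String> e : macros.entrySet())
                out = out.replace(e.getKey(), e.getValue());
            if (out.equals(before)) return out;
        }
        // Non-recursive macros are fully resolved within macros.size() passes.
        throw new IllegalArgumentException("Recursive macro in: " + synonym);
    }

    // Returns all distinct expansions with surrounding and repeated whitespace removed.
    static List<String> expand(String synonym, Map<String, String> macros) {
        OptionGroupSketch parser = new OptionGroupSketch(expandMacros(synonym, macros));
        Set<String> out = new LinkedHashSet<>();
        for (String s : parser.sequence(false))
            out.add(s.trim().replaceAll("\\s+", " "));
        return new ArrayList<>(out);
    }

    // Parses literal text and nested groups; inside a group it stops at '|' or '}'.
    private List<String> sequence(boolean inGroup) {
        List<String> acc = new ArrayList<>(List.of(""));
        StringBuilder lit = new StringBuilder();
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (inGroup && (c == '|' || c == '}')) break;
            if (c == '\\' && pos + 1 < src.length()) {
                lit.append(src.charAt(pos + 1)); // escaped symbol is kept as literal text
                pos += 2;
            } else if (c == '{') {
                acc = cross(acc, List.of(lit.toString()));
                lit.setLength(0);
                pos++; // consume '{'
                acc = cross(acc, group());
            } else {
                lit.append(c);
                pos++;
            }
        }
        return cross(acc, List.of(lit.toString()));
    }

    // Parses the alternatives of one group; the opening '{' is already consumed.
    private List<String> group() {
        List<String> alts = new ArrayList<>();
        while (true) {
            int start = pos;
            List<String> alt = sequence(true);
            // An alternative that is just an unescaped '*' stands for "nothing".
            if (src.substring(start, pos).trim().equals("*")) alts.add("");
            else alts.addAll(alt);
            if (pos >= src.length())
                throw new IllegalArgumentException("Unbalanced '{' in: " + src);
            if (src.charAt(pos++) == '}') return alts; // otherwise it was '|'
        }
    }

    // Concatenates every left string with every right string.
    private static List<String> cross(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>(left.size() * right.size());
        for (String l : left)
            for (String r : right)
                out.add(l + r);
        return out;
    }
}
```

With the macros `<A>`, `<B>` and `<C>` from the table in the Javadoc, `OptionGroupSketch.expand("<A> {b|*} c", macros)` yields `aaa b c` and `aaa c`. Regular expression and PoS tag words are not interpreted here; they pass through as ordinary text.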