
resources..russie-ortho-metadata.long-desc.html


A named entity recognition pipeline that identifies basic entity types such as Person, Location and Organization, as well as Money amounts and Time and Date expressions. It works on documents in the Russian language.

This version of the pipeline includes an orthomatcher to perform basic coreference resolution based on orthographic similarity.
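
To illustrate what coreference "based on orthographic similarity" can mean, here is a minimal sketch. It is not the orthomatcher shipped with this pipeline: the function names and the matching rule (token containment after lowercasing) are assumptions chosen for illustration, and the real component may apply different rules.

```python
from typing import List


def normalize(mention: str) -> List[str]:
    """Lowercase a mention and split it into tokens, dropping periods."""
    return mention.lower().replace(".", " ").split()


def orthographically_similar(a: str, b: str) -> bool:
    """Toy rule (an assumption, not the pipeline's rule): two mentions match
    if every token of the shorter one also appears in the longer one."""
    ta, tb = normalize(a), normalize(b)
    if not ta or not tb:
        return False
    short, long_ = (ta, tb) if len(ta) <= len(tb) else (tb, ta)
    return all(tok in long_ for tok in short)


def cluster_mentions(mentions: List[str]) -> List[List[str]]:
    """Greedily add each mention to the first cluster that contains a match,
    or start a new cluster if none matches."""
    clusters: List[List[str]] = []
    for mention in mentions:
        for cluster in clusters:
            if any(orthographically_similar(mention, other) for other in cluster):
                cluster.append(mention)
                break
        else:
            clusters.append([mention])
    return clusters


if __name__ == "__main__":
    print(cluster_mentions(["Лев Толстой", "Толстой", "Москва", "толстой"]))
```

Because the rule compares surface strings only, an inflected form such as a genitive surname would not be linked by this sketch.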

Default annotations
:Person        Standard named entity types
:Location
:Organization
:Date
:Address       Includes email and IP addresses as well as street addresses

Additional annotations available if selected
:Money         Monetary amounts
:Percent       Expressions representing percentages
:Token         The individual tokens of the text, with "category" feature for POS
:SpaceToken    The spaces between tokens
:Sentence      Sentences detected by the sentence splitter
:Lookup        Individual gazetteer lookups
:MSD           "Morpho-Syntactic Description" for selected tokens, including features for "lemma" (the base form of inflected words) and "type" (roughly equivalent to a part of speech tag in English, though more complex as it encodes features such as gender, grammatical case, etc.)
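
The split between default and optional annotation types can be modelled as a simple filter. The sketch below is hypothetical: the `Annotation` class and `select_annotations` function are invented for illustration and are not part of the pipeline's API; only the type names are taken from the lists above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# Type names copied from the lists above.
DEFAULT_TYPES = {"Person", "Location", "Organization", "Date", "Address"}
ADDITIONAL_TYPES = {"Money", "Percent", "Token", "SpaceToken",
                    "Sentence", "Lookup", "MSD"}


@dataclass
class Annotation:
    """Hypothetical annotation record: a typed span with optional features."""
    type: str
    start: int
    end: int
    features: Dict[str, str] = field(default_factory=dict)


def select_annotations(annotations: Iterable[Annotation],
                       extra: Iterable[str] = ()) -> List[Annotation]:
    """Keep the default types plus any explicitly selected additional types."""
    extra_set = set(extra)
    unknown = extra_set - ADDITIONAL_TYPES
    if unknown:
        raise ValueError(f"Unknown additional types: {sorted(unknown)}")
    wanted = DEFAULT_TYPES | extra_set
    return [a for a in annotations if a.type in wanted]


if __name__ == "__main__":
    anns = [
        Annotation("Person", 0, 11),
        Annotation("MSD", 4, 11, {"lemma": "толстой"}),
        Annotation("Money", 20, 30),
    ]
    # Only the default types are returned unless more are selected.
    print([a.type for a in select_annotations(anns)])
    print([a.type for a in select_annotations(anns, extra=["MSD"])])
```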



