All Downloads are FREE. Search and download functionalities are using the official Maven repository.

lemmatizer-dicts.language-tool.README.txt Maven / Gradle / Ivy

 
DICTIONARIES FOR LEMMATIZATION
--------------------------------

The dictionaries contained in this tarball have been prepared using the dictionaries
distributed by the Language Tool project (http://languagetool.org/),
which distributes its data contents under a variety of licenses. 

http://wiki.languagetool.org/archive-developing-a-tagger-dictionary

Please read the COPYING file in this tarball for terms about
distribution and use of these dictionaries. 

Languages: 
    + English: 301402 word-lemma-pos combinations. 


The format of the dictionaries consist of "word\tab\lemma\tabpos" and it can be 
used to perform dictionary-based lemmatization. They are distributed as 
spanish.dict, english.dict, etc., in a finite state automata format created 
using Morfologik (https://github.com/morfologik/). These dictionaries need 
the corresponding $lang.info files to work. 

These dictionaries (in both binary and plain text format) are being
used by the ready to use IXA pipes (ixa2.si.ehu.es/ixa-pipes) pos tagger 
(https://github.com/ixa-ehu/ixa-pipe-pos) to perform English and
Spanish dictionary-based lemmatization.  

Note that plain text dictionaries can also be used in their current form
via the API (check SimpleLemmatizer class). 

Contact details: 
----------------
Rodrigo Agerri 
IXA NLP Group
University of the Basque Country (UPV/EHU)




© 2015 - 2025 Weber Informatics LLC | Privacy Policy