## About the "languages" folder and files
Most of these files come from the original software by Nakatani Shuyo.
Unfortunately, the data sources from which they were generated are not available.
The text appears to come from Wikipedia pages.
To generate your own language profile, see the main readme at https://github.com/optimaize/language-detector
Exception: for km (Khmer) the sources are available, see https://github.com/optimaize/language-detector/issues/19
## About the "languages.shorttext" folder and files
These files come from the original software by Nakatani Shuyo.
It is unclear whether they are meant for detecting the language of short messages, were built from short message text, or both.
## About the "messages.properties" file
This file is used by the CharNormalizer.