org.apache.lucene.analysis.ja.stopwords.txt

Lucene Kuromoji Japanese Morphological Analyzer

#
# This file defines a stopword set for Japanese.
#
# This set is made up of hand-picked frequent terms from segmented Japanese Wikipedia.
# Punctuation characters and frequent kanji have mostly been left out.  See LUCENE-3745
# for frequency lists, etc. that can be useful for making your own set (if desired).
#
# Note that these stopwords overlap with the terms removed by the
# JapanesePartOfSpeechStopFilter when the two are used together.  When editing this file,
# note that comments are not allowed on the same line as stopwords.
#
# Also note that stopping is case-insensitive.  Change your StopFilter configuration if
# you need case-sensitive stopping.  Lastly, note that entries are matched at the same
# character width in which they appear in this file.  Since this StopFilter normally runs
# after a CJKWidthFilter in your chain, you would usually want your romaji entries to be
# in half-width and your kana entries to be in full-width.
#
の
に
は
を
た
が
で
て
と
し
れ
さ
ある
いる
も
する
から
な
こと
として
い
や
れる
など
なっ
ない
この
ため
その
あっ
よう
また
もの
という
あり
まで
られ
なる
へ
か
だ
これ
によって
により
おり
より
による
ず
なり
られる
において
ば
なかっ
なく
しかし
について
せ
だっ
その後
できる
それ
う
ので
なお
のみ
でき
き
つ
における
および
いう
さらに
でも
ら
たり
その他
に関する
たち
ます
ん
なら
に対して
特に
せる
及び
これら
とき
では
にて
ほか
ながら
うち
そして
とともに
ただし
かつて
それぞれ
または
お
ほど
ものの
に対する
ほとんど
と共に
といった
です
とも
ところ
ここ
##### End of file
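
The rules in the header comment (whole-line `#` comments, one entry per line, case-insensitive matching, width-sensitive entries) can be illustrated with a minimal Python sketch. This is not Lucene's loader or StopFilter. The names `load_stopwords`, `normalize`, and `remove_stopwords` are hypothetical, the sample list is a small subset plus an invented romaji entry (`ok`), and Unicode NFKC normalization is only an assumed stand-in for what a CJKWidthFilter would do earlier in the chain.

```python
import unicodedata


def normalize(term):
    # Assumption: NFKC approximates width folding (full-width romaji to
    # half-width, half-width kana to full-width); casefold() gives the
    # case-insensitive comparison described in the header.
    return unicodedata.normalize("NFKC", term).casefold()


def load_stopwords(text, comment="#"):
    # One entry per line; blank lines and lines starting with the comment
    # marker are skipped. Trailing comments on an entry line are NOT
    # supported, matching the note in the file header.
    words = set()
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith(comment):
            continue
        words.add(normalize(entry))
    return words


def remove_stopwords(tokens, stopwords):
    # Keep the original token text; only the comparison is normalized.
    return [t for t in tokens if normalize(t) not in stopwords]


# Hypothetical sample in the same layout as the file above.
SAMPLE = """#
# sample stopword set
#
の
は
です
ok
##### End of file
"""

stopwords = load_stopwords(SAMPLE)
tokens = ["これ", "は", "ペン", "です", "ＯＫ"]
print(remove_stopwords(tokens, stopwords))
```

Here the full-width, upper-case token `ＯＫ` is dropped because it normalizes to the half-width, lower-case entry `ok`. Without the width-folding step, the same token would pass through, which is why the header recommends keeping romaji entries in half-width.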