New Stemmers (README: Web-based Help from DocBook XML)
Adding new stemmers is very simple.
Currently, only English, French, and German stemmers are integrated into WebHelp, but
the code is extensible, so you can add new stemmers in a few steps.
What you need:
-
You'll need two versions of the stemmer: one written in JavaScript and another
in Java. Fortunately, Snowball provides Java stemmers for a number of popular
languages, and these are already included with the package. You can see the full list in
Adding support for other (non-CJKV) languages.
If your language is listed there, you then need to find a JavaScript version of the
stemmer. New stemmers are generally added to the Snowball Stemmers in
other languages location. If a JavaScript stemmer for your language is
available, download it. Otherwise, you can write a new stemmer in JavaScript from the
Snowball algorithm fairly easily. The algorithms are at Snowball.
-
Then, name the JS stemmer exactly like this: {$language-code}_stemmer.js.
For example, for Italian (it), name it it_stemmer.js. Then copy it to the
docbook-webhelp/template/search/stemmers/ folder. (This assumes docbook-webhelp
is the root folder for WebHelp.)
Note
Make sure you have changed the webhelp.indexer.language property in
build.properties to your language.
-
Next, the indexer needs two small changes.
-
Open docbook-webhelp/indexer/src/com/nexwave/nquindexer/IndexerTask.java
in a text editor and add your language code to the supportedLanguages
String array.
Example 2. Add a new language to the supportedLanguages array
Change the array from:
private String[] supportedLanguages = {"en", "de", "fr", "cn", "ja", "ko"};
// Currently, extended support is available for
// English, German, French, and CJK (Chinese, Japanese, Korean) languages only.
to:
private String[] supportedLanguages = {"en", "de", "fr", "cn", "ja", "ko", "it"};
// Currently, extended support is available for
// English, German, French, CJK (Chinese, Japanese, Korean), and Italian languages only.
-
Now, open docbook-webhelp/indexer/src/com/nexwave/nquindexer/SaxHTMLIndex.java
and find the code that initializes the stemmer (search for SnowballStemmer stemmer;).
Then add a branch that initializes the stemmer object for your language, following
the pattern of the existing branches in the example. The available stemmer classes
are in docbook-webhelp/indexer/src/com/nexwave/stemmer/snowball/ext/.
Example 3. Initialize the correct stemmer based on the specified webhelp.indexer.language
SnowballStemmer stemmer;
if (indexerLanguage.equalsIgnoreCase("en")) {
    stemmer = new EnglishStemmer();
} else if (indexerLanguage.equalsIgnoreCase("de")) {
    stemmer = new GermanStemmer();
} else if (indexerLanguage.equalsIgnoreCase("fr")) {
    stemmer = new FrenchStemmer();
} else if (indexerLanguage.equalsIgnoreCase("it")) { // language code "it" (Italian)
    stemmer = new ItalianStemmer(); // use the class name found in .../snowball/ext/
} else {
    stemmer = null; // no stemmer for any other language
}
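Example 2 matters because the indexer only gives extended support to codes listed in supportedLanguages. How IndexerTask.java actually consults the array is not shown in this section, so the following is only a hypothetical sketch of that kind of membership check; LanguageSupport and isSupported are illustrative names, not WebHelp code.

```java
import java.util.Arrays;

// Hypothetical sketch, not WebHelp code: a membership check against a copy of
// the supportedLanguages array from Example 2 (after adding "it").
final class LanguageSupport {
    private static final String[] SUPPORTED_LANGUAGES = {"en", "de", "fr", "cn", "ja", "ko", "it"};

    private LanguageSupport() {}

    // Case-insensitive, matching the equalsIgnoreCase comparisons in Example 3.
    static boolean isSupported(String languageCode) {
        return languageCode != null
                && Arrays.stream(SUPPORTED_LANGUAGES).anyMatch(languageCode::equalsIgnoreCase);
    }
}
```

With the extended array, LanguageSupport.isSupported("it") returns true, while a code that has not been added yet, such as "es", returns false.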
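As more languages are added, the if/else chain in Example 3 grows by one branch per language. One possible alternative, sketched here under stated assumptions, is a lookup table from language code to class name plus reflective instantiation. The class names are assumed to follow the capitalized pattern of EnglishStemmer, GermanStemmer, and FrenchStemmer; check docbook-webhelp/indexer/src/com/nexwave/stemmer/snowball/ext/ for the real ones. StemmerRegistry and its methods are hypothetical, and newStemmer returns Object only so that the sketch compiles on its own; in SaxHTMLIndex.java you would cast the result to SnowballStemmer.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical table-driven alternative to the if/else chain in Example 3; not WebHelp code.
// The simple class names are assumptions: check
// docbook-webhelp/indexer/src/com/nexwave/stemmer/snowball/ext/ for the real ones.
final class StemmerRegistry {
    private static final String STEMMER_PACKAGE = "com.nexwave.stemmer.snowball.ext.";

    // Language code -> stemmer class name (Map.of requires Java 9 or later).
    private static final Map<String, String> STEMMERS = Map.of(
            "en", "EnglishStemmer",
            "de", "GermanStemmer",
            "fr", "FrenchStemmer",
            "it", "ItalianStemmer");

    private StemmerRegistry() {}

    // Returns the fully qualified class name, or null if the language has no stemmer,
    // mirroring the final else branch of Example 3.
    static String stemmerClassFor(String languageCode) {
        if (languageCode == null) {
            return null;
        }
        String simpleName = STEMMERS.get(languageCode.toLowerCase(Locale.ROOT));
        return simpleName == null ? null : STEMMER_PACKAGE + simpleName;
    }

    // Instantiates the stemmer reflectively; returns null if no class is registered
    // or the class cannot be loaded or constructed.
    static Object newStemmer(String languageCode) {
        String className = stemmerClassFor(languageCode);
        if (className == null) {
            return null;
        }
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }
}
```

With this layout, supporting a new language means adding one map entry instead of a new else-if branch. The trade-off is that a misspelled class name only shows up at run time (as a null stemmer) rather than as a compile error.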
That's all. Now run ant build-indexer to compile and build the Java code.
Then run ant webhelp to generate the output from your DocBook file. For any
questions, contact us or email the DocBook mailing list
<[email protected]>.
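Before running ant webhelp, it can help to confirm that the language set in build.properties has a matching JavaScript stemmer in place. The following is a minimal sketch, not part of WebHelp: StemmerSetupCheck and its methods are hypothetical names, and it assumes docbook-webhelp is the WebHelp root and the {$language-code}_stemmer.js naming described above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical pre-build check, not part of WebHelp. Assumes docbook-webhelp is the
// WebHelp root and the {$language-code}_stemmer.js naming convention.
final class StemmerSetupCheck {
    static final String LANGUAGE_KEY = "webhelp.indexer.language";

    private StemmerSetupCheck() {}

    // Returns the configured language code (trimmed), or null if the property is missing.
    static String indexerLanguage(Reader buildProperties) {
        Properties props = new Properties();
        try {
            props.load(buildProperties);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(LANGUAGE_KEY);
        return value == null ? null : value.trim();
    }

    // For example, "it" -> <webhelpRoot>/template/search/stemmers/it_stemmer.js
    static Path jsStemmerPath(Path webhelpRoot, String languageCode) {
        return webhelpRoot.resolve(
                Paths.get("template", "search", "stemmers", languageCode + "_stemmer.js"));
    }

    // True if the JavaScript stemmer for the configured language exists as a regular file.
    static boolean jsStemmerInstalled(Path webhelpRoot, Reader buildProperties) {
        String language = indexerLanguage(buildProperties);
        return language != null && !language.isEmpty()
                && Files.isRegularFile(jsStemmerPath(webhelpRoot, language));
    }
}
```

For example, jsStemmerPath(Paths.get("docbook-webhelp"), "it") resolves to docbook-webhelp/template/search/stemmers/it_stemmer.js, the location used in the naming step.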