Search - README: Web-based Help from DocBook XML
Design overview of the search mechanism.
Search is implemented entirely on the client side; no server is involved. User queries are
processed by JavaScript inside the browser, which finds the matching results by comparing the
query against a simplified 'index' that also resides in JavaScript. The search mechanism has
two main parts:
- Indexing: First, the content in the docs folder must be traversed and its words indexed.
  This is done by webhelpindexer.jar in the xsl/extensions/ folder. You can invoke it with the
  ant index command from the root of the webhelp directory. The source of webhelpindexer has
  moved to its own location at trunk/xsl-webhelpindexer/.
  Check out the DocBook trunk svn directory to get this source. Then make your changes and
  recompile by simply running the ant command. I assume the project can be opened in the
  NetBeans IDE with one click. If you are using IntelliJ IDEA, you can simply create a new
  project from the existing sources. The indexer supports word scoring, word stemming, and the
  English, German, and French languages. For CJK (Chinese, Japanese, Korean) languages, it uses
  bi-gram tokenizing to break up the text, since CJK languages do not put spaces between words.
  When ant index is run, it generates five output files:
  - htmlFileList.js - This contains an array named fl, which stores details of all the files
    indexed by the indexer. The doStem flag in this file defines whether stemming should be
    used. It defaults to false.
  - htmlFileInfoList.js - This contains metadata about the indexed files in an array named
    fil: the file name, the (HTML) title, and a summary of the content. An entry looks like:
    fil["4"]= "ch03.html@@@Developer Docs@@@This chapter provides an overview of how webhelp is implemented.";
  - index-*.js (three index files) - These three files store the actual index of the content.
    The index is added to an array named w.
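The bi-gram tokenizing mentioned for CJK languages can be illustrated with a minimal JavaScript sketch. This is not the webhelpindexer source; the function name bigrams is hypothetical, and the sketch only shows the general idea of splitting unspaced text into overlapping two-character tokens.

```javascript
// Illustrative sketch of character bi-gram tokenizing (not the indexer's code).
// Each token is a pair of adjacent characters, so unspaced CJK text can still
// be broken into searchable units.
function bigrams(text) {
  // Array.from splits by Unicode code point, so surrogate pairs stay intact.
  const chars = Array.from(text.replace(/\s+/g, ""));
  if (chars.length < 2) {
    // A single character cannot form a pair; return it as its own token.
    return chars;
  }
  const tokens = [];
  for (let i = 0; i < chars.length - 1; i++) {
    tokens.push(chars[i] + chars[i + 1]);
  }
  return tokens;
}

console.log(bigrams("東京都庁")); // ["東京", "京都", "都庁"]
```

Because the tokens overlap, a query tokenized the same way can match any two-character sequence in the text without knowing where the word boundaries are.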
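The "@@@"-separated entry format of the fil array can be unpacked as in the following sketch. The helper parseFileInfo and the object field names are hypothetical, chosen for illustration; only the fil array name and the entry format come from the description above.

```javascript
// Hypothetical helper: splits one fil entry into its three "@@@"-separated fields.
function parseFileInfo(entry) {
  const [file, title, ...rest] = entry.split("@@@");
  // Re-join the remainder so a summary containing "@@@" is not truncated.
  return { file, title, summary: rest.join("@@@") };
}

// Sample entry in the format produced by the indexer (see htmlFileInfoList.js).
const fil = [];
fil["4"] = "ch03.html@@@Developer Docs@@@This chapter provides an overview of how webhelp is implemented.";

const info = parseFileInfo(fil["4"]);
console.log(info.file);  // ch03.html
console.log(info.title); // Developer Docs
```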
- Querying: Query processing happens entirely on the client side. The following JavaScript
  files handle it:
  - nwSearchFnt.js - This handles the user query and returns the search results. It tokenizes
    the query into words, drops unnecessary punctuation and common words, applies stemming if
    the DocBook language supports it, and so on.
  - {$indexer-language-code}_stemmer.js - This contains the stemming library. nwSearchFnt.js
    calls the stemmer method in this file to stem words. For example:
    var stem = stemmer(foobar);
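The query-side steps described above (tokenizing, dropping punctuation and common words, optional stemming) can be sketched roughly as follows. This is not the code in nwSearchFnt.js: the function prepareQuery, the STOP_WORDS list, and toyStemmer are all assumptions made for illustration. In particular, toyStemmer is only a stand-in for the real stemmer function supplied by the stemmer file.

```javascript
// Illustrative stop-word list; the list actually used by the search code may differ.
const STOP_WORDS = new Set(["a", "an", "and", "the", "of", "to", "in", "is"]);

// Stand-in for the real stemmer(): it only strips a trailing "ing" or "s",
// which is NOT a real stemming algorithm.
function toyStemmer(word) {
  return word.replace(/(ing|s)$/, "");
}

// Hypothetical helper: turns a raw query string into a list of search terms.
function prepareQuery(query, doStem, stemmer) {
  const words = query
    .toLowerCase()
    // Replace anything that is not a letter, digit, or whitespace with a space.
    .replace(/[^\p{L}\p{N}\s]/gu, " ")
    .split(/\s+/)
    // Drop empty strings and common words.
    .filter((w) => w.length > 0 && !STOP_WORDS.has(w));
  // Stem only when enabled, mirroring the doStem flag described earlier.
  return doStem ? words.map((w) => stemmer(w)) : words;
}

console.log(prepareQuery("Indexing the files, and searching!", true, toyStemmer));
// ["index", "file", "search"]
```

The resulting terms would then be looked up in the index array w. Applying the same stemming at index time and at query time is what lets a query for "searching" match content indexed under "search".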