org.cleartk.syntax.dependency.clear.README

ClearTK wrapper for ClearParser
############################
conll-2009-dev-shift-pop.jar
############################
The model conll-2009-dev-shift-pop.jar was provided by Jinho Choi and is trained on the 
development set of the CoNLL 2009 English data.  More information about this data can 
be found here:

http://ufal.mff.cuni.cz/conll2009-st/train-dev-data.html

The CoNLL 2009 shared task is described in the following paper:
http://aclweb.org/anthology-new/W/W09/W09-1201.pdf

This model uses the "shift pop" algorithm.  When using this model, set the configuration
parameter org.cleartk.syntax.dependency.clear.ClearParser.parserAlgorithmName, which is
annotated in ClearParser, to the value AbstractDepParser.ALG_SHIFT_POP.  Also, the default
memory setting of your JVM may not suffice to load this model.  The model loads with the
JVM argument "-Xmx1g".
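
The memory requirement above can be checked at startup before the model is loaded.  The
following is a minimal sketch, not part of ClearTK: the class name HeapCheck, the method
hasEnoughHeap, and the 10% margin are all assumptions made for illustration.  It uses only
the standard Runtime.maxMemory() call, which on some garbage collectors reports slightly
less than the value passed to -Xmx.

```java
/** Sketch (hypothetical helper, not a ClearTK class): check the JVM heap limit up front. */
final class HeapCheck {
    /** 1 GiB in bytes, corresponding to the "-Xmx1g" argument. */
    static final long ONE_GIB = 1024L * 1024L * 1024L;

    /**
     * Returns true when maxHeapBytes covers at least 90% of requiredBytes.
     * The 10% margin is an assumption: Runtime.maxMemory() may report a bit
     * less than the -Xmx value, depending on the garbage collector.
     */
    static boolean hasEnoughHeap(long maxHeapBytes, long requiredBytes) {
        return maxHeapBytes >= requiredBytes / 10 * 9;
    }

    public static void main(String[] args) {
        // Maximum heap the JVM will attempt to use, in bytes.
        long maxHeap = Runtime.getRuntime().maxMemory();
        if (hasEnoughHeap(maxHeap, ONE_GIB)) {
            System.out.println("Heap limit looks sufficient: " + maxHeap + " bytes");
        } else {
            System.out.println("Heap limit may be too small (" + maxHeap
                    + " bytes); consider starting the JVM with -Xmx1g");
        }
    }
}
```

Running the check with "java -Xmx1g HeapCheck" should take the first branch; a failed
check is only a hint, since actual memory use depends on the model and the input.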

On 29 Sep 2011, the model was modified by adding a "1" before the "17" near the end of
the "lexica" file, to accommodate a change in model format for the 0.4.0-SNAPSHOT release.

#################
additional models
#################

An additional model, conll-2009-training-dev-shift-pop.jar, is available from the ClearTK
downloads page:

http://code.google.com/p/cleartk/downloads/list

This model is built from both the training data and the development data of the CoNLL 2009
shared task (see links above).  It was provided by Jinho Choi as a file named
conll-trndev-sp.mod.3.  This model is very large and expands considerably in memory.
You will need at least 8 GB (gigabytes) of memory to load and run it.

#####
Notes
#####

- Both models were trained using Penn Treebank part-of-speech tags, so your input part-of-speech
tags should match the tags used for training these models.  The part-of-speech tag for a punctuation
symbol such as "." "," ":" ";" "(" ")" etc. is the symbol itself (i.e. the part-of-speech tag
for the token ")" should be ")").  This may be inconsistent with other PTB-derived tagging schemes, which
may use tags such as "COLON" or "RRB".  Modify your part-of-speech tags to be consistent with
the tagging scheme used here.

- The dependency labeling scheme produced by the models provided here is different from that of
the Malt Parser models used by ClearTK's wrapper for the Malt Parser.  You should not assume that
the wrapper for ClearParser can be used interchangeably with the wrapper for the Malt Parser.
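
The punctuation convention from the first note can be sketched as a small tag-normalization
step applied before parsing.  This is an illustration only, not ClearTK code: the class name
PunctuationTags and the method normalize are hypothetical, and the symbol set contains only
the six symbols listed in the note, so it would likely need extending for real input.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Sketch (hypothetical helper): make punctuation tags equal to the punctuation token. */
final class PunctuationTags {
    /** Only the symbols named in the note above; an assumption, not a complete list. */
    static final Set<String> PUNCTUATION =
            new HashSet<>(Arrays.asList(".", ",", ":", ";", "(", ")"));

    /** Returns the token itself for listed punctuation, otherwise the tag unchanged. */
    static String normalize(String token, String tag) {
        return PUNCTUATION.contains(token) ? token : tag;
    }

    public static void main(String[] args) {
        // Example input whose tagger emitted "COLON" and "RRB" for punctuation.
        String[] tokens = {"Wait", ":", "done", ")"};
        String[] tags = {"VB", "COLON", "VBN", "RRB"};
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + "\t" + normalize(tokens[i], tags[i]));
        }
    }
}
```

Keying the rewrite on the token rather than on the incoming tag avoids having to enumerate
the tag names used by every other tagging scheme.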