edu.stanford.nlp.objectbank.package-info Maven / Gradle / Ivy


Stanford Parser processes raw text in English, Chinese, German, Arabic, and French, and extracts constituency parse trees.


/**
 * The ObjectBank class is designed to make it easy to change the format or source
 * of the data read in by other classes, and to standardize how data is read in
 * JavaNLP classes. This makes existing code easier to reuse (by people other than
 * its authors): one just creates a new ObjectBank that knows where to look for the
 * data and how to turn it into Objects, and then uses that ObjectBank in the class.
 * It also makes it easier to reuse code for reading in the same data.
 * 
 * An ObjectBank is a Collection of Objects. These objects are taken
 * from input sources and then tokenized and parsed into the desired
 * kind of Object. An ObjectBank requires a ReaderIteratorFactory and an
 * IteratorFromReaderFactory. The ReaderIteratorFactory is used to get
 * an Iterator over java.io.Readers that contain representations of
 * the Objects. A ReaderIteratorFactory resembles a Collection that
 * takes input sources and dispenses Iterators over java.io.Readers
 * of those sources. An IteratorFromReaderFactory is used to turn a single
 * java.io.Reader into an Iterator over Objects. The IteratorFromReaderFactory
 * splits the contents of the java.io.Reader into Strings and then parses them
 * into appropriate Objects.
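 *
 * The two-stage design described above can be sketched with plain JDK types.
 * This is a minimal illustration, not the javaNLP implementation: MiniObjectBank
 * and intsPerLine are hypothetical names, a Supplier of Reader iterators is
 * assumed to play the role of the ReaderIteratorFactory, and a Function from a
 * Reader to an Iterator is assumed to play the role of the IteratorFromReaderFactory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical JDK-only sketch of the ObjectBank idea; NOT the javaNLP classes.
final class MiniObjectBank<T> implements Iterable<T> {
  // Stand-in for a ReaderIteratorFactory: hands out a fresh Iterator over Readers.
  private final Supplier<Iterator<Reader>> readers;
  // Stand-in for an IteratorFromReaderFactory: turns one Reader into objects.
  private final Function<Reader, Iterator<T>> fromReader;

  MiniObjectBank(Supplier<Iterator<Reader>> readers, Function<Reader, Iterator<T>> fromReader) {
    this.readers = readers;
    this.fromReader = fromReader;
  }

  // Lazily walks the Readers, parsing the next one only when the current one is exhausted.
  public Iterator<T> iterator() {
    Iterator<Reader> rs = readers.get();
    return new Iterator<T>() {
      private Iterator<T> current = Collections.emptyIterator();

      public boolean hasNext() {
        while (!current.hasNext() && rs.hasNext()) {
          current = fromReader.apply(rs.next());
        }
        return current.hasNext();
      }

      public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
      }
    };
  }

  // Example IteratorFromReaderFactory: one Integer per non-blank line of the Reader.
  static Iterator<Integer> intsPerLine(Reader r) {
    List<Integer> xs = new ArrayList<>();
    try (BufferedReader br = new BufferedReader(r)) {
      String line;
      while ((line = br.readLine()) != null) {
        if (!line.trim().isEmpty()) xs.add(Integer.parseInt(line.trim()));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return xs.iterator();
  }
}
```

 * Because iterator() asks the supplier for fresh Readers on every call, the bank
 * can be iterated more than once, which a single already-consumed Reader could
 * not support.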
 *
 * Example usage:
 *
 * Suppose you have a collection of files in the directory /u/nlp/data/gre/questions.
 * Each file contains several Puzzle documents that look like this:
 * 
 * <puzzle>
 * <preamble> some text </preamble>
 * <question> some intro text
 * <answer> answer1 </answer>
 * <answer> answer2 </answer>
 * <answer> answer3 </answer>
 * <answer> answer4 </answer>
 * </question>
 * <question> another question
 * <answer> answer1 </answer>
 * <answer> answer2 </answer>
 * <answer> answer3 </answer>
 * <answer> answer4 </answer>
 * </question>
 * </puzzle>
 * 
 *
 * First you need to build a ReaderIteratorFactory that will provide java.io.Readers
 * over all the files in your directory:
 *
 * Collection c = new FileSequentialCollection("/u/nlp/data/gre/questions/", "", false);
 * ReaderIteratorFactory rif = new ReaderIteratorFactory(c);
 * 
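 * As a rough illustration of what such a ReaderIteratorFactory provides, the
 * following JDK-only sketch hands out one java.io.Reader per regular file in a
 * directory. DirectoryReaders is a hypothetical class, not part of javaNLP, and
 * the choices of UTF-8, no recursion, and name ordering are assumptions of the sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical JDK-only stand-in for a ReaderIteratorFactory over one directory.
// NOT the javaNLP FileSequentialCollection or ReaderIteratorFactory classes.
final class DirectoryReaders {
  private final Path dir;

  DirectoryReaders(Path dir) {
    this.dir = dir;
  }

  // Lists regular files directly inside dir (no recursion), sorted by path,
  // and opens each file as a UTF-8 Reader only when the iterator reaches it.
  // The caller must close every Reader it receives.
  Iterator<Reader> readers() {
    List<Path> files;
    try (Stream<Path> s = Files.list(dir)) {
      files = s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Iterator<Path> paths = files.iterator();
    return new Iterator<Reader>() {
      public boolean hasNext() {
        return paths.hasNext();
      }

      public Reader next() {
        try {
          return Files.newBufferedReader(paths.next(), StandardCharsets.UTF_8);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }
    };
  }
}
```

 * Opening each file only in next() avoids holding every file open at once; the
 * caller is still responsible for closing each Reader after consuming it.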
 * 
 * Next you need to make an IteratorFromReaderFactory that takes the java.io.Readers
 * vended by the ReaderIteratorFactory, splits them into documents (Strings), and
 * then converts the Strings into Objects. In this case we want to keep everything
 * between each pair of <puzzle> </puzzle> tags, so we would use a BeginEndIterator factory.
 * You would also need to write a class that implements Appliable and whose apply method
 * converts the String between the <puzzle> </puzzle> tags into a Puzzle object.
 *
 * 
 * public class PuzzleParser implements Appliable {
 *   // Converts the text between <puzzle> and </puzzle> into a Puzzle.
 *   public Object apply(Object o) {
 *     String s = (String) o;
 *     ...
 *     Puzzle p = new Puzzle(...);
 *     ...
 *     return p;
 *   }
 * }
 *
 *
 * Now, to build the IteratorFromReaderFactory:
 *
 * IteratorFromReaderFactory rtif = BeginEndIterator.getFactory("<puzzle>", "</puzzle>", new PuzzleParser());
 * 
 *
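 * To make the begin/end splitting step concrete, here is a JDK-only sketch of the
 * idea. It is not the javaNLP BeginEndIterator: BeginEndSplitter and split are
 * hypothetical names, the whole Reader is assumed to fit in memory, and text
 * outside the markers (or an unterminated begin marker) is simply skipped.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical JDK-only illustration of begin/end splitting; NOT javaNLP's BeginEndIterator.
final class BeginEndSplitter {
  // Collects the text strictly between each begin marker and the next end marker,
  // converts each chunk with op, and returns the results in document order.
  static <T> Iterator<T> split(Reader in, String begin, String end, Function<String, T> op) {
    String text = readAll(in);
    List<T> out = new ArrayList<>();
    int from = 0;
    while (true) {
      int b = text.indexOf(begin, from);
      if (b < 0) break;
      int start = b + begin.length();
      int e = text.indexOf(end, start);
      if (e < 0) break; // unterminated chunk: ignored in this sketch
      out.add(op.apply(text.substring(start, e)));
      from = e + end.length();
    }
    return out.iterator();
  }

  // Reads the Reader to the end and closes it.
  private static String readAll(Reader in) {
    StringBuilder sb = new StringBuilder();
    char[] buf = new char[4096];
    try (Reader r = in) {
      int n;
      while ((n = r.read(buf)) != -1) sb.append(buf, 0, n);
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return sb.toString();
  }
}
```

 * Partially applying it, as in r -> BeginEndSplitter.split(r, "<puzzle>", "</puzzle>", parser)
 * for some hypothetical Function<String, T> parser, gives a function from a Reader
 * to an Iterator of objects, which is the role the IteratorFromReaderFactory plays here.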
 * Now, to create your ObjectBank, you just give it the ReaderIteratorFactory and
 * IteratorFromReaderFactory that you just created:
 *
 * ObjectBank puzzles = new ObjectBank(rif, rtif);
 * 
 *
 * If you later get a new set of puzzles that are located elsewhere and formatted
 * differently, you create a new ObjectBank for reading them in and use that ObjectBank
 * instead, with only trivial changes to your code (or possibly none at all, if the
 * ObjectBank is passed in through a constructor). Better still, if someone else wants
 * to use your code to evaluate their own puzzles, which are located elsewhere and
 * formatted differently, they already know what they have to do to make your code
 * work for them.
 */
package edu.stanford.nlp.objectbank;