// This is the original LingPipe distribution (http://alias-i.com/lingpipe/web/download.html); no changes were made to the source code.
/*
 * LingPipe v. 4.1.0
 * Copyright (C) 2003-2011 Alias-i
 *
 * This program is licensed under the Alias-i Royalty Free License
 * Version 1 WITHOUT ANY WARRANTY, without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the Alias-i
 * Royalty Free License Version 1 for more details.
 *
 * You should have received a copy of the Alias-i Royalty Free License
 * Version 1 along with this program; if not, visit
 * http://alias-i.com/lingpipe/licenses/lingpipe-license-1.txt or contact
 * Alias-i, Inc. at 181 North 11th Street, Suite 401, Brooklyn, NY 11211,
 * +1 (718) 290-9170.
 */

package com.aliasi.classify;

import com.aliasi.corpus.Corpus;
import com.aliasi.corpus.ObjectHandler;

import com.aliasi.features.Features;

import com.aliasi.matrix.KernelFunction;
import com.aliasi.matrix.Vector;

import com.aliasi.symbol.MapSymbolTable;
import com.aliasi.symbol.SymbolTable;

import com.aliasi.util.AbstractExternalizable;
import com.aliasi.util.Arrays;
import com.aliasi.util.Compilable;
import com.aliasi.util.FeatureExtractor;

import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.Serializable;

import java.util.ArrayList;
import java.util.List;
import java.util.HashMap;
import java.util.Map;

/**
 * A <code>PerceptronClassifier</code> implements a binary classifier
 * based on an averaged kernel-based perceptron.  These
 * classifiers are large margin (discriminative) linear classifiers in
 * a feature space expanded by a plug-and-play kernel implementing
 * {@link KernelFunction}.
 *
 * <p>A perceptron classifier may be applied to any type of object.
 * An appropriately typed {@link FeatureExtractor} is used to map
 * these objects to feature vectors for use in the perceptron
 * classifier.
 *
 * <h3>Corpus Training</h3>
 *
 * Unlike the language-model-based classifiers, for which training,
 * classification and compilation may be interleaved, averaged
 * perceptron-based classifiers require batch training.  This requires
 * the entire training corpus to be available in one shot.  In
 * particular, training will iterate over a fixed corpus multiple
 * times before producing a completely trained classifier.
 *
 * <p>The constructor will do the training using the supplied instance of
 * {@link Corpus}.  The constructor will store the entire corpus in
 * memory in the form of {@link com.aliasi.matrix.SparseFloatVector} and boolean
 * polarities for the accept/reject decision.  The corpus will only be
 * held locally in the constructor; it is available for garbage
 * collection, as are all intermediate results, as soon as the
 * constructor is done training.
 *
 * <h3>Kernel Function</h3>
 *
 * The basic (non-kernel) perceptron is equivalent to using the
 * kernel function {@link com.aliasi.matrix.DotProductKernel}.  A good
 * choice for most classification tasks is the polynomial kernel,
 * implemented in {@link com.aliasi.matrix.PolynomialKernel}.
 * Higher-degree polynomial kernels usually perform dramatically
 * better than dot products.  Degree 3 is a good general starting
 * point, because performance sometimes degrades at higher degrees.  In
 * some cases, the Gaussian radial basis kernel implemented in
 * {@link com.aliasi.matrix.GaussianRadialBasisKernel} works well.
 *
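 * <p>As a rough illustration of the difference between the two
 * kernels, the following standalone sketch evaluates both on dense
 * arrays.  The class <code>KernelSketch</code> and its methods are
 * hypothetical and are not part of LingPipe, and the polynomial form
 * <code>(1 + x.y)^degree</code> is an assumption made for the sketch,
 * not a statement about how the linked kernel classes are implemented.
 *
```java
public class KernelSketch {

    // Plain dot product: the kernel of the basic (non-kernel) perceptron.
    public static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; ++i)
            sum += x[i] * y[i];
        return sum;
    }

    // Inhomogeneous polynomial kernel (assumed form): (1 + x.y)^degree.
    public static double polynomial(double[] x, double[] y, int degree) {
        return Math.pow(1.0 + dot(x, y), degree);
    }

    public static void main(String[] args) {
        double[] x = { 1.0, 2.0 };
        double[] y = { 3.0, 0.5 };
        System.out.println("dot        = " + dot(x, y));
        System.out.println("polynomial = " + polynomial(x, y, 3));
    }
}
```
 *
 * Under this assumed form, raising the degree implicitly adds products
 * of input features as new dimensions, which is what lets a linear
 * perceptron separate data that is not linearly separable in the
 * original feature space.
 *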
 * <p>If the kernel function is neither serializable nor compilable,
 * then the resulting perceptron classifier will not be serializable.
 *
 * <h3>Training Iterations</h3>
 *
 * More training iterations are usually better for accuracy.  As
 * more basis vectors are added to the perceptron, more memory is
 * needed for the model and more time is needed for classification.
 * Typically, the amount of memory required at run time will
 * stabilize after a few training iterations.
 *
 * <h3>Memory Usage: Training and Compiled</h3>
 *
 * The memory usage of a perceptron classifier may be very high.
 * The perceptron must store every feature vector in the input on
 * which the classifier being trained made a mistake in some
 * iteration.  During training, every feature vector in the input
 * corpus must be stored.  These are stored in memory as instances of
 * {@link com.aliasi.matrix.SparseFloatVector}.
 *
 * <p>If the data are linearly separable in the kernel space,
 * the training process will converge to the point where no additional
 * basis feature vectors are needed and the result will converge to
 * using the single final perceptron (which still requires all the
 * intermediate basis vectors to be stored, given the demands of a
 * non-linear kernel calculation).
 *
 * <h3>Serialization</h3>
 *
 * After a perceptron classifier is constructed, it may be
 * serialized to an object output stream.  The underlying feature
 * extractor and kernel function are each compiled if they implement
 * {@link Compilable} and serialized otherwise.  To be serializable,
 * a perceptron classifier therefore requires both its feature
 * extractor and kernel function to be either {@link Compilable} or
 * {@link Serializable}.
 *
 * <p>The object read back in after serialization will
 * be an instance of <code>PerceptronClassifier</code>.
 *
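 * <p>The mechanism behind this is a serialization proxy: the
 * classifier's <code>writeReplace()</code> hands serialization a
 * helper object that writes the model's parts and rebuilds the model
 * on reading.  The following JDK-only sketch shows that general
 * pattern under simplifying assumptions.  The names
 * <code>ProxySketch</code>, <code>Proxy</code> and
 * <code>roundTrip</code> are hypothetical, the model is reduced to a
 * weight array and one category, and LingPipe's own helper classes
 * are not used.
 *
```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ProxySketch implements Serializable {

    static final long serialVersionUID = 1L;

    public final int[] weights;
    public final String acceptCategory;

    public ProxySketch(int[] weights, String acceptCategory) {
        this.weights = weights;
        this.acceptCategory = acceptCategory;
    }

    // Java serialization calls this and writes the proxy instead of the model.
    private Object writeReplace() {
        return new Proxy(this);
    }

    public static class Proxy implements Externalizable {

        static final long serialVersionUID = 2L;

        private ProxySketch model;

        // Externalizable requires a public no-argument constructor.
        public Proxy() {
            this(null);
        }

        Proxy(ProxySketch model) {
            this.model = model;
        }

        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(model.weights.length);
            for (int weight : model.weights)
                out.writeInt(weight);
            out.writeUTF(model.acceptCategory);
        }

        public void readExternal(ObjectInput in) throws IOException {
            int[] weights = new int[in.readInt()];
            for (int i = 0; i < weights.length; ++i)
                weights[i] = in.readInt();
            model = new ProxySketch(weights, in.readUTF());
        }

        // Deserialization returns the rebuilt model rather than the proxy.
        private Object readResolve() {
            return model;
        }
    }

    // Serialize to a byte array and read the result back in.
    public static ProxySketch roundTrip(ProxySketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(original);
            out.close();
            ObjectInputStream in
                = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            Object result = in.readObject();
            in.close();
            return (ProxySketch) result;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ProxySketch copy = roundTrip(new ProxySketch(new int[] { 3, -1 }, "accept"));
        System.out.println(copy.acceptCategory + " " + copy.weights.length);
    }
}
```
 *
 * Writing through a proxy keeps the stored format independent of the
 * model's private fields, so the in-memory layout can change without
 * invalidating the order in which the parts are written and read.
 *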
 * <h3>About Averaged Kernel Perceptrons</h3>
 *
 * <p>Perceptrons are a kind of large-margin linear classifier.  The
 * polynomial kernel trick is used to embed the basic feature vector
 * in a higher-degree vector space in which the data are more separable.
 *
 * <p>An average of all of the perceptrons created during training is
 * used for the final prediction. The factored formulation of the
 * algorithm allows the perceptrons to be expressed as linearly
 * weighted training samples.
 *
 * <p>Although theoretical bounds are almost equivalent, in practice
 * averaged perceptrons slightly underperform support
 * vector machine (SVM) learners over the same polynomial kernels.
 * The advantage of perceptrons is that they are much more efficient
 * in time and slightly more efficient in space in practice.
 *
 * <h3>Averaged Perceptron Model with Polynomial Kernel</h3>
 *
 * The model used for runtime predictions by the averaged
 * perceptron is quite straightforward, consisting of a set of
 * weighted feature vectors (represented as parallel arrays of basis
 * vectors and weights) and a kernel degree:
 *
 * <blockquote><pre>
 * Vector[] basisVectors;
 * int[] weights;
 * int degree;</pre></blockquote>
 *
 * The basis vectors are all vectors derived from single training
 * examples by the specified feature extractor.  The weights may
 * be positive or negative and represent the cumulative voted
 * weight of the specified basis vector.
 *
 * <p>The kernel function computes a proximity
 * <code>kernel(v1,v2)</code> between vectors <code>v1</code> and
 * <code>v2</code> in an enhanced feature space defined by the
 * particular kernel employed.
 *
 * <p>A new input to classify is first converted to a feature
 * vector by the feature extractor.  Classification is then
 * based on the sign of the following score:
 *
 * <blockquote><pre>
 * score(Vector v) = &Sigma;<sub>i</sub> weights[i] * kernel(basisVectors[i],v)</pre></blockquote>
 *
 * An example is accepted if the score of its feature vector is
 * greater than zero and rejected otherwise.
 *
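 * <p>The scoring rule above can be sketched in plain Java as follows.
 * This is only an illustration: <code>ScoreSketch</code>, its nested
 * <code>Kernel</code> interface and the <code>DOT</code> constant are
 * hypothetical names, and dense <code>double[]</code> arrays stand in
 * for the sparse vectors used by the classifier itself.
 *
```java
public class ScoreSketch {

    // Hypothetical stand-in for a kernel: larger values mean closer vectors.
    public interface Kernel {
        double proximity(double[] x, double[] y);
    }

    // Plain dot product, the kernel of the basic perceptron.
    public static final Kernel DOT = (x, y) -> {
        double sum = 0.0;
        for (int i = 0; i < x.length; ++i)
            sum += x[i] * y[i];
        return sum;
    };

    // score(v) = sum over i of weights[i] * kernel(basisVectors[i], v)
    public static double score(double[][] basisVectors, int[] weights,
                               Kernel kernel, double[] v) {
        double sum = 0.0;
        for (int i = 0; i < basisVectors.length; ++i)
            sum += weights[i] * kernel.proximity(basisVectors[i], v);
        return sum;
    }

    // Accept exactly when the score is strictly greater than zero.
    public static boolean accept(double[][] basisVectors, int[] weights,
                                 Kernel kernel, double[] v) {
        return score(basisVectors, weights, kernel, v) > 0.0;
    }

    public static void main(String[] args) {
        double[][] basis = { { 1.0, 0.0 }, { 0.0, 1.0 } };
        int[] weights = { 2, -1 };
        double[] v = { 1.0, 0.0 };
        System.out.println(score(basis, weights, DOT, v));
        System.out.println(accept(basis, weights, DOT, v));
    }
}
```
 *
 * Note that a score of exactly zero falls on the reject side, because
 * acceptance requires a strictly positive score.
 *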
 * <h3>Estimating the Perceptron Model</h3>
 *
 * To estimate the perceptron model, we will assume that we have a
 * training corpus consisting of an array of vectors with boolean
 * polarities indicating whether they are positive (to accept) or
 * negative (to reject) examples.  We also assume we have a fixed
 * kernel function.  The training method iterates over the corpus a
 * specified number of times.
 *
 * <blockquote><pre>
 * Vector[] basisVectors;
 * int[] incrementalWeights;
 * boolean[] polarities;
 * int degree;
 * int index = -1;
 * for (# iterations)
 *     for (vector,polarity) in training corpus
 *         yHat = scoreIntermediate(vector);
 *         if (yHat &gt; 0 &amp;&amp; polarity) || (yHat &lt; 0 &amp;&amp; !polarity)
 *             ++incrementalWeights[index];
 *         else
 *             ++index;
 *             basisVectors[index] = vector;
 *             polarities[index] = polarity;
 *             incrementalWeights[index] = 1;</pre></blockquote>
 *
 * <blockquote><pre>
 * scoreIntermediate(vector)
 *   = &Sigma;<sub>i &lt;= index</sub> polarities[i] * kernel(basisVectors[i],vector)</pre></blockquote>
 *
 * The final weight for a vector is the cumulative weight
 * computed as follows:
 *
 * <blockquote><pre>
 * cumulativeWeight(j) = &Sigma;<sub>k &gt;= j</sub> incrementalWeights[k]</pre></blockquote>
 *
 * The actual implementations of these methods involve
 * considerably more indirection and index chasing to avoid
 * copies and duplication in the final vectors.
 *
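 * <p>The estimation procedure described above can be written out as
 * a small runnable sketch.  It follows the pseudocode directly rather
 * than the index-chasing implementation: a score of exactly zero
 * counts as a mistake, repeated basis vectors are not merged, and
 * dense arrays replace sparse vectors.  The class
 * <code>AveragedPerceptronSketch</code> is hypothetical, and the
 * kernel form <code>(1 + x.y)^degree</code> is an assumption made
 * for the sketch.
 *
```java
import java.util.ArrayList;
import java.util.List;

public class AveragedPerceptronSketch {

    // Final model: parallel lists of basis vectors and signed cumulative weights.
    public final List<double[]> basisVectors = new ArrayList<double[]>();
    public final List<Integer> weights = new ArrayList<Integer>();
    private final int degree;

    public AveragedPerceptronSketch(double[][] vectors, boolean[] labels,
                                    int degree, int numIterations) {
        this.degree = degree;
        List<Boolean> polarities = new ArrayList<Boolean>();
        List<Integer> incrementalWeights = new ArrayList<Integer>();
        for (int iteration = 0; iteration < numIterations; ++iteration) {
            for (int n = 0; n < vectors.length; ++n) {
                // scoreIntermediate: unweighted, polarity-signed kernel sum
                double yHat = 0.0;
                for (int i = 0; i < basisVectors.size(); ++i) {
                    double k = kernel(basisVectors.get(i), vectors[n]);
                    yHat += polarities.get(i) ? k : -k;
                }
                boolean correct = (yHat > 0.0 && labels[n])
                    || (yHat < 0.0 && !labels[n]);
                if (correct) {
                    // the current perceptron survives one more example
                    int last = incrementalWeights.size() - 1;
                    incrementalWeights.set(last, incrementalWeights.get(last) + 1);
                } else {
                    // mistake: start a new perceptron with this example as basis
                    basisVectors.add(vectors[n]);
                    polarities.add(labels[n]);
                    incrementalWeights.add(1);
                }
            }
        }
        // cumulativeWeight(j) = sum of incrementalWeights[k] for k >= j,
        // signed by the polarity of basis vector j
        int weightSum = 0;
        int[] signed = new int[basisVectors.size()];
        for (int j = signed.length - 1; j >= 0; --j) {
            weightSum += incrementalWeights.get(j);
            signed[j] = polarities.get(j) ? weightSum : -weightSum;
        }
        for (int weight : signed)
            weights.add(weight);
    }

    // Assumed kernel for illustration: (1 + x.y)^degree.
    double kernel(double[] x, double[] y) {
        double dot = 0.0;
        for (int i = 0; i < x.length; ++i)
            dot += x[i] * y[i];
        return Math.pow(1.0 + dot, degree);
    }

    // Averaged score: sum over i of weights[i] * kernel(basisVectors[i], v).
    public double score(double[] v) {
        double sum = 0.0;
        for (int i = 0; i < basisVectors.size(); ++i)
            sum += weights.get(i) * kernel(basisVectors.get(i), v);
        return sum;
    }

    public static void main(String[] args) {
        double[][] vectors = { { 2.0 }, { -2.0 } };
        boolean[] labels = { true, false };
        AveragedPerceptronSketch model
            = new AveragedPerceptronSketch(vectors, labels, 1, 2);
        System.out.println("basis vectors: " + model.basisVectors.size());
        System.out.println("score of 1.0:  " + model.score(new double[] { 1.0 }));
    }
}
```
 *
 * The backward pass over the incremental weights is what turns the
 * sequence of intermediate perceptrons into a single averaged model:
 * a basis vector added early takes part in every later perceptron, so
 * it accumulates the votes of all of them.
 *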
 * <h3>Historical Notes</h3>
 *
 * The averaged kernel perceptron implemented here was introduced
 * in the following paper, which also provides error bounds
 * for learning and evaluations with polynomial kernels of various
 * degrees:
 *
 * <blockquote>
 * Freund, Yoav and Robert E. Schapire (1999)
 * Large margin classification using the perceptron algorithm.
 * Machine Learning 37(3):277-296.
 * </blockquote>
 *
 * The basic perceptron model was introduced in:
 *
 * <blockquote>
 * Block, H.D. (1962) The perceptron: a model for brain functioning.
 * Reviews of Modern Physics 34:123-135.
 * </blockquote>
 *
 * The kernel-based perceptron was introduced in:
 *
 * <blockquote>
 * Aizerman, M.A., E.M. Braverman, and L.I. Rozonoer (1964)
 * Theoretical foundations of the potential function method in pattern
 * recognition learning.  Automation and Remote Control
 * 25:821-837.
 * </blockquote>
 *
 * The basis of the voting scheme is a deterministically averaged
 * version of the randomized approach of adapting online learners
 * to a batch setup described in the following paper:
 *
 * <blockquote>
 * Helmbold, D.P. and M.K. Warmuth (1995)
 * On weak learning.  Journal of Computer and System Sciences
 * 50:551-573.
 * </blockquote>
 *
 * @author  Bob Carpenter
 * @version 4.0.0
 * @since   LingPipe3.1
 * @param <E> the type of object being classified
 */
public class PerceptronClassifier<E>
    implements ScoredClassifier<E>,
               Serializable {

    static final long serialVersionUID = 8752291174601085455L;

    final FeatureExtractor<? super E> mFeatureExtractor;
    final MapSymbolTable mSymbolTable;
    final KernelFunction mKernelFunction;

    final Vector[] mBasisVectors;
    final int[] mBasisWeights;

    final String mAcceptCategory;
    final String mRejectCategory;


    PerceptronClassifier(FeatureExtractor<? super E> featureExtractor,
                         KernelFunction kernelFunction,
                         MapSymbolTable symbolTable,
                         Vector[] basisVectors,
                         int[] basisWeights,
                         String acceptCategory,
                         String rejectCategory) {

        mFeatureExtractor = featureExtractor;

        mKernelFunction = kernelFunction;
        mBasisVectors = basisVectors;
        mBasisWeights = basisWeights;

        mAcceptCategory = acceptCategory;
        mRejectCategory = rejectCategory;

        mSymbolTable = symbolTable;
    }

    /**
     * Construct a perceptron classifier from the specified corpus,
     * feature extractor, kernel function, corpus accept category and
     * number of training iterations, along with the output accept and
     * reject categories.
     *
     * @param corpus Corpus to use for training.
     * @param featureExtractor Feature extractor for objects.
     * @param kernelFunction Kernel function for expanding vector basis.
     * @param corpusAcceptCategory Category in training data to treat as positive.
     * @param numIterations Number of iterations to carry out during training.
     * @param outputAcceptCategory Category with which to label accepted instances.
     * @param outputRejectCategory Category with which to label rejected instances.
     * @throws IOException If there is an underlying I/O error visiting the corpus.
     */
    public PerceptronClassifier(Corpus<ObjectHandler<Classified<E>>> corpus,
                                FeatureExtractor<? super E> featureExtractor,
                                KernelFunction kernelFunction,
                                String corpusAcceptCategory,
                                int numIterations,
                                String outputAcceptCategory,
                                String outputRejectCategory)
        throws IOException {

        mFeatureExtractor = featureExtractor;
        mKernelFunction = kernelFunction;
        mAcceptCategory = outputAcceptCategory;
        mRejectCategory = outputRejectCategory;

        mSymbolTable = new MapSymbolTable();

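        // collect training vectors and categories; note that the collector
        // compares each training category against mAcceptCategory (the
        // output accept category) and does not consult corpusAcceptCategory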
        CorpusCollector collector = new CorpusCollector();
        corpus.visitCorpus(collector);
        Vector[] featureVectors = collector.featureVectors();
        boolean[] polarities = collector.polarities();
        corpus = null; // don't need it any more

        // initialize perceptrons
        int currentPerceptronIndex = -1;  // no initial zero perceptron
        int[] weights = new int[INITIAL_BASIS_SIZE];
        int[] basisIndexes = new int[INITIAL_BASIS_SIZE];

        for (int iteration = 0; iteration < numIterations; ++iteration) {
            // System.out.println("\n\nIteration=" + iteration);
            for (int i = 0; i < featureVectors.length; ++i) {
                double yHat = prediction(featureVectors[i],
                                         featureVectors,
                                         polarities,
                                         weights,
                                         basisIndexes,
                                         currentPerceptronIndex);
                boolean accept = yHat > 0.0;
                //System.out.println("      yHat=" + yHat
                // + " accept=" + accept
                // + " for vect=" + featureVectors[i]);
                if (accept == polarities[i]) {
                    // System.out.println("       correct");
                    if (currentPerceptronIndex >= 0) // avoid incrementing zero
                        ++weights[currentPerceptronIndex];
                } else {
                    // System.out.println("       incorrect");
                    ++currentPerceptronIndex;
                    if (currentPerceptronIndex >= weights.length) {
                        weights = Arrays.reallocate(weights);
                        basisIndexes = Arrays.reallocate(basisIndexes);
                    }
                    basisIndexes[currentPerceptronIndex] = i;
                    weights[currentPerceptronIndex] = 1;
                }
            }
        }

        // renumber indexes to pack only necessary basis vectors
        Map<Integer,Integer> renumbering = new HashMap<Integer,Integer>();
        int next = 0;
        for (int i = 0; i <= currentPerceptronIndex; ++i)
            if (!renumbering.containsKey(basisIndexes[i]))
                renumbering.put(basisIndexes[i],next++);

        // compute basis vectors and cumulative weight for avg
        mBasisVectors = new Vector[renumbering.size()];
        mBasisWeights = new int[renumbering.size()];
        int weightSum = 0;
        for (int i = currentPerceptronIndex+1; --i >= 0; ) {
            int oldIndex = basisIndexes[i];
            int newIndex = renumbering.get(oldIndex);
            mBasisVectors[newIndex] = featureVectors[oldIndex];
            weightSum += weights[i];
            // polarities is indexed by training example, so use the
            // polarity of the example stored as basis i, not position i
            if (polarities[oldIndex])
                mBasisWeights[newIndex] += weightSum;
            else
                mBasisWeights[newIndex] -= weightSum;
        }
    }

    /**
     * Returns the kernel function for this perceptron.
     *
     * @return The kernel function for this perceptron.
     */
    public KernelFunction kernelFunction() {
        return mKernelFunction;
    }


    /**
     * Returns the feature extractor for this perceptron.
     *
     * @return The feature extractor for this perceptron.
     */
    public FeatureExtractor<? super E> featureExtractor() {
        return mFeatureExtractor;
    }

    /**
     * Returns a string-based representation of this perceptron.
     * This may be long, as it outputs every basis vector and weight.
     *
     * @return A string-based representation of this perceptron.
     */
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append("Averaged Perceptron");
        sb.append("  Kernel Function=" + mKernelFunction + "\n");
        for (int i = 0; i < mBasisVectors.length; ++i)
            sb.append("  idx=" + i + " "
                      + "vec=" + mBasisVectors[i]
                      + " wgt=" + mBasisWeights[i]
                      + "\n");
        return sb.toString();
    }


    /**
     * Return the scored classification for the specified input.  The
     * input is first converted to a feature vector using the feature
     * extractor, then scored against the perceptron.  The resulting
     * score for the accept category is the perceptron score, and
     * the resulting score for the reject category is the negative
     * perceptron score.
     *
     * @param in The element to be classified.
     * @return The scored classification for the specified element.
     */
    public ScoredClassification classify(E in) {
        Map<String,? extends Number> featureVector = mFeatureExtractor.features(in);
        Vector inputVector = Features.toVector(featureVector,mSymbolTable,Integer.MAX_VALUE,false);
        double sum = 0.0;
        for (int i = mBasisVectors.length; --i >= 0; )
            sum += mBasisWeights[i] * mKernelFunction.proximity(mBasisVectors[i],
                                                               inputVector);
        return sum > 0
            ? new ScoredClassification(new String[] { mAcceptCategory,
                                                      mRejectCategory },
                                       new double[] { sum, -sum })
            : new ScoredClassification(new String[] { mRejectCategory,
                                                      mAcceptCategory },
                                       new double[] { -sum, sum });
    }

    double prediction(Vector inputVector,
                      Vector[] featureVectors,
                      boolean[] polarities,
                      int[] ignoreMyWeights,
                      int[] basisIndexes,
                      int currentPerceptronIndex) {
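        double sum = 0.0;
        // training scores with the unaveraged perceptron, so each basis
        // vector counts once regardless of its voted weight
        int weightSum = 1;
        for (int i = currentPerceptronIndex; i >= 0; --i) {
            // index of the training example stored as basis vector i
            int index = basisIndexes[i];
            double kernel = mKernelFunction.proximity(inputVector,featureVectors[index]);
            // polarities is indexed by training example, hence [index]
            double total = (polarities[index] ? weightSum : -weightSum) * kernel;
            sum += total;
        }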
        return sum;
    }

    static double power(double base, int exponent) {
        switch (exponent) {
        case 0:
            return 1.0;
        case 1:
            return base;
        case 2:
            return base * base;
        case 3:
            return base * base * base;
        case 4:
            return base * base * base * base;
        default:
            return java.lang.Math.pow(base,exponent);
        }
    }

    private Object writeReplace() {
        return new Externalizer<E>(this);
    }


    class CorpusCollector
        implements ObjectHandler<Classified<E>> {

        final List<Vector> mInputFeatureVectorList
            = new ArrayList<Vector>();
        final List<Boolean> mInputAcceptList
            = new ArrayList<Boolean>();

        public void handle(Classified<E> classified) {
            E object = classified.getObject();
            Classification c = classified.getClassification();
            Map<String,? extends Number> featureMap = mFeatureExtractor.features(object);
            mInputFeatureVectorList.add(Features.toVectorAddSymbols(featureMap,mSymbolTable,Integer.MAX_VALUE,false));
            mInputAcceptList.add(mAcceptCategory.equals(c.bestCategory())
                                 ? Boolean.TRUE
                                 : Boolean.FALSE);
        }
        Vector[] featureVectors() {
            return mInputFeatureVectorList.toArray(EMPTY_SPARSE_FLOAT_VECTOR_ARRAY);
        }
        boolean[] polarities() {
            boolean[] categories = new boolean[mInputAcceptList.size()];
            for (int i = 0; i < categories.length; ++i)
                categories[i] = mInputAcceptList.get(i).booleanValue();
            return categories;
        }
    }

    static final Vector[] EMPTY_SPARSE_FLOAT_VECTOR_ARRAY
        = new Vector[0];


    static class Externalizer<F> extends AbstractExternalizable {
        static final long serialVersionUID = -1901362811305741506L;
        final PerceptronClassifier<F> mClassifier;
        public Externalizer() {
            this(null);
        }
        public Externalizer(PerceptronClassifier<F> classifier) {
            mClassifier = classifier;
        }
        @Override
        @SuppressWarnings("deprecation")
        public Object read(ObjectInput in) throws ClassNotFoundException, IOException {

            // required for read object
            @SuppressWarnings("unchecked")
            FeatureExtractor<? super F> featureExtractor
                = (FeatureExtractor<? super F>) in.readObject();

            KernelFunction kernelFunction
                = (KernelFunction) in.readObject();

            MapSymbolTable symbolTable = (MapSymbolTable) in.readObject();

            int basisLen = in.readInt();
            Vector[] basisVectors = new Vector[basisLen];
            for (int i = 0; i < basisLen; ++i)
                basisVectors[i] = (Vector) in.readObject();

            int[] basisWeights = new int[basisLen];
            for (int i = 0; i < basisLen; ++i)
                basisWeights[i] = in.readInt();

            String acceptCategory = in.readUTF();
            String rejectCategory = in.readUTF();

            return new PerceptronClassifier<F>(featureExtractor,
                                               kernelFunction,
                                               symbolTable,
                                               basisVectors,
                                               basisWeights,
                                               acceptCategory,
                                               rejectCategory);
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            AbstractExternalizable.compileOrSerialize(mClassifier.mFeatureExtractor,out);
            AbstractExternalizable.compileOrSerialize(mClassifier.mKernelFunction,out);

            // symbol table
            out.writeObject(mClassifier.mSymbolTable);


            // basis length
            out.writeInt(mClassifier.mBasisVectors.length);

            // basis vectors
            for (int i = 0; i < mClassifier.mBasisVectors.length; ++i)
                out.writeObject(mClassifier.mBasisVectors[i]);

            // basis weights
            for (int i = 0; i < mClassifier.mBasisWeights.length; ++i)
                out.writeInt(mClassifier.mBasisWeights[i]);

            // accept, reject cats
            out.writeUTF(mClassifier.mAcceptCategory);
            out.writeUTF(mClassifier.mRejectCategory);

         }
    }


    static final int INITIAL_BASIS_SIZE = 32*1024;  // 32K entries * 4B * 2 int arrays = 256KB initially

}