/*
 * LingPipe v. 4.1.0
 * Copyright (C) 2003-2011 Alias-i
 *
 * This program is licensed under the Alias-i Royalty Free License
 * Version 1 WITHOUT ANY WARRANTY, without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the Alias-i
 * Royalty Free License Version 1 for more details.
 *
 * You should have received a copy of the Alias-i Royalty Free License
 * Version 1 along with this program; if not, visit
 * http://alias-i.com/lingpipe/licenses/lingpipe-license-1.txt or contact
 * Alias-i, Inc. at 181 North 11th Street, Suite 401, Brooklyn, NY 11211,
 * +1 (718) 290-9170.
 */

package com.aliasi.classify;

import com.aliasi.corpus.Corpus;
import com.aliasi.corpus.ObjectHandler;

import com.aliasi.features.Features;

import com.aliasi.matrix.KernelFunction;
import com.aliasi.matrix.Vector;

import com.aliasi.symbol.MapSymbolTable;
import com.aliasi.symbol.SymbolTable;

import com.aliasi.util.AbstractExternalizable;
import com.aliasi.util.Arrays;
import com.aliasi.util.Compilable;
import com.aliasi.util.FeatureExtractor;

import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.Serializable;

import java.util.ArrayList;
import java.util.List;
import java.util.HashMap;
import java.util.Map;

/**
 * A <code>PerceptronClassifier</code> implements a binary classifier
 * based on an averaged kernel-based perceptron.  These classifiers
 * are large margin (discriminative) linear classifiers in a feature
 * space expanded by a plug-and-play kernel implementing
 * {@link KernelFunction}.
 *
 * <p>A perceptron classifier may be applied to any type of object.
 * An appropriately typed {@link FeatureExtractor} is used to map
 * these objects to feature vectors for use in the perceptron
 * classifier.
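 *
 * <p>For illustration only, here is a minimal sketch of a feature
 * extractor over character sequences; the feature names are invented
 * for this example and are not part of LingPipe:
 *
 * <blockquote><pre>
 * FeatureExtractor<CharSequence> fe = new FeatureExtractor<CharSequence>() {
 *         public Map<String,? extends Number> features(CharSequence in) {
 *             Map<String,Integer> feats = new HashMap<String,Integer>();
 *             feats.put("LENGTH", in.length());      // invented length feature
 *             if (in.length() > 0)
 *                 feats.put("STARTS_UPPER",          // invented case feature
 *                           Character.isUpperCase(in.charAt(0)) ? 1 : 0);
 *             return feats;
 *         }
 *     };
 * </pre></blockquote>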

 * <h3>Corpus Training</h3>
 *
 * <p>Unlike the language-model-based classifiers, for which training,
 * classification and compilation may be interleaved, averaged
 * perceptron-based classifiers require batch training.  This requires
 * the entire training corpus to be available in one shot.  In
 * particular, training will iterate over a fixed corpus multiple
 * times before producing a completely trained classifier.
 *
 * <p>The constructor will do the training using the supplied instance
 * of {@link Corpus}.  The constructor will store the entire corpus in
 * memory in the form of {@link com.aliasi.matrix.SparseFloatVector}
 * instances and boolean polarities for the accept/reject decision.
 * The corpus is only held locally in the constructor; it is available
 * for garbage collection, as are all intermediate results, as soon as
 * the constructor is done training.
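 *
 * <p>As a hedged sketch of training (the corpus, feature extractor,
 * and category names below are assumed for the example, not supplied
 * by this class):
 *
 * <blockquote><pre>
 * Corpus<ObjectHandler<Classified<CharSequence>>> corpus = ...;  // assumed
 * FeatureExtractor<CharSequence> fe = ...;                       // assumed
 * PerceptronClassifier<CharSequence> classifier
 *     = new PerceptronClassifier<CharSequence>(corpus, fe,
 *                                              new PolynomialKernel(3),
 *                                              "accept",   // corpus accept category
 *                                              10,         // training iterations
 *                                              "accept", "reject");
 * </pre></blockquote>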

 * <h3>Kernel Function</h3>
 *
 * <p>The basic (non-kernel) perceptron is equivalent to using the
 * kernel function {@link com.aliasi.matrix.DotProductKernel}.  A good
 * choice for most classification tasks is the polynomial kernel,
 * implemented in {@link com.aliasi.matrix.PolynomialKernel}.
 * Usually, higher polynomial kernel degrees perform dramatically
 * better than dot products.  A degree of 3 is a good general starting
 * point, as performance sometimes degrades at higher degrees.  In
 * some cases, the Gaussian radial basis kernel implemented in
 * {@link com.aliasi.matrix.GaussianRadialBasisKernel} works well.
 *
 * <p>If the kernel function is neither serializable nor compilable,
 * then the resulting perceptron classifier will not be serializable.
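 *
 * <p>For instance, the kernels mentioned above might be constructed
 * along these lines (the radius argument to the Gaussian kernel is
 * illustrative only):
 *
 * <blockquote><pre>
 * KernelFunction dot   = new DotProductKernel();
 * KernelFunction poly3 = new PolynomialKernel(3);            // degree 3
 * KernelFunction rbf   = new GaussianRadialBasisKernel(1.0); // illustrative radius
 * </pre></blockquote>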

 * <h3>Training Iterations</h3>
 *
 * <p>More training iterations are usually better for accuracy.  As
 * more basis vectors are added to the perceptron, more memory is
 * needed for the model and more time is needed for classification.
 * Typically, the amount of memory required at run time will stabilize
 * after a few training iterations.

 * <h3>Memory Usage: Training and Compiled</h3>
 *
 * <p>The memory usage of a perceptron classifier may be very high.
 * The perceptron must store every feature vector in the input on
 * which the classifier being trained made a mistake in some
 * iteration.  During training, every feature vector in the input
 * corpus must be stored.  These are stored in memory as instances of
 * {@link com.aliasi.matrix.SparseFloatVector}.
 *
 * <p>If the data are linearly separable in the kernel space, the
 * training process will converge to the point where no additional
 * basis feature vectors are needed, and the result will converge to
 * using the single final perceptron (which still requires all the
 * intermediate basis vectors to be stored, given the demands of a
 * non-linear kernel calculation).

 * <h3>Serialization</h3>
 *
 * <p>After a perceptron classifier is constructed, it may be
 * serialized to an object output stream.  If the underlying feature
 * extractor is compilable, it is compiled; if it is not compilable,
 * it is serialized.  To be serializable, a perceptron classifier
 * requires both its feature extractor and kernel function to be
 * {@link Serializable}.
 *
 * <p>The object read back in after serialization will be an instance
 * of <code>PerceptronClassifier</code>.
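 *
 * <p>A minimal sketch of a serialization round trip using standard
 * Java object streams (the file name is invented for the example, and
 * the feature extractor and kernel are assumed to be serializable or
 * compilable):
 *
 * <blockquote><pre>
 * ObjectOutputStream objOut
 *     = new ObjectOutputStream(new FileOutputStream("model.bin"));
 * objOut.writeObject(classifier);
 * objOut.close();
 *
 * ObjectInputStream objIn
 *     = new ObjectInputStream(new FileInputStream("model.bin"));
 * PerceptronClassifier<CharSequence> deserialized
 *     = (PerceptronClassifier<CharSequence>) objIn.readObject(); // unchecked cast
 * objIn.close();
 * </pre></blockquote>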

 * <h3>About Averaged Kernel Perceptrons</h3>
 *
 * <p>Perceptrons are a kind of large-margin linear classifier.  The
 * polynomial kernel trick is used to embed the basic feature vector
 * in a higher-degree vector space in which the data are more
 * separable.
 *
 * <p>An average of all of the perceptrons created during training is
 * used for the final prediction.  The factored formulation of the
 * algorithm allows the perceptrons to be expressed as linearly
 * weighted training samples.
 *
 * <p>Although theoretical bounds are almost equivalent, in practice
 * averaged perceptrons slightly underperform support vector machine
 * (SVM) learners over the same polynomial kernels.  The advantage of
 * perceptrons is that they are much more efficient in time and
 * slightly more efficient in space in practice.
 * <h3>Averaged Perceptron Model with Polynomial Kernel</h3>
 *
 * <p>The model used for runtime predictions by the averaged
 * perceptron is quite straightforward, consisting of a set of
 * weighted feature vectors (represented as parallel arrays of basis
 * vectors and weights) and a kernel degree:
 *
 * <blockquote><pre>
 * Vector[] basisVectors;
 * int[] weights;
 * int degree;
 * </pre></blockquote>
 *
 * The basis vectors are all vectors derived from single training
 * examples by the specified feature extractor.  The weights may be
 * positive or negative and represent the cumulative voted weight of
 * the specified basis vector.

 * <p>The kernel function computes a distance
 * <code>kernel(v1,v2)</code> between vectors <code>v1</code> and
 * <code>v2</code> in an enhanced feature space defined by the
 * particular kernel employed.
 *
 * <p>A new input to classify is first converted to a feature vector
 * by the feature extractor.  Classification is then based on the sign
 * of the following score:
 *
 * <blockquote><pre>
 * score(Vector v) = Σ_i weights[i] * kernel(basisVectors[i],v)
 * </pre></blockquote>
 *
 * An example is accepted if the score of its feature vector is
 * greater than zero and rejected otherwise.
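 *
 * <p>In plain Java, the score above amounts to the following loop (a
 * sketch using the runtime model's parallel arrays; <code>kernel</code>
 * stands for the configured kernel function and <code>v</code> for the
 * input feature vector):
 *
 * <blockquote><pre>
 * double score = 0.0;
 * for (int i = 0; i < basisVectors.length; ++i)
 *     score += weights[i] * kernel.proximity(basisVectors[i], v);
 * boolean accept = score > 0.0;  // accept iff score strictly positive
 * </pre></blockquote>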

 * <h3>Estimating the Perceptron Model</h3>
 *
 * <p>To estimate the perceptron model, we will assume that we have a
 * training corpus consisting of an array of vectors with boolean
 * polarities indicating whether they are positive (to accept) or
 * negative (to reject) examples.  We also assume we have a fixed
 * kernel function.  The training method iterates over the corpus a
 * specified number of times:
 *
 * <blockquote><pre>
 * Vector[] basisVectors;
 * int[] incrementalWeights;
 * boolean[] polarities;
 * int degree;
 * int index = -1;
 * for (# iterations)
 *     for (vector,polarity) in training corpus
 *         yHat = scoreIntermediate(vector);
 *         if (yHat > 0 && polarity) || (yHat < 0 && !polarity)
 *             ++incrementalWeights[index];
 *         else
 *             ++index;
 *             basisVectors[index] = vector;
 *             polarities[index] = polarity;
 *             incrementalWeights[index] = 1;
 * </pre></blockquote>
 *
 * where the intermediate score sums kernel values against the basis
 * vectors found so far:
 *
 * <blockquote><pre>
 * scoreIntermediate(vector)
 *   = Σ_{i <= index} polarities[i] * kernel(basisVectors[i],vector)
 * </pre></blockquote>
 *
 * The final weight for a vector is the cumulative weight computed as
 * follows:
 *
 * <blockquote><pre>
 * cumulativeWeight(j) = Σ_{k >= j} incrementalWeights[k]
 * </pre></blockquote>
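 *
 * <p>As a concrete sketch (array names follow the pseudocode above),
 * the cumulative weights may be computed in a single backward pass of
 * suffix sums:
 *
 * <blockquote><pre>
 * int[] cumulativeWeights = new int[index + 1];
 * int weightSum = 0;
 * for (int j = index; j >= 0; --j) {
 *     weightSum += incrementalWeights[j];  // running sum over k >= j
 *     cumulativeWeights[j] = weightSum;
 * }
 * </pre></blockquote>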
 *
 * <p>The actual implementations of these methods involve considerably
 * more indirection and index chasing to avoid copies and duplication
 * in the final vectors.

 * <h3>Historical Notes</h3>
 *
 * <p>The averaged kernel perceptron implemented here was introduced
 * in the following paper, which also provides error bounds for
 * learning and evaluations with polynomial kernels of various
 * degrees:
 *
 * <blockquote>
 * Freund, Yoav and Robert E. Schapire. 1999.
 * Large margin classification using the perceptron algorithm.
 * <i>Machine Learning</i> 37(3):277-296.
 * </blockquote>
 *
 * <p>The basic perceptron model was introduced in:
 *
 * <blockquote>
 * Block, H.D. 1962.
 * The perceptron: a model for brain functioning.
 * <i>Reviews of Modern Physics</i> 34:123-135.
 * </blockquote>
 *
 * <p>The kernel-based perceptron was introduced in:
 *
 * <blockquote>
 * Aizerman, M.A., E.M. Braverman, and L.I. Rozonoer. 1964.
 * Theoretical foundations of the potential function method in pattern
 * recognition learning.  <i>Automation and Remote Control</i>
 * 25:821-837.
 * </blockquote>
 *
 * <p>The basis of the voting scheme is a deterministically averaged
 * version of the randomized approach of adapting online learners to a
 * batch setup described in the following paper:
 *
 * <blockquote>
 * Helmbold, D.P. and M.K. Warmuth. 1995.
 * On weak learning.  <i>Journal of Computer and System Sciences</i>
 * 50:551-573.
 * </blockquote>
 *
 * @author  Bob Carpenter
 * @version 4.0.0
 * @since   LingPipe3.1
 * @param <E> the type of object being classified
 */
public class PerceptronClassifier<E>
    implements ScoredClassifier<E>, Serializable {

    static final long serialVersionUID = 8752291174601085455L;

    final FeatureExtractor<? super E> mFeatureExtractor;
    final MapSymbolTable mSymbolTable;
    final KernelFunction mKernelFunction;
    final Vector[] mBasisVectors;
    final int[] mBasisWeights;
    final String mAcceptCategory;
    final String mRejectCategory;

    PerceptronClassifier(FeatureExtractor<? super E> featureExtractor,
                         KernelFunction kernelFunction,
                         MapSymbolTable symbolTable,
                         Vector[] basisVectors,
                         int[] basisWeights,
                         String acceptCategory,
                         String rejectCategory) {
        mFeatureExtractor = featureExtractor;
        mKernelFunction = kernelFunction;
        mBasisVectors = basisVectors;
        mBasisWeights = basisWeights;
        mAcceptCategory = acceptCategory;
        mRejectCategory = rejectCategory;
        mSymbolTable = symbolTable;
    }

    /**
     * Construct a perceptron classifier from the specified corpus
     * with designated accept category, feature extractor, kernel
     * function, number of training iterations, and output accept
     * and reject categories.
     *
     * @param corpus Corpus to use for training.
     * @param featureExtractor Feature extractor for objects.
     * @param kernelFunction Kernel function for expanding the vector basis.
     * @param corpusAcceptCategory Category in the training data to treat as positive.
     * @param numIterations Number of iterations to carry out during training.
     * @param outputAcceptCategory Category with which to label accepted instances.
     * @param outputRejectCategory Category with which to label rejected instances.
     */
    public PerceptronClassifier(Corpus<ObjectHandler<Classified<E>>> corpus,
                                FeatureExtractor<? super E> featureExtractor,
                                KernelFunction kernelFunction,
                                String corpusAcceptCategory,
                                int numIterations,
                                String outputAcceptCategory,
                                String outputRejectCategory)
        throws IOException {

        mFeatureExtractor = featureExtractor;
        mKernelFunction = kernelFunction;
        mAcceptCategory = outputAcceptCategory;
        mRejectCategory = outputRejectCategory;
        mSymbolTable = new MapSymbolTable();

        // collect training vectors and categories
        CorpusCollector collector = new CorpusCollector();
        corpus.visitCorpus(collector);
        Vector[] featureVectors = collector.featureVectors();
        boolean[] polarities = collector.polarities();
        corpus = null; // don't need it any more

        // initialize perceptrons; index -1 means no initial zero perceptron
        int currentPerceptronIndex = -1;
        int[] weights = new int[INITIAL_BASIS_SIZE];
        int[] basisIndexes = new int[INITIAL_BASIS_SIZE];
        for (int iteration = 0; iteration < numIterations; ++iteration) {
            for (int i = 0; i < featureVectors.length; ++i) {
                double yHat = prediction(featureVectors[i],
                                         featureVectors,
                                         polarities,
                                         weights,
                                         basisIndexes,
                                         currentPerceptronIndex);
                boolean accept = yHat > 0.0;
                if (accept == polarities[i]) {
                    // correct prediction: extend current perceptron's vote
                    if (currentPerceptronIndex >= 0) // avoid incrementing zero
                        ++weights[currentPerceptronIndex];
                } else {
                    // mistake: start a new perceptron based on this example
                    ++currentPerceptronIndex;
                    if (currentPerceptronIndex >= weights.length) {
                        weights = Arrays.reallocate(weights);
                        basisIndexes = Arrays.reallocate(basisIndexes);
                    }
                    basisIndexes[currentPerceptronIndex] = i;
                    weights[currentPerceptronIndex] = 1;
                }
            }
        }

        // renumber indexes to pack only necessary basis vectors
        Map<Integer,Integer> renumbering = new HashMap<Integer,Integer>();
        int next = 0;
        for (int i = 0; i <= currentPerceptronIndex; ++i)
            if (!renumbering.containsKey(basisIndexes[i]))
                renumbering.put(basisIndexes[i], next++);

        // compute basis vectors and cumulative weight for the average
        mBasisVectors = new Vector[renumbering.size()];
        mBasisWeights = new int[renumbering.size()];
        int weightSum = 0;
        for (int i = currentPerceptronIndex + 1; --i >= 0; ) {
            int oldIndex = basisIndexes[i];
            int newIndex = renumbering.get(oldIndex);
            mBasisVectors[newIndex] = featureVectors[oldIndex];
            weightSum += weights[i];
            if (polarities[i])
                mBasisWeights[newIndex] += weightSum;
            else
                mBasisWeights[newIndex] -= weightSum;
        }
    }

    /**
     * Returns the kernel function for this perceptron.
     *
     * @return The kernel function for this perceptron.
     */
    public KernelFunction kernelFunction() {
        return mKernelFunction;
    }

    /**
     * Returns the feature extractor for this perceptron.
     *
     * @return The feature extractor for this perceptron.
     */
    public FeatureExtractor<? super E> featureExtractor() {
        return mFeatureExtractor;
    }

    /**
     * Returns a string-based representation of this perceptron.  This
     * may be long, as it outputs every basis vector and weight.
     *
     * @return A string-based representation of this perceptron.
     */
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append("Averaged Perceptron");
        sb.append(" Kernel Function=" + mKernelFunction + "\n");
        for (int i = 0; i < mBasisVectors.length; ++i)
            sb.append("  idx=" + i
                      + " vec=" + mBasisVectors[i]
                      + " wgt=" + mBasisWeights[i] + "\n");
        return sb.toString();
    }

    /**
     * Return the scored classification for the specified input.  The
     * input is first converted to a feature vector using the feature
     * extractor, then scored against the perceptron.  The resulting
     * score for the accept category is the perceptron score, and the
     * resulting score for the reject category is the negative
     * perceptron score.
     *
     * @param in The element to be classified.
     * @return The scored classification for the specified element.
     */
    public ScoredClassification classify(E in) {
        Map<String,? extends Number> featureMap = mFeatureExtractor.features(in);
        Vector inputVector
            = Features.toVector(featureMap, mSymbolTable, Integer.MAX_VALUE, false);
        double sum = 0.0;
        for (int i = mBasisVectors.length; --i >= 0; )
            sum += mBasisWeights[i]
                * mKernelFunction.proximity(mBasisVectors[i], inputVector);
        return sum > 0
            ? new ScoredClassification(new String[] { mAcceptCategory, mRejectCategory },
                                       new double[] { sum, -sum })
            : new ScoredClassification(new String[] { mRejectCategory, mAcceptCategory },
                                       new double[] { -sum, sum });
    }

    // intermediate prediction used during training; the weights
    // argument is intentionally unused because intermediate scores
    // vote each basis vector with unit weight
    double prediction(Vector inputVector,
                      Vector[] featureVectors,
                      boolean[] polarities,
                      int[] ignoreMyWeights,
                      int[] basisIndexes,
                      int currentPerceptronIndex) {
        double sum = 0.0;
        int weightSum = 1;
        for (int i = currentPerceptronIndex; i >= 0; --i) {
            int index = basisIndexes[i];
            double kernel
                = mKernelFunction.proximity(inputVector, featureVectors[index]);
            sum += (polarities[i] ? weightSum : -weightSum) * kernel;
        }
        return sum;
    }

    static double power(double base, int exponent) {
        switch (exponent) {
        case 0: return 1.0;
        case 1: return base;
        case 2: return base * base;
        case 3: return base * base * base;
        case 4: return base * base * base * base;
        default: return java.lang.Math.pow(base, exponent);
        }
    }

    private Object writeReplace() {
        return new Externalizer<E>(this);
    }

    class CorpusCollector implements ObjectHandler<Classified<E>> {

        final List<Vector> mInputFeatureVectorList = new ArrayList<Vector>();
        final List<Boolean> mInputAcceptList = new ArrayList<Boolean>();

        public void handle(Classified<E> classified) {
            E object = classified.getObject();
            Classification c = classified.getClassification();
            Map<String,? extends Number> featureMap
                = mFeatureExtractor.features(object);
            mInputFeatureVectorList
                .add(Features.toVectorAddSymbols(featureMap, mSymbolTable,
                                                 Integer.MAX_VALUE, false));
            mInputAcceptList.add(mAcceptCategory.equals(c.bestCategory())
                                 ? Boolean.TRUE
                                 : Boolean.FALSE);
        }

        Vector[] featureVectors() {
            return mInputFeatureVectorList.toArray(EMPTY_SPARSE_FLOAT_VECTOR_ARRAY);
        }

        boolean[] polarities() {
            boolean[] categories = new boolean[mInputAcceptList.size()];
            for (int i = 0; i < categories.length; ++i)
                categories[i] = mInputAcceptList.get(i).booleanValue();
            return categories;
        }
    }

    static final Vector[] EMPTY_SPARSE_FLOAT_VECTOR_ARRAY = new Vector[0];

    static class Externalizer<F> extends AbstractExternalizable {
        static final long serialVersionUID = -1901362811305741506L;
        final PerceptronClassifier<F> mClassifier;
        public Externalizer() {
            this(null);
        }
        public Externalizer(PerceptronClassifier<F> classifier) {
            mClassifier = classifier;
        }
        @Override
        @SuppressWarnings("deprecation")
        public Object read(ObjectInput in)
            throws ClassNotFoundException, IOException {

            // required for read object
            @SuppressWarnings("unchecked")
            FeatureExtractor<? super F> featureExtractor
                = (FeatureExtractor<? super F>) in.readObject();
            KernelFunction kernelFunction = (KernelFunction) in.readObject();
            MapSymbolTable symbolTable = (MapSymbolTable) in.readObject();
            int basisLen = in.readInt();
            Vector[] basisVectors = new Vector[basisLen];
            for (int i = 0; i < basisLen; ++i)
                basisVectors[i] = (Vector) in.readObject();
            int[] basisWeights = new int[basisLen];
            for (int i = 0; i < basisLen; ++i)
                basisWeights[i] = in.readInt();
            String acceptCategory = in.readUTF();
            String rejectCategory = in.readUTF();
            return new PerceptronClassifier<F>(featureExtractor,
                                               kernelFunction,
                                               symbolTable,
                                               basisVectors,
                                               basisWeights,
                                               acceptCategory,
                                               rejectCategory);
        }
        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            AbstractExternalizable.compileOrSerialize(mClassifier.mFeatureExtractor, out);
            AbstractExternalizable.compileOrSerialize(mClassifier.mKernelFunction, out);
            // symbol table
            out.writeObject(mClassifier.mSymbolTable);
            // basis length
            out.writeInt(mClassifier.mBasisVectors.length);
            // basis vectors
            for (int i = 0; i < mClassifier.mBasisVectors.length; ++i)
                out.writeObject(mClassifier.mBasisVectors[i]);
            // basis weights
            for (int i = 0; i < mClassifier.mBasisWeights.length; ++i)
                out.writeInt(mClassifier.mBasisWeights[i]);
            // accept, reject cats
            out.writeUTF(mClassifier.mAcceptCategory);
            out.writeUTF(mClassifier.mRejectCategory);
        }
    }

    static final int INITIAL_BASIS_SIZE = 32 * 1024; // 32K entries * 8B (two int arrays) = 256KB initially

}



