com.aliasi.classify.PrecisionRecallEvaluation
/*
 * LingPipe v. 4.1.0
 * Copyright (C) 2003-2011 Alias-i
 *
 * This program is licensed under the Alias-i Royalty Free License
 * Version 1 WITHOUT ANY WARRANTY, without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the Alias-i
 * Royalty Free License Version 1 for more details.
 *
 * You should have received a copy of the Alias-i Royalty Free License
 * Version 1 along with this program; if not, visit
 * http://alias-i.com/lingpipe/licenses/lingpipe-license-1.txt or contact
 * Alias-i, Inc. at 181 North 11th Street, Suite 401, Brooklyn, NY 11211,
 * +1 (718) 290-9170.
 */

package com.aliasi.classify;

import com.aliasi.stats.Statistics;

/**
 * A <code>PrecisionRecallEvaluation</code> collects and reports a
 * suite of descriptive statistics for binary classification tasks.
 * The basis of a precision-recall evaluation is a matrix of counts
 * of reference and response classifications.  Each cell in the
 * matrix corresponds to a method returning a long integer count:
 *
 * <blockquote><pre>
 *                            Response
 *                       true        false      Reference Totals
 * Reference   true       TP          FN          TP+FN
 *             false      FP          TN          FP+TN
 * Response Totals       TP+FP       FN+TN       TP+FN+FP+TN
 * </pre></blockquote>
 *
 * where TP = {@link #truePositive()}, FN = {@link #falseNegative()},
 * FP = {@link #falsePositive()}, TN = {@link #trueNegative()},
 * TP+FN = {@link #positiveReference()},
 * FP+TN = {@link #negativeReference()},
 * TP+FP = {@link #positiveResponse()},
 * FN+TN = {@link #negativeResponse()},
 * and TP+FN+FP+TN = {@link #total()}.
 *
 * <p>The most basic statistic is accuracy, which is the number of
 * correct responses divided by the total number of cases:
 *
 * <blockquote><pre>
 * accuracy() = correctResponse() / total()
 * </pre></blockquote>
 *
 * <p>This class derives its name from the following four statistics,
 * which are illustrated in the four tables below:
 *
 * <blockquote><pre>
 * recall()             = truePositive() / positiveReference()
 * precision()          = truePositive() / positiveResponse()
 * rejectionRecall()    = trueNegative() / negativeReference()
 * rejectionPrecision() = trueNegative() / negativeResponse()
 * </pre></blockquote>
 *
 * <p>Each measure is defined to be the count marked plus (+) divided
 * by the sum of the counts marked plus (+) and minus (-) in the
 * corresponding table:
 *
 * <blockquote><pre>
 * Recall
 *                      Response
 *                      true   false
 * Reference   true      +      -
 *             false
 *
 * Precision
 *                      Response
 *                      true   false
 * Reference   true      +
 *             false     -
 *
 * Rejection Recall
 *                      Response
 *                      true   false
 * Reference   true
 *             false     -      +
 *
 * Rejection Precision
 *                      Response
 *                      true   false
 * Reference   true             -
 *             false            +
 * </pre></blockquote>
 *
 * <p>These tables illustrate the relevant dualities.  Precision is
 * the dual of recall if the reference and response are switched (the
 * matrix is transposed).  Similarly, rejection recall is dual to
 * recall with the true and false labels switched (reflection around
 * each axis in turn); rejection precision is similarly dual to
 * precision.
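 *
 * <p>For example, counts may be accumulated case by case with {@link
 * #addCase(boolean,boolean)} or supplied directly to the
 * constructor.  The following sketch uses arbitrary counts chosen
 * only for illustration:
 *
 * <blockquote><pre>{@code
 * // accumulate reference/response pairs one case at a time
 * PrecisionRecallEvaluation eval = new PrecisionRecallEvaluation();
 * eval.addCase(true,  true);   // true positive
 * eval.addCase(true,  false);  // false negative
 * eval.addCase(false, true);   // false positive
 * eval.addCase(false, false);  // true negative
 *
 * // equivalently, supply the counts directly as (tp, fn, fp, tn)
 * PrecisionRecallEvaluation eval2 = new PrecisionRecallEvaluation(1, 1, 1, 1);
 *
 * double p  = eval.precision();        // TP / (TP+FP) = 0.5
 * double r  = eval.recall();           // TP / (TP+FN) = 0.5
 * double rr = eval.rejectionRecall();  // TN / (FP+TN) = 0.5
 * double a  = eval.accuracy();         // (TP+TN) / total = 0.5
 * }</pre></blockquote>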
 *
 * <p>Precision and recall may be combined by weighted harmonic
 * averaging using the f-measure statistic, with β between 0 and
 * infinity being the relative weight of recall versus precision, and
 * 1 being the neutral value:
 *
 * <blockquote><pre>
 * fMeasure() = fMeasure(1)
 *
 * fMeasure(β) = (1 + β²) * precision() * recall()
 *             / (recall() + β² * precision())
 * </pre></blockquote>
 *
 * <p>There are four traditional measures of binary classification,
 * which are as follows:
 *
 * <blockquote><pre>
 * fowlkesMallows() = truePositive() / (precision() * recall())^(1/2)
 *
 * jaccardCoefficient() = truePositive() / (total() - trueNegative())
 *
 * yulesQ() = (truePositive() * trueNegative() - falsePositive() * falseNegative())
 *          / (truePositive() * trueNegative() + falsePositive() * falseNegative())
 *
 * yulesY() = ((truePositive() * trueNegative())^(1/2) - (falsePositive() * falseNegative())^(1/2))
 *          / ((truePositive() * trueNegative())^(1/2) + (falsePositive() * falseNegative())^(1/2))
 * </pre></blockquote>
 *
 * <p>Replacing precision and recall with their definitions,
 * TP/(TP+FP) and TP/(TP+FN), the F<sub>1</sub> measure reduces to:
 *
 * <blockquote><pre>
 * F1 = 2 * (TP/(TP+FP)) * (TP/(TP+FN))
 *        / (TP/(TP+FP) + TP/(TP+FN))
 *    = 2 * TP*TP / ((TP+FP)*(TP+FN))
 *        / ((TP*(TP+FN) + TP*(TP+FP)) / ((TP+FP)*(TP+FN)))
 *    = 2 * TP*TP / (TP*(TP+FN) + TP*(TP+FP))
 *    = 2 * TP / ((TP+FN) + (TP+FP))
 *    = 2*TP / (2*TP + FP + FN)
 * </pre></blockquote>
 *
 * <p>Thus the F<sub>1</sub> measure is very closely related to the
 * Jaccard coefficient, TP/(TP+FP+FN).  Like the Jaccard coefficient,
 * the F measure does not vary with changes in the true negative
 * count; rejection precision and rejection recall do vary with
 * changes in the true negative count.
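 *
 * <p>As a quick illustrative check of the identity above, with
 * arbitrary counts tp=8, fn=2, fp=4, tn=6, the value returned by
 * {@link #fMeasure()} agrees with 2*TP / (2*TP + FP + FN):
 *
 * <blockquote><pre>{@code
 * PrecisionRecallEvaluation eval = new PrecisionRecallEvaluation(8, 2, 4, 6);
 *
 * double f1     = eval.fMeasure();              // harmonic mean of precision and recall
 * double direct = 2.0 * 8 / (2.0 * 8 + 4 + 2);  // 16/22 = 0.7272...
 * // f1 and direct are equal up to floating-point rounding
 *
 * // weighting recall more heavily with beta = 2, using the static helper
 * double f2 = PrecisionRecallEvaluation.fMeasure(2.0, eval.recall(), eval.precision());
 * }</pre></blockquote>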
 *
 * <p>Basic reference and response likelihoods are computed by
 * frequency:
 *
 * <blockquote><pre>
 * referenceLikelihood() = positiveReference() / total()
 *
 * responseLikelihood() = positiveResponse() / total()
 * </pre></blockquote>
 *
 * <p>An algorithm that chose responses at random according to the
 * response likelihood would have the following accuracy against test
 * cases chosen at random according to the reference likelihood:
 *
 * <blockquote><pre>
 * randomAccuracy() = referenceLikelihood() * responseLikelihood()
 *                  + (1 - referenceLikelihood()) * (1 - responseLikelihood())
 * </pre></blockquote>
 *
 * <p>The two summands arise from the likelihood of a true positive
 * and the likelihood of a true negative.  From random accuracy, the
 * κ-statistic is defined by dividing out the random accuracy from
 * the accuracy, giving a measure of performance above a baseline
 * expectation:
 *
 * <blockquote><pre>
 * kappa() = kappa(accuracy(), randomAccuracy())
 *
 * kappa(p,e) = (p - e) / (1 - e)
 * </pre></blockquote>
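 *
 * <p>For instance, for the Cabernet-vs-All example shown below
 * (TP=9, FN=3, FP=4, TN=11), κ may be recomputed by hand from the
 * accuracy and random accuracy (an illustrative sketch, not
 * additional API):
 *
 * <blockquote><pre>{@code
 * PrecisionRecallEvaluation cab = new PrecisionRecallEvaluation(9, 3, 4, 11);
 *
 * double acc  = cab.accuracy();        // 20/27 = 0.7407...
 * double rand = cab.randomAccuracy();  // about 0.502
 *
 * // kappa(p,e) = (p - e) / (1 - e)
 * double kappaByHand = (acc - rand) / (1.0 - rand);  // about 0.479
 * double kappa = cab.kappa();                        // same value
 * }</pre></blockquote>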
 *
 * <p>There are two alternative forms of the κ-statistic, both of
 * which attempt to correct for putative bias in the estimation of
 * random accuracy.  The first computes the random accuracy by taking
 * the average of the reference and response likelihoods to be the
 * baseline reference and response likelihood, yielding the so-called
 * unbiased random accuracy and the unbiased κ-statistic:
 *
 * <blockquote><pre>
 * kappaUnbiased() = kappa(accuracy(), randomAccuracyUnbiased())
 *
 * randomAccuracyUnbiased() = avgLikelihood()² + (1 - avgLikelihood())²
 *
 * avgLikelihood() = (referenceLikelihood() + responseLikelihood()) / 2
 * </pre></blockquote>
 *
 * <p>Kappa can also be adjusted for the prevalence of positive
 * reference cases, which leads to the following simple definition:
 *
 * <blockquote><pre>
 * kappaNoPrevalence() = 2 * accuracy() - 1
 * </pre></blockquote>
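 *
 * <p>For the Cabernet-vs-All example below, with accuracy 0.7407,
 * this works out as follows (a worked instance of the formula, not
 * additional API):
 *
 * <blockquote><pre>
 * kappaNoPrevalence() = 2 * 0.7407 - 1 = 0.4814
 * </pre></blockquote>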
 *
 * <p>Pearson's χ² statistic is provided by the following method,
 * defined in terms of the φ² statistic:
 *
 * <blockquote><pre>
 * chiSquared() = total() * phiSquared()
 *
 * phiSquared() = (truePositive()*trueNegative() - falsePositive()*falseNegative())²
 *              / ( (truePositive()+falseNegative()) * (falsePositive()+trueNegative())
 *                * (truePositive()+falsePositive()) * (falseNegative()+trueNegative()) )
 * </pre></blockquote>
 *
 * <p>The accuracy deviation is the standard deviation of the sample
 * accuracy under a binomial distribution with success probability
 * equal to the classification accuracy and number of trials equal to
 * the total number of cases:
 *
 * <blockquote><pre>
 * accuracyDeviation() = (accuracy() * (1 - accuracy()) / total())^(1/2)
 * </pre></blockquote>
 *
 * <p>This number can be used to provide error intervals around the
 * accuracy results.
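 *
 * <p>For example, a conventional normal-approximation interval can
 * be formed from the accuracy and its deviation; the 1.96 multiplier
 * for a roughly 95% interval is a standard choice, not something
 * computed by this class.  Using the Syrah-vs-All counts from the
 * examples below:
 *
 * <blockquote><pre>{@code
 * PrecisionRecallEvaluation syrah = new PrecisionRecallEvaluation(5, 4, 4, 14);
 *
 * double acc = syrah.accuracy();           // 19/27 = 0.7037...
 * double dev = syrah.accuracyDeviation();  // about 0.0879
 *
 * // approximate 95% interval for accuracy: acc +/- 1.96 * dev
 * double low  = acc - 1.96 * dev;          // about 0.53
 * double high = acc + 1.96 * dev;          // about 0.88
 * }</pre></blockquote>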
 *
 * <p>Using the following three tables as examples:
 *
 * <blockquote><pre>
 * Cab-vs-All                 Response
 *                        Cab      Other
 * Reference   Cab         9         3
 *             Other       4        11
 *
 * Syrah-vs-All               Response
 *                        Syrah    Other
 * Reference   Syrah       5         4
 *             Other       4        14
 *
 * Pinot-vs-All               Response
 *                        Pinot    Other
 * Reference   Pinot       4         2
 *             Other       1        20
 * </pre></blockquote>
 *
 * <p>the various statistics evaluate to the following values:
 *
 * <blockquote><pre>
 * Method                      Cabernet     Syrah     Pinot
 * positiveReference()               12         9         6
 * negativeReference()               15        18        21
 * positiveResponse()                13         9         5
 * negativeResponse()                14        18        22
 * correctResponse()                 20        19        24
 * total()                           27        27        27
 * accuracy()                    0.7407    0.7037    0.8889
 * recall()                      0.7500    0.5555    0.6666
 * precision()                   0.6923    0.5555    0.8000
 * rejectionRecall()             0.7333    0.7778    0.9524
 * rejectionPrecision()          0.7858    0.7778    0.9091
 * fMeasure()                    0.7200    0.5555    0.7272
 * fowlkesMallows()               12.49      9.00      5.48
 * jaccardCoefficient()          0.5625    0.3846    0.5714
 * yulesQ()                      0.7838    0.6279    0.9512
 * yulesY()                      0.4835    0.3531    0.7269
 * referenceLikelihood()         0.4444    0.3333    0.2222
 * responseLikelihood()          0.4815    0.3333    0.1852
 * randomAccuracy()              0.5021    0.5556    0.6749
 * kappa()                       0.4792    0.3333    0.6583
 * randomAccuracyUnbiased()      0.5027    0.5556    0.6756
 * kappaUnbiased()               0.4789    0.3333    0.6575
 * kappaNoPrevalence()           0.4814    0.4074    0.7778
 * chiSquared()                  6.2382    3.0000   11.8519
 * phiSquared()                  0.2310    0.1111    0.4390
 * accuracyDeviation()           0.0843    0.0879    0.0605
 * </pre></blockquote>
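 *
 * <p>For instance, the Pinot column above may be reproduced as
 * follows (an illustrative sketch; the counts are supplied in the
 * constructor's (tp, fn, fp, tn) order):
 *
 * <blockquote><pre>{@code
 * PrecisionRecallEvaluation pinot = new PrecisionRecallEvaluation(4, 2, 1, 20);
 *
 * pinot.precision();          // 0.8000
 * pinot.recall();             // 0.6666...
 * pinot.fMeasure();           // 0.7272...
 * pinot.accuracy();           // 0.8889...
 * System.out.println(pinot);  // prints the full suite of statistics
 * }</pre></blockquote>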
 *
 * @author  Bob Carpenter
 * @version 2.1
 * @since   LingPipe2.1
 */
public class PrecisionRecallEvaluation {

    private long mTP;
    private long mFP;
    private long mTN;
    private long mFN;

    /**
     * Construct a precision-recall evaluation with all counts set to
     * zero.
     */
    public PrecisionRecallEvaluation() {
        this(0,0,0,0);
    }

    /**
     * Construct a precision-recall evaluation initialized with the
     * specified counts.
     *
     * @param tp True positive count.
     * @param fn False negative count.
     * @param fp False positive count.
     * @param tn True negative count.
     * @throws IllegalArgumentException If any of the counts are
     * negative.
     */
    public PrecisionRecallEvaluation(long tp, long fn, long fp, long tn) {
        validateCount("tp",tp);
        validateCount("fp",fp);
        validateCount("tn",tn);
        validateCount("fn",fn);
        mTP = tp;
        mFP = fp;
        mTN = tn;
        mFN = fn;
    }

    /**
     * Adds a case with the specified reference and response
     * classifications.
     *
     * @param reference Reference classification.
     * @param response Response classification.
     */
    public void addCase(boolean reference, boolean response) {
        if (reference && response) ++mTP;
        else if (reference && (!response)) ++mFN;
        else if ((!reference) && response) ++mFP;
        else ++mTN;
    }

    void addCase(boolean reference, boolean response, int count) {
        if (reference && response) mTP += count;
        else if (reference && (!response)) mFN += count;
        else if ((!reference) && response) mFP += count;
        else mTN += count;
    }

    /**
     * Returns the number of true positive cases.  A true positive
     * is where both the reference and response are true.
     *
     * @return The number of true positives.
     */
    public long truePositive() { return mTP; }

    /**
     * Returns the number of false positive cases.  A false positive
     * is where the reference is false and response is true.
     *
     * @return The number of false positives.
     */
    public long falsePositive() { return mFP; }

    /**
     * Returns the number of true negative cases.  A true negative
     * is where both the reference and response are false.
     *
     * @return The number of true negatives.
     */
    public long trueNegative() { return mTN; }

    /**
     * Returns the number of false negative cases.  A false negative
     * is where the reference is true and response is false.
     *
     * @return The number of false negatives.
     */
    public long falseNegative() { return mFN; }

    /**
     * Returns the number of positive reference cases.  A positive
     * reference case is one where the reference is true.
     *
     * @return The number of positive references.
     */
    public long positiveReference() { return truePositive() + falseNegative(); }

    /**
     * Returns the number of negative reference cases.  A negative
     * reference case is one where the reference is false.
     *
     * @return The number of negative references.
     */
    public long negativeReference() { return trueNegative() + falsePositive(); }

    /**
     * Returns the sample reference likelihood, or prevalence, which
     * is the number of positive references divided by the total
     * number of cases.
     *
     * @return The sample reference likelihood.
     */
    public double referenceLikelihood() { return div(positiveReference(), total()); }

    /**
     * Returns the number of positive response cases.  A positive
     * response case is one where the response is true.
     *
     * @return The number of positive responses.
     */
    public long positiveResponse() { return truePositive() + falsePositive(); }

    /**
     * Returns the number of negative response cases.  A negative
     * response case is one where the response is false.
     *
     * @return The number of negative responses.
*/ public long negativeResponse() { return trueNegative() + falseNegative(); } /** * Returns the sample response likelihood, which is the number of * positive responses divided by the total number of cases. * * @return The sample response likelihood. */ public double responseLikelihood() { return div(positiveResponse(), total()); } /** * Returns the number of cases where the response is correct. A * correct response is one where the reference and response are * the same. * * @return The number of correct responses. */ public long correctResponse() { return truePositive() + trueNegative(); } /** * Returns the number of cases where the response is incorrect. * An incorrect response is one where the reference and response * are different. * * @return The number of incorrect responses. */ public long incorrectResponse() { return falsePositive() + falseNegative(); } /** * Returns the total number of cases. * * @return The total number of cases. */ public long total() { return mTP + mFP + mTN + mFN; } /** * Returns the sample accuracy of the responses. The accuracy is * just the number of correct responses divided by the total number * of respones. * * @return The sample accuracy. */ public double accuracy() { return div(correctResponse(), total()); } /** * Returns the recall. The recall is the number of true positives * divided by the number of positive references. This is the * fraction of positive reference cases that were found by the * classifier. * * @return The recall value. */ public double recall() { return div(truePositive(), positiveReference()); } /** * Returns the precision. The precision is the number of true * positives divided by the number of positive respones. This is * the fraction of positive responses returned by the classifier * that were correct. * * @return The precision value. */ public double precision() { return div(truePositive(), positiveResponse()); } /** * Returns the rejection recall, or specificity, value. * The rejection recall is the percentage of negative references * that had negative respones. * * @return The rejection recall value. */ public double rejectionRecall() { return div(trueNegative(), negativeReference()); } /** * Returns the rejection prection, or selectivity, value. * The rejection precision is the percentage of negative responses * that were negative references. * * @return The rejection precision value. */ public double rejectionPrecision() { return div(trueNegative(), negativeResponse()); } /** * Returns the F1 measure. This is the * result of applying the method {@link #fMeasure(double)} to **
     * the value 1.
     *
     * @return The F<sub>1</sub> measure.
     */
    public double fMeasure() {
        return fMeasure(1.0);
    }

    /**
     * Returns the F<sub>β</sub> value for the specified β.
     *
     * @param beta The β parameter.
     * @return The F<sub>β</sub>
value. */ public double fMeasure(double beta) { return fMeasure(beta,recall(),precision()); } /** * Returns the Jaccard coefficient. * * @return The Jaccard coefficient. */ public double jaccardCoefficient() { return div(truePositive(), truePositive() + falseNegative() + falsePositive()); } /** * Returns the χ2 value. * * @return The χ2 value. */ public double chiSquared() { double tp = truePositive(); double tn = trueNegative(); double fp = falsePositive(); double fn = falseNegative(); double tot = total(); double diff = tp * tn - fp * fn; return tot * diff * diff / ((tp + fn) * (fp + tn) * (tp + fp) * (fn + tn)); } /** * Returns the φ2 value. * * @return The φ2 value. */ public double phiSquared() { return chiSquared() / (double) total(); } /** * Return the value of Yule's Q statistic. * * @return The value of Yule's Q statistic. */ public double yulesQ() { double tp = truePositive(); double tn = trueNegative(); double fp = falsePositive(); double fn = falseNegative(); return (tp*tn - fp*fn) / (tp*tn + fp*fn); } /** * Return the value of Yule's Y statistic. * * @return The value of Yule's Y statistic. */ public double yulesY() { double tp = truePositive(); double tn = trueNegative(); double fp = falsePositive(); double fn = falseNegative(); return (Math.sqrt(tp*tn) - Math.sqrt(fp*fn)) / (Math.sqrt(tp*tn) + Math.sqrt(fp*fn)); } /** * Return the Fowlkes-Mallows score. * * @return The Fowlkes-Mallows score. */ public double fowlkesMallows() { double tp = truePositive(); return tp / Math.sqrt(precision() * recall()); } /** * Returns the standard deviation of the accuracy. This is * computed as the deviation of an equivalent accuracy generated * by a binomial distribution, which is just a sequence of * Bernoulli (binary) trials. * * @return The standard deviation of the accuracy. */ public double accuracyDeviation() { // e.g. p = 0.05 for a 5% conf interval double p = accuracy(); double total = total(); double variance = p * (1.0 - p) / total; return Math.sqrt(variance); } /** * The probability that the reference and response are the same if * they are generated randomly according to the reference and * response likelihoods. * * @return The accuracy of a random classifier. */ public double randomAccuracy() { double ref = referenceLikelihood(); double resp = responseLikelihood(); return ref * resp + (1.0 - ref) * (1.0 - resp); } /** * The probability that the reference and the response are the same * if the reference and response likelihoods are both the average * of the sample reference and response likelihoods. * * @return The unbiased random accuracy. */ public double randomAccuracyUnbiased() { double avg = (referenceLikelihood() + responseLikelihood()) / 2.0; return avg * avg + (1.0 - avg) * (1.0 - avg); } /** * Returns the value of the kappa statistic. * * @return The value of the kappa statistic. */ public double kappa() { return Statistics.kappa(accuracy(),randomAccuracy()); } /** * Returns the value of the unbiased kappa statistic. * * @return The value of the unbiased kappa statistic. */ public double kappaUnbiased() { return Statistics.kappa(accuracy(),randomAccuracyUnbiased()); } /** * Returns the value of the kappa statistic adjusted for * prevalence. * * @return The value of the kappa statistic adjusted for * prevalence. */ public double kappaNoPrevalence() { return 2.0 * accuracy() - 1.0; } /** * Returns a string-based representation of this evaluation. * * @return A string-based representation of this evaluation. 
*/ @Override public String toString() { StringBuilder sb = new StringBuilder(2048); sb.append(" Total=" + total() + '\n'); sb.append(" True Positive=" + truePositive() + '\n'); sb.append(" False Negative=" + falseNegative() + '\n'); sb.append(" False Positive=" + falsePositive() + '\n'); sb.append(" True Negative=" + trueNegative() + '\n'); sb.append(" Positive Reference=" + positiveReference() + '\n'); sb.append(" Positive Response=" + positiveResponse() + '\n'); sb.append(" Negative Reference=" + negativeReference() + '\n'); sb.append(" Negative Response=" + negativeResponse() + '\n'); sb.append(" Accuracy=" + accuracy() + '\n'); sb.append(" Recall=" + recall() + '\n'); sb.append(" Precision=" + precision() + '\n'); sb.append(" Rejection Recall=" + rejectionRecall() + '\n'); sb.append(" Rejection Precision=" + rejectionPrecision() + '\n'); sb.append(" F(1)=" + fMeasure(1) + '\n'); sb.append(" Fowlkes-Mallows=" + fowlkesMallows() + '\n'); sb.append(" Jaccard Coefficient=" + jaccardCoefficient() + '\n'); sb.append(" Yule's Q=" + yulesQ() + '\n'); sb.append(" Yule's Y=" + yulesY() + '\n'); sb.append(" Reference Likelihood=" + referenceLikelihood() + '\n'); sb.append(" Response Likelihood=" + responseLikelihood() + '\n'); sb.append(" Random Accuracy=" + randomAccuracy() + '\n'); sb.append(" Random Accuracy Unbiased=" + randomAccuracyUnbiased() + '\n'); sb.append(" kappa=" + kappa() + '\n'); sb.append(" kappa Unbiased=" + kappaUnbiased() + '\n'); sb.append(" kappa No Prevalence=" + kappaNoPrevalence() + '\n'); sb.append(" chi Squared=" + chiSquared() + '\n'); sb.append(" phi Squared=" + phiSquared() + '\n'); sb.append(" Accuracy Deviation=" + accuracyDeviation()); return sb.toString(); } /** * Returns the Fβ measure for * a specified β, recall and precision values. * * @param beta Relative weighting of precision. * @param recall Recall value. * @param precision Precision value. * @return The Fβ measure. */ public static double fMeasure(double beta, double recall, double precision) { double betaSq = beta * beta; return (1.0 + betaSq) * recall * precision / (recall + (betaSq*precision)); } private static void validateCount(String countName, long val) { if (val < 0) { String msg = "Count must be non-negative." + " Found " + countName + "=" + val; throw new IllegalArgumentException(msg); } } static double div(double x, double y) { return x/y; } }