
com.aliasi.classify.PrecisionRecallEvaluation Maven / Gradle / Ivy


This is the original LingPipe source: http://alias-i.com/lingpipe/web/download.html. No changes were made to the source code.

There is a newer version: 4.1.2-JL1.0
/*
 * LingPipe v. 4.1.0
 * Copyright (C) 2003-2011 Alias-i
 *
 * This program is licensed under the Alias-i Royalty Free License
 * Version 1 WITHOUT ANY WARRANTY, without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the Alias-i
 * Royalty Free License Version 1 for more details.
 * 
 * You should have received a copy of the Alias-i Royalty Free License
 * Version 1 along with this program; if not, visit
 * http://alias-i.com/lingpipe/licenses/lingpipe-license-1.txt or contact
 * Alias-i, Inc. at 181 North 11th Street, Suite 401, Brooklyn, NY 11211,
 * +1 (718) 290-9170.
 */

package com.aliasi.classify;

import com.aliasi.stats.Statistics;

/**
 * A <code>PrecisionRecallEvaluation</code> collects and reports a
 * suite of descriptive statistics for binary classification tasks.
 * The basis of a precision-recall evaluation is a matrix of counts
 * of reference and response classifications.  Each cell in the matrix
 * corresponds to a method returning a long integer count.
 *
 * <pre>
 *                                  Response
 *                       true                  false                 Reference Totals
 *  Reference   true     truePositive() (TP)   falseNegative() (FN)  positiveReference() (TP+FN)
 *              false    falsePositive() (FP)  trueNegative() (TN)   negativeReference() (FP+TN)
 *  Response Totals      positiveResponse()    negativeResponse()    total()
 *                       (TP+FP)               (FN+TN)               (TP+FN+FP+TN)
 * </pre>
 *
 * <p>The most basic statistic is accuracy, which is the number of
 * correct responses divided by the total number of cases:
 *
 * <pre>
 * accuracy() = correctResponse() / total()
 * </pre>
 *
 * <p>This class derives its name from the following four statistics,
 * which are illustrated in the four tables below:
 *
 * <pre>
 * recall()             = truePositive() / positiveReference()
 *
 * precision()          = truePositive() / positiveResponse()
 *
 * rejectionRecall()    = trueNegative() / negativeReference()
 *
 * rejectionPrecision() = trueNegative() / negativeResponse()
 * </pre>
 *
 * <p>Each measure is defined to be the count in the cell marked
 * <code>+</code> divided by the sum of the counts in the cells marked
 * <code>+</code> and <code>-</code> in the corresponding table:
 *
 * <pre>
 *   Recall                                Precision
 *                     Response                              Response
 *                     True   False                          True   False
 *   Reference  True    +      -           Reference  True    +
 *              False                                  False   -
 *
 *   Rejection Recall                      Rejection Precision
 *                     Response                              Response
 *                     True   False                          True   False
 *   Reference  True                       Reference  True           -
 *              False   -      +                      False           +
 * </pre>
 *
 * <p>This picture clearly illustrates the relevant dualities.
 * Precision is the dual to recall if the reference and response are
 * switched (the matrix is transposed).  Similarly, rejection recall
 * is dual to recall with true and false labels switched (reflection
 * around each axis in turn); rejection precision is similarly dual
 * to precision.
 *
 * <p>Precision and recall may be combined by weighted harmonic
 * averaging using the f-measure statistic, with β between 0 and
 * infinity controlling the relative weight of recall versus
 * precision, and 1 being the neutral value:
 *
 * <pre>
 * fMeasure() = fMeasure(1)
 *
 * fMeasure(β) = (1 + β^2) * precision() * recall()
 *               / (recall() + β^2 * precision())
 * </pre>
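 *
 * <p>As a worked illustration (using the Cabernet-vs-All counts from the
 * example tables later in this class documentation: TP=9, FN=3, FP=4,
 * TN=11), precision is 9/13 ≈ 0.6923 and recall is 9/12 = 0.7500, so:
 *
 * <pre>
 * fMeasure() = 2 * 0.6923 * 0.7500 / (0.7500 + 0.6923) = 0.7200
 * </pre>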

 * <p>There are four traditional measures of binary classification,
 * which are as follows:
 *
 * <pre>
 * fowlkesMallows() = truePositive() / (precision() * recall())^(1/2)
 *
 * jaccardCoefficient() = truePositive() / (total() - trueNegative())
 *
 * yulesQ() = (truePositive() * trueNegative() - falsePositive() * falseNegative())
 *            / (truePositive() * trueNegative() + falsePositive() * falseNegative())
 *
 * yulesY() = ((truePositive() * trueNegative())^(1/2) - (falsePositive() * falseNegative())^(1/2))
 *            / ((truePositive() * trueNegative())^(1/2) + (falsePositive() * falseNegative())^(1/2))
 * </pre>

 * <p>Replacing precision and recall with their definitions,
 * TP/(TP+FP) and TP/(TP+FN), the F<sub>1</sub> measure reduces as follows:
 *
 * <pre>
 * F1 = 2 * (TP/(TP+FP)) * (TP/(TP+FN))
 *        / (TP/(TP+FP) + TP/(TP+FN))
 *    = (2 * TP * TP / ((TP+FP) * (TP+FN)))
 *        / (TP * ((TP+FN) + (TP+FP)) / ((TP+FP) * (TP+FN)))
 *    = 2 * TP * TP / (TP * ((TP+FN) + (TP+FP)))
 *    = 2 * TP / ((TP+FN) + (TP+FP))
 *    = 2*TP / (2*TP + FP + FN)
 * </pre>
 *
 * <p>Thus the F<sub>1</sub> measure is very closely related to the Jaccard
 * coefficient, TP/(TP+FP+FN).  Like the Jaccard coefficient, the F
 * measure does not vary with varying true negative counts.  Rejection
 * precision and recall do vary with changes in the true negative count.
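 *
 * <p>In fact, rearranging the two formulas above gives an exact relation
 * between the two measures:
 *
 * <pre>
 * jaccardCoefficient() = TP / (TP+FP+FN)
 *                      = fMeasure() / (2 - fMeasure())
 * </pre>
 *
 * For the Cabernet-vs-All example below, 0.7200 / (2 - 0.7200) = 0.5625,
 * which matches the Jaccard coefficient in the results table.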

 * <p>Basic reference and response likelihoods are computed by
 * frequency:
 *
 * <pre>
 * referenceLikelihood() = positiveReference() / total()
 *
 * responseLikelihood() = positiveResponse() / total()
 * </pre>
 *
 * <p>An algorithm that chose responses at random according to the
 * response likelihood would have the following accuracy against test
 * cases chosen at random according to the reference likelihood:
 *
 * <pre>
 * randomAccuracy() = referenceLikelihood() * responseLikelihood()
 *                    + (1 - referenceLikelihood()) * (1 - responseLikelihood())
 * </pre>
 *
 * <p>The two summands arise from the likelihood of a true positive and
 * the likelihood of a true negative.  From random accuracy, the
 * κ-statistic is defined by dividing out the random accuracy from the
 * accuracy, in some way giving a measure of performance above a
 * baseline expectation:
 *
 * <pre>
 * kappa() = kappa(accuracy(), randomAccuracy())
 *
 * kappa(p,e) = (p - e) / (1 - e)
 * </pre>
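 *
 * <p>For the Cabernet-vs-All example below, the reference likelihood is
 * 12/27 ≈ 0.4444 and the response likelihood is 13/27 ≈ 0.4815, so:
 *
 * <pre>
 * randomAccuracy() = 0.4444 * 0.4815 + 0.5556 * 0.5185 ≈ 0.5021
 * kappa()          = (0.7407 - 0.5021) / (1 - 0.5021)  ≈ 0.479
 * </pre>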

 * <p>There are two alternative forms of the κ-statistic, both of
 * which attempt to correct for putative bias in the estimation of
 * random accuracy.  The first involves computing the random accuracy
 * by taking the average of the reference and response likelihoods to
 * be the baseline reference and response likelihood, and squaring the
 * result to get the so-called unbiased random accuracy and the
 * unbiased κ-statistic:
 *
 * <pre>
 * randomAccuracyUnbiased() = avgLikelihood()^2 + (1 - avgLikelihood())^2
 *
 * avgLikelihood() = (referenceLikelihood() + responseLikelihood()) / 2
 *
 * kappaUnbiased() = kappa(accuracy(), randomAccuracyUnbiased())
 * </pre>

 * <p>Kappa can also be adjusted for the prevalence of positive
 * reference cases, which leads to the following simple definition:
 *
 * <pre>
 * kappaNoPrevalence() = (2 * accuracy()) - 1
 * </pre>

 * <p>Pearson's χ² statistic is provided by the following method:
 *
 * <pre>
 * chiSquared() = total() * phiSquared()
 *
 * phiSquared() = (truePositive()*trueNegative() - falsePositive()*falseNegative())^2
 *                / ((truePositive()+falseNegative()) * (falsePositive()+trueNegative())
 *                   * (truePositive()+falsePositive()) * (falseNegative()+trueNegative()))
 * </pre>
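 *
 * <p>Again using the Cabernet-vs-All counts below (TP=9, FN=3, FP=4, TN=11):
 *
 * <pre>
 * phiSquared() = (9*11 - 4*3)^2 / (12 * 15 * 13 * 14)
 *              = 7569 / 32760 ≈ 0.2310
 * chiSquared() = 27 * 0.2310 ≈ 6.238
 * </pre>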

 * <p>The accuracy deviation is the deviation of the average number of
 * positive cases in a binomial distribution with accuracy equal to
 * the classification accuracy and number of trials equal to the total
 * number of cases:
 *
 * <pre>
 * accuracyDeviation() = (accuracy() * (1 - accuracy()) / total())^(1/2)
 * </pre>
 *
 * <p>This number can be used to provide error intervals around the
 * accuracy results.
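 *
 * <p>For the Cabernet-vs-All example below, with accuracy 20/27 ≈ 0.7407
 * over 27 cases, the accuracy deviation is (0.7407 * 0.2593 / 27)^(1/2)
 * ≈ 0.0843.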

 * <p>Using the following three tables as examples:
 *
 * <pre>
 * Cab-vs-All
 *                        Response
 *                        Cab    Other
 * Reference  Cab          9       3
 *            Other        4      11
 *
 * Syrah-vs-All
 *                        Response
 *                        Syrah  Other
 * Reference  Syrah        5       4
 *            Other        4      14
 *
 * Pinot-vs-All
 *                        Response
 *                        Pinot  Other
 * Reference  Pinot        4       2
 *            Other        1      20
 * </pre>
 *
 * <p>The various statistics evaluate to the following values:
 *
 * <pre>
 * Method                      Cabernet   Syrah     Pinot
 * positiveReference()           12         9         6
 * negativeReference()           15        18        21
 * positiveResponse()            13         9         5
 * negativeResponse()            14        18        22
 * correctResponse()             20        19        24
 * total()                       27        27        27
 * accuracy()                     0.7407    0.7037    0.8889
 * recall()                       0.7500    0.5555    0.6666
 * precision()                    0.6923    0.5555    0.8000
 * rejectionRecall()              0.7333    0.7778    0.9524
 * rejectionPrecision()           0.7858    0.7778    0.9091
 * fMeasure()                     0.7200    0.5555    0.7272
 * fowlkesMallows()              12.49      9.00      5.48
 * jaccardCoefficient()           0.5625    0.3846    0.5714
 * yulesQ()                       0.7838    0.6279    0.9512
 * yulesY()                       0.4835    0.3531    0.7269
 * referenceLikelihood()          0.4444    0.3333    0.2222
 * responseLikelihood()           0.4815    0.3333    0.1852
 * randomAccuracy()               0.5021    0.5556    0.6749
 * kappa()                        0.4792    0.3333    0.6583
 * randomAccuracyUnbiased()       0.5027    0.5556    0.6756
 * kappaUnbiased()                0.4789    0.3333    0.6575
 * kappaNoPrevalence()            0.4814    0.4074    0.7778
 * chiSquared()                   6.2382    3.0000   11.8519
 * phiSquared()                   0.2310    0.1111    0.4390
 * accuracyDeviation()            0.0843    0.0879    0.0605
 * </pre>
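 *
 * <p>As a minimal usage sketch (variable names are illustrative only),
 * the Cabernet-vs-All column above can be reproduced either by passing
 * the four counts to the constructor or by adding cases one at a time:
 *
 * <pre>{@code
 * // counts from the Cab-vs-All table: TP=9, FN=3, FP=4, TN=11
 * PrecisionRecallEvaluation cabEval =
 *     new PrecisionRecallEvaluation(9L, 3L, 4L, 11L);
 *
 * // or accumulate case by case, e.g. one true positive, one false negative:
 * PrecisionRecallEvaluation eval = new PrecisionRecallEvaluation();
 * eval.addCase(true, true);    // reference=true,  response=true   -> TP
 * eval.addCase(true, false);   // reference=true,  response=false  -> FN
 *
 * cabEval.accuracy();   // 20/27 = 0.7407...
 * cabEval.recall();     //  9/12 = 0.7500
 * cabEval.precision();  //  9/13 = 0.6923...
 * }</pre>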
 *
 * @author  Bob Carpenter
 * @version 2.1
 * @since   LingPipe2.1
 */
public class PrecisionRecallEvaluation {

    private long mTP;
    private long mFP;
    private long mTN;
    private long mFN;

    /**
     * Construct a precision-recall evaluation with all counts set to
     * zero.
     */
    public PrecisionRecallEvaluation() {
        this(0,0,0,0);
    }

    /**
     * Construct a precision-recall evaluation initialized with the
     * specified counts.
     *
     * @param tp True positive count.
     * @param fn False negative count.
     * @param fp False positive count.
     * @param tn True negative count.
     * @throws IllegalArgumentException If any of the counts are
     * negative.
     */
    public PrecisionRecallEvaluation(long tp, long fn, long fp, long tn) {
        validateCount("tp",tp);
        validateCount("fp",fp);
        validateCount("tn",tn);
        validateCount("fn",fn);
        mTP = tp;
        mFP = fp;
        mTN = tn;
        mFN = fn;
    }

    /**
     * Adds a case with the specified reference and response
     * classifications.
     *
     * @param reference Reference classification.
     * @param response Response classification.
     */
    public void addCase(boolean reference, boolean response) {
        if (reference && response) ++mTP;
        else if (reference && (!response)) ++mFN;
        else if ((!reference) && response) ++mFP;
        else ++mTN;
    }

    void addCase(boolean reference, boolean response, int count) {
        if (reference && response) mTP += count;
        else if (reference && (!response)) mFN += count;
        else if ((!reference) && response) mFP += count;
        else mTN += count;
    }

    /**
     * Returns the number of true positive cases.  A true positive
     * is where both the reference and response are true.
     *
     * @return The number of true positives.
     */
    public long truePositive() { return mTP; }

    /**
     * Returns the number of false positive cases.  A false positive
     * is where the reference is false and response is true.
     *
     * @return The number of false positives.
     */
    public long falsePositive() { return mFP; }

    /**
     * Returns the number of true negative cases.  A true negative
     * is where both the reference and response are false.
     *
     * @return The number of true negatives.
     */
    public long trueNegative() { return mTN; }

    /**
     * Returns the number of false negative cases.  A false negative
     * is where the reference is true and response is false.
     *
     * @return The number of false negatives.
     */
    public long falseNegative() { return mFN; }

    /**
     * Returns the number of positive reference cases.  A positive
     * reference case is one where the reference is true.
     *
     * @return The number of positive references.
     */
    public long positiveReference() { return truePositive() + falseNegative(); }

    /**
     * Returns the number of negative reference cases.  A negative
     * reference case is one where the reference is false.
     *
     * @return The number of negative references.
     */
    public long negativeReference() { return trueNegative() + falsePositive(); }

    /**
     * Returns the sample reference likelihood, or prevalence, which
     * is the number of positive references divided by the total
     * number of cases.
     *
     * @return The sample reference likelihood.
     */
    public double referenceLikelihood() {
        return div(positiveReference(), total());
    }

    /**
     * Returns the number of positive response cases.  A positive
     * response case is one where the response is true.
     *
     * @return The number of positive responses.
     */
    public long positiveResponse() { return truePositive() + falsePositive(); }

    /**
     * Returns the number of negative response cases.  A negative
     * response case is one where the response is false.
     *
     * @return The number of negative responses.
     */
    public long negativeResponse() { return trueNegative() + falseNegative(); }

    /**
     * Returns the sample response likelihood, which is the number of
     * positive responses divided by the total number of cases.
     *
     * @return The sample response likelihood.
     */
    public double responseLikelihood() {
        return div(positiveResponse(), total());
    }

    /**
     * Returns the number of cases where the response is correct.  A
     * correct response is one where the reference and response are
     * the same.
     *
     * @return The number of correct responses.
     */
    public long correctResponse() { return truePositive() + trueNegative(); }

    /**
     * Returns the number of cases where the response is incorrect.
     * An incorrect response is one where the reference and response
     * are different.
     *
     * @return The number of incorrect responses.
     */
    public long incorrectResponse() { return falsePositive() + falseNegative(); }

    /**
     * Returns the total number of cases.
     *
     * @return The total number of cases.
     */
    public long total() { return mTP + mFP + mTN + mFN; }

    /**
     * Returns the sample accuracy of the responses.  The accuracy is
     * just the number of correct responses divided by the total
     * number of responses.
     *
     * @return The sample accuracy.
     */
    public double accuracy() { return div(correctResponse(), total()); }

    /**
     * Returns the recall.  The recall is the number of true positives
     * divided by the number of positive references.  This is the
     * fraction of positive reference cases that were found by the
     * classifier.
     *
     * @return The recall value.
     */
    public double recall() { return div(truePositive(), positiveReference()); }

    /**
     * Returns the precision.  The precision is the number of true
     * positives divided by the number of positive responses.  This is
     * the fraction of positive responses returned by the classifier
     * that were correct.
     *
     * @return The precision value.
     */
    public double precision() { return div(truePositive(), positiveResponse()); }

    /**
     * Returns the rejection recall, or specificity, value.
     * The rejection recall is the percentage of negative references
     * that had negative responses.
     *
     * @return The rejection recall value.
     */
    public double rejectionRecall() {
        return div(trueNegative(), negativeReference());
    }

    /**
     * Returns the rejection precision, or selectivity, value.
     * The rejection precision is the percentage of negative responses
     * that were negative references.
     *
     * @return The rejection precision value.
     */
    public double rejectionPrecision() {
        return div(trueNegative(), negativeResponse());
    }

    /**
     * Returns the F<sub>1</sub> measure.  This is the result of
     * applying the method {@link #fMeasure(double)} to 1.
     *
     * @return The F<sub>1</sub> measure.
     */
    public double fMeasure() { return fMeasure(1.0); }

    /**
     * Returns the F<sub>β</sub> value for the specified β.
     *
     * @param beta The β parameter.
     * @return The F<sub>β</sub> value.
     */
    public double fMeasure(double beta) {
        return fMeasure(beta,recall(),precision());
    }

    /**
     * Returns the Jaccard coefficient.
     *
     * @return The Jaccard coefficient.
     */
    public double jaccardCoefficient() {
        return div(truePositive(),
                   truePositive() + falseNegative() + falsePositive());
    }

    /**
     * Returns the χ² value.
     *
     * @return The χ² value.
     */
    public double chiSquared() {
        double tp = truePositive();
        double tn = trueNegative();
        double fp = falsePositive();
        double fn = falseNegative();
        double tot = total();
        double diff = tp * tn - fp * fn;
        return tot * diff * diff
            / ((tp + fn) * (fp + tn) * (tp + fp) * (fn + tn));
    }

    /**
     * Returns the φ² value.
     *
     * @return The φ² value.
     */
    public double phiSquared() {
        return chiSquared() / (double) total();
    }

    /**
     * Return the value of Yule's Q statistic.
     *
     * @return The value of Yule's Q statistic.
     */
    public double yulesQ() {
        double tp = truePositive();
        double tn = trueNegative();
        double fp = falsePositive();
        double fn = falseNegative();
        return (tp*tn - fp*fn) / (tp*tn + fp*fn);
    }

    /**
     * Return the value of Yule's Y statistic.
     *
     * @return The value of Yule's Y statistic.
     */
    public double yulesY() {
        double tp = truePositive();
        double tn = trueNegative();
        double fp = falsePositive();
        double fn = falseNegative();
        return (Math.sqrt(tp*tn) - Math.sqrt(fp*fn))
            / (Math.sqrt(tp*tn) + Math.sqrt(fp*fn));
    }

    /**
     * Return the Fowlkes-Mallows score.
     *
     * @return The Fowlkes-Mallows score.
     */
    public double fowlkesMallows() {
        double tp = truePositive();
        return tp / Math.sqrt(precision() * recall());
    }

    /**
     * Returns the standard deviation of the accuracy.  This is
     * computed as the deviation of an equivalent accuracy generated
     * by a binomial distribution, which is just a sequence of
     * Bernoulli (binary) trials.
     *
     * @return The standard deviation of the accuracy.
     */
    public double accuracyDeviation() {
        double p = accuracy();
        double total = total();
        // variance of a binomial proportion estimate: p * (1 - p) / n
        double variance = p * (1.0 - p) / total;
        return Math.sqrt(variance);
    }

    /**
     * The probability that the reference and response are the same if
     * they are generated randomly according to the reference and
     * response likelihoods.
     *
     * @return The accuracy of a random classifier.
     */
    public double randomAccuracy() {
        double ref = referenceLikelihood();
        double resp = responseLikelihood();
        return ref * resp + (1.0 - ref) * (1.0 - resp);
    }

    /**
     * The probability that the reference and the response are the same
     * if the reference and response likelihoods are both the average
     * of the sample reference and response likelihoods.
     *
     * @return The unbiased random accuracy.
     */
    public double randomAccuracyUnbiased() {
        double avg = (referenceLikelihood() + responseLikelihood()) / 2.0;
        return avg * avg + (1.0 - avg) * (1.0 - avg);
    }

    /**
     * Returns the value of the kappa statistic.
     *
     * @return The value of the kappa statistic.
     */
    public double kappa() {
        return Statistics.kappa(accuracy(),randomAccuracy());
    }

    /**
     * Returns the value of the unbiased kappa statistic.
     *
     * @return The value of the unbiased kappa statistic.
     */
    public double kappaUnbiased() {
        return Statistics.kappa(accuracy(),randomAccuracyUnbiased());
    }

    /**
     * Returns the value of the kappa statistic adjusted for
     * prevalence.
     *
     * @return The value of the kappa statistic adjusted for
     * prevalence.
     */
    public double kappaNoPrevalence() {
        return 2.0 * accuracy() - 1.0;
    }

    /**
     * Returns a string-based representation of this evaluation.
     *
     * @return A string-based representation of this evaluation.
     */
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(2048);
        sb.append(" Total=" + total() + '\n');
        sb.append(" True Positive=" + truePositive() + '\n');
        sb.append(" False Negative=" + falseNegative() + '\n');
        sb.append(" False Positive=" + falsePositive() + '\n');
        sb.append(" True Negative=" + trueNegative() + '\n');
        sb.append(" Positive Reference=" + positiveReference() + '\n');
        sb.append(" Positive Response=" + positiveResponse() + '\n');
        sb.append(" Negative Reference=" + negativeReference() + '\n');
        sb.append(" Negative Response=" + negativeResponse() + '\n');
        sb.append(" Accuracy=" + accuracy() + '\n');
        sb.append(" Recall=" + recall() + '\n');
        sb.append(" Precision=" + precision() + '\n');
        sb.append(" Rejection Recall=" + rejectionRecall() + '\n');
        sb.append(" Rejection Precision=" + rejectionPrecision() + '\n');
        sb.append(" F(1)=" + fMeasure(1) + '\n');
        sb.append(" Fowlkes-Mallows=" + fowlkesMallows() + '\n');
        sb.append(" Jaccard Coefficient=" + jaccardCoefficient() + '\n');
        sb.append(" Yule's Q=" + yulesQ() + '\n');
        sb.append(" Yule's Y=" + yulesY() + '\n');
        sb.append(" Reference Likelihood=" + referenceLikelihood() + '\n');
        sb.append(" Response Likelihood=" + responseLikelihood() + '\n');
        sb.append(" Random Accuracy=" + randomAccuracy() + '\n');
        sb.append(" Random Accuracy Unbiased=" + randomAccuracyUnbiased() + '\n');
        sb.append(" kappa=" + kappa() + '\n');
        sb.append(" kappa Unbiased=" + kappaUnbiased() + '\n');
        sb.append(" kappa No Prevalence=" + kappaNoPrevalence() + '\n');
        sb.append(" chi Squared=" + chiSquared() + '\n');
        sb.append(" phi Squared=" + phiSquared() + '\n');
        sb.append(" Accuracy Deviation=" + accuracyDeviation());
        return sb.toString();
    }

    /**
     * Returns the F<sub>β</sub> measure for the specified β, recall
     * and precision values.
     *
     * @param beta Relative weight of recall versus precision.
     * @param recall Recall value.
     * @param precision Precision value.
     * @return The F<sub>β</sub> measure.
     */
    public static double fMeasure(double beta, double recall, double precision) {
        double betaSq = beta * beta;
        return (1.0 + betaSq) * recall * precision
            / (recall + (betaSq*precision));
    }

    private static void validateCount(String countName, long val) {
        if (val < 0) {
            String msg = "Count must be non-negative."
                + " Found " + countName + "=" + val;
            throw new IllegalArgumentException(msg);
        }
    }

    static double div(double x, double y) {
        return x/y;
    }

}



