Performance Measures for Classification Algorithms

back to notes

 

Basic Definitions

 

Confusion Matrix: Algorithms are usually tested with some test-data (where we know what classes should have been given by the algorithm), after which a confusion matrix can be constructed by counting how many times the algorithm recognised class A as class B (with A potentially equal to B). If we have just one class (or are interested in one class), the matrix looks like this:

  • True Positive: algorithm outputs class A, correctly => P(A¦A)

  • False Positive: algorithm outputs class A, falsely => P(A¦ not A)

  • True Negative: algorithm outputs not class A, correctly => P(not A¦ not A)

  • False Negative: algorithm outputs not class A, falsely => P(not A¦A)

 

   

Accuracy

 

The sum of the diagonal elements in the confusion matrix / sum of all elements in the confusion matrix, or

correct predictions / all predictions

 

   
Precision/Recall Curve
 

The terminology comes from document retrieval, where Precision tells how many of the returned documents are correct, and Recall tells how many of the positives the model returns:

  • Precision = True Positives / All Positives = P(classifier outputs correct class A) / (P(classifier outputs correct class A) + P(classifier outputs falsely class A)) = P(A¦A)/(P(A¦A)+P(A¦not A)) = P(A¦A) / P(A)
  • Recall = True Positives / All True = P(classifier outputs correct class A) = P(A¦A)

The plot is created by dividing the algorithm's output prediction (expected to be in [0,1]) up into sub-domains, calculate both precision and recall for that sub-domain, and connect the plot-points to get to a curve:

 

 

   
Receiver Operator Characteristic Curve (ROC Curve)
 

ROC Curves were developed in WWII to statistically model false positive and false negative detections of radar operators. It plots Sensitivity vs 1-Specificity:

  • Sensitivity = Recall = P(classifier outputs correct class A) = True Positive Rate = P(A¦A)
  • Specificity = True Negatives / All Negatives = P(classifier outputs correctly not class A) = P(not A¦ not A)
  • 1-Specificity = False Positive Rate = P(classifier outputs falsely class A) = P(A¦ notA))

Similar to the Precision-Recall Curve, we can divide the algorithm's output prediction up into sub-domains to get to the ROC curve plot:

Properties:

  • Slope is non-increasing (see graph for example)
  • Each point on ROC represents different trade-off (cost ratio) between false positives and false negatives
  • Slope of line tangent to curve defines the cost ratio
  • ROC Area represents performance averaged over all possible cost ratios
  • If two ROC curves do not intersect, one method dominates the other
  • If two ROC curves intersect, one method is better for some cost ratios, and other method is better for other cost ratios
     
   
Links
     
 

Compiled by Kristof Van Laerhoven.