Confusion Matrix & Metrics
Adjust predictions and true values to see how accuracy, precision, recall, and F1 score react.
Concept Overview
A confusion matrix is a fundamental tool for evaluating the performance of a classification model. It visualizes the counts of correct and incorrect predictions by comparing predicted labels against true labels. This matrix allows us to look beyond simple accuracy and compute granular metrics like precision, recall, and the F1 score, revealing how a model handles different types of errors, especially in imbalanced datasets.
Mathematical Definition
A binary confusion matrix is constructed from four core outcomes:
- True Positive (TP): the model predicts positive and the true label is positive.
- True Negative (TN): the model predicts negative and the true label is negative.
- False Positive (FP): the model predicts positive but the true label is negative.
- False Negative (FN): the model predicts negative but the true label is positive.
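These four counts can be tallied directly from paired label vectors. A minimal sketch (the example labels are invented for illustration):

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Illustrative data: 8 instances, one false positive and one false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(y_true, y_pred))  # → (3, 3, 1, 1)
```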
Key Concepts
- Accuracy: The proportion of all correct predictions. Often misleading when classes are highly imbalanced.
  Accuracy = (TP + TN) / (TP + TN + FP + FN)
- Precision (Positive Predictive Value): Out of all instances predicted as positive, how many were actually positive? Focuses on minimizing False Positives.
  Precision = TP / (TP + FP)
- Recall (Sensitivity or True Positive Rate): Out of all actual positive instances, how many were correctly predicted? Focuses on minimizing False Negatives.
  Recall = TP / (TP + FN)
- Specificity (True Negative Rate): Out of all actual negative instances, how many were correctly predicted?
  Specificity = TN / (TN + FP)
- F1 Score: The harmonic mean of precision and recall. It balances both metrics and is useful when you care about both False Positives and False Negatives.
  F1 = 2 · (Precision · Recall) / (Precision + Recall)
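The five formulas above can be computed directly from the four counts. A minimal sketch; the zero-denominator guards are a common convention rather than something the module specifies:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, specificity, and F1 from counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (e.g. no positive predictions at all).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}

# Example: TP=3, TN=3, FP=1, FN=1 -- every metric works out to 0.75 here.
print(classification_metrics(3, 3, 1, 1))
```

Note how precision and recall isolate different error types: FP only appears in the precision (and specificity) denominator, FN only in the recall denominator.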
Historical Context
The terminology of true/false positives and negatives traces back to early signal detection theory, which was extensively developed during World War II for radar operators distinguishing enemy aircraft from noise. It was later adapted into medical diagnostics and eventually became a staple of machine learning evaluation. The terms "Type I Error" (False Positive) and "Type II Error" (False Negative) stem from Neyman and Pearson's foundational work on hypothesis testing in the 1930s.
Real-world Applications
- Medical Diagnosis: High recall is crucial to ensure no life-threatening disease is missed (minimizing False Negatives).
- Spam Filtering: High precision is essential because classifying a legitimate email as spam (False Positive) is highly disruptive.
- Fraud Detection: Striking a balance via F1 score, or tailoring the decision threshold depending on the cost of investigating a false alarm vs. missing actual fraud.
Related Concepts
- Receiver Operating Characteristic (ROC) Curve — plotting Recall against False Positive Rate.
- Area Under the Curve (AUC) — quantifying overall ability to discriminate between classes.
- Cross-Entropy Loss — a common loss function used to optimize classifiers before evaluating with a confusion matrix.
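The ROC curve mentioned above can be traced by sweeping a decision threshold over predicted scores and recording (False Positive Rate, Recall) at each setting. A hedged sketch with invented scores:

```python
def roc_points(y_true, scores, thresholds):
    """Return (FPR, TPR) pairs, one per threshold -- points on a ROC curve."""
    points = []
    for thr in thresholds:
        y_pred = [1 if s >= thr else 0 for s in scores]
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
        tpr = tp / (tp + fn) if (tp + fn) else 0.0  # Recall
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        points.append((fpr, tpr))
    return points

# Invented scores: threshold 0.0 accepts everything (1, 1), 1.1 rejects
# everything (0, 0); intermediate thresholds trace the curve between.
print(roc_points([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1], [0.0, 0.5, 1.1]))
```

Lowering the threshold moves a point up and to the right (higher recall, more false positives), which is exactly the precision/recall trade-off discussed under fraud detection above.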
Experience it interactively
Adjust parameters, observe in real time, and build deep intuition with Riano’s interactive Confusion Matrix & Metrics module.
Try Confusion Matrix & Metrics on Riano →