Confusion Matrix Calculator

Calculate accuracy, precision, recall, F1 score, MCC, and other classification metrics from a confusion matrix

About the Confusion Matrix Calculator

The Confusion Matrix Calculator is an essential tool for data scientists, machine learning engineers, and researchers who need to evaluate the performance of classification algorithms. While many systems output raw accuracy, a confusion matrix provides the granular detail necessary to understand where a model is failing. By organizing predictions into a 2x2 grid of true positives, true negatives, false positives, and false negatives, users can see exactly how many times a model confused one class for another. This tool is particularly valuable when working with imbalanced datasets, where standard accuracy figures can be deceptive.

This calculator computes a comprehensive suite of performance metrics beyond simple accuracy, including precision, recall (sensitivity), specificity, and the F1 score. It also provides more robust statistical indicators like the Matthews Correlation Coefficient (MCC) and the False Positive Rate. Whether you are tuning a logistic regression for medical diagnosis or evaluating a neural network for fraud detection, this tool translates raw frequency counts into actionable insights, helping you decide if your model is ready for deployment or requires further optimization of its decision threshold.

Formula

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 * (Precision * Recall) / (Precision + Recall)

The formula uses four primary inputs: True Positives (TP) are cases where the model correctly predicted the positive class; True Negatives (TN) are correct predictions of the negative class; False Positives (FP) occur when the model incorrectly predicts a positive result (Type I error); and False Negatives (FN) occur when the model misses a positive result (Type II error). Advanced metrics like the Matthews Correlation Coefficient (MCC) and Balanced Accuracy use these same four values to provide deeper insights into model performance across imbalanced datasets.
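If you want to reproduce these formulas in code, the sketch below is a minimal Python helper. The function name and structure are illustrative assumptions, not the calculator's own implementation; the zero checks simply guard against division by zero for degenerate matrices.

```python
def confusion_matrix_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute core classification metrics from 2x2 confusion matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```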

Worked examples

Example 1: A fraud detection model is tested on 1,000 transactions. It correctly identifies 20 fraudulent ones (TP) and 930 legitimate ones (TN). It incorrectly flags 10 legitimate ones as fraud (FP) and misses 30 fraudulent ones (FN).

1. Accuracy = (20 + 930) / 1000 = 0.95
2. Precision = 20 / (20 + 10) = 0.6667
3. Recall = 20 / (20 + 30) = 0.40
4. F1 = 2 * (0.6667 * 0.40) / (0.6667 + 0.40) = 0.50

Result: Accuracy: 95%, Precision: 66.67%, Recall: 40%, F1 Score: 50%. Even though accuracy is high, the model is actually quite poor at identifying the minority class.
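If scikit-learn is available, you can cross-check Example 1 by expanding the four counts into label arrays. This is an illustrative sketch, not part of the calculator:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Expand the Example 1 counts (TP=20, TN=930, FP=10, FN=30) into label arrays.
y_true = [1] * 20 + [0] * 930 + [0] * 10 + [1] * 30
y_pred = [1] * 20 + [0] * 930 + [1] * 10 + [0] * 30

print(accuracy_score(y_true, y_pred))   # 0.95
print(precision_score(y_true, y_pred))  # ~0.6667
print(recall_score(y_true, y_pred))     # 0.40
print(f1_score(y_true, y_pred))         # 0.50
```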

Example 2: A marketing model predicts if 200 customers will subscribe. It accurately predicts 70 subscribers (TP) and 90 non-subscribers (TN). It incorrectly predicts 10 people will subscribe (FP) and misses 30 who actually did (FN).

1. Total = 70 + 90 + 10 + 30 = 200
2. Accuracy = (70 + 90) / 200 = 0.80
3. Precision = 70 / (70 + 10) = 0.875
4. Recall = 70 / (70 + 30) = 0.70
5. F1 = 2 * (0.875 * 0.70) / (0.875 + 0.70) = 0.7778

Result: Accuracy: 80%, Precision: 87.5%, Recall: 70%, F1 Score: 77.78%. This model shows balanced performance with a slight bias toward precision.
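The same arithmetic for Example 2, written out as a short illustrative Python check:

```python
tp, tn, fp, fn = 70, 90, 10, 30

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 0.80
precision = tp / (tp + fp)                          # 0.875
recall = tp / (tp + fn)                             # 0.70
f1 = 2 * precision * recall / (precision + recall)  # ~0.7778

print(accuracy, precision, recall, round(f1, 4))
```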

Frequently asked questions

What is a good F1 score for a confusion matrix?

A high F1 score indicates that a model has a good balance between precision and recall. It is especially useful when you have an uneven class distribution, as it penalizes models that favor one metric significantly over the other.

Why is accuracy not enough for model evaluation?

Accuracy is the proportion of total correct predictions: (TP + TN) / Total. It is often misleading when the dataset is imbalanced; for example, if 95% of users don't churn, a model that always predicts 'not churn' is 95% accurate yet useless for finding churners.
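A quick numeric illustration of that churn scenario (the 95/5 split is the hypothetical one from the answer above):

```python
# 95 users who don't churn (0) and 5 who do (1).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a "model" that always predicts 'not churn'

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
print(accuracy)        # 0.95 -- looks good
print(true_positives)  # 0 -- not a single churner was found
```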

What is the difference between precision and recall in a confusion matrix?

Precision measures how many of the positive predictions were actually correct, whereas Recall measures how many of the actual positive cases the model managed to find. High precision avoids 'false alarms,' while high recall avoids 'missed opportunities'.

What does MCC tell you about a classifier?

The Matthews Correlation Coefficient (MCC) is a more robust statistical measure that produces a high score only if the prediction performs well in all four confusion matrix categories (TP, TN, FP, and FN). It ranges from -1 to +1, where +1 indicates a perfect prediction, 0 is no better than random guessing, and -1 indicates total disagreement between prediction and observation.
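For reference, the MCC can be computed from the same four counts. The snippet below is an illustrative sketch (not the calculator's own code) applied to the Example 1 fraud counts:

```python
from math import sqrt

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from 2x2 confusion matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(round(mcc(20, 930, 10, 30), 3))  # ~0.497 for the Example 1 fraud model
```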

Is sensitivity the same thing as recall?

Yes. Recall and sensitivity are the same metric: both measure the ratio of correctly predicted positive observations to all actual positives, TP / (TP + FN).

Related calculators

5 Number Summary Calculator
Calculate the five-number summary (min, Q1, median, Q3, max) and visualize with a box plot
Absolute Uncertainty Calculator
Calculate absolute and relative uncertainty for measurements and experimental data
Average Rating Calculator
Calculate the weighted average star rating from individual vote counts for reviews and feedback
Accuracy Calculator
Calculate accuracy, precision, and error rates for statistical analysis
Adjusted R-Squared Calculator
Calculate adjusted R² to account for the number of predictors in regression models
AIC/BIC Calculator
Compare statistical models using Akaike and Bayesian Information Criteria for model selection
ANOVA Calculator
Perform one-way Analysis of Variance to test if group means differ significantly