Bayes' Theorem Calculator
Calculate posterior probability from prior and test accuracy
About the Bayes' Theorem Calculator
Bayes' Theorem is a cornerstone of probability theory that describes the probability of an event based on prior knowledge of conditions that might be related to the event. This calculator helps users update their beliefs when new evidence emerges, a process known as Bayesian inference. It is most frequently used to interpret the results of medical tests, legal evidence, and diagnostic screenings where the rarity of a condition (the base rate) can drastically alter the meaning of a positive result.
Professionals in data science, medicine, and risk management use this tool to avoid the common pitfall of base rate neglect. For instance, if a rare disease affects only 1% of the population, a test that is 99% sensitive and 99% specific returns about as many false positives as true positives, so a positive result implies only about a 50% chance of disease; any lower specificity or rarer condition tips the balance toward false positives. This calculator provides the specific 'posterior probability,' allowing users to distinguish between the accuracy of a test and the actual likelihood that the condition is present after a positive result is received.
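The base rate effect is easier to see with whole-number counts. The sketch below assumes that "99% accuracy" means 99% sensitivity and 99% specificity, and uses a hypothetical population of 10,000 people; both are illustrative assumptions, not values taken from the calculator.

```python
# Natural-frequency view of a positive test result.
# Assumption: "99% accuracy" means 99% sensitivity AND 99% specificity.
population = 10_000     # hypothetical population size
prevalence = 0.01       # 1% of people have the disease (the base rate)
sensitivity = 0.99      # P(positive | disease)
specificity = 0.99      # P(negative | no disease)

sick = population * prevalence                  # about 100 people
healthy = population - sick                     # about 9,900 people

true_positives = sick * sensitivity             # sick people who test positive
false_positives = healthy * (1 - specificity)   # healthy people who test positive

# Share of positive results that belong to people who are actually sick.
share_sick = true_positives / (true_positives + false_positives)

print(f"True positives:  {true_positives:.0f}")
print(f"False positives: {false_positives:.0f}")
print(f"P(disease | positive) = {share_sick:.2f}")
```

Under these assumptions the two groups are about the same size, so a positive result is close to a coin flip. Lowering the specificity or the prevalence even slightly makes false positives the larger group.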
Formula
P(A|B) = [P(B|A) * P(A)] / [P(B|A) * P(A) + P(B|not A) * P(not A)]
- P(A|B) is the posterior probability (the chance that A is true given that B occurred).
- P(B|A) is the likelihood or sensitivity (the chance that B occurs given A is true).
- P(A) is the prior probability (the initial chance A is true before evidence).
- P(B|not A) is the false positive rate (the chance B occurs even if A is false).
- P(not A) is simply 1 minus the prior probability.
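The formula translates directly into a few lines of code. The following is a minimal sketch, not the calculator's actual implementation; the function name `posterior` and its parameter names are chosen here for illustration.

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """Return P(A|B) given P(A), P(B|A) and P(B|not A)."""
    inputs = {
        "prior": prior,
        "sensitivity": sensitivity,
        "false_positive_rate": false_positive_rate,
    }
    # Every input is a probability, so it must lie between 0 and 1.
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    true_positive = sensitivity * prior                     # P(B|A) * P(A)
    false_positive = false_positive_rate * (1.0 - prior)    # P(B|not A) * P(not A)
    evidence = true_positive + false_positive               # P(B)
    if evidence == 0.0:
        raise ValueError("P(B) is zero, so P(A|B) is undefined")
    return true_positive / evidence


# Usage: 1% prior, 80% sensitivity, 10% false positive rate.
print(round(posterior(0.01, 0.80, 0.10), 4))
```

The denominator is P(B), the total probability of seeing the evidence, which is why the function refuses to divide when both terms are zero.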
Worked examples
Example 1: A patient tests positive for a disease that affects 1% of the population using a test with 80% sensitivity and a 10% false positive rate.
Prior P(A) = 0.01
Sensitivity P(B|A) = 0.80
False Positive P(B|not A) = 0.10
Step 1: (0.80 * 0.01) = 0.008
Step 2: (0.10 * 0.99) = 0.099
Step 3: 0.008 / (0.008 + 0.099) = 0.07477
Result: about 7.5%, meaning the patient has only about a 1 in 13 chance of actually having the disease.
Example 2: A spam filter identifies the word 'Free' in an email. 20% of all emails are spam. 70% of spam emails contain 'Free', while only 1% of legitimate emails do.
Prior P(Spam) = 0.20
Likelihood P(Free|Spam) = 0.70
False Positive P(Free|Legit) = 0.01
Step 1: (0.70 * 0.20) = 0.14
Step 2: (0.01 * 0.80) = 0.008
Step 3: 0.14 / (0.14 + 0.008) = 0.9459
Result: 94.6%, indicating a very high likelihood that the message is indeed spam.
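The three steps in each example can be checked with a short script. This is an illustrative sketch; the helper `posterior` and the dictionary labels are names introduced here, and the inputs are the ones stated in the two examples.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) via Bayes' Theorem for a yes/no hypothesis A."""
    numerator = likelihood * prior                            # Step 1
    false_alarm = false_positive_rate * (1 - prior)           # Step 2
    return numerator / (numerator + false_alarm)              # Step 3


# (prior, likelihood, false positive rate) for each worked example.
examples = {
    "Example 1 (disease test)": (0.01, 0.80, 0.10),
    "Example 2 (spam filter)": (0.20, 0.70, 0.01),
}

results = {name: posterior(*inputs) for name, inputs in examples.items()}
for name, value in results.items():
    print(f"{name}: {value:.2%}")
```

Running it prints 7.48% for the disease test and 94.59% for the spam filter.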
Common use cases
- Calculating the actual likelihood of a disease following a positive medical screening result.
- Updating the probability of a mechanical failure after a sensor triggers an alarm.
- Updating the forecast probability of a weather event using historical frequencies and the current morning's observations.
- Assessing the probability of a legal defendant's guilt based on a specific piece of forensic evidence.
- Determining if an email is spam based on the presence of a specific keyword.
Pitfalls and limitations
- Confusing test sensitivity with the probability of actually having the condition.
- Failing to account for the base rate of an event in the general population.
- Using the theorem with subjective priors that have no statistical basis.
- Misinterpreting a '95% accurate test' as a 95% chance of being sick without checking the false positive rate.
Frequently asked questions
What is base rate neglect and how does Bayes' Theorem correct for it?
Base rate neglect is a cognitive bias where people ignore the prior probability of an event in favor of new evidence. Bayes' Theorem corrects this by mathematically weighing the rarity of the condition against the accuracy of the test results.
How do false positives change the probability of a result?
A false positive occurs when the test suggests a condition is present when it is not. In the formula, this is represented by P(B|not A). High false positive rates significantly lower the posterior probability, especially for rare events.
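To see how strongly P(B|not A) drives the answer, the sketch below holds the prior at 1% and the sensitivity at 80% and varies only the false positive rate. The specific rates tried are arbitrary illustrative values, and `posterior` is a helper name introduced here.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(A|B) via Bayes' Theorem for a yes/no hypothesis A."""
    numerator = sensitivity * prior
    return numerator / (numerator + false_positive_rate * (1 - prior))


prior = 0.01        # rare condition
sensitivity = 0.80  # P(B|A), held fixed
false_positive_rates = [0.01, 0.05, 0.10, 0.20]  # illustrative values of P(B|not A)

posteriors = [posterior(prior, sensitivity, fpr) for fpr in false_positive_rates]
for fpr, post in zip(false_positive_rates, posteriors):
    print(f"False positive rate {fpr:.0%} -> posterior {post:.1%}")
```

With these inputs the posterior falls from roughly 45% at a 1% false positive rate to about 4% at a 20% rate, even though the sensitivity never changes.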
Can Bayes' Theorem be used for spam filters?
Yes, Bayesian inference is used in spam filters by calculating the probability that a message is spam given the occurrence of certain words, based on how often those words appeared in previous spam versus legitimate emails.
How do I figure out the prior probability?
The prior is your initial estimate of the probability before seeing new data. Where possible, base it on a known population frequency, such as the prevalence of a disease in a specific age group. If you have no data at all, a flat prior (0.5 for a yes/no question) is a common fallback, but the posterior can then depend heavily on that choice.
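The choice of prior matters a great deal. As a rough illustration, the sketch below keeps the test fixed at 80% sensitivity and a 10% false positive rate and tries several priors, including a flat 0.5; the list of priors is an arbitrary assumption for demonstration, and `posterior` is a helper name introduced here.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(A|B) via Bayes' Theorem for a yes/no hypothesis A."""
    numerator = sensitivity * prior
    return numerator / (numerator + false_positive_rate * (1 - prior))


sensitivity = 0.80
false_positive_rate = 0.10
priors = [0.001, 0.01, 0.10, 0.50]  # illustrative priors; 0.50 is the flat prior

posteriors = [posterior(p, sensitivity, false_positive_rate) for p in priors]
for p, post in zip(priors, posteriors):
    print(f"Prior {p:.1%} -> posterior {post:.1%}")
```

With the same positive result, a flat prior of 0.5 yields a posterior of about 89%, while a 1% population prevalence yields about 7.5%.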
Is Bayes' Theorem always 100% accurate?
The theorem itself is a mathematical certainty, but the results are only as accurate as the input data. If the prior probability, the sensitivity, or the false positive rate is incorrect, the resulting posterior probability will also be unreliable.