Hypergeometric Distribution Calculator
Calculate probabilities for sampling without replacement from a finite population
About the Hypergeometric Distribution Calculator
The Hypergeometric Distribution Calculator is a specialized statistical tool for finding the probability of a specific number of successes in a series of draws from a finite population. The binomial distribution assumes that the probability of success stays constant from draw to draw (as in sampling with replacement). The hypergeometric distribution instead models sampling without replacement: every time an item is removed from the population, the probability of drawing another item of the same type on the next draw changes. This matters most when the population is small enough that each draw noticeably changes the composition of the remaining pool.
Quality control engineers, ecologists, and card players use this calculator to model situations where each draw depends on the ones before it. For instance, it can give the likelihood of finding a specific number of defective units in a small batch, or the probability of being dealt a particular hand in a card game. You enter the population size (N), the number of successes in that population (K), the sample size (n), and the number of successes of interest (k). The calculator then returns the probability of exactly k successes, as well as cumulative probabilities such as 'at most' or 'at least' k successes.
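The 'at most' and 'at least' figures are sums of exact probabilities over a range of k. Below is a minimal Python sketch that assumes valid non-negative integer inputs. The function names are hypothetical and are not taken from the calculator itself.

```python
from math import comb


def pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for k successes in n draws without replacement."""
    # math.comb raises on negative arguments, so rule those cases out first;
    # it already returns 0 when k > K or n - k > N - K
    if k < 0 or n - k < 0:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def at_most(k: int, N: int, K: int, n: int) -> float:
    """P(X <= k): sum of P(X = i) for i = 0..k."""
    return sum(pmf(i, N, K, n) for i in range(0, k + 1))


def at_least(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k): sum of P(X = i) for i = k..n."""
    return sum(pmf(i, N, K, n) for i in range(k, n + 1))


# Marble jar: 20 marbles, 8 of them red, draw 5
print(round(at_most(3, 20, 8, 5), 4))   # 0.9422
print(round(at_least(3, 20, 8, 5), 4))  # 0.2962
```

You could also compute 'at least k' as 1 minus 'at most k - 1'. Summing the terms directly avoids losing precision when the result is very small.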
Formula
$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$
In this formula, N represents the total population size, K is the total number of successes available in that population, n is the number of items drawn in the sample, and k is the specific number of observed successes you are calculating the probability for. The 'choose' notation, written (a choose b) or $\binom{a}{b}$, refers to the binomial coefficient, which determines the number of ways to pick a subset of items regardless of order.
The numerator counts the ways to choose exactly k successes from the K available and the remaining n - k items from the N - K non-successes. The denominator counts all possible ways to draw a sample of n items from the population of N. Dividing the two gives the probability of that specific outcome.
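Every term in the formula is an integer count, so the probability can be computed exactly as a fraction. The Python sketch below uses only the standard library, and its function name is hypothetical. It follows the numerator and denominator term by term. It also checks that the probabilities over all possible k add up to exactly 1. This reflects Vandermonde's identity: the numerators summed over k equal the denominator.

```python
from fractions import Fraction
from math import comb


def pmf_exact(k: int, N: int, K: int, n: int) -> Fraction:
    """Exact P(X = k) = (K choose k)(N-K choose n-k) / (N choose n)."""
    if k < 0 or n - k < 0:
        return Fraction(0)
    # numerator: k successes from K, and n - k failures from N - K
    numerator = comb(K, k) * comb(N - K, n - k)
    # denominator: any n items from the whole population of N
    denominator = comb(N, n)
    return Fraction(numerator, denominator)


# Probabilities over every possible k must sum to exactly 1
N, K, n = 20, 8, 5
total = sum(pmf_exact(k, N, K, n) for k in range(n + 1))
print(total)  # 1
```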
Worked examples
Example 1: A jar contains 20 marbles: 8 are red and 12 are blue. If you pick 5 marbles at random without putting them back, what is the probability that exactly 3 are red?
N = 20, K = 8, n = 5, k = 3
1. Calculate (K choose k): (8 choose 3) = 56
2. Calculate (N-K choose n-k): (12 choose 2) = 66
3. Calculate the numerator: 56 * 66 = 3,696
4. Calculate the denominator (N choose n): (20 choose 5) = 15,504
5. Divide: 3,696 / 15,504 ≈ 0.2384
Result: 0.2384 (23.84%) chance of drawing exactly 3 red marbles.
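The five steps of the marble example can be reproduced directly with Python's standard library. This is a minimal sketch; the variable names simply mirror the steps.

```python
from math import comb

# Worked example: 20 marbles (8 red, 12 blue), draw 5, want exactly 3 red
N, K, n, k = 20, 8, 5, 3

ways_red = comb(K, k)              # step 1: (8 choose 3)
ways_blue = comb(N - K, n - k)     # step 2: (12 choose 2)
numerator = ways_red * ways_blue   # step 3
denominator = comb(N, n)           # step 4: (20 choose 5)
p = numerator / denominator        # step 5

print(ways_red, ways_blue, numerator, denominator)  # 56 66 3696 15504
print(f"{p:.4f}")                                   # 0.2384
```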
Common use cases
- Calculating the probability of drawing exactly two aces in a five-card hand from a standard deck.
- Determining the likelihood of selecting 3 defective components in a random sample of 10 from a shipment of 50.
- Estimating the probability that a specific number of tagged animals will be recaptured in a wildlife population study.
- Analyzing the results of a small-scale clinical trial in which a fixed pool of participants is split into groups, so each assignment changes the makeup of those not yet assigned.
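The card example in the first bullet maps directly onto the formula. Below is a minimal Python sketch; the `pmf` name is hypothetical. The other bullets work the same way once all four counts are known. For instance, the shipment example also needs the number of defective units in the whole shipment (K), which the bullet does not state.

```python
from math import comb


def pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for k successes in n draws without replacement."""
    if k < 0 or n - k < 0:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Standard deck: N = 52 cards, K = 4 aces, n = 5 cards dealt, k = 2 aces
p_two_aces = pmf(2, 52, 4, 5)
print(f"{p_two_aces:.4f}")  # 0.0399
```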
Pitfalls and limitations
- The sample size n cannot exceed the total population size N.
- The number of successes in the sample k cannot exceed the sample size n or the total successes available K.
- This distribution does not apply if items are returned to the pool after each draw.
- The probability is zero whenever k is greater than K, or whenever the required failures (n - k) exceed the available failures (N - K).
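The limits above can be checked before computing anything. Below is a minimal Python sketch with hypothetical helper names. Invalid settings for N, K and n raise an error. A k outside the returned range is treated as legal but has probability zero, which matches the last point in the list.

```python
def check_inputs(N: int, K: int, n: int) -> None:
    """Reject population/sample settings the distribution cannot describe."""
    for name, value in (("N", N), ("K", K), ("n", n)):
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer, got {value!r}")
    if K > N:
        raise ValueError("K (successes in population) cannot exceed N")
    if n > N:
        raise ValueError("n (sample size) cannot exceed N")


def support(N: int, K: int, n: int) -> tuple[int, int]:
    """Smallest and largest k with non-zero probability."""
    check_inputs(N, K, n)
    # once all N - K failures are drawn, the rest of the sample must be successes
    lowest = max(0, n - (N - K))
    # cannot see more successes than were drawn or than exist
    highest = min(n, K)
    return lowest, highest


print(support(20, 8, 5))  # (0, 5)
print(support(10, 8, 5))  # (3, 5)
```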
Frequently asked questions
What is the difference between the binomial and hypergeometric distributions?
The binomial distribution assumes sampling with replacement (independence), while the hypergeometric distribution is used for sampling without replacement. In a hypergeometric setup, each draw changes the probability of the next outcome because the population's composition shifts.
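The difference shows up clearly in a simulation. The Python sketch below is a Monte Carlo illustration, not how the calculator computes its results, and `simulate` is a hypothetical name. It draws from the marble jar either without replacement (`random.Random.sample`) or with replacement (`random.Random.choices`). It then estimates the chance of exactly 3 red marbles in each case.

```python
import random


def simulate(N: int, K: int, n: int, k: int, trials: int,
             with_replacement: bool, seed: int = 0) -> float:
    """Estimate P(exactly k successes in n draws) by repeated sampling."""
    rng = random.Random(seed)
    population = [1] * K + [0] * (N - K)  # 1 = success, 0 = failure
    hits = 0
    for _ in range(trials):
        if with_replacement:
            draw = rng.choices(population, k=n)  # each draw sees the full pool
        else:
            draw = rng.sample(population, n)     # drawn items leave the pool
        if sum(draw) == k:
            hits += 1
    return hits / trials


# Marble jar: 20 marbles, 8 red, draw 5, count runs with exactly 3 red
print(simulate(20, 8, 5, 3, 100_000, with_replacement=False))  # near 0.2384
print(simulate(20, 8, 5, 3, 100_000, with_replacement=True))   # near 0.2304
```

The second estimate approaches the binomial value 10 * 0.4^3 * 0.6^2 = 0.2304, not the hypergeometric 0.2384.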
Can I use the binomial distribution instead of the hypergeometric for large populations?
Yes, as long as the sample is small relative to the population (a common rule of thumb is n/N below 5%). In that case the change in probability from one draw to the next is negligible. The binomial distribution with n trials and success probability p = K/N is then often used as a simpler approximation.
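One way to see when the approximation is safe is to measure the largest gap between the two distributions. The Python sketch below uses hypothetical function names and p = K/N. It compares the marble jar, where the sample is 25% of the population, with a larger population where the sample is only 1%.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """Exact probability of k successes without replacement."""
    if k < 0 or n - k < 0:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def max_gap(N: int, K: int, n: int) -> float:
    """Largest |hypergeometric - binomial| over all k, using p = K / N."""
    p = K / N
    return max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p))
               for k in range(n + 1))


print(round(max_gap(20, 8, 5), 3))  # sample is 25% of population: 0.052
print(max_gap(1000, 400, 10))       # sample is 1% of population: much smaller
```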
Does the hypergeometric distribution apply to non-integer values?
The hypergeometric distribution is discrete, meaning it only applies to whole numbers. You cannot have 2.5 successes in a sample, so the probability for a non-integer value is always zero.
What defines a success in hypergeometric probability?
A 'success' is simply the label for the specific characteristic you are tracking, such as a defective part, a specific card suit, or a person with a particular trait. It does not imply a positive or 'good' outcome in a general sense.
Is the hypergeometric distribution symmetric?
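Not in general. The distribution is skewed unless K = N/2 or n = N/2, and when K = N/2 it is a mirror image about n/2. It does have two useful exchange properties. First, counting k successes in a sample of n is the same as counting n - k failures. So the probability of k successes with K successes in the population equals the probability of n - k successes when the N - K failures are relabelled as successes. Second, the roles of K and n can be swapped without changing P(X = k).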
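These properties can be checked numerically. The Python sketch below uses hypothetical function names and a tolerance of 1e-12 for floating-point comparison. `is_symmetric` only tests for a mirror image about n/2, which covers the K = N/2 case.

```python
from math import comb


def pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for k successes in n draws without replacement."""
    if k < 0 or n - k < 0:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def close(a: float, b: float) -> bool:
    return abs(a - b) < 1e-12


def swap_success_failure_holds(N: int, K: int, n: int) -> bool:
    """k successes out of K  <->  n - k 'successes' when failures are relabelled."""
    return all(close(pmf(k, N, K, n), pmf(n - k, N, N - K, n))
               for k in range(n + 1))


def swap_K_and_n_holds(N: int, K: int, n: int) -> bool:
    """P(X = k) is unchanged when K and n trade places."""
    return all(close(pmf(k, N, K, n), pmf(k, N, n, K))
               for k in range(max(n, K) + 1))


def is_symmetric(N: int, K: int, n: int) -> bool:
    """True if P(X = k) == P(X = n - k) for every k (mirror image about n / 2)."""
    return all(close(pmf(k, N, K, n), pmf(n - k, N, K, n))
               for k in range(n + 1))


print(swap_success_failure_holds(20, 8, 5))  # True
print(swap_K_and_n_holds(20, 8, 5))          # True
print(is_symmetric(20, 8, 5))                # False: K is not N / 2
print(is_symmetric(20, 10, 5))               # True: K = N / 2
```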