Laws of Total Probability: A Complete Guide
Master the theorem that powers Bayesian inference, medical diagnosis, machine learning, and actuarial science — with step-by-step solved examples and real-world applications built from the ground up.
Introduction & Core Concept
Laws of Total Probability: Why Every Probability Student Needs to Master This Theorem
The Law of Total Probability answers a specific, powerful question: how do you find the probability of an event when the sample space is complex, but you can break it into simpler, well-understood pieces? That’s the theorem’s entire job — and it does it elegantly. Once you understand it, you will start seeing its structure in problems across medicine, engineering, finance, and machine learning.
Here is the core intuition before any formal notation. Imagine you want to know the probability of getting a defective product from a factory. The factory has three production lines. Each line has a known defect rate, and you know what fraction of total output comes from each line. You cannot compute the overall defect probability directly — but you can compute it line by line and add. That weighted sum is the Law of Total Probability in action. Understanding probability theory starts with exactly this kind of structured thinking.
- 1933: the year Andrey Kolmogorov formalized the modern probability axioms, grounding the Law of Total Probability in rigorous mathematics
- Fields using total probability: medicine, insurance, AI, engineering, finance, genetics, weather forecasting, and more
- P(A): what the theorem computes, namely the marginal probability of any event A from conditional probabilities over a partition
The Law of Total Probability is built on two even simpler ideas: conditional probability and partitions of a sample space. If you understand those two concepts, the theorem itself becomes almost obvious. This guide builds from those foundations upward, so nothing is assumed and nothing is glossed over.
The theorem is also inseparable from Bayes’ Theorem, which uses P(A) — computed via total probability — as its denominator. Mastering the Law of Total Probability essentially means you are halfway to mastering Bayesian inference. Given that Bayesian statistics, Bayesian machine learning, and probabilistic reasoning are among the most in-demand skills in data science today, this is not just academic — it is career-relevant. Bayesian inference is built directly on top of what you’ll learn here.
Who This Guide Is For
This guide is written for college and university students taking probability, statistics, or mathematics courses — including introductory stats courses at community college level, intermediate probability courses at four-year universities like the University of California system, Penn State, University of Michigan, the University of Edinburgh, or Imperial College London, and advanced courses at the graduate level. It is also directly relevant for students preparing for Society of Actuaries (SOA) Exam P, Casualty Actuarial Society (CAS) exams, the GRE Mathematics Subject Test, and data science technical interviews at companies like Google, Amazon, and Meta.
Formal Definition
What Is the Law of Total Probability? Definition, Notation, and Prerequisites
Let’s build the definition carefully. Before stating the Law of Total Probability, you need two prerequisite concepts: conditional probability and partitions of a sample space. Both are simple. Both are essential.
What Is Conditional Probability?
Conditional probability P(A | B) is the probability that event A occurs, given that event B has already occurred. Formally:
P(A | B) = P(A ∩ B) / P(B)
Valid only when P(B) > 0. Read as “the probability of A given B.”
This formula says: restrict your attention to the portion of the sample space where B has occurred, then ask what fraction of that portion also contains A. If you flip a coin twice, the probability of getting two heads given the first flip was heads is simply 1/2 — because knowing the first flip was heads restricts your view to just that half of the outcomes. Probability distributions and conditional probability are deeply connected concepts that reinforce each other throughout statistics coursework.
Rearranging the conditional probability formula gives the multiplication rule:
P(A ∩ B) = P(A | B) × P(B)
The probability of both A and B occurring equals the conditional probability of A given B, times the probability of B.
This multiplication rule is used directly inside the Law of Total Probability. Remember it — it is the engine of the theorem.
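To make the definition concrete, the two-coin example above can be checked by brute-force enumeration. This is a minimal sketch that assumes a fair coin, so all four outcomes are equally likely; the helper name `prob` is illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: four equally likely outcomes.
sample_space = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

first_heads = {o for o in sample_space if o[0] == "H"}    # event B
two_heads = {o for o in sample_space if o == ("H", "H")}  # event A

# Definition: P(A | B) = P(A ∩ B) / P(B)
p_a_given_b = prob(two_heads & first_heads) / prob(first_heads)

# Multiplication rule: P(A ∩ B) = P(A | B) * P(B)
p_joint = p_a_given_b * prob(first_heads)

print(p_a_given_b)  # 1/2
print(p_joint)      # 1/4
```

Exact fractions are used instead of floats so the results match the hand calculation digit for digit.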
What Is a Partition of a Sample Space?
A partition of sample space S is a collection of events B₁, B₂, …, Bₙ satisfying two conditions:
- Mutually exclusive: No two events in the partition can occur simultaneously. Bᵢ ∩ Bⱼ = ∅ for every pair i ≠ j.
- Collectively exhaustive: Together, the partition events cover every possible outcome. B₁ ∪ B₂ ∪ … ∪ Bₙ = S.
In a partition, every outcome in S belongs to exactly one partition event. Not zero — not two. Exactly one. The simplest partition possible is any event B and its complement Bᶜ. Those two events are always mutually exclusive (they cannot both happen) and collectively exhaustive (every outcome is either in B or in Bᶜ). This two-event partition is the most common setup in introductory probability problems.
Key intuition for partitions: Think of the sample space as a pie. A partition cuts the pie into slices that together cover the whole pie, with no overlap and no gaps. Each slice is a partition event Bᵢ. The partition condition ensures the slices are clean — no crust shared between slices, no part of the pie missing.
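The two partition conditions can be tested mechanically when the sample space is small enough to list. The sketch below uses a single roll of a six-sided die as a hypothetical sample space; `is_partition` is an illustrative helper, not a standard library function.

```python
from itertools import combinations

def is_partition(events, sample_space):
    """True if the events are pairwise disjoint and together cover sample_space."""
    mutually_exclusive = all(a.isdisjoint(b) for a, b in combinations(events, 2))
    collectively_exhaustive = set().union(*events) == set(sample_space)
    return mutually_exclusive and collectively_exhaustive

die = {1, 2, 3, 4, 5, 6}  # hypothetical sample space: one roll of a die

print(is_partition([{1, 2}, {3, 4}, {5, 6}], die))   # True
print(is_partition([{2, 4, 6}, {1, 3, 5}], die))     # True: an event and its complement
print(is_partition([{1, 2, 3}, {3, 4, 5, 6}], die))  # False: 3 belongs to two events
print(is_partition([{1, 2}, {3, 4}], die))           # False: 5 and 6 are not covered
```

The last two calls correspond to the two ways a candidate partition can fail: overlapping slices and a missing slice.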
The Law of Total Probability — Formal Statement
With conditional probability and partitions in hand, the Law of Total Probability follows directly. Let S be a sample space. Let B₁, B₂, …, Bₙ be a partition of S (mutually exclusive and collectively exhaustive), each with positive probability. Let A be any event in S. Then:
P(A) = Σ P(A | Bᵢ) × P(Bᵢ)
= P(A|B₁)·P(B₁) + P(A|B₂)·P(B₂) + … + P(A|Bₙ)·P(Bₙ)
Where {B₁, B₂, …, Bₙ} is a partition of S, and P(Bᵢ) > 0 for all i.
Read this formula in plain English: “The probability of A equals the sum, over every partition event, of the conditional probability of A given that partition event times the probability of that partition event.” You are breaking A into pieces — the piece that happens inside B₁, the piece inside B₂, and so on — computing each piece’s probability, and adding. The result is the total probability of A. Hypothesis testing and many other inferential procedures depend on this exact calculation of marginal probabilities.
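The formula translates almost word for word into code. The function below is one possible sketch: the name `total_probability`, its argument names, and the sample numbers are assumptions made for illustration, and the input checks mirror the conditions stated with the theorem.

```python
import math

def total_probability(priors, likelihoods):
    """Compute P(A) = sum over i of P(A | B_i) * P(B_i).

    priors      -- P(B_i) for each partition event; must be positive and sum to 1
    likelihoods -- P(A | B_i), listed in the same order as priors
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need one conditional probability per partition event")
    if not math.isclose(sum(priors), 1.0, abs_tol=1e-9):
        raise ValueError("partition probabilities must sum to 1")
    if any(p <= 0 for p in priors):
        raise ValueError("each P(B_i) must be positive")
    return sum(l * p for l, p in zip(likelihoods, priors))

# Hypothetical numbers, chosen only to exercise the function.
p_a = total_probability(priors=[0.25, 0.75], likelihoods=[0.4, 0.8])
print(round(p_a, 4))  # 0.7
```

Note that the sum-to-one check is only a necessary condition. It cannot detect events that overlap, so the mutual exclusivity of the Bᵢ still has to be argued from the problem itself.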
The Two-Event Special Case
When the partition has just two events — B and Bᶜ — the formula simplifies to the version most commonly seen in introductory courses:
P(A) = P(A|B)·P(B) + P(A|Bᶜ)·P(Bᶜ)
The two-event case: partition into any event B and its complement Bᶜ.
This two-event form appears in the vast majority of undergraduate probability textbooks, including those used at MIT, Stanford, Harvard, Oxford, and Cambridge. It is the go-to form for most exam problems at the introductory level, from AP Statistics through SOA Exam P. Once you are comfortable with this form, extending to three or more partition events is entirely mechanical.
Why Does the Formula Work? The Derivation
The Law of Total Probability is not an arbitrary formula — it follows from first principles. Here is the derivation, step by step:
Since {B₁, …, Bₙ} is a partition of S, every outcome in A must lie inside exactly one Bᵢ. So:
A = (A ∩ B₁) ∪ (A ∩ B₂) ∪ … ∪ (A ∩ Bₙ)
These intersections (A ∩ Bᵢ) are mutually exclusive, because the Bᵢ are mutually exclusive. So by the addition rule for mutually exclusive events:
P(A) = P(A ∩ B₁) + P(A ∩ B₂) + … + P(A ∩ Bₙ)
Now apply the multiplication rule to each term — P(A ∩ Bᵢ) = P(A | Bᵢ) × P(Bᵢ):
P(A) = P(A|B₁)P(B₁) + P(A|B₂)P(B₂) + … + P(A|Bₙ)P(Bₙ)
That is the Law of Total Probability, derived from just two ingredients: the definition of conditional probability and the additive property for mutually exclusive events. No magic — just careful accounting. Expected values and variance use related additive structures, so this derivation style recurs throughout probability coursework.
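Each step of the derivation can be verified on a small example. The sketch below assumes a fair six-sided die, a three-event partition, and the event "roll is greater than 3"; these choices are illustrative and not part of the derivation itself.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}     # one roll of a fair die (illustrative)
partition = [{1, 2}, {3, 4}, {5, 6}]  # B_1, B_2, B_3
A = {4, 5, 6}                         # event: the roll is greater than 3

def prob(event):
    return Fraction(len(event), len(sample_space))

# Step 1: A splits into the disjoint pieces A ∩ B_i.
pieces = [A & B for B in partition]
assert set().union(*pieces) == A

# Step 2: addition rule for mutually exclusive events.
sum_of_joints = sum(prob(piece) for piece in pieces)

# Step 3: multiplication rule, P(A ∩ B_i) = P(A | B_i) * P(B_i).
sum_of_products = sum((prob(A & B) / prob(B)) * prob(B) for B in partition)

print(prob(A), sum_of_joints, sum_of_products)  # 1/2 1/2 1/2
```

All three quantities agree, which is exactly what the chain of equalities in the derivation claims.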
Bayes’ Theorem Connection
How the Law of Total Probability Connects to Bayes’ Theorem
The Law of Total Probability and Bayes’ Theorem are closely linked. Unless P(A) is already known, you cannot apply Bayes’ Theorem without the Law of Total Probability, because it provides the normalizing constant that makes the posterior probability a legitimate probability. Understanding this connection transforms both tools from isolated formulas into a unified reasoning framework.
Bayes’ Theorem: The Formula
Bayes’ Theorem answers the inverse question from the Law of Total Probability. While total probability asks “given I know the causes (Bᵢ), what is the probability of the effect (A)?”, Bayes asks “given I observed the effect (A), what is the probability of each cause (Bᵢ)?”
P(Bᵢ | A) = [P(A | Bᵢ) × P(Bᵢ)] / P(A)
Where P(A) = Σ P(A|Bᵢ)P(Bᵢ) — computed using the Law of Total Probability
The denominator P(A) in Bayes’ Theorem is precisely what the Law of Total Probability computes. Without it, Bayes’ Theorem cannot be evaluated. This is why many textbooks — including those used in courses at the London School of Economics, University of Chicago, and Yale University — teach the Law of Total Probability immediately before Bayes’ Theorem. They are sequential building blocks. Bayes’ Theorem applications span medical diagnosis, spam filtering, forensic evidence, and scientific hypothesis testing.
Forward vs. Backward Probability Reasoning
⟶ Law of Total Probability (Forward)
Direction: Causes → Effect
Question: Given the conditional probabilities of A under each scenario Bᵢ, what is the overall probability P(A)?
Input: P(Bᵢ) for all i, P(A | Bᵢ) for all i
Output: P(A) — the marginal probability of the effect
Example: Given defect rates per factory line, what is the overall defect rate?
⟵ Bayes’ Theorem (Backward)
Direction: Effect → Causes
Question: Given that A occurred, what is the probability that it came from cause Bᵢ?
Input: P(Bᵢ), P(A | Bᵢ), and P(A) from total probability
Output: P(Bᵢ | A) — the posterior probability of each cause
Example: Given a product is defective, what is the probability it came from Line 2?
The Classic Medical Diagnosis Example
The most famous and pedagogically valuable application of the Bayes + Total Probability combination is medical testing. Suppose a disease affects 1% of a population. A diagnostic test has 95% sensitivity (P(positive | disease) = 0.95) and 90% specificity (P(negative | no disease) = 0.90, so P(positive | no disease) = 0.10).
First, use the Law of Total Probability to find P(positive test):
P(positive) = P(pos | disease)·P(disease) + P(pos | no disease)·P(no disease)
= (0.95)(0.01) + (0.10)(0.99) = 0.0095 + 0.099 = 0.1085
Then Bayes’ Theorem reveals the posterior probability that a positive test means you actually have the disease:
P(disease | positive) = (0.95 × 0.01) / 0.1085 ≈ 0.0876 ≈ 8.8%
Counterintuitively, a positive test for a rare disease implies only about an 8.8% probability of actually having the disease, even with a fairly accurate test. Overlooking the low prevalence and treating the positive result as near-certain is the base rate fallacy, and the correct figure cannot be calculated without the Law of Total Probability.
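The two-stage calculation above is short enough to script, which makes it easy to see how the answer shifts with prevalence. The variable names below are illustrative; the numbers are the ones given in the example.

```python
prevalence = 0.01    # P(disease)
sensitivity = 0.95   # P(positive | disease)
specificity = 0.90   # P(negative | no disease)

false_positive_rate = 1 - specificity  # P(positive | no disease)

# Law of Total Probability over the partition {disease, no disease}.
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)

# Bayes' Theorem, with p_positive as the denominator.
p_disease_given_positive = sensitivity * prevalence / p_positive

print(f"P(positive) = {p_positive:.4f}")                          # 0.1085
print(f"P(disease | positive) = {p_disease_given_positive:.4f}")  # 0.0876
```

Changing `prevalence` to a larger value and rerunning shows the posterior rising quickly, which is the base rate effect in action.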
Solved Examples
Law of Total Probability: Step-by-Step Solved Examples
The best way to understand the Law of Total Probability is through worked examples that vary in context and difficulty. The following examples progress from introductory to intermediate, covering the types of problems you will encounter in university probability courses, standardized exams, and professional applications. Each solution follows the same structured approach: identify the partition, gather probabilities, apply the formula.
Example 1: Two Boxes of Balls (Classic Introductory Problem)
Example 1 — Introductory
Problem Statement
Box A contains 3 red balls and 2 blue balls. Box B contains 1 red ball and 4 blue balls. A box is chosen at random (each with probability 1/2), and then one ball is drawn at random. What is the probability the drawn ball is red?
Step-by-Step Solution
Step 1 — Identify event A: A = {red ball drawn}
Step 2 — Identify the partition: B₁ = {Box A chosen}, B₂ = {Box B chosen}. These are mutually exclusive and exhaustive — each trial uses exactly one box.
Step 3 — P(Bᵢ): P(B₁) = 1/2, P(B₂) = 1/2
Step 4 — P(A | Bᵢ): P(red | Box A) = 3/5, P(red | Box B) = 1/5
Step 5 — Apply formula:
P(red) = P(red|Box A)·P(Box A) + P(red|Box B)·P(Box B)
= (3/5)(1/2) + (1/5)(1/2) = 3/10 + 1/10 = 4/10 = 0.40
Answer: The probability of drawing a red ball is 0.40 (40%).
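As a check on the arithmetic, the same calculation can be done with exact fractions. The dictionary layout is just one convenient way to hold the problem data.

```python
from fractions import Fraction

# (P(box chosen), P(red | box)) for each box, taken from the problem statement.
boxes = {
    "A": (Fraction(1, 2), Fraction(3, 5)),  # 3 red out of 5 balls
    "B": (Fraction(1, 2), Fraction(1, 5)),  # 1 red out of 5 balls
}

p_red = sum(p_box * p_red_given_box for p_box, p_red_given_box in boxes.values())

print(p_red)         # 2/5
print(float(p_red))  # 0.4
```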
Example 2: Three Factory Lines (Manufacturing Quality Control)
Example 2 — Intermediate
Problem Statement
A factory has three production lines. Line 1 produces 50% of all items with a 2% defect rate. Line 2 produces 30% with a 5% defect rate. Line 3 produces 20% with a 10% defect rate. An item is chosen at random. What is the probability it is defective?
Step-by-Step Solution
Step 1 — Event A: A = {item is defective}
Step 2 — Partition: B₁ = {from Line 1}, B₂ = {from Line 2}, B₃ = {from Line 3}. Mutually exclusive, collectively exhaustive (0.50 + 0.30 + 0.20 = 1.00 ✓).
Step 3 — P(Bᵢ): P(B₁) = 0.50, P(B₂) = 0.30, P(B₃) = 0.20
Step 4 — P(A | Bᵢ): P(defective | Line 1) = 0.02, P(defective | Line 2) = 0.05, P(defective | Line 3) = 0.10
Step 5 — Apply formula:
P(defective) = (0.02)(0.50) + (0.05)(0.30) + (0.10)(0.20)
= 0.010 + 0.015 + 0.020 = 0.045 = 4.5%
Answer: The probability that a randomly selected item is defective is 4.5%.
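A simulation offers an independent check on this answer: pick a production line according to its share of output, then decide whether the item is defective using that line's rate. The sketch below is a rough Monte Carlo estimate; the seed and the number of simulated items are arbitrary choices, and the estimate will only be close to 4.5%, not exactly equal to it.

```python
import random

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable

lines = [1, 2, 3]
output_share = [0.50, 0.30, 0.20]          # P(B_i)
defect_rate = {1: 0.02, 2: 0.05, 3: 0.10}  # P(defective | B_i)

n_items = 200_000
defective = 0
for _ in range(n_items):
    line = rng.choices(lines, weights=output_share)[0]  # choose the production line
    if rng.random() < defect_rate[line]:                # then decide if the item is defective
        defective += 1

estimate = defective / n_items
exact = sum(defect_rate[line] * share for line, share in zip(lines, output_share))

print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

The two-stage sampling in the loop follows the structure of the formula: the outer choice corresponds to P(Bᵢ) and the inner one to P(A | Bᵢ).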
Example 3: Student Exam Pass Rates (Three Preparation Levels)
Example 3 — Intermediate
Problem Statement
In a university statistics course, 20% of students study intensively and pass with probability 0.95, 50% study moderately and pass with probability 0.75, and 30% barely study and pass with probability 0.40. What is the probability that a randomly chosen student passes the exam?
Step-by-Step Solution
Partition: B₁ = intensive, B₂ = moderate, B₃ = minimal study. P(B₁) = 0.20, P(B₂) = 0.50, P(B₃) = 0.30.
Conditional pass probabilities: P(pass|B₁) = 0.95, P(pass|B₂) = 0.75, P(pass|B₃) = 0.40
Total Probability:
P(pass) = (0.95)(0.20) + (0.75)(0.50) + (0.40)(0.30)
= 0.190 + 0.375 + 0.120 = 0.685 = 68.5%
Answer: A randomly chosen student has a 68.5% probability of passing the exam.
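Another way to see the same result is to convert the probabilities into head counts. The sketch below assumes a hypothetical class of 1,000 students, a size chosen only so that every count comes out as a whole number.

```python
cohort = 1000  # hypothetical class size, chosen so every count is a whole number

groups = {
    # name: (share of students, pass probability)
    "intensive": (0.20, 0.95),
    "moderate": (0.50, 0.75),
    "minimal": (0.30, 0.40),
}

total_passing = 0
for name, (share, p_pass) in groups.items():
    students = round(cohort * share)
    passing = round(students * p_pass)
    total_passing += passing
    print(f"{name:<10} {students:>4} students, {passing:>4} expected to pass")

print(f"P(pass) = {total_passing}/{cohort} = {total_passing / cohort}")  # 685/1000 = 0.685
```

Counting expected passers group by group and dividing by the class size is the Law of Total Probability expressed in natural frequencies.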
Example 4: Insurance Risk Pooling (Actuarial Application)
Example 4 — Advanced / Actuarial
Problem Statement
An insurance company divides policyholders into three risk categories: Low risk (60% of policyholders, 5% claim probability), Medium risk (30%, 20% claim probability), High risk (10%, 60% claim probability). What is the overall probability that a randomly chosen policyholder files a claim?
Step-by-Step Solution
Partition: B₁ = Low, B₂ = Medium, B₃ = High risk, with P(B₁) = 0.60, P(B₂) = 0.30, P(B₃) = 0.10. These sum to 1.00 ✓.
P(claim):
= P(claim|Low)·P(Low) + P(claim|Med)·P(Med) + P(claim|High)·P(High)
= (0.05)(0.60) + (0.20)(0.30) + (0.60)(0.10)
= 0.030 + 0.060 + 0.060 = 0.150 = 15%
Answer: The overall claim probability across the portfolio is 15%. This type of calculation is foundational to actuarial pricing models — insurance companies use it to set portfolio-wide premiums and reserves.
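The intermediate products in this solution are joint probabilities, so dividing each one by the total shows which risk class the claims actually come from. This is a small extension of the example using Bayes' Theorem; the variable names are illustrative.

```python
portfolio = {
    # risk class: (P(class), P(claim | class))
    "Low": (0.60, 0.05),
    "Medium": (0.30, 0.20),
    "High": (0.10, 0.60),
}

# Each product is a joint probability P(claim and class).
joint = {cls: p_cls * p_claim for cls, (p_cls, p_claim) in portfolio.items()}
p_claim_total = sum(joint.values())  # Law of Total Probability

# Dividing each joint term by the total gives P(class | claim).
share_of_claims = {cls: j / p_claim_total for cls, j in joint.items()}

print(f"P(claim) = {p_claim_total:.3f}")  # 0.150
for cls, share in share_of_claims.items():
    print(f"P({cls} | claim) = {share:.2f}")  # 0.20, 0.40, 0.40
```

Under these numbers, high-risk policyholders are 10% of the portfolio but account for 40% of claims.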
Example 5: Email Spam Filtering (Machine Learning Application)
Example 5 — Data Science Application
Problem Statement
In an email inbox, 30% of emails are spam and 70% are legitimate. A spam filter correctly identifies spam 90% of the time and incorrectly flags legitimate email 5% of the time. What is the probability that a randomly chosen email gets flagged?
Step-by-Step Solution
Partition: B₁ = spam, B₂ = legitimate. P(B₁) = 0.30, P(B₂) = 0.70.
P(flagged):
= P(flagged|spam)·P(spam) + P(flagged|legit)·P(legit)
= (0.90)(0.30) + (0.05)(0.70)
= 0.270 + 0.035 = 0.305 = 30.5%
Answer: 30.5% of all emails get flagged. From here, Bayes’ Theorem would let you compute what fraction of flagged emails are actually spam — the precision of the filter. This is the computational core of the Naive Bayes classifier.
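The follow-up calculation mentioned in the answer, the precision of the filter, takes one more line once P(flagged) is known. The sketch below uses only the numbers from the problem statement.

```python
p_spam = 0.30
p_flag_given_spam = 0.90   # filter catches spam
p_flag_given_legit = 0.05  # filter wrongly flags legitimate mail

# Law of Total Probability: overall flag rate.
p_flagged = p_flag_given_spam * p_spam + p_flag_given_legit * (1 - p_spam)

# Bayes' Theorem: fraction of flagged emails that are actually spam.
precision = p_flag_given_spam * p_spam / p_flagged

print(f"P(flagged) = {p_flagged:.3f}")         # 0.305
print(f"P(spam | flagged) = {precision:.3f}")  # 0.885
```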
Step-by-Step Application Guide
How to Apply the Law of Total Probability: A Systematic Framework
Students who struggle with total probability problems typically fail at one of two points: identifying the correct partition, or gathering the right conditional probabilities. The following step-by-step framework eliminates both failure points by making the process explicit and systematic.
Step 1: Read the Problem and Identify Event A
Determine clearly which event’s probability you want to find. This is your A. Write it down explicitly before doing anything else. Common phrasings: “What is the probability that…?” or “Find P(A)” when the sample space has multiple scenarios.
Step 2: Identify a Valid Partition {B₁, B₂, …, Bₙ}
Look for a set of scenarios, categories, or causes that are mutually exclusive and cover all possibilities. Common partition structures: machine/factory choices, risk categories, disease status (present/absent), which group a person belongs to, weather states. Verify the partition probabilities sum to 1.
Step 3: Collect P(Bᵢ) for All Partition Events
These are the “weights”: the probabilities of each scenario occurring. They should be given in the problem or derivable from given information. Write them in a column. Confirm they sum to 1.00 before proceeding.
Step 4: Collect P(A | Bᵢ) for All Partition Events
These are the conditional probabilities of A given each scenario. They represent “if I know I’m in scenario Bᵢ, how likely is A?” Write them next to their corresponding P(Bᵢ) values.
Step 5: Compute Each Product P(A | Bᵢ) × P(Bᵢ)
For each partition event, multiply the conditional probability by the partition probability. Do each multiplication separately and write down the intermediate results. This step-by-step arithmetic is where most calculation errors occur.
Step 6: Sum the Products
Add all the products from Step 5. The result is P(A). This final sum should be a number between 0 and 1. If it is not, recheck your partition probabilities and your conditional probabilities.
Step 7: Sanity-Check the Answer
Does the answer make intuitive sense? The total probability P(A) should fall between the minimum and maximum of the conditional probabilities P(A | Bᵢ). If your conditional probabilities are 0.02, 0.05, and 0.10, your total probability must be between 0.02 and 0.10. This simple range check catches most formula errors instantly.
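The framework can be sketched as a small routine that prints each intermediate product and applies the range check automatically. The function name `worked_solution` and its tuple layout are assumptions made for illustration; the data are the factory numbers from Example 2.

```python
def worked_solution(scenarios):
    """scenarios: list of (label, P(B_i), P(A | B_i)) tuples."""
    weights = [p_b for _, p_b, _ in scenarios]
    conditionals = [p_a_b for _, _, p_a_b in scenarios]

    # Steps 2-3: the weights must describe a full partition.
    assert abs(sum(weights) - 1.0) < 1e-9, "P(B_i) must sum to 1"

    # Step 5: write out each product separately.
    products = [p_a_b * p_b for _, p_b, p_a_b in scenarios]
    for (label, p_b, p_a_b), term in zip(scenarios, products):
        print(f"{label:<8} {p_a_b:.3f} x {p_b:.3f} = {term:.4f}")

    # Step 6: sum the products.
    p_a = sum(products)

    # Step 7: P(A) is a weighted average, so it lies between the extremes.
    assert min(conditionals) - 1e-12 <= p_a <= max(conditionals) + 1e-12
    return p_a

p_defective = worked_solution([
    ("Line 1", 0.50, 0.02),
    ("Line 2", 0.30, 0.05),
    ("Line 3", 0.20, 0.10),
])
print(f"P(A) = {p_defective:.3f}")  # 0.045
```

Printing the products one per line mirrors the advice in Step 5: an arithmetic slip is much easier to spot in a column of terms than in a single long expression.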
Key Concepts, Notation & Comparisons
Essential Probability Concepts Related to the Law of Total Probability
The Law of Total Probability does not exist in isolation. It is embedded in a network of related probability concepts. The following table maps the core concepts, their notation, their definitions, and their relationship to total probability.
| Concept | Notation | Definition | Role in Total Probability |
|---|---|---|---|
| Sample Space | S | The set of all possible outcomes of a probability experiment | The partition {B₁,…,Bₙ} must cover all of S |
| Event | A, B, C | Any subset of the sample space S | A is the target event; Bᵢ are the partition events |
| Conditional Probability | P(A\|B) | P(A ∩ B) / P(B); probability of A given B occurred | Provides P(A\|Bᵢ): the inputs to the formula |
| Partition | {B₁,…,Bₙ} | Mutually exclusive, collectively exhaustive events that cover S | The structural foundation: required for the formula to hold |
| Marginal Probability | P(A) | The unconditional probability of event A across all scenarios | This is what the Law of Total Probability computes |
| Joint Probability | P(A ∩ B) | Probability that both A and B occur simultaneously | Each term P(A\|Bᵢ)P(Bᵢ) = P(A ∩ Bᵢ); summed to give P(A) |
| Prior Probability | P(Bᵢ) | Probability of each partition event before observing A | Weights in the total probability sum |
| Likelihood | P(A\|Bᵢ) | Probability of A given scenario Bᵢ | The conditional factors in each term of the total probability sum |
| Posterior Probability | P(Bᵢ\|A) | Probability of cause Bᵢ given effect A has been observed | Computed via Bayes’ Theorem using P(A) from total probability |
| Complement | Bᶜ or B̄ | All outcomes in S not in B; P(Bᶜ) = 1 − P(B) | B and Bᶜ form the simplest two-event partition |
The MIT OpenCourseWare Introduction to Probability course by Professors Dimitri Bertsekas and John Tsitsiklis is one of the most rigorous and accessible free resources for probability theory, covering the Law of Total Probability with exceptional clarity and depth.
Real-World Applications
Real-World Applications of the Law of Total Probability Across Fields
The Law of Total Probability is not just a textbook theorem. It is embedded in the computational infrastructure of multiple industries and scientific disciplines.
Medicine and Epidemiology
Medical researchers and clinicians use the Law of Total Probability constantly. Computing the prevalence of a condition across a diverse population involves weighting condition rates within demographic subgroups by the size of each subgroup — that is total probability. During the COVID-19 pandemic, computing overall population infection rates required weighting age-group-specific infection rates by the proportion of each age group in the population. That is the Law of Total Probability applied at national scale.
Finance and Actuarial Science
Insurance companies and financial risk managers use total probability to compute aggregate default probabilities, claim rates, and portfolio loss distributions. The Society of Actuaries (SOA) and Casualty Actuarial Society (CAS) include total probability problems explicitly in their Exam P (Probability) syllabus. Credit rating agencies like Moody’s and S&P Global use Bayesian and total probability frameworks to compute overall corporate default probabilities across credit rating categories.
Machine Learning and Artificial Intelligence
The Naive Bayes classifier uses the Law of Total Probability to compute the marginal probability of observed features in a dataset. Hidden Markov Models (HMMs), long a standard technique in speech recognition systems, use total probability in the forward algorithm. Any time a model involves latent variables (hidden states not directly observed), total probability is the mathematical mechanism for handling them.
Engineering — Reliability and Fault Analysis
Reliability engineering uses the Law of Total Probability to compute system failure probabilities when a system can fail via multiple different failure modes. This analysis is used by aerospace engineers at NASA and the European Space Agency (ESA), by nuclear safety engineers, and by software engineers doing fault tree analysis for safety-critical systems.
Genetics and Bioinformatics
Population genetics uses total probability to compute allele frequencies across mixed populations. Bioinformatics algorithms for gene sequence alignment and protein structure prediction use probabilistic models where total probability underpins the computation of model likelihoods.
Where You Will See This on Standardized Exams
The Law of Total Probability appears explicitly on: SOA Exam P (actuarial), AP Statistics, GRE Mathematics Subject Test, university probability finals, and data science technical interviews at companies including Google, Amazon, Meta, Microsoft, and Two Sigma. It is a core tool that examiners return to repeatedly.
Common Mistakes & How to Avoid Them
Common Mistakes Students Make With the Law of Total Probability
Mistake 1: Using an Invalid Partition
The most fundamental error is applying the formula with events that are not a valid partition. Students sometimes choose partition events that overlap (not mutually exclusive) or that miss some outcomes (not collectively exhaustive). Always verify that the partition probabilities sum to 1.00 before applying the formula, and check separately that no two events overlap: a sum of 1.00 is necessary but does not by itself prove the events form a partition.
Mistake 2: Confusing P(A | B) with P(B | A)
Reversing the conditional probability, that is, using P(B | A) when you need P(A | B), is known as confusion of the inverse (or the prosecutor’s fallacy in legal settings) and is closely related to the base rate fallacy. In the medical testing example, confusing “the probability of a positive test given disease” (0.95) with “the probability of disease given a positive test” (about 0.088, or 8.8%) produces catastrophically wrong conclusions.
Mistake 3: Ignoring Base Rates (Not Weighting by P(Bᵢ))
Some students correctly identify conditional probabilities but then average them without weighting — treating all partition events as equally likely when they are not. The weights P(Bᵢ) in the formula are not optional — they are what makes it a total probability calculation.
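The size of this error is easy to demonstrate with the factory data from Example 2. The sketch below compares the plain average of the conditional probabilities with the correctly weighted sum.

```python
from statistics import mean

weights = [0.50, 0.30, 0.20]       # P(B_i): share of output per line
defect_rates = [0.02, 0.05, 0.10]  # P(defective | B_i)

unweighted = mean(defect_rates)  # the mistake: treats every line as equally likely
weighted = sum(r * w for r, w in zip(defect_rates, weights))  # total probability

print(f"unweighted average: {unweighted:.4f}")  # 0.0567
print(f"weighted (correct): {weighted:.4f}")    # 0.0450
```

The unweighted figure overstates the defect rate here because it gives the small, high-defect Line 3 the same influence as Line 1, which produces half of all output.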
Mistake 4: Leaving a Scenario Out of the Partition
Sometimes students apply the total probability formula to events that do not form a complete partition. Be systematic: list every possible scenario and confirm none are omitted.
Mistake 5: Arithmetic Errors in the Final Sum
Write each product separately before summing. Use decimal notation consistently. Double-check that your final answer is between 0 and 1 — a result of P(A) = 1.15 is mathematically impossible.
The Four-Step Error Check: Before submitting any total probability calculation, verify: (1) Partition probabilities sum to 1.00. (2) All conditional probabilities are between 0 and 1. (3) You used P(A|Bᵢ) not P(Bᵢ|A) in each term. (4) Final answer P(A) is between 0 and 1, and between the minimum and maximum conditional probability values.
History & Key Entities
The History of the Law of Total Probability: Key Figures and Institutions
Andrey Nikolaevich Kolmogorov (1903–1987)
Andrey Kolmogorov, working at Moscow State University, published his foundational Grundbegriffe der Wahrscheinlichkeitsrechnung in 1933. This established the axiomatic framework — sample spaces, sigma-algebras, probability measures — on which all modern probability theory, including the Law of Total Probability, is formally grounded.
Thomas Bayes (1701–1761) and Richard Price
Thomas Bayes developed the essential insight connecting conditional probabilities in both directions. Bayes never published this work during his lifetime; his friend Richard Price presented the manuscript to the Royal Society of London in 1763. The paper, “An Essay towards solving a Problem in the Doctrine of Chances,” became one of the most cited works in the history of statistics.
Pierre-Simon Laplace (1749–1827)
Pierre-Simon Laplace independently rediscovered Bayes’ result and formalized it in far greater generality in his Théorie analytique des probabilités (1812). Laplace developed the complete Bayesian framework, including the use of prior probabilities weighted by likelihoods — the computational structure of the Law of Total Probability.
Key Institutions in Probability Education
| Institution | Country | Notable Contribution | Resources for Students |
|---|---|---|---|
| MIT | USA | OpenCourseWare probability courses; leading research in probabilistic AI | Free OCW materials; MIT 6.041 |
| Harvard University | USA | Stat 110 (Probability) by Joe Blitzstein — one of the most watched probability courses globally | Free YouTube lectures; Statistics 110 materials |
| Stanford University | USA | Leading probabilistic machine learning research; Coursera courses | Stanford Online; CS229 materials |
| University of Cambridge | UK | Statistics Laboratory; Part II and Part III Probability Tripos | Cambridge Statistical Laboratory lecture notes |
| University of Oxford | UK | Oxford Statistics Department; pioneering work in Bayesian inference | Oxford Probability lecture notes |
| Khan Academy | USA (global) | Accessible, free video explanations for learners at all levels | Free probability course at khanacademy.org |
Extensions & Related Topics
Extensions of the Law of Total Probability: Expectation, Variance, and Continuous Cases
Law of Total Expectation (Adam’s Law)
The Law of Total Expectation extends the total probability idea to expected values. For random variables X and Y:
E[X] = E[E[X | Y]]
The expected value of X equals the expected value of the conditional expected value of X given Y. Also called the Law of Iterated Expectations (LIE) or Adam’s Law.
In the discrete case, this becomes: E[X] = Σᵢ E[X | Y = yᵢ] × P(Y = yᵢ) — structurally identical to the Law of Total Probability with expectations replacing probabilities.
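The discrete form can be checked on a small example. The sketch below assumes X is the result of one fair die roll and Y is its parity; both choices are illustrative.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]  # X: result of one fair die roll (illustrative)

def parity(x):
    """Y: whether the roll is even or odd."""
    return "even" if x % 2 == 0 else "odd"

def mean(values):
    return Fraction(sum(values), len(values))

# Group the outcomes by the value of Y.
groups = {}
for x in outcomes:
    groups.setdefault(parity(x), []).append(x)

for y, xs in groups.items():
    print(y, mean(xs))  # odd 3, then even 4

# E[X] = sum over y of E[X | Y = y] * P(Y = y)
e_x = sum(mean(xs) * Fraction(len(xs), len(outcomes)) for xs in groups.values())

print(e_x, mean(outcomes))  # 7/2 7/2
```

The conditional means 3 and 4, each weighted by 1/2, recover the overall mean of 7/2.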
Law of Total Variance (Eve’s Law)
Var(X) = E[Var(X|Y)] + Var(E[X|Y])
Total variance = Expected conditional variance + Variance of conditional expectation. These represent “within-group variability” and “between-group variability.”
This decomposition is the foundation of Analysis of Variance (ANOVA) — one of the most widely used statistical methods in social science, medicine, and engineering research.
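The variance decomposition can be verified the same way. The sketch below again assumes a fair die grouped by parity, and uses population variance (dividing by n) because the die outcomes are the whole distribution, not a sample.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]                   # X: one fair die roll (illustrative)
groups = {"odd": [1, 3, 5], "even": [2, 4, 6]}  # outcomes grouped by Y = parity

def mean(values):
    return Fraction(sum(values), len(values))

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)  # population variance

weights = {y: Fraction(len(xs), len(outcomes)) for y, xs in groups.items()}

# E[Var(X | Y)]: average variability inside each group.
within = sum(weights[y] * variance(xs) for y, xs in groups.items())

# Var(E[X | Y]): variability of the group means around the overall mean.
overall_mean = sum(weights[y] * mean(xs) for y, xs in groups.items())
between = sum(weights[y] * (mean(xs) - overall_mean) ** 2 for y, xs in groups.items())

print(within, between, within + between, variance(outcomes))  # 8/3 1/4 35/12 35/12
```

Most of the variance in this example is within-group (8/3), with only 1/4 coming from the difference between the odd and even group means.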
Continuous Version of the Law of Total Probability
f_X(x) = ∫ f_{X|Y}(x | y) · f_Y(y) dy
Continuous Law of Total Probability: the marginal density of X at x equals the integral over y of the conditional density of X given Y = y, times the marginal density of Y.
This continuous form is foundational in Bayesian inference with continuous prior distributions, in mixture models used in clustering and density estimation, and in computing marginal likelihoods in statistical modeling.
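The integral can be approximated numerically and compared against a case where the answer is known. The sketch below assumes a specific model chosen for illustration: Y is standard normal and X given Y = y is normal with mean y and standard deviation 1, so the marginal of X is normal with mean 0 and variance 2. The integration limits and step count are arbitrary but generous.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def marginal_density(x, lo=-10.0, hi=10.0, steps=4000):
    """Approximate the integral of f(x | y) * f(y) over y with the trapezoid rule.

    Assumed model: Y ~ Normal(0, 1) and X | Y = y ~ Normal(y, 1).
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        total += weight * normal_pdf(x, y, 1.0) * normal_pdf(y, 0.0, 1.0)
    return total * h

# For this model the marginal of X is known in closed form: Normal(0, sqrt(2)).
x = 0.7
numeric = marginal_density(x)
closed_form = normal_pdf(x, 0.0, math.sqrt(2))
print(f"numeric: {numeric:.6f}, closed form: {closed_form:.6f}")
```

The sum inside `marginal_density` has the same shape as the discrete formula, with the density f(y) times the step width h playing the role of P(Bᵢ).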
Frequently Asked Questions
Frequently Asked Questions: Laws of Total Probability
What is the Law of Total Probability?
The Law of Total Probability states that if events B₁, B₂, …, Bₙ form a partition of the sample space S (mutually exclusive and collectively exhaustive), then the probability of any event A can be computed as P(A) = Σ P(A | Bᵢ) × P(Bᵢ). You break the sample space into non-overlapping pieces, compute the conditional probability of A given each piece, weight each by the probability of that piece, and sum. The result is the marginal probability P(A).
What is the difference between the Law of Total Probability and Bayes’ Theorem?
The Law of Total Probability computes P(A) by summing conditional probabilities over a partition — it goes from causes to effect. Bayes’ Theorem uses P(A) — computed via total probability — to calculate a posterior probability P(Bᵢ | A). Formally: P(Bᵢ | A) = [P(A | Bᵢ) × P(Bᵢ)] / P(A), where P(A) in the denominator is found using the Law of Total Probability. They are sequential: total probability provides the denominator that Bayes’ Theorem needs.
What is a partition of a sample space, and why does it matter?
A partition of sample space S is a collection of events B₁, B₂, …, Bₙ satisfying two conditions: mutually exclusive (no two events can occur simultaneously) and collectively exhaustive (together they cover all possible outcomes). Every outcome in the sample space belongs to exactly one partition event. The partition matters because it is the structural foundation of the Law of Total Probability — the formula is only valid when the conditioning events form a genuine partition.
Can the Law of Total Probability be used with more than two partition events?
Yes. The Law of Total Probability applies to any finite number of partition events, and it extends to countably infinite partitions as well. The formula P(A) = Σ P(A|Bᵢ)P(Bᵢ) extends naturally to 3, 4, 5, or any number of partition events. In actuarial applications, insurance portfolios might be segmented into 5 or more risk categories. The mathematics is identical regardless of the number of partition events; there are simply more terms in the sum.
What are the most common exam question types for the Law of Total Probability?
The most common types include: (1) Two-box or two-urn problems where a container is chosen randomly and then an item drawn. (2) Manufacturing/quality control problems with multiple production lines. (3) Medical testing problems computing overall positive test rates. (4) Insurance risk pooling problems. (5) Student performance problems with study group proportions. (6) Combined total probability + Bayes’ theorem problems. SOA Exam P, AP Statistics, and university probability finals all regularly include these question types.
How does the Law of Total Probability relate to marginal probability?
The Law of Total Probability is precisely the procedure for computing a marginal probability. A marginal probability P(A) is the unconditional probability of an event A, summed over all possible values of another variable. When the conditioning variable is discrete with a finite partition, the marginal probability is computed exactly by the Law of Total Probability: P(A) = Σ P(A|Bᵢ)P(Bᵢ). This is why statisticians use “marginalize” as a verb meaning “apply the Law of Total Probability.”
What is the Law of Total Expectation and how does it relate to total probability?
The Law of Total Expectation (Adam’s Law) states that E[X] = E[E[X|Y]] — the expected value of X equals the expected value of the conditional expected value of X given Y. This is the direct generalization of the Law of Total Probability to random variables. The Law of Total Probability computes P(A) = E[1_A] — the expected value of the indicator variable for event A. Adam’s Law extends this to any random variable X.
Why do students find the Law of Total Probability confusing, and how can I master it?
Students typically struggle for three reasons: (1) They cannot identify a valid partition in the problem context. (2) They confuse P(A|Bᵢ) with P(Bᵢ|A) — reversing the conditional probability. (3) They apply the formula without verifying the partition is valid. The most effective path to mastery is working through 20–30 varied practice problems until the partition-identification step becomes automatic. Use the structured approach: identify A, identify the partition, write down P(Bᵢ) and P(A|Bᵢ) in a table, multiply, sum.
