Laws of Total Probability
P(A) = Σ P(A|Bᵢ)·P(Bᵢ)
P(A ∩ B) = P(A|B)·P(B)
B₁ ∪ B₂ ∪ … ∪ Bₙ = S
Laws of Total Probability: A Complete Guide
The Law of Total Probability is one of the most powerful tools in probability theory — a bridge that lets you compute the probability of any event by systematically breaking the sample space into simpler, manageable pieces. Whether you are a statistics student at MIT, Harvard, Oxford, or any college or university, this theorem appears everywhere: in Bayesian inference, medical diagnosis models, quality control, machine learning, and actuarial science.
This guide covers everything from the formal definition and derivation to step-by-step solved examples across real-world contexts. You will understand exactly how the Law of Total Probability connects to conditional probability, Bayes’ Theorem, and probability partitions — the foundational trio of modern statistical reasoning.
We explain the theorem for both beginners encountering it for the first time and students preparing for probability exams, actuarial examinations (SOA/CAS), or data science technical interviews where total probability problems appear frequently. Every concept is built from the ground up.
By the end, you will be able to apply the formula confidently, identify partition structures in any problem, and understand why this theorem is indispensable in both theoretical statistics and applied data science across the United States, United Kingdom, and globally.
Introduction & Core Concept
Laws of Total Probability: Why Every Probability Student Needs to Master This Theorem
The Law of Total Probability answers a specific, powerful question: how do you find the probability of an event when the sample space is complex, but you can break it into simpler, well-understood pieces? That’s the theorem’s entire job — and it does it elegantly. Once you understand it, you will start seeing its structure in problems across medicine, engineering, finance, and machine learning.
Here is the core intuition before any formal notation. Imagine you want to know the probability of getting a defective product from a factory. The factory has three production lines. Each line has a known defect rate, and you know what fraction of total output comes from each line. You cannot compute the overall defect probability directly — but you can compute it line by line and add. That weighted sum is the Law of Total Probability in action. Understanding probability theory starts with exactly this kind of structured thinking.
Key facts at a glance:
- 1933: the year Andrey Kolmogorov formalized the modern probability axioms, grounding the Law of Total Probability in rigorous mathematics
- Fields using total probability: medicine, insurance, AI, engineering, finance, genetics, weather forecasting, and more
- P(A): what the theorem computes, the marginal probability of any event A from conditional probabilities over a partition
The Law of Total Probability is built on two even simpler ideas: conditional probability and partitions of a sample space. If you understand those two concepts, the theorem itself becomes almost obvious. This guide builds from those foundations upward, so nothing is assumed and nothing is glossed over.
The theorem is also inseparable from Bayes’ Theorem, which uses P(A) — computed via total probability — as its denominator. Mastering the Law of Total Probability essentially means you are halfway to mastering Bayesian inference. Given that Bayesian statistics, Bayesian machine learning, and probabilistic reasoning are among the most in-demand skills in data science today, this is not just academic — it is career-relevant. Bayesian inference is built directly on top of what you’ll learn here.
Who This Guide Is For
This guide is written for college and university students taking probability, statistics, or mathematics courses — including introductory stats courses at community college level, intermediate probability courses at four-year universities like the University of California system, Penn State, University of Michigan, the University of Edinburgh, or Imperial College London, and advanced courses at the graduate level. It is also directly relevant for students preparing for Society of Actuaries (SOA) Exam P, Casualty Actuarial Society (CAS) exams, the GRE Mathematics Subject Test, and data science technical interviews at companies like Google, Amazon, and Meta.
Formal Definition
What Is the Law of Total Probability? Definition, Notation, and Prerequisites
Let’s build the definition carefully. Before stating the Law of Total Probability, you need two prerequisite concepts: conditional probability and partitions of a sample space. Both are simple. Both are essential.
What Is Conditional Probability?
Conditional probability P(A | B) is the probability that event A occurs, given that event B has already occurred. Formally:
P(A | B) = P(A ∩ B) / P(B)
Valid only when P(B) > 0. Read as “the probability of A given B.”
This formula says: restrict your attention to the portion of the sample space where B has occurred, then ask what fraction of that portion also contains A. If you flip a coin twice, the probability of getting two heads given the first flip was heads is simply 1/2 — because knowing the first flip was heads restricts your view to just that half of the outcomes. Probability distributions and conditional probability are deeply connected concepts that reinforce each other throughout statistics coursework.
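The coin-flip claim is easy to verify by enumerating the sample space directly. A minimal Python sketch (the variable names are ours):

```python
from itertools import product

# Enumerate the sample space of two fair coin flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Event B: the first flip is heads. Event A ∩ B: both flips are heads.
b = [o for o in outcomes if o[0] == "H"]
a_and_b = [o for o in b if o == ("H", "H")]

# P(A | B) = P(A ∩ B) / P(B), computed as counts over equally likely outcomes.
p_a_given_b = len(a_and_b) / len(b)
print(p_a_given_b)  # 0.5
```

Restricting to the two outcomes where the first flip is heads, exactly one also has a second head, giving 1/2 as claimed.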
Rearranging the conditional probability formula gives the multiplication rule:
P(A ∩ B) = P(A | B) × P(B)
The probability of both A and B occurring equals the conditional probability of A given B, times the probability of B.
This multiplication rule is used directly inside the Law of Total Probability. Remember it — it is the engine of the theorem.
What Is a Partition of a Sample Space?
A partition of sample space S is a collection of events B₁, B₂, …, Bₙ satisfying two conditions:
- Mutually exclusive: No two events in the partition can occur simultaneously. B₁ ∩ B₂ = ∅, B₁ ∩ B₃ = ∅, and so on for all pairs.
- Collectively exhaustive: Together, the partition events cover every possible outcome. B₁ ∪ B₂ ∪ … ∪ Bₙ = S.
In a partition, every outcome in S belongs to exactly one partition event. Not zero — not two. Exactly one. The simplest partition possible is any event B and its complement Bᶜ. Those two events are always mutually exclusive (they cannot both happen) and collectively exhaustive (every outcome is either in B or in Bᶜ). This two-event partition is the most common setup in introductory probability problems.
Key intuition for partitions: Think of the sample space as a pie. A partition cuts the pie into slices that together cover the whole pie, with no overlap and no gaps. Each slice is a partition event Bᵢ. The partition condition ensures the slices are clean — no crust shared between slices, no part of the pie missing.
The Law of Total Probability — Formal Statement
With conditional probability and partitions in hand, the Law of Total Probability follows directly. Let S be a sample space. Let B₁, B₂, …, Bₙ be a partition of S (mutually exclusive and collectively exhaustive), each with positive probability. Let A be any event in S. Then:
P(A) = Σ P(A | Bᵢ) × P(Bᵢ)
= P(A|B₁)·P(B₁) + P(A|B₂)·P(B₂) + … + P(A|Bₙ)·P(Bₙ)
Where {B₁, B₂, …, Bₙ} is a partition of S, and P(Bᵢ) > 0 for all i.
Read this formula in plain English: “The probability of A equals the sum, over every partition event, of the conditional probability of A given that partition event times the probability of that partition event.” You are breaking A into pieces — the piece that happens inside B₁, the piece inside B₂, and so on — computing each piece’s probability, and adding. The result is the total probability of A. Hypothesis testing and many other inferential procedures depend on this exact calculation of marginal probabilities.
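The formula translates directly into code. A hedged sketch (the helper name `total_probability` is ours), using the three-production-line numbers from the factory example that appears later in this guide:

```python
def total_probability(priors, conditionals):
    """P(A) = Σ P(A | B_i) · P(B_i) over a partition {B_i}.

    priors: list of P(B_i); must sum to 1 (partition requirement).
    conditionals: list of P(A | B_i), in the same order.
    """
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("not a valid partition: P(B_i) do not sum to 1")
    return sum(c * p for c, p in zip(conditionals, priors))

# Three lines with output shares 50/30/20% and defect rates 2/5/10%.
print(round(total_probability([0.50, 0.30, 0.20], [0.02, 0.05, 0.10]), 6))  # 0.045
```

The partition-sum check mirrors the theorem's precondition: if the priors do not sum to 1, the events are not a partition and the formula does not apply.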
The Two-Event Special Case
When the partition has just two events — B and Bᶜ — the formula simplifies to the version most commonly seen in introductory courses:
P(A) = P(A|B)·P(B) + P(A|Bᶜ)·P(Bᶜ)
The two-event case: partition into any event B and its complement Bᶜ.
This two-event form appears in the vast majority of undergraduate probability textbooks, including those used at MIT, Stanford, Harvard, Oxford, and Cambridge. It is the go-to form for most exam problems at the introductory level, from AP Statistics through SOA Exam P. Once you are comfortable with this form, extending to three or more partition events is entirely mechanical.
Why Does the Formula Work? The Derivation
The Law of Total Probability is not an arbitrary formula — it follows from first principles. Here is the derivation, step by step:
Since {B₁, …, Bₙ} is a partition of S, every outcome in A must lie inside exactly one Bᵢ. So:
A = (A ∩ B₁) ∪ (A ∩ B₂) ∪ … ∪ (A ∩ Bₙ)
These intersections (A ∩ Bᵢ) are mutually exclusive, because the Bᵢ are mutually exclusive. So by the addition rule for mutually exclusive events:
P(A) = P(A ∩ B₁) + P(A ∩ B₂) + … + P(A ∩ Bₙ)
Now apply the multiplication rule to each term — P(A ∩ Bᵢ) = P(A | Bᵢ) × P(Bᵢ):
P(A) = P(A|B₁)P(B₁) + P(A|B₂)P(B₂) + … + P(A|Bₙ)P(Bₙ)
That is the Law of Total Probability, derived from just two ingredients: the definition of conditional probability and the additive property for mutually exclusive events. No magic — just careful accounting. Expected values and variance use related additive structures, so this derivation style recurs throughout probability coursework.
Bayes’ Theorem Connection
How the Law of Total Probability Connects to Bayes’ Theorem
The Law of Total Probability and Bayes’ Theorem are inseparable. You cannot correctly apply Bayes’ Theorem without the Law of Total Probability — it provides the normalizing constant that makes the posterior probability a legitimate probability. Understanding this connection transforms both tools from isolated formulas into a unified reasoning framework.
Bayes’ Theorem: The Formula
Bayes’ Theorem answers the inverse question from the Law of Total Probability. While total probability asks “given I know the causes (Bᵢ), what is the probability of the effect (A)?”, Bayes asks “given I observed the effect (A), what is the probability of each cause (Bᵢ)?”
P(Bᵢ | A) = [P(A | Bᵢ) × P(Bᵢ)] / P(A)
Where P(A) = Σ P(A|Bᵢ)P(Bᵢ) — computed using the Law of Total Probability
The denominator P(A) in Bayes’ Theorem is precisely what the Law of Total Probability computes. Without it, Bayes’ Theorem cannot be evaluated. This is why many textbooks — including those used in courses at the London School of Economics, University of Chicago, and Yale University — teach the Law of Total Probability immediately before Bayes’ Theorem. They are sequential building blocks. Bayes’ Theorem applications span medical diagnosis, spam filtering, forensic evidence, and scientific hypothesis testing.
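Computing all the posteriors at once makes the denominator's role explicit: the Law of Total Probability supplies the single number that normalizes every posterior. A sketch (the function name `bayes_posterior` is ours), reusing the factory numbers:

```python
def bayes_posterior(priors, likelihoods):
    """P(B_i | A) for every partition event B_i.

    priors: P(B_i); likelihoods: P(A | B_i), in the same order.
    """
    # Denominator: P(A) via the Law of Total Probability.
    p_a = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / p_a for l, p in zip(likelihoods, priors)]

# Given a defective item, which production line did it come from?
posteriors = bayes_posterior([0.50, 0.30, 0.20], [0.02, 0.05, 0.10])
print([round(p, 3) for p in posteriors])  # [0.222, 0.333, 0.444]
```

Because every term is divided by the same P(A), the posteriors sum to exactly 1, which is what makes them a legitimate probability distribution over the causes.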
Forward vs. Backward Probability Reasoning
⟶ Law of Total Probability (Forward)
Direction: Causes → Effect
Question: Given the conditional probabilities of A under each scenario Bᵢ, what is the overall probability P(A)?
Input: P(Bᵢ) for all i, P(A | Bᵢ) for all i
Output: P(A) — the marginal probability of the effect
Example: Given defect rates per factory line, what is the overall defect rate?
⟵ Bayes’ Theorem (Backward)
Direction: Effect → Causes
Question: Given that A occurred, what is the probability that it came from cause Bᵢ?
Input: P(Bᵢ), P(A | Bᵢ), and P(A) from total probability
Output: P(Bᵢ | A) — the posterior probability of each cause
Example: Given a product is defective, what is the probability it came from Line 2?
The distinction between forward and backward reasoning is conceptually crucial. Real-world probability problems are often stated in the “backward” form — you observe an outcome and want to reason about its probable cause. Thomas Bayes identified this inversion problem in his posthumously published 1763 essay, and Pierre-Simon Laplace later formalized and extended it. The Law of Total Probability makes the inversion mathematically tractable. Markov Chain Monte Carlo methods, which are central to modern Bayesian computation, depend on this same total probability framework at every step.
The Classic Medical Diagnosis Example
The most famous and pedagogically valuable application of the Bayes + Total Probability combination is medical testing. Suppose a disease affects 1% of a population. A diagnostic test has 95% sensitivity (P(positive | disease) = 0.95) and 90% specificity (P(negative | no disease) = 0.90, so P(positive | no disease) = 0.10).
First, use the Law of Total Probability to find P(positive test):
P(positive) = P(pos | disease)·P(disease) + P(pos | no disease)·P(no disease)
= (0.95)(0.01) + (0.10)(0.99) = 0.0095 + 0.099 = 0.1085
Then Bayes’ Theorem reveals the posterior probability that a positive test means you actually have the disease:
P(disease | positive) = (0.95 × 0.01) / 0.1085 ≈ 0.0876 ≈ 8.8%
Counterintuitively, a positive test for a rare disease only means ~8.8% probability of actually having the disease — even with a good test. This is the base rate fallacy, and it cannot be calculated without the Law of Total Probability.
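Both calculations can be verified in a few lines of Python (the variable names are ours):

```python
# Medical-testing numbers from the example above.
p_disease = 0.01
sensitivity = 0.95       # P(positive | disease)
false_positive = 0.10    # P(positive | no disease) = 1 - specificity

# Law of Total Probability: marginal probability of a positive test.
p_positive = sensitivity * p_disease + false_positive * (1 - p_disease)

# Bayes' Theorem: posterior probability of disease given a positive test.
p_disease_given_positive = sensitivity * p_disease / p_positive

print(round(p_positive, 4))                # 0.1085
print(round(p_disease_given_positive, 4))  # 0.0876
```

Notice that the false positives (0.099) swamp the true positives (0.0095) precisely because the disease is rare, which is the arithmetic heart of the base rate fallacy.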
This result consistently shocks students, and it shocks physicians too. Research published in the British Medical Journal has documented that many medical professionals struggle to reason correctly about diagnostic test probabilities without the Bayesian framework. Applied to clinical decision-making, the Law of Total Probability is a tool that can genuinely save lives.
Solved Examples
Law of Total Probability: Step-by-Step Solved Examples
The best way to understand the Law of Total Probability is through worked examples that vary in context and difficulty. The following examples progress from introductory to intermediate, covering the types of problems you will encounter in university probability courses, standardized exams, and professional applications. Each solution follows the same structured approach: identify the partition, gather probabilities, apply the formula. Probability assignments become far more manageable once this systematic approach is second nature.
Example 1: Two Boxes of Balls (Classic Introductory Problem)
Example 1 — Introductory
Problem Statement
Box A contains 3 red balls and 2 blue balls. Box B contains 1 red ball and 4 blue balls. A box is chosen at random (each with probability 1/2), and then one ball is drawn at random. What is the probability the drawn ball is red?
Step-by-Step Solution
Step 1 — Identify event A: A = {red ball drawn}
Step 2 — Identify the partition: B₁ = {Box A chosen}, B₂ = {Box B chosen}. These are mutually exclusive and exhaustive — each trial uses exactly one box.
Step 3 — P(Bᵢ): P(B₁) = 1/2, P(B₂) = 1/2
Step 4 — P(A | Bᵢ): P(red | Box A) = 3/5, P(red | Box B) = 1/5
Step 5 — Apply formula:
P(red) = P(red|Box A)·P(Box A) + P(red|Box B)·P(Box B)
= (3/5)(1/2) + (1/5)(1/2) = 3/10 + 1/10 = 4/10 = 0.40
Answer: The probability of drawing a red ball is 0.40 (40%).
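The same computation in exact rational arithmetic, using Python's `fractions` module, avoids any rounding:

```python
from fractions import Fraction

# Two-box problem: each box chosen with probability 1/2.
p_box = Fraction(1, 2)
p_red_given_a = Fraction(3, 5)  # Box A: 3 red of 5 balls
p_red_given_b = Fraction(1, 5)  # Box B: 1 red of 5 balls

# Law of Total Probability over the two-event partition {Box A, Box B}.
p_red = p_red_given_a * p_box + p_red_given_b * p_box
print(p_red)  # 2/5
```

`Fraction` keeps the answer as 3/10 + 1/10 = 4/10 = 2/5 exactly, which is convenient for checking textbook answers stated as fractions.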
Example 2: Three Factory Lines (Manufacturing Quality Control)
Example 2 — Intermediate
Problem Statement
A factory has three production lines. Line 1 produces 50% of all items with a 2% defect rate. Line 2 produces 30% with a 5% defect rate. Line 3 produces 20% with a 10% defect rate. An item is chosen at random. What is the probability it is defective?
Step-by-Step Solution
Step 1 — Event A: A = {item is defective}
Step 2 — Partition: B₁ = {from Line 1}, B₂ = {from Line 2}, B₃ = {from Line 3}. Mutually exclusive, collectively exhaustive (0.50 + 0.30 + 0.20 = 1.00 ✓).
Step 3 — P(Bᵢ): P(B₁) = 0.50, P(B₂) = 0.30, P(B₃) = 0.20
Step 4 — P(A | Bᵢ): P(defective | Line 1) = 0.02, P(defective | Line 2) = 0.05, P(defective | Line 3) = 0.10
Step 5 — Apply formula:
P(defective) = (0.02)(0.50) + (0.05)(0.30) + (0.10)(0.20)
= 0.010 + 0.015 + 0.020 = 0.045 = 4.5%
Answer: The probability that a randomly selected item is defective is 4.5%.
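A Monte Carlo simulation is a good cross-check on a calculation like this: simulate the two-stage experiment many times and compare the empirical defect rate with the theoretical 4.5%. A sketch (the setup and names are ours):

```python
import random

random.seed(42)  # fixed seed for reproducibility

# Three production lines: (name, share of output, defect rate).
lines = [("Line 1", 0.50, 0.02), ("Line 2", 0.30, 0.05), ("Line 3", 0.20, 0.10)]

def simulate_item():
    # Stage 1: pick a line with probability equal to its output share.
    _, _, defect_rate = random.choices(lines, weights=[l[1] for l in lines])[0]
    # Stage 2: the item is defective with that line's defect rate.
    return random.random() < defect_rate

trials = 100_000
defect_fraction = sum(simulate_item() for _ in range(trials)) / trials
print(round(defect_fraction, 3))  # should be close to 0.045
```

The simulation mirrors the theorem's structure exactly: the weighted line choice supplies the P(Bᵢ) factors and the per-line defect draw supplies the P(A | Bᵢ) factors.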
Example 3: Student Exam Passage (Three Preparation Levels)
Example 3 — Intermediate
Problem Statement
In a university statistics course, 20% of students studied intensively (passed with probability 0.95), 50% studied moderately (passed with probability 0.75), and 30% barely studied (passed with probability 0.40). What is the probability a randomly chosen student passes the exam?
Step-by-Step Solution
Partition: B₁ = intensive, B₂ = moderate, B₃ = minimal study. P(B₁) = 0.20, P(B₂) = 0.50, P(B₃) = 0.30.
Conditional pass probabilities: P(pass|B₁) = 0.95, P(pass|B₂) = 0.75, P(pass|B₃) = 0.40
Total Probability:
P(pass) = (0.95)(0.20) + (0.75)(0.50) + (0.40)(0.30)
= 0.190 + 0.375 + 0.120 = 0.685 = 68.5%
Answer: A randomly chosen student has a 68.5% probability of passing the exam. This is a weighted average of pass rates, weighted by the proportion of students in each study category.
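The weighted-average interpretation can be made concrete in code (the dictionary layout is ours); note the built-in range check:

```python
# Exam-passage example: (share of students, pass probability) per study group.
groups = {
    "intensive": (0.20, 0.95),
    "moderate":  (0.50, 0.75),
    "minimal":   (0.30, 0.40),
}

# Total probability: pass rates weighted by group proportions.
p_pass = sum(share * p for share, p in groups.values())
print(round(p_pass, 3))  # 0.685

# Sanity check: a weighted average must lie between the extreme pass rates.
rates = [p for _, p in groups.values()]
assert min(rates) <= p_pass <= max(rates)
```

The assertion encodes the weighted-average property stated in the answer: P(pass) can never fall outside the range of the conditional pass rates.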
Example 4: Insurance Risk Pooling (Actuarial Application)
Example 4 — Advanced / Actuarial
Problem Statement
An insurance company divides policyholders into three risk categories: Low risk (60% of policyholders, 5% claim probability), Medium risk (30%, 20% claim probability), High risk (10%, 60% claim probability). What is the overall probability that a randomly chosen policyholder files a claim?
Step-by-Step Solution
Partition: B₁ = Low, B₂ = Medium, B₃ = High risk. Sums to 1.00 ✓.
P(claim):
= P(claim|Low)·P(Low) + P(claim|Med)·P(Med) + P(claim|High)·P(High)
= (0.05)(0.60) + (0.20)(0.30) + (0.60)(0.10)
= 0.030 + 0.060 + 0.060 = 0.150 = 15%
Answer: The overall claim probability across the portfolio is 15%. This type of calculation is foundational to actuarial pricing models and predictive modeling — insurance companies use it to set portfolio-wide premiums and reserves. The Society of Actuaries (SOA) Exam P includes problems of exactly this structure.
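The same portfolio calculation, written as a small table in Python (the layout is ours):

```python
# Insurance risk pooling: (class, share of policyholders, claim probability).
risk_classes = [
    ("low",    0.60, 0.05),
    ("medium", 0.30, 0.20),
    ("high",   0.10, 0.60),
]

# Law of Total Probability: weight each class's claim rate by its share.
p_claim = sum(share * p for _, share, p in risk_classes)
print(round(p_claim, 3))  # 0.15
```

Laid out this way, the calculation scales mechanically to any number of risk classes, which is how portfolio-level rates are assembled in practice.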
Example 5: Email Spam Filtering (Machine Learning Application)
Example 5 — Data Science Application
Problem Statement
In an email inbox, 30% of emails are spam and 70% are legitimate. A spam filter correctly identifies spam 90% of the time (P(flagged | spam) = 0.90) and incorrectly flags legitimate email 5% of the time (P(flagged | legitimate) = 0.05). What is the probability that a randomly chosen email gets flagged?
Step-by-Step Solution
Partition: B₁ = spam, B₂ = legitimate. P(B₁) = 0.30, P(B₂) = 0.70.
P(flagged):
= P(flagged|spam)·P(spam) + P(flagged|legit)·P(legit)
= (0.90)(0.30) + (0.05)(0.70)
= 0.270 + 0.035 = 0.305 = 30.5%
Answer: 30.5% of all emails get flagged. From here, Bayes’ Theorem would let you compute what fraction of flagged emails are actually spam — the precision of the filter. This is the computational core of the Naive Bayes classifier, one of the oldest and still widely used machine learning algorithms for text classification in the US and UK tech industry. Decision theory builds directly on this probabilistic reasoning framework.
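The flagged rate and the filter's precision can be computed together, showing how total probability feeds directly into Bayes' Theorem (the variable names are ours):

```python
# Spam filter numbers from the example above.
p_spam, p_legit = 0.30, 0.70
p_flag_spam = 0.90   # P(flagged | spam)
p_flag_legit = 0.05  # P(flagged | legitimate)

# Total probability: overall flagged rate.
p_flagged = p_flag_spam * p_spam + p_flag_legit * p_legit

# Bayes' Theorem: precision = P(spam | flagged).
precision = p_flag_spam * p_spam / p_flagged

print(round(p_flagged, 3))  # 0.305
print(round(precision, 3))  # 0.885
```

So about 88.5% of flagged emails are actually spam, the follow-up computation the answer alludes to.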
Step-by-Step Application Guide
How to Apply the Law of Total Probability: A Systematic Framework
Students who struggle with total probability problems typically fail at one of two points: identifying the correct partition, or gathering the right conditional probabilities. The following step-by-step framework eliminates both failure points by making the process explicit and systematic. Follow this every time until it becomes automatic. Choosing the right statistical tool for a problem is itself a learned skill, and the Law of Total Probability has a very clear set of trigger conditions.
1
Read the Problem and Identify Event A
Determine clearly which event’s probability you want to find. This is your A. Write it down explicitly before doing anything else. Many students get confused because they start computing before clearly defining what they are computing. Common phrasings that signal you need total probability: “What is the probability that…?” or “Find P(A)” when the sample space has multiple scenarios.
2
Identify a Valid Partition {B₁, B₂, …, Bₙ}
Look for a set of scenarios, categories, or causes that are mutually exclusive and cover all possibilities. Common partition structures: machine/factory choices, risk categories, disease status (present/absent), which group a person belongs to, weather states. Verify the partition probabilities sum to 1. If they do not, your partition is invalid.
3
Collect P(Bᵢ) for All Partition Events
These are the “weights” — the probabilities of each scenario occurring. They should be given in the problem or derivable from given information. Write them in a column. Confirm they sum to 1.00 before proceeding. Sampling distributions and weighted probability concepts reinforce why these weights matter — they represent the relative frequency of each partition event in the population.
4
Collect P(A | Bᵢ) for All Partition Events
These are the conditional probabilities of A given each scenario. They represent “if I know I’m in scenario Bᵢ, how likely is A?” Write them next to their corresponding P(Bᵢ) values. These are typically given directly in the problem (“the defect rate in Line 1 is 2%”) or can be computed from given frequencies.
5
Compute Each Product P(A | Bᵢ) × P(Bᵢ)
For each partition event, multiply the conditional probability by the partition probability. Do each multiplication separately and write down the intermediate results. This step-by-step arithmetic is where most calculation errors occur — taking it one product at a time, written clearly, eliminates almost all arithmetic mistakes.
6
Sum the Products
Add all the products from Step 5. The result is P(A). This final sum should be a number between 0 and 1. If it is not, recheck your partition probabilities (they must sum to 1) and your conditional probabilities (each must be between 0 and 1). Then recheck your arithmetic.
7
Sanity-Check the Answer
Does the answer make intuitive sense? The total probability P(A) should fall between the minimum and maximum of the conditional probabilities P(A | Bᵢ). If your conditional probabilities are 0.02, 0.05, and 0.10, your total probability must be between 0.02 and 0.10. If you get 0.25, something is wrong. This simple range check catches most formula errors instantly. Type I and Type II errors in hypothesis testing also involve this kind of probability range checking as a basic sanity measure.
Key Concepts, Notation & Comparisons
Essential Probability Concepts Related to the Law of Total Probability
The Law of Total Probability does not exist in isolation. It is embedded in a network of related probability concepts. The following table maps the core concepts, their notation, their definitions, and their relationship to total probability — giving you a comprehensive reference for exams and coursework. Random variables and the probability concepts below are the building blocks of all higher statistical analysis.
| Concept | Notation | Definition | Role in Total Probability |
|---|---|---|---|
| Sample Space | S | The set of all possible outcomes of a probability experiment | The partition {B₁,…,Bₙ} must cover all of S |
| Event | A, B, C | Any subset of the sample space S | A is the target event; Bᵢ are the partition events |
| Conditional Probability | P(A|B) | P(A ∩ B) / P(B); probability of A given B occurred | Provides P(A|Bᵢ) — the inputs to the formula |
| Partition | {B₁,…,Bₙ} | Mutually exclusive, collectively exhaustive events that cover S | The structural foundation — required for the formula to hold |
| Marginal Probability | P(A) | The unconditional probability of event A across all scenarios | This is what the Law of Total Probability computes |
| Joint Probability | P(A ∩ B) | Probability that both A and B occur simultaneously | Each term P(A|Bᵢ)P(Bᵢ) = P(A ∩ Bᵢ); summed to give P(A) |
| Prior Probability | P(Bᵢ) | Probability of each partition event before observing A (Bayesian language) | Weights in the total probability sum |
| Likelihood | P(A|Bᵢ) | Probability of A given scenario Bᵢ (Bayesian language) | The conditional factors in each term of the total probability sum |
| Posterior Probability | P(Bᵢ|A) | Probability of cause Bᵢ given effect A has been observed | Computed via Bayes’ Theorem using P(A) from total probability |
| Complement | Bᶜ or B̄ | All outcomes in S not in B; P(Bᶜ) = 1 − P(B) | B and Bᶜ form the simplest two-event partition |
Common Notation Across Textbooks and Courses
Different probability textbooks use different notation, which can be confusing when switching between courses or studying from multiple sources. The summation in the Law of Total Probability may be written as Σᵢ P(A|Bᵢ)P(Bᵢ), and the conditional probability P(A|B) is sometimes written P(A given B) in informal contexts. The partition events may be labeled B₁, B₂, B₃ or E₁, E₂, E₃ or H₁, H₂, H₃ (H for hypotheses in Bayesian contexts). The mathematics is identical regardless of notation. Descriptive vs. inferential statistics also use overlapping notation that benefits from this kind of conceptual mapping.
The MIT OpenCourseWare Introduction to Probability course by Professors Dimitri Bertsekas and John Tsitsiklis is one of the most rigorous and accessible free resources for probability theory, covering the Law of Total Probability with exceptional clarity and depth. It is freely available and used by students worldwide.
Real-World Applications
Real-World Applications of the Law of Total Probability Across Fields
The Law of Total Probability is not just a textbook theorem. It is embedded in the computational infrastructure of multiple industries and scientific disciplines. The following applications show how the theorem operates outside the classroom — in the US, UK, and globally.
Medicine and Epidemiology
Medical researchers and clinicians use the Law of Total Probability constantly, even when they do not call it by name. Computing the prevalence of a condition across a diverse population involves weighting condition rates within demographic subgroups by the size of each subgroup — that is total probability. Computing the overall sensitivity of a diagnostic test applied across multiple disease variants is total probability. Epidemiology at institutions like the Harvard T.H. Chan School of Public Health, the London School of Hygiene and Tropical Medicine, and the Centers for Disease Control and Prevention (CDC) uses these calculations continuously in disease surveillance and public health modeling.
A concrete example: during the COVID-19 pandemic, computing overall population infection rates required weighting age-group-specific infection rates by the proportion of each age group in the population. That is the Law of Total Probability applied at national scale. Causal inference and randomized controlled trials use closely related probability weighting frameworks in their analytical foundations.
Finance and Actuarial Science
Insurance companies and financial risk managers use total probability to compute aggregate default probabilities, claim rates, and portfolio loss distributions. The Society of Actuaries (SOA) and Casualty Actuarial Society (CAS) — the primary actuarial credentialing bodies in the United States — include total probability problems explicitly in their Exam P (Probability) syllabus. Credit rating agencies like Moody’s and S&P Global use Bayesian and total probability frameworks to compute overall corporate default probabilities across credit rating categories. Confidence intervals and probability-based risk quantification in finance are built on this same foundation.
Machine Learning and Artificial Intelligence
The Naive Bayes classifier — one of the simplest and most powerful machine learning algorithms — uses the Law of Total Probability to compute the marginal probability of observed features in a dataset. Hidden Markov Models (HMMs), used in speech recognition systems at Apple (Siri), Google (Google Assistant), and Amazon (Alexa), use total probability in the forward algorithm to compute observation probabilities. Gaussian Mixture Models, used in clustering and density estimation, marginalize over component memberships using total probability. Any time a model involves latent variables — hidden states that are not directly observed — total probability is the mathematical mechanism for handling them.
The Journal of Machine Learning Research publishes extensive work on probabilistic machine learning models where total probability is foundational. Researchers at institutions like MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), the University of Oxford, and DeepMind in London work extensively with probabilistic frameworks built on total probability. Even regularization has a probabilistic interpretation (as a prior in a Bayesian formulation) that rests on these same foundations.
Engineering — Reliability and Fault Analysis
Reliability engineering uses the Law of Total Probability to compute system failure probabilities when a system can fail via multiple different failure modes. Each failure mode is a partition event; the conditional probability of system failure given each failure mode is combined via total probability to get the overall system failure rate. This analysis is used by aerospace engineers at NASA and the European Space Agency (ESA), by nuclear safety engineers at facilities regulated by the Nuclear Regulatory Commission (NRC) in the US and the Office for Nuclear Regulation (ONR) in the UK, and by software engineers doing fault tree analysis for safety-critical systems. Factor analysis and related multivariate methods in engineering diagnostics also invoke total probability reasoning.
Genetics and Bioinformatics
Population genetics uses total probability to compute allele frequencies across mixed populations. Computing the probability of a particular genotype in an offspring requires conditioning on each possible parental genotype combination — that is total probability. Bioinformatics algorithms for gene sequence alignment and protein structure prediction use probabilistic models where total probability underpins the computation of model likelihoods. Research at institutions like the Wellcome Sanger Institute in Cambridge, UK, and the Broad Institute of MIT and Harvard in Cambridge, Massachusetts, deploys these probabilistic frameworks at genomic scale.
Where You Will See This on Standardized Exams
The Law of Total Probability appears explicitly on: SOA Exam P (actuarial), AP Statistics (conditional probability and independence section), GRE Mathematics Subject Test (probability section), university probability final exams at virtually every institution offering a probability course, and data science technical interviews at companies including Google, Amazon, Meta, Microsoft, and Two Sigma. It is not an obscure theorem — it is a core tool that examiners return to repeatedly.
Common Mistakes & How to Avoid Them
Common Mistakes Students Make With the Law of Total Probability
Even students who understand the formula conceptually make systematic errors when applying the Law of Total Probability in exam conditions. Knowing what can go wrong — and why — is the fastest path to eliminating errors. Systematic error identification is a principle that applies across academic disciplines, and statistics is no exception.
Mistake 1: Using an Invalid Partition
The most fundamental error is applying the formula with events that are not a valid partition. Students sometimes choose partition events that overlap (not mutually exclusive) or that miss some outcomes (not collectively exhaustive). Both errors produce wrong answers. Always verify: (1) the events cannot occur simultaneously and (2) the partition probabilities sum to 1.00 before applying the formula. If P(B₁) + P(B₂) + … + P(Bₙ) ≠ 1, stop and recheck your partition. Probability theory fundamentals emphasize this partition validity requirement as the bedrock of the theorem.
Mistake 2: Confusing P(A | B) with P(B | A)
Reversing the conditional probability — using P(B | A) when you need P(A | B) — is called the base rate fallacy or the prosecutor’s fallacy. It is extraordinarily common in introductory probability coursework and in real-world reasoning. In the medical testing example earlier: confusing “the probability of a positive test given disease” (0.95) with “the probability of disease given a positive test” (8.8%) produces catastrophically wrong conclusions. Before inserting a conditional probability into the formula, always ask: which event is before the bar and which is after? Misuse of statistics often traces back to exactly this confusion between P(A|B) and P(B|A).
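The gap between the two conditionals is easy to verify numerically. The sketch below uses illustrative figures (1% prevalence and a 10% false-positive rate are assumptions chosen to reproduce the 0.95 and roughly 8.8% quoted above):

```python
# Assumed illustrative numbers: prevalence 1%, sensitivity 95%,
# false-positive rate 10% (the latter two are not stated in the text above).
p_disease = 0.01
p_pos_given_disease = 0.95      # P(+ | D): sensitivity
p_pos_given_healthy = 0.10      # P(+ | not D): false-positive rate

# Law of Total Probability: overall chance of a positive test
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' Theorem: the REVERSED conditional, P(D | +)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(f"P(+ | D) = {p_pos_given_disease:.3f}")   # 0.950
print(f"P(D | +) = {p_disease_given_pos:.3f}")   # 0.088 — not 0.95!
```

The two numbers differ by an order of magnitude, which is exactly why confusing them is catastrophic.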
Mistake 3: Ignoring Base Rates (Not Weighting by P(Bᵢ))
Some students correctly identify conditional probabilities but then average them without weighting — treating all partition events as equally likely when they are not. If 90% of policyholders are low risk and 10% are high risk, you cannot compute the overall claim rate as the simple average of the low-risk and high-risk rates. You must weight by 0.90 and 0.10 respectively. The weights P(Bᵢ) in the formula are not optional — they are what makes it a total probability calculation rather than a conditional one. Statistical power analysis also requires careful attention to base rates and proportions — the same conceptual discipline needed here.
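A quick numerical sketch of the weighting mistake, using the 90/10 split above with hypothetical claim rates (2% for low risk, 20% for high risk; these rates are not from the text):

```python
# Hypothetical claim rates, assumed for illustration
p_claim_given_low = 0.02
p_claim_given_high = 0.20
p_low, p_high = 0.90, 0.10      # partition weights from the example above

# Wrong: unweighted average treats both groups as equally likely
wrong = (p_claim_given_low + p_claim_given_high) / 2

# Right: Law of Total Probability weights each rate by its group size
right = p_claim_given_low * p_low + p_claim_given_high * p_high

print(f"Unweighted average (wrong): {wrong:.3f}")   # 0.110
print(f"Total probability (right):  {right:.3f}")   # 0.038
```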
Mistake 4: Applying the Formula When Events Are Not a Partition
Sometimes students apply the total probability formula to events that are related to A but are not a complete partition of the sample space. For example, conditioning on “the weather is sunny” and “the weather is cloudy” works if those are the only two weather states. But if rain is also possible and is excluded, the formula gives a wrong answer. Be systematic: list every possible scenario and confirm none are omitted.
Mistake 5: Arithmetic Errors in the Final Sum
The mechanics of the formula are simple multiplication and addition, but under exam pressure, arithmetic errors are common. Write each product separately before summing. Use decimal notation consistently (not a mix of fractions and decimals). Double-check that your final answer is between 0 and 1. A result of P(A) = 1.15 is mathematically impossible — catch it before handing in your work. Proofreading strategies for academic work apply equally to mathematical work: checking your arithmetic is a necessary step, not an optional one.
The Four-Step Error Check: Before submitting any total probability calculation, verify: (1) Partition probabilities sum to 1.00. (2) All conditional probabilities are between 0 and 1. (3) You used P(A|Bᵢ) not P(Bᵢ|A) in each term. (4) Final answer P(A) is between 0 and 1, and between the minimum and maximum conditional probability values.
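The four-step check can be mechanized. Below is a minimal sketch (the function name and structure are my own) that enforces checks (1), (2), and (4); check (3), using P(A|Bᵢ) rather than P(Bᵢ|A), is a modeling decision that code cannot verify for you:

```python
def total_probability(priors, conditionals, tol=1e-9):
    """Compute P(A) = sum of P(A|Bi) * P(Bi), with the error checks above."""
    # (1) Partition probabilities must sum to 1
    if abs(sum(priors) - 1.0) > tol:
        raise ValueError("Partition probabilities do not sum to 1")
    # (2) All conditional probabilities must lie in [0, 1]
    if any(not 0.0 <= c <= 1.0 for c in conditionals):
        raise ValueError("A conditional probability is outside [0, 1]")
    # (3) Cannot be checked mechanically: confirm each entry is P(A|Bi)
    p_a = sum(c * p for c, p in zip(conditionals, priors))
    # (4) P(A) must lie between the min and max conditional probabilities
    assert min(conditionals) - tol <= p_a <= max(conditionals) + tol
    return p_a

# Example: three production lines with different defect rates
print(total_probability([0.5, 0.3, 0.2], [0.01, 0.02, 0.05]))   # ≈ 0.021
```

Check (4) is automatic here because P(A) is a weighted average of the conditionals, so a violation signals an arithmetic slip upstream.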
History & Key Entities
The History of the Law of Total Probability: Key Figures and Institutions
The Law of Total Probability did not appear fully formed from a single mathematician’s pen. It developed over centuries, with contributions from multiple mathematicians across Europe whose work on probability theory built on each other. Understanding the historical development connects the theorem to its intellectual context — and reveals why it is structured the way it is.
Andrey Nikolaevich Kolmogorov (1903–1987)
Andrey Kolmogorov, a Soviet mathematician working primarily at Moscow State University, published his foundational Grundbegriffe der Wahrscheinlichkeitsrechnung (Foundations of the Theory of Probability) in 1933. This work established the axiomatic framework — sample spaces, sigma-algebras, probability measures — on which all modern probability theory, including the Law of Total Probability, is formally grounded. Before Kolmogorov, probability lacked a rigorous mathematical foundation. After him, probability became a branch of mathematics as rigorous as analysis or algebra. The Law of Total Probability is a theorem within Kolmogorov’s axiomatic system, derivable from his three axioms plus the definition of conditional probability. Probability theory as a whole rests on Kolmogorov’s 1933 foundation.
Thomas Bayes (1701–1761) and Richard Price
Thomas Bayes, an English statistician and Presbyterian minister, developed the essential insight connecting conditional probabilities in both directions — what we now recognize as Bayes’ Theorem, which depends directly on total probability. Bayes never published this work during his lifetime. It was his friend Richard Price who found the manuscript and presented it to the Royal Society of London in 1763, two years after Bayes’ death. The paper, “An Essay towards solving a Problem in the Doctrine of Chances,” became one of the most cited works in the history of statistics. Without the total probability calculation as denominator, Bayes’ insight would have been mathematically incomplete.
Pierre-Simon Laplace (1749–1827)
Pierre-Simon Laplace, a French mathematician and astronomer working in Paris, independently rediscovered Bayes’ result and formalized it in far greater generality in his Théorie analytique des probabilités (1812). Laplace developed what we now recognize as the complete Bayesian framework, including the use of prior probabilities weighted by likelihoods — the computational structure of the Law of Total Probability. His application of probabilistic reasoning to celestial mechanics, demographics, and legal evidence represented the first systematic applications of what we are now studying. Many historians of probability argue that Laplace, not Bayes, deserves the primary credit for developing Bayesian probability — but naming conventions being what they are, Bayes retains the attribution.
Key US and UK Institutions in Probability Education Today
The following institutions are central to probability education and research in the United States and United Kingdom:
| Institution | Country | Notable Contribution to Probability | Resources for Students |
|---|---|---|---|
| Massachusetts Institute of Technology (MIT) | USA | MIT OpenCourseWare probability courses by Bertsekas & Tsitsiklis; leading research in probabilistic AI | Free OCW materials; MIT 6.041 Probabilistic Systems |
| Harvard University | USA | Harvard Statistics Department; Stat 110 (Probability) by Joe Blitzstein — one of the most watched probability courses globally | Free YouTube lectures; Statistics 110 materials online |
| Stanford University | USA | Leading probabilistic machine learning research; Coursera probability and statistics courses | Stanford Online courses; CS229 Machine Learning materials |
| University of Cambridge | UK | Statistics Laboratory; Part II and Part III Probability Tripos — among the world’s most mathematically demanding probability curricula | Cambridge Statistical Laboratory lecture notes (publicly available) |
| University of Oxford | UK | Oxford Statistics Department; MSc in Statistical Science; pioneering work in Bayesian inference and uncertainty quantification | Oxford Probability lecture notes; departmental resources |
| Khan Academy | USA (global) | Accessible, free video explanations of conditional probability and Bayes’ Theorem for learners at all levels | Free probability course at khanacademy.org |
Joe Blitzstein’s Harvard Statistics 110: Probability course is freely available and covers the Law of Total Probability with exceptional depth and clarity. Blitzstein’s “LOTP” (Law of Total Probability) and “Adam’s Law” / “Eve’s Law” extensions for expectation and variance are widely used frameworks that extend the total probability concept into the broader probability toolkit. Expected values and variance connect directly to these law-of-total-expectation extensions.
Extensions & Related Topics
Extensions of the Law of Total Probability: Expectation, Variance, and Continuous Cases
The Law of Total Probability has natural extensions into continuous probability and into expected value calculations. These extensions — the Law of Total Expectation and the Law of Total Variance — appear in more advanced probability and statistics courses and are directly relevant for students moving from introductory probability into mathematical statistics, Bayesian analysis, or actuarial science.
Law of Total Expectation (Adam’s Law)
The Law of Total Expectation extends the total probability idea to expected values. For random variables X and Y:
E[X] = E[E[X | Y]]
The expected value of X equals the expected value of the conditional expected value of X given Y. This is also called the Law of Iterated Expectations (LIE) or Adam’s Law.
In the discrete case, this becomes: E[X] = Σᵢ E[X | Y = yᵢ] × P(Y = yᵢ) — which is structurally identical to the Law of Total Probability with expectations replacing probabilities. This connection makes intuitive sense: the Law of Total Probability is computing the expected value of the indicator random variable 1_A. Adam’s Law generalizes to any random variable X. Expected values and variance in statistics courses build directly on this framework.
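Adam's Law is easy to confirm numerically on a toy joint distribution (the numbers below are made up for illustration):

```python
from collections import defaultdict

# Toy joint distribution P(X = x, Y = y); probabilities sum to 1
joint = {
    (0, 'a'): 0.1, (1, 'a'): 0.3,
    (0, 'b'): 0.2, (2, 'b'): 0.4,
}

# Direct route: E[X] straight from the joint distribution
e_x = sum(x * p for (x, _), p in joint.items())

# Iterated route: compute E[X | Y = y] for each y, then weight by P(Y = y)
p_y, num = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    p_y[y] += p
    num[y] += x * p
cond_exp = {y: num[y] / p_y[y] for y in p_y}       # E[X | Y = y]
iterated = sum(cond_exp[y] * p_y[y] for y in p_y)  # Adam's Law

print(e_x, iterated)   # both ≈ 1.1
```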
Law of Total Variance (Eve’s Law)
The Law of Total Variance decomposes the variance of a random variable X into two components:
Var(X) = E[Var(X|Y)] + Var(E[X|Y])
Total variance = Expected conditional variance + Variance of conditional expectation. The two components represent “within-group variability” and “between-group variability.”
This decomposition is the foundation of Analysis of Variance (ANOVA) — one of the most widely used statistical methods in social science, medicine, and engineering research in both the US and UK. The idea that total variability can be partitioned into “variability within groups” and “variability between groups” is the Law of Total Variance in action. MANOVA (Multivariate Analysis of Variance) extends this partitioning to multiple outcome variables simultaneously.
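A small numerical check of the decomposition, using two made-up groups of equally likely values:

```python
# y: (equally likely X values within the group, P(Y = y)) — made-up data
groups = {'a': ([1, 3], 0.4), 'b': ([2, 6, 10], 0.6)}

def mean(xs):
    return sum(xs) / len(xs)

def pvar(xs):   # population variance of equally likely values
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

within = sum(w * pvar(xs) for xs, w in groups.values())          # E[Var(X|Y)]
cond_means = [(mean(xs), w) for xs, w in groups.values()]
grand_mean = sum(m * w for m, w in cond_means)                   # E[X]
between = sum(w * (m - grand_mean) ** 2 for m, w in cond_means)  # Var(E[X|Y])

# Direct total variance of X from the full mixture: Var(X) = E[X^2] - E[X]^2
e_x2 = sum(w * mean([x * x for x in xs]) for xs, w in groups.values())
total = e_x2 - grand_mean ** 2

print(within, between, total)   # within + between == total (≈ 6.8 + 3.84 = 10.64)
```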
Continuous Version of the Law of Total Probability
When the partition is continuous rather than discrete — when the conditioning variable Y takes continuous values rather than a finite set — the sum in the Law of Total Probability becomes an integral:
f(x) = ∫ f(x|y) · f(y) dy
Continuous Law of Total Probability: the marginal density f(x) equals the integral over y of the conditional density f(x|y) times the marginal density f(y).
This continuous form is foundational in Bayesian inference with continuous prior distributions, in mixture models used in clustering and density estimation, and in computing marginal likelihoods in statistical modeling. Probability density functions and their marginal, conditional, and joint relationships are built on this continuous generalization of the total probability concept. Research published in the journal Biometrika — one of the oldest and most prestigious statistics journals, founded in 1901 — regularly features work building on this continuous total probability framework.
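The continuous form can be sanity-checked by numerical integration. Assuming a standard toy model (Y ~ N(0,1) and X | Y = y ~ N(y,1), so that X is marginally N(0, √2)), a midpoint Riemann sum over y should recover the known marginal density:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def marginal_fx(x, n=4000, lo=-8.0, hi=8.0):
    """f(x) = integral of f(x|y) * f(y) dy, via a midpoint Riemann sum."""
    dy = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * dy
        total += normal_pdf(x, y, 1.0) * normal_pdf(y, 0.0, 1.0) * dy
    return total

x = 0.7
approx = marginal_fx(x)                     # continuous total probability
exact = normal_pdf(x, 0.0, math.sqrt(2.0))  # known marginal: N(0, sqrt(2))
print(approx, exact)   # agree to many decimal places
```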
Relationship to the Central Limit Theorem
The Central Limit Theorem (CLT) — which states that the sum of many independent random variables converges to a normal distribution — can be understood and proved using tools closely related to the Law of Total Probability. Conditioning on the value of intermediate sums, applying total expectation, and using moment-generating functions in the conditional framework all invoke the total probability structure at various points in the CLT’s proof. The Central Limit Theorem is the gateway to statistical inference, and total probability reasoning supports its mathematical scaffolding.
Frequently Asked Questions
Frequently Asked Questions: Laws of Total Probability
What is the Law of Total Probability?
The Law of Total Probability states that if events B₁, B₂, …, Bₙ form a partition of the sample space S (mutually exclusive and collectively exhaustive), then the probability of any event A can be computed as P(A) = Σ P(A | Bᵢ) × P(Bᵢ). In plain terms: you break the sample space into non-overlapping pieces, compute the conditional probability of A given each piece, weight each by the probability of that piece, and sum. The result is the marginal probability P(A). It is one of the foundational tools in probability theory, appearing in medical statistics, actuarial science, machine learning, and every field where complex events must be computed from simpler conditional probabilities.
What is the difference between the Law of Total Probability and Bayes’ Theorem?
The Law of Total Probability computes P(A) by summing conditional probabilities over a partition of the sample space — it goes from causes to effect. Bayes’ Theorem uses P(A) — computed via total probability — to calculate a posterior probability P(Bᵢ | A) — it goes from the observed effect back to the probable cause. Formally: P(Bᵢ | A) = [P(A | Bᵢ) × P(Bᵢ)] / P(A), where P(A) in the denominator is found using the Law of Total Probability. They are sequential: total probability provides the denominator that Bayes’ Theorem needs. Together, they form the complete Bayesian reasoning framework — forward from priors and likelihoods to marginal probability, then backward from observation to posterior probability.
What is a partition of a sample space, and why does it matter?
A partition of sample space S is a collection of events B₁, B₂, …, Bₙ satisfying two conditions: mutually exclusive (no two events can occur simultaneously) and collectively exhaustive (together they cover all possible outcomes). Every outcome in the sample space belongs to exactly one partition event. The partition matters because it is the structural foundation of the Law of Total Probability — the formula is only valid when the conditioning events form a genuine partition. If the events overlap or miss some outcomes, the formula produces wrong answers. Always verify that partition probabilities sum to 1.00 before applying the formula. The simplest partition is any event B and its complement Bᶜ.
Can the Law of Total Probability be used with more than two partition events?
Yes, absolutely. The Law of Total Probability applies to any finite number of partition events. The formula P(A) = Σ P(A|Bᵢ)P(Bᵢ) extends naturally to 3, 4, 5, or any number of partition events. In actuarial applications, insurance portfolios might be segmented into 5 or more risk categories. In manufacturing quality control, multiple production lines (potentially many) contribute to the overall defect rate. In Bayesian machine learning, mixture models may have dozens of components. The mathematics is identical regardless of the number of partition events — more terms in the sum, but the same structure. Make sure all partition probabilities still sum to 1.00 as the number of events grows.
What are the most common exam question types for the Law of Total Probability?
The most common exam question types involving the Law of Total Probability include: (1) Two-box or two-urn problems where a container is chosen randomly and then an item drawn. (2) Manufacturing/quality control problems with multiple production lines and different defect rates. (3) Medical testing problems computing overall positive test rates using disease prevalence and test sensitivity/specificity. (4) Insurance risk pooling problems computing aggregate claim probabilities across risk categories. (5) Student performance problems computing overall pass rates weighted by study group proportions. (6) Combined total probability + Bayes’ theorem problems where total probability is computed first and then used to find a posterior probability. SOA Exam P, AP Statistics, and university probability finals all regularly include these question types. Recognizing the structure — identify A, identify the partition, gather P(Bᵢ) and P(A|Bᵢ), compute — is the key skill.
How does the Law of Total Probability relate to marginal probability?
The Law of Total Probability is precisely the procedure for computing a marginal probability. A marginal probability P(A) is the unconditional probability of an event A, summed or integrated over all possible values of another variable. When the conditioning variable is discrete with a finite partition, the marginal probability is computed exactly by the Law of Total Probability: P(A) = Σ P(A|Bᵢ)P(Bᵢ). In a joint probability table, computing a row or column total (margin) is the discrete equivalent of this formula. In continuous settings, marginalizing over a conditioning variable means integrating the joint density — the continuous extension of total probability. This is why statisticians and data scientists use “marginalize” as a verb meaning “apply the Law of Total Probability.”
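Computing margins from a joint table is exactly this formula in action. A minimal sketch with a made-up 2×3 joint probability table:

```python
# Made-up joint probability table P(A_row, B_col); entries sum to 1
table = [
    [0.10, 0.20, 0.05],   # row A1
    [0.15, 0.30, 0.20],   # row A2
]

# Row margins P(A_i): sum each row across columns — total probability over B
row_margins = [sum(row) for row in table]        # ≈ [0.35, 0.65]
# Column margins P(B_j): sum each column down the rows
col_margins = [sum(col) for col in zip(*table)]  # ≈ [0.25, 0.50, 0.25]

print(row_margins, col_margins)
```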
What is the Law of Total Expectation and how does it relate to total probability?
The Law of Total Expectation (also called Adam’s Law or the Law of Iterated Expectations) states that E[X] = E[E[X|Y]] — the expected value of X equals the expected value of the conditional expected value of X given Y. This is the direct generalization of the Law of Total Probability to random variables. The Law of Total Probability computes P(A) = E[1_A] — the expected value of the indicator variable for event A — using the partition structure. Adam’s Law extends this to any random variable X. Both laws say: “compute the quantity of interest within each partition cell, weight by the probability of each cell, and sum.” The math is identical; only the quantity being computed differs (a probability vs. an expected value).
Why do students find the Law of Total Probability confusing, and how can I master it?
Students typically struggle with the Law of Total Probability for three reasons: (1) They cannot identify a valid partition in the problem context — the setup is described in words and they must translate it into B₁, B₂, …, Bₙ. (2) They confuse P(A|Bᵢ) with P(Bᵢ|A) — reversing the conditional probability. (3) They apply the formula without verifying the partition is valid (probabilities must sum to 1). The most effective path to mastery is working through 20–30 varied practice problems until the partition-identification step becomes automatic. Use the structured approach: identify A, identify the partition, write down P(Bᵢ) and P(A|Bᵢ) in a table, multiply, sum. The formula itself is not complex — pattern recognition in problem setup is the real skill. Every extra problem you work makes the next one faster and more reliable.
