Expected Values and Variance
Statistics & Probability Guide
The complete student guide — formulas, discrete & continuous variables, step-by-step worked examples, covariance, standard deviation, and the Law of Large Numbers. Everything you need to master E(X) and Var(X) for university.
The Foundation
Expected Values and Variance: Why Every Statistics Student Must Master These
Expected values and variance are the backbone of probability: without them, little else in statistics makes sense. The moment you start studying random variables seriously, these two measures become your primary tools for describing, comparing, and reasoning about distributions. They also appear in nearly every statistics assignment you will write, so getting them right isn’t optional.
The idea of expected value stretches back to the mid-17th century. Blaise Pascal and Pierre de Fermat exchanged letters in 1654 working through the classic “problem of points” — how to fairly split gambling stakes if a game was interrupted. That correspondence birthed the formal theory of probability expectation. Centuries later, Andrei Kolmogorov at Moscow State University formalized the modern measure-theoretic framework that underpins how both expected value and variance are rigorously defined today.
Today, expected value and variance appear everywhere: in the actuarial tables that set your insurance premiums, in the portfolio models that manage your pension, in the loss functions that train neural networks, and in the A/B tests that decide which version of a webpage you see.
μ
Symbol for Expected Value (population mean) — the long-run average of a random variable
σ²
Symbol for Variance — the average squared deviation from the mean
σ
Symbol for Standard Deviation — the square root of variance, in the same units as X
This article focuses on building the deepest possible conceptual understanding alongside practical calculation skills. Formulas only go so far; the goal is for you to know why these formulas exist and when to use each property — because that’s what your professors and future employers are actually testing.
Core Concept
What Is Expected Value? Definition and Intuition
Expected value — also called expectation, mathematical expectation, mean, or first moment — is a way of answering the question: “If I ran this random experiment an infinite number of times, what would the average outcome be?” It doesn’t tell you what will happen in any single trial. It tells you what to expect on average across many trials. That distinction is crucial.
Formally, the expected value of a random variable X, denoted E(X) or μ, is a measure of the central tendency of its probability distribution — the mean value that the variable would take if the experiment were repeated many times. For a discrete random variable, this is the weighted average of all possible outcomes, where each outcome is weighted by its probability.
Expected Value of Discrete Random Variables
Formula — Expected Value (Discrete)
E(X) = Σ xᵢ · P(X = xᵢ)
Sum over all possible values xᵢ. Multiply each value by its probability P(X = xᵢ), then add everything up. All probabilities must sum to 1.
Worked Example 1
Rolling a Fair Six-Sided Die
Each face shows 1–6 with equal probability 1/6.
E(X) = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6)
E(X) = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 21/6 = 3.5
Notice: 3.5 is not a face on the die. Expected value doesn’t have to be a possible outcome — it’s the long-run average. Roll the die 10,000 times; your average will be very close to 3.5.
Worked Example 2
Insurance Claim Payout
An insurer pays out $0 (prob. 0.90), $1,000 (prob. 0.07), or $10,000 (prob. 0.03) on a policy.
E(Payout) = 0·(0.90) + 1,000·(0.07) + 10,000·(0.03)
E(Payout) = 0 + 70 + 300 = $370 per policy
This is why insurers charge premiums above $370 — to cover operating costs and maintain profit. Expected value directly prices insurance risk.
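The two discrete examples above can be checked with a short Python sketch. The helper name `expected_value` is made up for illustration and does not come from any library; exact fractions are used so the "probabilities sum to 1" check is not disturbed by floating-point rounding.

```python
from fractions import Fraction

def expected_value(values, probs):
    """Weighted average of outcomes: sum of x_i * P(X = x_i)."""
    if sum(probs) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))

# Worked Example 1: fair six-sided die
die_values = [1, 2, 3, 4, 5, 6]
die_probs = [Fraction(1, 6)] * 6
die_mean = expected_value(die_values, die_probs)

# Worked Example 2: insurance payout
payouts = [0, 1000, 10000]
payout_probs = [Fraction(90, 100), Fraction(7, 100), Fraction(3, 100)]
payout_mean = expected_value(payouts, payout_probs)

print(float(die_mean), float(payout_mean))
```

Swapping in a different list of values and probabilities is enough to reuse the helper for other discrete problems.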
Expected Value of Continuous Random Variables
Formula — Expected Value (Continuous)
E(X) = ∫_{-∞}^{∞} x · f(x) dx
f(x) is the probability density function. The integral is the continuous analog of the discrete weighted average. The support may be a subset of ℝ — integrate only where f(x) > 0.
Worked Example 3
Uniform Distribution on [0, 4]
X ~ Uniform(0, 4). The PDF is f(x) = 1/4 for 0 ≤ x ≤ 4, and 0 otherwise.
E(X) = ∫₀⁴ x · (1/4) dx = (1/4) · [x²/2]₀⁴ = (1/4) · (16/2) = (1/4) · 8 = 2
The expected value is the midpoint (0 + 4)/2 = 2. For any Uniform(a, b) distribution, E(X) = (a + b)/2.
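When an integral is awkward to do by hand, a numerical check can catch setup mistakes. The sketch below uses a plain midpoint rule from the standard library only; `integrate_midpoint` and `uniform_pdf` are illustrative names, and the step count is an arbitrary choice rather than a recommendation.

```python
def integrate_midpoint(func, lower, upper, steps=10_000):
    """Approximate the integral of func on [lower, upper] with the midpoint rule."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width

def uniform_pdf(x, a=0.0, b=4.0):
    """Density of Uniform(a, b): constant 1/(b - a) on the support, 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

# The density should integrate to 1 over its support.
total_prob = integrate_midpoint(uniform_pdf, 0.0, 4.0)

# E(X) is the integral of x * f(x) over the support.
mean_x = integrate_midpoint(lambda x: x * uniform_pdf(x), 0.0, 4.0)

print(total_prob, mean_x)
```

The integration limits are set to the support [0, 4], which matches the advice to integrate only where f(x) > 0.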
In statistics, the sample mean serves as an estimate for the expectation, and is itself a random variable. The sample mean is considered to meet the desirable criterion for a “good” estimator in being unbiased — that is, the expected value of the estimate equals the true value of the underlying parameter.
Key Rules
Properties of Expected Value: What Every Student Must Know
The real power of expected value comes from its properties, not just its definition. These properties let you decompose complicated expectations into simpler pieces, and they’re tested extensively in university statistics courses.
Linearity of Expectation
Linearity Properties
E(aX + b) = a·E(X) + b (for constants a, b ∈ ℝ)
E(X + Y) = E(X) + E(Y) (for ANY random variables X and Y)
The second property holds regardless of whether X and Y are independent. This is what makes linearity so powerful — you don’t need independence to add expectations.
Worked Example 4
Using Linearity to Simplify
Suppose E(X) = 5. Find E(3X + 7).
E(3X + 7) = 3·E(X) + 7 = 3·(5) + 7 = 15 + 7 = 22
No distribution. No probabilities. Just linearity. This is how expectation works — the algebra is clean.
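To see that additivity really does not need independence, the sketch below takes X as a fair die roll and Y = X², which is completely determined by X. The helper `expect` is a hypothetical name introduced here for illustration.

```python
from fractions import Fraction

# PMF of a fair die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(g, pmf):
    """E[g(X)] for a discrete PMF given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

e_x = expect(lambda x: x, die)                # E(X)
e_y = expect(lambda x: x ** 2, die)           # E(Y) with Y = X^2
e_sum = expect(lambda x: x + x ** 2, die)     # E(X + Y), computed directly
e_affine = expect(lambda x: 3 * x + 7, die)   # E(3X + 7), computed directly

print(e_sum == e_x + e_y, e_affine == 3 * e_x + 7)
```

Both comparisons hold even though X and Y are as dependent as two variables can be.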
Expected Value of Independent Variables
Independence Property
If X and Y are independent: E(XY) = E(X) · E(Y)
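This does NOT hold in general. If X and Y are dependent, E(XY) need not equal E(X)·E(Y), although it still can when the variables are dependent but uncorrelated. Applying the product rule without checking independence is one of the most common errors students make.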
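A small check of the product rule, assuming two fair dice. In the independent case the joint probability of each pair is 1/36; in the dependent case Y is simply set equal to X, a choice made here purely for illustration.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)

e_x = sum(x * p for x in faces)  # E(X) = E(Y) for a fair die

# Independent dice: P(X = x, Y = y) = (1/6) * (1/6)
e_xy_independent = sum(x * y * p * p for x, y in product(faces, faces))

# Fully dependent case: Y = X, so XY = X^2
e_xy_dependent = sum(x * x * p for x in faces)

print(e_xy_independent == e_x * e_x)  # product rule holds
print(e_xy_dependent == e_x * e_x)    # product rule fails here
```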
LOTUS: Law of the Unconscious Statistician
LOTUS (Law of the Unconscious Statistician)
Discrete: E[g(X)] = Σ g(xᵢ) · P(X = xᵢ)
Continuous: E[g(X)] = ∫ g(x) · f(x) dx
This is how we compute E(X²) — a critical step in the variance calculation. Set g(X) = X², apply LOTUS directly.
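The point of LOTUS is that the distribution of g(X) never has to be derived. The sketch below computes E(X²) both ways for a hypothetical X that is uniform on {−2, −1, 0, 1, 2}: once with LOTUS, and once by first building the PMF of Y = X².

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical PMF: X uniform on five integer values.
pmf_x = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}

def g(x):
    return x ** 2

# LOTUS: weight g(x) by the ORIGINAL probabilities of X.
lotus = sum(g(x) * p for x, p in pmf_x.items())

# Long way: derive the PMF of Y = g(X), then take its mean.
pmf_y = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_y[g(x)] += p          # x = -1 and x = 1 both land on y = 1, etc.
long_way = sum(y * p for y, p in pmf_y.items())

print(lotus, long_way)
```

Both routes agree; LOTUS just skips the bookkeeping of merging values of X that map to the same y.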
Exam Strategy: Recognizing When to Use Each Property
When an exam problem says “find E(2X – 3Y + 5)”, use linearity: E(2X – 3Y + 5) = 2E(X) – 3E(Y) + 5. When it says “X and Y are independent, find E(XY)”, use the independence property: E(XY) = E(X)·E(Y). When it says “find E(X²)” or “find E(sin X)”, use LOTUS with the original distribution of X.
Measuring Spread
Variance: What It Is, Why It Matters, and How to Compute It
Variance measures how spread out a distribution is around its expected value. Two distributions can have identical expected values but completely different variances. If you’re choosing between two investment strategies with the same expected return, the one with lower variance is less risky.
The Definitional and Computational Formulas
Variance — Two Equivalent Formulas
Definitional: Var(X) = E[(X – μ)²]
Computational: Var(X) = E(X²) – [E(X)]²
Both give the same answer. The computational formula is generally faster — compute E(X²) using LOTUS, then subtract the square of E(X). Always use the computational formula for exam speed.
Why Are They Equal?
Expanding: Var(X) = E[(X − μ)²] = E[X² − 2μX + μ²] = E(X²) − 2μ·E(X) + μ². Since μ = E(X), this simplifies to E(X²) − 2[E(X)]² + [E(X)]² = E(X²) − [E(X)]².
Worked Example 5
Variance of a Fair Six-Sided Die
We know E(X) = 3.5. Using the computational formula:
E(X²) = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6 ≈ 15.167
Var(X) = E(X²) – [E(X)]² = 91/6 – (3.5)² = 91/6 – 12.25 = 35/12 ≈ 2.917
Standard deviation: σ = √(35/12) ≈ 1.708
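As a cross-check of Worked Example 5, the sketch below evaluates the definitional and computational formulas side by side with exact fractions. Variable names are illustrative.

```python
import math
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in die.items())                   # E(X)

# Definitional formula: E[(X - mu)^2]
var_definitional = sum((x - mean) ** 2 * p for x, p in die.items())

# Computational formula: E(X^2) - [E(X)]^2
e_x2 = sum(x ** 2 * p for x, p in die.items())
var_computational = e_x2 - mean ** 2

sd = math.sqrt(var_computational)

print(var_definitional, var_computational, round(sd, 3))
```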
Worked Example 6
Variance of Uniform[0, 4] Distribution
E(X) = 2. E(X²) = ∫₀⁴ x² · (1/4) dx = (1/4)·(64/3) = 16/3
Var(X) = 16/3 – 4 = 4/3 ≈ 1.333
Check with formula: (4 − 0)²/12 = 16/12 = 4/3. ✓
Common Mistake: Confusing population variance σ² with sample variance s². Population variance uses μ and divides by N. Sample variance uses x̄ and divides by (n − 1) — Bessel’s correction, making s² an unbiased estimator of σ². Getting these confused is a guaranteed mark-loser.
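Python's standard `statistics` module exposes both versions, which makes the difference easy to see: `pvariance` divides by N and `variance` divides by n − 1. The data set below is invented for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations, mean 5

# Treat the data as the whole population: divide by N = 8.
pop_var = statistics.pvariance(data)

# Treat the data as a sample: divide by n - 1 = 7 (Bessel's correction).
sample_var = statistics.variance(data)

print(pop_var, sample_var)
```

The squared deviations sum to 32 in both cases; only the divisor changes, so the sample variance is always the larger of the two.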
Variance Rules
Properties of Variance and Standard Deviation
Key Variance Properties
Var(aX + b) = a²·Var(X) (b disappears — shifting doesn’t change spread)
Var(X + Y) = Var(X) + Var(Y) (when X and Y are independent or, more generally, uncorrelated)
Var(X – Y) = Var(X) + Var(Y) (same condition; note: still +)
Var(c) = 0 (a constant has zero variance)
The most tested trap: Var(2X + 3) = 4·Var(X), not 2·Var(X) + 3. With no independence assumption, the general rule is Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y).
Worked Example 7
Variance of a Linear Transformation
Suppose Var(X) = 9. Find Var(3X – 5).
Var(3X – 5) = 3²·Var(X) = 9·9 = 81
The -5 shift has zero effect on variance. Only the multiplicative coefficient (3) matters, and it gets squared.
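Worked Example 7 gives only Var(X) = 9, so to check it numerically some concrete distribution is needed. The sketch below assumes, purely for illustration, that X takes the values −3 and 3 with equal probability, which has variance 9.

```python
from fractions import Fraction

def variance(pmf):
    """Var(X) = E(X^2) - [E(X)]^2 for a PMF given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum(x * x * p for x, p in pmf.items()) - mean ** 2

# Hypothetical distribution with Var(X) = 9.
pmf_x = {-3: Fraction(1, 2), 3: Fraction(1, 2)}

# Distribution of 3X - 5: same probabilities, transformed values.
pmf_transformed = {3 * x - 5: p for x, p in pmf_x.items()}

print(variance(pmf_x), variance(pmf_transformed))
```

Changing the −5 to any other constant leaves the second result untouched, while changing the 3 rescales it by the square of the new coefficient.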
Standard Deviation Properties
SD(aX + b) = |a|·SD(X). Shifting by b has no effect. SD is non-negative. For independent variables: SD(X + Y) = √[Var(X) + Var(Y)] — you cannot just add standard deviations.
The Most Common Error
Students often write SD(X + Y) = SD(X) + SD(Y). This is wrong. Variances add; standard deviations do not. You must add variances first, then take the square root.
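A quick numeric illustration with made-up standard deviations of 3 and 4 for two independent variables:

```python
import math

sd_x, sd_y = 3.0, 4.0   # hypothetical standard deviations, X and Y independent

# Correct: add the variances, then take the square root.
sd_sum = math.sqrt(sd_x ** 2 + sd_y ** 2)

# Incorrect: adding standard deviations directly overstates the spread.
wrong_sd_sum = sd_x + sd_y

print(sd_sum, wrong_sd_sum)
```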
Variance of Common Distributions
| Distribution | E(X) | Var(X) | Standard Deviation |
|---|---|---|---|
| Bernoulli(p) | p | p(1 − p) | √[p(1 − p)] |
| Binomial(n, p) | np | np(1 − p) | √[np(1 − p)] |
| Poisson(λ) | λ | λ | √λ |
| Uniform(a, b) | (a + b)/2 | (b − a)²/12 | (b − a)/√12 |
| Normal(μ, σ²) | μ | σ² | σ |
| Exponential(λ) | 1/λ | 1/λ² | 1/λ |
| Geometric(p) | 1/p | (1 − p)/p² | √[(1 − p)]/p |
Relationships Between Variables
Covariance, Correlation, and the Variance of Sums
Covariance is defined as: Cov(X, Y) = E[(X − μₓ)(Y − μᵧ)] = E(XY) − E(X)·E(Y). Positive covariance means variables tend to move together; negative means opposite directions.
Variance of a Sum — General Case
Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y)
Var(X − Y) = Var(X) + Var(Y) − 2·Cov(X, Y)
If X, Y independent: Cov(X,Y) = 0, so Var(X ± Y) = Var(X) + Var(Y)
Zero covariance does NOT guarantee independence — it only means there’s no linear relationship.
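A standard counterexample makes this concrete. Assume X is uniform on {−1, 0, 1} and Y = X². Y is completely determined by X, yet the covariance is zero. The names below are illustrative.

```python
from fractions import Fraction

third = Fraction(1, 3)

# Joint PMF of (X, Y) with Y = X^2: only three pairs have positive probability.
joint = {(x, x * x): third for x in (-1, 0, 1)}

def expect(g):
    """E[g(X, Y)] under the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
p_x0_y0 = joint.get((0, 0), 0)

print(cov, p_x0_y0, p_x0 * p_y0)
```

The relationship between X and Y is purely quadratic, which is exactly the kind of dependence covariance cannot detect.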
Pearson Correlation Coefficient
Pearson Correlation
ρ(X, Y) = Cov(X, Y) / (σ_X · σ_Y)
ρ always lies in [−1, +1]. ρ = +1: perfect positive linear relationship. ρ = −1: perfect negative. ρ = 0: no linear relationship.
Worked Example 8
Portfolio Variance with Correlated Assets
Var(A) = 0.04, Var(B) = 0.09, Cov(A, B) = 0.03.
Var(A + B) = 0.04 + 0.09 + 2·(0.03) = 0.19, σ ≈ 0.436
If independent: Var = 0.13, σ ≈ 0.361. Positive covariance increased portfolio risk, measured by standard deviation, by about 21% (the variance itself rose by about 46%).
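The arithmetic in Worked Example 8 can be reproduced in a few lines. The correlation is not stated in the example; it is derived here from the given variances and covariance using the Pearson formula.

```python
import math

var_a, var_b, cov_ab = 0.04, 0.09, 0.03   # values from Worked Example 8

var_sum = var_a + var_b + 2 * cov_ab       # general variance-of-sum rule
sd_sum = math.sqrt(var_sum)

var_sum_indep = var_a + var_b              # what independence would give
sd_sum_indep = math.sqrt(var_sum_indep)

# Pearson correlation implied by the inputs.
rho = cov_ab / (math.sqrt(var_a) * math.sqrt(var_b))

sd_increase = sd_sum / sd_sum_indep - 1    # relative increase in SD

print(round(sd_sum, 3), round(sd_sum_indep, 3), round(rho, 2), round(sd_increase, 3))
```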
Convergence Theory
The Law of Large Numbers and Central Limit Theorem
The Law of Large Numbers states that the sample mean of a large number of independent and identically distributed random variables converges to their expected value as sample size increases — providing theoretical justification for using the sample mean as an estimator of the population mean.
Real-World Consequence: Casinos make consistent profits because the Law of Large Numbers ensures that over millions of bets, their actual payout converges to the expected value. Each individual gambler experiences randomness. The casino experiences statistical certainty.
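A simulation gives a feel for this convergence. The sketch below rolls a simulated fair die and records the running sample mean at a few checkpoints; the seed and checkpoint values are arbitrary choices, and the printed numbers will differ with a different seed.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def running_means(rolls, checkpoints):
    """Roll a fair die `rolls` times; return {n: sample mean after n rolls}."""
    total = 0
    means = {}
    for n in range(1, rolls + 1):
        total += random.randint(1, 6)
        if n in checkpoints:
            means[n] = total / n
    return means

means = running_means(100_000, {10, 100, 1_000, 10_000, 100_000})
for n, m in means.items():
    print(f"n = {n:>7}: sample mean = {m:.4f}")
```

Early checkpoints can sit noticeably away from 3.5, while later ones tend to settle close to it, which is the pattern the Law of Large Numbers describes.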
The Central Limit Theorem (CLT)
Central Limit Theorem
√n · (X̄ₙ − μ) / σ → N(0, 1) as n → ∞
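For i.i.d. variables with finite variance σ², the standardized sample mean converges in distribution to a standard normal, regardless of the shape of the original distribution. A common rule of thumb treats the approximation as usable for n ≥ 30, though heavily skewed or heavy-tailed distributions can need more.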
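The statement can be probed by simulation. The sketch below assumes Exponential(1) data, a skewed distribution with μ = 1 and σ = 1, standardizes many sample means with n = 30, and counts how many fall within ±1.96. The seed, sample size, and replicate count are arbitrary choices.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

def standardized_mean(n, mu=1.0, sigma=1.0):
    """Draw n Exponential(1) values and return sqrt(n) * (mean - mu) / sigma."""
    sample_mean = sum(random.expovariate(1.0) for _ in range(n)) / n
    return math.sqrt(n) * (sample_mean - mu) / sigma

replicates = [standardized_mean(30) for _ in range(5_000)]

# Under a standard normal, about 95% of values lie within +/- 1.96.
inside = sum(abs(z) < 1.96 for z in replicates) / len(replicates)

print(f"share of standardized means within ±1.96: {inside:.3f}")
```

If the normal approximation is reasonable at this sample size, the printed share should land near 0.95 even though the underlying data are far from normal.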
Advanced Concept
Conditional Expectation and the Law of Total Expectation
Law of Total Expectation (Tower Property)
E(X) = E[E(X | Y)]
Take the conditional expected value of X for each value of Y, then average those conditional expectations weighted by the probability of each Y.
Worked Example 9
Using the Law of Total Expectation
60% of items from Machine A (mean weight 50g), 40% from Machine B (mean weight 60g).
E(Weight) = 0.60·50 + 0.40·60 = 30 + 24 = 54g
The Law of Total Variance
Law of Total Variance
Var(X) = E[Var(X | Y)] + Var[E(X | Y)]
Total variance = average within-group variance + variance of group means. This is the foundation of Analysis of Variance (ANOVA).
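Both laws can be verified on the machine example. The machine shares and mean weights come from Worked Example 9; the two-point weight distributions within each machine are hypothetical, chosen only so that the conditional means are 50 g and 60 g.

```python
from fractions import Fraction

machine_prob = {"A": Fraction(3, 5), "B": Fraction(2, 5)}

# Hypothetical conditional distributions of weight given the machine.
weight_given_machine = {
    "A": {48: Fraction(1, 2), 52: Fraction(1, 2)},   # mean 50, variance 4
    "B": {57: Fraction(1, 2), 63: Fraction(1, 2)},   # mean 60, variance 9
}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    return sum(x * x * p for x, p in pmf.items()) - mean(pmf) ** 2

cond_mean = {m: mean(d) for m, d in weight_given_machine.items()}
cond_var = {m: variance(d) for m, d in weight_given_machine.items()}

# Law of Total Expectation: E(X) = E[E(X | Y)]
total_mean = sum(machine_prob[m] * cond_mean[m] for m in machine_prob)

# Law of Total Variance: E[Var(X | Y)] + Var[E(X | Y)]
within = sum(machine_prob[m] * cond_var[m] for m in machine_prob)
between = sum(machine_prob[m] * (cond_mean[m] - total_mean) ** 2 for m in machine_prob)

# Direct check from the marginal distribution of weight.
marginal = {}
for m, dist in weight_given_machine.items():
    for x, p in dist.items():
        marginal[x] = marginal.get(x, 0) + machine_prob[m] * p

print(total_mean, within + between, variance(marginal))
```

The within-machine term and the between-machine term add up to the variance computed directly from the marginal distribution.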
Applied Statistics
Real-World Applications of Expected Value and Variance
| Field | Expected Value Application | Variance Application | Key Institution / Scholar |
|---|---|---|---|
| Finance | Expected portfolio return | Portfolio risk (Modern Portfolio Theory) | Harry Markowitz, Chicago / Wharton |
| Insurance | Expected loss per policy, premium pricing | Capital reserve requirements | Actuarial Institute, Lloyd’s of London |
| Machine Learning | Expected loss (MSE, cross-entropy) | Bias-variance tradeoff | MIT CSAIL, Stanford AI Lab |
| Gambling / Gaming | Expected payout per bet | Risk of ruin, bankroll management | Las Vegas casinos, probability theory |
| Quality Control | Expected defect rate | Process variance (Six Sigma) | Motorola, GE, ISO standards |
| Epidemiology | Expected number of cases | Spread variability in outbreak models | CDC, WHO, Johns Hopkins Bloomberg |
Finance: Risk and Return
The celebrated Modern Portfolio Theory by Harry Markowitz (Nobel Prize, 1990) is built entirely on expected value and variance: investors seek to maximize expected return for a given variance level. The covariance between assets determines how much diversification reduces risk.
Machine Learning: Loss Functions
The ubiquitous mean squared error (MSE) loss function is literally the expected value of squared prediction error: E[(Y − Ŷ)²]. Minimizing MSE finds the model whose predictions are closest to true values in expectation. The bias-variance tradeoff — the central challenge of ML model selection — is expressed exactly in these terms.
Exam Strategy
How to Solve Expected Value and Variance Problems: Step-by-Step
1. Identify the Type of Random Variable
Is X discrete (countable outcomes) or continuous (range of values)? Discrete problems use summation; continuous problems use integration. If given a PMF, you’re working discrete. If given a PDF, you’re working continuous.
2. Verify the Distribution Sums / Integrates to 1
Confirm Σ P(X = xᵢ) = 1 or ∫ f(x) dx = 1. If a problem asks you to find an unknown constant c in a PDF or PMF, this is how you solve for it.
3. Compute E(X)
Discrete: E(X) = Σ xᵢ · P(X = xᵢ). Continuous: E(X) = ∫ x · f(x) dx. Build a table for discrete variables with column 1: values, column 2: probabilities, column 3: value × probability.
4. Compute E(X²) Using LOTUS
Discrete: E(X²) = Σ xᵢ² · P(X = xᵢ). Continuous: E(X²) = ∫ x² · f(x) dx. Always compute this before applying the variance formula.
5. Apply the Computational Variance Formula
Var(X) = E(X²) − [E(X)]². Write each step separately: first E(X²), then [E(X)]², then subtract. Don’t try to combine steps mentally.
6. Verify with Sanity Checks
Var(X) must be ≥ 0. Standard deviation must be ≥ 0. E(X) must lie between the smallest and largest possible values of X, though it need not be one of those values (the die’s 3.5 is not a face). A negative variance signals an arithmetic error: recheck your E(X²) and [E(X)]² computations.
The Probability Table Method for Discrete Variables
Build a systematic table: x | P(X=x) | x·P(X=x) | x²·P(X=x). Sum column 3 for E(X). Sum column 4 for E(X²). Then Var(X) = E(X²) − [E(X)]². This format reduces arithmetic errors and is the format most professors expect to see.
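The table method translates directly into code. The sketch below uses a made-up PMF on the values 0 to 3, prints the four columns, and then applies the sanity checks from the steps above; every name in it is illustrative.

```python
from fractions import Fraction

# Hypothetical PMF: {value: probability}
pmf = {0: Fraction(1, 10), 1: Fraction(3, 10), 2: Fraction(2, 5), 3: Fraction(1, 5)}

# Step 2: probabilities must sum to 1.
assert sum(pmf.values()) == 1

# One row per value: x | P(X=x) | x*P(X=x) | x^2*P(X=x)
rows = [(x, p, x * p, x * x * p) for x, p in sorted(pmf.items())]

print(f"{'x':>3} {'P(X=x)':>8} {'x*P':>8} {'x^2*P':>8}")
for x, p, xp, x2p in rows:
    print(f"{x:>3} {str(p):>8} {str(xp):>8} {str(x2p):>8}")

e_x = sum(row[2] for row in rows)    # sum of column 3
e_x2 = sum(row[3] for row in rows)   # sum of column 4
var_x = e_x2 - e_x ** 2

# Step 6: sanity checks.
assert var_x >= 0
assert min(pmf) <= e_x <= max(pmf)

print("E(X) =", e_x, " E(X^2) =", e_x2, " Var(X) =", var_x)
```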
Advanced Tool
Moment Generating Functions
Moment Generating Function Definition
M_X(t) = E(e^{tX}) for t in a neighborhood of 0
E(X) = M_X'(0) and E(X²) = M_X''(0), giving Var(X) = M_X''(0) − [M_X'(0)]². Often faster than direct integration for distributions with complicated PDFs.
For X ~ Normal(μ, σ²): M_X(t) = exp(μt + σ²t²/2). Differentiating twice and evaluating at t = 0 gives E(X) = μ and Var(X) = σ². The MGF approach is clean and fast for well-known distributions.
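For readers who want the differentiation written out, the steps for the normal MGF can be sketched as follows, using only the chain and product rules on the formula stated above.

```latex
\begin{align*}
M_X(t)   &= \exp\!\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right) \\
M_X'(t)  &= (\mu + \sigma^2 t)\, M_X(t)
          &&\Rightarrow\quad M_X'(0) = \mu = E(X) \\
M_X''(t) &= \sigma^2 M_X(t) + (\mu + \sigma^2 t)^2 M_X(t)
          &&\Rightarrow\quad M_X''(0) = \sigma^2 + \mu^2 = E(X^2) \\
\operatorname{Var}(X) &= M_X''(0) - \left[M_X'(0)\right]^2
          = \sigma^2 + \mu^2 - \mu^2 = \sigma^2
\end{align*}
```

The evaluation at t = 0 uses M_X(0) = 1, which holds for every MGF because E(e⁰) = 1.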
Key People and Institutions
The People and Institutions Behind Expected Value and Variance Theory
Blaise Pascal, Fermat, and the Origins of Expectation
The formal theory emerges from the 1654 correspondence between Pascal and Fermat solving the problem of how to fairly divide stakes in an interrupted gambling game. Christiaan Huygens published the first formal textbook on probability in 1657, formalizing their ideas.
Jakob Bernoulli and the Law of Large Numbers
Jakob Bernoulli of the University of Basel proved the first version of the Law of Large Numbers in his posthumously published Ars Conjectandi (1713) — establishing that sample averages converge to true expected values with enough observations.
Andrei Kolmogorov — Moscow State University
Andrei Kolmogorov (1903–1987) provided the rigorous axiomatic foundations of modern probability theory in his 1933 monograph. Within his framework, expected value is rigorously defined as the Lebesgue integral of X with respect to the probability measure. He also proved the Strong Law of Large Numbers.
MIT and Modern Probability Education
Today’s probability curriculum at MIT, Stanford, and Harvard carries Kolmogorov’s rigorous framework into modern education. The MIT OpenCourseWare materials by Professors Bertsekas and Tsitsiklis are among the most cited free resources for students studying these topics.
Frequently Asked Questions
Frequently Asked Questions: Expected Values and Variance
What is the expected value in statistics?
The expected value of a random variable X, denoted E(X) or μ, is the long-run average value over many repetitions of an experiment. For discrete X: E(X) = Σ xᵢ · P(X = xᵢ). For continuous X: E(X) = ∫ x · f(x) dx. It doesn’t have to be a possible value — the expected number of heads on a fair coin flip is 0.5.
What is the difference between variance and standard deviation?
Variance Var(X) = E[(X − μ)²] measures average squared deviation from the mean. Standard deviation σ = √Var(X) is the square root of variance, expressed in the same units as X and therefore directly interpretable. If test scores have variance 225, the standard deviation is 15 points — typical scores deviate about 15 points from the mean.
How do you calculate expected value for a discrete random variable?
Step 1: list all possible values of X. Step 2: identify P(X = xᵢ) for each value (verify all sum to 1). Step 3: multiply each value by its probability. Step 4: sum all products. Build a table — column 1: x values, column 2: probabilities, column 3: x·P(x). Sum column 3 to get E(X).
What is the computational formula for variance and why is it preferred?
Var(X) = E(X²) − [E(X)]² is algebraically equivalent to E[(X − μ)²] but faster. The definitional formula requires computing (xᵢ − μ)² for each value. The computational formula only requires E(X) and E(X²), both straight weighted averages. Most textbooks recommend it as the default for hand calculations and exams.
What are the properties of expected value?
(1) Linearity: E(aX + b) = a·E(X) + b. (2) Additivity: E(X + Y) = E(X) + E(Y) for ANY X and Y. (3) Independence: E(XY) = E(X)·E(Y) when X and Y are independent. (4) E(c) = c for any constant. (5) Jensen’s Inequality for convex g: E[g(X)] ≥ g[E(X)]. Linearity and additivity are most heavily tested at undergraduate level.
What is covariance and how does it affect variance of sums?
Cov(X, Y) = E(XY) − E(X)·E(Y) measures how two variables vary together. Positive covariance: they move in the same direction. Negative: opposite directions. It directly affects: Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y). Only when independent (Cov = 0) does this simplify to Var(X) + Var(Y).
What is the difference between population variance and sample variance?
Population variance σ² = (1/N)·Σ(xᵢ − μ)² uses the true mean μ and divides by N. Sample variance s² = (1/(n−1))·Σ(xᵢ − x̄)² uses x̄ and divides by (n−1). The (n−1) denominator is Bessel’s correction — it makes s² an unbiased estimator of σ². Using n would systematically underestimate population variance.
What is LOTUS and when do you use it?
LOTUS — the Law of the Unconscious Statistician — lets you compute E[g(X)] using the original distribution of X. Discrete: E[g(X)] = Σ g(xᵢ)·P(X = xᵢ). Continuous: E[g(X)] = ∫ g(x)·f(x)dx. Most commonly used with g(x) = x² to compute E(X²) for variance. Also used for E(1/X), E(√X), E(e^X), etc.
How does the Central Limit Theorem relate to expected value and variance?
The CLT states that the sample mean of n i.i.d. variables with mean μ and variance σ² is approximately N(μ, σ²/n) for large n. E(X̄) = μ (unbiased) and Var(X̄) = σ²/n, decreasing as n increases. The CLT justifies z-tests, t-tests, and confidence intervals built around ±1.96·σ/√n for 95% coverage.
