
Expected Values and Variance

Expected Values and Variance: The Complete Student Guide | Ivy League Assignment Help
Statistics & Probability Guide


The complete student guide — formulas, discrete & continuous variables, step-by-step worked examples, covariance, standard deviation, and the Law of Large Numbers. Everything you need to master E(X) and Var(X) for university.


Expected Values and Variance: Why Every Statistics Student Must Master These

Expected values and variance are the backbone of probability: without them, almost nothing else in statistics makes sense. Once you start studying random variables seriously, these two measures become your primary tools for describing, comparing, and reasoning about distributions. They also appear in almost every statistics assignment you will write, so getting them right isn't optional.

The idea of expected value stretches back to the mid-17th century. Blaise Pascal and Pierre de Fermat exchanged letters in 1654 working through the classic “problem of points”: how to fairly split gambling stakes if a game was interrupted. That correspondence laid the foundations of the mathematical theory of expectation. Centuries later, Andrei Kolmogorov at Moscow State University formalized the modern measure-theoretic framework that underpins how both expected value and variance are rigorously defined today.

Today, expected value and variance appear everywhere: in the actuarial tables that set your insurance premiums, in the portfolio models that manage your pension, in the loss functions that train neural networks, and in the A/B tests that decide which version of a webpage you see.

μ: symbol for Expected Value (population mean), the long-run average of a random variable
σ²: symbol for Variance, the average squared deviation from the mean
σ: symbol for Standard Deviation, the square root of variance, in the same units as X

This article aims to build real conceptual understanding alongside practical calculation skills. Formulas only go so far; the goal is for you to know why these formulas exist and when to use each property, because that's what your professors and future employers are actually testing.

What Is Expected Value? Definition and Intuition

Expected value — also called expectation, mathematical expectation, mean, or first moment — is a way of answering the question: “If I ran this random experiment an infinite number of times, what would the average outcome be?” It doesn’t tell you what will happen in any single trial. It tells you what to expect on average across many trials. That distinction is crucial.

Formally, the expected value of a random variable X, denoted E(X) or μ, is a measure of the central tendency of its probability distribution — the mean value that the variable would take if the experiment were repeated many times. For a discrete random variable, this is the weighted average of all possible outcomes, where each outcome is weighted by its probability.

Expected Value of Discrete Random Variables

Formula — Expected Value (Discrete)
E(X) = Σ xᵢ · P(X = xᵢ)

Sum over all possible values xᵢ. Multiply each value by its probability P(X = xᵢ), then add everything up. All probabilities must sum to 1.
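The discrete formula maps directly onto a few lines of code. The Python sketch below assumes the distribution is stored as a dictionary from each value xᵢ to P(X = xᵢ); `expected_value` is an illustrative name, not a library function. `Fraction` keeps the arithmetic exact so the results match the hand calculations in the worked examples.

```python
from fractions import Fraction

def expected_value(pmf):
    """Return E(X) = sum of x * P(X = x) for a PMF given as {value: probability}."""
    total_prob = sum(pmf.values())
    if total_prob != 1:
        raise ValueError(f"probabilities sum to {total_prob}, not 1")
    return sum(x * p for x, p in pmf.items())

# Fair six-sided die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expected_value(die))  # 7/2

# Insurance payout (Worked Example 2), probabilities written as exact fractions.
payout = {0: Fraction(90, 100), 1000: Fraction(7, 100), 10000: Fraction(3, 100)}
print(expected_value(payout))  # 370
```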

Worked Example 1

Rolling a Fair Six-Sided Die

Each face shows 1–6 with equal probability 1/6.

E(X) = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6)
E(X) = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 21/6 = 3.5

Notice: 3.5 is not a face on the die. Expected value doesn’t have to be a possible outcome — it’s the long-run average. Roll the die 10,000 times; your average will be very close to 3.5.
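A quick simulation makes the long-run interpretation concrete. This sketch uses Python's standard `random` module with a fixed seed (an arbitrary choice, for reproducibility); `average_of_rolls` is a hypothetical helper. The exact averages depend on the seed, but larger runs should land closer to 3.5.

```python
import random

def average_of_rolls(n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and return the sample average."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls

for n in (10, 100, 10_000):
    print(n, average_of_rolls(n))
```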

Worked Example 2

Insurance Claim Payout

An insurer pays out $0 (prob. 0.90), $1,000 (prob. 0.07), or $10,000 (prob. 0.03) on a policy.

E(Payout) = 0·(0.90) + 1,000·(0.07) + 10,000·(0.03)
E(Payout) = 0 + 70 + 300 = $370 per policy

This is why insurers charge premiums above $370 — to cover operating costs and maintain profit. Expected value directly prices insurance risk.

Expected Value of Continuous Random Variables

Formula — Expected Value (Continuous)
E(X) = ∫_{-∞}^{∞} x · f(x) dx

f(x) is the probability density function. The integral is the continuous analog of the discrete weighted average. The support may be a subset of ℝ — integrate only where f(x) > 0.

Worked Example 3

Uniform Distribution on [0, 4]

X ~ Uniform(0, 4). The PDF is f(x) = 1/4 for 0 ≤ x ≤ 4, and 0 otherwise.

E(X) = ∫₀⁴ x · (1/4) dx = (1/4) · [x²/2]₀⁴ = (1/4) · (16/2) = (1/4) · 8 = 2

The expected value is the midpoint (0 + 4)/2 = 2. For any Uniform(a, b) distribution, E(X) = (a + b)/2.
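When the integral is awkward by hand, a numerical approximation is a useful check. The sketch below applies the midpoint rule over a finite support [a, b]; `expected_value_continuous` and `uniform_pdf` are illustrative names, and the number of subintervals is an arbitrary assumption. Because x·f(x) is linear on [0, 4], the midpoint rule reproduces E(X) = 2 up to floating-point rounding.

```python
def expected_value_continuous(pdf, a, b, n=100_000):
    """Approximate E(X) = integral of x * f(x) over [a, b] with the midpoint rule."""
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * width  # midpoint of the i-th subinterval
        total += x * pdf(x)
    return total * width

def uniform_pdf(x, a=0.0, b=4.0):
    """Density of Uniform(a, b): 1/(b - a) on the support, 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

print(expected_value_continuous(uniform_pdf, 0.0, 4.0))  # approximately 2.0
```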

In statistics, the sample mean serves as an estimate of the expectation and is itself a random variable. It is an unbiased estimator: the expected value of the sample mean equals the true mean μ of the underlying distribution.

Properties of Expected Value: What Every Student Must Know

The real power of expected value comes from its properties, not just its definition. These properties let you decompose complicated expectations into simpler pieces, and they’re tested extensively in university statistics courses.

Linearity of Expectation

Linearity Properties
E(aX + b) = a·E(X) + b   (for constants a, b ∈ ℝ)

E(X + Y) = E(X) + E(Y)   (for ANY random variables X and Y)

The second property holds regardless of whether X and Y are independent. This is what makes linearity so powerful — you don’t need independence to add expectations.
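To see that linearity does not need independence, take a case where Y is completely determined by X. This minimal sketch uses a fair die with Y = X² (an illustrative choice) and computes each expectation exactly from the joint outcomes:

```python
from fractions import Fraction

# Joint outcomes of a fair die: X is the face, Y = X**2 (Y depends completely on X).
outcomes = [(x, x * x) for x in range(1, 7)]
p = Fraction(1, 6)  # each outcome is equally likely

e_x = sum(x * p for x, _ in outcomes)
e_y = sum(y * p for _, y in outcomes)
e_sum = sum((x + y) * p for x, y in outcomes)

print(e_x, e_y, e_sum)     # 7/2 91/6 56/3
print(e_sum == e_x + e_y)  # True, despite the dependence
```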

Worked Example 4

Using Linearity to Simplify

Suppose E(X) = 5. Find E(3X + 7).

E(3X + 7) = 3·E(X) + 7 = 3·(5) + 7 = 15 + 7 = 22

No distribution. No probabilities. Just linearity. This is how expectation works — the algebra is clean.

Expected Value of Independent Variables

Independence Property
If X and Y are independent: E(XY) = E(X) · E(Y)

This does NOT hold in general. If X and Y are dependent, E(XY) need not equal E(X)·E(Y); the difference E(XY) − E(X)·E(Y) is exactly their covariance. Confusing this is one of the most common errors students make.
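The contrast shows up clearly with dice. In this sketch (an illustrative setup), the independent case uses two separate fair dice, while the dependent case sets Y equal to X:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)

# Independent case: two separate dice, every (x, y) pair has probability 1/36.
e_xy_indep = sum(x * y * p * p for x, y in product(faces, faces))
e_x = sum(x * p for x in faces)
print(e_xy_indep, e_x * e_x)  # 49/4 49/4

# Dependent case: Y is the same roll as X, so XY = X**2.
e_xy_dep = sum(x * x * p for x in faces)
print(e_xy_dep, e_x * e_x)    # 91/6 49/4
```

In the dependent case the gap 91/6 − 49/4 = 35/12 is Cov(X, X), which equals Var(X).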

LOTUS: Law of the Unconscious Statistician

LOTUS (Law of the Unconscious Statistician)
Discrete: E[g(X)] = Σ g(xᵢ) · P(X = xᵢ)

Continuous: E[g(X)] = ∫ g(x) · f(x) dx

This is how we compute E(X²) — a critical step in the variance calculation. Set g(X) = X², apply LOTUS directly.
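LOTUS amounts to applying g to each value while keeping the original probabilities. A minimal sketch, assuming a PMF stored as a dictionary; `lotus` is a hypothetical helper name:

```python
import math
from fractions import Fraction

def lotus(pmf, g):
    """E[g(X)] = sum of g(x) * P(X = x), using the distribution of X directly."""
    return sum(g(x) * p for x, p in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}

e_x = lotus(die, lambda x: x)
e_x2 = lotus(die, lambda x: x ** 2)  # g(x) = x² gives E(X²) for the variance formula
print(e_x2)                   # 91/6
print(e_x2 == e_x ** 2)       # False: E(X²) is not [E(X)]²
print(lotus(die, math.sqrt))  # E(√X), returned as a float
```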

Exam Strategy: Recognizing When to Use Each Property

When an exam problem says “find E(2X – 3Y + 5)”, use linearity: E(2X – 3Y + 5) = 2E(X) – 3E(Y) + 5. When it says “X and Y are independent, find E(XY)”, use the independence property: E(XY) = E(X)·E(Y). When it says “find E(X²)” or “find E(sin X)”, use LOTUS with the original distribution of X.


Variance: What It Is, Why It Matters, and How to Compute It

Variance measures how spread out a distribution is around its expected value. Two distributions can have identical expected values but completely different variances. If you’re choosing between two investment strategies with the same expected return, the one with lower variance is less risky.

The Definitional and Computational Formulas

Variance — Two Equivalent Formulas
Definitional: Var(X) = E[(X – μ)²]

Computational: Var(X) = E(X²) – [E(X)]²

Both give the same answer. The computational formula is usually faster by hand: compute E(X²) using LOTUS, then subtract the square of E(X). It is the sensible default for exam work.

Why Are They Equal?

Expanding the square and applying linearity of expectation: Var(X) = E[(X − μ)²] = E[X² − 2μX + μ²] = E(X²) − 2μ·E(X) + μ². Since μ = E(X), this simplifies to E(X²) − 2[E(X)]² + [E(X)]² = E(X²) − [E(X)]².

Worked Example 5

Variance of a Fair Six-Sided Die

We know E(X) = 3.5. Using the computational formula:
E(X²) = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6 ≈ 15.167

Var(X) = E(X²) – [E(X)]² = 91/6 – (3.5)² = 91/6 – 12.25 = 35/12 ≈ 2.917
Standard deviation: σ = √(35/12) ≈ 1.708
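Both variance formulas can be checked against each other in code. This sketch recomputes the die example exactly with `Fraction`; storing the PMF as a dictionary is an assumption for illustration:

```python
import math
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in die.items())

# Definitional formula: average squared deviation from the mean.
var_def = sum((x - mu) ** 2 * p for x, p in die.items())
# Computational formula: E(X²) − [E(X)]².
var_comp = sum(x ** 2 * p for x, p in die.items()) - mu ** 2

print(var_def, var_comp)    # 35/12 35/12
print(math.sqrt(var_comp))  # standard deviation, about 1.708
```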

Worked Example 6

Variance of Uniform[0, 4] Distribution

E(X) = 2. E(X²) = ∫₀⁴ x² · (1/4) dx = (1/4)·(64/3) = 16/3

Var(X) = 16/3 – 4 = 4/3 ≈ 1.333
Check with formula: (4 − 0)²/12 = 16/12 = 4/3. ✓

Common Mistake: Confusing population variance σ² with sample variance s². Population variance uses μ and divides by N. Sample variance uses x̄ and divides by (n − 1) — Bessel’s correction, making s² an unbiased estimator of σ². Getting these confused is a guaranteed mark-loser.
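Python's standard `statistics` module exposes both conventions: `pvariance` divides by N and `variance` divides by n − 1. The data list below is made up for illustration; its squared deviations from the mean (5) sum to 32.

```python
import statistics

# Illustrative data set (not from the text); mean = 5, sum of squared deviations = 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# pvariance treats the data as the whole population: 32 / 8.
print(statistics.pvariance(data))
# variance treats the data as a sample (Bessel's correction): 32 / 7.
print(statistics.variance(data))
```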

Properties of Variance and Standard Deviation

Key Variance Properties
Var(aX + b) = a²·Var(X)   (b disappears — shifting doesn’t change spread)

Var(X + Y) = Var(X) + Var(Y)   (when X and Y are independent, or more generally uncorrelated)

Var(X – Y) = Var(X) + Var(Y)   (same condition; note: still +)

Var(c) = 0   (a constant has zero variance)

The most tested trap: Var(2X + 3) = 4·Var(X), not 2·Var(X) + 3. For non-independent variables: Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y).

Worked Example 7

Variance of a Linear Transformation

Suppose Var(X) = 9. Find Var(3X – 5).

Var(3X – 5) = 3²·Var(X) = 9·9 = 81

The -5 shift has zero effect on variance. Only the multiplicative coefficient (3) matters, and it gets squared.
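A direct check of Var(aX + b) = a²·Var(X): build the distribution of 3X − 5 from the die's PMF and compute its variance. `variance` here is an illustrative helper, not a library call:

```python
from fractions import Fraction

def variance(pmf):
    """Var(X) = E(X²) − [E(X)]² for a PMF given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum(x * x * p for x, p in pmf.items()) - mean ** 2

die = {x: Fraction(1, 6) for x in range(1, 7)}
# Distribution of Y = 3X − 5: same probabilities, transformed values
# (3x − 5 is one-to-one, so no probabilities need to be merged).
transformed = {3 * x - 5: p for x, p in die.items()}

print(variance(die))          # 35/12
print(variance(transformed))  # 105/4, which is 9 * 35/12
```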

Standard Deviation Properties

SD(aX + b) = |a|·SD(X). Shifting by b has no effect. SD is non-negative. For independent variables: SD(X + Y) = √[Var(X) + Var(Y)] — you cannot just add standard deviations.

The Most Common Error

Students often write SD(X + Y) = SD(X) + SD(Y). This is wrong. Variances add; standard deviations do not. You must add variances first, then take the square root.
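Enumerating all 36 outcomes of two independent fair dice shows the gap between the correct standard deviation of the sum and the incorrect sum of standard deviations. `sd_of` is a hypothetical helper that takes (value, probability) pairs:

```python
import math
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 36)  # two independent fair dice: 36 equally likely pairs

def sd_of(values_with_probs):
    """Standard deviation from a list of (value, probability) pairs."""
    mean = sum(v * q for v, q in values_with_probs)
    var = sum(v * v * q for v, q in values_with_probs) - mean ** 2
    return math.sqrt(var)

sd_x = sd_of([(x, Fraction(1, 6)) for x in faces])
sd_sum = sd_of([(x + y, p) for x, y in product(faces, faces)])

print(sd_sum)    # about 2.415 = sqrt(35/12 + 35/12)
print(2 * sd_x)  # about 3.416, the (wrong) sum of standard deviations
```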

Variance of Common Distributions

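Distribution | E(X) | Var(X) | Standard Deviation
Bernoulli(p) | p | p(1 − p) | √[p(1 − p)]
Binomial(n, p) | np | np(1 − p) | √[np(1 − p)]
Poisson(λ) | λ | λ | √λ
Uniform(a, b) | (a + b)/2 | (b − a)²/12 | (b − a)/√12
Normal(μ, σ²) | μ | σ² | σ
Exponential(λ) | 1/λ | 1/λ² | 1/λ
Geometric(p) | 1/p | (1 − p)/p² | √(1 − p)/p

(Exponential uses the rate parameter λ; Geometric counts the number of trials up to and including the first success.)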
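Several rows of the table can be checked numerically from their PMFs. The parameter values below are arbitrary illustrations, the Poisson and Geometric sums are truncated where the tail is negligible, and the Geometric PMF counts trials up to and including the first success (the convention that gives E(X) = 1/p):

```python
import math

def moments(pairs):
    """Return (mean, variance) from a list of (value, probability) pairs."""
    mean = sum(x * q for x, q in pairs)
    var = sum(x * x * q for x, q in pairs) - mean ** 2
    return mean, var

n, p, lam = 10, 0.3, 4.0  # illustrative parameter values

binomial = [(k, math.comb(n, k) * p**k * (1 - p)**(n - k)) for k in range(n + 1)]
# Poisson and Geometric have infinite support; truncate where the tail is negligible.
poisson = [(k, math.exp(-lam) * lam**k / math.factorial(k)) for k in range(100)]
geometric = [(k, (1 - p)**(k - 1) * p) for k in range(1, 500)]  # k = trials to first success

print(moments(binomial), (n * p, n * p * (1 - p)))
print(moments(poisson), (lam, lam))
print(moments(geometric), (1 / p, (1 - p) / p**2))
```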


Covariance, Correlation, and the Variance of Sums

Covariance is defined as: Cov(X, Y) = E[(X − μₓ)(Y − μᵧ)] = E(XY) − E(X)·E(Y). Positive covariance means variables tend to move together; negative means opposite directions.

Variance of a Sum — General Case
Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y)

Var(X − Y) = Var(X) + Var(Y) − 2·Cov(X, Y)

If X, Y independent: Cov(X,Y) = 0, so Var(X ± Y) = Var(X) + Var(Y)

Zero covariance does NOT guarantee independence — it only means there’s no linear relationship.
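The standard example of zero covariance without independence is X uniform on {−1, 0, 1} with Y = X². This sketch (the distribution is an illustrative choice) computes the covariance exactly and then checks the independence condition directly:

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1} and Y = X²: Y is completely determined by X.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(f):
    """E[f(X, Y)] under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)
print(cov)  # 0

# Yet X and Y are not independent: P(X = 0, Y = 0) differs from P(X = 0)·P(Y = 0).
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
print(joint[(0, 0)], p_x0 * p_y0)  # 1/3 1/9
```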

Pearson Correlation Coefficient

Pearson Correlation
ρ(X, Y) = Cov(X, Y) / (σ_X · σ_Y)

ρ always lies in [−1, +1]. ρ = +1: perfect positive linear relationship. ρ = −1: perfect negative. ρ = 0: no linear relationship.

Worked Example 8

Portfolio Variance with Correlated Assets

Var(A) = 0.04, Var(B) = 0.09, Cov(A, B) = 0.03.
Var(A + B) = 0.04 + 0.09 + 2·(0.03) = 0.19, σ ≈ 0.436
If independent: Var = 0.13, σ ≈ 0.361. The positive covariance raised the portfolio standard deviation by about 21% (and the variance by about 46%).
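The same arithmetic in code, using the values from Worked Example 8. The sketch also computes the correlation ρ = Cov(A, B)/(σ_A·σ_B), which for these numbers is 0.5:

```python
import math

var_a, var_b, cov_ab = 0.04, 0.09, 0.03  # values from Worked Example 8

var_sum = var_a + var_b + 2 * cov_ab  # general formula with covariance
var_indep = var_a + var_b             # what it would be with Cov(A, B) = 0
rho = cov_ab / (math.sqrt(var_a) * math.sqrt(var_b))

print(round(var_sum, 4), round(math.sqrt(var_sum), 3))          # 0.19 0.436
print(round(var_indep, 4), round(math.sqrt(var_indep), 3))      # 0.13 0.361
print(round(rho, 4))                                            # 0.5
print(round(math.sqrt(var_sum) / math.sqrt(var_indep) - 1, 3))  # 0.209
```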

The Law of Large Numbers and Central Limit Theorem

The Law of Large Numbers states that the sample mean of a large number of independent and identically distributed random variables converges to their expected value as sample size increases — providing theoretical justification for using the sample mean as an estimator of the population mean.

Real-World Consequence: Casinos make consistent profits because the Law of Large Numbers ensures that, over millions of bets, their average payout per bet converges to the expected value, which is set in the house's favor. Each individual gambler experiences randomness; the casino experiences near-certainty.

The Central Limit Theorem (CLT)

Central Limit Theorem
√n · (X̄ₙ − μ) / σ → N(0, 1) as n → ∞

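Provided the Xᵢ are i.i.d. with finite variance σ², the standardized sample mean approaches the standard normal distribution, regardless of the shape of the original distribution. A common rule of thumb treats n ≥ 30 as large enough, though strongly skewed distributions can need more.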
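A small simulation illustrates the theorem for die rolls. This sketch assumes n = 30 rolls per sample and 5,000 simulated samples (both arbitrary), with a fixed seed; `standardized_means` is a hypothetical helper. The exact printed values depend on the seed.

```python
import random
import statistics

def standardized_means(n, trials=5_000, seed=1):
    """Simulate sqrt(n)·(X̄ₙ − μ)/σ for averages of n fair-die rolls."""
    rng = random.Random(seed)
    mu, sigma = 3.5, (35 / 12) ** 0.5  # exact mean and SD of one die roll
    z = []
    for _ in range(trials):
        xbar = sum(rng.randint(1, 6) for _ in range(n)) / n
        z.append((n ** 0.5) * (xbar - mu) / sigma)
    return z

z = standardized_means(n=30)
print(statistics.fmean(z), statistics.stdev(z))  # close to 0 and 1
print(sum(abs(v) <= 1.96 for v in z) / len(z))   # close to 0.95
```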

Conditional Expectation and the Law of Total Expectation

Law of Total Expectation (Tower Property)
E(X) = E[E(X | Y)]

Take the conditional expected value of X for each value of Y, then average those conditional expectations weighted by the probability of each Y.

Worked Example 9

Using the Law of Total Expectation

60% of items from Machine A (mean weight 50g), 40% from Machine B (mean weight 60g).

E(Weight) = 0.60·50 + 0.40·60 = 30 + 24 = 54g

The Law of Total Variance

Law of Total Variance
Var(X) = E[Var(X | Y)] + Var[E(X | Y)]

Total variance = average within-group variance + variance of the group means. The same within/between split, applied to sums of squares, underlies Analysis of Variance (ANOVA).
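Both laws can be verified exactly on a small joint distribution. Only the machine shares (60%/40%) and the conditional means (50 g, 60 g) come from Worked Example 9; the two-point weight distributions for each machine are hypothetical, chosen to give within-machine variances of 4 and 25.

```python
from fractions import Fraction as F

# Machine shares and conditional means come from Worked Example 9; the two-point
# weight distributions (and hence the within-machine variances) are hypothetical.
p_machine = {"A": F(6, 10), "B": F(4, 10)}
weights_given = {
    "A": {48: F(1, 2), 52: F(1, 2)},  # mean 50 g, variance 4
    "B": {55: F(1, 2), 65: F(1, 2)},  # mean 60 g, variance 25
}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

# Marginal distribution of weight: P(X = x) = sum over y of P(Y = y)·P(X = x | Y = y).
marginal = {}
for y, py in p_machine.items():
    for x, px in weights_given[y].items():
        marginal[x] = marginal.get(x, 0) + py * px

cond_means = {y: mean(weights_given[y]) for y in p_machine}
tower = sum(cond_means[y] * p_machine[y] for y in p_machine)                   # E[E(X | Y)]
within = sum(var(weights_given[y]) * p_machine[y] for y in p_machine)          # E[Var(X | Y)]
between = sum((cond_means[y] - tower) ** 2 * p_machine[y] for y in p_machine)  # Var[E(X | Y)]

print(mean(marginal), tower)            # 54 54
print(var(marginal), within + between)  # 182/5 182/5
```

Here the total variance of 36.4 g² splits into 12.4 from within-machine spread and 24 from the difference between the machine means.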

Real-World Applications of Expected Value and Variance

Field | Expected Value Application | Variance Application | Key Institution / Scholar
Finance | Expected portfolio return | Portfolio risk (Modern Portfolio Theory) | Harry Markowitz, Chicago / Wharton
Insurance | Expected loss per policy, premium pricing | Capital reserve requirements | Actuarial Institute, Lloyd’s of London
Machine Learning | Expected loss (MSE, cross-entropy) | Bias-variance tradeoff | MIT CSAIL, Stanford AI Lab
Gambling / Gaming | Expected payout per bet | Risk of ruin, bankroll management | Las Vegas casinos, probability theory
Quality Control | Expected defect rate | Process variance (Six Sigma) | Motorola, GE, ISO standards
Epidemiology | Expected number of cases | Spread variability in outbreak models | CDC, WHO, Johns Hopkins Bloomberg

Finance: Risk and Return

The celebrated Modern Portfolio Theory by Harry Markowitz (Nobel Prize, 1990) is built entirely on expected value and variance: investors seek to maximize expected return for a given variance level. The covariance between assets determines how much diversification reduces risk.

Machine Learning: Loss Functions

The ubiquitous mean squared error (MSE) loss targets the expected squared prediction error E[(Y − Ŷ)²]; the training loss is its sample average. Minimizing MSE seeks the model whose predictions are closest to the true values in expectation. The bias-variance tradeoff, the central challenge of ML model selection, is expressed in exactly these terms: expected squared error decomposes into squared bias, variance, and irreducible noise.
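For an estimator θ̂ of a fixed quantity θ, the same idea appears as the decomposition MSE = bias² + variance. This simulation sketch uses a deliberately biased, hypothetical estimator (0.8 times the mean of five normal observations) and an arbitrary seed; with the variance computed by dividing by N, the identity holds up to floating-point rounding.

```python
import random
import statistics

rng = random.Random(42)
theta = 10.0  # true value being estimated (illustrative)

# Hypothetical estimator: shrink the mean of 5 noisy observations toward 0.
estimates = []
for _ in range(20_000):
    sample = [rng.gauss(theta, 3.0) for _ in range(5)]
    estimates.append(0.8 * statistics.fmean(sample))

mse = statistics.fmean((e - theta) ** 2 for e in estimates)
bias_sq = (statistics.fmean(estimates) - theta) ** 2
variance = statistics.pvariance(estimates)  # divides by N, matching the identity

print(mse, bias_sq + variance)  # equal up to floating-point rounding
```

Here the bias should be near 0.8·10 − 10 = −2 and the variance near 0.64·9/5 ≈ 1.15, so the MSE should be roughly 5.15.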

How to Solve Expected Value and Variance Problems: Step-by-Step

Step 1: Identify the Type of Random Variable

Is X discrete (countable outcomes) or continuous (range of values)? Discrete problems use summation; continuous problems use integration. If given a PMF, you’re working discrete. If given a PDF, you’re working continuous.

Step 2: Verify the Distribution Sums / Integrates to 1

Confirm Σ P(X = xᵢ) = 1 or ∫ f(x) dx = 1. If a problem asks you to find an unknown constant c in a PDF or PMF, this is how you solve for it.

Step 3: Compute E(X)

Discrete: E(X) = Σ xᵢ · P(X = xᵢ). Continuous: E(X) = ∫ x · f(x) dx. Build a table for discrete variables — column 1: values, column 2: probabilities, column 3: value × probability.

Step 4: Compute E(X²) Using LOTUS

Discrete: E(X²) = Σ xᵢ² · P(X = xᵢ). Continuous: E(X²) = ∫ x² · f(x) dx. Always compute this before applying the variance formula.

Step 5: Apply the Computational Variance Formula

Var(X) = E(X²) − [E(X)]². Write each step separately: first E(X²), then [E(X)]², then subtract. Don’t try to combine steps mentally.

Step 6: Verify with Sanity Checks

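Var(X) must be ≥ 0, and so must the standard deviation. E(X) must lie between the smallest and largest possible values of X, though it need not be a possible value itself (recall the die's 3.5). A negative variance means there's an error: check your E(X²) and [E(X)]² computations.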

The Probability Table Method for Discrete Variables

Build a systematic table: x | P(X=x) | x·P(X=x) | x²·P(X=x). Sum column 3 for E(X). Sum column 4 for E(X²). Then Var(X) = E(X²) − [E(X)]². This format reduces arithmetic errors and is the format most professors expect to see.
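The table method translates into a short script. The PMF below is a made-up example, not one from the text; the printed columns match x | P(X=x) | x·P(X=x) | x²·P(X=x).

```python
from fractions import Fraction as F

# Illustrative PMF (not from the text): P(X=0)=0.1, P(X=1)=0.3, P(X=2)=0.4, P(X=3)=0.2.
pmf = {0: F(1, 10), 1: F(3, 10), 2: F(4, 10), 3: F(2, 10)}
assert sum(pmf.values()) == 1  # Step 2: probabilities must sum to 1

print(f"{'x':>3} | {'P(X=x)':>7} | {'x·P':>6} | {'x²·P':>6}")
e_x = e_x2 = 0
for x, p in sorted(pmf.items()):
    e_x += x * p       # running sum of column 3
    e_x2 += x * x * p  # running sum of column 4
    print(f"{x:>3} | {float(p):>7.2f} | {float(x * p):>6.2f} | {float(x * x * p):>6.2f}")

var = e_x2 - e_x ** 2  # Step 5: computational formula
print("E(X) =", float(e_x), " E(X²) =", float(e_x2), " Var(X) =", float(var))
```

For this PMF, E(X) = 1.7, E(X²) = 3.7, and Var(X) = 3.7 − 1.7² = 0.81.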

Moment Generating Functions

Moment Generating Function Definition
M_X(t) = E(e^{tX})   for t in a neighborhood of 0

E(X) = M_X'(0) and E(X²) = M_X''(0), giving Var(X) = M_X''(0) − [M_X'(0)]². Often faster than direct integration for distributions with complicated PDFs.

For X ~ Normal(μ, σ²): M_X(t) = exp(μt + σ²t²/2). Differentiating twice and evaluating at t = 0 gives E(X) = μ and Var(X) = σ². The MGF approach is clean and fast for well-known distributions.
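When the MGF is known in closed form, its derivatives at 0 can be checked numerically. This sketch uses central finite differences on the Normal MGF with illustrative parameters μ = 2 and σ² = 9; the step size h is an arbitrary choice that trades truncation error against rounding error.

```python
import math

mu, sigma2 = 2.0, 9.0  # illustrative parameters for X ~ Normal(2, 9)

def mgf(t):
    """MGF of Normal(mu, sigma2): exp(mu·t + sigma2·t²/2)."""
    return math.exp(mu * t + sigma2 * t * t / 2)

h = 1e-3  # step for central finite differences
m1 = (mgf(h) - mgf(-h)) / (2 * h)              # approximates M'(0) = E(X)
m2 = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h ** 2  # approximates M''(0) = E(X²)

print(m1, m2, m2 - m1 ** 2)  # about 2, 13, 9
```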

The People and Institutions Behind Expected Value and Variance Theory

Blaise Pascal, Fermat, and the Origins of Expectation

The formal theory emerges from the 1654 correspondence between Pascal and Fermat, who worked out how to fairly divide the stakes in an interrupted gambling game. Christiaan Huygens published the first printed treatise on probability, De ratiociniis in ludo aleae, in 1657, building on their ideas.

Jakob Bernoulli and the Law of Large Numbers

Jakob Bernoulli of the University of Basel proved the first version of the Law of Large Numbers in his posthumously published Ars Conjectandi (1713) — establishing that sample averages converge to true expected values with enough observations.

Andrei Kolmogorov — Moscow State University

Andrei Kolmogorov (1903–1987) provided the rigorous axiomatic foundations of modern probability theory in his 1933 monograph. Within his framework, expected value is rigorously defined as the Lebesgue integral of X with respect to the probability measure. He also proved a general form of the Strong Law of Large Numbers.

MIT and Modern Probability Education

Today’s probability curriculum at MIT, Stanford, and Harvard carries Kolmogorov’s rigorous framework into modern education. The MIT OpenCourseWare materials by Professors Bertsekas and Tsitsiklis are widely used free resources for students studying these topics.

Frequently Asked Questions: Expected Values and Variance

What is the expected value in statistics?
The expected value of a random variable X, denoted E(X) or μ, is the long-run average value over many repetitions of an experiment. For discrete X: E(X) = Σ xᵢ · P(X = xᵢ). For continuous X: E(X) = ∫ x · f(x) dx. It doesn’t have to be a possible value: the expected number of heads on a fair coin flip is 0.5.
What is the difference between variance and standard deviation?
Variance Var(X) = E[(X − μ)²] measures average squared deviation from the mean. Standard deviation σ = √Var(X) is the square root of variance, expressed in the same units as X and therefore directly interpretable. If test scores have variance 225, the standard deviation is 15 points, meaning scores typically deviate from the mean by roughly 15 points.
How do you calculate expected value for a discrete random variable?
Step 1: list all possible values of X. Step 2: identify P(X = xᵢ) for each value (verify all sum to 1). Step 3: multiply each value by its probability. Step 4: sum all products. Build a table with column 1: x values, column 2: probabilities, column 3: x·P(x). Sum column 3 to get E(X).
What is the computational formula for variance and why is it preferred?
Var(X) = E(X²) − [E(X)]² is algebraically equivalent to E[(X − μ)²] but faster. The definitional formula requires computing (xᵢ − μ)² for each value. The computational formula only requires E(X) and E(X²), both straight weighted averages. Most textbooks recommend it as the default for hand calculations and exams.
What are the properties of expected value?
(1) Linearity: E(aX + b) = a·E(X) + b. (2) Additivity: E(X + Y) = E(X) + E(Y) for ANY X and Y. (3) Independence: E(XY) = E(X)·E(Y) when X and Y are independent. (4) E(c) = c for any constant. (5) Jensen’s Inequality for convex g: E[g(X)] ≥ g[E(X)]. Linearity and additivity are most heavily tested at undergraduate level.
What is covariance and how does it affect variance of sums?
Cov(X, Y) = E(XY) − E(X)·E(Y) measures how two variables vary together. Positive covariance: they move in the same direction. Negative: opposite directions. It directly affects: Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y). Whenever Cov(X, Y) = 0, which is guaranteed when X and Y are independent, this simplifies to Var(X) + Var(Y).
What is the difference between population variance and sample variance?
Population variance σ² = (1/N)·Σ(xᵢ − μ)² uses the true mean μ and divides by N. Sample variance s² = (1/(n−1))·Σ(xᵢ − x̄)² uses x̄ and divides by (n−1). The (n−1) denominator is Bessel’s correction, which makes s² an unbiased estimator of σ². Using n would systematically underestimate population variance.
What is LOTUS and when do you use it?
LOTUS (the Law of the Unconscious Statistician) lets you compute E[g(X)] using the original distribution of X. Discrete: E[g(X)] = Σ g(xᵢ)·P(X = xᵢ). Continuous: E[g(X)] = ∫ g(x)·f(x)dx. Most commonly used with g(x) = x² to compute E(X²) for variance. Also used for E(1/X), E(√X), E(e^X), etc.
How does the Central Limit Theorem relate to expected value and variance?
The CLT states that the sample mean of n i.i.d. variables with mean μ and variance σ² is approximately N(μ, σ²/n) for large n. For any n, E(X̄) = μ (unbiased) and Var(X̄) = σ²/n, which decreases as n increases. The CLT justifies z-tests, t-tests, and confidence intervals built around ±1.96·σ/√n for 95% coverage.



About Byron Otieno

Byron Otieno is a professional writer with expertise in both articles and academic writing. He holds a Bachelor of Library and Information Science degree from Kenyatta University.
