Residual Analysis
Statistics & Regression Guide
Residual Analysis: The Complete Guide for Statistical Modeling
Residual analysis is the diagnostic engine behind every reliable regression model — separating a model that fits the data from one that actually works. This guide covers everything from residual plot interpretation and heteroscedasticity to autocorrelation, normality tests, and outlier detection via Cook’s Distance. Whether you’re completing a statistics assignment or validating a predictive model, this is your complete diagnostic framework — grounded in OLS theory and real-world practice across R, Python, SPSS, and Stata.
What It Is & Why It Matters
What Is Residual Analysis — And Why Your Regression Depends On It
Residual analysis is one of those topics that separates students who understand regression from those who just run it. You can fit a model in seconds. Knowing whether that model is actually telling you something true takes a lot more than a high R² and a significant F-statistic. That’s precisely the gap that residual analysis closes — and it’s where most statistical assignments either earn full marks or fall apart.
At its core, residual analysis is the systematic examination of the leftover information your model didn’t explain. Every fitted regression model generates a prediction for each observation. The residual is the difference between what actually happened and what the model predicted. These leftover errors — if your model is well-specified — should look like noise: random, centered at zero, evenly spread, and normally distributed. When they don’t, you have a problem the model hasn’t accounted for.
- 4: core OLS assumptions that residual analysis directly tests (linearity, independence, normality, homoscedasticity)
- 1973: the year Francis Anscombe’s Quartet showed that near-identical regression statistics can hide entirely different data patterns
- 4/n: a conventional Cook’s Distance threshold for flagging an influential observation (where n = sample size)
What Is a Residual?
A residual is the observed value of the dependent variable minus the model’s predicted (fitted) value for that same observation:
Basic Residual Formula
eᵢ = yᵢ − ŷᵢ
Where eᵢ is the residual for observation i, yᵢ is the observed response value, and ŷᵢ is the model’s fitted value. A positive residual means the model underpredicted; a negative residual means it overpredicted. Residuals are the observable approximations of the theoretical error terms (ε) in the population regression model — but unlike errors, residuals can actually be computed and examined.
The sum of residuals in an OLS model that includes an intercept is always exactly zero: a mathematical property, not a coincidence. This is why we can’t just look at the total; we need to look at the pattern. Anscombe’s landmark 1973 paper demonstrated this with devastating clarity: four datasets with nearly identical means, variances, correlations, and regression lines, but wildly different residual structures. Same summary statistics, completely different relationships. Residual plots caught what the numbers missed.
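As a minimal sketch of the residual definition and the zero-sum property, the snippet below fits a least-squares line in plain Python and compares the residual sum with and without an intercept. The data values and the helper name fit_line are illustrative assumptions, not taken from any dataset in this guide.

```python
# Toy data (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # e_i = y_i - yhat_i

# With an intercept, the residuals sum to zero up to floating-point rounding.
print(abs(sum(residuals)) < 1e-9)

# For a line forced through the origin (no intercept), that need not hold.
b_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
residuals_origin = [yi - b_origin * xi for xi, yi in zip(x, y)]
print(sum(residuals_origin))
```

The second print shows a small but clearly non-zero total, which is why the zero-sum property should be read as a consequence of including an intercept.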
The Difference Between a Residual and an Error
The error term (ε) is the theoretical, population-level deviation between an observation and the true, unknown regression line — it’s unobservable. The residual (e) is the estimated error computed from your fitted model. It’s what you calculate from sample data. Residuals are estimates of errors. Good residual behavior gives you evidence that the errors are behaving the way OLS assumes they should.
The diagnostic principle: If your model is correctly specified and the OLS assumptions are met, the residuals should look like draws from a white noise process — random, centered at zero, with constant variance. Any systematic pattern in the residuals is evidence that the model has missed something real.
Why Residual Analysis Matters for Your Assignment
In statistics and econometrics courses at universities across the United States and United Kingdom, regression assignments routinely require residual diagnostics as part of the submission. Professors at institutions ranging from Harvard to University College London (UCL) expect you to verify assumptions, not just report coefficients. Missing this step is consistently one of the top reasons students lose marks on quantitative assignments.
Types of Residuals
Raw, Standardized, and Studentized Residuals — What Each One Tells You
Not all residuals are created equal. Residual analysis involves several different forms of residuals, each designed to answer a specific diagnostic question. Using the right type matters — raw residuals alone can mask problems that scaled versions reveal clearly.
Raw (Ordinary) Residuals
Raw residuals are the simplest form: eᵢ = yᵢ − ŷᵢ. They’re on the same scale as the response variable, useful for understanding the practical magnitude of prediction errors. But because they’re not standardized, comparing them across observations with different leverage is misleading. Raw residuals are the starting point, not the endpoint, of rigorous residual analysis.
Standardized Residuals
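Standardized residuals divide each raw residual by its own estimated standard deviation, s · √(1 − hᵢᵢ), where s is the residual standard error (the square root of the MSE) and hᵢᵢ is the leverage of observation i. This produces dimensionless residuals comparable to a standard normal distribution. Observations with standardized residuals beyond ±2 are typically flagged; those beyond ±3 are strong outlier candidates. This form is also called the internally studentized residual. Some software instead reports eᵢ / s, with no leverage correction, under the name "standardized", so check which definition your package uses.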
Standardized Residual
rᵢ = eᵢ / (s · √(1 − hᵢᵢ))
Studentized Residuals
Studentized residuals account for the fact that high-leverage observations have smaller residuals by construction, because the regression line gets pulled toward them. The standardized form above already adjusts for each observation’s leverage, but it estimates s from a fit that includes the observation being tested. Externally studentized residuals refit the model without the observation in question and use that deleted estimate of s in the denominator, making them the most sensitive tool for detecting individual outliers.
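The two scaled residuals can be sketched for a one-predictor model as follows. This is a hedged illustration: the data are made up, the helper names simple_ols and scaled_residuals are hypothetical, and the externally studentized values use the standard identity tᵢ = rᵢ · √((n − p − 1) / (n − p − rᵢ²)) rather than n separate refits.

```python
import math

def simple_ols(x, y):
    """Least-squares line for one predictor; returns residuals and leverages."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]  # h_ii for a straight line
    return resid, lev

def scaled_residuals(x, y, p=2):
    """Return (standardized, externally studentized) residuals; p = coefficients."""
    n = len(x)
    resid, lev = simple_ols(x, y)
    s = math.sqrt(sum(e * e for e in resid) / (n - p))
    standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, lev)]
    # Identity that avoids refitting: t_i = r_i * sqrt((n-p-1) / (n-p-r_i^2)).
    studentized = [r * math.sqrt((n - p - 1) / (n - p - r * r)) for r in standardized]
    return standardized, studentized

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 9.0, 7.1, 7.9]  # the sixth value is a planted outlier
standardized, studentized = scaled_residuals(x, y)
for i, (r, t) in enumerate(zip(standardized, studentized), start=1):
    print(i, round(r, 2), round(t, 2))
```

For the planted outlier the externally studentized value is far larger than the standardized one, because removing that point shrinks the estimate of s sharply.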
Pearson and Deviance Residuals (Generalized Linear Models)
For GLMs (logistic regression, Poisson regression), raw residuals are no longer directly interpretable. GLMs use Pearson residuals (raw residual divided by the square root of the estimated variance function) and deviance residuals (signed square roots of the contribution to the model’s deviance).
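For a Poisson model, where the variance function is V(μ) = μ, the two GLM residuals described above can be sketched like this. The counts and fitted means are hypothetical inputs rather than output from a fitted model, and the function names are illustrative.

```python
import math

def pearson_residual(y, mu):
    """Poisson Pearson residual: raw residual over sqrt of V(mu) = mu."""
    return (y - mu) / math.sqrt(mu)

def deviance_residual(y, mu):
    """Poisson deviance residual: signed square root of the unit deviance."""
    term = y * math.log(y / mu) if y > 0 else 0.0  # y*log(y/mu) tends to 0 as y -> 0
    unit_deviance = 2.0 * (term - (y - mu))
    sign = 1.0 if y >= mu else -1.0
    return sign * math.sqrt(max(unit_deviance, 0.0))

# Hypothetical observed counts and fitted means (not from a real model fit).
counts = [0, 2, 5, 9]
fitted_means = [1.2, 2.5, 4.0, 6.5]
for y, mu in zip(counts, fitted_means):
    print(y, mu, round(pearson_residual(y, mu), 3), round(deviance_residual(y, mu), 3))
```

Summing the squared deviance residuals over all observations gives the model deviance, which is why they are the natural analogue of OLS residuals for likelihood-based fits.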
When to Use Standardized Residuals
- Screening for outliers across a dataset quickly
- Initial residual plots for exploratory diagnostics
- Checking normality assumptions on a standard scale
- When leverage values are similar across observations
When to Use Studentized Residuals
- Identifying specific influential outliers precisely
- High-leverage design points (controlled experiments)
- Formal outlier testing (Bonferroni-corrected t-tests)
- Final model validation before publishing results
Diagnostic Plots
How to Read Residual Plots — The Core of Residual Analysis
Residual analysis lives and dies by its plots. The human eye is remarkably good at detecting structure in scatter — which is why every major regression textbook leads with graphical diagnostics. The four plots you need to know are described below.
Residuals vs. Fitted Values Plot
This is the workhorse of residual analysis. Plot the raw or standardized residuals on the y-axis against the fitted values (ŷ) on the x-axis. What you want: a horizontal band of points randomly scattered around zero with no obvious pattern. What different patterns mean:
- Curved or U-shaped pattern: Non-linearity. Your model is missing a polynomial term or the relationship isn’t linear.
- Fan or funnel shape: Heteroscedasticity — residual variance changes with fitted values.
- Random scatter around zero: Linearity and homoscedasticity assumptions are satisfied.
- Systematic wave or S-shape: Possible autocorrelation or a missing periodic variable.
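The curved pattern in the first bullet can be reproduced numerically. In this small sketch (noise-free, made-up data), a straight line is fitted to a purely quadratic relationship and the residual signs are listed in order of fitted value.

```python
# Data from a purely quadratic relationship (illustrative, noise-free).
x = [float(v) for v in range(11)]
y = [v ** 2 for v in x]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Ordered by fitted value, the signs run positive, negative, positive: a U shape.
signs = "".join("+" if e > 0 else "-" for _, e in sorted(zip(fitted, residuals)))
print(signs)  # prints ++-------++
```

Long runs of same-signed residuals like this are the numerical counterpart of the U-shape you would see in the plot; adding an x² term would remove them here.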
Normal Q-Q Plot (Quantile-Quantile Plot)
The Normal Q-Q plot plots the quantiles of your standardized residuals against the quantiles of a theoretical normal distribution. Normally distributed residuals fall approximately along a straight 45-degree line. Deviations tell a specific story:
- S-shaped curve: Tails heavier or lighter than normal. Heavy tails (leptokurtosis) are common in financial data.
- Banana-shaped curve: Skewness — a log transformation of Y often helps.
- Points diverging at ends: Outliers deserving further investigation.
- Points on the line throughout: Normality assumption satisfied.
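The coordinates behind a Q-Q plot are easy to compute by hand. The sketch below pairs sorted residuals with standard-normal quantiles; the plotting positions (i − 0.5)/n are one common convention among several, and the residual values are hypothetical.

```python
from statistics import NormalDist

def qq_points(residuals):
    """Pair sorted residuals with standard-normal quantiles at (i - 0.5) / n."""
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))

# Hypothetical standardized residuals with one large positive value.
residuals = [-1.3, -0.8, -0.4, -0.1, 0.0, 0.2, 0.5, 0.9, 1.1, 3.4]
for theo, sample in qq_points(residuals):
    print(round(theo, 2), sample)
```

Most pairs sit close to the line sample = theoretical, while the last pair sits well above it, which is the "points diverging at ends" pattern.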
Scale-Location Plot (Spread-Location Plot)
This graph shows the square root of the absolute values of standardized residuals against fitted values — designed specifically to detect heteroscedasticity. A roughly horizontal line with evenly spread points indicates constant variance. An upward slope indicates variance increasing with fitted values.
Residuals vs. Leverage (Cook’s Distance Plot)
The leverage of an observation measures how far its predictor values are from the center of the predictor space. This plot, with Cook’s Distance contour lines overlaid, combines both pieces of information: is this observation far from the fit AND from the center of the data? Points in the upper-right or lower-right corners (high residual AND high leverage) are the most concerning.
The R Default Diagnostic Suite
In R, calling plot(model) on any lm object produces all four diagnostic plots. In Python, statsmodels exposes the underlying quantities through the OLSInfluence class, with plotting helpers such as qqplot and influence_plot in statsmodels.graphics. SPSS generates them through the regression dialog’s “Plots” submenu. Knowing how to produce and read these four plots is a major component of most regression assignment rubrics.
OLS Assumptions & Violations
Testing OLS Assumptions Through Residual Analysis
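Every ordinary least squares regression rests on a set of assumptions. Residual analysis is fundamentally the process of testing whether those assumptions hold in your data. The Gauss-Markov Theorem, named after Carl Friedrich Gauss and Andrei Markov, guarantees that OLS estimators are Best Linear Unbiased Estimators (BLUE) when the model is linear in its parameters and the errors have zero mean, constant variance, and no correlation with one another. Normality is not needed for that result; it matters for exact t-tests and F-tests in small samples.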
Linearity
OLS assumes a linear relationship between the predictors and the response variable. A residuals-vs-fitted plot showing a curved pattern is your primary diagnostic. Remedies include polynomial terms, interaction terms, or transformations of X or Y.
Independence (No Autocorrelation)
OLS assumes the errors are independent. In time series or panel data, serial autocorrelation is common and dangerous: positive autocorrelation inflates the apparent precision of estimates, producing artificially narrow confidence intervals. The Durbin-Watson statistic, introduced by James Durbin (London School of Economics) and Geoffrey Watson in Biometrika papers in 1950–51, is the standard test for first-order autocorrelation. Common rules of thumb for reading it:
- ≈ 2: No autocorrelation — the desired outcome
- < 1.5: Positive autocorrelation
- > 2.5: Negative autocorrelation
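The statistic behind these thresholds is the sum of squared successive differences of the residuals divided by their sum of squares. A minimal sketch, using two hypothetical residual series in time order:

```python
def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Hypothetical residual series, in time order.
drifting = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]     # neighbours look alike
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # neighbours flip sign

print(round(durbin_watson(drifting), 3))     # 0.286, well below 1.5
print(round(durbin_watson(alternating), 3))  # 3.333, well above 2.5
```

Because the statistic is roughly 2(1 − r), where r is the lag-1 correlation of the residuals, values near 2 correspond to r near zero.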
Homoscedasticity (Constant Variance)
Heteroscedasticity doesn’t bias OLS coefficient estimates but makes the usual standard errors incorrect, which invalidates the t-tests, F-tests, and confidence intervals built on them. The Breusch-Pagan test and the White test are the standard formal diagnostics. Practical remedies include:
- Log transformation of Y: Works when variance grows proportionally with the mean
- Square root transformation: Appropriate for count data
- Weighted Least Squares (WLS): Explicitly down-weights high-variance observations
- Robust standard errors (Huber-White sandwich estimator): Corrects standard errors without changing coefficients
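To make the Breusch-Pagan idea concrete, here is a sketch of its studentized (Koenker) form for a single predictor: regress the squared residuals on x and use LM = n · R² with one degree of freedom. The data are made up so that scatter widens with x, and the helper names are hypothetical; for real work the library routines named elsewhere in this guide are the safer choice.

```python
import math

def ols_line(x, y):
    """Return fitted values and R-squared for a one-predictor least-squares fit."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return fitted, 1.0 - ss_res / ss_tot

def breusch_pagan(x, y):
    """Studentized Breusch-Pagan test with a single predictor: LM = n * R^2."""
    fitted, _ = ols_line(x, y)
    squared_resid = [(yi - fi) ** 2 for yi, fi in zip(y, fitted)]
    _, r2_aux = ols_line(x, squared_resid)    # auxiliary regression of e^2 on x
    lm = len(x) * r2_aux
    p_value = math.erfc(math.sqrt(lm / 2.0))  # chi-square tail, 1 degree of freedom
    return lm, p_value

# Made-up data whose scatter widens as x grows.
x = [float(v) for v in range(1, 11)]
y = [2.0 * xi + (0.3 * xi if i % 2 else -0.3 * xi) for i, xi in enumerate(x)]
lm, p_value = breusch_pagan(x, y)
print(round(lm, 2), round(p_value, 4))
```

A small p-value rejects constant variance, which is the signal to reach for one of the remedies listed above.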
Normality of Residuals
OLS doesn’t require normality for coefficient estimates to be unbiased — but for valid t-tests and F-tests in small samples, normality of residuals matters. In large samples, the Central Limit Theorem typically saves you. When normality is violated, log or Box-Cox transformations of Y often resolve the issue; alternatively, bootstrapped confidence intervals bypass the normality assumption entirely.
No Multicollinearity
The Variance Inflation Factor (VIF) is the standard diagnostic — VIF values above 5 or 10 suggest problematic collinearity. Multicollinearity inflates standard errors and makes individual coefficient estimates unstable, even if the overall model fit is good.
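VIF for predictor j is 1 / (1 − Rⱼ²), where Rⱼ² comes from regressing that predictor on all the others. With exactly two predictors, Rⱼ² is just their squared correlation, which is the simplification this sketch relies on. The data and function names are illustrative assumptions.

```python
import math

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two predictors R^2 is their squared correlation."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r * r)

# Made-up predictors: x2 nearly duplicates x1, x3 does not.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
x3 = [2.0, 5.0, 1.0, 6.0, 3.0, 4.0]
print(round(vif_two_predictors(x1, x2), 1))  # far above 10
print(round(vif_two_predictors(x1, x3), 2))  # close to 1
```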
Outliers & Influential Points
Outliers, Leverage, and Cook’s Distance in Residual Analysis
One of the most important things residual analysis does is distinguish between observations that merely don’t fit the model well and observations that are actively distorting it. These are very different problems with very different solutions.
Outliers: Large Residuals
An outlier in regression terms is an observation with an unusually large residual. Standardized residuals beyond ±2 are commonly flagged; beyond ±3 are strong candidates for investigation. But an outlier isn’t automatically a problem — it may reflect a genuine, meaningful observation the model doesn’t account for, or it may reflect a data entry error. Never delete an outlier simply because it doesn’t fit your model.
Leverage: Unusual Predictor Values
Leverage (denoted hᵢᵢ, the i-th diagonal of the hat matrix H) measures how far observation i’s predictor values are from the center of the predictor space. High-leverage observations occupy unusual positions in X-space and have the potential to strongly influence the regression coefficients. The conventional threshold for high leverage is 2p/n, where p is the number of estimated coefficients (including the intercept) and n is the sample size.
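For a straight-line fit with an intercept, the leverage has the closed form hᵢᵢ = 1/n + (xᵢ − x̄)² / Σ(xⱼ − x̄)². The sketch below uses that form on made-up predictor values and applies the 2p/n rule with p = 2 (intercept plus slope); the function name is hypothetical.

```python
def leverages(x):
    """Hat-matrix diagonal for a straight-line fit with intercept."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]  # made-up values; the last is far from the rest
p = 2                                 # intercept + slope
h = leverages(x)
threshold = 2 * p / len(x)
flagged = [i for i, hi in enumerate(h) if hi > threshold]
print([round(hi, 3) for hi in h], round(threshold, 3), flagged)
```

The leverages always sum to p, so the 2p/n rule simply flags points holding more than twice the average share.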
Cook’s Distance: Combining Outlier and Leverage Information
Cook’s Distance was developed by R. Dennis Cook at the University of Minnesota in 1977. It measures how much the entire vector of fitted values would change if a single observation were removed:
Cook’s Distance
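Dᵢ = (eᵢ² / (p · MSE)) · (hᵢᵢ / (1 − hᵢᵢ)²)
Here p is the number of estimated coefficients (including the intercept) and MSE is the mean squared error of the fitted model. Common thresholds: Dᵢ > 4/n is a widely used rule of thumb; Dᵢ > 1 indicates more serious concern.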
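A sketch of Cook’s Distance for a one-predictor model, computed two ways: from the residual-and-leverage formula, and from the definition (refit without observation i and measure how far all fitted values move). The data and function names are illustrative assumptions; the two routes should agree up to rounding.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

def cooks_distance(x, y, p=2):
    """Cook's D from residuals and leverages: (e^2 / (p*MSE)) * (h / (1-h)^2)."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    lev = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [(e * e / (p * mse)) * (h / (1.0 - h) ** 2) for e, h in zip(resid, lev)]

def cooks_distance_by_deletion(x, y, i, p=2):
    """Same quantity from its definition: refit without i, compare fitted values."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - p)
    c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    shift = sum(((b0 + b1 * xi) - (c0 + c1 * xi)) ** 2 for xi in x)
    return shift / (p * mse)

# Made-up data: the last point is far out in x and off the trend of the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 6.0]
d = cooks_distance(x, y)
print([round(di, 2) for di in d], "threshold 4/n =", round(4 / len(x), 2))
```

Only the last observation exceeds both 4/n and 1 here, matching the intuition that high leverage combined with a sizeable residual is what drives influence.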
DFFITS and DFBETAS
DFFITS measures the change in the fitted value for observation i when i is deleted, scaled by the standard error of that fitted value. DFBETAS measure the change in each individual regression coefficient when an observation is removed, which is useful when you want to know which specific coefficients an influential observation is distorting. For standardized DFBETAS, values beyond ±2/√n are flagged.
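Both measures can be sketched by brute-force deletion for a one-predictor model. This illustration uses made-up data and hypothetical helper names; it reports DFFITS and the DFBETAS for the slope only, each scaled by the residual standard error from the fit that excludes observation i.

```python
import math

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

def deletion_diagnostics(x, y, p=2):
    """Leave-one-out (DFFITS, slope DFBETAS) pairs for a straight-line fit."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    rows = []
    for i in range(n):
        x_del, y_del = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        c0, c1 = fit_line(x_del, y_del)
        rss_del = sum((yj - (c0 + c1 * xj)) ** 2 for xj, yj in zip(x_del, y_del))
        s_del = math.sqrt(rss_del / (n - 1 - p))       # s estimated without i
        h = 1.0 / n + (x[i] - x_bar) ** 2 / sxx        # leverage of i
        fit_change = (b0 + b1 * x[i]) - (c0 + c1 * x[i])
        dffits = fit_change / (s_del * math.sqrt(h))
        dfbetas_slope = (b1 - c1) / (s_del / math.sqrt(sxx))
        rows.append((dffits, dfbetas_slope))
    return rows

x = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 6.0]
rows = deletion_diagnostics(x, y)
cutoff = 2 / math.sqrt(len(x))  # the ±2/√n rule for DFBETAS
for i, (dffits, dfbetas) in enumerate(rows):
    print(i, round(dffits, 2), round(dfbetas, 2), abs(dfbetas) > cutoff)
```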
⚠️ What To Do When You Find Influential Points
Finding an influential observation is the beginning of an investigation, not the end. First: verify the data — is this a recording error? Second: examine whether the observation is meaningful and whether a different model specification handles it better. Third: always report the presence of influential observations and whether results change materially with them removed. Never silently delete observations without documenting and justifying the decision.
Step-by-Step Process
How to Perform Residual Analysis: A Step-by-Step Guide
Understanding the theory of residual analysis is one thing. Executing it systematically on your own regression output is another. The following steps give you a complete process from model fitting through remediation.
Step 1: Fit Your Regression Model
Run your OLS or GLM regression using your chosen software. Before examining any residuals, confirm the model specification is theoretically justified — include predictors that theory or prior research suggests are relevant. A well-specified model is the prerequisite for meaningful residual analysis.
Step 2: Compute and Save Residuals
Extract and store raw residuals, standardized residuals, and studentized residuals. In R: residuals(model), rstandard(model), rstudent(model). In Python statsmodels, with results = sm.OLS(y, X).fit() and influence = results.get_influence(): results.resid, influence.resid_studentized_internal, influence.resid_studentized_external. In SPSS: save residuals through the regression dialog.
Step 3: Produce the Four Diagnostic Plots
Generate all four standard diagnostic plots: (1) Residuals vs. Fitted, (2) Normal Q-Q, (3) Scale-Location, (4) Residuals vs. Leverage. In R, plot(model) produces all four automatically. In written assignments, interpreting these plots in specific, plain language is where marks are gained or lost.
Step 4: Apply Formal Statistical Tests
Supplement plots with formal tests: Shapiro-Wilk or Anderson-Darling for normality; Breusch-Pagan or White test for heteroscedasticity; Durbin-Watson for autocorrelation; VIF for multicollinearity. In R: shapiro.test(residuals(model)), bptest(model) from lmtest, durbinWatsonTest(model) from the car package.
Step 5: Identify Outliers and Influential Observations
Compute Cook’s Distance, leverage (hat values), DFFITS, and DFBETAS. Flag observations exceeding conventional thresholds. Investigate each one — check for data entry errors, examine what’s special about the observation, and assess whether its inclusion materially changes the model’s key findings. In R: influence.measures(model) produces all diagnostics in one table.
Step 6: Remediate Violations
Apply the appropriate remedy based on confirmed violations. Non-linearity: add polynomial terms or apply Box-Cox transformation. Heteroscedasticity: log-transform Y, use WLS, or apply robust standard errors. Autocorrelation: add lagged predictors or use Newey-West standard errors. Document every remediation step with the diagnostic evidence that justified it.
Step 7: Re-run Diagnostics After Remediation
After any model modification, repeat the full residual analysis on the new model. A transformation that fixes heteroscedasticity may introduce non-normality. Model diagnostics are iterative, not one-shot. The final model you report should be the one whose residuals pass all relevant assumption checks — or whose violations are acknowledged and addressed.
Key Entities & Theorists
Key Figures, Organizations, and Frameworks in Residual Analysis
The development of residual analysis as a formal discipline spans two centuries of statistical innovation. Understanding who the key figures are elevates a university assignment from textbook recitation to genuine disciplinary awareness.
Carl Friedrich Gauss — The Origin of Least Squares
Carl Friedrich Gauss (1777–1855), the German mathematician at the University of Göttingen, developed the method of least squares, the foundation of OLS regression. He published the method in 1809 in Theoria Motus Corporum Coelestium, having earlier applied it to predicting the orbit of Ceres; Adrien-Marie Legendre had independently published a version in 1805. Without least squares, there are no residuals; without residuals, there is no residual analysis.
Francis Anscombe — The Case for Visual Diagnostics
Francis John Anscombe (1918–2001), a British statistician at Yale University, transformed statistical practice with his 1973 paper introducing Anscombe’s Quartet: four datasets with nearly identical summary statistics that look completely different when plotted. He showed, visually and persuasively, that graphical residual analysis is indispensable.
James Durbin & Geoffrey Watson — Autocorrelation Testing
James Durbin (London School of Economics) and Geoffrey Watson co-authored the landmark papers introducing the Durbin-Watson statistic, published in Biometrika in 1950 and 1951. The statistic is now reported by most regression software packages.
R. Dennis Cook — Influence Analysis
R. Dennis Cook at the University of Minnesota changed regression diagnostics with his 1977 paper in Technometrics introducing Cook’s Distance — a unified influence measure that combined outlier and leverage information into a single, interpretable statistic.
Halbert White — Robust Standard Errors
Halbert White (1950–2012) at the University of California, San Diego introduced the heteroscedasticity-consistent (HC) covariance estimator in his 1980 paper in Econometrica. White standard errors are now a common default among applied economists.
| Entity | Affiliation | Key Contribution | Primary Reference |
|---|---|---|---|
| Carl Friedrich Gauss | University of Göttingen, Germany | Method of Least Squares — foundation of OLS and residual computation | Theoria Motus Corporum Coelestium (1809) |
| Francis Anscombe | Yale University, USA | Anscombe’s Quartet — proof that visual residual analysis is essential | The American Statistician (1973) |
| James Durbin & Geoffrey Watson | London School of Economics, UK (Durbin) | Durbin-Watson statistic for autocorrelation in regression residuals | Biometrika (1950, 1951) |
| R. Dennis Cook | University of Minnesota, USA | Cook’s Distance — unified influence measure combining outlier and leverage | Technometrics (1977) |
| Halbert White | UC San Diego, USA | Heteroscedasticity-consistent standard errors; White test | Econometrica (1980) |
| Trevor Breusch & Adrian Pagan | Australian National University | Breusch-Pagan test for heteroscedasticity | Econometrica (1979) |
| Samuel Shapiro & Martin Wilk | Rutgers University / Bell Labs, USA | Shapiro-Wilk test — gold standard for normality of residuals | Biometrika (1965) |
Software Implementation
Residual Analysis in R, Python, SPSS, Stata, and Minitab
Knowing the theory of residual analysis only gets you halfway. Here’s a practical breakdown of how each major platform handles the core diagnostic tasks.
Residual Analysis in R
R has the richest ecosystem for residual analysis. The base plot(model) function on any lm object produces all four standard diagnostic plots instantly. The car package extends this with influencePlot() and crPlots(). The lmtest package provides bptest() (Breusch-Pagan) and dwtest() (Durbin-Watson). The sandwich package computes White standard errors.
Residual Analysis in Python
Python’s statsmodels library handles residual analysis through the OLSInfluence class, obtained from a fitted model with results = sm.OLS(y, X).fit() and influence = results.get_influence(). Key attributes: results.resid (raw residuals), influence.cooks_distance (Cook’s Distance, returned together with p-values), influence.hat_matrix_diag (leverage values). The seaborn library provides clean residual plots via sns.residplot(). For robust standard errors: sm.OLS(y, X).fit(cov_type='HC3').
Residual Analysis in SPSS
SPSS Statistics handles residual diagnostics through Analyze → Regression → Linear. The “Save” submenu allows saving raw residuals, standardized residuals, studentized residuals, Cook’s Distance, leverage values, DFFITS, and DFBETAS. The “Plots” submenu generates residuals vs. fitted plots and P-P/Q-Q plots.
Residual Analysis in Stata
Stata is widely used in econometrics courses. After regress y x1 x2, the key commands are: predict e, residuals; rvfplot (residuals vs. fitted); predict d, cooksd (Cook’s Distance); estat hettest (Breusch-Pagan) and estat imtest, white (White test) for heteroscedasticity. Stata’s vce(robust) option applies White standard errors to most estimation commands.
Choosing the Right Tool for Your Assignment
If your course specifies software, use that software. If you have a choice: R is the most flexible and powerful; Python is best in data science contexts; SPSS is most common in social science and psychology courses; Stata is standard in economics and public policy programs. All produce the same four diagnostic plots and key statistics — the syntax differs, but the interpretation is identical.
Key Terms & Related Concepts
Essential Vocabulary and Related Concepts for Residual Analysis
Graduate-level residual analysis assignments and professional statistical reporting require command of precise vocabulary. The following terms appear frequently in rubrics, professor feedback, peer-reviewed journals, and standard reference texts.
Core Statistical and Procedural Terms
- Residual Sum of Squares (RSS): the total of all squared residuals; the quantity OLS minimizes.
- Mean Square Error (MSE): RSS divided by the residual degrees of freedom (n − p); the estimate of the error variance σ².
- R-squared: the proportion of variance in Y explained by the model; it does NOT indicate whether residual assumptions are met.
- Adjusted R-squared: R² adjusted for the number of predictors; penalizes unnecessary model complexity.
- Hat matrix (H): the projection matrix H = X(XᵀX)⁻¹Xᵀ; its diagonal elements hᵢᵢ are the leverage values.
- BLUE (Best Linear Unbiased Estimator): the status of OLS estimators when all Gauss-Markov assumptions hold.
- GLS (Generalized Least Squares): accounts for non-spherical error structure.
- WLS (Weighted Least Squares): a special case of GLS where each observation is weighted by the inverse of its error variance.
- Newey-West estimator: a heteroscedasticity and autocorrelation consistent (HAC) standard error estimator, particularly important in time series regression.
- Box-Cox transformation: a family of power transformations parameterized by λ, used to normalize residuals and stabilize variance.
- Partial regression plot (added-variable plot): shows the relationship between Y and one predictor after controlling for all others.
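As a small illustration of the Box-Cox family, the transform is (y^λ − 1)/λ for λ ≠ 0 and ln(y) for λ = 0, and it requires strictly positive values. The function below is a sketch with an illustrative name; choosing λ (usually by maximum likelihood) is left to library routines.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]       # limiting case as lambda -> 0
    return [(v ** lam - 1.0) / lam for v in values]

skewed = [1.0, 2.0, 4.0, 8.0, 16.0, 64.0]  # made-up right-skewed responses
print([round(v, 3) for v in box_cox(skewed, 0)])    # log transform
print([round(v, 3) for v in box_cox(skewed, 0.5)])  # square-root-like transform
```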
Related Statistical Frameworks
Broader conceptual themes important for advanced work include: model misspecification (consequences of omitting relevant variables or including irrelevant ones); cross-validation and out-of-sample residuals (computing residuals on held-out test data to assess generalization); simulation-based residual analysis for non-Gaussian models; and recursive residuals in time series for detecting structural breaks. For students working in Bayesian inference, posterior predictive checks serve the same model-checking function within a different inferential framework.
Frequently Asked Questions
Frequently Asked Questions: Residual Analysis
What is residual analysis in statistics?
Residual analysis is the systematic examination of residuals — the differences between observed values and model-predicted values — after fitting a regression model. Its purpose is to verify whether OLS assumptions are met (linearity, independence, homoscedasticity, normality), identify patterns the model failed to capture, and detect outliers or influential observations. Proper residuals should scatter randomly around zero with constant variance and no systematic structure. Patterns in residuals reveal specific model failures: curvature suggests non-linearity, fan shapes indicate heteroscedasticity, and serial trends suggest autocorrelation.
What does a good residual plot look like?
A good residual plot — residuals versus fitted values — shows points randomly scattered around a horizontal line at zero, with no obvious pattern, no funnel shape, and no curve. The spread of residuals should be roughly constant across all levels of the fitted values (homoscedasticity). There should be no clusters of points above or below zero in any region. Some individual points with larger residuals are expected, but they should appear randomly distributed rather than concentrated in specific regions.
What is heteroscedasticity and how do you detect it?
Heteroscedasticity occurs when the variance of residuals is not constant across all levels of fitted values — it violates the homoscedasticity assumption of OLS. It’s detected visually through a fan or funnel shape in the residuals-vs-fitted plot, and formally through the Breusch-Pagan test or White test. Heteroscedasticity doesn’t bias OLS coefficient estimates but makes standard errors incorrect, invalidating all t-tests and confidence intervals. Fixes include log-transforming Y, using Weighted Least Squares, or applying heteroscedasticity-consistent (White/sandwich) standard errors.
How do you interpret Cook’s Distance?
Cook’s Distance for observation i measures how much the entire set of fitted values changes when observation i is removed. It combines residual size and leverage into a single influence statistic. Conventional thresholds: Cook’s D > 4/n flags an observation for investigation; Cook’s D > 1 indicates serious concern. Finding a high Cook’s Distance is the start of an investigation, not the end — always check whether the observation is a data error, a legitimate outlier the model fails to capture, or a meaningful extreme case.
How do you test normality of residuals?
Normality of residuals is tested visually using a Normal Q-Q plot: normally distributed residuals fall approximately on a straight diagonal line. Formal tests include Shapiro-Wilk (a common first choice, especially in small to moderate samples), Anderson-Darling (which gives extra weight to the tails), and Kolmogorov-Smirnov. In R: shapiro.test(residuals(model)). Important caveat: OLS coefficient estimates are unbiased even without normality; normality matters most for valid inference in small samples. In large samples (a common rule of thumb is n > 100), the Central Limit Theorem typically makes inference approximately valid.
What does the Durbin-Watson test measure?
The Durbin-Watson statistic (introduced by James Durbin and Geoffrey Watson in Biometrika, 1950–51) tests for first-order serial autocorrelation in regression residuals. The statistic ranges from 0 to 4: a value near 2 indicates no autocorrelation; values near 0 indicate positive autocorrelation; values near 4 indicate negative autocorrelation. Rule of thumb: values between 1.5 and 2.5 are generally acceptable. The test is especially important in time series and panel data.
What is the difference between leverage and influence in regression?
Leverage measures how far an observation’s predictor values (X) are from the center of the predictor space. It is a property of the design, independent of the response Y. Influence measures the actual effect on the fitted model when an observation is included versus excluded. An observation can have high leverage but low influence (if its response falls close to the line fitted to the remaining observations). Cook’s Distance captures influence by combining leverage and residual size. The dangerous cases are high-leverage AND large-residual observations.
What happens if you ignore residual analysis?
Ignoring residual analysis risks presenting results that look credible but are statistically invalid. If linearity is violated, coefficient estimates don’t capture the true relationship. If heteroscedasticity is present, standard errors are wrong — and p-values, confidence intervals, and hypothesis test conclusions may all be incorrect. If autocorrelation exists, the model’s apparent precision is inflated. In academic assignments, failing to conduct and report residual analysis is a major source of lost marks.
Can residual analysis be used in non-regression models?
Yes. The logic of residual analysis applies to any predictive model. In ANOVA, residuals are checked for normality and constant variance exactly as in regression. In ARIMA time series models, residuals should resemble white noise. In GLMs, Pearson and deviance residuals substitute for OLS residuals. In machine learning models, out-of-sample prediction errors serve a similar diagnostic function. The principle is universal: if a model generates predictions, examining the leftover errors reveals what it got wrong.
What is the hat matrix in residual analysis?
The hat matrix H = X(XᵀX)⁻¹Xᵀ is the projection matrix that maps observed Y values onto fitted values: ŷ = HY. It “puts the hat on Y,” hence the name. Its diagonal elements hᵢᵢ are the leverage values for each observation — ranging from 0 to 1, where higher values indicate greater leverage. Residuals can be expressed as e = (I − H)Y, making the hat matrix the mathematical bridge between observed values and residuals. High-leverage observations tend to have smaller residuals because the regression line is pulled toward them.
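The relations ŷ = HY and e = (I − H)Y can be checked directly on a small example. This sketch builds H for a design with an intercept and one predictor, inverting the 2×2 matrix XᵀX by hand; the data and function name are illustrative assumptions.

```python
def hat_matrix(x):
    """H = X (X'X)^-1 X' for the design matrix with rows [1, x_i]."""
    n = len(x)
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    det = n * sxx - sx * sx
    # Inverse of the 2x2 matrix X'X = [[n, sx], [sx, sxx]].
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]

    def row_times_inv(xi):
        # [1, x_i] multiplied by (X'X)^-1
        return [inv[0][0] + xi * inv[1][0], inv[0][1] + xi * inv[1][1]]

    return [[a + b * xj for xj in x] for a, b in (row_times_inv(xi) for xi in x)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
H = hat_matrix(x)

fitted = [sum(hij * yj for hij, yj in zip(row, y)) for row in H]  # y_hat = H y
residuals = [yi - fi for yi, fi in zip(y, fitted)]                # e = (I - H) y
leverage = [H[i][i] for i in range(len(x))]
print([round(h, 2) for h in leverage])  # [0.6, 0.3, 0.2, 0.3, 0.6]
```

The diagonal is largest at the two ends of the x range, and its entries sum to 2, the number of estimated coefficients.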
