Statistics & Data Analysis Guide
Factor Analysis: Statistical Method for Data Reduction
Factor analysis reduces large sets of observed variables into smaller latent constructs — called factors — explaining shared variance. This guide covers exploratory vs. confirmatory approaches, factor loadings, eigenvalues, PCA vs. FA, rotation methods, SPSS & R procedures, and APA reporting standards — with worked examples anchored in peer-reviewed literature from UCLA, APA, and the Psychometric Society.
Foundations
Factor Analysis: What It Is, Where It Came From, and Why It Matters
Factor analysis is a statistical technique that addresses one of the most persistent challenges in empirical research: you collect data on many variables, but you suspect most of those variables are measuring a smaller number of underlying things. A psychologist administering a 100-item questionnaire probably isn’t measuring 100 distinct psychological traits. A marketing researcher surveying brand perception across 40 attributes probably isn’t dealing with 40 separate dimensions of consumer attitude. Factor analysis finds those hidden structures. It groups correlated variables into factors — latent constructs that cannot be directly observed but can be inferred from patterns in the data.
The technique has roots going back over a century. Charles Spearman, a British psychologist who later spent most of his career at University College London, introduced the first version of factor analysis in 1904 to support his theory of general intelligence, which he called the g factor. Spearman noticed that students who performed well on one cognitive test tended to perform well on others, suggesting a common underlying ability driving performance across domains. That insight launched a century of factor analytic work in psychology. Understanding descriptive vs. inferential statistics is the essential foundation before you can fully appreciate what factor analysis is doing with your data.
- 1904: Year Charles Spearman published the foundational paper introducing factor analysis to measure the g factor
- 2 major types: Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA)
- 300+: Minimum recommended sample size for stable, replicable factor solutions in social science research
Since Spearman, factor analysis has become one of the most widely used methods in psychology, education, sociology, marketing, organizational behavior, and public health research. The Big Five personality model — the most replicated personality framework in existence — was derived through decades of factor analytic studies. The development of virtually every standardized educational test, from SAT subscales to IQ batteries, involved factor analysis at some stage. When researchers at the Educational Testing Service (ETS) develop a new standardized assessment, factor analysis is central to demonstrating that the test measures what it claims to measure. Hypothesis testing and factor analysis together form two of the most critical inferential tools in quantitative research.
What Does Factor Analysis Actually Do?
Strip away the mathematics, and factor analysis does something elegantly simple. It looks at the correlation matrix of your observed variables and asks: which variables tend to move together? Variables that correlate highly with each other — meaning people who score high on one tend to score high on the others — are grouped onto the same factor. Variables that do not correlate with each other end up on different factors, or drop out of the solution entirely.
The result is a reduced-dimensionality representation of your data. If you started with 30 survey items and factor analysis identifies 4 factors, you’ve compressed your data from 30 dimensions to 4. You haven’t lost information about real structure — you’ve revealed it. Each factor can then be named and interpreted based on the items that load heavily onto it. This is why factor analysis is often described both as a data reduction technique and a structure discovery method.
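To make the grouping idea concrete, here is a minimal R sketch on simulated data. The two latent variables (`verbal`, `spatial`), the item names, and the loading of 0.8 are illustrative assumptions, not values from any real study. Items driven by the same latent variable should correlate strongly with each other and only weakly with items from the other block.

```r
set.seed(42)
n <- 500

# Two hypothetical, uncorrelated latent factors
verbal  <- rnorm(n)
spatial <- rnorm(n)

# Each item = 0.8 * its factor + unique noise (sd 0.6 keeps item variance near 1)
make_item <- function(factor_scores) 0.8 * factor_scores + rnorm(n, sd = 0.6)
items <- data.frame(
  v1 = make_item(verbal),  v2 = make_item(verbal),  v3 = make_item(verbal),
  s1 = make_item(spatial), s2 = make_item(spatial), s3 = make_item(spatial)
)

# The correlation matrix is the raw material of factor analysis
R <- cor(items)
print(round(R, 2))

# Average correlation inside the verbal block vs. between the two blocks
verbal_block  <- R[1:3, 1:3]
within_block  <- mean(verbal_block[upper.tri(verbal_block)])
between_block <- mean(R[1:3, 4:6])
print(round(c(within = within_block, between = between_block), 2))
```

In the printed matrix, within-block correlations should sit near 0.64 (0.8 × 0.8) and cross-block correlations near zero, which is the pattern factor analysis picks up on.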
Factor analysis doesn’t just summarize your data — it proposes a theory about it. When you name a factor “Emotional Stability” based on five items that load on it, you are making a theoretical claim that something called emotional stability causes those items to correlate. That claim requires evidence beyond the factor solution itself.
Where Is Factor Analysis Used Today?
The applications span almost every quantitative field. In psychology, it underpins personality assessment, clinical screening instruments, and intelligence testing. In education, institutions like the National Center for Education Statistics (NCES) use it to validate survey instruments measuring student engagement, teacher effectiveness, and learning outcomes. In marketing and consumer research, companies including Nielsen and McKinsey & Company use factor analysis to segment consumer attitudes and identify the dimensions that drive brand preference. In machine learning and natural language processing, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) are conceptually related to factor analysis, applying similar dimensionality reduction logic to text data.
Two Approaches
Exploratory vs. Confirmatory Factor Analysis: Choosing the Right Approach
The single most important conceptual distinction in factor analysis is between its two main forms: Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA). Confusing them — using one when you should use the other — is one of the most common methodological errors in published research, and one your professor or dissertation committee will immediately identify. The choice between them is not primarily statistical; it’s about your research question and what you know (or don’t know) going into the analysis.
What Is Exploratory Factor Analysis (EFA)?
Exploratory Factor Analysis (EFA) is used when you have no predetermined theory about how your variables should cluster. You’re letting the data tell you what the factor structure looks like. EFA makes no constraints on which variables load on which factors — every observed variable is free to load on every factor. You then look at the resulting pattern of loadings and interpret what the factors appear to represent. This is the appropriate tool when you are developing a new scale, when you are working in a new domain without established theory, or when existing theory may not apply to your specific population.
A classic use case: a researcher at a university develops a new 40-item survey measuring student academic motivation. No established theory perfectly predicts how those 40 items will cluster for this population. The researcher runs EFA and discovers the items naturally group into four factors: intrinsic motivation, extrinsic motivation, academic self-efficacy, and fear of failure. Those four factors then become the subscales of the instrument.
What Is Confirmatory Factor Analysis (CFA)?
Confirmatory Factor Analysis (CFA) is used when you have a specific, theory-driven hypothesis about how variables should load onto factors. You specify the model in advance (which variables go on which factors, how many factors there are, whether factors are allowed to correlate) and then test whether that model fits the observed data. CFA is conducted within the broader framework of Structural Equation Modeling (SEM), using software such as Mplus, R’s lavaan package, IBM SPSS Amos, or EQS.
CFA produces fit indices — numbers that tell you how well your theorized factor model matches the actual covariance structure of your data. Key fit indices include the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), the Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Square Residual (SRMR). Accepted cutoffs per Hu and Bentler’s influential 1999 guidelines suggest CFI ≥ 0.95, RMSEA ≤ 0.06, and SRMR ≤ 0.08 for a good-fitting model.
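As a rough illustration of where three of these indices come from, the sketch below computes CFI, TLI, and RMSEA from chi-square statistics using their standard textbook definitions. The function name `fit_indices` and all input numbers are made up for the example; in practice SEM software reports these values directly, and some programs use n rather than n − 1 in the RMSEA denominator.

```r
# Hypothetical helper: fit indices from model and baseline (null) chi-squares
fit_indices <- function(chisq_model, df_model, chisq_null, df_null, n) {
  # Noncentrality estimates, floored at zero
  d_model <- max(chisq_model - df_model, 0)
  d_null  <- max(chisq_null - df_null, 0)

  rmsea <- sqrt(d_model / (df_model * (n - 1)))
  cfi   <- 1 - d_model / max(d_null, d_model, .Machine$double.eps)
  tli   <- ((chisq_null / df_null) - (chisq_model / df_model)) /
           ((chisq_null / df_null) - 1)

  c(CFI = cfi, TLI = tli, RMSEA = rmsea)
}

# Made-up example: 12 indicators, 3 correlated factors, n = 400
fit <- fit_indices(chisq_model = 85.3, df_model = 51,
                   chisq_null = 1520.7, df_null = 66, n = 400)
print(round(fit, 3))

# Compare against the Hu and Bentler (1999) cutoffs quoted above
print(c(CFI_ok = fit[["CFI"]] >= 0.95, RMSEA_ok = fit[["RMSEA"]] <= 0.06))
```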
Exploratory Factor Analysis (EFA)
- No prior theory about factor structure
- Data-driven — factors emerge from correlations
- Used in scale development and early-stage research
- Run in SPSS, R (psych package), SAS
- Output: rotated factor matrix, scree plot, communalities
- Assesses: number of factors, item-factor assignments
- Limitation: results may not generalize to new samples
Confirmatory Factor Analysis (CFA)
- Prior theory specifies factor structure in advance
- Theory-driven — tests a pre-specified model
- Used in scale validation and theory testing
- Run in Mplus, R (lavaan), AMOS, EQS
- Output: fit indices (CFI, RMSEA, SRMR), factor loadings
- Assesses: model fit, construct validity, measurement invariance
- Limitation: requires adequate sample size for complex models
Should You Use EFA First and CFA on a New Sample?
Best practice in psychometrics recommends a two-phase approach: conduct EFA on one sample to develop and refine the factor structure, then cross-validate by running CFA on an independent sample. Using the same dataset for both EFA and CFA is methodologically problematic — it capitalizes on chance-specific features of that dataset rather than establishing genuine generalizability. When a single dataset is your only option, split-half validation is an acceptable compromise.
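A split-half design can be set up in a few lines of base R. The sketch below uses a placeholder data frame named `survey` filled with random numbers; with real data you would substitute your own item responses. One half would go to EFA and the other to CFA.

```r
set.seed(2024)
n <- 600

# Placeholder data: 600 respondents, 10 items (random numbers for illustration)
survey <- as.data.frame(matrix(rnorm(n * 10), nrow = n))

# Randomly assign half of the rows to the exploratory sample
efa_rows <- sample(seq_len(n), size = floor(n / 2))
efa_half <- survey[efa_rows, ]   # use for EFA
cfa_half <- survey[-efa_rows, ]  # hold out for CFA

print(c(efa = nrow(efa_half), cfa = nrow(cfa_half)))
```

Fixing the seed makes the split reproducible, which is worth reporting in the method section.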
Practical Decision Rule for Your Assignment
Ask yourself one question: Do I have a pre-specified theory about which variables belong to which factors? If yes — use CFA. If no — use EFA. If you’re developing a new measurement instrument from scratch — start with EFA. If you’re validating a published scale in a new population — use CFA. If your professor hasn’t specified, look at whether the study is exploratory (EFA) or confirmatory (CFA) in nature.
Core Concepts
Factor Loadings, Eigenvalues, Communality, and the Math Behind Factor Analysis
To actually use and interpret factor analysis outputs — whether from SPSS, R, or any other software — you need a working understanding of the core statistical concepts. You do not need to derive the algebra by hand. You do need to understand what these numbers mean, where they come from, and what decisions they inform.
What Is a Factor Loading?
A factor loading quantifies how strongly an observed variable is associated with a latent factor. In an orthogonal solution it equals the correlation between the variable and the factor, so it ranges from −1 to +1; in an oblique solution the pattern loadings are standardized regression weights, which usually fall in the same range but can occasionally exceed it. In the rotated factor matrix output from SPSS or R, you will see a table where rows are variables and columns are factors. Each cell contains a loading value. The higher the absolute value of the loading, the stronger the relationship between that variable and that factor.
Standard interpretation thresholds: loadings ≥ 0.70 are considered excellent indicators of a factor; 0.50–0.69 are good; 0.40–0.49 are moderate (acceptable in many contexts); 0.30–0.39 are marginal; and loadings below 0.30 are generally considered too weak to be meaningful. In practice, most researchers report only loadings ≥ 0.30 or 0.40 in their tables to keep outputs readable.
Factor Analysis Model Equation
X_i = λ_i1·F_1 + λ_i2·F_2 + … + λ_im·F_m + ε_i
Where:
X_i = observed variable i
λ_im = factor loading of variable i on factor m
F_m = latent factor m
ε_i = unique variance (error) for variable i
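One way to see what the model equation implies is to simulate from it. The sketch below assumes two uncorrelated factors and a made-up loading matrix `Lambda`; under those assumptions the model-implied correlation matrix is ΛΛᵀ + Ψ, and a large simulated sample should reproduce it closely.

```r
set.seed(1)
n <- 5000

# Hypothetical loading matrix: 6 items, 2 factors (filled column by column)
Lambda <- matrix(c(0.8, 0.7, 0.6, 0.0, 0.0, 0.0,
                   0.0, 0.0, 0.0, 0.8, 0.7, 0.6), ncol = 2)

# Unique variances chosen so every item has total variance 1
Psi <- diag(1 - rowSums(Lambda^2))

# Correlation matrix implied by the model (uncorrelated factors)
Sigma <- Lambda %*% t(Lambda) + Psi

# Simulate X_i = sum_m lambda_im * F_m + e_i for n respondents
factor_scores <- matrix(rnorm(n * 2), nrow = n)
unique_parts  <- matrix(rnorm(n * 6), nrow = n) %*% sqrt(Psi)
X <- factor_scores %*% t(Lambda) + unique_parts

# Largest discrepancy between sample and implied correlations
max_gap <- max(abs(cor(X) - Sigma))
print(round(max_gap, 3))
```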
What Is an Eigenvalue?
An eigenvalue (also called a characteristic root) represents the total amount of variance in the dataset explained by a given factor. When you run factor analysis, the software extracts factors in descending order of eigenvalue — the first factor explains the most variance, the second factor explains the next most, and so on.
The Kaiser criterion (retain factors with eigenvalues > 1.0) is the default rule in SPSS and is widely taught. The reasoning: with standardized variables, each original variable contributes one unit of variance, so an eigenvalue of 1.0 means the factor explains as much variance as a single original variable, and only factors that explain more than that are worth retaining. However, the Kaiser criterion is known to misjudge the number of factors: it tends to over-extract, especially when many variables are analyzed, and it can occasionally under-extract. That is why most methodologists today recommend supplementing it with a scree plot and parallel analysis.
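The eigenvalues themselves are easy to inspect in base R. This sketch simulates six items from two made-up factors (`f1`, `f2`), extracts the eigenvalues of the correlation matrix with `eigen()`, and counts how many exceed 1.0. It illustrates the rule only; it is not a recommendation to rely on it alone.

```r
set.seed(123)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
noise <- function() rnorm(n, sd = 0.6)

# Six simulated items: three per hypothetical factor, loading 0.8 each
items <- data.frame(
  a1 = 0.8 * f1 + noise(), a2 = 0.8 * f1 + noise(), a3 = 0.8 * f1 + noise(),
  b1 = 0.8 * f2 + noise(), b2 = 0.8 * f2 + noise(), b3 = 0.8 * f2 + noise()
)

# Eigenvalues of the correlation matrix, returned in descending order
eigenvalues <- eigen(cor(items))$values
print(round(eigenvalues, 2))

# Kaiser criterion: count eigenvalues greater than 1
kaiser_count <- sum(eigenvalues > 1)
print(kaiser_count)

# Eigenvalues of a correlation matrix sum to the number of variables
print(sum(eigenvalues))
```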
Cattell’s Scree Plot
Raymond Cattell introduced the scree plot in 1966 as a visual tool for factor retention. You plot the eigenvalues (y-axis) against factor number (x-axis). The curve typically shows a sharp drop followed by a gradual leveling off. The “elbow” — where the curve changes from steep to flat — marks the point at which adding more factors yields diminishing returns. Factors to the left of the elbow are retained; those to the right are dropped.
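A scree plot can be drawn with base R graphics. The sketch below again uses simulated items (an assumption for illustration) and writes the plot to a temporary PDF so it runs in non-interactive sessions. The "largest drop" heuristic at the end is a crude stand-in for reading the elbow by eye, not an established rule.

```r
set.seed(321)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
noise <- function() rnorm(n, sd = 0.6)
items <- data.frame(
  a1 = 0.8 * f1 + noise(), a2 = 0.8 * f1 + noise(), a3 = 0.8 * f1 + noise(),
  b1 = 0.8 * f2 + noise(), b2 = 0.8 * f2 + noise(), b3 = 0.8 * f2 + noise()
)

ev <- eigen(cor(items))$values

# Draw the scree plot: eigenvalue (y) against factor number (x)
scree_file <- file.path(tempdir(), "scree_plot.pdf")
pdf(scree_file)
plot(seq_along(ev), ev, type = "b",
     xlab = "Factor number", ylab = "Eigenvalue", main = "Scree plot")
abline(h = 1, lty = 2)  # Kaiser reference line
invisible(dev.off())

# Crude elbow heuristic: the factor after which the eigenvalue drops the most
drops <- -diff(ev)
elbow_after <- which.max(drops)
print(elbow_after)
```

In an interactive session you can drop the `pdf()` and `dev.off()` lines to see the plot directly.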
What Is Communality?
Communality (h²) is the proportion of a variable’s variance that is explained by the extracted factors. A communality of 0.75 means 75% of that variable’s variance is captured by the factors; the remaining 25% is unique variance not explained by the common factor structure.
Very low communalities (below 0.30) suggest that a variable is poorly represented by the extracted factors — it may not belong in the factor solution at all. Very high communalities (above 0.90) can signal item redundancy.
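For an orthogonal solution, communality is simply the sum of a variable's squared loadings across factors. The sketch below uses a made-up loading matrix to show the arithmetic and to flag items below the 0.30 guideline; with oblique rotation the factor correlations also enter the calculation, so this shortcut applies only under the orthogonality assumption.

```r
# Hypothetical orthogonal loadings: 4 items on 2 factors
loadings_mat <- matrix(
  c(0.80, 0.10,
    0.70, 0.20,
    0.15, 0.75,
    0.30, 0.35),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("item", 1:4), c("F1", "F2"))
)

# Communality = row sum of squared loadings; uniqueness = the remainder
h2 <- rowSums(loadings_mat^2)
uniqueness <- 1 - h2
print(round(cbind(h2, uniqueness), 3))

# Items poorly represented by the factors (communality below 0.30)
low_items <- names(h2)[h2 < 0.30]
print(low_items)
```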
What Is the Correlation Matrix and Why Does It Matter?
Factor analysis begins with the correlation matrix. Before running the analysis, you should assess suitability using two diagnostic tests:
- Bartlett’s Test of Sphericity: Tests whether the correlation matrix is an identity matrix. A significant result (p < .05) indicates your variables do correlate sufficiently to justify factor analysis.
- Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy: Ranges from 0 to 1. Values above 0.80 are “meritorious”; above 0.90 are “marvelous.” Values below 0.50 are considered unacceptable.
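Both diagnostics can be computed by hand from the correlation matrix, which helps clarify what they measure. The helper names `bartlett_sphericity` and `kmo_overall` below are my own, and the data are simulated; dedicated packages provide tested versions of both tests, so treat this as a teaching sketch.

```r
# Bartlett's test: is the correlation matrix distinguishable from identity?
bartlett_sphericity <- function(R, n) {
  p <- ncol(R)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  c(chisq = chisq, df = df, p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Overall KMO: squared correlations relative to squared partial correlations
kmo_overall <- function(R) {
  R_inv <- solve(R)
  partial <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
  off <- row(R) != col(R)
  sum(R[off]^2) / (sum(R[off]^2) + sum(partial[off]^2))
}

# Simulated example: 6 items, 2 hypothetical factors
set.seed(10)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
noise <- function() rnorm(n, sd = 0.6)
items <- data.frame(
  a1 = 0.8 * f1 + noise(), a2 = 0.8 * f1 + noise(), a3 = 0.8 * f1 + noise(),
  b1 = 0.8 * f2 + noise(), b2 = 0.8 * f2 + noise(), b3 = 0.8 * f2 + noise()
)
R <- cor(items)

bart <- bartlett_sphericity(R, n)
kmo  <- kmo_overall(R)
print(bart)
print(round(kmo, 2))
```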
Common Confusion
Factor Analysis vs. Principal Component Analysis: The Difference That Actually Matters
No confusion in multivariate statistics is more persistent than the conflation of factor analysis (FA) and principal component analysis (PCA). They produce similar-looking output. SPSS defaults to PCA when you click “Factor Analysis.” Most textbooks present them side by side. They are not the same thing, and using one when you mean the other is a meaningful methodological error.
The Core Conceptual Difference
PCA asks: What linear combinations of my variables explain the most variance? It is a mathematical transformation with no assumption that latent constructs cause your variables. Factor analysis asks: What latent constructs cause my variables to correlate? It explicitly distinguishes between common variance (shared among variables, attributed to latent factors) and unique variance (specific to each variable, treated as error).
| Feature | Principal Component Analysis (PCA) | Factor Analysis (FA) |
|---|---|---|
| Primary Goal | Data reduction — create uncorrelated components capturing maximum variance | Identify latent constructs explaining shared variance among variables |
| Latent Variables | No — components are mathematical combinations, not theoretical constructs | Yes — factors represent unobserved psychological or social constructs |
| Variance Modeled | Total variance (common + unique) | Common variance only; unique variance is error |
| Assumption About Causality | None — purely mathematical | Factors causally produce observed variable scores |
| Best Used For | Preprocessing data for ML, reducing multicollinearity before regression | Scale development, construct validation, theory testing |
| Software Defaults | SPSS “Factor” defaults to PCA; sklearn.decomposition.PCA in Python | R psych package fa(), SPSS (select Principal Axis Factoring) |
Most Common Student Error: Running PCA in SPSS (the software default) and calling it “factor analysis” in the write-up. If you are conducting true factor analysis, change the extraction method to Principal Axis Factoring or Maximum Likelihood before running the analysis.
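The difference in what the two methods model shows up numerically. The sketch below fits PCA (`prcomp`) and a maximum likelihood common factor model (`factanal`, used here in place of Principal Axis Factoring because it ships with base R) to the same simulated items. Because PCA folds unique variance into its components, the variance it attributes to each item is typically higher than the factor-analytic communality.

```r
set.seed(99)
n <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)

# Hypothetical items with unit variance and a chosen loading
item <- function(f, loading) loading * f + rnorm(n, sd = sqrt(1 - loading^2))
items <- data.frame(
  a1 = item(f1, 0.8), a2 = item(f1, 0.7), a3 = item(f1, 0.6),
  b1 = item(f2, 0.8), b2 = item(f2, 0.7), b3 = item(f2, 0.6)
)

# PCA: component loadings = eigenvectors scaled by component standard deviations
pca <- prcomp(items, scale. = TRUE)
pca_loadings <- pca$rotation[, 1:2] %*% diag(pca$sdev[1:2])
pca_h2 <- rowSums(pca_loadings^2)

# Common factor model: communality = 1 - estimated unique variance
fa_fit <- factanal(items, factors = 2, rotation = "none")
fa_h2 <- 1 - fa_fit$uniquenesses

comparison <- round(cbind(PCA = pca_h2, FA = fa_h2), 2)
print(comparison)
```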
Rotation
Rotation in Factor Analysis: Why It Matters and Which Method to Choose
After extracting factors, the initial (unrotated) factor solution is mathematically correct but often difficult to interpret. Rotation transforms the factor solution to make the pattern of loadings more interpretable, without changing how well the model fits the data. This is the principle of simple structure, introduced by Louis Thurstone at the University of Chicago in the 1930s: each variable should load highly on one factor and near zero on the others.
Orthogonal Rotation: Varimax, Quartimax, and Equamax
Orthogonal rotation constrains the rotated factors to remain uncorrelated with each other. The three main orthogonal methods are:
- Varimax (Kaiser, 1958): The most widely used rotation. Maximizes the variance of squared loadings within each factor, producing a solution where each factor has a few high loadings and many near-zero loadings.
- Quartimax: Maximizes variance across rows (variables). Tends to produce a single large general factor. Less commonly used.
- Equamax: A compromise between Varimax and Quartimax. Rarely used and generally not recommended as a default choice.
Oblique Rotation: Direct Oblimin and Promax
Oblique rotation allows factors to correlate with each other. In most real-world social science contexts, this is the more defensible assumption — psychological traits, attitudes, and social phenomena genuinely correlate.
- Direct Oblimin: A flexible oblique rotation where the degree of factor correlation is controlled by a delta parameter (δ). Most commonly used oblique method in educational and psychological research.
- Promax: First runs Varimax, then allows factors to correlate. Commonly used in large datasets and produces results very similar to Direct Oblimin.
Quick Decision Guide: Which Rotation Method?
Use Varimax if: Your factors are theoretically independent, or if inter-factor correlations from an oblique run are all below 0.30 in absolute value.
Use Direct Oblimin or Promax if: Your factors represent psychological traits, attitudes, or social phenomena that could plausibly correlate — this is the default assumption in most social science research.
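The decision guide can be tried out with base R, which provides `varimax()` and `promax()` (Direct Oblimin is not in base R, so Promax stands in as the oblique method here). The data are simulated with two factors that correlate about 0.5, an assumption chosen so that the oblique solution has something to find.

```r
set.seed(11)
n <- 600
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(0.75) * rnorm(n)  # correlates about 0.5 with f1

item <- function(f, loading) loading * f + rnorm(n, sd = sqrt(1 - loading^2))
items <- data.frame(
  a1 = item(f1, 0.80), a2 = item(f1, 0.75), a3 = item(f1, 0.70), a4 = item(f1, 0.65),
  b1 = item(f2, 0.80), b2 = item(f2, 0.75), b3 = item(f2, 0.70), b4 = item(f2, 0.65)
)

# Unrotated maximum likelihood solution
unrotated <- unclass(loadings(factanal(items, factors = 2, rotation = "none")))

# Orthogonal and oblique rotations of the same solution
orthogonal <- varimax(unrotated)
oblique    <- promax(unrotated, m = 4)

# Factor correlations implied by the oblique rotation matrix
t_inv <- solve(oblique$rotmat)
factor_cor <- t_inv %*% t(t_inv)

print(round(unclass(oblique$loadings), 2))
print(round(factor_cor, 2))

# Rule of thumb from the guide: oblique is warranted if |r| reaches 0.30
use_oblique <- abs(factor_cor[1, 2]) >= 0.30
print(use_oblique)
```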
Assumptions & Requirements
Factor Analysis Assumptions, Sample Size Requirements, and Data Suitability
Factor analysis works well under specific conditions. Understanding what the method requires — and how to check those requirements — is essential before you run any analysis.
Key Assumptions of Factor Analysis
1. Measurement level: Variables should be measured at the interval or ratio level. Ordinal items with few response categories call for polychoric correlations, and binary items for tetrachoric correlations, instead of Pearson correlations as input.
2. Adequate sample size: Comrey and Lee (1992) classified 100 as “poor,” 200 as “fair,” 300 as “good,” 500 as “very good,” and 1,000 as “excellent.” Modern simulation research suggests that what matters more is the communality level of variables and the distinctness of factors.
3. Multivariate normality: The method is fairly robust to moderate departures from normality, particularly with large samples. Maximum Likelihood extraction is most sensitive to normality violations; Principal Axis Factoring is more robust.
4. Linear relationships: Factor analysis assumes that relationships between variables are linear. Non-linear relationships may be missed.
5. Absence of multicollinearity and singularity: Variables should correlate, but not too perfectly. Correlations at 0.90 or higher may cause mathematical instability.
How to Check Data Suitability Before Running Factor Analysis
1. Run the KMO and Bartlett’s Tests: In SPSS, go to Analyze → Dimension Reduction → Factor → Descriptives and check KMO and Bartlett’s test of sphericity. KMO ≥ 0.80 is good; 0.60–0.79 is mediocre but acceptable; < 0.60 is problematic. Bartlett’s test should be significant (p < .05).
2. Inspect the Correlation Matrix: Look for correlations between 0.30 and 0.90. If most correlations are below 0.30, there may be insufficient shared variance. If many are above 0.90, multicollinearity may be a problem.
3. Check for Outliers and Missing Data: Use Mahalanobis distance to detect multivariate outliers. Handle missing data before running the analysis; listwise deletion, multiple imputation, and full information maximum likelihood (FIML) are standard options. Avoid pairwise deletion.
4. Assess Distribution Properties: Examine skewness and kurtosis. Absolute skewness above 2 and absolute kurtosis above 7 (Kline, 2016 guidelines) suggest meaningful non-normality. Consider data transformation for highly skewed variables if using Maximum Likelihood extraction.
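Steps 2 to 4 of the checklist can be sketched in base R as follows. The data are simulated, one outlier is planted on purpose, and the helper names `skewness` and `excess_kurtosis` are my own. Note the assumption that kurtosis is reported as excess kurtosis (normal = 0); sources differ on this convention, so check which one your cutoff refers to.

```r
set.seed(7)
n <- 300
g <- rnorm(n)
dat <- data.frame(
  x1 = 0.7 * g + rnorm(n, sd = 0.7), x2 = 0.7 * g + rnorm(n, sd = 0.7),
  x3 = 0.7 * g + rnorm(n, sd = 0.7), x4 = 0.7 * g + rnorm(n, sd = 0.7)
)
dat[1, ] <- c(6, -6, 6, -6)  # planted multivariate outlier, for illustration

# Step 2: share of off-diagonal correlations in the 0.30 to 0.90 range
R <- cor(dat)
r_off <- abs(R[upper.tri(R)])
share_in_range <- mean(r_off >= 0.30 & r_off <= 0.90)

# Step 3: missing data count and Mahalanobis distances
n_incomplete <- sum(!complete.cases(dat))
d2 <- mahalanobis(as.matrix(dat), colMeans(dat), cov(dat))
outliers <- which(d2 > qchisq(0.999, df = ncol(dat)))

# Step 4: moment-based skewness and excess kurtosis per variable
skewness <- function(x) { z <- x - mean(x); mean(z^3) / mean(z^2)^1.5 }
excess_kurtosis <- function(x) { z <- x - mean(x); mean(z^4) / mean(z^2)^2 - 3 }
shape <- data.frame(skew = sapply(dat, skewness),
                    kurtosis = sapply(dat, excess_kurtosis))

print(share_in_range)
print(outliers)
print(round(shape, 2))
```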
Step-by-Step
How to Run Factor Analysis in SPSS and R: Step-by-Step Procedures
Running Factor Analysis in IBM SPSS Statistics
1. Open the Factor Analysis Dialog: Go to Analyze → Dimension Reduction → Factor. Move all your scale items into the Variables box.
2. Set Descriptives: Click Descriptives. Check Initial solution, Coefficients, Significance levels, and KMO and Bartlett’s test of sphericity. Click Continue.
3. Set Extraction Method: Click Extraction. Change the default to Principal Axis Factoring (or Maximum Likelihood if data meets normality assumptions). Under Extract, choose Eigenvalue > 1 for an initial run. Check the Scree plot box. Click Continue.
4. Set Rotation: Click Rotation. Select Direct Oblimin for your primary run, or Varimax if factors are theoretically independent. Check Rotated solution. Click Continue.
5. Set Options: Click Options and select Sorted by size and Suppress small coefficients (absolute value below 0.30). Click Continue, then OK.
6. Interpret the Output: Review in order: (1) KMO and Bartlett’s test; (2) Total Variance Explained table; (3) Scree plot; (4) Communalities; (5) Pattern Matrix for oblique solutions; (6) Factor Correlation Matrix.
Running Factor Analysis in R (psych package)
R Code: Factor Analysis Using the psych Package
```r
# Install (one-time) and load packages; GPArotation supplies the oblimin rotation
install.packages(c("psych", "GPArotation"))
library(psych)

# Check assumptions
KMO(your_data)
cortest.bartlett(cor(your_data), n = nrow(your_data))

# Run parallel analysis to determine number of factors
parallel <- fa.parallel(your_data, fm = "pa", fa = "fa")

# Run EFA with oblique rotation
# (nfactors = 3 is a placeholder; use the number suggested by parallel analysis)
efa_result <- fa(your_data, nfactors = 3, rotate = "oblimin", fm = "pa")
print(efa_result, digits = 2, cut = 0.30, sort = TRUE)

# View factor diagram
fa.diagram(efa_result)
```
Interpretation & APA Reporting
Interpreting Factor Analysis Results and Writing Them Up in APA Format
How to Interpret the Rotated Factor Matrix
The rotated factor matrix (pattern matrix for oblique rotations) is the central output. Reading across a row, you see how strongly each variable relates to each factor. Reading down a column, you see which variables define that factor.
Practical steps: (1) Identify variables with loadings ≥ 0.40 on each factor. (2) Examine whether any variables cross-load (≥ 0.30 on two or more factors) — these may need removal. (3) Look for variables with no substantial loading on any factor — these may also need to be dropped. (4) Name each factor based on the content of its high-loading items. (5) Check that each factor has at least three substantial indicators.
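The practical steps above can be partly automated. The sketch below applies the 0.40 primary-loading and 0.30 cross-loading thresholds from the text to a made-up pattern matrix; the item values and the column names of `review` are illustrative only.

```r
# Hypothetical rotated loadings: 6 items on 2 factors
pattern <- matrix(
  c(0.72, 0.05,
    0.65, 0.12,
    0.48, 0.41,
    0.08, 0.70,
    0.10, 0.58,
    0.22, 0.25),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("item", 1:6), c("F1", "F2"))
)
abs_load <- abs(pattern)

review <- data.frame(
  primary_factor  = colnames(pattern)[apply(abs_load, 1, which.max)],
  primary_loading = apply(abs_load, 1, max),
  cross_loading   = rowSums(abs_load >= 0.30) >= 2,  # step 2
  no_salient      = apply(abs_load, 1, max) < 0.40,  # step 3
  row.names       = rownames(pattern)
)
print(review)

# Step 5: count clean indicators per factor (at least three are recommended)
clean <- review[!review$cross_loading & !review$no_salient, ]
indicators_per_factor <- table(clean$primary_factor)
print(indicators_per_factor)
```

Here item3 is flagged as cross-loading and item6 as having no salient loading. Each factor is then left with only two clean indicators, which falls short of the three recommended in step 5.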
Assessing Reliability: Cronbach’s Alpha per Factor
Once you’ve identified which variables belong to each factor, calculate Cronbach’s Alpha for each subscale. Acceptable thresholds: ≥ 0.70 for research purposes (Nunnally, 1978); ≥ 0.80 for applied/clinical settings. Very high alpha (≥ 0.95) can signal item redundancy.
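Cronbach's alpha can be computed directly from item variances and the variance of the total score. The function name `cronbach_alpha` and the simulated five-item subscale below are assumptions for illustration; dedicated packages offer versions with confidence intervals and item-level diagnostics.

```r
# Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / total variance)
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_vars <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Simulated subscale: 5 items that each load 0.7 on one hypothetical trait
set.seed(5)
n <- 400
trait <- rnorm(n)
subscale <- as.data.frame(
  replicate(5, 0.7 * trait + rnorm(n, sd = sqrt(1 - 0.49)))
)

alpha <- cronbach_alpha(subscale)
print(round(alpha, 2))

# Compare with the thresholds quoted above
print(c(research_ok = alpha >= 0.70, possible_redundancy = alpha >= 0.95))
```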
APA 7th Edition Reporting Standards for EFA
- Method statement: Name the extraction method, rotation method, and software used.
- Assumption checks: Report KMO value and Bartlett’s test result (χ², df, p).
- Factor retention rationale: Describe criteria used and how many factors were retained.
- Total variance explained: Report the percentage of total variance explained.
- Factor loading table: Include all items, their loadings (suppressing < 0.30), communalities, and eigenvalues.
- Factor intercorrelations: If using oblique rotation, report the factor correlation matrix.
- Reliability: Report Cronbach’s Alpha for each factor subscale.
By convention, factor loading tables in APA-style manuscripts suppress or visually de-emphasize loadings below 0.30 for readability, sort items by their primary factor, and clearly indicate the rotation method in the table title or note.
Real-World Applications
Factor Analysis in Practice: Key Applications, Research Entities, and Domain Examples
Psychology and Psychometrics: The Big Five Personality Model
No application of factor analysis has been more influential than the development of the Big Five personality model. Over decades of independent factor analytic studies by researchers including Lewis Goldberg at the Oregon Research Institute and Paul Costa and Robert McCrae at the National Institute on Aging (NIH), a consistent five-factor structure emerged: Openness to Experience, Conscientiousness, Extraversion, Agreeableness, and Neuroticism (OCEAN). The NEO Personality Inventory (NEO-PI-R) was developed and validated through this program of factor analytic research.
Education: SAT, ACT, and Intelligence Testing
The Educational Testing Service (ETS) uses factor analysis extensively in developing and validating assessments including the SAT, GRE, and PRAXIS teacher certification tests. Modern intelligence tests including the Wechsler Adult Intelligence Scale (WAIS-IV) use CFA to validate their four-factor structure (Verbal Comprehension, Perceptual Reasoning, Working Memory, Processing Speed).
Marketing and Consumer Research
Nielsen, Ipsos, and other major market research firms routinely apply EFA to survey data to understand what drives customer satisfaction or brand loyalty. A classic example: a bank surveys customers on 25 service quality attributes and discovers through factor analysis that they cluster into four dimensions — the foundation of frameworks like SERVQUAL, developed at Texas A&M University.
Public Health and Epidemiology
The Patient Health Questionnaire (PHQ-9), used worldwide to screen for depression, was validated using CFA to confirm its single-factor structure. The Medical Outcomes Study SF-36, developed by researchers at RAND Corporation and Harvard Medical School, uses factor analysis to identify two summary scores — Physical Component Summary and Mental Component Summary — from 36 health items.
| Domain | Key Application | Key Entity | FA Type Used |
|---|---|---|---|
| Personality Psychology | Big Five Model / NEO-PI-R | Costa & McCrae (NIH); Goldberg (Oregon Research Institute) | EFA + CFA |
| Intelligence Testing | WAIS-IV, SAT, GRE subscale validation | Educational Testing Service (ETS); Pearson Assessment | CFA |
| Clinical Psychology | PHQ-9, GAD-7 scale validation | Spitzer et al. (Columbia University) | CFA |
| Marketing Research | Brand perception, SERVQUAL dimensions | Nielsen; Texas A&M University | EFA |
| Public Health | SF-36 summary scores | RAND Corporation; Harvard Medical School | EFA + CFA |
| Machine Learning / NLP | Latent Semantic Analysis, topic modeling | Google Brain; DeepMind; Stanford NLP Group | PCA/SVD-based |
Avoiding Mistakes
Common Errors in Factor Analysis and How to Avoid Them
Error 1: Using PCA and Calling It Factor Analysis
SPSS defaults to PCA extraction in its Factor Analysis menu. If you are claiming to identify latent constructs, you must use Principal Axis Factoring or Maximum Likelihood extraction, not PCA. This error appears in published literature frequently enough that Fabrigar et al.’s 1999 review in Psychological Methods found PCA used in a large share of the EFA studies published in leading psychology journals.
Error 2: Over-Relying on the Kaiser Criterion for Factor Retention
Retaining all factors with eigenvalues > 1 as the sole criterion is a known source of over-extraction. Use parallel analysis as your primary retention method, supported by the scree plot and theoretical considerations.
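Parallel analysis compares the observed eigenvalues with those obtained from random data of the same dimensions. The sketch below is a simplified, component-based version of Horn's procedure written in base R; the function name `parallel_analysis`, the 95th percentile threshold, and the simulated three-factor data are assumptions. Packaged implementations offer more options, including factor-based eigenvalues.

```r
# Simplified Horn's parallel analysis on correlation-matrix eigenvalues
parallel_analysis <- function(data, n_sims = 100, quantile_level = 0.95) {
  n <- nrow(data)
  p <- ncol(data)
  observed <- eigen(cor(data), only.values = TRUE)$values

  # Eigenvalues from random normal data of the same size (p x n_sims matrix)
  simulated <- replicate(n_sims, {
    random_data <- matrix(rnorm(n * p), nrow = n)
    eigen(cor(random_data), only.values = TRUE)$values
  })
  threshold <- apply(simulated, 1, quantile, probs = quantile_level)

  # Retain leading factors until an observed eigenvalue fails to beat chance
  exceeds <- observed > threshold
  n_retain <- if (all(exceeds)) p else which(!exceeds)[1] - 1
  list(observed = observed, threshold = threshold, n_retain = n_retain)
}

# Simulated example: 9 items, 3 hypothetical factors, loadings of 0.7
set.seed(2023)
n <- 400
latent <- matrix(rnorm(n * 3), nrow = n)
items <- do.call(cbind, lapply(1:3, function(j) {
  sapply(1:3, function(i) 0.7 * latent[, j] + rnorm(n, sd = sqrt(1 - 0.49)))
}))

pa <- parallel_analysis(items)
print(round(rbind(observed = pa$observed, threshold = pa$threshold), 2))
print(pa$n_retain)
```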
Error 3: Using the Structure Matrix Instead of the Pattern Matrix
When using oblique rotation, SPSS produces both a Pattern Matrix and a Structure Matrix. The pattern matrix is what you should interpret and report. Using the structure matrix inflates apparent loadings and obscures the unique contribution of each variable to each factor.
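The relationship between the two matrices is simple: the structure matrix equals the pattern matrix multiplied by the factor correlation matrix. The sketch below uses made-up pattern loadings and an assumed factor correlation of 0.5 to show how a negligible pattern loading can look substantial in the structure matrix.

```r
# Hypothetical pattern loadings (unique contributions of each factor)
pattern <- matrix(
  c(0.70, 0.05,
    0.65, 0.00,
    0.02, 0.72,
    0.10, 0.60),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("item", 1:4), c("F1", "F2"))
)

# Assumed factor correlation matrix from an oblique rotation
phi <- matrix(c(1.0, 0.5,
                0.5, 1.0), ncol = 2,
              dimnames = list(c("F1", "F2"), c("F1", "F2")))

# Structure matrix: correlations between items and factors
structure <- pattern %*% phi

print(round(pattern, 2))
print(round(structure, 2))
```

For item1, the loading on F2 is 0.05 in the pattern matrix but 0.40 in the structure matrix, purely because F1 and F2 correlate.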
Error 4: Ignoring Cross-Loadings
A cross-loading item loads substantially (≥ 0.30) on two or more factors. Better practice is to remove cross-loading items, revise them theoretically, or acknowledge them explicitly in your limitations.
Error 5: Naming Factors Too Creatively
The name you give a factor should accurately reflect the shared content of its high-loading items and nothing more. Overly creative naming claims more than the loadings support and opens you to valid critique in peer review or dissertation defense.
Dissertation Red Flag: Committees know factor analysis. If your extraction method was PCA, your rotation was Varimax with no theoretical justification, your factor retention was based solely on eigenvalues > 1, and you report the structure matrix — every one of those is a methodological error that will come up in your defense.
Frequently Asked Questions
Frequently Asked Questions About Factor Analysis
What is factor analysis in statistics?
Factor analysis is a statistical method for data reduction that identifies underlying latent constructs (factors) explaining the shared variance among observed variables. It groups correlated variables onto factors, reducing many measured variables to a smaller number of interpretable dimensions. Two main types exist: EFA, which discovers factor structure from data, and CFA, which tests a pre-specified theoretical structure.
What is the difference between EFA and CFA?
Exploratory Factor Analysis (EFA) makes no prior assumptions about which variables load on which factors — it lets the data reveal the structure. Confirmatory Factor Analysis (CFA) tests whether a pre-specified theoretical model fits the observed data, using fit indices like CFI, TLI, and RMSEA. EFA is used in scale development; CFA is used in scale validation within SEM frameworks.
What is a factor loading and what value is considered good?
A factor loading is the correlation between an observed variable and a latent factor — it ranges from −1 to +1. Loadings ≥ 0.70 are excellent; 0.50–0.69 are good; 0.40–0.49 are moderate and contextually acceptable; below 0.30 are too weak for inclusion. For oblique rotations, interpret the Pattern Matrix (not the Structure Matrix).
How is factor analysis different from PCA?
PCA creates mathematical components capturing maximum total variance with no assumption of underlying latent constructs. Factor analysis models only common variance and assumes latent factors causally produce observed variable scores. Use PCA for pure data compression; use factor analysis when identifying and interpreting latent constructs. SPSS defaults to PCA extraction — change to Principal Axis Factoring for true factor analysis.
How many factors should I retain in factor analysis?
Use a combination: (1) Kaiser criterion — eigenvalues > 1.0 as an initial guide; (2) Scree plot elbow; (3) Parallel analysis — the most statistically rigorous method. Always combine statistical criteria with theoretical knowledge about how many constructs are plausibly at work in your domain.
Which rotation method should I use — Varimax or Oblimin?
Run Direct Oblimin (oblique) first. Check the factor correlation matrix: if inter-factor correlations are all below 0.30 in absolute value, Varimax gives essentially the same interpretation with simpler output. If any exceed 0.30 in absolute value, oblique rotation is more appropriate. In most social science contexts, oblique rotation is the defensible default.
What sample size do I need for factor analysis?
Comrey and Lee (1992) rated 300 as “good,” 500 as “very good,” and 1,000 as “excellent.” More important than raw sample size are factor communality levels and factor distinctness. For CFA, Kline (2016) recommends at least 200. Never use factor analysis with fewer than 100 participants without strong justification.
What is communality in factor analysis?
Communality (h²) is the proportion of a variable’s variance explained by the retained factors. Variables with communalities below 0.30 are poorly represented and are candidates for removal. Very high communalities (above 0.90) may indicate item redundancy.
How do I report factor analysis in APA 7th edition format?
An APA 7th edition write-up should: (1) name the extraction and rotation method; (2) report KMO and Bartlett’s test; (3) specify factor retention criteria; (4) report total variance explained; (5) include a factor loading table suppressing values < 0.30; (6) report factor intercorrelations if using oblique rotation; (7) report Cronbach’s Alpha per subscale.
Can I use factor analysis with Likert scale data?
Likert scale items (5+ response options) are commonly treated as interval-level data in social science and factor analyzed using Pearson correlations. The more rigorous approach uses polychoric correlations (available in R’s psych package). For 7-point Likert scales with roughly normal distributions, Pearson-based factor analysis typically produces acceptable results.
