Non-parametric Tests: Mann-Whitney U and Wilcoxon Signed-Rank Tests
Statistics Student Guide
Non-parametric tests — particularly the Mann-Whitney U and Wilcoxon Signed-Rank tests — are essential tools every statistics student and researcher needs in their toolkit. When your data violates normality assumptions, comes in ordinal form, or involves small samples with outliers, these rank-based tests step in where parametric tests cannot reliably go.
This guide covers everything: what non-parametric tests are, when to use the Mann-Whitney U versus the Wilcoxon Signed-Rank, how to perform both tests by hand step-by-step, how to interpret U statistics and p-values, and how to report results with proper effect sizes — the way examiners and journal reviewers expect.
You'll find real worked examples, assumption checklists, and clear comparisons of non-parametric vs. parametric tests — the material that students at universities across the US and UK, from community colleges to Oxford and MIT, consistently find confusing and get wrong on exams.
Whether you're writing a dissertation, completing a statistics assignment, or prepping for a methods exam, this guide gives you a complete, practical foundation in non-parametric hypothesis testing.
The Foundation
What Are Non-parametric Tests?
Non-parametric tests sit at the boundary of what makes statistics genuinely useful in the real world. Most classic tests — the t-test, ANOVA, Pearson's correlation — are parametric: they assume your data follows a specific distribution, almost always normal, and they estimate parameters like the mean and standard deviation. When those assumptions hold, parametric tests are powerful. But data is rarely so cooperative.
When your data is ordinal, skewed, bounded, or collected from small samples where you can't verify normality, parametric tests can produce misleading results. Non-parametric tests are the solution. They make no assumption about the underlying distribution. Instead, they work by ranking data — converting raw values into their order positions — and performing inference on those ranks. This makes them genuinely distribution-free. Statistics assignment help for students frequently begins with the question of which test to choose, and understanding parametric versus non-parametric is the foundational decision in any analysis.
- ~95% — statistical efficiency of the Mann-Whitney U relative to the t-test when normality actually holds
- >100% — relative efficiency of non-parametric tests when data is non-normal: they outperform their parametric counterparts
- 0 — normality assumptions required: the defining feature of non-parametric inference
Why Do Non-parametric Tests Work Without Normality?
The logic is elegant. By converting data to ranks, you strip away information about the raw scale — and with it, the dependence on distribution shape. The number 47 and the number 1,000,000 both become their rank positions. What matters is which observation is larger, not by how much. This rank transformation stabilizes the distribution of the test statistic under the null hypothesis, which is what lets us calculate valid p-values without assuming normality. Sampling methods in applied research — particularly in psychology, health sciences, and education — frequently produce data where non-parametric tests are not just defensible but clearly preferred.
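The insensitivity of ranks to magnitude is easy to demonstrate in code. A minimal sketch using scipy's rankdata, with illustrative numbers:

```python
from scipy.stats import rankdata

# Two samples that differ only in the magnitude of the largest value
sample_moderate = [47, 12, 90, 5]
sample_extreme = [47, 12, 1_000_000, 5]

# Ranks depend only on order, not on how large the values are
print(rankdata(sample_moderate))  # [3. 2. 4. 1.]
print(rankdata(sample_extreme))   # [3. 2. 4. 1.]
```

Any rank-based test statistic computed from these two samples is identical, which is exactly why outlier magnitude cannot distort the result.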
What Is the Difference Between Parametric and Non-parametric Tests?
Parametric tests estimate distributional parameters (mean, variance) and test hypotheses about those parameters. Non-parametric tests test hypotheses about ranks or medians without estimating distribution parameters. The key trade-off: parametric tests are more statistically powerful when their assumptions are met — meaning they are more likely to detect a real effect. Non-parametric tests sacrifice a small amount of power in exchange for robustness. When assumptions are violated, non-parametric tests actually become more powerful than their misbehaving parametric counterparts. Understanding the difference between qualitative and quantitative data is foundational here — non-parametric tests are essential for ordinal-level data that parametric tests were never designed for.
"Non-parametric tests are not the second-best option. For ordinal data and small non-normal samples, they are the correct option. Choosing a t-test on Likert scale data is not safer — it is wrong." — A perspective widely taught in research methods courses at LSE, UCL, and the University of Michigan.
Decision Framework
When to Use Non-parametric Tests: The Decision Framework
Choosing between parametric and non-parametric tests is one of the most common decision points in applied statistics — and one of the most frequently botched. The answer depends on your data, your sample size, and your research design. Here's the framework that works.
Conditions That Call for Non-parametric Tests
You should use a non-parametric test when at least one of these conditions applies to your data or design. First: your dependent variable is ordinal. Likert scales (Strongly Disagree to Strongly Agree), pain ratings (1–10), class rankings — these are ordinal. They have order but no guaranteed equal intervals between values. Treating them as continuous for a t-test is a debated practice; treating them with non-parametric tests is unambiguously defensible. Statistics assignment help for social science and psychology students involves Likert data extensively, and non-parametric tests are the standard choice in many disciplines.
Second: your data is clearly non-normal and your sample is small. With large samples (n ≥ 30), the Central Limit Theorem kicks in and the sampling distribution of the mean becomes approximately normal regardless of the raw data distribution — so t-tests become robust. With small samples, you cannot rely on the CLT. If a Shapiro-Wilk normality test or Q-Q plot reveals significant departures from normality with n < 20, non-parametric tests are safer. Third: your data contains extreme outliers that distort means. Ranks are completely insensitive to outlier magnitude — the largest value always gets rank N regardless of whether it's 100 or 1,000,000.
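As a quick screen for the small-sample normality condition, you can run a Shapiro-Wilk test in scipy; the sample below is hypothetical:

```python
from scipy import stats

# Hypothetical small, right-skewed sample (n = 10)
scores = [2, 3, 3, 4, 5, 6, 8, 12, 20, 45]

stat, p = stats.shapiro(scores)
if p < 0.05:
    print(f"Shapiro-Wilk p = {p:.3f}: normality is doubtful, "
          "prefer a non-parametric test")
else:
    print(f"Shapiro-Wilk p = {p:.3f}: no evidence against normality")
```

Pair this with a Q-Q plot rather than relying on the p-value alone, since Shapiro-Wilk itself has low power at small n.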
What Tests Do You Choose for Which Design?
The right non-parametric test depends entirely on your research design — specifically, whether your groups are independent or paired, and how many groups you have. This maps cleanly onto the parametric test landscape:
Two Independent Groups
Parametric: Independent Samples t-test
Non-parametric: Mann-Whitney U Test (Wilcoxon Rank-Sum Test)
Example: Comparing exam scores between a lecture group and a flipped-classroom group where students are different people.
Two Related/Paired Groups
Parametric: Paired Samples t-test
Non-parametric: Wilcoxon Signed-Rank Test
Example: Comparing pain scores before and after treatment in the same patients, or comparing matched pairs of students.
If you have three or more independent groups, the non-parametric alternative to one-way ANOVA is the Kruskal-Wallis test. For three or more related groups, use the Friedman test. This article focuses specifically on the two-group case — Mann-Whitney U and Wilcoxon Signed-Rank — because they are the most commonly tested in university statistics courses and appear most frequently in published research. Understanding simple linear regression and the logic of hypothesis testing is good background, as non-parametric tests follow the same decision logic of null hypothesis significance testing.
Common mistake: Students confuse the Mann-Whitney U test and the Wilcoxon Signed-Rank test because both involve ranks. The distinction is fundamental — independent groups (different people) → Mann-Whitney U; paired/related observations (same people or matched pairs) → Wilcoxon Signed-Rank. Using the wrong test invalidates your analysis completely.
Test One
The Mann-Whitney U Test: Complete Guide
The Mann-Whitney U test — formally proposed by Henry Mann and Donald Whitney at Ohio State University in 1947, building on earlier work by Frank Wilcoxon — is the most widely used non-parametric test for comparing two independent groups. It is also known as the Wilcoxon rank-sum test, a name that reflects its computational basis. Despite the different names, both refer to the same procedure. Statistics homework help for research methods courses regularly involves the Mann-Whitney U as the go-to alternative when the independent t-test fails its assumptions.
What Does the Mann-Whitney U Test Actually Test?
The formal null hypothesis of the Mann-Whitney U test is that the two populations have the same distribution. In practice — when the distributions have similar shapes — this is equivalent to testing whether the two group medians are equal. More precisely, it tests for stochastic dominance: the probability that a randomly selected observation from Group 1 exceeds a randomly selected observation from Group 2 equals 0.5. If that probability significantly departs from 0.5, the groups differ systematically.
H₀: P(X > Y) = 0.5 (both groups are equally likely to have larger values)
H₁: P(X > Y) ≠ 0.5 (two-tailed) or P(X > Y) > 0.5 or P(X > Y) < 0.5 (one-tailed)
Where X is a random observation from Group 1
and Y is a random observation from Group 2.
Assumptions of the Mann-Whitney U Test
The Mann-Whitney U test is not assumption-free — it just has different and less restrictive assumptions than the t-test. Three conditions must hold:
- Independence: Observations within and between both groups must be independent. This means one subject's score does not influence another's. Violating this assumption — common in clustered or repeated-measures data — requires different methods.
- Ordinal or continuous measurement: The dependent variable must be at least ordinal. The test works with both Likert-type data and continuous measurements like reaction times, test scores, or blood pressure readings.
- Similar distribution shape (for median comparison): If your goal is to compare medians, the two groups should have similarly shaped distributions (both right-skewed, or both roughly symmetric). If shapes differ drastically, the test compares distributions generally rather than medians specifically. This is a nuance frequently overlooked in practice.
Notice what is not on the list: normality. This is the whole point. The Mann-Whitney U test requires no normality assumption — the test statistic's distribution under H₀ is known exactly from combinatorial arguments. This makes it genuinely valid for survey data, clinical ratings, and any ordinal-level measurement. For students in education research at institutions like University of Cambridge, Stanford, or University of Toronto, these tests appear constantly when analyzing Likert-based instruments or student performance rankings.
How to Perform the Mann-Whitney U Test: Step-by-Step
Here is a complete worked example. Suppose a researcher at a US university wants to compare student satisfaction scores (measured on a 0–100 scale with some skewness) between students who received online tutoring (Group A) and in-person tutoring (Group B). The debate between online and in-person learning is active in education research, and comparing satisfaction non-parametrically is often the appropriate choice.
Group A (Online): 72, 65, 88, 55, 79
Group B (In-person): 81, 70, 95, 63, 85, 77
n₁ = 5 (Group A), n₂ = 6 (Group B), N = 11 total
Step 1: State the Hypotheses
H₀: The satisfaction score distributions are the same for online and in-person tutoring. H₁: The distributions differ (two-tailed). α = 0.05.
Step 2: Combine and Rank All Observations
Pool all 11 scores and rank from 1 (smallest) to 11 (largest). Tied values receive the average of the ranks they would have occupied.
Score: 55(A), 63(B), 65(A), 70(B), 72(A), 77(B), 79(A), 81(B), 85(B), 88(A), 95(B)
Rank: 1 2 3 4 5 6 7 8 9 10 11
Step 3: Calculate Rank Sums for Each Group
Group A ranks: 1 + 3 + 5 + 7 + 10 = 26 → R₁ = 26
Group B ranks: 2 + 4 + 6 + 8 + 9 + 11 = 40 → R₂ = 40
Verification: R₁ + R₂ = 26 + 40 = 66 = N(N+1)/2 = 11×12/2 = 66 ✓
Step 4: Calculate U Statistics
U₁ = n₁·n₂ + n₁(n₁+1)/2 - R₁
= 5×6 + 5×6/2 - 26
= 30 + 15 - 26 = 19
U₂ = n₁·n₂ + n₂(n₂+1)/2 - R₂
= 30 + 6×7/2 - 40
= 30 + 21 - 40 = 11
Check: U₁ + U₂ = 19 + 11 = 30 = n₁×n₂ ✓
U = min(U₁, U₂) = min(19, 11) = 11
Step 5: Find the p-value
For small samples (n₁ = 5, n₂ = 6), compare U = 11 against the critical value from Mann-Whitney U tables at α = 0.05 two-tailed. The critical value for n₁ = 5, n₂ = 6 at α = 0.05 (two-tailed) is U_critical = 3. Since U = 11 > U_critical = 3, we fail to reject H₀ (p > 0.05). There is no statistically significant difference between the two groups.
Step 6: Calculate Effect Size and Report Results
For larger samples, convert to z:
μ_U = n₁·n₂/2 = 30/2 = 15
σ_U = √(n₁·n₂(n₁+n₂+1)/12) = √(5×6×12/12) = √30 ≈ 5.48
z = (U - μ_U)/σ_U = (11 - 15)/5.48 ≈ -0.73
Effect size: r = |z|/√N = 0.73/√11 ≈ 0.22 (small-medium)
Report: "A Mann-Whitney U test indicated no statistically significant difference in satisfaction scores between online (Mdn = 72) and in-person (Mdn = 79) tutoring groups, U = 11, z = -0.73, p > .05, r = .22."
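The whole worked example can be cross-checked in software. A sketch with scipy.stats (scipy reports the U statistic for its first argument, which for these data also equals min(U₁, U₂) = 11):

```python
import math
from scipy import stats

group_a = [72, 65, 88, 55, 79]       # online tutoring
group_b = [81, 70, 95, 63, 85, 77]   # in-person tutoring

# Exact p-value is appropriate here: small samples, no ties
res = stats.mannwhitneyu(group_a, group_b, alternative="two-sided",
                         method="exact")

# Normal-approximation z and effect size r, as in Step 6
n1, n2 = len(group_a), len(group_b)
U = min(res.statistic, n1 * n2 - res.statistic)     # min(U1, U2) = 11
mu_U = n1 * n2 / 2                                  # 15
sigma_U = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # ~5.48
z = (U - mu_U) / sigma_U                            # ~-0.73
r = abs(z) / math.sqrt(n1 + n2)                     # ~0.22

print(f"U = {U}, p = {res.pvalue:.3f}, z = {z:.2f}, r = {r:.2f}")
```

The exact p-value agrees with the table-based decision in Step 5: well above 0.05, so H₀ is not rejected.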
Large-Sample Approximation for Mann-Whitney U
When both sample sizes exceed 20, the U statistic is approximately normally distributed under H₀. This allows use of the z-score formula shown above and standard normal tables or software p-values. SPSS, R, and Python's scipy.stats use this approximation for large samples and can compute exact p-values for small samples. For social statistics exams, knowing both the hand-calculation method and the software interpretation is often required — you need to demonstrate understanding of the mechanics, not just which button to click.
Key insight: The U statistic directly counts pairwise "wins": how many times an observation from one group outranks an observation from the other. With the formulas above, U₁ = 19 means that in 19 of the 30 possible Group A vs. Group B comparisons the Group B score was higher, and the complementary U₂ = 11 counts the comparisons in which the Group A score was higher. This intuitive interpretation — counting pairwise wins — is what makes the Mann-Whitney U test conceptually powerful and practically meaningful.
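This pairwise-wins interpretation can be verified by brute force over all 30 comparisons:

```python
group_a = [72, 65, 88, 55, 79]
group_b = [81, 70, 95, 63, 85, 77]

# Count every pairwise comparison between the two groups (5 x 6 = 30)
a_wins = sum(a > b for a in group_a for b in group_b)
b_wins = sum(b > a for a in group_a for b in group_b)

print(a_wins, b_wins)  # 11 and 19: the two U values from Step 4
```

Since there are no ties across groups here, the two counts partition all 30 comparisons exactly.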
Test Two
The Wilcoxon Signed-Rank Test: Complete Guide
The Wilcoxon Signed-Rank test was introduced by Frank Wilcoxon in a landmark 1945 paper, making it one of the oldest non-parametric tests in the statistical literature. It handles paired data — situations where each observation in one group has a corresponding observation in the other. Pre-test/post-test designs, before-and-after intervention studies, and matched-pairs experiments all call for this test. Scientific method and research design courses at universities teach this as the standard approach whenever paired observations violate the normality assumption for a paired t-test.
What Is the Wilcoxon Signed-Rank Test Testing?
The Wilcoxon Signed-Rank test goes beyond the simple sign test (which only counts which direction differences go) by also incorporating the magnitude of differences — ranking them. The null hypothesis is that the median of the paired differences is zero, meaning the treatment or time effect produces no systematic change. The alternative is that the median difference is non-zero (or directional for one-tailed tests). Both the sign and the rank of each difference contribute to the test statistic, making it more powerful than the sign test.
H₀: The median of the differences (D = X₂ - X₁) equals zero
H₁: The median of the differences ≠ 0 (two-tailed)
or > 0 or < 0 (one-tailed)
Assumptions of the Wilcoxon Signed-Rank Test
Three assumptions must hold for a valid Wilcoxon Signed-Rank test. First, the data must be paired — each observation in Condition 1 must have a matched partner in Condition 2. This pairing is the whole basis of the test. Second, the dependent variable must be continuous (or at least ordinal in a way that allows meaningful ranking of differences). Third, the differences between pairs should be symmetrically distributed around the median — the test does not require normality, but asymmetric difference distributions can affect validity. This third assumption is weaker than it sounds: most real-world difference distributions are reasonably symmetric even when the raw scores are not. Expert statistics guidance helps students verify these assumptions through graphical inspection of difference distributions.
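One quick numeric screen for the symmetry assumption, alongside a histogram or boxplot of the differences, is the sample skewness of the paired differences: values near 0 suggest rough symmetry. A sketch with illustrative data:

```python
from scipy import stats

# Illustrative paired differences from a hypothetical pre-post design
differences = [-7, -5, 3, -9, -1, -8, -5, -4]

# Sample skewness; near 0 suggests rough symmetry, while large
# absolute values (say |skew| > 1) warrant a closer graphical look
g1 = stats.skew(differences)
print(f"sample skewness = {g1:.2f}")
```

This is a heuristic, not a formal test; the decision should rest on visual inspection of the difference distribution.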
How to Perform the Wilcoxon Signed-Rank Test: Step-by-Step
A worked example: A health researcher at a UK university measures anxiety scores (0–40 scale, not normally distributed) in 8 patients before and after a mindfulness intervention. The goal is to determine whether the intervention reduces anxiety. Nursing and health science students frequently encounter exactly this type of pre-post design in clinical research assignments.
Patient: 1 2 3 4 5 6 7 8
Before: 28 35 22 31 19 27 33 24
After: 21 30 25 22 18 19 28 20
Difference: -7 -5 +3 -9 -1 -8 -5 -4
(Positive difference = anxiety increased; negative = decreased)
Step 1: State Hypotheses and Calculate Differences
H₀: The median anxiety difference (after − before) = 0. H₁: The median difference ≠ 0 (two-tailed; or < 0 if directional). Compute D = After − Before for each pair. Differences of zero are dropped from the analysis — the effective n becomes the count of non-zero differences.
Step 2: Rank the Absolute Differences (Ignoring Signs)
|D|: 7 5 3 9 1 8 5 4
Sorted |D|: 1, 3, 4, 5, 5, 7, 8, 9
Ranks of sorted values: 1 2 3 4.5 4.5 6 7 8 (the two 5s tie for ranks 4 and 5 → each receives (4+5)/2 = 4.5)
Back in patient order, the ranks of |D| are: 6, 4.5, 2, 8, 1, 7, 4.5, 3
Step 3: Assign Signs to Ranks
| Patient | D | \|D\| | Rank | Signed Rank |
|---|---|---|---|---|
| 1 | -7 | 7 | 6 | -6 |
| 2 | -5 | 5 | 4.5 | -4.5 |
| 3 | +3 | 3 | 2 | +2 |
| 4 | -9 | 9 | 8 | -8 |
| 5 | -1 | 1 | 1 | -1 |
| 6 | -8 | 8 | 7 | -7 |
| 7 | -5 | 5 | 4.5 | -4.5 |
| 8 | -4 | 4 | 3 | -3 |
Step 4: Calculate W+ and W−
W+ (sum of positive ranks) = 2
W- (sum of negative ranks) = 6 + 4.5 + 8 + 1 + 7 + 4.5 + 3 = 34
Test statistic: W = min(W+, W-) = min(2, 34) = 2
Verification: W+ + W- = 2 + 34 = 36 = n(n+1)/2 = 8×9/2 = 36 ✓
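The signed-rank bookkeeping in Steps 2 through 4 maps directly onto a few lines of code. A sketch using scipy's rankdata, with scipy's wilcoxon as a cross-check:

```python
from scipy import stats

before = [28, 35, 22, 31, 19, 27, 33, 24]
after = [21, 30, 25, 22, 18, 19, 28, 20]

# Differences, then ranks of their absolute values (ties get average ranks)
d = [a - b for a, b in zip(after, before)]
ranks = stats.rankdata([abs(x) for x in d])

W_plus = sum(r for x, r in zip(d, ranks) if x > 0)   # 2.0
W_minus = sum(r for x, r in zip(d, ranks) if x < 0)  # 34.0
W = min(W_plus, W_minus)

# scipy agrees; method="approx" since tied ranks rule out the exact method
res = stats.wilcoxon(after, before, method="approx")
print(W, res.statistic, res.pvalue)
```

The W+ + W− = n(n+1)/2 check from the text falls out of this for free, since rankdata assigns every non-zero difference a rank.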
Step 5: Determine the p-value and Conclude
For n = 8 at α = 0.05 (two-tailed), the critical value from Wilcoxon Signed-Rank tables is W_critical = 3. Since W = 2 ≤ 3, we reject H₀ (p < 0.05). The mindfulness intervention significantly reduced anxiety scores.
For large n (> 25), use normal approximation:
μ_W = n(n+1)/4 = 8×9/4 = 18
σ_W = √(n(n+1)(2n+1)/24) = √(8×9×17/24) = √51 ≈ 7.14
z = (W - μ_W)/σ_W = (2 - 18)/7.14 ≈ -2.24
p ≈ 0.025 (two-tailed)
Effect size: r = |z|/√N = 2.24/√8 ≈ 0.79 (large)
A complete APA-style report would read: "A Wilcoxon Signed-Rank test indicated that anxiety scores were significantly lower following the mindfulness intervention (Mdn = 21.5) compared to baseline (Mdn = 27.5), W = 2, z = -2.24, p = .025, r = .79." This format — test statistic, z-score, p-value, effect size, medians for each condition — is what psychology and nursing journals at institutions affiliated with the British Psychological Society (BPS) and American Psychological Association (APA) require. Writing a research paper in the health sciences almost always requires non-parametric test results reported in this way.
Why "Signed-Rank"? The name captures exactly what the test does. It uses both the sign (direction: positive or negative) and the rank (magnitude: how large the difference was, relative to other differences) of each paired difference. This is what makes it more powerful than the simple sign test, which only uses direction. A patient who improved by 20 points counts more heavily than one who improved by 1 point — the rank captures this.
Side by Side
Mann-Whitney U vs. Wilcoxon Signed-Rank: The Full Comparison
The two tests covered in this guide are often confused because they both use ranks and both test for differences between two groups. The distinction is clean and absolute: independent groups → Mann-Whitney U; paired groups → Wilcoxon Signed-Rank. Every other feature of each test flows from this fundamental design difference. Here is the complete side-by-side comparison across every dimension you need to know for exams, dissertations, and published research.
| Feature | Mann-Whitney U Test | Wilcoxon Signed-Rank Test |
|---|---|---|
| Also known as | Wilcoxon Rank-Sum Test, Mann-Whitney-Wilcoxon | Wilcoxon T Test, Paired Wilcoxon |
| Study design | Two independent groups (different subjects) | Two related/paired groups (same subjects or matched pairs) |
| Parametric equivalent | Independent samples t-test | Paired samples t-test |
| Null hypothesis | Both populations have the same distribution (P(X>Y) = 0.5) | Median of paired differences = 0 |
| Test statistic | U = min(U₁, U₂) | W = min(W+, W−) |
| What is ranked | All observations from both groups combined | Absolute values of differences between pairs |
| Effect size | r = |Z|/√N; rank-biserial correlation | r = |Z|/√N (same formula) |
| Sample size notation | n₁ and n₂ (sizes of each group) | n = number of non-zero difference pairs |
| Minimum data level | Ordinal | Continuous (or ordinal with rankable differences) |
| Handles ties | Average tied ranks; use correction in z-formula | Average tied ranks; drop zero differences |
| Software (SPSS) | Analyze → Nonparametric → 2 Independent Samples | Analyze → Nonparametric → 2 Related Samples |
| Software (R) | wilcox.test(x, y, paired = FALSE) | wilcox.test(x, y, paired = TRUE) |
Notice that in R, both tests are run with wilcox.test() — the paired argument determines which test is performed. This reflects the close mathematical relationship between the two tests, but the interpretation and appropriate context differ completely. Confusing them in a methods section will draw pointed criticism from any serious dissertation committee. Statistics experts can help you correctly identify your design and choose the appropriate test before you write a single line of code.
Which Test Is More Powerful?
The Wilcoxon Signed-Rank test (paired design) is generally more powerful than the Mann-Whitney U (independent design) for the same data, because pairing reduces variability. When you control for individual differences by measuring the same person twice, the noise in your comparison decreases, making it easier to detect a real effect. This mirrors the advantage of the paired t-test over the independent t-test in parametric analysis. This is why researchers deliberately design paired studies — matching subjects on key variables, or using repeated measures — when they anticipate that individual differences would otherwise mask the treatment effect.
Results Reporting
Effect Sizes and How to Report Non-parametric Test Results
A p-value tells you whether your result is statistically significant. It does not tell you whether it matters in practice. Effect size fills that gap — it quantifies the magnitude of the difference, independent of sample size. Reporting effect sizes is required by the APA Publication Manual (7th edition), expected by journals indexed in PubMed and PsycINFO, and increasingly demanded by dissertation committees across universities in the US and UK. For non-parametric tests, the most common effect size measure is r, derived from the z-score, with the rank-biserial correlation as an alternative. Predictive modeling and regression analyses routinely report standardized effect sizes for the same reason — raw statistics are not comparable across studies or sample sizes.
Calculating Effect Size r for Non-parametric Tests
Effect size r = |Z| / √N
Where:
Z = the z-score from the normal approximation of the test statistic
N = total number of observations used in the test
(for Mann-Whitney: N = n₁ + n₂)
(for Wilcoxon: N = number of non-zero pairs)
Interpretation (Cohen's benchmarks):
r = 0.10 → Small effect
r = 0.30 → Medium effect
r = 0.50 → Large effect
For the Mann-Whitney U, an alternative effect size is the rank-biserial correlation (r_rb), calculated as:
r_rb = 1 - (2U) / (n₁ × n₂)
This ranges from -1 to +1 and has the same benchmarks as r above.
It can also be interpreted as: the proportion of Group 1 wins minus Group 2 wins
in all pairwise comparisons — a directly intuitive measure of separation.
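As a sketch, applying both formulas to the Mann-Whitney worked example from earlier (U = 11, z ≈ -0.73, N = 11):

```python
import math

n1, n2, N = 5, 6, 11
U, z = 11, -0.73          # from the worked Mann-Whitney example

r = abs(z) / math.sqrt(N)         # ~0.22: small-to-medium effect
r_rb = 1 - (2 * U) / (n1 * n2)    # 1 - 22/30 ~ 0.267

print(f"r = {r:.2f}, rank-biserial = {r_rb:.3f}")
```

Using U = min(U₁, U₂) gives the magnitude of the rank-biserial correlation; its direction is read off from which group tends to rank higher.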
The Complete APA-Style Results Write-Up
Here is the template for reporting these tests in academic papers and dissertations, based on APA 7th edition guidelines. This format is expected at universities using APA style — which includes virtually all psychology, education, nursing, and social science programs in the US and Canada, and is increasingly adopted in UK institutions.
Mann-Whitney U Test:
"A Mann-Whitney U test revealed a statistically significant difference in [DV]
between [Group 1] (Mdn = [value]) and [Group 2] (Mdn = [value]),
U = [value], z = [value], p = [value], r = [value]."
Wilcoxon Signed-Rank Test:
"A Wilcoxon Signed-Rank test indicated that [DV] was significantly
[higher/lower] at [Time 2/Condition 2] (Mdn = [value]) compared to
[Time 1/Condition 1] (Mdn = [value]), W = [value], z = [value],
p = [value], r = [value]."
Always report medians — not means — as your central tendency measure when using non-parametric tests. The mean is sensitive to the outliers and skewness that made you choose non-parametric tests in the first place. The median is the appropriate summary statistic. This is a detail that consistently distinguishes students who understand the underlying logic from those who are following a procedure mechanically. Understanding the difference between mean, median, and mode is foundational to choosing the right summary statistic to accompany your test results.
Reporting mistake to avoid: Do not report "the means were compared using a Mann-Whitney U test." The Mann-Whitney U test does not compare means — it compares distributions (or medians, under equal-shape distributions). Writing "means were compared" when you used a non-parametric test signals that you do not understand what the test actually does.
Implementation
Running Non-parametric Tests in SPSS, R, and Python
In practice, non-parametric tests are almost always run using statistical software rather than by hand. Hand calculations build conceptual understanding — and are required on many statistics exams — but SPSS, R, and Python handle the mechanics instantly for real datasets. Here's how to run both tests in each major package, and what output you need to report. University statistics assignments in the US and UK increasingly require documented software output alongside written interpretation.
SPSS: Mann-Whitney U and Wilcoxon Signed-Rank
SPSS is the most commonly used statistics package in psychology, social science, and health research programs at US and UK universities — used at University of Leeds, UCLA, Michigan State, and hundreds of other institutions. IBM's SPSS Statistics makes non-parametric tests accessible through the menu system.
Mann-Whitney U in SPSS:
Analyze → Nonparametric Tests → Legacy Dialogs → 2 Independent Samples
1. Move dependent variable to "Test Variable List"
2. Move grouping variable to "Grouping Variable" → Define Groups (enter group codes)
3. Ensure "Mann-Whitney U" is checked under Test Type
4. For small samples: click "Exact" → select "Exact" for exact p-values
5. Click OK
Key output to report: U, Standardized Test Statistic (Z), Asymp. Sig. (p-value)
Calculate r manually: r = |Z| / √N
---
Wilcoxon Signed-Rank in SPSS:
Analyze → Nonparametric Tests → Legacy Dialogs → 2 Related Samples
1. Move both variables (Variable 1 and Variable 2) into the Test Pairs box
2. Ensure "Wilcoxon" is checked under Test Type
3. Click OK
Key output: Test Statistic (W), Standardized Test Statistic (Z), Asymp. Sig. (p-value)
R: Both Tests with wilcox.test()
R is increasingly required in quantitative methods courses at research-intensive universities. The wilcox.test() function handles both tests — only the paired argument changes. R is free, open-source, and used widely by researchers at Harvard, Stanford, Oxford, and in industry at companies like Google and Netflix for statistical analysis.
# Mann-Whitney U Test (Independent Groups)
groupA <- c(72, 65, 88, 55, 79)
groupB <- c(81, 70, 95, 63, 85, 77)
wilcox.test(groupA, groupB, paired = FALSE, alternative = "two.sided",
exact = TRUE) # exact = TRUE for small samples
# Output: W statistic (= U here), p-value
# Effect size r (requires the 'rstatix' package)
library(rstatix)
dat <- data.frame(score = c(groupA, groupB),
                  group = factor(rep(c("A", "B"), times = c(5, 6))))
wilcox_effsize(dat, score ~ group)  # returns r
# Wilcoxon Signed-Rank Test (Paired Groups)
before <- c(28, 35, 22, 31, 19, 27, 33, 24)
after <- c(21, 30, 25, 22, 18, 19, 28, 20)
wilcox.test(after, before, paired = TRUE, alternative = "two.sided")
# Output: V statistic (= W), p-value
# Note: R calls the Wilcoxon paired statistic "V", not "W" — report as W
Python: scipy.stats
Python's scipy.stats library provides both tests. Python is the dominant language in data science and machine learning programs — used extensively in programs at MIT, Carnegie Mellon, and Georgia Tech, and across industry. For students in data science programs needing data science assignment help, Python implementation is often the primary deliverable.
from scipy import stats
import numpy as np
# Mann-Whitney U Test
group_a = [72, 65, 88, 55, 79]
group_b = [81, 70, 95, 63, 85, 77]
U_stat, p_value = stats.mannwhitneyu(group_a, group_b,
alternative='two-sided')
N = len(group_a) + len(group_b)
z = stats.norm.ppf(p_value / 2) # approximate z from p
r = abs(z) / np.sqrt(N)
print(f"U = {U_stat}, p = {p_value:.4f}, r = {r:.3f}")
# Wilcoxon Signed-Rank Test
before = [28, 35, 22, 31, 19, 27, 33, 24]
after = [21, 30, 25, 22, 18, 19, 28, 20]
W_stat, p_value = stats.wilcoxon(after, before,
alternative='two-sided')
print(f"W = {W_stat}, p = {p_value:.4f}")
Regardless of which software you use, the interpretation workflow is identical: check the test statistic against your decision rule (p < α), calculate and report the effect size r, and write up results in the format appropriate for your field and institution. Online resources for students include the R documentation, scipy documentation, and SPSS tutorials from IBM — all freely accessible and worth bookmarking for your statistics courses.
Real-World Use
Non-parametric Tests in Real Research: Applications Across Fields
Non-parametric tests are not theoretical exercises. They appear in published research across medicine, psychology, education, business, and engineering — whenever researchers encounter the small samples, ordinal data, or non-normal distributions that the real world so reliably produces. Understanding where and why these tests are used deepens your ability to apply them correctly in your own work.
Clinical Medicine and Pharmacology
Clinical trials with small patient populations routinely use the Mann-Whitney U test. Rare disease trials at institutions like the National Institutes of Health (NIH) in Bethesda and NHS research units in the UK frequently have samples of 10–30 patients per arm — too small to verify normality confidently. Pain scores, quality of life indices, and functional assessment scales are ordinal by design. The BMJ Evidence-Based Medicine journal regularly publishes studies where Mann-Whitney U tests compare patient outcomes between drug and placebo groups. For nursing students in Boston and other healthcare programs, understanding how to read these results in published literature is as important as running the tests.
Psychology and Behavioral Research
Psychology relies heavily on non-parametric tests. Likert-scale questionnaires dominate experimental psychology, cognitive science, and social psychology — and their ordinal nature makes non-parametric tests the methodologically defensible choice. The Wilcoxon Signed-Rank test is standard for pre-post intervention studies: depression scale scores before and after CBT, anxiety ratings before and after exposure therapy, performance scores before and after training. Journals published by the American Psychological Association (APA) and British Psychological Society (BPS) accept non-parametric test results routinely, provided effect sizes and appropriate descriptive statistics are reported. Psychology assignment help for students covers both the statistical tests and the APA reporting conventions.
Education Research
Education researchers comparing student performance across two teaching methods, two schools, or two demographic groups frequently encounter non-normal distributions — especially with small class sizes, grading ceiling effects, or heterogeneous student populations. A researcher comparing reading comprehension scores between students taught with traditional versus inquiry-based methods at a US middle school, where each class has 20–25 students, would typically use the Mann-Whitney U. UK researchers at institutions like the UCL Institute of Education use the same approach for similar comparisons. Research on online versus in-person education post-COVID has increasingly used non-parametric methods as data from diverse populations and varied assessment contexts proved non-normal.
Business and Management Research
Customer satisfaction surveys, employee engagement scores, and service quality ratings produce ordinal data by design — and non-parametric tests are the appropriate analysis tool. A retailer comparing customer satisfaction (1–5 stars) between two store formats uses Mann-Whitney U. An HR researcher comparing employee engagement before and after a new management program uses Wilcoxon Signed-Rank. Business schools at Wharton, London Business School, and INSEAD teach non-parametric methods in their quantitative research methods courses for exactly these applications. Business management assignment help frequently involves choosing between parametric and non-parametric approaches for ordinal survey data.
"In real research, non-parametric tests are often the honest choice — they do not require you to pretend your ordinal survey data is normally distributed. The Mann-Whitney U and Wilcoxon Signed-Rank tests let you analyze what you actually collected, not what you wish you had collected." — A perspective consistent with guidelines from the American Statistical Association.
Exam Strategy
Common Mistakes and How to Avoid Them on Exams and Assignments
Students lose marks on non-parametric test questions in predictable, preventable ways. The following mistakes appear consistently in exam scripts at statistics courses worldwide — and in dissertation methods sections reviewed by supervisors at Harvard, Edinburgh, McGill, and other research universities. Know these errors cold before any exam or submission. Common academic mistakes in quantitative work follow similar patterns — sloppy test selection, incomplete reporting, and failure to verify assumptions.
Mistake 1: Confusing Independent and Paired Designs
Using Mann-Whitney U for paired data or Wilcoxon Signed-Rank for independent groups. This is the most consequential error — it produces entirely invalid results. Before choosing your test, always ask: "Is the same subject (or a matched partner) contributing to both groups, or are these completely different people?" If the same subject → Wilcoxon Signed-Rank. Different subjects → Mann-Whitney U. Period.
Mistake 2: Reporting Means Instead of Medians
Non-parametric tests are based on ranks, not raw values. The appropriate measure of central tendency is the median, not the mean. Reporting "the mean score for Group A was 72" after a Mann-Whitney U test reveals that you do not understand what the test is doing. Report medians. Always. If you are asked to describe the central tendency of your groups for a non-parametric analysis, use Mdn = [value].
Mistake 3: Omitting Effect Sizes
Reporting "p = 0.03" without the effect size r is incomplete by modern standards. A statistically significant result tells you an effect exists; the effect size tells you whether it matters. A study with n = 1,000 per group might find p = 0.001 for a trivially small difference, while a study with n = 15 per group might find p = 0.04 for a large, practically important difference. In both cases, the p-value alone is uninterpretable without the effect size. Calculate r = |Z|/√N and include it in every reported result.
Mistake 4: Treating Non-parametric Tests as Assumption-Free
Students sometimes assume non-parametric tests have no assumptions. They do. Independence of observations is critical — non-parametric tests fail just as badly as parametric ones when observations are correlated. Ordinal measurement level is required. For median comparison using Mann-Whitney U, similar distribution shapes are needed. Verify your assumptions before presenting results as if they were unconditionally valid.
Mistake 5: Applying the Wrong Critical Value Table
Mann-Whitney U and Wilcoxon Signed-Rank have separate critical value tables, and within each table, one-tailed versus two-tailed critical values differ. A common exam error is using a one-tailed critical value for a two-tailed hypothesis, or vice versa. Write out your hypothesis clearly before looking up any table — then make sure the table column you use matches the tail(s) of your hypothesis.
Handling Ties in Rank-Based Tests
Tied observations receive the average of the ranks they would otherwise occupy. If two scores tie for the 4th and 5th positions, each receives rank 4.5. For the Mann-Whitney U, a large number of ties requires a correction to the z-formula:
Tie correction for the Mann-Whitney U z-statistic:
σ_U (corrected) = √[ (n₁·n₂/12) × ( (N + 1) − Σ(tⱼ³ − tⱼ)/(N(N − 1)) ) ]
where tⱼ = number of observations in the jth tied group
N = total observations
Software packages apply this correction automatically. For hand calculations with many ties, verify your result with software.
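As a sketch of how average ranks and the tie correction behave in practice — using scipy's rankdata; the helper function and sample data below are illustrative, not from the guide:

```python
import numpy as np
from scipy import stats

# Tied scores receive the average of the ranks they span:
# [3, 5, 5, 7] -> ranks 1, 2.5, 2.5, 4
print(stats.rankdata([3, 5, 5, 7]))

# Illustrative helper: tie-corrected standard deviation of U
# (the sigma_U formula above), for two independent samples x and y
def sigma_u_corrected(x, y):
    n1, n2 = len(x), len(y)
    N = n1 + n2
    # t_j = size of each group of tied values in the pooled sample
    _, t = np.unique(np.concatenate([x, y]), return_counts=True)
    tie_term = np.sum(t ** 3 - t) / (N * (N - 1))
    return np.sqrt(n1 * n2 / 12 * ((N + 1) - tie_term))

# With no ties the correction term vanishes, and sigma_U reduces to
# the familiar sqrt(n1 * n2 * (N + 1) / 12)
print(sigma_u_corrected([1, 2, 3], [4, 5, 6]))
```
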
For the Wilcoxon Signed-Rank test, pairs with a difference of exactly zero are excluded from the analysis — reducing the effective n. If many pairs are tied at zero, the test loses power and this should be acknowledged as a limitation. For statistics help with specific exam problems, working through tie-corrected examples before your test is time well spent.
Quick test selection guide: Two groups, different people, ordinal/non-normal data → Mann-Whitney U. Same people measured twice (or matched pairs), ordinal/non-normal differences → Wilcoxon Signed-Rank. Three or more independent groups → Kruskal-Wallis. Three or more related groups → Friedman test. Non-parametric correlation → Spearman's ρ. Memorize this map and you will select the correct test on every exam.
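The selection map above can be captured as a toy lookup — purely illustrative, with hypothetical names, but a compact way to drill the design-to-test mapping:

```python
# A toy lookup mirroring the quick test selection guide (illustrative only)
TEST_MAP = {
    ('independent', '2'):  'Mann-Whitney U',
    ('paired', '2'):       'Wilcoxon Signed-Rank',
    ('independent', '3+'): 'Kruskal-Wallis',
    ('paired', '3+'):      'Friedman',
}

def choose_test(design: str, n_groups: int) -> str:
    """Pick the rank-based test for a design ('independent' or 'paired')."""
    return TEST_MAP[(design, '2' if n_groups == 2 else '3+')]

print(choose_test('independent', 2))  # Mann-Whitney U
print(choose_test('paired', 3))       # Friedman
```
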
Statistics Assignment or Dissertation Analysis Due?
From test selection and SPSS/R output interpretation to full write-ups with effect sizes — our statistics experts deliver fast, reliable, explained solutions for students at every level.
Get Help With My Assignment
Frequently Asked
Frequently Asked Questions About Non-parametric Tests
What is a non-parametric test and when should I use one?
A non-parametric test is a statistical hypothesis test that does not require data to follow a specific distribution. Non-parametric tests rank observations rather than using raw values, making them robust to non-normality, outliers, and ordinal measurement. Use a non-parametric test when your data is ordinal (Likert scales, rankings), when your continuous data is significantly non-normal and your sample is small (n < 30), when you have extreme outliers that distort means, or when your data contains many ties at certain values. When normality holds or samples are large, parametric tests are preferred for their greater statistical power.
What is the Mann-Whitney U test used for?
The Mann-Whitney U test compares two independent groups to determine whether one group tends to have systematically higher or lower values than the other. It is the non-parametric alternative to the independent samples t-test. Use it when: (1) your two groups consist of different, unrelated subjects; (2) your dependent variable is ordinal or non-normally distributed continuous; (3) your sample is small (n < 30 per group) and normality cannot be confirmed. Common applications include comparing patient outcomes between two treatment groups, comparing student performance between two teaching methods, or comparing customer satisfaction ratings across two product conditions.
What is the difference between Mann-Whitney U and Wilcoxon Signed-Rank?
The fundamental difference is your research design. Mann-Whitney U is for two independent groups — different people in each group (e.g., Drug A group vs. Drug B group with different patients). Wilcoxon Signed-Rank is for two related/paired groups — the same people measured under both conditions, or carefully matched pairs (e.g., same patients measured before and after treatment). Both tests use ranks and have similar formulas, but they are applied to fundamentally different data structures. Using the wrong test — even with the correct formula — produces invalid results. Always identify your design first: independent or paired?
How do I calculate the effect size for the Mann-Whitney U test?
Calculate effect size r = |Z| / √N, where Z is the z-score from the test (from the normal approximation or software output) and N is the total number of observations (n₁ + n₂). Benchmarks: r = 0.1 (small), r = 0.3 (medium), r = 0.5 (large). Alternatively, use the rank-biserial correlation: r_rb = 1 − (2U)/(n₁ × n₂). Always report effect sizes alongside p-values — statistical significance tells you an effect exists, effect size tells you whether it matters practically. APA 7th edition requires effect size reporting in all statistical analyses.
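Both formulas from this answer can be sketched in Python. The sample data is hypothetical, and note one assumption: the rank-biserial formula here uses the smaller of the two U values, the usual hand-calculation convention, so that r_rb comes out between 0 and 1:

```python
import numpy as np
from scipy import stats

g1 = [12, 15, 18, 20, 22, 25]   # hypothetical group 1
g2 = [10, 11, 14, 16, 17, 19]   # hypothetical group 2
n1, n2 = len(g1), len(g2)

U1, p = stats.mannwhitneyu(g1, g2, alternative='two-sided')
U = min(U1, n1 * n2 - U1)       # smaller U, as in hand calculations

# r = |Z| / sqrt(N), recovering an approximate z from the p-value
z = stats.norm.ppf(p / 2)
r = abs(z) / np.sqrt(n1 + n2)

# Rank-biserial correlation: r_rb = 1 - 2U / (n1 * n2)
r_rb = 1 - (2 * U) / (n1 * n2)
print(f"r = {r:.3f}, r_rb = {r_rb:.3f}")
```
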
Do non-parametric tests have any assumptions?
Yes — non-parametric tests are not assumption-free. The Mann-Whitney U test requires: (1) independent observations within and between groups; (2) at least ordinal level of measurement; (3) similarly shaped distributions in both groups if the goal is to compare medians. The Wilcoxon Signed-Rank test requires: (1) paired observations; (2) continuous or ordinal measurement; (3) symmetrically distributed differences around the median. Normality is NOT required by either test. The most commonly violated assumption in practice is independence — non-parametric tests are just as invalid as parametric tests when observations are correlated.
How do I run the Wilcoxon Signed-Rank test in R?
In R, use the wilcox.test() function with paired = TRUE. Example: wilcox.test(after, before, paired = TRUE, alternative = "two.sided"). This returns the V statistic (R's name for W), the p-value, and test information. For effect size, install the rstatix package and use wilcox_effsize(). Note that R's output calls the statistic "V" — when reporting results, refer to it as W. For exact p-values with small samples, R calculates them automatically (when no ties); for large samples or tied data, it uses the normal approximation.
Can I use a Mann-Whitney U test for Likert scale data?
Yes — the Mann-Whitney U test is appropriate for Likert scale data (e.g., 1–5 or 1–7 response scales) because it only requires ordinal measurement. Likert data has ordered categories but not necessarily equal intervals between them, which violates the continuous measurement assumption of the t-test. Using a Mann-Whitney U test for individual Likert items is unambiguously defensible. For composite Likert scores (mean of multiple items), some methodologists argue that treating them as approximately continuous and using parametric tests is acceptable — particularly with large samples. For most university-level work, using non-parametric tests for individual Likert items is the safer and more defensible choice.
What should I do if the Shapiro-Wilk test is significant?
If the Shapiro-Wilk test (p < 0.05) indicates significant departure from normality, you have several options: (1) If n ≥ 30, the Central Limit Theorem may rescue you — consider proceeding with the parametric test and reporting the Shapiro-Wilk result as a limitation; (2) If n < 30, switch to the appropriate non-parametric test (Mann-Whitney U for independent groups, Wilcoxon Signed-Rank for paired); (3) Transform your data (log, square root) and re-check normality; (4) Use a parametric test with a bootstrap confidence interval. Note: with large samples (n > 100), Shapiro-Wilk becomes hypersensitive and will flag trivially small departures from normality as significant — visual inspection via Q-Q plots becomes more informative than the formal test.
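Option (2) can be sketched with scipy's shapiro — the data here is a deliberately skewed random sample, so whether this particular draw gets flagged depends on the seed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
skewed = rng.exponential(scale=2.0, size=25)   # deliberately skewed sample

# Shapiro-Wilk: H0 = data is drawn from a normal distribution
W, p = stats.shapiro(skewed)
print(f"Shapiro-Wilk: W = {W:.3f}, p = {p:.4f}")

if p < 0.05 and len(skewed) < 30:
    print("Non-normal and n < 30: switch to a non-parametric test")
```
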
How do I report non-parametric test results in APA format?
APA 7th edition format for Mann-Whitney U: "A Mann-Whitney U test revealed a significant [or non-significant] difference between [Group 1] (Mdn = X) and [Group 2] (Mdn = X), U = X, z = X, p = .XXX, r = .XX." For Wilcoxon Signed-Rank: "A Wilcoxon Signed-Rank test indicated that [DV] was significantly [higher/lower] at [Condition 2] (Mdn = X) than at [Condition 1] (Mdn = X), W = X, z = X, p = .XXX, r = .XX." Key rules: (1) always report medians, not means; (2) always include effect size r; (3) report exact p-values (e.g., p = .023, not p < .05) whenever possible; (4) use lowercase italics for statistical symbols (U, W, z, p, r, Mdn) in formatted documents.
Is the Mann-Whitney U test the same as the Wilcoxon Rank-Sum test?
Yes — the Mann-Whitney U test and the Wilcoxon Rank-Sum test are mathematically equivalent procedures for comparing two independent groups. They produce equivalent test statistics and identical p-values. The difference is in how the test statistic is expressed: Wilcoxon's formulation uses the rank sum (R) of one group; Mann and Whitney's formulation uses the U statistic, which counts pairwise comparisons. SPSS refers to the test as "Mann-Whitney," while R's wilcox.test(paired=FALSE) implements Wilcoxon's rank-sum procedure — the two names are used interchangeably. Do not confuse this test with the Wilcoxon Signed-Rank test, which is for paired data — they are different tests despite sharing "Wilcoxon" in the name.
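The equivalence can be checked numerically: the U statistic for the first sample and its Wilcoxon rank sum R₁ are linked by U₁ = R₁ − n₁(n₁ + 1)/2. A sketch with hypothetical data (this assumes a recent scipy, where mannwhitneyu reports U for the first sample):

```python
import numpy as np
from scipy import stats

x = [14, 18, 22, 25, 31]        # hypothetical group 1
y = [10, 12, 15, 17, 19, 21]    # hypothetical group 2
n1 = len(x)

# Mann and Whitney's U for the first sample, as reported by scipy
U, _ = stats.mannwhitneyu(x, y, alternative='two-sided')

# Wilcoxon's rank sum R1 for the same sample, from the pooled ranking
R1 = stats.rankdata(np.concatenate([x, y]))[:n1].sum()

# The two formulations are linked by U1 = R1 - n1(n1 + 1)/2
print(U, R1, R1 - n1 * (n1 + 1) / 2)
```
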
