Non-Parametric Statistics

C5025HF

Dr K.J. Mhango

Welcome

What You Will Learn Today

  • Parametric vs non-parametric intuition
  • Ranking logic and distribution-free ideas
  • Mann–Whitney, Wilcoxon, Kruskal–Wallis
  • Correlations: Spearman & Kendall
  • Exact tests (Fisher)

1: Why Non‑Parametrics?

When parametric assumptions break

  • Non-normal data
  • Skewed distributions
  • Ordinal measurements
  • Heterogeneous variances
  • Small samples
  • Outliers

Ranking-based thinking: The “Race” Analogy

Instead of looking at exact times (values), look at finishing positions (ranks).

The Intuition: Team A vs Team B (Independent)

Imagine a race between two teams.

  • Parametric (t-test): Compares the Average Time of Team A vs Team B.
    • If one runner in Team A takes 5 hours (outlier), Team A’s average is ruined.
  • Non-Parametric (Mann-Whitney): Compares the Finishing Positions (Ranks).
    • It combines everyone (both teams) into one single lineup from 1st to last.
    • Then it looks at where Team A falls vs Team B.
    • If that slow runner takes 5 hours or 5 days, they are still just “Last Place”. The rank (last) is the same.
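A quick sketch of this idea in R, using hypothetical finishing times (purely illustrative numbers):

```r
# Hypothetical finishing times in hours for the rest of the field.
team <- c(1.0, 1.1, 1.2, 1.3)

# Whether the slow runner takes 5 hours or 5 days (120 hours),
# their rank in the combined lineup is the same: last place.
rank(c(team, 5))    # slow runner's rank: 5 (last)
rank(c(team, 120))  # still 5 (last): the rank is unchanged

# The mean, by contrast, is dragged by the magnitude of the outlier.
mean(c(team, 5))    # 1.92
mean(c(team, 120))  # 24.92
```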

The Intuition: Before vs After (Paired)

Imagine the same runner running twice (Race 1 vs Race 2).

  • Parametric (Paired t-test): Calculates the improvement (Time 1 - Time 2) for each runner, then averages them.
    • One runner improving by 5 hours makes the whole group look vastly improved on average.
  • Non-Parametric (Wilcoxon Signed-Rank):
    • It calculates the change for each runner.
    • It ranks these changes by size (ignoring direction). Small change = Rank 1, Huge change = Rank 100.
    • Then it asks: “Are the biggest ranks associated with getting faster or getting slower?”
    • If one runner improves by 5 hours, they still only get the highest rank; their extreme magnitude cannot inflate the rank sum any further. Ranking caps the influence of the outlier.
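A minimal sketch with made-up improvement values: the signed-rank statistic V is identical whether the biggest improvement is 1 hour or 5 hours.

```r
# Made-up paired changes (Race 1 time - Race 2 time); positive = got faster.
changes_modest  <- c(0.1, 0.2, -0.15, 0.3, 1.0)  # biggest improvement: 1 hour
changes_extreme <- c(0.1, 0.2, -0.15, 0.3, 5.0)  # biggest improvement: 5 hours

# V = sum of the ranks of the positive changes; the outlier's size doesn't
# matter, only that it holds the top rank.
wilcox.test(changes_modest)$statistic   # V = 13
wilcox.test(changes_extreme)$statistic  # V = 13
```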

Why this is useful

  • Robust to Outliers: Extreme values don’t pull the results (unlike the mean).
  • Skew doesn’t matter: We are testing the order, not the bell curve.
  • Any units work: cm, mm, log-transformed, or “Likert scales” — as long as you can say “A > B”, you can rank.

What the tests actually ask

  • Mann–Whitney (2 groups): If I pick one random value from Group A and one from Group B, is A likely to be higher than B?
  • Wilcoxon (Paired): It looks at the change in each pair. Do the big changes tend to be increases or decreases?
    • (It ranks the size of the changes, then checks if the positive ranks outweigh the negative ranks).
  • Kruskal–Wallis (>2 groups): Do some groups systematically rank higher than others?
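The Mann–Whitney question can be written down directly: over all cross-group pairs, how often does the A value beat the B value? (Toy numbers for illustration.)

```r
# Toy data: compare every A value against every B value.
A <- c(3, 5, 7)
B <- c(2, 4, 6)
mean(outer(A, B, ">"))  # proportion of (A, B) pairs where A is higher: 2/3
```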

What to report

  • Descriptive stats: Medians and Interquartile Ranges (IQR) for each group.
  • Effect size / Direction: Explain the direction of the difference in plain English.
    • Instead of just “p < 0.05”, say: “Species A tends to have higher values than Species B.”
    • Optional but good: “The probability that a random value from Group A exceeds Group B is approx X%.”
  • Visuals: Always include a boxplot or jitter plot to show the spread and overlap.
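For the reporting checklist above, medians and IQRs per group are one line each in R (shown here on iris, the dataset used throughout this deck):

```r
# Medians and IQRs of Sepal.Length by species, for reporting.
aggregate(Sepal.Length ~ Species, data = iris, FUN = median)
aggregate(Sepal.Length ~ Species, data = iris, FUN = IQR)
```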

2: Choosing the Right Test

Cheat Sheet

Design                  Parametric   Non-parametric
2 independent groups    t-test       Mann–Whitney
2 paired measures       paired t     Wilcoxon signed-rank
>2 independent groups   ANOVA        Kruskal–Wallis
Correlation             Pearson      Spearman, Kendall
2×2 association         χ² test      Fisher exact

3: t-test Using iris

Goal

Compare Sepal.Length between two species. We subset to two species for a t-test:

Code
iris2 <- subset(iris, Species %in% c("setosa", "versicolor"))

Visualise

Code
library(ggplot2)
ggplot(iris2, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot(alpha = 0.6, width = 0.2) +
  geom_jitter(width = 0.1, alpha = 0.5) +
  theme_minimal() +
  ggtitle("Sepal Length by Species")

Parametric t-test Assumptions

  1. Independence
    • Observations must be independent.
    • In iris, each flower is a unique specimen → ✔ OK.
  2. Normality of the outcome within each group
    • Test with Shapiro–Wilk.
Code
shapiro.test(iris2$Sepal.Length[iris2$Species=="setosa"])

    Shapiro-Wilk normality test

data:  iris2$Sepal.Length[iris2$Species == "setosa"]
W = 0.9777, p-value = 0.4595
Code
shapiro.test(iris2$Sepal.Length[iris2$Species=="versicolor"])

    Shapiro-Wilk normality test

data:  iris2$Sepal.Length[iris2$Species == "versicolor"]
W = 0.97784, p-value = 0.4647
  • If p < 0.05 in either group, normality is questionable for that group.
    • Also use Q–Q plot:
Code
library(ggplot2)
ggplot(iris2, aes(sample = Sepal.Length)) + stat_qq() + stat_qq_line() + facet_wrap(~Species)
  • Points close to the straight line suggest normality; strong bends or S‑shapes suggest non‑normality.
  3. Homogeneity of variances (Levene test)
Code
library(car)
leveneTest(Sepal.Length ~ Species, data = iris2)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value   Pr(>F)   
group  1  8.1727 0.005196 **
      98                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • If p < 0.05, variances differ notably between the two species.

What if assumptions fail?

  • Non-normality → use Mann–Whitney (Wilcoxon rank-sum).
  • Heterogeneous variances → Welch t-test OR non-parametric.
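Note that R's t.test() already performs the Welch version by default (which is why the output in this deck is labelled "Welch Two Sample t-test"); a sketch of the two choices:

```r
iris2 <- subset(iris, Species %in% c("setosa", "versicolor"))

# Default: Welch t-test (does not assume equal variances).
t.test(Sepal.Length ~ Species, data = iris2)

# Classic pooled t-test (assumes equal variances; Levene suggested otherwise here).
t.test(Sepal.Length ~ Species, data = iris2, var.equal = TRUE)
```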

Parametric test

Code
t.test(Sepal.Length ~ Species, data = iris2)

    Welch Two Sample t-test

data:  Sepal.Length by Species
t = -10.521, df = 86.538, p-value < 2.2e-16
alternative hypothesis: true difference in means between group setosa and group versicolor is not equal to 0
95 percent confidence interval:
 -1.1057074 -0.7542926
sample estimates:
    mean in group setosa mean in group versicolor 
                   5.006                    5.936 
  • Look for the p-value and the confidence interval; with normality and similar variances, this is appropriate.

Non-parametric alternative (if assumptions break)

Code
wilcox.test(Sepal.Length ~ Species, data = iris2)

    Wilcoxon rank sum test with continuity correction

data:  Sepal.Length by Species
W = 168.5, p-value = 8.346e-14
alternative hypothesis: true location shift is not equal to 0
  • Use this when data are skewed or have outliers; the p-value tests if one group tends to have higher values.
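A simple effect size to accompany this test: W divided by the number of cross-group pairs estimates the probability that a random setosa value exceeds a random versicolor value.

```r
iris2 <- subset(iris, Species %in% c("setosa", "versicolor"))
mw <- wilcox.test(Sepal.Length ~ Species, data = iris2, exact = FALSE)
n1 <- sum(iris2$Species == "setosa")
n2 <- sum(iris2$Species == "versicolor")

# W / (n1 * n2): probability of superiority (setosa over versicolor).
unname(mw$statistic) / (n1 * n2)  # about 0.07: setosa rarely exceeds versicolor
```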

Example output and interpretation (R)

Code
# Run both to compare

tt_res <- t.test(Sepal.Length ~ Species, data = iris2)
mw_res <- wilcox.test(Sepal.Length ~ Species, data = iris2, exact = FALSE)

tt_res

    Welch Two Sample t-test

data:  Sepal.Length by Species
t = -10.521, df = 86.538, p-value < 2.2e-16
alternative hypothesis: true difference in means between group setosa and group versicolor is not equal to 0
95 percent confidence interval:
 -1.1057074 -0.7542926
sample estimates:
    mean in group setosa mean in group versicolor 
                   5.006                    5.936 
Code
mw_res

    Wilcoxon rank sum test with continuity correction

data:  Sepal.Length by Species
W = 168.5, p-value = 8.346e-14
alternative hypothesis: true location shift is not equal to 0
Code
# Print key summaries for the slide
cat("t-test p-value:", format.pval(tt_res$p.value, digits = 3), "\n")
t-test p-value: <2e-16 
Code
cat("Mann–Whitney p-value:", format.pval(mw_res$p.value, digits = 3), "\n")
Mann–Whitney p-value: 8.35e-14 
  • Interpret the p-values alongside the group medians, not significance alone.
  • Interpretation: If data look non-normal or have outliers, prefer Mann–Whitney. Report simply, e.g., “Species A tends to have longer sepals than Species B.”

Paired t-test Using iris

Goal

Compare Sepal.Length vs Sepal.Width for the same flowers. Since these measurements come from the same flower, they are paired.

Visualise

Code
# Reshape data to long format for plotting
library(tidyr)
iris_long <- pivot_longer(iris, 
                          cols = c("Sepal.Length", "Sepal.Width"), 
                          names_to = "Measure", 
                          values_to = "Value")

ggplot(iris_long, aes(x = Measure, y = Value, fill = Measure)) +
  geom_boxplot(alpha = 0.6, width = 0.2) +
  geom_jitter(width = 0.1, alpha = 0.2) +
  theme_minimal() +
  ggtitle("Comparison of Paired Measurements")

Paired t-test Assumptions

  1. Independence of pairs
    • Each flower is independent of others → ✔ OK.
  2. Normality of the differences
    • We care about the distribution of (Sepal.Length - Sepal.Width).
Code
diffs <- iris$Sepal.Length - iris$Sepal.Width
shapiro.test(diffs)

    Shapiro-Wilk normality test

data:  diffs
W = 0.94628, p-value = 1.628e-05
  • If p < 0.05, the differences are not normally distributed.
  • Check visually:
Code
ggplot(data.frame(diffs), aes(sample = diffs)) + stat_qq() + stat_qq_line() +
  ggtitle("Q-Q Plot of Differences")

Parametric paired test

Code
t.test(iris$Sepal.Length, iris$Sepal.Width, paired = TRUE)

    Paired t-test

data:  iris$Sepal.Length and iris$Sepal.Width
t = 34.815, df = 149, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 2.627874 2.944126
sample estimates:
mean difference 
          2.786 
  • Checks if the mean difference is non-zero.

Non-parametric alternative (if assumptions break)

Wilcoxon Signed-Rank Test

Code
wilcox.test(iris$Sepal.Length, iris$Sepal.Width, paired = TRUE)

    Wilcoxon signed rank test with continuity correction

data:  iris$Sepal.Length and iris$Sepal.Width
V = 11325, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
  • Tests whether the distribution of paired differences is symmetric around zero; under that symmetry assumption, this is equivalent to testing that the median difference is zero.
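Adding conf.int = TRUE makes wilcox.test also return an estimate (the pseudomedian of the differences) with a confidence interval, which is handy for reporting:

```r
# conf.int = TRUE adds a point estimate (pseudomedian of the differences)
# and a confidence interval to the test output.
wilcox.test(iris$Sepal.Length, iris$Sepal.Width, paired = TRUE, conf.int = TRUE)
```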

Example output and interpretation (R)

Code
pair_t <- t.test(iris$Sepal.Length, iris$Sepal.Width, paired = TRUE)
pair_w <- wilcox.test(iris$Sepal.Length, iris$Sepal.Width, paired = TRUE)

pair_t

    Paired t-test

data:  iris$Sepal.Length and iris$Sepal.Width
t = 34.815, df = 149, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 2.627874 2.944126
sample estimates:
mean difference 
          2.786 
Code
pair_w

    Wilcoxon signed rank test with continuity correction

data:  iris$Sepal.Length and iris$Sepal.Width
V = 11325, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
  • Interpretation: “Sepal length is significantly different from sepal width within the same flowers.”

Chi-square Test Using iris

Goal

Test independence between Species and a discretised version of Sepal.Width. We bin Sepal.Width:

Code
iris$WidthClass <- cut(iris$Sepal.Width, breaks = 3, labels = c("Narrow", "Medium", "Wide"))
tab <- table(iris$Species, iris$WidthClass)

Visualise

Code
ggplot(iris, aes(x = Species, fill = WidthClass)) +
  geom_bar(position = "fill") +
  theme_minimal() +
  labs(y = "Proportion", title = "Width Class Distribution by Species") +
  scale_fill_brewer(palette = "Pastel1")

Chi-square Assumptions

  1. Independence of observations → ✔ OK.
  2. Expected cell counts should be large enough: a common rule is all expected counts ≥ 5 (or at least 80% of cells ≥ 5 and none below 1).
Code
chisq.test(tab)$expected
            
               Narrow   Medium Wide
  setosa     15.66667 29.33333    5
  versicolor 15.66667 29.33333    5
  virginica  15.66667 29.33333    5

If expected counts are too small, the chi-square approximation becomes unreliable. Scan the expected-count table: if any cell is below 5, prefer Fisher’s Exact Test.
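The scan can be done programmatically; a small sketch using the table built above:

```r
iris$WidthClass <- cut(iris$Sepal.Width, breaks = 3, labels = c("Narrow", "Medium", "Wide"))
tab <- table(iris$Species, iris$WidthClass)

# Flag any expected count below 5.
exp_counts <- suppressWarnings(chisq.test(tab))$expected
any(exp_counts < 5)  # FALSE here: the smallest expected count is exactly 5
```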

Parametric-style categorical test

Code
chisq.test(tab)

    Pearson's Chi-squared test

data:  tab
X-squared = 45.125, df = 4, p-value = 3.746e-09
  • Chi-square p-value tests association; warnings about small counts mean the test may be unreliable.

Non-parametric alternative when assumptions fail

Fisher’s Exact Test

Works even with small expected cell counts.

Code
fisher.test(tab)

    Fisher's Exact Test for Count Data

data:  tab
p-value = 8.429e-11
alternative hypothesis: two.sided
  • Fisher’s p-value is reliable even with small expected counts.

Example output and interpretation (R)

Code
chisq_res <- suppressWarnings(chisq.test(tab))  # warning if small counts
fisher_res <- fisher.test(tab)

chisq_res

    Pearson's Chi-squared test

data:  tab
X-squared = 45.125, df = 4, p-value = 3.746e-09
Code
fisher_res

    Fisher's Exact Test for Count Data

data:  tab
p-value = 8.429e-11
alternative hypothesis: two.sided
Code
chisq_expected <- chisq_res$expected

chisq_expected
            
               Narrow   Medium Wide
  setosa     15.66667 29.33333    5
  versicolor 15.66667 29.33333    5
  virginica  15.66667 29.33333    5
  • Check expected counts; if any are < 5, prefer Fisher’s result.
  • Interpretation: “Species and width class show evidence of association.”
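If an effect size is wanted for the association, one option is Cramér's V; it is not printed by base R, but is a one-line computation from the chi-square statistic (a sketch, using the table built earlier):

```r
iris$WidthClass <- cut(iris$Sepal.Width, breaks = 3, labels = c("Narrow", "Medium", "Wide"))
tab <- table(iris$Species, iris$WidthClass)
chi <- suppressWarnings(chisq.test(tab))

# Cramér's V = sqrt(X^2 / (n * (min(rows, cols) - 1))); 0 = none, 1 = perfect.
sqrt(unname(chi$statistic) / (sum(tab) * (min(dim(tab)) - 1)))  # about 0.39
```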

ANOVA Using iris

Goal

Compare Petal.Length across all three species.

Code
iris$Species <- factor(iris$Species)

Visualise

Code
ggplot(iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot(alpha = 0.6, width = 0.2) +
  geom_jitter(width = 0.1, alpha = 0.5) +
  theme_minimal() +
  ggtitle("Petal Length by Species (3 Groups)")

ANOVA Assumptions

  1. Independence → ✔ by design.
  2. Normality within each group
Code
by(iris$Petal.Length, iris$Species, shapiro.test)
iris$Species: setosa

    Shapiro-Wilk normality test

data:  dd[x, ]
W = 0.95498, p-value = 0.05481

------------------------------------------------------------ 
iris$Species: versicolor

    Shapiro-Wilk normality test

data:  dd[x, ]
W = 0.966, p-value = 0.1585

------------------------------------------------------------ 
iris$Species: virginica

    Shapiro-Wilk normality test

data:  dd[x, ]
W = 0.96219, p-value = 0.1098
  • If some species have p < 0.05, normality is doubtful for those groups.
  3. Homogeneity of variances (Levene)
Code
leveneTest(Petal.Length ~ Species, data = iris)
Levene's Test for Homogeneity of Variance (center = median)
       Df F value    Pr(>F)    
group   2   19.48 3.129e-08 ***
      147                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • If p < 0.05, variances differ across species.

If either normality or variance equality fails → cannot trust ANOVA F-test.
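One middle option when only the equal-variance assumption fails: Welch's ANOVA, available in base R as oneway.test (its default is var.equal = FALSE):

```r
# Welch's ANOVA: relaxes the equal-variance assumption but still assumes
# approximate normality within groups.
oneway.test(Petal.Length ~ Species, data = iris)
```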

Parametric ANOVA

Code
summary(aov(Petal.Length ~ Species, data = iris))
             Df Sum Sq Mean Sq F value Pr(>F)    
Species       2  437.1  218.55    1180 <2e-16 ***
Residuals   147   27.2    0.19                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • In the ANOVA table, check the row for Species: Pr(>F) is the p-value for group differences.

Non-parametric alternative

Kruskal–Wallis Test

Code
kruskal.test(Petal.Length ~ Species, data = iris)

    Kruskal-Wallis rank sum test

data:  Petal.Length by Species
Kruskal-Wallis chi-squared = 130.41, df = 2, p-value < 2.2e-16
  • Kruskal–Wallis p-value tests whether at least one group tends to have higher ranks than the others.

If significant → perform post-hoc Dunn tests.

Code
if (!requireNamespace("FSA", quietly = TRUE)) {
  cat("Package 'FSA' not installed; skipping Dunn post-hoc tests. Install with install.packages('FSA').\n")
} else {
  library(FSA)
  dunnTest(Petal.Length ~ Species, data = iris, method = "bonferroni")
}
              Comparison          Z      P.unadj        P.adj
1    setosa - versicolor  -5.862997 4.545875e-09 1.363763e-08
2     setosa - virginica -11.418385 3.384664e-30 1.015399e-29
3 versicolor - virginica  -5.555388 2.769957e-08 8.309872e-08
  • Post-hoc Dunn tests (when available) show which pairs differ; use adjusted p-values to report significant pairs. If skipped, you can still report the Kruskal–Wallis result and show group medians/IQRs.

Example output and interpretation (R)

Code
aov_res <- aov(Petal.Length ~ Species, data = iris)
kw_res  <- kruskal.test(Petal.Length ~ Species, data = iris)

summary(aov_res)
             Df Sum Sq Mean Sq F value Pr(>F)    
Species       2  437.1  218.55    1180 <2e-16 ***
Residuals   147   27.2    0.19                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
kw_res

    Kruskal-Wallis rank sum test

data:  Petal.Length by Species
Kruskal-Wallis chi-squared = 130.41, df = 2, p-value < 2.2e-16
Code
# Extract and print concise p-values for the slide
s <- summary(aov_res)[[1]]
aov_p <- s[["Pr(>F)"]][1]
cat("ANOVA p-value:", format.pval(aov_p, digits = 3), "\n")
cat("Kruskal–Wallis p-value:", format.pval(kw_res$p.value, digits = 3), "\n")
  • Interpretation: With skew/outliers or non-normality, rely on Kruskal–Wallis. If significant, follow with pairwise comparisons (e.g., Dunn tests) and report which species differ.

Correlation Using iris

Goal

Assess the relationship between Sepal.Length and Petal.Length.

Visualise

Code
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) +
  geom_point(alpha = 0.6, color = "darkblue") +
  geom_smooth(method = "loess", color = "red", se = FALSE) +
  theme_minimal() +
  ggtitle("Sepal vs Petal Length")

Parametric Correlation (Pearson) Assumptions

  1. Linearity
    • The relationship should be a straight line.
  2. Normality
    • Both variables should be normally distributed.
Code
shapiro.test(iris$Sepal.Length)

    Shapiro-Wilk normality test

data:  iris$Sepal.Length
W = 0.97609, p-value = 0.01018
Code
shapiro.test(iris$Petal.Length)

    Shapiro-Wilk normality test

data:  iris$Petal.Length
W = 0.87627, p-value = 7.412e-10
  • If p < 0.05, normality assumption is violated.

Parametric test (Pearson)

Code
cor.test(iris$Sepal.Length, iris$Petal.Length, method = "pearson")

    Pearson's product-moment correlation

data:  iris$Sepal.Length and iris$Petal.Length
t = 21.646, df = 148, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8270363 0.9055080
sample estimates:
      cor 
0.8717538 
  • Tests for linear correlation.

Non-parametric alternative (if assumptions break)

Spearman’s Rank Correlation

Code
cor.test(iris$Sepal.Length, iris$Petal.Length, method = "spearman")

    Spearman's rank correlation rho

data:  iris$Sepal.Length and iris$Petal.Length
S = 66429, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.8818981 
  • Uses ranks; tests for monotonic relationship (doesn’t have to be a straight line, just consistently increasing or decreasing).
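A useful way to see what Spearman does: it is simply Pearson's correlation computed on the ranks of the two variables.

```r
# Spearman's rho equals Pearson's correlation applied to the ranks.
cor(rank(iris$Sepal.Length), rank(iris$Petal.Length))
cor(iris$Sepal.Length, iris$Petal.Length, method = "spearman")  # same: 0.8818981
```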

Kendall’s Tau

Code
cor.test(iris$Sepal.Length, iris$Petal.Length, method = "kendall")

    Kendall's rank correlation tau

data:  iris$Sepal.Length and iris$Petal.Length
z = 12.647, p-value < 2.2e-16
alternative hypothesis: true tau is not equal to 0
sample estimates:
      tau 
0.7185159 
  • Better for small samples or many ties.

  • Interpretation: “There is a strong positive monotonic correlation between sepal length and petal length.”

Summary: Use a non-parametric test when:

  • Data are clearly non-normal (Shapiro–Wilk fails; Q–Q plots bend strongly).
  • Variances differ strongly across groups (Levene fails).
  • Data are ordinal (ranks, Likert scores).
  • There are outliers that distort means.
  • Sample size is small, so the Central Limit Theorem offers little protection.
  • Group distributions have different shapes, not just shifted means.