The Mean: Beginnings

Galton, Pearson, and why we still check assumptions

The Shadow of the Bell Curve

Francis Galton loved to measure things. In the Victorian world of steam engines and gas lamps, he believed that almost everything — from the size of peas to the height of people — could be reduced to numbers. His experiments often bordered on the eccentric: he timed how fast people could whistle a tune, drew maps of “beauty” across cities, and even studied fingerprints long before they were used in forensics. But beneath the quirkiness was a powerful insight: when you measured enough individuals, a strange order seemed to emerge from the chaos.

Galton built a contraption he called the Quincunx, a wooden board studded with rows of pegs. Drop peas in at the top, and they bounce left or right at random until they settle at the bottom. Each journey is unpredictable, but the final pattern is always the same: the peas pile up into a graceful bell curve.

To Galton, this was not just a toy. It was a revelation: chance does not mean chaos, but order in disguise. Nature had a hidden geometry, a gentle tug that pulled outcomes toward the centre.

Mathematicians before him, from de Moivre to Laplace, had already formalised this idea in what we now call the Central Limit Theorem: when many small random influences accumulate, the result tends to follow a normal, bell-shaped pattern. Galton’s brilliance was to make this otherwise abstract truth visible. His board gave students and scientists a way to see probability in action.
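The board is easy to simulate: each ball's final bin is simply the number of rightward bounces it takes across the rows of pegs, a binomial draw, and the Central Limit Theorem is why those counts pile up into a bell. A minimal sketch in R (the row and ball counts are arbitrary choices, not Galton's):

```r
# Simulate a Galton board: each ball bounces left or right at every peg,
# so its final bin is the count of rightward bounces -- a binomial draw.
set.seed(42)                 # reproducibility
n_rows  <- 12                # rows of pegs (arbitrary choice)
n_balls <- 10000             # balls dropped (arbitrary choice)

bins <- rbinom(n_balls, size = n_rows, prob = 0.5)

# By the CLT, the pile is close to Normal(n_rows/2, sqrt(n_rows)/2):
mean(bins)                   # close to 6
sd(bins)                     # close to sqrt(12)/2, about 1.73

hist(bins, breaks = seq(-0.5, n_rows + 0.5, 1),
     main = "Simulated Galton board", xlab = "Final bin")
```

Shift `prob` away from 0.5 and the pile skews, which is exactly the kind of departure the rest of this chapter worries about.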

And beyond this, in his studies of heredity, Galton noticed something similar: tall parents usually have children a little shorter, and short parents usually have children a little taller. He called this tendency “regression to the mean.” Different problem, same central insight — randomness, when seen in the aggregate, bends toward balance. Galton believed this curve wasn’t just a pattern in peas, but a law of life itself.
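Regression to the mean is easy to reproduce with simulated data. In the sketch below, a child's height is the population mean plus a fraction of the parent's deviation plus fresh noise; every number is an illustrative assumption, not Galton's data:

```r
# Regression to the mean with simulated heights (all numbers illustrative)
set.seed(1)
n    <- 5000
mu_h <- 175        # assumed population mean height (cm)
sd_h <- 7          # assumed standard deviation
r    <- 0.5        # assumed parent-child correlation

parent <- rnorm(n, mu_h, sd_h)
# The child inherits only a fraction r of the parent's deviation from the mean:
child  <- mu_h + r * (parent - mu_h) + rnorm(n, 0, sd_h * sqrt(1 - r^2))

# Children of the tallest 10% of parents average well below their parents,
# yet still above the population mean:
tall <- parent > quantile(parent, 0.9)
mean(parent[tall])
mean(child[tall])
```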

One of Galton’s most devoted students, Karl Pearson, carried this vision forward. Pearson admired Galton almost to the point of worship. To him, Galton was a genius who had glimpsed an underlying truth about the universe. But Pearson wanted to take it further. While Galton had focused on averages and general tendencies, Pearson asked a new question: what about relationships?

Pearson worked with Galton’s height data, comparing parents with their children. He wanted to quantify just how strongly one predicted the other. From this pursuit came the correlation coefficient — that little number between −1 and 1 that you still use today to judge the strength of an association. Suddenly, relationships in data could be expressed with precision: not just “tall parents have tall kids,” but “the relationship is this strong.” It was a stunning leap forward.
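Computing that number is now a single function call. A toy version on simulated parent and child heights (sample size and strength of association made up for illustration):

```r
# Pearson's correlation coefficient on simulated parent-child heights
set.seed(2)
p_height <- rnorm(1000, 175, 7)
c_height <- 175 + 0.5 * (p_height - 175) + rnorm(1000, 0, 6)

r_hat <- cor(p_height, c_height)   # Pearson's r, always between -1 and 1
r_hat
```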

But there was a catch. Pearson took Galton’s rough idea of correlation and hammered it into precise formulas. Yet the mathematics behaved most beautifully when the data followed the smooth, symmetric bell curve that Galton had idolised with his Quincunx. If your data wandered too far from that shape (skewed, heavy-tailed, or lopsided), the formulas could mislead.
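That fragility is easy to demonstrate. In the toy example below, fifty unrelated points plus one heavy-tailed outlier drag Pearson's r far from zero, while a rank-based coefficient (Spearman's, a later development) barely moves:

```r
# One extreme point can dominate Pearson's r (toy illustration)
set.seed(3)
x <- rnorm(50)
y <- rnorm(50)
r_clean <- cor(x, y)                          # genuinely unrelated noise

x <- c(x, 10)                                 # add a single extreme point
y <- c(y, 10)
r_outlier  <- cor(x, y)                       # inflated by one observation
r_spearman <- cor(x, y, method = "spearman")  # rank-based, far less affected

c(clean = r_clean, outlier = r_outlier, spearman = r_spearman)
```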

In other words, Pearson’s correlation was born in the shadow of Galton’s bell curve. The apple didn’t fall far from the tree: a new science was built on the same foundation of normality. That’s why, even today, statisticians are trained to check assumptions before trusting their tests.

Here is where the story becomes important for you today. Pearson’s brilliance was real, but he was also constrained by his teacher’s worldview. He had stood on the shoulders of a giant, but the shadow of that giant meant he looked at data through the lens of the bell curve. It is no accident that so many of the statistical tests we use still begin with the question: does your data follow a normal distribution?

This is why modern statistical practice is so full of caveats, assumptions, and diagnostic checks. When your software warns you that your sample is skewed, or that variances are unequal, it is pointing back to this history. Galton and Pearson built tools that assume a certain kind of order in the data. But the real world is often lopsided, messy, and resistant to those tidy shapes.

So what’s the point?

  • Those warnings from your stats software aren’t just nagging. They are echoes of Galton and Pearson, whose tools only worked cleanly under the bell curve.

  • When you check for normality, you’re not blindly following rules — you’re asking whether your data is playing by the rules that Galton’s Quincunx and Pearson’s formulas assumed.

  • And when the rules break, you’re stepping into the same dilemma they faced: how far can we trust conclusions built on a geometry of symmetry, when real life so often bends and skews away from the centre?
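Those checks are mechanical in practice. A quick sketch in base R, with deliberately skewed, simulated data (the sample and choice of checks are illustrative):

```r
# Two routine normality checks on deliberately skewed data
set.seed(4)
skewed <- rexp(200)                   # exponential: heavily right-skewed

pval <- shapiro.test(skewed)$p.value  # Shapiro-Wilk test of normality
pval                                  # far below 0.05: reject normality

qqnorm(skewed)                        # points hug the line if data are normal...
qqline(skewed)                        # ...here they bend away in the right tail
```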

The story of Galton and Pearson is a story about genius — but also about limitation. The bell curve is elegant, powerful, and deeply useful. Yet it is not reality itself. It is an approximation. And to be a good statistician is not only to use the tools they gave us, but to know when their assumptions begin to crack.


Galton’s Board (Quincunx) — placeholder image

Interactive Galton Board simulator (placeholder). Balls bounce through a triangular grid of pegs and pile into a bell-shaped histogram; controls set the number of rows and balls dropped, readouts show the mean bin (μ) and standard deviation (σ), and the theoretical curve (binomial, with its normal approximation) is overlaid.

Tip: increase rows to see the bell curve emerge; add a bias to skew the distribution.


Pearson’s Bell Curve: Mean and Standard Deviation

library(ggplot2)
library(dplyr)

# Parameters (note: this `sd` is a variable that masks the base sd() function)
mu <- 0
sd <- 1

# Continuous grid for density
x <- seq(mu - 4*sd, mu + 4*sd, length.out = 2000)
df <- data.frame(x = x, y = dnorm(x, mean = mu, sd = sd))

# Whole-σ bands across the support; only the central (−1σ, 1σ) band is shaded below
cuts <- c(-Inf, -3, -2, -1, 1, 2, 3, Inf)
labels <- c("<-3σ", "[-3σ,-2σ]", "[-2σ,-1σ]", "[-1σ,1σ]", "[1σ,2σ]", "[2σ,3σ]", ">3σ")

df$band <- cut(df$x, breaks = cuts, labels = labels, right = TRUE, include.lowest = TRUE)

# Probability mass in each band by rectangle-rule integration of the density;
# printed so the 68–95–99.7 percentages can be checked numerically
area <- df %>%
  group_by(band) %>%
  summarise(p = sum(y) * (max(x) - min(x)) / (length(x) - 1), .groups = "drop") %>%
  mutate(pct = round(100 * p, 1))
print(area)

# Build plot
p <- ggplot(df, aes(x, y)) +
  # Shaded middle 68% (-1σ to 1σ)
  geom_area(data = subset(df, x >= mu - sd & x <= mu + sd),
            aes(y = y), alpha = 0.25) +
  # Outline of the density
  geom_line(linewidth = 1) +
  # Mean and SD guide lines
  geom_vline(xintercept = mu, linetype = "solid", linewidth = 0.8) +
  geom_vline(xintercept = mu + c(-1, 1)*sd, linetype = "dashed", linewidth = 0.6) +
  geom_vline(xintercept = mu + c(-2, 2)*sd, linetype = "dotted", linewidth = 0.6) +
  geom_vline(xintercept = mu + c(-3, 3)*sd, linetype = "dotdash", linewidth = 0.6) +
  # Labels for mean and SDs
  annotate("text", x = mu, y = dnorm(mu, mu, sd) + 0.02, label = "mean (μ)", vjust = 0) +
  annotate("text", x = mu + sd, y = dnorm(mu + sd, mu, sd) + 0.02, label = "+1σ", vjust = 0) +
  annotate("text", x = mu - sd, y = dnorm(mu - sd, mu, sd) + 0.02, label = "−1σ", vjust = 0) +
  annotate("text", x = mu + 2*sd, y = dnorm(mu + 2*sd, mu, sd) + 0.02, label = "+2σ", vjust = 0) +
  annotate("text", x = mu - 2*sd, y = dnorm(mu - 2*sd, mu, sd) + 0.02, label = "−2σ", vjust = 0) +
  annotate("text", x = mu + 3*sd, y = dnorm(mu + 3*sd, mu, sd) + 0.02, label = "+3σ", vjust = 0) +
  annotate("text", x = mu - 3*sd, y = dnorm(mu - 3*sd, mu, sd) + 0.02, label = "−3σ", vjust = 0) +
  labs(x = "Value", y = "Density",
       subtitle = "Shaded region ≈ 68% of the area (−1σ to +1σ). Dashed = ±1σ, dotted = ±2σ, dotdash = ±3σ.") +
  theme_minimal(base_size = 13)

p
Figure 1: Normal curve with mean and ±1/2/3 standard deviation divisions (the 68–95–99.7 rule).