When many independent, comparable, finite-variance contributions are summed or
averaged, the aggregate converges to a Gaussian envelope characterized by mean
and variance alone — an attractor that forgets the shapes of its parts.
Why Sums Make Bells
Roll one die and the six outcomes are all equally likely: the picture is flat, with no bump. Roll five dice and add them up, though, and very low or very high totals become rare, because a total of 5 needs every die to show a 1, while a middle total like 17 can be made in hundreds of different ways. So a bump forms in the center. The central limit theorem says this happens almost no matter what you start with: when you add up many independent random things of similar size, the totals pile up into the same bell-shaped curve. It doesn't matter what shape each individual thing had; the adding washes that out. That's why the bell curve shows up so often in the world.
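The dice picture can be checked exactly rather than by simulation. The following is a small illustrative sketch; `dice_sum_counts` is a hypothetical helper, not a library function. It builds the distribution of the total one die at a time by convolution and prints a text histogram for 1, 2, and 5 dice. For five dice, the totals 17 and 18 each arise in 780 of the 7,776 equally likely outcomes, while 5 and 30 arise in only one.

```python
from collections import Counter


def dice_sum_counts(num_dice: int, faces: int = 6) -> Counter:
    """Return a Counter mapping each possible total to the number of ways it occurs.

    Hypothetical helper: adds one die at a time (a discrete convolution),
    combining every total reached so far with every face of the next die.
    """
    counts = Counter({0: 1})  # zero dice: total 0, reached in exactly one way
    for _ in range(num_dice):
        nxt = Counter()
        for total, ways in counts.items():
            for face in range(1, faces + 1):
                nxt[total + face] += ways
        counts = nxt
    return counts


if __name__ == "__main__":
    for n in (1, 2, 5):
        counts = dice_sum_counts(n)
        outcomes = 6 ** n  # equally likely outcomes for n fair six-sided dice
        print(f"{n} dice:")
        for total in sorted(counts):
            share = counts[total] / outcomes
            # one '#' per percentage point of probability
            print(f"  {total:2d} {'#' * round(100 * share)}")
```

The one-die histogram is flat, two dice give a triangle, and five dice already look like a rounded bump; the flat starting shape is largely gone.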
The Bell-Curve Attractor
The central limit theorem says that when many independent random influences of comparable size are summed or averaged, the distribution of the result tends toward a normal (Gaussian) bell shape, regardless of the shapes of the individual contributions. The classic version needs only that the contributions be independent, identically distributed, and have finite variance. The real payload isn't the bell curve itself but the attractor property: a wide class of aggregation procedures collapses messy micro-randomness into a single envelope described by just two numbers, the mean and the variance. As the number of summands n grows, the sample mean narrows around the true mean at a rate of 1 over the square root of n, and the shape of the fluctuations forgets the underlying distribution. This is why the normal distribution is so common: not because nature favors bell curves, but because summing many independent small influences is itself an attractor toward the normal. It also has failure modes: if contributions are strongly dependent, have infinite variance, or are dominated by a single term, the result flows to a different attractor instead.
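The 1/√n rate can be seen in a short simulation. This sketch assumes Exponential(1) contributions, picked only because they are strongly skewed and have variance 1, so the predicted spread of the sample mean is exactly 1/√n; `sample_mean_spread` is a hypothetical helper. If the theory holds, the printed `spread*sqrt(n)` column should hover near 1 for every n.

```python
import math
import random
import statistics


def sample_mean_spread(n: int, trials: int, rng: random.Random) -> float:
    """Standard deviation, across trials, of the mean of n Exponential(1) draws.

    Exponential(1) is an illustrative assumption: skewed, with variance 1.
    """
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    for n in (4, 16, 64, 256):
        spread = sample_mean_spread(n, trials=4000, rng=rng)
        # With variance 1, the predicted spread is 1/sqrt(n); the ratio should stay near 1.
        print(f"n={n:4d}  spread={spread:.4f}  "
              f"1/sqrt(n)={1 / math.sqrt(n):.4f}  spread*sqrt(n)={spread * math.sqrt(n):.3f}")
```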
The Central Limit Theorem states that when many independent random influences of comparable size are summed or averaged, the distribution of the resulting aggregate, suitably centered and scaled, tends toward a normal (Gaussian) shape regardless of the shapes of the individual contributions. The classical Lindeberg–Lévy form requires only that contributions be independent, identically distributed, and have finite variance; generalizations relax both the identical-distribution and independence assumptions. The structural payload is not the bell curve but the attractor property: a wide class of aggregation procedures collapses heterogeneous micro-randomness to a single two-parameter (mean and variance) macro-envelope. As the number of summands n grows, the standard deviation of the sample mean shrinks around the true mean as σ/√n, where σ is the standard deviation of a single contribution, and the shape of the fluctuations forgets the underlying distribution. The decisive content is the dissociation of an aggregate's distribution from its constituents': below some level of aggregation, the joint behavior of millions of microscopic contributions is intractable; above it, the system is described by two numbers. This makes precise why the normal is so ubiquitous (not because nature favors bell curves, but because summation of many independent small influences is itself an attractor toward the normal), and it separates "normal because the mechanism is Gaussian" (rare) from "normal because aggregation washed out the mechanism" (common). The theorem brings its own failure modes: under strong dependence, infinite variance, or a single dominant contribution, the aggregate flows to a different attractor (a stable law, an extreme-value distribution, a persistent fat tail).
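One standard way to see why the aggregate forgets the shape of its inputs is to track cumulants. The derivation below is a heuristic sketch that assumes the contributions have finite cumulants of every order, a stronger condition than the finite variance the Lindeberg–Lévy theorem actually needs; κ_k denotes the k-th cumulant of a single contribution.

```latex
% X_1, ..., X_n i.i.d. with mean \mu, variance \sigma^2, and k-th cumulant \kappa_k
Z_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}

% Cumulants add over independent summands, scale as c^k under X \mapsto cX,
% and a constant shift changes only \kappa_1:
\kappa_1(Z_n) = 0, \qquad \kappa_2(Z_n) = \frac{n\sigma^2}{\sigma^2 n} = 1,

\kappa_k(Z_n) = \frac{n\,\kappa_k}{\sigma^k n^{k/2}}
             = \frac{\kappa_k}{\sigma^k}\, n^{1-k/2}
             \xrightarrow[n\to\infty]{} 0 \qquad (k \ge 3)

% Only the first two cumulants survive, and the normal is the distribution whose
% cumulants vanish beyond order two. In particular, skewness decays as n^{-1/2}:
\gamma_1(Z_n) = \frac{\gamma_1(X)}{\sqrt{n}}
```

For Exponential(1) inputs, whose skewness is 2, the standardized sum has skewness 2/√n, so the asymmetry fades only slowly, at the same 1/√n rate.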
Statistics: confidence intervals, hypothesis tests, and standard errors rest on the asymptotic normality of estimators, even for non-normal data.
Physics: thermal noise, Brownian motion, and Maxwell–Boltzmann velocities are Gaussian because each is a sum of many tiny independent kicks.
Biology: continuously varying traits are approximately normal as a sum of many allelic plus environmental contributions (the Fisher infinitesimal model).
Finance: portfolio theory and many risk models lean on approximate normality of aggregate returns, and the failures of that assumption (heavy tails, dependence) are central risks.
Metrology: error budgets sum many tiny independent error sources and treat the residual as Gaussian, which makes error bars meaningful.
Signal processing: summed sensor noise is modeled as additive white Gaussian noise, enabling matched and Kalman filters.
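The statistics bullet can be checked directly: build the usual normal-approximation 95% interval, mean ± 1.96 standard errors, from data that are clearly not normal and count how often it covers the truth. This is a sketch under stated assumptions: the population is Exponential (skewed), z = 1.96 is used instead of a t quantile, and `z_interval_coverage` is a hypothetical name.

```python
import math
import random
import statistics


def z_interval_coverage(n: int, trials: int, rng: random.Random,
                        rate: float = 1.0, z: float = 1.96) -> float:
    """Fraction of normal-approximation intervals (mean ± z·SE) that contain the true mean.

    Data are Exponential(rate) draws, a skewed population chosen for illustration;
    the true mean is 1/rate.
    """
    true_mean = 1.0 / rate
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(rate) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error of the mean
        if mean - z * se <= true_mean <= mean + z * se:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (5, 30, 200):
        print(f"n={n:3d}  coverage={z_interval_coverage(n, trials=2000, rng=rng):.3f}")
```

Coverage should approach 0.95 as n grows; at small n it tends to fall short, because the skew of the data has not yet been averaged away and the z quantile ignores the extra uncertainty in the estimated standard error.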
Explains why the normal distribution is ubiquitous — not because nature favors
bell curves, but because summation is itself an attractor — and separates
"normal because the mechanism is Gaussian" from "normal because aggregation
erased the mechanism."
Reduces the joint distribution of millions of microscopic contributions to a
two-parameter problem, with a clean scaling law: aggregate fluctuation shrinks
as 1/√n.
Installs a master question — do the CLT preconditions hold? — and generalizes to
a map of aggregation attractors (sums to Gaussian, maxima to extreme-value,
products to log-normal, heavy tails to stable laws).
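Two other entries in this attractor map can be glimpsed in a quick simulation. Under the assumptions of this sketch (Exponential(1) inputs for the maxima, Uniform(0.5, 1.5) factors for the products, and hypothetical helper names), the maximum of n exponential draws minus ln n should look Gumbel, whose mean is the Euler–Mascheroni constant ≈ 0.5772, and a product of many positive factors should be strongly right-skewed while its logarithm, being a sum, is nearly symmetric.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness: mean cubed deviation divided by the cubed population std dev."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([(x - m) ** 3 for x in xs]) / s ** 3


def centered_maxima(n, trials, rng):
    """Max of n Exponential(1) draws minus ln(n); tends to a standard Gumbel law."""
    return [max(rng.expovariate(1.0) for _ in range(n)) - math.log(n)
            for _ in range(trials)]


def products(n, trials, rng):
    """Product of n independent Uniform(0.5, 1.5) factors (illustrative choice)."""
    return [math.prod(rng.uniform(0.5, 1.5) for _ in range(n))
            for _ in range(trials)]


if __name__ == "__main__":
    rng = random.Random(0)
    maxima = centered_maxima(500, 2000, rng)
    # The standard Gumbel mean is the Euler–Mascheroni constant, about 0.5772.
    print(f"mean of centered maxima: {statistics.fmean(maxima):.3f}")
    prods = products(20, 5000, rng)
    # log(product) = sum of logs, so the CLT applies on the log scale.
    print(f"skewness of products:      {skewness(prods):.2f}")
    print(f"skewness of log(products): {skewness([math.log(p) for p in prods]):.2f}")
```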
The sum of n fair ±1 coin flips, divided by √n, converges to a standard Gaussian; by n = 30 the binomial histogram is already bell-shaped. Replace the coin with a standard Cauchy draw, however, and finite variance fails: the average of n Cauchy draws has exactly the same distribution as a single draw, so it never narrows.
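The coin-versus-Cauchy contrast can be simulated in a few lines. This is a sketch: the helper names are hypothetical, and Cauchy draws are generated with the inverse-CDF formula tan(π(U − 1/2)). The interquartile range of the coin-flip average should shrink roughly like 1/√n, while that of the Cauchy average should stay near 2, the interquartile range of a single standard Cauchy draw, for every n.

```python
import math
import random
import statistics


def iqr(xs):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return q3 - q1


def mean_of_coin_flips(n, rng):
    """Average of n fair ±1 coin flips (each flip has variance 1)."""
    return sum(1 if rng.random() < 0.5 else -1 for _ in range(n)) / n


def mean_of_cauchy(n, rng):
    """Average of n standard Cauchy draws (infinite variance).

    tan(pi * (U - 1/2)) with U ~ Uniform(0, 1) is a standard Cauchy draw (inverse CDF).
    """
    return sum(math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)) / n


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 1000
    for n in (10, 100, 1000):
        coin = [mean_of_coin_flips(n, rng) for _ in range(trials)]
        cauchy = [mean_of_cauchy(n, rng) for _ in range(trials)]
        # Coin: IQR shrinks roughly like 1.35/sqrt(n). Cauchy: IQR stays near 2 for every n.
        print(f"n={n:5d}  coin IQR={iqr(coin):.3f}  Cauchy IQR={iqr(cauchy):.3f}")
```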
Parents (1) — more general patterns this builds on
Central Limit Theorem presupposes Aggregation: the CLT is a specific claim about the limiting shape a sum-aggregation converges to under finite variance, namely the Gaussian attractor. It presupposes aggregation (the bare combining operation); other rules (max, product) flow to other attractors.
Central Limit Theorem is not Scale Invariance because the CLT manufactures a characteristic width (finite variance), whereas scale-invariant power laws have no such width and are precisely the heavy-tailed regime where the CLT fails.
Central Limit Theorem is not Aggregation because aggregation is the bare act of combining parts, whereas the CLT is the specific claim about the limiting shape a sum-aggregation converges to under finite variance.
Central Limit Theorem is not Heavy-Tailed Distributions because heavy tails are the named complement the CLT excludes by its finite-variance and no-dominant-term preconditions, exactly where the 1/√n shrinkage breaks down.