Multiple Comparisons Correction

Prime #: 446
Origin domain: Statistics & Experimental Design
Aliases: Multiplicity Adjustment, Multiple Comparisons Problem
Related primes: Hypothesis Testing (Null vs. Alternative), Statistical Significance (p-Value), Type I & Type II Errors, Statistical Power, Reproducibility & Replicability, Selection Bias

Core Idea

Multiple Comparisons Correction adjusts significance criteria or p-values when performing numerous hypothesis tests, preventing inflation of Type I error rates (false positives) that arise from repeated testing.

How would you explain it like I'm…

Lots-of-Tests Fairness Rule

If you flip a coin one time, getting heads doesn't surprise you. But if you flip it a hundred times, a big lucky streak is almost guaranteed somewhere. Scientists who run many tests at once need to be extra strict, or random luck will look like a real discovery.

Lucky Result Correction

Scientists set a rule that a result counts as 'real' if it would happen by random chance less than five percent of the time. That rule is fine for one test. But if you run a hundred tests, you'd expect about five lucky-looking results even when nothing is going on. Multiple comparisons correction is the set of math tricks for tightening that rule when you run lots of tests, so you don't fool yourself with random noise.

Multiple Testing Correction

When researchers run many statistical tests in the same study, the chance of at least one false positive shoots up: a hundred independent tests at the usual 5 percent threshold will produce a false alarm more than 99 percent of the time, even if nothing is really going on. Multiple comparisons correction is the family of methods that adjusts either the per-test thresholds or the p-values themselves to control error at the level of the whole family of tests — for example, Bonferroni correction (strict, controls the chance of any false positive) or Benjamini–Hochberg (less strict, controls the expected fraction of false positives among reported findings).

When many hypothesis tests are run in a single study, the per-test false-positive rate (typically alpha = 0.05) does not bound the study-level false-positive rate. A study performing 100 independent tests, each at alpha = 0.05, has roughly a 99.4% chance of producing at least one false positive even if every null hypothesis is true. Multiple comparisons correction is the family of techniques that adjust per-test thresholds or p-values to control a chosen error rate at the family level: the family-wise error rate (FWER, the probability of any false rejection), the false discovery rate (FDR, the expected proportion of false discoveries among rejections), or alternatives such as per-family error rate or false coverage rate. The two dominant traditions, Bonferroni-style FWER control (strict) and Benjamini-Hochberg FDR control (less conservative), encode different answers to how aggressively multiplicity should be penalized given the downstream cost of false discoveries.
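The arithmetic behind the 99.4% figure can be checked directly. The sketch below assumes the tests are independent and that every null hypothesis is true, in which case the family-wise error rate is 1 - (1 - alpha)^m. The function names are illustrative, not taken from any library.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) across m independent tests,
    assuming every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / m


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 5, 20, 100):
        uncorrected = familywise_error_rate(alpha, m)
        corrected = familywise_error_rate(bonferroni_threshold(alpha, m), m)
        print(f"m={m:>3}  uncorrected FWER={uncorrected:.4f}  "
              f"Bonferroni FWER={corrected:.4f}")
```

With 100 tests the uncorrected rate is about 0.994, while testing each hypothesis at alpha / m keeps the family-level rate just under 0.05.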

Broad Use

Gene Expression Studies: Checking thousands of genes for differential expression—without correction, many "significant" hits could be random noise.
Marketing / UX A/B Testing: Testing many variations (color, wording, layout) can lead to a spurious "success" if each is tested at α=0.05.
Educational Interventions: Trying multiple subgroups (gender, region, income) for an effect can yield false positives if each group is tested separately.
Quality Control: Tracking many defect metrics, each with its own hypothesis test, demands correction to avoid believing random spikes represent real issues.

Clarity

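Reveals that a threshold of α=0.05 means a 1-in-20 chance of a false positive per test when the null hypothesis is true, so across 20 independent tests you should expect about one false positive on average, and the chance of at least one is roughly 64%.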

Manages Complexity

Bonferroni and Holm control the family-wise error rate, while Benjamini-Hochberg controls the false discovery rate, so conclusions stay robust at a stated error level across many simultaneous tests.
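A minimal sketch of the three named procedures, written as adjusted p-values in plain Python so the mechanics are visible. The function names are illustrative; in practice a maintained statistics library would normally be used instead.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply each p by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values (controls the family-wise error rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from the largest p down
        i = order[rank]
        candidate = pvals[i] * m / (rank + 1)
        running_min = min(running_min, candidate)  # enforce monotonicity
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.005]
    for name, method in [("Bonferroni", bonferroni), ("Holm", holm),
                         ("Benjamini-Hochberg", benjamini_hochberg)]:
        adjusted = method(pvals)
        rejected = sum(p <= 0.05 for p in adjusted)
        print(f"{name:<20} {[round(p, 4) for p in adjusted]}  rejected: {rejected}")
```

On these four illustrative p-values, Bonferroni and Holm each reject two hypotheses at the 0.05 level while Benjamini-Hochberg rejects all four, which reflects the weaker guarantee it offers in exchange for more power.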

Abstract Reasoning

Demonstrates how repeated "trials" naturally amplify random hits: with enough attempts, a chance success somewhere becomes nearly certain, which is the same logic as "searching until you find something." Controlling the inflated error is crucial for multi-hypothesis scenarios.
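The amplification can also be seen empirically. The following is a small Monte Carlo sketch, assuming independent tests whose p-values are uniform on [0, 1] under a true null; the function name, seed, and number of simulated studies are arbitrary choices for illustration.

```python
import random


def fraction_with_false_positive(n_tests, alpha=0.05, n_studies=20_000, seed=0):
    """Simulate studies in which every null is true and return the fraction
    of studies that report at least one 'significant' result."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under a true null hypothesis, a valid p-value is uniform on [0, 1].
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    for n_tests in (1, 5, 20, 100):
        rate = fraction_with_false_positive(n_tests)
        print(f"{n_tests:>3} tests per study -> "
              f"{rate:.3f} of studies have a false positive")
```

The simulated fractions should track the analytic value 1 - (1 - alpha)^n, rising from about 0.05 for a single test toward 1 as the number of tests grows.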

Knowledge Transfer

Cognitive Psychology: Running multiple questionnaires or sub-tests on the same participants can yield spurious correlations if not corrected.
Machine Learning: Feature selection across hundreds of potential predictors can yield false associations unless the analysis adjusts for multiple comparisons.

Example

A medical genetics lab screening 10,000 genetic markers for a disease trait uses a false discovery rate method to avoid concluding "we found a gene association!" for random outliers among thousands of tests.
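A simulated version of this screen illustrates the difference correction makes. Everything numeric here is an assumption made for the sketch and not part of the example above: 50 truly associated markers, z-scores centred at 5 for those markers and at 0 for the rest, and a target false discovery rate of 0.05 under the Benjamini-Hochberg step-up rule.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def bh_reject(pvals, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up rule at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank  # largest rank whose p-value sits under the BH line
    return set(order[:k])


# Hypothetical screen: 10,000 markers, the first 50 carry a real effect.
rng = random.Random(42)
n_markers, n_true = 10_000, 50
is_true = [i < n_true for i in range(n_markers)]
z_scores = [rng.gauss(5.0 if t else 0.0, 1.0) for t in is_true]
pvals = [two_sided_p(z) for z in z_scores]

naive = {i for i, p in enumerate(pvals) if p < 0.05}  # no correction
bh = bh_reject(pvals, q=0.05)

naive_false = sum(1 for i in naive if not is_true[i])
bh_false = sum(1 for i in bh if not is_true[i])

print(f"uncorrected: {len(naive)} hits, {naive_false} false")
print(f"BH (q=0.05): {len(bh)} hits, {bh_false} false")
```

Without correction, roughly 5% of the 9,950 null markers are expected to pass the 0.05 threshold by chance, swamping the real signals; the step-up rule keeps the reported list close to the truly associated markers.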

Relationships to Other Abstractions

Current abstraction Multiple Comparisons Correction Prime

Parents (1) — more general patterns this builds on

Multiple Comparisons Correction presupposes Type I & Type II Errors Prime

Multiple-comparisons correction presupposes the false-positive/false-negative error framework because it controls aggregate false positives by trading threshold stringency against missed true effects.

Children (3) — more specific cases that build on this

Benjamini–Hochberg Procedure Domain-specific is a kind of Multiple Comparisons Correction

Benjamini-Hochberg is multiple-comparisons correction specialized to a rank-based step-up rule that controls false discovery rate rather than familywise error rate.

Bonferroni Correction Domain-specific is a kind of Multiple Comparisons Correction

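Bonferroni Correction is multiple-comparisons correction specialized to testing each of m hypotheses at α/m, which controls the family-wise error rate without assuming the tests are independent.

Look-Elsewhere Effect Domain-specific is a kind of Multiple Comparisons Correction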

The Look-Elsewhere Effect is multiple-comparisons correction specialized to a scanned or continuous search space, where a local extremum is converted to global significance through an effective trials factor.
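The local-to-global conversion mentioned for the Look-Elsewhere Effect can be sketched with one common approximation: treat the scan as `trials_factor` effectively independent looks, so that the global p-value is 1 - (1 - p_local)^trials_factor. The function name and the independence assumption are illustrative; estimating the trials factor for a real search space is a separate problem not covered here.

```python
import math


def global_p_value(p_local: float, trials_factor: float) -> float:
    """Approximate global p-value for a local p-value found somewhere in a scan,
    assuming trials_factor effectively independent looks."""
    # log1p/expm1 keep precision when p_local is very small.
    return -math.expm1(trials_factor * math.log1p(-p_local))


if __name__ == "__main__":
    p_local = 0.001
    for n in (1, 10, 100, 1000):
        print(f"trials factor {n:>4}: global p = {global_p_value(p_local, n):.4f}")
```

For small local p-values the result is close to p_local × trials_factor, which is the Bonferroni bound, so a locally striking excess can become unremarkable once the size of the search is taken into account.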

Hierarchy paths (6) — routes to 6 parentless roots

Multiple Comparisons Correction → Type I & Type II Errors → Hypothesis Testing (Null vs. Alternative) → Statistical Inference → Inductive Reasoning

Not to Be Confused With

Multiple Comparisons Correction is not Hypothesis Testing (Null vs. Alternative) because Multiple Comparisons Correction is a procedure applied across multiple tests to control a family-level error rate (family-wise error rate or false discovery rate), while Hypothesis Testing is the framework for a single test that controls Type I error at the per-test level.
Multiple Comparisons Correction is not Statistical Power because Multiple Comparisons Correction manages false-positive inflation from multiple testing, while Statistical Power is the probability a test correctly rejects a false null hypothesis given effect size and sample size.
Multiple Comparisons Correction is not Reproducibility & Replicability because Multiple Comparisons Correction addresses inflated false-positive rates within a study, while Reproducibility & Replicability is the independent verification of findings across studies or analyses.
Multiple Comparisons Correction is not Statistical Significance (p-Value) because Multiple Comparisons Correction adjusts significance thresholds or p-values to control error rates across multiple tests, while Statistical Significance evaluates each test's p-value against a threshold.
Multiple Comparisons Correction is not Confirmation Bias because Multiple Comparisons Correction is a statistical procedure controlling for systematic inflation of false positives, while Confirmation Bias is a cognitive phenomenon of selective processing favoring held beliefs.