Reproducibility & Replicability¶

Prime #: 441
Origin domain: Statistics & Experimental Design
Also from: Philosophy
Aliases: Replication, Reproducible Research, Scientific Replication, Computational Reproducibility, Reproducibility
Related primes: Randomization, Hypothesis Testing (Null vs. Alternative), Statistical Significance (p-Value), Statistical Power, Selection Bias, Confounding, Effect Size, Sampling (Representativeness)

Core Idea¶

Reproducibility & Replicability concern whether a finding can be obtained again: by rerunning the same analysis on the same data with the same methods (reproducibility), or by repeating the study on new samples or in new contexts (replicability). Together they test the robustness and generalizability of findings.

How would you explain it like I'm…

Check It Again

If someone bakes a cake and says it tastes amazing, you should not believe them until another person, in another kitchen, bakes it the same way and gets the same yummy cake. Science is like that. One person finding something cool is not enough. Other people have to do the test again and get the same answer before we really trust it.

Other Scientists Checking the Result

Scientists try to find true facts about the world, but a single experiment can give the wrong answer by accident. So they check each other's work in two ways. Reproducibility means: if I take your data and your computer code, do I get the same numbers? Replicability means: if I run a new experiment like yours, do I get the same kind of result? When lots of studies fail this check, scientists call it a replication crisis, and they work on better habits — like sharing data and writing down their plan before they start.

Independent Verification of Findings

Reproducibility and replicability are the two ways science checks itself. Reproducibility is the computational standard: another researcher takes your data and your analysis code and gets your exact numbers. Replicability is the scientific standard: another team collects fresh data using similar methods and finds a similar result. The 2019 National Academies report formalized this distinction because the two words had often been used interchangeably. Both matter because any single study can be misleading, whether through random chance, selective reporting, hidden choices in analysis, or publication bias that favors surprising results. Since around 2011, large projects in psychology, medicine, and economics have shown that a sizable share of published findings don't replicate, sparking reforms like preregistration and open data.

Reproducibility and replicability are the twin standards by which scientific findings earn the status of reliable knowledge through independent verification. Reproducibility, in the contemporary technical sense, refers to the computational standard: another investigator, given the original data and analysis code, should be able to recompute the reported numerical results exactly. Replicability refers to the scientific standard: an independent team, collecting fresh data under similar conditions and applying similar methods, should obtain consistent results. The 2019 National Academies of Sciences report formalized this distinction, which had previously been muddled under the single word "replication." Both rest on a philosophical commitment, rooted in Popper's falsifiability and Merton's norms of universalism and communism, that scientific claims must be checkable by anyone rather than accepted on authority. The construct gained urgency with the "replication crisis," which was foreshadowed by Ioannidis's 2005 argument that most published findings may be false and reinforced by the Open Science Collaboration's 2015 Reproducibility Project (around 36% of replications in psychology yielded statistically significant results), Begley and Ellis's 2012 Nature report (47 of 53 landmark preclinical cancer studies did not replicate), and parallel results in experimental economics. The diagnosed causes (publication bias toward novel significant results, p-hacking, garden-of-forking-paths analytic flexibility, the winner's curse in selected studies, and underpowered designs) have driven a reform wave including preregistration, registered reports, mandatory data-and-code sharing, replication-positive journals, and meta-science infrastructure.
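To make the diagnosed causes concrete, the following is a minimal simulation sketch of how underpowered designs combined with a significance filter can produce inflated published effects (the winner's curse) and a low replication rate. Everything in it is an illustrative assumption rather than a model of any real literature: the true effect of 0.2, the 20 participants per group, the one-sided z > 1.96 "publication" rule, and the function names `run_study` and `simulate` are all hypothetical, and a z statistic is used as a rough stand-in for a proper t-test.

```python
import math
import random
import statistics


def run_study(rng, true_effect, n):
    """Simulate one two-group study; return (mean difference, z statistic)."""
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(treated) / n + statistics.variance(control) / n)
    return diff, diff / se


def simulate(true_effect=0.2, n=20, studies=2000, seed=0):
    """Publish only significant originals, then replicate each one once."""
    rng = random.Random(seed)  # fixed seed so the simulation itself is reproducible
    published_effects = []
    replicated = 0
    for _ in range(studies):
        diff, z = run_study(rng, true_effect, n)
        if z > 1.96:  # assumed publication filter: significant positive results only
            published_effects.append(diff)
            _, z_rep = run_study(rng, true_effect, n)  # same design, fresh data
            replicated += z_rep > 1.96
    if not published_effects:
        return {"published": 0, "mean_published_effect": None, "replication_rate": None}
    return {
        "published": len(published_effects),
        "mean_published_effect": statistics.fmean(published_effects),
        "replication_rate": replicated / len(published_effects),
    }


if __name__ == "__main__":
    print(simulate())
```

Under these assumptions the studies that pass the filter report an average effect well above the true value, and only a minority of same-sized replications reach significance, even though no individual study was fraudulent or miscomputed.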

Broad Use¶

Scientific Experiments: Labs attempt to replicate each other's results, ensuring that observed phenomena aren't flukes or data quirks.
Social Sciences: Replication crises in psychology highlight how initial "significant" effects sometimes fail with new samples or reanalysis, challenging accepted theories.
Medical Research: High stakes demand that drug efficacy be tested multiple times across diverse populations, verifying consistent effectiveness.
Data Science: Analysts must provide code, data, and model definitions so others can reproduce a reported finding or replicate with fresh data.
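For the data science case above, one lightweight way to make a reported finding checkable is to publish a small manifest alongside the code. The sketch below is only an illustration under assumed conventions: the manifest fields and the helper names `build_manifest` and `same_inputs` are hypothetical, not a standard format.

```python
import hashlib
import json
import platform
import sys


def build_manifest(data: bytes, params: dict, seed: int) -> dict:
    """Record what another analyst needs to rerun the same analysis."""
    return {
        "data_sha256": hashlib.sha256(data).hexdigest(),  # fingerprint of the exact dataset
        "params": params,                                  # model or analysis settings
        "seed": seed,                                      # random seed used for the run
        "python_version": platform.python_version(),       # environment information
        "platform": sys.platform,
    }


def same_inputs(a: dict, b: dict) -> bool:
    """True when two runs used the same data, settings, and seed."""
    return all(a[key] == b[key] for key in ("data_sha256", "params", "seed"))


if __name__ == "__main__":
    raw = b"x,y\n1,2\n3,4\n"  # stand-in for the contents of a data file
    manifest = build_manifest(raw, {"model": "ols"}, seed=42)
    print(json.dumps(manifest, indent=2, sort_keys=True))
```

Comparing `data_sha256`, `params`, and `seed` between two runs separates "the numbers differ because the inputs differ" from "the numbers differ although the inputs match", which is the case that signals a reproducibility problem.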

Clarity¶

Distinguishes between reproducibility (same data/code, same result) and replicability (similar experiment or data leading to consistent conclusions)—both guarding against spurious or one-off findings.

Manages Complexity¶

Mandates structured documentation of methods and raw data. By verifying repeatability, researchers or practitioners avoid chasing illusions that vanish outside a unique set of conditions.

Abstract Reasoning¶

Demonstrates that robust knowledge can't hinge on a single demonstration; it must withstand repeated checks across contexts, bridging meta-science, methodology, and practical reliability.

Knowledge Transfer¶

Engineering & Product Testing: Ensure new designs or tests yield consistent performance across labs, factories, or user scenarios.
Machine Learning & AI: Code and hyperparameters must be shared to confirm reported accuracies are reproducible by others, preventing "secret sauce" or irreproducible performance claims.
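The machine learning case can be illustrated with a toy training run whose only sources of randomness flow from an explicit seed. This is a minimal sketch with assumed details: the one-parameter linear model, the synthetic data, and the `train` function with its default hyperparameters are all hypothetical. Real frameworks may need additional settings (for example, deterministic GPU kernels) that are not shown here.

```python
import random


def train(seed: int, learning_rate: float = 0.05, epochs: int = 200) -> float:
    """Fit y = w * x by gradient descent; all randomness comes from `seed`."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
    ys = [3.0 * x + rng.gauss(0.0, 0.1) for x in xs]  # synthetic data, true w = 3
    w = rng.uniform(-1.0, 1.0)  # random initialization
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w


if __name__ == "__main__":
    first = train(seed=0)
    second = train(seed=0)
    print(first == second)       # same seed and hyperparameters: identical weight
    print(first, train(seed=1))  # new seed: a fresh sample and a slightly different weight
```

Sharing the seed and hyperparameters is what lets a second party obtain the identical number (reproducibility). Rerunning with a different seed and getting a weight close to 3 is a small-scale analogue of replicability.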

Example¶

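The "Reproducibility Project: Psychology" (Open Science Collaboration, 2015) re-ran 100 published psychology studies with fresh data; only about 36% of the replications yielded statistically significant results, and most of the rest found weaker or inconsistent effects, underscoring that replicability is a critical safeguard in empirical research.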

Relationships to Other Abstractions¶

Current abstraction: Reproducibility & Replicability (Prime)

Parents (1) — more general patterns this builds on

Reproducibility & Replicability is a kind of Verification (Prime)

Reproducibility and replicability are a specialization of verification in which the conformance check is repeating the study to confirm the finding.

Children (1) — more specific cases that build on this

Inter-Annotator Agreement (Domain-specific) is a decomposition of Reproducibility & Replicability

Inter-Annotator Agreement is the categorical-coding instrument for testing whether a fixed procedure reproduces across independent applications.
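As an illustration of how agreement between two independent applications of a coding procedure can be quantified, here is a minimal sketch of Cohen's kappa, one commonly used chance-corrected agreement statistic. The source does not name a specific statistic, so the choice of kappa, the function name `cohens_kappa`, and the example labels are assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled at random with their own label rates
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
    annotator_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]
    print(cohens_kappa(annotator_1, annotator_2))
```

In this example the annotators agree on 6 of 8 items (0.75 observed) while 0.5 agreement would be expected by chance, so kappa is 0.5. Raw percent agreement would overstate how well the procedure reproduces.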

Hierarchy path (1) — routes to 1 parentless root

Reproducibility & Replicability → Verification → Evaluation → Comparison → Self Checking

Not to Be Confused With¶

Reproducibility & Replicability is not Statistical Inference because reproducibility is the ability to obtain consistent results using the same data and methods, while statistical inference is the reasoning process of drawing population-level conclusions from sample data—reproducibility addresses whether results are stable; statistical inference addresses whether conclusions about populations are justified.
Reproducibility & Replicability is not Hypothesis Testing (Null vs. Alternative) because reproducibility concerns the consistency of results when methods are re-executed, while hypothesis testing is a specific decision procedure using p-values to evaluate null hypotheses—reproducibility is about stability; hypothesis testing is about deciding between competing hypotheses.
Reproducibility & Replicability is not Confounding because reproducibility is the ability to re-obtain consistent results under the same procedural conditions, while confounding is the structural problem in which a third variable, often unmeasured, influences both the presumed cause and the outcome and so distorts causal inference. The two interact: a confounder whose distribution changes between studies can make a finding fail to replicate, yet a study can also reproduce a confounded result reliably, so consistency alone does not establish causation.
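The point that a confounded result can be re-obtained reliably can be shown with a small simulation. This is a sketch under assumed conditions: the data-generating process (a hidden variable `z` that drives both `x` and `y`, with no causal effect of `x` on `y`) and the function name `naive_slope` are hypothetical.

```python
import random
import statistics


def naive_slope(seed: int, n: int = 5000) -> float:
    """Regress y on x while ignoring a confounder z that drives both."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]       # hidden confounder
    x = [zi + rng.gauss(0.0, 1.0) for zi in z]        # x depends on z
    y = [2.0 * zi + rng.gauss(0.0, 1.0) for zi in z]  # y depends on z only, not on x
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x  # ordinary least squares slope of y on x


if __name__ == "__main__":
    print(naive_slope(seed=1) == naive_slope(seed=1))  # rerun on same data: identical
    print(naive_slope(seed=1), naive_slope(seed=2))    # fresh samples: both near 1.0
```

The true causal effect of `x` on `y` in this setup is zero, yet the naive slope is close to 1.0 on every rerun and on every fresh sample. The biased estimate is both reproducible and replicable, which is why passing these checks is necessary but not sufficient for a causal claim.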