Imputation¶
Core Idea¶
Imputation is the move of filling missing values from patterns in the available data, under an explicit assumption about the missingness mechanism, and propagating the resulting uncertainty downstream. The fill is modeled data, never recovered truth, and treating it as the true value is the principal failure mode.
Broad Use¶
- Statistics: multiple imputation, expectation-maximization, k-nearest-neighbor, and chained-equation methods under the MCAR/MAR/MNAR frame.
- Survey research: item and unit non-response filled under explicit assumptions about why people did not answer.
- Historical demography: census gaps and lost parish records imputed from partial records under stated missingness assumptions.
- Climate science: proxy-series gaps filled with declared assumptions about the underlying process.
- Genetics: missing genotypes imputed against reference haplotype panels encoding population structure.
- Archaeology: eroded inscriptions and incomplete remains filled from contextual analogy, offered as declared inference.
Clarity¶
Forces four disclosures that "we filled in the gaps" hides: gap structure, fill model, missingness assumption, and propagated uncertainty. It also exposes implicit imputation: complete-case deletion, for example, silently assumes MCAR.
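The claim that complete-case deletion silently assumes MCAR can be checked with a small simulation. The sketch below is illustrative only: the variables `education` and `income`, the coefficients, and the logistic missingness rule are all made-up assumptions, chosen so that missingness depends on an observed covariate (MAR rather than MCAR).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data: income (in thousands) depends on years of education,
# and education is always observed.
education = rng.normal(14.0, 2.0, size=n)
income = 20.0 + 3.0 * education + rng.normal(0.0, 5.0, size=n)

# MAR mechanism (assumed): the chance that income is missing rises with
# education, which is observed, and does not depend on income itself
# once education is known.
p_missing = 1.0 / (1.0 + np.exp(-(education - 14.0)))
missing = rng.random(n) < p_missing
observed = ~missing

# Known here only because the data are simulated.
true_mean = income.mean()

# Complete-case deletion: drop every row whose income is missing.
complete_case_mean = income[observed].mean()

# Regression fill: fit income ~ education on observed rows, predict the rest.
slope, intercept = np.polyfit(education[observed], income[observed], deg=1)
filled = income.copy()
filled[missing] = intercept + slope * education[missing]
regression_fill_mean = filled.mean()

print(f"true mean:            {true_mean:.2f}")
print(f"complete-case mean:   {complete_case_mean:.2f}")
print(f"regression-fill mean: {regression_fill_mean:.2f}")
```

In this setup the complete-case mean is pulled toward the low-education respondents, while the regression fill uses the declared dependence on `education` and lands close to the true mean. The single fill still carries no uncertainty of its own, so it addresses only the bias, not the propagation.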
Manages Complexity¶
Turns a diffuse "our data has holes" into a finite task: name the gaps, declare a model, propagate uncertainty, and run a sensitivity analysis.
Abstract Reasoning¶
A conclusion that flips under a reasonable alternative missingness assumption was never robust — and the most dangerous case, missing-not-at-random, cannot be detected from observed data alone.
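The three mechanisms can be stated formally, which makes the undetectability of the last one easier to see. The notation below is the conventional one and is an assumption here, not something this entry defines: $Y = (Y_{\text{obs}}, Y_{\text{mis}})$ splits the data into observed and missing parts, $R$ is the missingness indicator, and $\phi$ collects the parameters of the missingness mechanism.

```latex
% Y = (Y_obs, Y_mis): observed and missing parts of the data
% R: missingness indicator; \phi: parameters of the missingness mechanism
\begin{align*}
\text{MCAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \phi) = P(R \mid \phi) \\
\text{MAR:}\quad  & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \phi) = P(R \mid Y_{\text{obs}}, \phi) \\
\text{MNAR:}\quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}, \phi) \text{ still depends on } Y_{\text{mis}}
  \text{ after conditioning on } Y_{\text{obs}}
\end{align*}
```

The analyst sees only $(Y_{\text{obs}}, R)$. Whether $R$ depends on $Y_{\text{mis}}$ is a statement about values that were never recorded, so an MAR model and an MNAR model can fit the observed data equally well. That is why the MNAR case has to be handled by sensitivity analysis rather than by a test.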
Knowledge Transfer¶
- Population genetics: the multiple-imputation framework moved from survey non-response into haplotype-based genotype imputation intact.
- Demographic reconstruction: model-based gap-filling developed for climate proxy series travels into demographic records whose gaps have a known structure.
- Latent-variable modeling: expectation-maximization recurs across latent-class, hidden-Markov, and factor-analysis models.
Example¶
Multiple imputation of a missing income field regresses income on the observed covariates, draws several completed datasets from that model, analyzes each one, and pools the results so that the final intervals widen to reflect the uncertainty of the fill. A sensitivity re-imputation under MNAR then checks whether the conclusion survives.
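The income example can be sketched end to end. This is a minimal illustration under stated assumptions, not a reference implementation: the data are simulated, `age` stands in for the observed covariates, the imputation model is a normal linear regression with parameters drawn from their approximate posterior, pooling follows Rubin's rules, and the MNAR check is a simple delta adjustment in which every imputed income is shifted by a hypothetical `delta`. The helper name `impute_and_pool` is invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 2_000, 20  # sample size and number of imputations (both assumed)

# Hypothetical survey: age is always observed, income is sometimes missing.
age = rng.normal(45.0, 10.0, size=n)
income_full = 10.0 + 1.0 * age + rng.normal(0.0, 8.0, size=n)
missing = rng.random(n) < 1.0 / (1.0 + np.exp(-(age - 45.0) / 10.0))
income = np.where(missing, np.nan, income_full)  # what the analyst sees
obs = ~missing

# Fit the imputation model (income ~ age) on the observed rows.
X_obs = np.column_stack([np.ones(obs.sum()), age[obs]])
X_mis = np.column_stack([np.ones(missing.sum()), age[missing]])
y_obs = income[obs]
beta_hat, *_ = np.linalg.lstsq(X_obs, y_obs, rcond=None)
resid = y_obs - X_obs @ beta_hat
dof = len(y_obs) - X_obs.shape[1]
chol = np.linalg.cholesky(np.linalg.inv(X_obs.T @ X_obs))


def impute_and_pool(delta=0.0):
    """Pool the mean income over M imputations with Rubin's rules.

    delta shifts every imputed value; delta=0.0 is the MAR analysis.
    Returns (pooled estimate, within variance, total variance).
    """
    estimates, variances = [], []
    for _ in range(M):
        # Draw model parameters so the fill reflects model uncertainty.
        sigma2 = resid @ resid / rng.chisquare(dof)
        beta = beta_hat + np.sqrt(sigma2) * (chol @ rng.standard_normal(2))
        # Draw imputed values, including residual noise.
        noise = rng.normal(0.0, np.sqrt(sigma2), size=missing.sum())
        completed = income.copy()
        completed[missing] = X_mis @ beta + noise + delta
        estimates.append(completed.mean())
        variances.append(completed.var(ddof=1) / n)
    q_bar = np.mean(estimates)            # pooled point estimate
    within = np.mean(variances)           # average within-imputation variance
    between = np.var(estimates, ddof=1)   # between-imputation variance
    total = within + (1.0 + 1.0 / M) * between
    return q_bar, within, total


q_mar, within_mar, total_mar = impute_and_pool(delta=0.0)
q_mnar, within_mnar, total_mnar = impute_and_pool(delta=-5.0)

for label, q, t in [("MAR", q_mar, total_mar), ("MNAR, delta=-5", q_mnar, total_mnar)]:
    half = 1.96 * np.sqrt(t)  # normal approximation to the pooled interval
    print(f"{label:>15}: {q:.2f} [{q - half:.2f}, {q + half:.2f}]")
```

The total variance exceeds the within-imputation variance by the between-imputation term, which is the widening the example refers to. Comparing the two printed intervals shows how far the estimate moves if non-respondents earn less than the MAR model predicts. A conclusion that holds in only one of them depends on the missingness assumption.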
Relationships to Other Primes¶
Foundational — no parent edges in the catalog.
Children (1) — more specific cases that build on this
- Missing Data Mechanisms (MCAR, MAR, MNAR) decompose Imputation: the MCAR/MAR/MNAR classification is the load-bearing missingness-assumption component. Imputation depends on that classification to specify its assumption, and through it the fill. The taxonomy is the input; imputation is the whole response (fill plus propagated uncertainty).
Not to Be Confused With¶
- Imputation is not a Distributional Assumption because a distributional assumption is a standing premise about a process's shape, whereas imputation is the whole gap-filling discipline that may invoke such an assumption but adds the model, the fill, and the uncertainty propagation.
- Imputation is not Statistical Inference because inference draws conclusions about a population, whereas imputation is the intermediate step that produces filled data on which inference is then performed.
- Imputation is not Interpolation because interpolation is geometric fill along a known interior path, whereas imputation fills from the structure of observed cases under an explicit missingness model.