Garbage In, Garbage Out

Prime #: 879
Origin domain: Information Theory
Subdomain: data quality and pipeline integrity → Information Theory
Aliases: GIGO

Core Idea

The quality of a transformation's output is bounded above by the quality of its inputs: no internal sophistication can repair defects already present in the input. The load-bearing claim is the non-substitutability of downstream sophistication for input quality — an output-quality problem is structurally an input problem.

How would you explain it like I'm…

Rotten Eggs, Bad Cake

If you bake a cake with rotten eggs, no matter how fancy your oven is, the cake will still taste bad. A great oven can't fix bad ingredients. So if you want a good cake, you have to start with good eggs, not just a fancier oven.

Bad In, Bad Out

Garbage in, garbage out means that if you put bad information into something, you'll get bad results out, no matter how clever the machine in the middle is. If your starting data is full of mistakes or is biased, then better computers and fancier programs can't truly fix it. In fact, they sometimes make it worse by spreading the errors around or hiding them under polished-looking results. So the real way to fix a bad-output problem is to fix the inputs: collect better data, use better sensors, check your sources. Spending more effort on the fancy processing part won't help once the inputs are the thing holding you back.

Inputs Set the Ceiling

Garbage in, garbage out (GIGO) is the observation that the quality of a transformation's output is capped by the quality of its inputs: no amount of internal sophistication can repair defects, errors, or biases already in the input. At best the output is input-quality-conserving; in practice it is often degrading, because the process can amplify input noise, propagate errors through correlated variables, or add its own artefacts. It is never input-quality-improving in a way that survives adversarial inputs. A system that looks like it cleans up bad inputs is either using extra trusted information not in the bad input, or producing polished outputs that aren't actually faithful to the truth. The load-bearing point isn't the trivial 'input quality matters,' but the non-substitutability of downstream sophistication for input quality. That is why the structural fix for a GIGO-caused output problem is always an input intervention (better collection, sensors, source vetting), never more downstream cleverness, and why piling sophistication on bad data can manufacture false confidence.

Garbage in, garbage out (GIGO) is the structural observation that the quality of a transformation's output is bounded above by the quality of its inputs: no amount of internal sophistication can repair defects, errors, biases, or distortions already present in the input. The output is, at best, input-quality-conserving; in practice it is often input-quality-degrading, since the transformation may amplify input noise, propagate input errors through correlated downstream variables, or add its own processing artefacts on top. It is not input-quality-improving in any structural sense that survives adversarial inputs. A system that appears to clean up bad inputs is either using additional trusted information not contained in the bad input, or producing apparently clean outputs that are not actually faithful to ground truth.

The pattern asserts a quality ceiling set by inputs: investing arbitrary effort in downstream sophistication yields diminishing or zero returns once that ceiling is binding, and it can manufacture false confidence, that is, polished-looking results whose underlying input defects are no longer visible to the consumer. The structural fix for a GIGO output problem is therefore always an input intervention (better collection, better sensors, better source vetting, source-quality measurement), never a downstream-sophistication intervention.

The load-bearing claim is not the trivial 'input quality matters' but the non-substitutability of downstream sophistication for input quality, which is what generates the recurring, expensive failure mode the principle warns against. Its formal backbone is the data-processing inequality: for any Markov chain X → Y → Z, I(X;Z) ≤ I(X;Y), so no processing of Y can increase its mutual information with X.

Broad Use

Computing and data engineering: bad inputs produce bad outputs regardless of program correctness.
Machine learning: label noise and biased corpora set the ceiling on accuracy and fairness — "you can't model your way out of bad data."
Statistics and meta-analysis: a synthesis inherits the bias of its primary studies, so evidence frameworks grade the underlying trials.
Accounting and audit: reports are bounded by the integrity of transaction records; major audit failures are GIGO at the data layer.
Intelligence analysis: assessment quality is bounded by source quality, with notorious failures traceable to bad source intelligence.
Policy modelling: sophisticated models on bad input parameters yield high-confidence wrong answers.
Legal adjudication: verdicts are bounded by evidence quality, hence chain-of-custody.
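
The false-confidence pattern behind several of these cases can be illustrated with a minimal sketch. It assumes a hypothetical sensor with a fixed systematic bias (`SENSOR_BIAS`) measuring a hypothetical true value (`TRUE_VALUE`); both numbers are invented for illustration. Collecting more readings shrinks the confidence interval, which looks like rising quality, but the interval tightens around the biased value rather than the truth.

```python
import math
import random
import statistics

def mean_with_interval(readings):
    """Sample mean and a normal-approximation 95% confidence interval."""
    mean = statistics.fmean(readings)
    half_width = 1.96 * statistics.stdev(readings) / math.sqrt(len(readings))
    return mean, (mean - half_width, mean + half_width)

TRUE_VALUE = 10.0   # hypothetical ground truth (assumption)
SENSOR_BIAS = 2.0   # hypothetical systematic input defect (assumption)
rng = random.Random(42)

results = {}
for n in (10, 1_000, 100_000):
    # Every reading carries the same bias plus independent unit-variance noise.
    readings = [TRUE_VALUE + SENSOR_BIAS + rng.gauss(0, 1) for _ in range(n)]
    mean, (low, high) = mean_with_interval(readings)
    results[n] = (mean, low, high)
    print(f"n={n:>6}: mean={mean:.3f}, 95% CI=({low:.3f}, {high:.3f}), "
          f"covers truth: {low <= TRUE_VALUE <= high}")
```

More processing of the same biased readings only sharpens the wrong answer. In this sketch the estimate moves toward the truth only if the bias is removed at the sensor, or if trusted outside information about the bias is added.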

Clarity

Re-orders the diagnosis: when output quality disappoints, ask "what is the quality of the inputs?" before "what is wrong with the processing?". It also names the false-confidence danger of polished outputs that hide input defects.

Manages Complexity

Relocates a confusing class of expensive "model/report/assessment failure" surprises to the input layer, turning them into one checkable question about where the input-set quality ceiling sits.

Abstract Reasoning

Rests on the data-processing inequality: for any Markov chain X → Y → Z, I(X;Z) ≤ I(X;Y), so no processing of Y can increase its information about X. Downstream effort therefore yields zero marginal return once the input ceiling is binding.
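
The inequality can be checked numerically on a toy chain. The sketch below is illustrative only: it assumes a fair binary source X, a noisy input Y obtained by flipping X with probability 0.2, and a processing step Z that depends on Y alone (modelled here as a second flip channel). The helper names `mutual_information`, `flip_channel`, and `chain_information` are hypothetical and written for this example.

```python
import math
from itertools import product

def mutual_information(joint):
    """Mutual information in bits from a joint distribution {(a, b): p}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0
    )

def flip_channel(error_rate):
    """Binary symmetric channel given as P(out | in) in a nested dict."""
    return {
        0: {0: 1 - error_rate, 1: error_rate},
        1: {0: error_rate, 1: 1 - error_rate},
    }

def chain_information(p_x, noisy_input, processing):
    """Return (I(X;Y), I(X;Z)) for the chain X -> Y -> Z."""
    joint_xy, joint_xz = {}, {}
    for x, y, z in product((0, 1), repeat=3):
        # Z depends on X only through Y, which is what makes this a Markov chain.
        p = p_x[x] * noisy_input[x][y] * processing[y][z]
        joint_xy[(x, y)] = joint_xy.get((x, y), 0.0) + p
        joint_xz[(x, z)] = joint_xz.get((x, z), 0.0) + p
    return mutual_information(joint_xy), mutual_information(joint_xz)

p_x = {0: 0.5, 1: 0.5}
i_xy, i_xz = chain_information(p_x, flip_channel(0.2), flip_channel(0.1))
print(f"I(X;Y) = {i_xy:.4f} bits, I(X;Z) = {i_xz:.4f} bits")
```

Setting the processing error rate to 0 makes the two values equal, which is the input-quality-conserving best case. No choice of processing channel makes I(X;Z) exceed I(X;Y).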

Knowledge Transfer

Computing → statistics → ML: the principle became a methodological refrain, and data-centric AI later became its programmatic form.
Across substrates: auditors cite ML failures, ML researchers cite intelligence failures, intelligence analysts cite replication failures — one shared diagnosis.
Anywhere a transformation maps quality-bearing inputs: locate the ceiling the inputs impose, intervene at the input, and recognize that sophistication is not a substitute.

Example

A validated clinical-decision-support model under-flags one patient population. The fix is not a bigger model but a better input: the target proxy "future spending" diverges from "future need" along access lines, so only redefining the target raises the ceiling.
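
A synthetic sketch of this proxy problem follows. All numbers are invented for illustration: two groups with identical need distributions, and an assumed access factor of 0.5 that halves spending for the low-access group. The "model" is idealised as a perfect predictor of its target, so the only thing that changes between the two runs is the target definition.

```python
import random

def flag_top(patients, score_key, fraction=0.2):
    """Flag the top `fraction` of patients ranked by the given score."""
    ranked = sorted(patients, key=lambda p: p[score_key], reverse=True)
    return ranked[: int(len(ranked) * fraction)]

def share_of_group(flagged, group):
    """Fraction of flagged patients who belong to `group`."""
    return sum(p["group"] == group for p in flagged) / len(flagged)

rng = random.Random(0)
patients = []
for i in range(2000):
    group = "low_access" if i % 2 else "high_access"
    need = rng.uniform(0, 10)                       # same need distribution in both groups
    access = 0.5 if group == "low_access" else 1.0  # assumed access factor (hypothetical)
    patients.append({"group": group, "need": need, "spending": need * access})

# A perfect predictor of each target simply reproduces that target's ranking.
by_spending = share_of_group(flag_top(patients, "spending"), "low_access")
by_need = share_of_group(flag_top(patients, "need"), "low_access")
print(f"low-access share of flags, spending target: {by_spending:.2f}")
print(f"low-access share of flags, need target:     {by_need:.2f}")
```

Even with zero prediction error, ranking on the spending proxy under-flags the low-access group in this sketch, while ranking on need flags both groups at similar rates. Improving the predictor cannot close that gap, but changing the target can.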

Relationships to Other Primes

Parents (1) — more general patterns this builds on

Garbage In, Garbage Out presupposes Transformation: GIGO is a quality-monotonicity constraint on a transformation (by the data-processing inequality, fidelity to ground truth cannot rise across a single-input map), so it presupposes the transformation whose output quality it bounds. Transformation is its genus; GIGO is 'a constraint on one dimension of it'.

Path to root: Garbage In, Garbage Out → Transformation

Not to Be Confused With

Garbage In, Garbage Out is not Transformation in general because a transformation is any input-to-output mapping with no claim about quality direction, whereas GIGO is the specific quality-monotonicity constraint that fidelity cannot rise across a single-input map.
Garbage In, Garbage Out is not the negation of Refinement because refinement improves an artifact against a goal (which processing genuinely can do), whereas GIGO bounds the artifact's fidelity to ground truth (which a function of the defective input cannot raise).
Garbage In, Garbage Out is not a Robustness deficit because robustness is graceful degradation under perturbed inputs, whereas GIGO is the orthogonal claim that no processing recovers fidelity the input never carried.