A calibration anomaly is a quantitative gap in which a model's independently constrained prediction diverges from observation by a factor too large for noise and too persistent across methods for measurement error — so the divergence stands as evidence about which assumption is wrong, with the size of the gap itself the diagnostic content.
No faithful explanation exists at this level. At a 5-year-old level the concept collapses into either 'the model is just wrong' (binary disconfirmation) or 'someone measured badly' (one-sided measurement error), the two things the prime explicitly defines itself against. Its load-bearing content (a surviving quantitative gap whose SIZE carries diagnostic information, given an independent risky prediction and characterized measurement uncertainty) cannot be honestly conveyed without notions a young child does not have. Two of three generators marked this level n/a.
How Wrong Tells You Why
A calibration anomaly happens when a theory makes a numerical prediction, you carefully measure the real number, and the two disagree by way too much to be a coincidence or a sloppy measurement. The interesting part is not just that they disagree, but how big the gap is. A small gap (like being off by a fifth) just means you should tweak a setting. A big gap (off by ten times) means something important is missing from your theory. A truly enormous gap means the whole idea was probably about the wrong thing. So the size of the mismatch is itself a clue telling you what kind of thing went wrong, and you can't make it disappear just by collecting more data.
The Gap Is The Clue
A calibration anomaly is the pattern where a model with independently set parameters predicts a quantity, observation measures that same quantity, and the two diverge by a factor too large to blame on noise or measurement error — so the gap stands as quantitative evidence about which of the model's assumptions must be wrong. Several things must hold: the model's prediction is 'risky' because its inputs were not fitted to the observation in question; the observation has a characterized uncertainty; the predicted-versus-observed ratio is far enough from one to exclude statistical noise; and the divergence is too persistent across measurement methods to be one-sided measurement error. Unlike outright disconfirmation (which is yes-or-no) and unlike noise (which stays within the model's uncertainty), a calibration anomaly survives normal attempts to dismiss it. The crucial idea is that the gap is information: a 20% miss invites a small tweak, a factor-of-three miss invites a search for missing physics, and a truly enormous miss forces rethinking the whole category.
A calibration anomaly is the structural pattern in which a theoretical model with independently constrained parameters predicts a quantity, observation measures it, the two diverge by a factor large enough to rule out noise and measurement error, and the divergence stands as a quantitative gap that constrains which of the model's assumptions, inputs, model class, or boundary conditions must be wrong. Its commitments are: the model produces a quantitative prediction whose inputs are not fit to the observation in question, so the prediction is risky in the Popperian sense; observation independently measures the same quantity with characterized uncertainty; the predicted-versus-observed ratio is far enough from one to exclude statistical noise; the divergence is too persistent across measurement methods to be one-sided measurement error; and the size of the gap is itself the diagnostic content: small mismatches invite parameter tuning, large mismatches force model revision, and orders-of-magnitude mismatches force category revision. It is structurally different from outright disconfirmation, which is binary, and from statistical noise, which sits within model uncertainty. The calibration anomaly survives normal-science attempts to dismiss it: it cannot be tuned away within plausible parameter ranges, cannot be explained away as measurement error, and does not vanish with more data. What it can do is force re-examination of which load-bearing assumption is wrong, with the scale of the gap restricting the space of plausible candidates. The prime forces into view that the gap is information: a twenty-percent miss invites a fitting tweak; a factor-of-three miss invites a missing-physics search; a factor-of-ten-to-the-sixtieth miss invites a wholesale reframing of what the theory was about.
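The commitments above can be sketched as a small grading routine. This is a minimal illustration under stated assumptions, not a standard procedure: the names Measurement and grade_gap, the five-sigma threshold z_min, and the cutoffs separating tuning, model revision, and category revision are all hypothetical choices made for the example. It also assumes positive quantities and ignores the model's own prediction uncertainty.

```python
import math
from dataclasses import dataclass


@dataclass
class Measurement:
    method: str    # label for the measurement technique
    value: float   # observed value (assumed positive)
    sigma: float   # characterized one-sigma uncertainty


def grade_gap(predicted, measurements, fitted_to_observation=False, z_min=5.0):
    """Return a coarse label for a predicted-versus-observed gap."""
    # The prediction must be risky: inputs not fit to this observation.
    if fitted_to_observation:
        return "not risky"
    # Every method must show a gap well outside its own noise.
    signs = set()
    for m in measurements:
        z = (predicted - m.value) / m.sigma
        if abs(z) < z_min:
            return "within noise"
        signs.add(z > 0)
    # Persistence: at least two methods, all missing in the same direction.
    if len(measurements) < 2 or len(signs) != 1:
        return "not persistent across methods"
    # The size of the gap, in orders of magnitude, selects the kind of revision.
    mean_obs = sum(m.value for m in measurements) / len(measurements)
    orders = abs(math.log10(predicted / mean_obs))
    if orders < math.log10(1.5):   # hypothetical cutoff: a 20% miss lands here
        return "parameter tuning"
    if orders < 2.0:               # hypothetical cutoff: a factor-of-three miss lands here
        return "model revision"
    return "category revision"
```

For instance, with two hypothetical methods that both measure 1.0 (uncertainties 0.01 and 0.02), a prediction of 3.0 is graded "model revision", while a prediction of 1.02 is graded "within noise".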
Finance: the equity premium puzzle — a consumption model predicts a tiny risk premium while observation gives a large one, a factor-of-sixty gap.
Cosmology: the vacuum-energy problem, in which a quantum-field estimate exceeds the observed value by ~120 orders of magnitude, often described as the largest mismatch in physics.
Particle physics: the muon's anomalous magnetic moment disagrees with the Standard Model, small enough that recalculation might close it but persistent enough to motivate new physics.
Astrophysics: galaxy rotation curves diverge from Newtonian prediction by a factor of several, historically motivating dark matter.
Climate science: paleoclimate reconstructions diverge from model hindcasts in specific epochs, driving revision of sensitivity estimates.
Machine learning: extrapolated scaling laws miss the measured loss at new compute levels, flagging a model-class problem.
It converts a vague sense that "the prediction was off" into a graded diagnostic where a five-percent miss, a five-sigma miss, and a factor-of-10^60 miss are structurally different signals calling for different kinds of theoretical work.
It compresses a discipline-by-discipline catalog of "puzzles," "tensions," and "problems" into one structural object with a shared four-step protocol: establish independence, characterize the observation, confirm the gap exceeds noise and measurement error, and read the revision space off the gap size.
It supports using gap-size as a sieve over candidate explanations — a 120-order gap immediately rules out any mechanism contributing only a factor of ten — and treats cross-method persistence as evidence pointing suspicion at the theory rather than the instrument.
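Using gap size as a sieve can be illustrated with a short sketch. The function name sieve, the candidate labels, and the numbers attached to them are hypothetical and chosen only for illustration. Each candidate is tagged with the largest contribution it could plausibly make, expressed in orders of magnitude, and is kept only if that contribution can reach the gap.

```python
def sieve(gap_orders, candidates):
    """Keep candidates whose maximum plausible contribution can span the gap.

    gap_orders: size of the gap in orders of magnitude (log10 of the ratio).
    candidates: dict mapping a mechanism name to the largest contribution it
                could make, also in orders of magnitude (hypothetical values).
    """
    return [name for name, max_orders in candidates.items()
            if max_orders >= gap_orders]


# Hypothetical candidate mechanisms, for illustration only.
candidates = {
    "parameter retune": 0.1,            # roughly a 25% adjustment
    "factor-of-ten mechanism": 1.0,     # one order of magnitude
    "near-exact cancellation": 120.0,   # can span the full gap
}

survivors = sieve(120.0, candidates)
```

Here survivors contains only "near-exact cancellation": the 120-order gap rules out the factor-of-ten mechanism without any further analysis. Working in orders of magnitude rather than raw ratios keeps the comparison simple and avoids floating-point overflow for very large gaps.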
Across quantitative sciences: a financial economist, a cosmologist, and a climate modeler carry the same diagnostic discipline, supplying only the local model and measurement.
As a protective discipline: the restraint to keep the gap visible as information rather than tuning it away resists the anomaly-cleanup and p-hacking failure modes in every field.
The vacuum-energy problem: summing quantum-field zero-point energies predicts a vacuum energy density ~10^120 times the measured value, and the magnitude itself sieves the candidates — nothing but an almost-exact cancellation or a selection argument can bridge that many orders of magnitude.
Calibration Anomaly is not Calibration because a calibration anomaly is the gap that resists legitimate alignment and constrains which assumption is wrong, whereas calibration freely tunes parameters to close gaps.
Calibration Anomaly is not Measurement Uncertainty because an anomaly exceeds the stated error band and survives improved measurement, whereas a discrepancy within the band is noise.
Calibration Anomaly is not Correlation because it is a magnitude gap on a single quantity whose size is the diagnostic content, not a claim about association between variables.