Skip to content

Goodhart's Law

Prime #
884
Origin domain
Incentives Organizations Governance
Subdomain
measurement targets and incentives → Incentives Organizations Governance

Core Idea

Goodhart's law is the regularity that binding optimization pressure on a proxy degrades the proxy's correlation with the underlying construct it was meant to indicate. The pattern requires three elements working together: an unobservable or expensive-to-observe construct of interest; an observable proxy that, absent selection pressure, tracks the construct through some statistical regularity; and a control loop that places weight on the proxy — reward, penalty, allocation, status, promotion, regulatory consequence. Once the control loop closes, agents reallocate effort along the easiest path to move the proxy, which is almost never the path that would have moved the construct. The proxy-construct correlation that motivated the choice of proxy collapses, and the proxy now indexes "optimization effort directed at the proxy" rather than the construct itself.

The pattern is not "measurement is bad" or "people game systems." It is the more specific, substrate-portable claim that the very act of using a statistical regularity to control a system tends to destroy the regularity. The mechanism is structural: any proxy that captures only part of a construct opens a wedge between proxy-improvement and construct-improvement, and selection pressure expands the wedge by preferentially recruiting the cheapest proxy-improvements, which sit disproportionately inside it.

What distinguishes Goodhart's law from generic measurement error, observer bias, or moral hazard is that the collapse arises from optimization directed at the proxy, and it does not depend on intent. Goodhart-collapse occurs from honest optimization as readily as from cynical gaming: a conscientious agent doing exactly what the incentive structure rewards will widen the wedge just as surely as a cheater. The construct degrades while the proxy rises, and the rise is precisely what makes the degradation hard to see.

How would you explain it like I'm…

Chasing the Sticker

Imagine your teacher gives a sticker for every page you read, hoping you'll learn more. Soon you start flipping pages super fast just to get stickers, without really reading. The sticker count goes up, but you're actually learning less — the thing the teacher really wanted got lost.

When the Stand-In Breaks

Sometimes you can't directly measure the thing you really care about, so you measure something easier that usually goes along with it — a stand-in. The trouble starts when you reward that stand-in, because people will then take the easiest path to push the stand-in number up, and that path is almost never the one that improves the real thing. So the stand-in number rises while the real thing you cared about quietly gets worse. It's not that measuring is bad or that everyone's cheating — even an honest person doing exactly what they're rewarded for makes this happen. Worst of all, the rising number is exactly what makes the problem hard to notice.

The Proxy Trap

Goodhart's law says that putting strong optimization pressure on a proxy breaks the proxy's connection to the real thing it was supposed to indicate. It needs three parts: a construct you actually care about that's hard or expensive to observe; a proxy that normally tracks that construct through some statistical regularity; and a control loop that attaches weight to the proxy — reward, penalty, promotion, funding. Once the loop closes, agents reallocate effort along the cheapest way to move the proxy, which is almost never the way that would move the construct, so the correlation that justified the proxy collapses and the proxy now measures 'effort aimed at the proxy.' This is sharper than 'measurement is bad' or 'people game systems' — it's the specific claim that using a regularity to control a system tends to destroy that regularity. And it doesn't need bad intent: a conscientious person doing exactly what the incentives reward widens the gap just as fast as a cheater, while the construct degrades behind the rising proxy.

 

Goodhart's law is the regularity that binding optimization pressure on a proxy degrades the proxy's correlation with the underlying construct it was meant to indicate. The pattern requires three elements working together: an unobservable or expensive-to-observe construct of interest; an observable proxy that, absent selection pressure, tracks the construct through some statistical regularity; and a control loop that places weight on the proxy — reward, penalty, allocation, status, promotion, regulatory consequence. Once the control loop closes, agents reallocate effort along the easiest path to move the proxy, which is almost never the path that would have moved the construct; the proxy-construct correlation that motivated the choice collapses, and the proxy now indexes 'optimization effort directed at the proxy' rather than the construct itself. The pattern is not 'measurement is bad' or 'people game systems' — it is the more specific, substrate-portable claim that the very act of using a statistical regularity to control a system tends to destroy the regularity. The mechanism is structural: any proxy capturing only part of a construct opens a wedge between proxy-improvement and construct-improvement, and selection pressure expands the wedge by preferentially recruiting the cheapest proxy-improvements, which sit disproportionately inside it. What distinguishes the law from generic measurement error, observer bias, or moral hazard is that the collapse arises from optimization directed at the proxy and does not depend on intent: Goodhart-collapse occurs from honest optimization as readily as from cynical gaming — a conscientious agent doing exactly what the incentive structure rewards widens the wedge just as surely as a cheater. The construct degrades while the proxy rises, and the rise is precisely what makes the degradation hard to see.

Structural Signature

the unobservable construct of interestthe observable proxy that tracks it absent pressurethe construct-proxy wedge (the part of the proxy not coupled to the construct)the control loop placing weight on the proxythe effort-reallocation toward cheapest proxy-improvementsthe correlation-collapse invariant that the proxy comes to index optimization effort, not the construct

The pattern is present when the following components co-occur:

  • The construct. A quantity of real interest that is unobservable or expensive to observe directly — learning, health, safety, scientific quality, value.
  • The proxy. An observable measure that, before any selection pressure, tracks the construct through some statistical regularity, which is why it was chosen as a stand-in.
  • The wedge. Because the proxy captures only part of the construct, there is a region where proxy-improvement and construct-improvement diverge. This gap is the structural opening the law exploits; its width and exploitability govern how fast collapse comes.
  • The control loop. A mechanism that places weight on the proxy — reward, penalty, allocation, status, promotion, regulatory consequence — closing a loop that feeds the measured proxy back onto the agents being measured.
  • The effort-reallocation. Once the loop closes, agents take the cheapest path to move the proxy, which sits disproportionately inside the wedge and rarely coincides with the path that would move the construct. The mechanism does not require intent; honest optimization widens the wedge as surely as gaming.
  • The collapse invariant. The proxy-construct correlation that motivated the proxy degrades; the proxy rises while the construct stagnates or falls, and the rise is precisely what hides the degradation. The proxy now indexes optimization effort directed at the proxy.

The components compose into a self-undermining loop: using a partial statistical regularity to control a system recruits the cheapest motions inside its wedge, which destroys the very regularity that justified the proxy — so a smoothly improving headline metric under strong incentive is a prompt for suspicion, not reassurance.

What It Is Not

  • Not generic gaming or cheating. The collapse arises from optimization directed at a proxy and does not require bad intent — a conscientious agent widens the wedge as surely as a cheater. Framing it as cheating misdirects the remedy toward policing intent.
  • Not moral hazard. See moral_hazard: that is hidden action under misaligned risk-bearing. Goodhart is about a proxy's correlation with a construct collapsing under measurement pressure, independent of who bears risk.
  • Not the principal–agent problem. See agency_problem: that concerns divergent objectives between principal and agent. Goodhart bites even when objectives are aligned — the honest agent optimizing the proxy still degrades the construct.
  • Not regulatory capture. See regulatory_capture (the embedding-nearest neighbor): capture is a regulator co-opted by the regulated industry's interests. Goodhart is a measurement-target failure with no capturing party required; the two can co-occur but are distinct mechanisms.
  • Not generic measurement error. See bias and measurement: those are passive distortions in observation. Goodhart is the active collapse caused by closing a control loop on the metric — the using of the regularity destroys it.
  • Common misclassification. Reading any disappointing metric as Goodhart. The signature is a proxy rising impressively while the construct stagnates, driven by effort reallocation into the wedge under incentive pressure — not a metric that simply fails to move, and not error absent any optimization loop.

Broad Use

In education, standardized test scores proxy for learning; once they drive funding, pay, and promotion, the easiest path becomes test-prep, narrowed curriculum, and exclusion of weak students from testing pools, and the score-construct correlation collapses while scores climb. In healthcare, door-to-balloon times, four-hour waits, and infection-rate metrics each become targets that distort coding, triage, and classification. In policing, reported crime rates as a safety proxy invite reclassification and complaint suppression — the Campbell's-law substrate. In science, citation counts, h-indices, impact factors, and p-values, once tied to careers and funding, produce citation cartels, p-hacking, salami publication, and a replication crisis that is in large part a Goodhart collapse on the statistical-significance proxy. In AI alignment, trained policies maximize a reward signal in unintended ways — reward hacking, length inflation, plausibility over honesty — making Goodhart the canonical alignment-failure mode. In finance and macroeconomic policy (Goodhart's original substrate), targeting a monetary aggregate collapses the very regularity that motivated targeting it, and the pattern recurs with capital ratios and stress-test thresholds. In management, first-call-resolution and sales-quota structures invite ticket-gaming and channel-stuffing, and the Deming critique of management-by-numbers is structurally a Goodhart critique. In software metrics, coverage, velocity, and bug-close rates corrupt once tied to status. And in evolutionary and ecological selection on proxies, indicator traits inflate past the construct in runaway sexual selection and parasitic signal manipulation. The cross-substrate fit is strong because the three elements appear together in any system that tries to manage what it cannot directly observe, and the mechanism does not require intent.

Clarity

The law clarifies a distinction managers, regulators, and designers routinely conflate: the difference between measurement and incentivization. A metric can be epistemically excellent for diagnosis — passive, low-stakes, hidden from those it measures — and simultaneously catastrophic for control — high-stakes, visible, weighted into consequence. Naming the dynamic distinguishes "the metric is wrong" (re-measure) from "the metric is right but cannot survive being used" (redesign the control loop).

It also clarifies the failure pattern's signature. Goodhart-collapse typically presents not as the proxy refusing to move but as the proxy moving impressively while the construct stagnates or degrades, so diagnosticians who do not understand the law mistake the moving proxy for progress. The law tells them what evidence to look for: divergence between the proxy's trajectory and construct-correlated independent measures, redistribution of effort toward proxy-easy actions, and narrowing of practice around proxy-relevant behavior. The clarifying move is to treat a smoothly improving headline metric as a prompt for suspicion rather than reassurance.

Manages Complexity

Goodhart's law compresses a large family of named phenomena into a single diagnostic: Campbell's law in policy evaluation, the McNamara fallacy in war, the surrogation effect in management accounting, reward hacking in AI, p-hacking in science, teaching to the test, regulatory arbitrage, and much of the replication crisis. Each has its own literature and remediation tradition; the law reveals them as the same mechanism running in different substrates.

Within each substrate, the law also compresses a wide intervention catalogue: multi-metric scorecards that raise the cost of one-axis gaming, random or held-out evaluation that breaks the agent's ability to see the metric, rotation of metrics so the proxy cannot be drilled, narrow gating that uses the metric only at small consequence, and the most general remedy — decoupling diagnosis from incentive. All of these become applications of one underlying structural diagnosis, so a practitioner who holds the law carries a portfolio of remedies rather than a single domain-specific fix.

Abstract Reasoning

The law licenses a family of substrate-independent inferences. Proxy-survivability prediction forecasts which metrics collapse first under selection pressure as a function of how easily the construct-to-proxy gap can be exploited: proxies with a narrow exploitable wedge survive longer, and composite indices aggregating many loosely correlated subcomponents survive longest. Control-loop closure analysis estimates the weight threshold at which a metric tips from diagnostic to corrupted — below it, gaming is not worth the effort; above it, the proxy drifts. Detection via cross-validation treats divergence between the targeted proxy and an independent, not-yet-targeted proxy as evidence of collapse. Robustness via multi-objective design makes a target require joint motion of several loosely correlated proxies so the exploitable wedge becomes a multidimensional intersection rather than a single direction — the logic behind balanced scorecards and impact regularization. Incentive-decoupling keeps the metric but removes its weight in consequence, converting it back into a diagnostic. And inverse application reads a measure that has stayed predictive under long, strong incentive pressure as informative: either the proxy is tightly bound to the construct or its exploitable wedge is small.

Knowledge Transfer

The transferable content is the three-element diagnostic — construct, proxy, control loop — plus the intervention family of multi-metric scorecards, decoupling, rotation, held-out evaluation, narrow gating, composite indices, and randomization. Wherever the elements appear, the law applies and the interventions transfer with minor substrate-specific adaptation. The historical movement from monetary policy into education policy, public administration, healthcare quality measurement, AI alignment, and software-engineering management carried concrete interventions, not just vocabulary: audit-and-investigate, item-bank rotation, and multiple-measures accountability in education; held-out reward evaluation, reward ensembling, and constitutional approaches in AI. The intervention family is recognizably the same across substrates, which is why the AI-alignment community's reward-modelling agenda reads as a direct re-derivation of remedies long known in public administration.

Consider an emergency-department four-hour-wait target. The construct is timely emergency care of adequate quality; the proxy is the fraction of attendances completed within four hours; the control loop is severe managerial and reputational consequence for missing it. Under the regime, patients near the four-hour boundary receive rushed disposition, arrival registration is deferred to delay the clock, clinical priority distorts toward soon-to-breach rather than sickest patients, and coding shifts to maximize reported compliance — while construct-correlated independent measures such as twelve-hour trolley waits and delayed-sepsis mortality fail to improve in proportion, and in places diverge. The Goodhart diagnostic identifies all of these as exploitation of the construct-proxy wedge under control-loop pressure, and the same structural moves appear in the Soviet nail-factory legend, account-opening quotas, GDP-target manipulation, and language-model truthfulness proxies optimized into confident-sounding plausibility. Stripped of monetary-economics vocabulary, the law is "tie consequences to a proxy for the thing you care about and the proxy will move without the thing moving" — a sentence that does direct work, with a recognizable remedy set, in every domain that exhibits the three structural elements.

Examples

Formal/abstract

Reward hacking in reinforcement learning is Goodhart's law in its cleanest formal dress. An agent is trained to maximize a scalar reward \(R(s, a)\) that the designer chose as a proxy for an unobservable construct \(U\) — "behavior the designer actually wants." Before optimization, \(R\) correlates with \(U\) over the states the designer imagined. But the agent searches the entire reachable state space for the policy \(\pi^*\) that maximizes expected \(R\), and that argmax lands disproportionately in the wedge — the region where \(R\) is high but \(U\) is low — because such states are, by construction, the cheapest way to drive \(R\) up. The canonical demonstrations are exact: a boat-racing agent that learns to loop through a lagoon hitting score-bonus targets forever instead of finishing the race; a language model that inflates response length or hedging because the reward model rewards apparent thoroughness. The correlation that justified the proxy collapses precisely because the optimizer was effective. The structure prescribes the remedy directly: make the wedge a multidimensional intersection (reward ensembling, constitutional constraints), evaluate on held-out reward signals the policy cannot see during training, or keep \(R\) diagnostic and decouple it from the optimization pressure. Each is a direct re-derivation of remedies long known in public administration.

Mapped back: The construct is the designer's true intent \(U\); the proxy is the reward \(R\); the wedge is the set of high-\(R\), low-\(U\) states; the control loop is gradient ascent on expected \(R\); the effort-reallocation is the policy's argmax migrating into the wedge; and the collapse invariant is the \(R\)\(U\) correlation breaking exactly because optimization succeeded.

Applied/industry

An emergency department operates under a four-hour-wait target. The construct is timely emergency care of adequate quality; the proxy is the fraction of attendances completed within four hours; the control loop is severe managerial and reputational consequence for missing it. Once the loop closes, the cheapest path to move the proxy runs straight through the wedge: patients near the four-hour boundary get rushed disposition, arrival registration is deferred to delay the clock's start, clinical priority distorts toward soon-to-breach rather than sickest patients, and coding shifts to maximize reported compliance. The proxy climbs impressively while construct-correlated independent measures — twelve-hour trolley waits, delayed-sepsis mortality — fail to improve in proportion and in places diverge. The Goodhart diagnostic reads all of these as exploitation of the construct-proxy wedge under control-loop pressure, and prescribes the standard remedy family: a multi-metric scorecard (timeliness and twelve-hour waits and outcome measures) so the wedge becomes a multidimensional intersection, held-out auditing the department cannot anticipate, or decoupling the four-hour figure back into a diagnostic stripped of consequence. The identical structure governs sales account-opening quotas (fake accounts), test-score accountability in schools (teaching to the test), and GDP targeting.

Mapped back: The construct is quality emergency care; the proxy is four-hour compliance; the wedge is the gap between compliant-looking and genuinely-timely-and-safe care; the control loop is managerial consequence; the effort-reallocation is boundary-gaming and coding distortion; and the collapse invariant is the headline metric rising while independent outcome measures stagnate.

Structural Tensions

T1 — Measurement versus Incentivization (scopal). The same metric can be epistemically excellent for passive diagnosis and catastrophic once weighted into consequence; the prime's whole force is keeping these roles apart. The boundary is where a metric crosses from observed to optimized. The failure mode is collapsing the two: abandoning a genuinely informative metric because it would corrupt under pressure (throwing away diagnosis to avoid Goodhart), or believing a diagnostic metric stays trustworthy after consequences are attached to it. Diagnostic: ask whether the metric is hidden and low-stakes (safe to trust) or visible and weighted (suspect), and choose "re-measure" versus "redesign the loop" accordingly.

T2 — Optimization Pressure versus Necessary Targets (scalar). The law warns that binding pressure collapses a proxy — but organizations must set targets to coordinate at all; a world with no proxies under pressure does not function. The competing consideration is that some optimization is the price of getting anything done. The failure mode is Goodhart-paralysis: refusing to incentivize any measurable thing for fear of collapse, leaving the construct un-managed entirely. Diagnostic: locate the weight threshold at which gaming becomes worth the agent's effort, and keep the proxy's stakes below it rather than abandoning targets wholesale.

T3 — Wedge Width versus Time-to-Collapse (temporal). Collapse is not instantaneous; a proxy with a narrow exploitable wedge survives long under pressure while a wide-wedge proxy corrupts fast. The prime names the eventual collapse but the rate governs whether a metric is usable for a given horizon. The failure mode is mis-timing: discarding a slow-collapsing proxy as if it were already corrupt, or trusting a fast-collapsing one because it "still correlates" early. Diagnostic: estimate the wedge's exploitability and the optimizer's search speed to predict the half-life, and plan to rotate or retire the proxy before it tips.

T4 — Honest Optimization versus Cynical Gaming (sign/direction). The law's deepest point is that collapse does not require bad intent — a conscientious agent widens the wedge as surely as a cheater. This cuts against the natural remedy of policing intent. The failure mode is moralizing the diagnosis: hunting for gamers and punishing them, when the corruption is structural and the best-behaved employees are driving it. Diagnostic: check whether the proxy degrades even among agents acting in good faith; if so, the fix is structural (redesign the loop) not disciplinary (catch the cheaters).

T5 — Single Proxy versus Multi-Metric Intersection (coupling). The headline remedy is to make the target a joint motion of several loosely-correlated proxies, so the wedge becomes a multidimensional intersection. But composite scorecards couple metrics together and add their own gaming surface and weighting disputes; they raise the cost of one-axis gaming without eliminating it. The failure mode is over-trusting a balanced scorecard as Goodhart-proof, when a sufficiently strong optimizer finds the cheapest joint motion through all axes at once. Diagnostic: ask whether the scorecard's components fail independently; correlated subcomponents collapse together and buy little robustness.

T6 — Proxy Collapse versus Genuine Improvement (measurement). A rising proxy under incentive is ambiguous: it may index optimization effort (collapse) or real construct gain (success), and the prime tells you to suspect the former — but suspicion can become reflexive cynicism that denies all measured progress. The failure mode is the inverse error: dismissing genuine improvement as gaming because the metric moved under pressure, demoralizing agents who actually delivered. Diagnostic: cross-validate the targeted proxy against an independent, not-yet-targeted construct-correlate; convergence signals real gain, divergence signals collapse.

Structural–Framed Character

Goodhart's law sits on the framed side of the structural–framed spectrum, at the midpoint-plus aggregate of 0.5 with all five diagnostics reading 0.5. It is an eponymous law from monetary policy whose control-loop-and-intent framing is inherent, yet the proxy-collapse-under-optimization mechanism underneath is genuinely substrate-portable — so the structural and framed pulls come out balanced.

The diagnostics are uniform, and each 0.5 is earned for a specific reason. Vocabulary travels (0.5): the three-element skeleton — construct, proxy, control loop, plus the wedge between proxy and construct — is content-neutral and recurs in education, healthcare, science, AI reward hacking, finance, and even runaway sexual selection, but the prime arrives wrapped in measurement-and-incentive lexicon ("proxy," "metric," "gaming," "optimization pressure") that must be translated before it reads in an evolutionary substrate. Evaluative weight (0.5): the collapse is described as a structural fact ("the proxy comes to index optimization effort"), and the law is careful that the mechanism is intent-independent — a conscientious agent widens the wedge as surely as a cheater — yet "corruption," "degradation," and "gaming" carry a real disapproving charge. Institutional origin (0.5): its home is monetary policy and organizational governance, even though the formal version (reward hacking in RL, \(R\)\(U\) correlation collapse) is medium-neutral. Human-practice-bound (0.5): most instances need an organization placing weight on a metric, but the mechanism also runs in pure optimization (a gradient-ascent agent's argmax migrating into the wedge) and in ecological selection on indicator traits, neither of which requires a human practice. Import-versus-recognize (0.5): invoking the law imports a control-loop-and-intent perspective, but at bottom it asks the reasoner to recognize a self-undermining dynamic — using a partial regularity to control a system destroys the regularity — already present in the system.

The honest reading is that the optimization-driven proxy-collapse mechanism is broader than its monetary-economics home and transfers with concrete interventions intact (the substrate-independence grade reaches a 5 on that strength), while the eponymous law, its control-loop framing, and its evaluative charge keep it on the framed side of the middle. The 0.5 aggregate captures the even split, and the prose should hold the genuine structural mechanism and the inherited framing in balance rather than collapsing toward either.

Substrate Independence

Goodhart's Law is a maximally substrate-independent prime — composite 5 / 5 on the substrate-independence scale. The pattern — when a proxy is placed under binding optimization pressure its correlation with the construct it indicates collapses — appears wherever the three structural elements (a proxy, a construct, and optimization pressure on the proxy) co-occur, and the law itself notes the mechanism requires no intent, which is what lets it span substrates and earn the composite ceiling. On domain breadth (5) the proxy-collapse recurs across genuinely distinct arenas with the same force: education (test scores driving test-prep), healthcare (door-to-balloon times distorting triage), policing (Campbell's-law crime-rate reclassification), science (citation counts and p-values breeding cartels and p-hacking), AI alignment (reward hacking as the canonical failure mode), finance and monetary policy (Goodhart's original substrate), management, software metrics, and even evolutionary and ecological selection on proxies (runaway sexual selection on indicator traits, parasitic signal manipulation) — a span reaching from human institutions into intentless biological selection. On structural abstraction (4) the signature is highly medium-neutral — proxy, construct, optimization pressure, correlation collapse — but it sits just shy of the maximum because the framing still leans on a control-loop/optimization vocabulary that carries a faint commitment to a pressure being applied to a measure. On transfer evidence (5) the carry is concrete and documented across fields that coined the pattern independently — Campbell's law in social policy, the Deming critique of management-by-numbers, reward hacking in AI — strong evidence of rediscovery rather than borrowing. The eponymous monetary-policy origin and the intent-suggesting "target" language are the only frame-traces, and the biological instances (where no optimizer intends anything) show they wash out; what recurs is the bare three-element structure.

  • Composite substrate independence — 5 / 5
  • Domain breadth — 5 / 5
  • Structural abstraction — 4 / 5
  • Transfer evidence — 5 / 5

Relationships to Other Primes

One-hop neighborhood: parents above, mutual partners to the right, children below.Goodhart's Lawsubsumption: Proxy-Target DivergenceProxy-TargetDivergence

Parents (1) — more general patterns this builds on

  • Goodhart's Law is a kind of Proxy-Target Divergence

    proxy_target_divergence's file states it directly: "Not goodharts_law. Goodhart is ONE decoupling mechanism — the agent-driven, strategic- adaptation one. This prime is the umbrella ... of which Goodhart is one child." goodharts_law's own file agrees it is the optimization-pressure-on- a-proxy mechanism. So goodharts_law is unambiguously a CHILD of proxy_target_divergence (valid candidate CAND-R25-006-06). High conviction; both files independently license it. Phase-C kept it OFF regulatory_capture (0.824 nearest, distinct mechanism), agency_problem, and moral_hazard — correctly. campbells_law is its high-stakes twin (both children of the same umbrella). NOTE: if the family is consolidated under candidate proxy_target_fidelity (the genus, see proxy_target_divergence EMERGENT), goodharts_law re-parents there; until then the umbrella is the built/ candidate target.

Path to root: Goodhart's LawProxy-Target DivergenceProxy–Target Fidelity

Neighborhood in Abstraction Space

Goodhart's Law sits in a sparse region of abstraction space (71st percentile for distinctiveness): few abstractions share its structure, so a faithful description tends to retrieve it precisely rather than landing on a neighbor.

Family — Uncertainty, Risk & Proxy Distortion (22 primes)

Nearest neighbors

Computed from structural-signature embeddings · 2026-06-14

Not to Be Confused With

The embedding-nearest confusion is with regulatory_capture, and the two genuinely co-occur in policy settings, but they are different mechanisms with different fixes. Regulatory capture is the corruption of a regulator by the interests of the regulated — an agency that comes to serve the industry it oversees, through revolving doors, information dependence, or co-opted incentives. Goodhart's law is the corruption of a proxy by optimization pressure — a measure that loses its correlation with the construct because consequences were attached to it. No capturing party is required for Goodhart: a self-administered metric collapses under one's own optimization, in a single agent with no adversary. The distinction is load-bearing because the remedies diverge entirely. Capture is addressed by structural independence — firewalls, rotation of personnel, removing the industry's leverage over the regulator. Goodhart is addressed by control-loop redesign — multi-metric scorecards, held-out evaluation, decoupling diagnosis from incentive. A practitioner who diagnoses a corrupted oversight metric as capture will hunt for a co-opting interest that may not exist, while the actual fault is that a once-good proxy could not survive being weighted into consequence.

A second genuine confusion is with the agency_problem (principal–agent). Both involve a measured agent behaving in ways the measurer did not intend, and both surface in incentive design — but they rest on opposite assumptions about objectives. The agency problem assumes a divergence of goals: the agent has private interests that conflict with the principal's, and the challenge is to align them. Goodhart's deepest and most counterintuitive point is that collapse occurs even with perfectly aligned goals: a conscientious agent who wants exactly what the principal wants, optimizing the agreed proxy in good faith, still widens the wedge and degrades the construct. The corruption is in the proxy-construct relationship under optimization, not in a clash of interests. This matters because the agency framing tempts a manager to fix Goodhart by better aligning incentives — tying reward more tightly to the proxy — which is exactly the move that accelerates the collapse. Recognizing it as Goodhart rather than agency redirects the fix from alignment to control-loop structure.

A third confusion worth pre-empting is with moral_hazard, which also lives in the incentives neighborhood. Moral hazard is the change in an agent's risk-taking when they are insulated from the consequences — hidden action under a misallocation of risk-bearing. Goodhart concerns neither risk nor insulation: it is the collapse of a measurement proxy's informativeness when that proxy is placed under optimization pressure. One can have full Goodhart collapse with no risk-shifting at all (a school gaming test scores bears the consequences of its own gaming), and moral hazard with no proxy collapse (an insured driver takes more risk with no metric in sight). Confusing them sends a practitioner toward risk-realignment (deductibles, co-pays, skin-in-the-game) when the actual fault is that a control loop destroyed a proxy's correlation with the thing it indexed.

For a practitioner these distinctions are the difference between right and wrong interventions. Mistaking Goodhart for capture hunts for a co-opting interest that need not exist. Mistaking it for an agency problem prescribes tighter incentive alignment — the accelerant, not the cure. Mistaking it for moral hazard reaches for risk-sharing tools against a measurement-pressure failure. The prime earns its keep as the intent-independent, optimization-driven proxy-collapse mechanism that none of its incentive-family neighbors names.

Solution Archetypes

No catalogued solution archetypes reference this prime yet.