Proxy–Target Fidelity¶

Prime #: 1097
Origin domain: Measurement And Psychometrics
Subdomain: construct measurement → Measurement And Psychometrics
Also from: Computer Science & Software Engineering, Medicine, Economics, Management
Aliases: Proxy Validity, Surrogate Fidelity

Core Idea¶

Proxy–target fidelity is the structural relation between an observable stand-in and the unobservable (or expensive-to-observe) thing it stands in for, and the single question the relation poses: how faithfully does movement in the proxy reflect movement in the target? Almost nothing one actually cares about is directly readable. Ability, health, welfare, code quality, employee performance, customer satisfaction, model competence — these are targets: latent, diffuse, or only visible long after the fact. So an actor substitutes a proxy: a test score, a biomarker, gross domestic product, a passing test suite, a sales figure, a click-through rate, a benchmark accuracy. The proxy is chosen because it is cheap, fast, or simply available where the target is not. Proxy–target fidelity is the degree to which the substitution is faithful — the degree to which reading the proxy is reading the target, moving the proxy is moving the target, and inferring from the proxy is inferring about the target.

The defining commitments are four. First, there is a target: the thing of genuine interest, which is unobservable, latent, costly, or slow. Second, there is a proxy: an observable, measurable, or actionable quantity put in the target's place. Third, there is a standing-in relation: someone has designated the proxy to represent the target — for measurement, for optimization, for decision, or for inference. Fourth, and crucially, there is a fidelity: the proxy and target are correlated to some degree across the regime that matters, and the whole value of the substitution rides on that correlation being high enough for the use at hand. The prime names the relation and its gradient: fidelity is never perfect (the proxy is not the target, or there would be no proxy), and the central fact is that fidelity is use-relative and regime-relative — a proxy faithful enough to describe a population may be far too lossy to optimize against one actor, and a proxy faithful in the regime where it was validated may collapse outside it.

The structural signature distinguishes proxy–target fidelity from both the bare measurement relation and from any one of its failure modes. It is more than measurement because measurement asks "does this instrument read its own quantity accurately?" while proxy fidelity asks the prior question "does this quantity, even read perfectly, track the different quantity I actually care about?" — a thermometer can be perfectly accurate about temperature and a terrible proxy for comfort. And it is the genus over a family of named failures rather than any one of them: when fidelity is merely imperfect we get measurement error and partial validity; when an optimizer pushes on the proxy and the fidelity erodes under that very pressure, we get the Goodhart family (goodharts_law, proxy_target_divergence); when a medical proxy fails to carry a treatment's true effect on outcomes, we get the surrogate-endpoint problem; when an organizational metric diverges from the performance it was meant to capture, we get KPI gaming and Campbell's-law effects. What proxy–target fidelity provides as a prime is the recognition that all of these are the same relation read at different points on its fidelity gradient and under different pressures — and that the first move in every one of them is to make the target, the proxy, and the claimed fidelity between them explicit, rather than letting the proxy silently become the target.

How would you explain it like I'm…

Does the Stand-In Match?

You can't see how warm someone really feels inside, so you look at a thermometer instead. A thermometer is a stand-in: a thing you CAN see that's supposed to tell you about a thing you CAN'T. Proxy-target fidelity just means asking, how good is that stand-in? When the room number goes up, does the person actually feel hotter, or is the stand-in fibbing?

How Good Is the Stand-In

Most things we truly care about, like how smart someone is or how healthy they are, are hard to see directly. So we pick a stand-in we can see, like a test score or a temperature reading, and use it in place of the real thing. Proxy-Target Fidelity is how faithfully that stand-in follows the real thing: high fidelity means reading the stand-in really is like reading the real thing, low fidelity means it can fool you. A stand-in is never perfect, because if it were a perfect copy it would just be the real thing. And here's the tricky part: a stand-in can be good enough for one job, like describing a whole crowd, yet much too sloppy for another job, like deciding about one single person.

Faithfulness of the Proxy

Proxy-target fidelity is the relationship between an observable stand-in (a proxy) and the unobservable or expensive thing it stands in for (the target), plus the one question that relationship raises: how faithfully does movement in the proxy reflect movement in the target? Almost nothing we actually care about — ability, health, code quality, customer satisfaction — is directly readable, so we substitute something cheap, fast, or available, like a test score, a biomarker, or a click-through rate. Fidelity is how faithful that substitution is, and it is crucial that fidelity is never perfect: the proxy is not the target, or there would be no need for it. It is also use-relative and regime-relative: a proxy faithful enough to describe a whole population may be far too lossy to optimize against one person, and a proxy faithful where it was tested may collapse outside that range. This is more than ordinary measurement error — a thermometer can read temperature perfectly and still be a terrible proxy for comfort — because the question isn't whether the instrument reads its own quantity accurately but whether that quantity, even read perfectly, tracks the different quantity you actually care about.

Proxy-target fidelity is the structural relation between an observable stand-in and the unobservable or expensive-to-observe thing it stands in for, together with the single question that relation poses: how faithfully does movement in the proxy reflect movement in the target? Because almost nothing of genuine interest — ability, health, welfare, code quality, model competence — is directly readable, an actor substitutes a proxy (a test score, a biomarker, GDP, a passing test suite, a click-through rate) chosen for being cheap, fast, or simply available where the target is not. Four commitments define it: a target that is latent, costly, or slow; a proxy put in its place; a standing-in relation, where someone has designated the proxy to represent the target for measurement, optimization, decision, or inference; and a fidelity — the degree of correlation across the regime that matters, on which the whole value of the substitution rides. Crucially the prime names the relation and its gradient: fidelity is never perfect, and it is use-relative and regime-relative, so a proxy faithful enough to describe a population may be far too lossy to optimize against one actor, and one validated in one regime may collapse outside it. Its signature distinguishes it from bare measurement — measurement asks whether an instrument reads its own quantity accurately, while proxy fidelity asks the prior question of whether that quantity, read perfectly, tracks the different quantity actually cared about — and it is the genus over a family of named failures: imperfect fidelity gives measurement error and partial validity; fidelity eroding under optimization pressure gives the Goodhart family; a medical proxy failing to carry a treatment's true effect gives the surrogate-endpoint problem; an organizational metric diverging from real performance gives KPI gaming and Campbell's-law effects.

Structural Signature¶

the unobservable (or costly) target — the observable proxy put in its place — the standing-in relation (a designation by some agent for some use) — the fidelity (the strength of proxy–target coupling across the relevant regime) — the use-relativity and regime-relativity of that fidelity — the substitution risk that the proxy is acted on as if it were the target

Proxy–target fidelity is present when each of the following holds:

A target (the thing of interest). A latent, diffuse, slow, or expensive quantity that is what one actually cares about — ability, health, welfare, quality, performance, model competence — and which is not directly readable at the moment and cost one needs it.
A proxy (the stand-in). An observable, measurable, or actionable quantity put in the target's place because it is cheap, fast, available, or legible: a test score, a biomarker, an index, a metric, a benchmark.
A standing-in relation (the designation). Some agent has appointed the proxy to represent the target for a definite use — measurement, optimization, decision, inference, reward. The relation is not given by nature; it is a chosen substitution, and the choice can be apt or inapt.
A fidelity (the coupling strength). The proxy and target co-move to some degree over the regime that matters; the substitution's entire value depends on this coupling being high enough for the use. Fidelity is the graded quantity the prime is named for, and it is never unity.
Use- and regime-relativity (the scope invariant). Fidelity is not a property of the proxy alone but of the proxy-for-this-target-in-this-regime-for-this-use triple. A proxy faithful for description can be unfaithful for optimization; a proxy faithful in-sample can be unfaithful out of it; a proxy faithful for a population can be unfaithful for an individual.
The substitution risk (the diagnostic invariant). Because the proxy is the only thing one can see or touch, there is a standing pressure to treat the proxy as the target — to optimize it, report it, reward it, and believe it as if fidelity were perfect. The prime's whole cautionary content lives here: the proxy is a map, and acting on the map as if it were the territory is the failure the relation warns against.

The components compose into a single relation — an observable stands in, with imperfect and conditional fidelity, for an unobservable that some agent actually values — and it is the pairing of imperfect fidelity with the pressure to substitute that generates the whole downstream family: honest description, lossy inference, surrogate-endpoint failure, and Goodhart-style erosion are all readings of that one pairing under different uses and pressures.

What It Is Not¶

Not measurement accuracy (see measurement). measurement asks whether an instrument reads its own quantity faithfully — is the thermometer accurate about temperature, the scale about mass? Proxy–target fidelity asks the prior and different question of whether that quantity, even read perfectly, tracks the other quantity one actually cares about. A perfectly accurate thermometer is a poor proxy for thermal comfort, and a flawlessly precise test can be an invalid proxy for the ability it claims to measure. Measurement error is noise in the proxy; fidelity failure is a gap between proxy and target that survives even with a noiseless instrument.
Not Goodhart's law (see goodharts_law). goodharts_law is a specific dynamic: when a proxy becomes a target of optimization, the optimization pressure degrades the proxy's fidelity, because effort flows to whatever raises the proxy whether or not it raises the target. Proxy–target fidelity is the static genus — the relation and its gradient — of which Goodhart is the most important failure mode under optimization. Fidelity can be low for many reasons that have nothing to do with optimization (a badly chosen surrogate, a regime shift, an inherently lossy stand-in), and a proxy can be optimized without Goodharting if fidelity happens to be robust to the pressure. The prime names the relation; Goodhart names what optimization does to it. (Deliberately not absorbed — see the coordination flag.)
Not proxy–target divergence (see proxy_target_divergence). proxy_target_divergence names the event or condition of the proxy and target coming apart — the realized loss of fidelity. Proxy–target fidelity is the relation whose strength can be high or low; divergence is what it looks like when fidelity is (or becomes) low. The genus admits faithful proxies; divergence is one pole of the gradient.
Not construct validity (see construct_validity). construct_validity is the psychometric specialization — does a test or instrument actually measure the theoretical construct it claims to? It is proxy–target fidelity for the specific case where the target is a latent psychological construct and the proxy is a measurement instrument, with its own machinery (convergent and discriminant validity, nomological nets). Proxy–target fidelity is the cross-domain genus of which construct validity is the measurement-science instance.
Not a surrogate endpoint as such. A surrogate endpoint (a biomarker standing in for a clinical outcome) is one instance of the relation in medicine; proxy–target fidelity is the general relation, of which the surrogate-endpoint validity problem is the clinical reading.
Not the proxy being wrong. A proxy is not faithful or unfaithful in the abstract — fidelity is relative to a target, a regime, and a use. The prime is not the claim that a proxy is bad; it is the relation whose adequacy must be assessed against a specified purpose. A proxy can be excellent for one use and useless for another with no change in the proxy itself.
Common misclassification. Reading high in-sample or descriptive correlation as licensing optimization or out-of-regime inference — assuming that because the proxy tracked the target where it was checked, it will keep tracking it where it is now being pushed or extrapolated. Catch it by asking the three scope questions explicitly: faithful for which target, in which regime, under which use (describe / infer / optimize)? — because fidelity validated for description does not transfer to optimization, and fidelity validated in one regime does not transfer to another.

Broad Use¶

Proxy–target fidelity, read as "how well does the stand-in track the thing it stands in for," recurs wherever something of interest is unobservable and something observable is put in its place. In measurement and psychometrics it is the core of validity theory: a test score is a proxy for a latent ability, and construct validity is precisely the question of whether the score tracks the construct rather than test-taking skill, coaching, or an irrelevant correlate — the whole apparatus of convergent and discriminant validation exists to estimate the fidelity. In medicine, surrogate endpoints are proxies for clinical outcomes: a drug that lowers a biomarker (blood pressure, viral load, tumor size, LDL cholesterol) is assumed to improve the outcome the biomarker stands in for (stroke, mortality, survival), and the surrogate-validity literature is a sustained reckoning with cases where the proxy moved but the outcome did not — drugs that improved the biomarker while harming patients. In machine learning, almost every trained system optimizes a proxy: a loss function proxies for the true objective, a labeled-accuracy metric proxies for real-world competence, a reward model proxies for human preferences, an offline benchmark proxies for deployment performance — and reward misspecification, benchmark overfitting, and specification gaming are all fidelity failures between the optimized proxy and the intended target. In economics, indicators are proxies: GDP proxies for welfare, unemployment rate for labor-market health, CPI for cost of living, credit scores for default risk — and the recurring critique (GDP omits leisure, inequality, environmental cost) is a fidelity argument about how far the indicator tracks the underlying good. In management and public administration, key performance indicators proxy for performance: test scores for school quality, response times for service quality, arrest counts for policing effectiveness, citations for research impact — and Campbell's law and the entire literature on metric gaming describe the fidelity eroding under the pressure to score well. In finance, a credit rating proxies for default probability and a backtest proxies for live strategy performance. Across all of these, the recurring structure is identical: an observable stands in for an unobservable, the substitution's value rests on a fidelity that is imperfect and conditional, and the recurring hazard is the proxy being acted on as if it simply were the target.

Clarity¶

Naming proxy–target fidelity separates two questions that practitioners chronically fuse: is this number accurate? and does this number track what I actually care about? The first is a measurement question, answerable by calibration and precision; the second is a fidelity question, answerable only by relating the proxy back to the target — and a proxy can ace the first while failing the second completely. A blood-pressure reading can be flawlessly accurate and a poor surrogate for cardiovascular mortality; a test can be perfectly reliable and an invalid measure of the ability it claims; a benchmark can be measured to four decimal places and bear little relation to deployed competence. The clarifying force of the prime is to convert "the metric went up" into "did the target go up, or only the proxy, and what is the fidelity between them in the regime we are now in?" — relocating the discussion from the number's precision to the number's standing-in relation with the latent thing.

The prime also clarifies a recurring confusion about scope of warrant. Practitioners routinely validate a proxy in one mode — usually description ("this index correlates with welfare across countries") — and then silently extend the warrant to a different mode — optimization ("so let us maximize the index") or out-of-regime inference ("so it will hold for this new policy") — where the fidelity is not established and often does not hold. Naming the relation forces the scope to be stated: fidelity is always for a target, in a regime, under a use, and the prime makes each of those three qualifiers a thing that must be specified rather than assumed. It thereby separates the legitimate use of a proxy (read it as a noisy, conditional, partial window onto the target) from the illegitimate one (treat it as the target itself), which is the line every downstream failure — Goodhart, surrogate collapse, KPI gaming — crosses.

Manages Complexity¶

Proxy–target fidelity is, at first, what creates tractability: the whole reason to use a proxy is that the target is unobservable or expensive, and substituting a cheap observable is the move that makes measurement, decision, and optimization possible at all. One cannot directly measure "scholastic aptitude," "cardiovascular health," "model competence," or "national welfare," so one substitutes a test, a biomarker, a benchmark, an index — and an intractable latent becomes a tracked number. The prime's complexity-management value is to name both halves of this bargain: the proxy buys tractability and spends fidelity, and the discipline is to keep the price visible. A practitioner who forgets the fidelity cost has not removed the complexity of the target; they have hidden it inside a number that no longer reliably points at the thing they care about.

Recognizing the relation directs the central management move: bound and monitor the fidelity rather than assume it. The mature disciplines do exactly this. Psychometrics does not just compute a test score; it estimates the score's validity against external criteria and reports the construct it does and does not capture. Drug regulators do not accept a surrogate endpoint on faith; they require the surrogate to be validated against hard outcomes before it can carry an approval, and they downgrade surrogates that have failed. Good metric design pairs every optimized proxy with a guardrail — a second measure of the target that the optimizer is not allowed to game — precisely to detect fidelity erosion. ML practitioners hold out distributions and track multiple metrics so that a proxy's collapse against the true objective becomes visible rather than silent. The unifying complexity move is the same across substrates: a proxy is a compression of the target, the compression is lossy, and managing the system means continuously checking that the loss has not grown large enough — through regime shift, optimization pressure, or simple mis-choice — to make the cheap observable a misleading guide to the expensive thing it stands in for.

Abstract Reasoning¶

The proxy–target-fidelity pattern licenses several substrate-independent moves. Always name the target behind the metric: whenever a number is being read, optimized, or rewarded, ask what unobservable it is standing in for, because the number's whole meaning is parasitic on that relation, and a metric whose target cannot be named is a metric that cannot be validated. Estimate the fidelity, do not assume it: treat the proxy–target coupling as a quantity to be measured against external criteria (convergent validation, surrogate validation, guardrail metrics) rather than a property granted by the proxy's mere existence. Specify the scope triple: fidelity holds for a target, in a regime, under a use, so before extending a proxy from where it was validated, check whether the target, the regime, or the use has changed — fidelity validated for cross-sectional description does not transfer to optimization or to a new population. Anticipate erosion under pressure: when a proxy is turned into a target of optimization or reward, expect its fidelity to fall (the Goodhart move), because effort flows to whatever raises the proxy whether or not it raises the target — so optimization warrants a guardrail that the static-description use does not. Distinguish lossy-but-honest from gamed: a proxy can have low fidelity because it is an inherently partial window (lossy compression) or because an agent is exploiting the gap (gaming), and the two call for different fixes — better proxy versus incentive redesign. And read the proxy as a window, never the room: the disciplined stance is to treat every proxy as a noisy, conditional, partial view of a latent target, drawing inferences with the fidelity explicitly in the loop, and refusing the slide from "the proxy moved" to "the target moved" that every fidelity failure depends on.

Knowledge Transfer¶

Because proxy–target fidelity is the bare relation of an observable standing in for a valued unobservable, a technique built around it in one field transfers to any other by re-identifying the target, the proxy, and the fidelity claim between them. The validation discipline of psychometrics — never trust a score as a measure of a construct until you have related it to external criteria and shown it captures the intended construct and not its correlates — transfers directly to medicine (a biomarker must be validated against hard clinical outcomes before it can stand in for them), to machine learning (a benchmark or reward model must be checked against real-world competence before it is trusted as the objective), and to public administration (a KPI must be related to the performance it claims before it is used to allocate or reward). The surrogate-endpoint caution of medicine — a proxy that moves the right way can coincide with the target moving the wrong way, so a treatment effect on the surrogate does not establish an effect on the outcome — transfers as a general warning against inferring target gains from proxy gains: a model that scores higher on a benchmark may be worse in deployment, a school whose test scores rose may be educating worse, a strategy whose backtest improved may lose money live. The guardrail-metric design pattern — pair every optimized proxy with an unoptimized measure of the target to detect erosion — transfers from ML reward design to organizational performance management to economic-indicator dashboards, because all of them face the identical structural risk that optimizing the stand-in degrades its fidelity to the thing. The scope-of-warrant discipline — fidelity is for-this-target-in-this-regime-under-this-use and does not extend for free — transfers across every field that validates a proxy in one setting and is tempted to apply it in another. In every transfer the practitioner runs the same diagnosis: name the target, name the proxy, ask how the fidelity between them was established and over what regime and for what use, ask whether the current use (especially optimization) is one the fidelity was validated under, and pair the proxy with a guardrail wherever pressure on it is expected — and the transfer is secure because none of these steps names the substrate: a psychometrician validating a test, a regulator vetting a surrogate endpoint, an ML engineer auditing a reward model, and an administrator designing a performance metric are reasoning about the same relation, distinguished only by what is latent and what stands in for it.

Examples¶

Formal/abstract¶

The surrogate-endpoint validity problem is proxy–target fidelity in a clean inferential register. Let the target be a clinical outcome — say, all-cause mortality or stroke incidence — which is slow, expensive, and ethically costly to measure (one must wait, sometimes years, and count hard events). Let the proxy be a biomarker — blood pressure, LDL cholesterol, viral load, tumor shrinkage — which is fast and cheap to read. The standing-in relation is the regulatory and clinical decision to let a treatment's effect on the biomarker stand in for its effect on the outcome, so that a trial can read out in months on a surrogate rather than years on mortality. The fidelity is the degree to which the treatment's effect on the surrogate predicts its effect on the outcome — and the formal subtlety the prime names is that a correlation between surrogate and outcome in untreated populations is not sufficient for fidelity: what is required is that the treatment effect on the surrogate carry the treatment effect on the outcome, which can fail even when surrogate and outcome are tightly correlated at baseline (the Prentice criterion makes this precise). The use- and regime-relativity is sharp: a surrogate validated for one drug class can be invalid for another that moves the biomarker through a different mechanism. And the substitution risk is realized in documented cases where a drug improved the surrogate (suppressed arrhythmias, raised HDL, lowered glucose) while increasing mortality — the proxy moved the intended way and the target moved the opposite way, the exact catastrophe of treating a stand-in as the thing. The structural payoff is that the entire surrogate apparatus is one reading of the prime: an observable substituted for an unobservable, with a fidelity that must be established for the specific intervention and regime and never assumed from mere correlation.

Mapped back: The surrogate-endpoint case instantiates every component — an unobservable target (clinical outcome), an observable proxy (biomarker), the designation that lets the proxy stand in (regulatory acceptance of the surrogate), the fidelity (treatment effect on surrogate predicting treatment effect on outcome), its regime-relativity (mechanism- and class-specific validity), and the substitution risk realized as a drug that helped the proxy and harmed the patient — and shows the prime's core pairing: imperfect, conditional fidelity plus the pressure to treat the proxy as the target.

Applied/industry¶

A machine-learning team optimizing a recommendation system against click-through rate runs the identical structure in a computational substrate, with no medical vocabulary. The target is something latent and genuinely valued — long-run user satisfaction, retention, the user's own sense that time on the product was well spent — which is diffuse, slow to measure, and not available at training time. The proxy is click-through rate (or watch time, or engagement), an observable that is abundant, immediate, and trivially loggable, and is therefore made the optimization objective. The standing-in relation is the team's decision to treat CTR as the objective because the true target is unmeasurable in the training loop — a substitution made for tractability. The fidelity is the degree to which raising CTR raises satisfaction, and it is exactly the use-relative, regime-relative quantity the prime names: across a broad population in a stable product, CTR may correlate decently with satisfaction (the descriptive regime), but under optimization pressure the fidelity erodes — the system learns to raise clicks through clickbait, outrage, and compulsive-but-unsatisfying engagement, raising the proxy while lowering the target. This is the Goodhart move occurring inside the fidelity relation: the proxy was an acceptable window onto the target until it was turned into the target of optimization, at which point effort flowed to whatever moved CTR regardless of satisfaction. The prime's management response is the industry's: pair the optimized proxy with guardrail metrics the optimizer is not allowed to game (long-horizon retention, survey-based satisfaction, complaint rates), hold out distributions to detect collapse, and treat a rising proxy with a flat or falling guardrail as evidence of fidelity erosion rather than success. The same structure governs a reward model standing in for human preferences (reward hacking), a benchmark standing in for deployed competence (benchmark overfitting), and a labeled-accuracy metric standing in for real-world performance under distribution shift.

Mapped back: The CTR-optimization case runs the prime end-to-end — a latent target (satisfaction), an observable proxy (clicks), the designation of the proxy as objective for tractability, the fidelity that holds for description but erodes under optimization, and the substitution risk realized as a system that raised the proxy while harming the target — and demonstrates the transfer: a regulator vetting a surrogate biomarker and an ML team auditing an engagement metric are reasoning about the same standing-in relation, distinguished only by what is latent and what stands in for it.

Structural Tensions¶

T1 — Proxy versus Target (The Substitution Slide). The prime's foundational tension is that the proxy is the only thing one can see or act on, so there is a standing pressure to treat it as the target — to optimize it, report it, reward it, and believe it as if fidelity were perfect. The failure mode is reification of the proxy: the stand-in silently becomes the goal, and the latent thing it was meant to serve drops out of view, so the organization maximizes test scores instead of learning, biomarkers instead of health, clicks instead of satisfaction. Diagnostic: ask whether anyone in the system can still name the target behind the metric and whether decisions are checked against the target or only against the proxy; if the proxy has become the thing people defend and the target has gone unspoken, the slide has already happened.

T2 — Descriptive Fidelity versus Optimization Fidelity (The Goodhart Threshold). A proxy's fidelity validated for describing a population need not survive optimizing against it, because optimization pressure flows to whatever raises the proxy whether or not it raises the target. The tension is between a proxy that is an honest window under observation and the same proxy degrading the moment it becomes a lever. The failure mode is warrant transfer: taking a correlation established in the descriptive regime as license to optimize, and being surprised when the proxy rises while the target stalls or falls. Diagnostic: ask whether the proxy is being observed or pushed on; if it has become a target of optimization or reward, assume fidelity will erode and demand a guardrail measure of the target, because the descriptive correlation does not warrant the optimization use. (This is where the genus meets goodharts_law, deliberately named and not absorbed.)

T3 — In-Regime versus Out-of-Regime Fidelity (Extrapolation). Fidelity is established over some regime — a population, a range, a mechanism, a time — and need not hold outside it; a proxy validated cross-sectionally can fail under a new policy, a new drug class, a shifted distribution, a different subpopulation. The tension is between the regime where fidelity was checked and the regime where the proxy is now applied. The failure mode is extrapolated fidelity: carrying a proxy's validated coupling into a regime where the proxy–target relation is different, so inferences and decisions rest on a fidelity that no longer holds. Diagnostic: ask whether the target, population, mechanism, or distribution has changed since validation; if the proxy is being used outside the regime where its fidelity was established, the coupling must be re-validated, not assumed to carry.

T4 — Lossy-but-Honest versus Gamed (Source of the Gap). Low fidelity can arise because the proxy is an inherently partial window (an honest but lossy compression of the target) or because an agent is exploiting the proxy–target gap (gaming, teaching to the test, hacking the reward). The tension is between a fidelity problem that is structural and one that is adversarial. The failure mode is misdiagnosed gap: treating gaming as if it were mere lossiness (and trying to fix it with a marginally better proxy, which the agent will also game) or treating honest lossiness as if it were gaming (and adding surveillance where a better measure was needed). Diagnostic: ask whether the gap grows specifically under the incentive to score well — if effort visibly bends toward the proxy at the target's expense, it is gaming and needs incentive redesign; if the gap is stable regardless of incentive, it is lossiness and needs a better proxy.

T5 — Single Proxy versus Multi-Proxy Triangulation (Coverage). A single proxy captures one facet of a multi-dimensional target and leaves the rest unobserved, so a high reading on it can coexist with collapse on dimensions it does not see. The tension is between the legibility of one number and the multidimensionality of the thing it stands in for. The failure mode is facet blindness: optimizing or reporting a single proxy and being blindsided by failure in the unmeasured dimensions of the target (raising throughput while quality craters, raising test scores while curiosity dies, raising one biomarker while another worsens). Diagnostic: ask whether the target is plausibly multidimensional and whether the proxy set covers its dimensions; if a single proxy is carrying a rich target, expect uncovered dimensions to be where the surprises live, and triangulate with proxies chosen to span the facets the first one misses.

T6 — Proxy Validation versus Proxy Cost (The Reason It Was a Proxy). The entire reason for the proxy is that the target is unobservable or expensive — but validating a proxy requires comparing it against the target, which means observing the very thing the proxy was meant to avoid observing. The tension is between the need to establish fidelity and the cost that motivated the substitution in the first place. The failure mode is validation skipped for the reason the proxy exists: never checking the proxy against the target (because doing so is expensive) and therefore using a proxy of unknown fidelity indefinitely, mistaking "we have always used this number" for "this number tracks the target." Diagnostic: ask when the proxy was last validated against a direct (even if costly, even if sampled) measure of the target; if the answer is "never" or "long ago, in a different regime," the fidelity is assumed rather than known, and a periodic expensive ground-truth check is the price of trusting the cheap proxy at all.

Structural–Framed Character¶

Proxy–target fidelity sits in the low-mixed-but-structural-leaning band of the structural–framed spectrum, with a frontmatter aggregate of 0.3. The underlying relation — one observable standing in, with imperfect and conditional fidelity, for an unobservable that some agent values — is genuinely structural and medium-neutral, but it carries enough evaluative weight and presupposes enough of a designating, valuing agent that it does not read as a pure 0.0 like a topological or arithmetic predicate.

The diagnostics resolve as a structural-leaning mixture. The vocabulary travels moderately (vocab_travels 0.3): "proxy," "target," "fidelity," "surrogate," and "validity" are recognizably the same concern across psychometrics, medicine, ML, and economics, but each field has its own home lexicon (construct validity, surrogate endpoint, reward misspecification, indicator critique) and the unity is recognized rather than carried by a single traveling vocabulary. It carries real evaluative weight (evaluative_weight 0.5): "fidelity" is an explicitly graded quality — high fidelity is good, low fidelity is bad relative to the purpose the proxy serves — so the prime is not a value-neutral structural fact like an edge in a graph but a relation whose adequacy is judged against a use. Its origin is not institutional (institutional_origin 0.2): the relation predates and outruns any one field's measurement bureaucracy and is a general feature of acting through stand-ins, though it is most sharply theorized inside specific institutional measurement practices (regulatory surrogate validation, psychometric validity standards), which lifts the score slightly off zero. It is moderately human-practice-bound (human_practice_bound 0.4): the relation presupposes an agent who has designated one thing a proxy for another and who cares about the target, which is more practice-laden than a relation that holds in inanimate nature without anyone watching — though the coupling between a physical observable and a physical latent (a tracer for a flow) can hold with no human in the loop, which keeps it from being fully practice-bound. And invoking it largely recognizes rather than imports (import_vs_recognize 0.2): to identify a proxy–target relation is mostly to notice that a stand-in is already serving for a latent thing, adding little interpretive frame beyond making the substitution explicit.

The contrast with the prime's nearest neighbor underscores the read: where construct_validity is the evaluative, institutionally-theorized specialization for psychological measurement — laden with its own validation standards and professional norms — proxy–target fidelity is the more structural genus underneath it, the bare relation of a stand-in tracking what it stands for, which holds equally for a biomarker, a benchmark, and an economic index. The 0.3 aggregate is honest: more structural than a framed practice like fairness or governance, but carrying the evaluative grading and the designating agent that keep it off the pure-structural floor.

Substrate Independence¶

Proxy–target fidelity is highly but not maximally substrate-independent — composite 4 / 5 on the substrate-independence scale. Its signature — an observable standing in, with imperfect and conditional fidelity, for an unobservable that some agent values — is stated in largely relational terms and recurs with the same structure across psychometrics (construct validity), medicine (surrogate endpoints), machine learning (proxy objectives, reward models, benchmarks), economics (indicators like GDP and CPI), management and public administration (KPIs), and finance (ratings, backtests) — a domain breadth (5) spanning measurement, biological, computational, economic, and organizational substrates. The structural abstraction is high but recorded at 4 rather than 5 because the schema presupposes a valuing or inferring agent who has designated something a stand-in for something else: unlike a bare topological predicate (non-locality) or an arithmetic one (a random walk's √n law), the relation only fully makes sense relative to a purpose and a chooser, a slightly more committed frame that keeps it a notch below the pure-formal ceiling. The transfer evidence is strong and documented (4): the surrogate-endpoint literature in medicine, construct-validity theory in psychometrics, reward-misspecification and benchmark-overfitting work in ML, and the indicator-critique tradition in economics are visibly the same concern, and validation disciplines, guardrail-metric designs, and the Goodhart caution move recognizably across these fields — but the pattern travels under enough field-specific names that its unity is recognized when pointed out rather than catalogued under one banner, which holds transfer at 4 rather than 5. High abstraction and maximal breadth with strong (not maximal) cross-naming and a mild agent-relativity place this among the catalog's strong-but-not-canonical structural primes, a measurement-relational genus rather than a pure formal invariant.

Composite substrate independence — 4 / 5
Domain breadth — 5 / 5
Structural abstraction — 4 / 5
Transfer evidence — 4 / 5

Relationships to Other Primes¶

Foundational — no parent edges in the catalog.

Children (3) — more specific cases that build on this

Construct Validity is a kind of Proxy–Target Fidelity

The file: construct_validity is the PSYCHOMETRIC specialization (proxy=instrument, target=latent construct) of the cross-domain proxy->target fidelity genus. Clean child.
Cue Outcome Decoupling is a kind of Proxy–Target Fidelity

child of emergent proxy_target_fidelity
Proxy-Target Divergence is a kind of Proxy–Target Fidelity

child of emergent proxy_target_fidelity

Neighborhood in Abstraction Space¶

Proxy–Target Fidelity sits among the more crowded primes in the catalog (23^rd percentile for distinctiveness): several abstractions describe nearly the same structure, so a description that fits it will tend to fit its neighbors too — transporting it usually means disambiguating within this family rather than landing on it exactly.

Family — Uncertainty, Risk & Proxy Distortion (22 primes)

Nearest neighbors

Proxy-Target Divergence — 0.79
Campbell's Law — 0.77
Construct Validity — 0.77
Goodhart's Law — 0.72
Evidence — 0.71

Computed from structural-signature embeddings · 2026-06-14

Not to Be Confused With¶

The most important confusion is the prime's founding contrast with construct_validity, its nearest neighbor (similarity 0.71). Construct validity is the psychometric specialization: the question of whether a test or instrument actually measures the latent psychological construct it claims to measure, equipped with its own machinery — convergent and discriminant validity, the nomological net, criterion and content validity. Proxy–target fidelity is the cross-domain genus of which construct validity is the measurement-science instance: the bare relation of an observable standing in for a valued unobservable, which holds equally for a biomarker standing in for a clinical outcome, a benchmark standing in for deployed competence, and an economic index standing in for welfare. The distinction is load-bearing because the genus carries cases construct validity never touches — a surrogate endpoint, a reward model, a KPI — and because conflating them tempts a practitioner to reach only for psychometric tools (factor analysis, reliability coefficients) when the situation is, say, a reward model being gamed under optimization, which calls for guardrails and incentive redesign instead. Construct validity is proxy–target fidelity for the case where the target is a psychological construct and the proxy is a measurement instrument; the prime is the relation in general.

A second and especially important confusion — flagged here deliberately and not collapsed into the prime — is with goodharts_law. Goodhart's law is a specific dynamic within the fidelity relation: when a proxy becomes a target of optimization or reward, the optimization pressure degrades its fidelity, because effort flows to whatever raises the proxy whether or not it raises the target ("when a measure becomes a target, it ceases to be a good measure"). Proxy–target fidelity is the static genus — the relation and its gradient — of which Goodhart is the dominant failure mode under optimization. The two must be kept separate for two reasons. First, fidelity can be low for reasons entirely unrelated to optimization: a badly chosen surrogate, a regime shift, an inherently lossy stand-in — none of these is a Goodhart effect, yet all are fidelity failures, so folding Goodhart into the genus would wrongly suggest every low-fidelity proxy is being gamed. Second, a proxy can be optimized without Goodharting when its fidelity happens to be robust to the pressure — Goodhart is a contingent dynamic, not an identity. The prime therefore names Goodhart as a child/sibling (the optimization-pressure pole of the gradient) and leaves the Goodhart-family consolidation to be wired at incorporation, rather than absorbing it. An analyst who conflates them either over-applies the Goodhart story to every imperfect proxy or, conversely, misses that the reason their optimized proxy is failing is the specific erosion-under-pressure dynamic that Goodhart names precisely.

A third confusion is with proxy_target_divergence and with measurement. proxy_target_divergence names the realized condition of the proxy and target having come apart — the low-fidelity pole made into an event; proxy–target fidelity is the relation whose strength can be high or low, of which divergence is one outcome, so the genus admits faithful proxies that divergence (as a named failure) does not describe. measurement is the prior-but-different relation of an instrument reading its own quantity accurately; proxy–target fidelity asks whether that quantity, even read perfectly, tracks the different quantity one cares about. A noiseless instrument can be a faithless proxy (perfect thermometer, poor comfort proxy), and a noisy instrument can still be a faithful-enough proxy if its bias is in the right direction — the two failures (measurement error in the proxy, fidelity gap between proxy and target) are independent, and improving the instrument does nothing to close a fidelity gap that lives in the choice of what to measure.

For a practitioner these distinctions decide what the actual problem is and therefore what to do about it. Confusing the prime with construct_validity narrows a general fidelity problem to a psychometric one and reaches for the wrong toolkit. Confusing it with goodharts_law either over-applies the gaming story to every imperfect proxy or misses the specific optimization-erosion dynamic when it is the real cause. Confusing it with proxy_target_divergence mistakes a relation with a gradient for a single failure event, hiding the faithful-proxy cases. Confusing it with measurement mistakes a gap in what is measured for noise in how well it is measured, and sends effort to instrument precision when the fix is a better-chosen or guardrailed proxy. The unifying discipline is the prime's three-part check: name the target behind the metric, ask how the fidelity between proxy and target was established and over what regime and for what use, and check whether the current use — especially optimization — is one the fidelity was validated under; only then is acting on the proxy a defensible substitute for acting on the target it stands in for.

Solution Archetypes¶

No catalogued solution archetypes reference this prime yet.