Construct Validity¶
Core Idea¶
Construct validity asks whether a measurement actually captures the theoretical construct it claims to, rather than a correlated surrogate. The structure is a three-layer gap: a named construct in the modeler's theory, an operationalized proxy chosen to stand for it, and the signal the proxy actually emits — with each transition able to leak.
How would you explain it like I'm…
The Sneaky Measure Check
Does The Stand-In Match?
The Three-Layer Measurement Gap
Broad Use¶
- Psychometrics: does a score measure the named trait or merely test-taking practice and self-report etiquette?
- Education: does an exam measure understanding or the ability to perform one task format? Teaching-to-the-test is the collapse.
- AI evaluation: does a benchmark measure the named capability or memorization, prompt-format sensitivity, or contamination?
- Public-sector metrics: does an aggregate measure prosperity or care quality, or a monetized administrative correlate?
- Software metrics: does a count measure productivity or the cost of typing and the politics of estimation?
- Clinical measurement: does a marker measure the disease or a downstream effect that can respond to treatment without it?
Clarity¶
Separates getting precise numbers from getting the right numbers: two programs with identical reliability can differ wildly in validity, one of them precisely measuring the wrong thing.
Manages Complexity¶
Reduces the open-ended worry "are we measuring the right thing?" to a structured battery of paired probes — convergent, discriminant, nomological, predictive, face — each with a known failure signature that locates the leak precisely.
Abstract Reasoning¶
Supports inference about what an instrument actually responds to, and a structural prediction — the stakes loop: making a proxy consequential erodes its construct-to-proxy bridge over time, anticipated before it is observed.
Knowledge Transfer¶
- Psychometrics → AI eval: the convergent/discriminant/nomological battery ports almost unchanged to benchmark design.
- System evaluation → public metrics: the lesson that gaming destroys validity carries to school accountability and safety statistics.
- Clinical surrogates → any optimization: the surrogate-endpoint failure mode — proxy moves while construct does not — transfers to any convenient-proxy target.
Example¶
A depression self-report that correlates with every other self-report regardless of construct but weakly with clinician interviews has leaked at the construct-to-proxy bridge: it measures willingness to endorse negative statements, not depression.
Relationships to Other Primes¶
Parents (1) — more general patterns this builds on
- Construct Validity is a kind of Proxy–Target Fidelity — The file: construct_validity is the PSYCHOMETRIC specialization (proxy=instrument, target=latent construct) of the cross-domain proxy->target fidelity genus. Clean child.
Path to root: Construct Validity → Proxy–Target Fidelity
Not to Be Confused With¶
- Construct Validity is not Validation because validation asks whether an artifact conforms to its specification, whereas construct validity asks the prior question of whether the chosen proxy stands for the named construct at all.
- Construct Validity is not Goodhart's Law because Goodhart names the dynamic erosion of a proxy under stakes, whereas construct validity names both the standing gap (which can fail at time zero) and the erosion, and supplies the probe battery.
- Construct Validity is not Measurement because measurement is the operation of assigning a value to an observable, whereas construct validity interrogates whether that observable was the right thing to measure.