A cue that once reliably tracked an outcome has its coupling broken — by environmental shift, adversarial spoofing, or the actor's own optimization — while the well-tuned cue-following behavior persists and grows more harmful the better it tracks the now-decoupled cue.
Imagine a dog that has learned that a ringing bell means dinner is coming, so it runs to its bowl every time the bell rings. Then one day the bell keeps ringing and no dinner ever comes, yet the dog still races to the bowl. Cue Outcome Decoupling is when a signal you trusted stops being connected to the thing it used to mean, but you keep chasing the signal anyway.
The Signal Went Hollow
You often rely on a cue, an easy-to-spot sign, because in the past it reliably went together with something you really care about but can't see directly, like food, safety, or success. Then one day that link quietly breaks: the world changes, someone fakes the signal, or the thing that made the link disappears. The cue is still there and you still react to it, but its meaning has been silently cut off from the outcome. Cue Outcome Decoupling is exactly this break, and the sneaky part is that the better you are at following the cue, the more confidently you head straight into trouble.
When The Proxy Breaks
An actor relies on a cue, an easy-to-measure proxy, because in its history that cue was reliably correlated with an underlying outcome it actually cares about but can't directly observe at decision time, like food, safety, profit, or ground truth. Later the correlation breaks: the world shifts, the channel is spoofed, the system is optimized through, or the mechanism that produced the correlation changes. The cue is still there and the cue-following behavior still fires, but the cue's meaning has been silently severed from the outcome. Its anatomy has six parts: a cue, an outcome, a historical coupling that made the cue informative, a fast cue-response well-tuned to that coupling, a slow update channel for the cue's meaning, and an event that breaks the coupling. The failure is generic and silent: the better the actor is at cue-tracking, the more reliably it acts, straight into harm. The fix is also generic (restore the coupling, speed up updating, or wrap the cue with a sanity check), and so is the diagnostic: when a rule keys on something, ask what made that something informative, and check whether it still holds.
An actor relies on a cue, a perceptible, easy-to-measure proxy, because in the actor's history that cue was reliably correlated with an underlying outcome the actor actually cares about: food, safety, profit, ground truth, mission success. At some later point the correlation breaks: the world shifts, the channel is spoofed, the system is optimized through, or the mechanism that produced the correlation changes. The cue is still there and the cue-following behavior still fires, but the meaning of the cue has been silently severed from the outcome it once tracked. The structural anatomy has six parts: a cue observable by the actor; an outcome the actor cares about but cannot directly observe at decision time; a historical coupling, the regularity that made the cue informative; a fast cue-response well-tuned to that coupling; a slow update channel for the meaning of the cue; and an event that breaks or reverses the coupling. The failure is generic: the more capable the actor is at cue-tracking, the more reliably it keeps acting, straight into harm or waste. The structural intervention is also generic (restore the coupling, speed up the update channel, or wrap the cue with a sanity check), and so is the diagnostic: if a decision rule keys on something, ask what made that something informative, and check whether the answer still holds. This changes a reader's view of a system in two ways. First, 'the system is failing' splits into two questions: is the agent's response well-tuned (usually yes, which is why the failure is silent), and has the cue's meaning shifted (the load-bearing question)? Second, the surface-identical phenomenon of a sensor going dark is separated from the deeper phenomenon of a cue that still reads correctly but no longer connects to the outcome.
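The six-part anatomy, together with two of the three remedies, can be sketched as a small simulation. This is a minimal sketch under stated assumptions: the `CueFollower` class, its `lr`, `threshold`, `window`, and `min_hit_rate` parameters, the every-other-step cue, and the rule that the outcome is observed after every cue (whether or not the actor acted) are all hypothetical choices for illustration. Here `decide` plays the fast cue-response, the belief update in `observe` plays the slow update channel, and `break_at` is the event that severs the coupling.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CueFollower:
    """Hypothetical actor: a fast cue-response plus a slow update channel."""

    lr: float = 0.05            # speed of the slow update channel (assumed)
    threshold: float = 0.5      # act when believed P(outcome | cue) exceeds this
    window: int = 0             # > 0 wraps the cue in a sanity check
    min_hit_rate: float = 0.5   # sanity check vetoes the cue below this hit rate
    belief: float = 1.0         # starts well-tuned to the historical coupling
    recent: deque = field(default_factory=deque)

    def decide(self, cue: bool) -> bool:
        """Fast cue-response: fire on the cue if its meaning is still trusted."""
        if not cue:
            return False
        if self.window and len(self.recent) == self.window:
            hit_rate = sum(self.recent) / self.window
            if hit_rate < self.min_hit_rate:
                return False  # sanity check: the cue has recently gone hollow
        return self.belief > self.threshold

    def observe(self, cue: bool, outcome: bool) -> None:
        """Slow update channel: nudge the cue's meaning toward what happened."""
        if not cue:
            return
        self.belief += self.lr * (float(outcome) - self.belief)
        if self.window:
            self.recent.append(outcome)
            if len(self.recent) > self.window:
                self.recent.popleft()


def simulate(actor: CueFollower, steps: int = 60, break_at: int = 20) -> int:
    """Count harmful actions: acting on the cue when the outcome does not follow."""
    harmful = 0
    for t in range(steps):
        cue = t % 2 == 0                           # the bell rings every other step
        outcome = cue if t < break_at else False   # coupling severed at break_at
        if actor.decide(cue) and not outcome:
            harmful += 1
        actor.observe(cue, outcome)                # outcome assumed visible afterwards
    return harmful


if __name__ == "__main__":
    print(simulate(CueFollower(lr=0.05)))            # slow update channel -> 14
    print(simulate(CueFollower(lr=0.2)))             # faster update channel -> 4
    print(simulate(CueFollower(lr=0.05, window=3)))  # sanity-checked cue -> 2
```

With these assumed settings, the well-tuned but slow learner keeps acting on the hollow cue for 14 rounds after the break, speeding up the update channel cuts that to 4, and a three-observation sanity check cuts it to 2. Before the break, all three actors behave identically, which is why the failure is silent until it happens.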
Splits "the system is failing" into two questions — is the response well-tuned? (usually yes, which is why the failure is silent) and has the cue's meaning shifted? (the load-bearing one) — and separates a dark sensor from a cue that still reads correctly but no longer connects.
Reduces a sprawling failure family to a four-question diagnostic — what cue, what outcome, what mechanism made the correlation, does it still hold? — plus three remedies: restore the coupling, accelerate the update, sanity-check the cue.
Carries a counter-intuitive prediction: more capable cue-tracking makes the failure worse — the best-tracking turtles drown first, the sharpest classifier generalizes worst — so capability is risk-amplifying under cue drift.
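The prediction that capability amplifies risk can be made concrete with a short expected-payoff calculation. This sketch assumes a logistic cue-response whose `sharpness` stands in for capability, and assumes payoffs of +1 for acting when the outcome follows, -1 for acting when it does not, and 0 for not acting; the functions `tracking_reliability` and `expected_payoff` are hypothetical. Under those assumptions the expected payoff per cue event is r(2q - 1), where r is the probability of acting on the cue and q = P(outcome | cue), so capability scales r and multiplies whatever sign the coupling currently has.

```python
import math


def tracking_reliability(sharpness: float, cue_margin: float = 1.0) -> float:
    """P(act | cue present) for a logistic cue-response; higher sharpness = more capable."""
    return 1.0 / (1.0 + math.exp(-sharpness * cue_margin))


def expected_payoff(sharpness: float, coupling: float) -> float:
    """Expected payoff per cue event, where coupling = P(outcome | cue).

    Assumed payoffs: +1 for acting when the outcome follows, -1 for acting
    when it does not, 0 for not acting. This reduces to r * (2q - 1).
    """
    r = tracking_reliability(sharpness)
    return r * (coupling - (1.0 - coupling))


if __name__ == "__main__":
    for k in (0.5, 2.0, 8.0):
        before = expected_payoff(k, coupling=0.95)  # cue still informative
        after = expected_payoff(k, coupling=0.05)   # cue decoupled from outcome
        print(f"sharpness={k}: before={before:+.3f} after={after:+.3f}")
```

Under these assumptions the sharpest responder earns the most while q is high and loses the most once q falls below 0.5; at q = 0.5 every responder breaks even.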
A pneumonia detector trained at one hospital latches onto a scanner watermark that correlated with disease (it marked already-sick inpatients); at a new hospital the watermark is absent, the coupling is severed, yet the model keeps weighting the now-meaningless cue — and a model that learned it more sharply generalizes worse.
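The pneumonia example can be reproduced in miniature with invented counts. Everything in this sketch is an assumption for illustration: the two hospitals, their patient counts, the binary lung-sign and watermark features, and the choice of a count-based naive Bayes classifier with add-one smoothing (the example above does not specify a model). At the training hospital the watermark marks 30 of 50 sick inpatients and no healthy patients; at the new hospital it never appears.

```python
from collections import Counter

# Hypothetical patient record: (lung_sign, watermark, sick), all binary.
Patient = tuple[int, int, int]
LUNG, WATERMARK, SICK = 0, 1, 2


def cohort(counts: dict[Patient, int]) -> list[Patient]:
    """Expand {record: how many patients} into a flat list of records."""
    return [record for record, n in counts.items() for _ in range(n)]


# Hospital A (training): the watermark marks 30 of 50 sick inpatients, no healthy ones.
HOSPITAL_A = cohort({(1, 1, 1): 21, (1, 0, 1): 14, (0, 1, 1): 9, (0, 0, 1): 6,
                     (1, 0, 0): 15, (0, 0, 0): 35})
# Hospital B (deployment): same lung-sign statistics, but no watermark at all.
HOSPITAL_B = cohort({(1, 0, 1): 35, (0, 0, 1): 15, (1, 0, 0): 15, (0, 0, 0): 35})


def train_naive_bayes(data: list[Patient], features: list[int]):
    """Count-based naive Bayes over binary features, with add-one smoothing."""
    class_counts = Counter(p[SICK] for p in data)
    ones = Counter((f, p[SICK]) for p in data for f in features if p[f] == 1)

    def predict(p: Patient) -> int:
        scores = {}
        for c, n_c in class_counts.items():
            score = n_c / len(data)                     # class prior
            for f in features:
                p_one = (ones[(f, c)] + 1) / (n_c + 2)  # smoothed P(feature=1 | c)
                score *= p_one if p[f] == 1 else 1 - p_one
            scores[c] = score
        return max(scores, key=scores.get)

    return predict


def accuracy(predict, data: list[Patient]) -> float:
    """Fraction of records whose predicted label matches the true label."""
    return sum(predict(p) == p[SICK] for p in data) / len(data)


if __name__ == "__main__":
    with_mark = train_naive_bayes(HOSPITAL_A, [LUNG, WATERMARK])
    lung_only = train_naive_bayes(HOSPITAL_A, [LUNG])
    for name, data in (("A", HOSPITAL_A), ("B", HOSPITAL_B)):
        print(name, accuracy(with_mark, data), accuracy(lung_only, data))
    # A 0.8 0.7
    # B 0.5 0.7
```

With these made-up counts, the model that uses the watermark beats the lung-only model at hospital A (0.8 against 0.7), then drops to 0.5 at hospital B: it reads the missing watermark as evidence of health and calls every patient healthy, while the lung-only model holds at 0.7.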
Cue Outcome Decoupling is not Goodhart's Law because it is the broader structure, in which the coupling can break by any of three mechanisms (environmental, adversarial, or optimization-driven), whereas Goodhart's Law covers only the optimization-driven subtype; a turtle's drift is environmental, and a phishing victim's is adversarial.
Cue Outcome Decoupling is not Learning because it is the failure phase that follows when the coupling behind a learned mapping breaks, whereas learning is the acquisition of the response; more learning sharpens the response and so makes the eventual decoupling more harmful.
Cue Outcome Decoupling is not Sensor Failure because the cue is still present and measurable and still triggers the learned response, which simply no longer connects to the outcome, whereas a sensor failure is the cue itself going dark.
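The split between a dark sensor and a hollow cue can be turned into a rough triage over a window of logged events. This sketch assumes the outcome becomes observable some time after each decision, so each log entry pairs a cue reading (`None` when the sensor returned nothing) with the outcome that eventually followed; the `triage` function, its labels, and the `max_missing` and `min_lift` thresholds are hypothetical. It asks the diagnostic's last question, whether the cue still carries information about the outcome, by comparing the outcome rate with and without the cue.

```python
from typing import Optional, Sequence

# (cue_reading, outcome); cue_reading is None when the sensor returned nothing.
LogEntry = tuple[Optional[bool], bool]


def triage(log: Sequence[LogEntry], max_missing: float = 0.5,
           min_lift: float = 0.2) -> str:
    """Roughly classify a window of logged cue readings and observed outcomes."""
    if not log:
        return "no data"
    readings = [(cue, outcome) for cue, outcome in log if cue is not None]
    if 1 - len(readings) / len(log) > max_missing:
        return "sensor failure"          # the cue itself has gone dark
    with_cue = [outcome for cue, outcome in readings if cue]
    without_cue = [outcome for cue, outcome in readings if not cue]
    if not with_cue or not without_cue:
        return "insufficient contrast"   # cannot tell whether the cue is informative
    # Lift: how much more often the outcome follows when the cue is present.
    lift = sum(with_cue) / len(with_cue) - sum(without_cue) / len(without_cue)
    if lift >= min_lift:
        return "coupled"
    return "cue-outcome decoupling"      # cue reads fine but no longer predicts


if __name__ == "__main__":
    healthy = [(True, True)] * 8 + [(False, False)] * 8
    hollow = [(True, False)] * 8 + [(False, False)] * 8
    dark = [(None, False)] * 16
    for name, log in (("healthy", healthy), ("hollow", hollow), ("dark", dark)):
        print(name, triage(log))
```

The hollow log and the healthy log look the same to a check that only asks whether the sensor is reporting; only the comparison against outcomes separates them. A lift computed over a short window is noisy, so a real monitor would need enough samples to tell drift from chance.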