Signal Detection Theory¶

Prime #: 1181
Origin domain: Psychology Cognitive
Subdomain: decision and psychophysics → Psychology Cognitive
Aliases: SDT, ROC Analysis

Core Idea¶

In any setting where an observer must decide whether a particular state of the world — the "signal" — is present against a noisy background, every decision factorizes into two independent components. The first is sensitivity: how well the observer's internal evidence separates signal-present from signal-absent worlds. The second is a criterion: how much evidence the observer requires before responding "present." The choice of criterion is a free policy variable. It shifts the trade among hits, misses, false alarms, and correct rejections, moving the operating point along a receiver-operating-characteristic (ROC) curve whose shape is fixed by sensitivity. Confusing the two — reading high false-alarm rates as low sensitivity, or vice versa — produces persistent diagnostic errors.

The load-bearing structural commitment is that sensitivity and criterion are orthogonal, and that this orthogonality bounds what any decision policy can achieve. No criterion can transcend the underlying sensitivity; it can only redistribute that sensitivity's errors. To lower false alarms by tightening the criterion is to raise misses by a quantifiable amount, sliding along the existing ROC; to lower both at once requires a different and better ROC, which means improving sensitivity — better instruments, better features, more evidence per decision — not adjusting the cutoff. The framework reduces any binary decision under uncertainty to a common 2×2 outcome matrix and two scalar summaries, one for sensitivity and one for criterion, and it cleanly separates three quantities that ordinary language fuses into "how good is this test?": the discrimination capacity of the evidence, the decision rule applied to it, and the cost-of-error structure that, together with base rates, picks the optimal operating point. It is, in effect, a coordinate system for decisions under noise.
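As a minimal sketch of the two scalar summaries, the snippet below recovers a sensitivity and a criterion from a hit rate and a false-alarm rate. It assumes the equal-variance Gaussian model used in the formal example further down this page, with the signal-absent distribution centered at 0; the function name and the two hypothetical observers are illustrative, not part of the catalog entry.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def sensitivity_and_criterion(hit_rate, fa_rate):
    """Return (d_prime, criterion) under the equal-variance Gaussian assumption.

    Both rates must lie strictly between 0 and 1, because the inverse
    normal CDF is undefined at the endpoints.
    """
    z_hit = _STD_NORMAL.inv_cdf(hit_rate)
    z_fa = _STD_NORMAL.inv_cdf(fa_rate)
    d_prime = z_hit - z_fa   # separation of the two evidence distributions
    criterion = -z_fa        # cutoff location, with the signal-absent mean at 0
    return d_prime, criterion


# Two hypothetical observers. The second produces far more false alarms.
cautious = sensitivity_and_criterion(hit_rate=0.69, fa_rate=0.07)
lenient = sensitivity_and_criterion(hit_rate=0.93, fa_rate=0.31)

print("cautious: d'=%.2f, c=%.2f" % cautious)
print("lenient:  d'=%.2f, c=%.2f" % lenient)
```

Under these assumptions the two observers come out with the same sensitivity and different criteria, which is the case the text warns about: the higher false-alarm rate reflects a lenient cutoff, not a worse instrument.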

How would you explain it like I'm…

The Smoke Alarm Dial

Imagine you're listening for your mom calling your name in a noisy playground. Two different things matter. One is how good your ears are at telling her voice apart from all the noise. The other is how sure you want to be before you yell 'Coming!' — because if you answer too easily you'll run over when it wasn't her, but if you wait for total certainty you'll miss her sometimes. Those two things are separate, and you can change how careful you are without changing how good your ears are.

Sharpness And Caution

Signal detection theory is about any time you have to decide whether something real is there when there's a lot of confusing noise around it. It says every such decision really has two separate parts. The first is how well your evidence can tell 'it's there' apart from 'it's not there' — call that your sensitivity. The second is how much proof you demand before you say 'yes, it's there' — call that your criterion, and you get to choose it. Choosing a stricter criterion gives fewer false alarms but more misses; a looser one does the opposite. The big lesson is that you can't escape your sensitivity by changing your criterion — you can only trade one kind of mistake for another.

Sensitivity Versus Criterion

Signal detection theory says that whenever an observer must decide whether a signal is present against a noisy background, every decision factors into two independent parts. The first is sensitivity: how well the observer's internal evidence separates signal-present from signal-absent worlds. The second is a criterion: how much evidence the observer demands before responding 'present.' The criterion is a free policy choice; shifting it trades among hits, misses, false alarms, and correct rejections, moving the operating point along an ROC curve whose shape is fixed by sensitivity. The load-bearing point is that sensitivity and criterion are orthogonal: no criterion can transcend the underlying sensitivity, only redistribute its errors. Tightening the criterion to cut false alarms raises misses by a quantifiable amount; lowering both at once requires a better ROC, meaning improved sensitivity (better instruments, better features, more evidence), not a different cutoff. Confusing the two, reading high false-alarm rates as low sensitivity, causes persistent diagnostic errors.


Structural Signature¶

the latent binary state — the noisy evidence variable — the two evidence distributions (signal-absent, signal-present) — the sensitivity that separates them — the freely-chosen criterion partitioning the evidence axis — the ROC relating the two error rates as the criterion sweeps — the cost-and-base-rate structure selecting the operating point

The pattern is present when each of the following holds:

A latent binary state. A signal is either present or absent — disease, target, perpetrator, guilt — and is not directly observed.
A noisy evidence variable. The decider observes only a graded evidence signal correlated with, but not determined by, the latent state.
Two evidence distributions. The evidence has one distribution when signal is absent and another when present; these overlap.
Sensitivity. A scalar captures how far apart the two distributions sit — the discrimination capacity of the evidence, fixed by instruments, features, and information per decision.
A criterion. A freely-chosen threshold partitions the evidence axis into "report present" and "report absent"; it is a policy variable independent of sensitivity.
An ROC and a 2×2 outcome. Sweeping the criterion traces a receiver-operating-characteristic curve whose shape is fixed by sensitivity, mapping every decision into hit / miss / false alarm / correct rejection.
A cost-and-base-rate structure. Error costs together with the population base rate select the optimal operating point on the fixed ROC.

These compose around one orthogonality: sensitivity fixes the achievable error trade-off (the ROC), and criterion only redistributes errors along it. Lowering both error rates at once requires a better ROC — improve the evidence — not a moved cutoff, so the central diagnostic is whether a performance complaint is a sensitivity problem or a criterion problem.
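The claim that sweeping the criterion traces one curve per sensitivity can be sketched numerically. The snippet assumes unit-variance Gaussian evidence distributions (an assumption of this sketch, not a requirement of the pattern) and compares a trapezoid estimate of the area under the traced curve with the closed form that holds for that model, AUC = Φ(d'/√2). All function names are hypothetical.

```python
import math
from statistics import NormalDist


def roc_curve(d_prime, steps=2000):
    """Sweep the criterion from strict to lenient; return (fa_rate, hit_rate) pairs."""
    noise = NormalDist(0.0, 1.0)
    signal = NormalDist(d_prime, 1.0)
    strictest = d_prime + 6.0   # almost nothing is reported "present"
    most_lenient = -6.0         # almost everything is reported "present"
    points = []
    for i in range(steps + 1):
        c = strictest + (most_lenient - strictest) * i / steps
        points.append((1.0 - noise.cdf(c), 1.0 - signal.cdf(c)))
    return points


def area_under(points):
    """Trapezoid-rule area under the traced ROC."""
    area = 0.0
    for (fa0, hit0), (fa1, hit1) in zip(points, points[1:]):
        area += (fa1 - fa0) * (hit0 + hit1) / 2.0
    return area


def closed_form_auc(d_prime):
    """AUC for the equal-variance Gaussian model."""
    return NormalDist().cdf(d_prime / math.sqrt(2.0))


for d in (0.5, 1.0, 2.0):
    print(d, round(area_under(roc_curve(d)), 4), round(closed_form_auc(d), 4))
```

Every point on a given curve shares the same d'; only a larger d' produces a curve with more area, which is the sense in which the criterion redistributes errors while sensitivity bounds them.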

What It Is Not¶

Not type_i_type_ii_errors. That names the two error kinds (false alarm, miss). Signal detection theory adds the generative model — two overlapping evidence distributions — that factorizes the error trade-off into a sensitivity that fixes the ROC and a criterion that distributes errors along it (see type_i_type_ii_errors).
Not hypothesis_testing_null_vs_alternative. Null-hypothesis testing fixes a significance level and asks whether to reject. SDT treats the cutoff as a free policy variable selected by error costs and base rates, and separately characterizes the evidence's discrimination capacity across the whole ROC (see hypothesis_testing_null_vs_alternative).
Not signaling. Despite embedding-nearness, signaling concerns a sender strategically emitting a costly cue to convey type. SDT concerns a receiver discriminating a latent state from noisy evidence. One is about emission; the other about detection.
Not calibration. Calibration is whether stated probabilities match observed frequencies. SDT is about discrimination (separating the two states) and criterion placement — a perfectly discriminating detector can be miscalibrated, and vice versa.
Not selection_bias. Selection bias distorts which data enter the sample. SDT presumes the evidence distributions and asks how to decide against them; a biased sample would corrupt the estimated distributions but is a different defect.
Common misclassification. Reading a high false-alarm rate as a bad instrument (low sensitivity) when it is a lenient criterion, or the reverse. The two are orthogonal: moving the cutoff slides along one ROC and cannot lower both error rates. The tell: would the complaint be fixed by moving the cutoff (criterion) or does it require a better ROC (sensitivity)?

Broad Use¶

The same sensitivity-criterion factorization recurs across substrates that share nothing but the structure of a decision against noise. In psychophysics, it is the detection of faint stimuli against perceptual noise, the original setting. In radar and sonar, it is distinguishing real targets from clutter under jamming, the field where the theory developed alongside its psychological form. In medical screening, the entire ROC-and-AUC vocabulary of mammography, lab assays, and rapid tests is signal-detection theory applied. In machine-learning classification, it is precision-recall and ROC curves, decision thresholds on probability outputs, and cost-sensitive threshold tuning. In eyewitness identification, it separates a witness's sensitivity to recognize a perpetrator from the witness's willingness to identify someone. In security screening (airport scanners, content moderation, fraud detection), it is the choice of criterion given the costs of false alarms versus misses. In judicial decision-making, the standard of proof ("beyond reasonable doubt") is an explicitly stated criterion, distinct from how well the evidence itself discriminates guilt from innocence. In astronomy and gravitational-wave detection, it is matched-filter threshold-setting against detector noise. And in memory research, recognition memory is analyzed as old-versus-new signal detection. The framework is the same in each; the substrate-specific work is locating the two evidence distributions, the criterion, and the cost structure.

Clarity¶

The theory clarifies by separating two questions that ordinary language fuses into "how good is this test?": how informative is the underlying evidence (sensitivity), and what decision rule are we applying to it (criterion)? Once separated, many disputes resolve. "The test has too many false alarms" can mean either that the underlying sensitivity is poor, a measurement problem, or that the criterion is set too lenient for the cost structure, a policy problem; the corrective actions are entirely different. The frame likewise distinguishes the cost-of-error structure from the discrimination capacity: the two jointly determine the right operating point, but each is assessed independently of the other. The clarifying force is to make visible that an error rate is a joint product of evidence quality and decision policy, so that the right fix can be aimed at the component actually responsible rather than at the conflated whole.

Manages Complexity¶

The theory collapses any binary decision under uncertainty into a common 2×2 confusion matrix — hit, miss, false alarm, correct rejection — with two scalar summaries: a sensitivity index for the evidence and a criterion for the policy. The full noise-plus-signal apparatus reduces to one curve, the ROC, and one operating point on it. Disputes about test performance, screen calibration, and decision thresholds all proceed inside this shared frame across domains, so an argument about mammography, a spam filter, and a legal standard can be conducted in the same terms. The compression is sharp because it isolates exactly two degrees of freedom: everything about a binary decision under noise that matters is captured by where the ROC sits, set by sensitivity, and where on it the operating point lies, set by criterion. A potentially bewildering performance debate thereby reduces to two questions — is the ROC good enough, and is the operating point in the right place — each with a distinct remedy.

Abstract Reasoning¶

The framework supports a family of inferences that pre-theoretic talk about "good tests" cannot pose cleanly. At a given criterion, what are the costs of the unavoidable error mix given the population's base rate? How would the optimal criterion change if base rates shifted? Could the false-alarm rate be lowered without raising the miss rate — that is, is there room on the sensitivity dimension, or only on the criterion dimension? What is the cost of treating sensitivity and criterion as linked when they are independent? These questions concern the relationship among discrimination capacity, decision policy, and error cost under noise, a relationship indifferent to whether the decider is a biological perceiver, a physical detector, a statistical procedure, or a judicial body. To reason with the theory is to reason about which of two orthogonal levers — improve the evidence or move the cutoff — a given performance complaint actually calls for, a distinction that holds in every substrate where decisions are made against noise.

Knowledge Transfer¶

The portable interventions follow directly from the orthogonality. When the cost of misses rises — a deadly disease, a security threat — lower the criterion, accepting more false alarms. When the cost of false alarms rises — expensive follow-up, low base rate of true positives — raise it. To improve on the sensitivity dimension requires entirely different work: better instruments, better features, more training data, more evidence per decision. The framework forces this distinction explicitly, so that effort is never wasted adjusting a cutoff when the real deficit is in the evidence, or rebuilding an instrument when the real problem is a mis-set criterion.

The structural roles map across substrates. The latent binary state is the disease, target, perpetrator, threat, or guilt that may or may not be present; the noisy evidence variable is the image, reading, memory trace, or signal observed by the decider; the two evidence distributions are the signal-absent and signal-present densities whose separation is sensitivity; the criterion is the threshold partitioning the evidence axis into "report present" and "report absent"; the ROC curve relates the two error rates as the criterion sweeps; and the cost-of-error structure, with base rates, selects the optimal operating point. A radiologist deciding whether a higher false-positive rate warrants a stricter cutoff or a better imaging modality, a machine-learning engineer tuning a classification threshold against misclassification costs, and a court setting an evidentiary standard are performing the same structural act: choosing an operating point on a fixed ROC, and recognizing that only a better ROC can lower both error rates at once. The diagnostic (is this a sensitivity problem or a criterion problem?) travels unchanged across psychophysics, radar, medical screening, machine learning, eyewitness procedure, security, law, and astronomy. Because the intervention vocabulary is identical across these media, a practitioner who has separated sensitivity from criterion in one domain can import the whole apparatus (characterize the ROC, locate the operating point, price the error costs) into any domain that makes binary decisions against noise.

Examples¶

Formal/abstract¶

Take the Gaussian equal-variance model, the theory's analytic core. The latent binary state is signal-absent versus signal-present. The noisy evidence variable \(x\) is drawn from \(N(0, 1)\) when the signal is absent and from \(N(d', 1)\) when present, so the two evidence distributions are unit-variance normals separated by \(d'\), the scalar sensitivity. The observer reports "present" whenever \(x\) exceeds a criterion \(c\). From these two parameters every outcome follows by integration: the hit rate is the area of the signal-present density above \(c\), the false-alarm rate is the area of the signal-absent density above \(c\), and sweeping \(c\) traces the ROC curve whose bow toward the top-left corner is fixed entirely by \(d'\). The orthogonality is exact and visible: move \(c\) left and you raise both hits and false alarms, sliding along one ROC; you cannot lower both error rates without increasing \(d'\), which means a different, better ROC: better evidence, not a moved cutoff. The cost-and-base-rate structure selects the optimal \(c\). Taking correct responses as costless, expected cost is minimized by reporting "present" when the likelihood ratio exceeds \(\beta^* = (p_{\text{absent}} \cdot C_{FA}) / (p_{\text{present}} \cdot C_{miss})\); because the likelihood ratio at \(x\) in this model is \(\exp(d' x - d'^2/2)\), that cutoff sits on the evidence axis at \(c^* = d'/2 + \ln(\beta^*)/d'\), so a rarer signal or a costlier false alarm pushes the criterion higher. The diagnosis this licenses is the prime's central question made quantitative: a performance complaint is a sensitivity problem if it requires raising \(d'\) and a criterion problem if it only requires moving \(c\).

Mapped back: the two normals are the evidence distributions, \(d'\) is the sensitivity fixing the ROC, \(c\) is the freely-chosen criterion redistributing errors along it, and the likelihood-ratio cutoff is the cost-and-base-rate operating point — the framework's roles in closed form.
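The closed form can be checked numerically. The sketch below assumes the equal-variance model above with correct responses costing nothing; it converts the likelihood-ratio cutoff β = (p_absent · C_FA) / (p_present · C_miss) into a location on the evidence axis, c = d'/2 + ln(β)/d', and confirms that nudging the cutoff either way raises expected cost. Function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def expected_cost(c, d_prime, p_present, cost_fa, cost_miss):
    """Expected cost per decision when reporting 'present' for evidence above c."""
    noise = NormalDist(0.0, 1.0)
    signal = NormalDist(d_prime, 1.0)
    fa_rate = 1.0 - noise.cdf(c)   # signal absent, evidence above the cutoff
    miss_rate = signal.cdf(c)      # signal present, evidence below the cutoff
    return (1.0 - p_present) * cost_fa * fa_rate + p_present * cost_miss * miss_rate


def optimal_criterion(d_prime, p_present, cost_fa, cost_miss):
    """Evidence-axis location of the cost-minimizing likelihood-ratio cutoff."""
    beta = ((1.0 - p_present) * cost_fa) / (p_present * cost_miss)
    return d_prime / 2.0 + math.log(beta) / d_prime


# Hypothetical scenario: 10% base rate, a miss costs five times a false alarm.
c_star = optimal_criterion(d_prime=1.5, p_present=0.10, cost_fa=1.0, cost_miss=5.0)
at_optimum = expected_cost(c_star, 1.5, 0.10, 1.0, 5.0)
stricter = expected_cost(c_star + 0.05, 1.5, 0.10, 1.0, 5.0)
looser = expected_cost(c_star - 0.05, 1.5, 0.10, 1.0, 5.0)

# A rarer signal, all else equal, moves the cutoff higher.
c_star_rare = optimal_criterion(d_prime=1.5, p_present=0.01, cost_fa=1.0, cost_miss=5.0)
print(round(c_star, 3), round(c_star_rare, 3))
```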

Applied/industry¶

Two applied substrates carry the identical structure. First, mammographic screening. The latent state is cancer present or absent; the evidence is the radiologist's read of the image; the two distributions are the appearance densities of malignant versus benign tissue, whose overlap fixes sensitivity (set by imaging modality and reader skill). The criterion is how suspicious a finding must look before the radiologist recalls the patient. The clarifying payoff is decisive: a program with "too many false positives" faces two entirely different fixes. If the underlying read is poorly discriminating, that is a sensitivity problem: buy better imaging (tomosynthesis), train readers, add evidence, all of which move to a better ROC. If the read is fine but the recall threshold is set too lenient for the cost structure and the low base rate of cancer, that is a criterion problem: raise the cutoff, trading a few more missed cancers for far fewer needless biopsies. Confusing the two wastes effort rebuilding an instrument when the cutoff was the issue, or vice versa. Second, a machine-learning fraud classifier outputs a probability score; the evidence distributions are the score densities for legitimate versus fraudulent transactions, their separation summarized by AUC (sensitivity), and the decision threshold on the score is the criterion. When fraud losses (cost of a miss) rise relative to the friction of a false decline (cost of a false alarm), the engineer lowers the threshold. A court makes the same kind of cost-driven placement, in the opposite direction, when it sets "beyond reasonable doubt" high because the cost of a false conviction is judged to exceed the cost of a false acquittal. To lower both error types at once, only a better model (more features, more data, higher AUC) will do.

Mapped back: tumor appearance and transaction-score densities are the evidence distributions; imaging modality and model AUC are the sensitivity; recall threshold and decision threshold are the criteria; and biopsy-versus-miss and fraud-loss-versus-friction are the cost structures selecting the operating point — the same sensitivity/criterion factorization across medicine, machine learning, and law.
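For the fraud-classifier case, the criterion move can be sketched as an empirical threshold search. The scores and labels below are invented toy data, and the helper names are hypothetical; the point is only that the same scores (same ROC) yield a lower cutoff once misses are priced higher.

```python
import math


def confusion_at(scores, labels, threshold):
    """Count (hits, misses, false_alarms, correct_rejections) at a score cutoff."""
    hits = misses = false_alarms = correct_rejections = 0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if is_fraud and flagged:
            hits += 1
        elif is_fraud:
            misses += 1
        elif flagged:
            false_alarms += 1
        else:
            correct_rejections += 1
    return hits, misses, false_alarms, correct_rejections


def cheapest_threshold(scores, labels, cost_fa, cost_miss):
    """Pick the observed score (or 'flag nothing') with the lowest total cost."""
    candidates = sorted(set(scores)) + [math.inf]

    def total_cost(threshold):
        _, misses, false_alarms, _ = confusion_at(scores, labels, threshold)
        return cost_fa * false_alarms + cost_miss * misses

    return min(candidates, key=total_cost)


# Toy data (assumed for illustration): 1 = fraudulent, 0 = legitimate.
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.35, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

equal_costs = cheapest_threshold(scores, labels, cost_fa=1.0, cost_miss=1.0)
costly_misses = cheapest_threshold(scores, labels, cost_fa=1.0, cost_miss=5.0)
print(equal_costs, costly_misses)
```

Neither choice changes how well the scores separate the two classes; pushing both error counts down together would take different scores, not a different cutoff.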

Structural Tensions¶

T1 — Sensitivity versus Criterion (scopal). The framework's whole leverage is the orthogonality of sensitivity (the achievable trade-off) and criterion (where you sit on it), but the two are constantly conflated in practice — a high false-alarm rate is read as a bad instrument when it is a lenient cutoff. Failure mode: rebuilding the detector (expensive) when only the threshold needed moving, or endlessly tuning the threshold when the ROC itself is too poor to meet both targets. Diagnostic: would the complaint be fixed by moving the cutoff (criterion) or does it require a better ROC (sensitivity)? You cannot lower both error rates by moving the cutoff.

T2 — The Distributions May Not Be Stationary (temporal). Sensitivity assumes two fixed evidence distributions, but in adversarial or drifting settings the signal-present distribution moves — fraudsters adapt, disease presentations shift, the underlying populations change — so a criterion optimal yesterday is mis-placed today and an ROC measured once is stale. Failure mode: trusting a frozen operating point while the distributions drift apart or together beneath it, silently degrading performance. Diagnostic: are the evidence distributions estimated on data contemporaneous with deployment, or assumed constant since calibration? Adversaries make them non-stationary by design.
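A small sketch of why this drift can be silent, assuming unit-variance Gaussian evidence and hypothetical numbers: when only the signal-present distribution moves toward the noise, a frozen cutoff keeps its false-alarm rate, which depends on the signal-absent distribution alone, while its hit rate falls.

```python
from statistics import NormalDist


def rates_at(criterion, signal_mean):
    """Return (hit_rate, fa_rate) for a fixed cutoff; noise is N(0, 1)."""
    noise = NormalDist(0.0, 1.0)
    signal = NormalDist(signal_mean, 1.0)
    hit_rate = 1.0 - signal.cdf(criterion)
    fa_rate = 1.0 - noise.cdf(criterion)
    return hit_rate, fa_rate


CUTOFF = 1.0  # frozen at calibration time (hypothetical value)

at_calibration = rates_at(CUTOFF, signal_mean=2.0)
after_drift = rates_at(CUTOFF, signal_mean=1.2)  # signal has moved toward the noise

print("calibration: hit=%.3f fa=%.3f" % at_calibration)
print("after drift: hit=%.3f fa=%.3f" % after_drift)
```

In this sketch, monitoring the false-alarm rate alone would show nothing wrong, which is one reason the diagnostic asks for distributions estimated on deployment-era data.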

T3 — Base Rate Dominates at the Extremes (scalar). The optimal criterion depends on base rate, and at extreme base rates (very rare signal) even excellent sensitivity yields mostly false positives among the alarms at any criterion that keeps a useful hit rate; the base-rate term swamps the sensitivity term. Failure mode: deploying a high-\(d'\) screen against a rare condition and being overwhelmed by false positives, then blaming the instrument when the base rate was the governing factor. Diagnostic: is the precision complaint a sensitivity problem or a base-rate problem? At low prevalence, a stricter criterion can buy precision only by giving up hits; high precision together with a high hit rate requires near-perfect sensitivity.
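The base-rate effect is plain arithmetic once hit and false-alarm rates are fixed: precision = p·H / (p·H + (1 − p)·F). The sketch below assumes the equal-variance Gaussian model and hypothetical values (\(d' = 3\), cutoff at 1.5) and holds the detector constant while only prevalence changes.

```python
from statistics import NormalDist


def precision_at(d_prime, criterion, prevalence):
    """Fraction of alarms that are true signals, for a given base rate."""
    noise = NormalDist(0.0, 1.0)
    signal = NormalDist(d_prime, 1.0)
    hit_rate = 1.0 - signal.cdf(criterion)
    fa_rate = 1.0 - noise.cdf(criterion)
    true_alarms = prevalence * hit_rate
    false_alarms = (1.0 - prevalence) * fa_rate
    return true_alarms / (true_alarms + false_alarms)


# Same detector, same cutoff; only the base rate differs.
balanced = precision_at(d_prime=3.0, criterion=1.5, prevalence=0.5)
rare = precision_at(d_prime=3.0, criterion=1.5, prevalence=0.001)

print("precision at 50%% prevalence:  %.3f" % balanced)
print("precision at 0.1%% prevalence: %.3f" % rare)
```

With these numbers the detector catches over 90% of signals in both cases, yet at 0.1% prevalence only about one alarm in seventy is real.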

T4 — Cost Structure Is Often Unstated (measurement). Selecting the operating point requires the cost-of-error structure, but those costs (a missed cancer vs. a needless biopsy, a fraud loss vs. a declined customer) are frequently incommensurable, contested, or political — the framework demands a number the domain cannot cleanly supply. Failure mode: an operating point chosen by default or by whoever shouts loudest, presented as if it were the SDT-optimal cutoff. Diagnostic: are the relative error costs explicit and agreed, or is the criterion being set while pretending the cost ratio is obvious? The math is exact only once the costs are named.

T5 — Binary Decomposition versus Graded Reality (scopal). SDT reduces the problem to a latent binary state and a single criterion, but many real decisions are multi-class, sequential, or genuinely continuous, and forcing them into present/absent discards structure. Failure mode: collapsing a graded severity assessment into one threshold, losing the information that a multi-criterion or regression treatment would retain. Diagnostic: is the latent state truly binary, or is a binary cut being imposed on an ordinal/continuous reality for the convenience of the 2×2? The factorization is clean only when the underlying decision is actually dichotomous.

T6 — Sensitivity Has Diminishing, Costly Returns (sign/direction). The prescription "to lower both error rates, improve sensitivity" is correct but treats a better ROC as freely available, when more features, more data, or better instruments cost real resources and yield diminishing \(d'\) gains. The competing move — accept the current ROC and optimize the criterion — is sometimes the rational choice. Failure mode: pouring resources into marginal sensitivity improvements when re-pricing the criterion against the true cost structure would have captured most of the value. Diagnostic: what is the marginal cost of a unit of \(d'\) versus the value of re-optimizing \(c\)? Improving the instrument is right only when the criterion is already well-placed.
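The T6 diagnostic can be made concrete by pricing both levers in the same units. The sketch assumes the equal-variance Gaussian model, costless correct responses, and an invented scenario (10% base rate, misses five times as costly as false alarms, a cutoff currently set too lenient). It compares expected cost after re-optimizing the cutoff with expected cost after raising \(d'\) by 0.5 at the old cutoff.

```python
import math
from statistics import NormalDist


def expected_cost(c, d_prime, p_present, cost_fa, cost_miss):
    """Expected cost per decision at cutoff c; noise is N(0, 1), signal N(d', 1)."""
    fa_rate = 1.0 - NormalDist(0.0, 1.0).cdf(c)
    miss_rate = NormalDist(d_prime, 1.0).cdf(c)
    return (1.0 - p_present) * cost_fa * fa_rate + p_present * cost_miss * miss_rate


def optimal_criterion(d_prime, p_present, cost_fa, cost_miss):
    """Cost-minimizing cutoff on the evidence axis for this model."""
    beta = ((1.0 - p_present) * cost_fa) / (p_present * cost_miss)
    return d_prime / 2.0 + math.log(beta) / d_prime


P, C_FA, C_MISS = 0.10, 1.0, 5.0   # hypothetical base rate and costs
CURRENT_CUTOFF = 0.2               # hypothetical, too lenient for these costs

current = expected_cost(CURRENT_CUTOFF, 1.5, P, C_FA, C_MISS)
reoptimized = expected_cost(optimal_criterion(1.5, P, C_FA, C_MISS), 1.5, P, C_FA, C_MISS)
better_instrument = expected_cost(CURRENT_CUTOFF, 2.0, P, C_FA, C_MISS)

print("current:            %.3f" % current)
print("re-optimized c:     %.3f" % reoptimized)
print("d' + 0.5, same c:   %.3f" % better_instrument)
```

In this particular scenario moving the cutoff recovers more than the instrument upgrade does; with a cutoff already near its optimum the comparison would tip the other way.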

Structural–Framed Character¶

Signal detection theory sits at the structural end of the structural–framed spectrum, consistent with its aggregate of 0.1. It is a formal decision-under-noise framework — every binary decision factorizes into a sensitivity that fixes the achievable error trade-off and a freely-chosen criterion that distributes errors along it — and that factorization holds in any substrate where an observer decides about a latent state from noisy evidence, with nothing tied to a particular field's assumptions.

Nearly every diagnostic reads structural. The vocabulary is mathematical and substrate-neutral: ROC, AUC, \(d'\), criterion, likelihood-ratio cutoff describe a psychophysical detection task, a radar return, a mammogram, a fraud classifier, and a jury verdict in exactly the same terms, each domain reading off the same coordinate system without importing a home lexicon. The framework carries no inherent approval or disapproval: a criterion is neither good nor bad until the error costs are specified, and the theory deliberately separates the value-laden cost structure from the value-neutral discrimination capacity. It is thoroughly human-practice-independent — a matched-filter gravitational-wave detector and an astronomical photon-counter instantiate the same sensitivity/criterion factorization with no human perceiver present. And invoking it merely recognizes a structure already latent in any decision against noise — two overlapping evidence distributions partitioned by a threshold — rather than importing an interpretive overlay.

The only criterion above zero is institutional origin, scored at the midpoint, reflecting the framework's genesis as a named construction at the confluence of psychophysics and wartime radar engineering. But that mild origin charge is the sole deviation from a pure-structural profile; the theory is recognized, not imported, on every other axis, which is exactly why the grade places it among the catalog's paradigmatically structural members.

Substrate Independence¶

Signal detection theory is a maximally substrate-independent prime — composite 5 / 5 on the substrate-independence scale. Its core factorization — every decision under noise splits into a sensitivity that fixes the achievable error trade-off and a freely-chosen criterion that distributes errors along it — is a formal decision-under-noise structure whose vocabulary (d′, ROC curve, criterion, AUC) is mathematical and medium-neutral. Domain breadth is a full 5: the identical model governs psychophysics (its origin), radar and sonar detection, medical screening and diagnostic tests, machine-learning classifier evaluation, eyewitness identification, security screening, jury and legal decision-making, astronomical source detection, and memory recognition. Structural abstraction is 5, since the signature carries no domain commitments — any system that must call signal-versus-noise from overlapping distributions instantiates it directly. Transfer evidence is 5: the same ROC/criterion apparatus, with its identical derivations, ports verbatim from psychophysics to radar to oncology screening to ML, used as the same tool in each. Maximal on every axis, this is one of the catalog's canonical substrate-neutral analytical primes.

Composite substrate independence — 5 / 5
Domain breadth — 5 / 5
Structural abstraction — 5 / 5
Transfer evidence — 5 / 5

Relationships to Other Primes¶

Parents (1) — more general patterns this builds on

Signal Detection Theory presupposes Type I & Type II Errors

The 2×2 outcome matrix is the type-I/type-II framework (false alarms = type I, misses = type II); SDT adds the generative model (two overlapping evidence distributions) that factorizes the error trade-off into a sensitivity fixing the ROC and a criterion distributing errors along it. It is built on the error-types pair.

Path to root: Signal Detection Theory → Type I & Type II Errors → Trade-offs → Constraint

Neighborhood in Abstraction Space¶

Signal Detection Theory sits among the more crowded primes in the catalog (12th percentile for distinctiveness): several abstractions describe nearly the same structure, so a description that fits it will tend to fit its neighbors too. Transporting it usually means disambiguating within this family rather than landing on it exactly.

Family — Cue-Outcome Drift & Silent Failure (18 primes)

Nearest neighbors

Absence Of Evidence Vs Evidence Of Absence — 0.75
False Positive Paradox — 0.75
Texas Sharpshooter Fallacy — 0.75
Absence as Information — 0.74
Clustering Illusion — 0.74

Computed from structural-signature embeddings · 2026-06-14

Not to Be Confused With¶

Signal detection theory is most easily confused with type_i_type_ii_errors, because the 2×2 outcome matrix at its heart is the type-I/type-II framework: false alarms are type-I errors and misses are type-II. The distinction is that the type-I/type-II framework names the two error kinds and notes a trade-off between them, while SDT supplies the generative model that explains and quantifies that trade-off. SDT posits two overlapping evidence distributions (signal-absent and signal-present), and from that model it factorizes performance into two orthogonal quantities the bare error-types framework does not isolate: a sensitivity that fixes the entire achievable trade-off (the ROC curve) and a criterion that picks one operating point along it. This factorization carries content the error-types framing cannot: it explains why lowering one error raises the other (you are sliding along a fixed ROC), and it identifies when you can lower both (only by improving sensitivity to a better ROC). A reasoner who has only the type-I/type-II framework knows there is a trade-off but cannot tell a sensitivity problem from a criterion problem, and so cannot say whether a high false-alarm rate calls for a better instrument or merely a moved cutoff. SDT's whole diagnostic leverage is exactly that separation, which the error-types pair, taken alone, does not provide.

A second confusion is with hypothesis_testing_null_vs_alternative, since both decide between two states from noisy evidence and both involve a threshold. The difference is in how the threshold is treated and what is held fixed. Null-hypothesis significance testing conventionally fixes the type-I error rate (the significance level α) and asks whether the evidence crosses it to reject the null — the cutoff is set by convention, and the alternative's discriminability often goes uncharacterized. SDT treats the cutoff as a free policy variable to be chosen by the error costs and the base rate (via the likelihood-ratio criterion), and it separately characterizes the evidence's discrimination capacity across the entire range of possible cutoffs (the whole ROC), not just at one significance level. Where NHST asks "is there enough evidence to reject at this fixed level?", SDT asks "given these two distributions, what is the best operating point, and is the ROC itself good enough?" A reasoner who fuses them will treat the decision threshold as a fixed convention (α = 0.05) when SDT shows it should move with costs and base rates, and will summarize a detector by a single significance test when its full performance is an ROC.

A third worthwhile contrast is with calibration, because both concern the quality of a decision system and both surface in evaluating classifiers and judgments. But they measure different things. Calibration asks whether the probabilities a system outputs match observed frequencies — when it says 70%, does the event happen 70% of the time? SDT's sensitivity asks whether the system can discriminate the two states at all — how far apart the evidence distributions sit — and its criterion asks where the decision cutoff is placed. These come apart sharply: a detector can be excellently calibrated yet have poor discrimination (its probabilities are honest but uninformative, hovering near the base rate), and a detector can discriminate beautifully yet be badly calibrated (its rankings are perfect but its probability scale is distorted). The practitioner consequence is that calibration problems are fixed by re-scaling outputs (Platt scaling, isotonic regression) while SDT problems are fixed by improving the evidence (sensitivity) or repricing the cutoff (criterion). Confusing them leads to re-calibrating a system whose real deficit is discrimination, or rebuilding the evidence pipeline when the rankings were fine and only the probability scale was off.
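The way discrimination and calibration come apart can be sketched with a monotone distortion of scores. The toy data and helper names are assumptions; AUC is computed as the fraction of (signal, non-signal) pairs ranked correctly, and the gap between mean score and observed frequency stands in as a crude, whole-sample calibration check rather than a full reliability analysis.

```python
def auc_by_pairs(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


def calibration_gap(scores, labels):
    """Absolute gap between the mean predicted probability and the observed rate."""
    return abs(sum(scores) / len(scores) - sum(labels) / len(labels))


# Toy probability outputs (assumed for illustration): 1 = signal present.
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.35, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

# Cubing is monotone on [0, 1]: rankings survive, the probability scale does not.
distorted = [s ** 3 for s in scores]

print("AUC:", auc_by_pairs(scores, labels), auc_by_pairs(distorted, labels))
print("gap:", round(calibration_gap(scores, labels), 3),
      round(calibration_gap(distorted, labels), 3))
```

The distorted scores rank every pair exactly as before, so discrimination is untouched, while their probability scale drifts far from the observed frequency; re-scaling would repair the second without affecting the first.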

These distinctions matter because each neighbor obscures a different lever. Confusing SDT with the type-I/type-II pair loses the sensitivity-versus-criterion factorization that tells you which fix to apply; confusing it with null-hypothesis testing freezes a cutoff that should move with costs and base rates; and confusing it with calibration aims a probability-rescaling remedy at a discrimination or criterion problem. SDT's distinctive contribution — sensitivity fixes the achievable error trade-off and criterion only redistributes errors along it, so every performance complaint is one or the other — is precisely what none of these neighbors supplies alone.

Solution Archetypes¶

No catalogued solution archetypes reference this prime yet.