Correlated-Source Attribution Failure¶
Core Idea¶
Correlated-source attribution failure is the structural pattern in which an estimator — formal or informal — combines several information sources to attribute an observed effect to its inputs, but those sources share underlying variation, so the estimator's joint inference can remain strong while attribution to any individual source becomes unstable, contradictory, or manipulable. Identification of distinct sources depends on those sources varying independently; when they share variation, individual identifiability collapses without the joint predictive content being affected at all. The pathology is that the system continues to look well-determined at the aggregate level — the prediction is good, the joint likelihood is high — even as the per-source attributions oscillate wildly, swap signs on resampling, or carry error bars wide enough to span any plausible interpretation. Naive consumers read sharp single-source claims ("witness A's testimony was decisive," "ad spend is the driver") that the underlying data cannot support.
The structural core is geometric. Identifying N sources requires N independent dimensions of input variation; if the inputs span only K < N dimensions, only K joint contrasts are identified, and the remaining N − K dimensions of attribution lie in a null space where any assignment is consistent with the data. The joint fit lives in the column space, which the data constrain; the marginal attribution lives in the basis chosen within that space, which the data do not constrain. This holds for any procedure that decomposes an observed effect into contributions of correlated inputs — regression coefficients, feature-importance scores, a Bayesian posterior over input weights, or a tribunal's per-witness credibility assignment all inherit the same non-identifiability. The two things naive practice fuses — joint inferential strength and marginal identifiability of each input — are structurally distinct, and the failure is the silent collapse of the second while the first stays healthy.
How would you explain it like I'm…
Two Friends, One Project
Good Total, Shaky Blame
Strong Whole, Unstable Parts
Structural Signature¶
several information sources — shared variation across them — an estimator decomposing an observed effect into per-source contributions — the constrained joint inference (column space) — the unconstrained marginal attribution (null space within it) — the dimensionality invariant (N attributions need N independent dimensions)
The pattern is present when each of the following holds:
- Multiple sources. Several inputs — regressors, sensors, witnesses, raters, ensemble members, mediators — jointly bear on an observed effect.
- Shared variation. The sources do not vary independently; they share underlying variation (collinearity, common noise, common briefing, shared rubric or training data).
- A decomposing estimator. A formal or informal procedure attributes the effect to individual sources — coefficients, importance scores, posterior weights, per-witness credibility.
- Strong joint inference. The bundle's predictive content lives in the column space the data constrain, so the aggregate fit, fused estimate, or corroborated chorus stays healthy.
- Collapsed marginal identifiability. Per-source attribution lives in the basis chosen within that space, which the data do not constrain; it becomes unstable, sign-flipping on resampling, or arbitrary.
- The dimensionality invariant. Identifying N sources requires N independent dimensions of input variation; if the inputs span only K < N, the remaining N − K dimensions admit any assignment consistent with the data.
The components compose so that joint inferential strength and marginal identifiability — which naive practice fuses — come apart: a strong joint signal carries no information about whether the per-source decomposition is determined. The diagnostic redirects from "is my model wrong?" to "are my sources sharing variation?", and the warning sign is counterintuitive — confidence in the joint result is highest exactly when the marginal attributions have hollowed out.
What It Is Not¶
- Not correlation.
correlationis the measure of association between sources; correlated-source attribution failure is the consequence of that association — the collapse of per-source identifiability while joint inference stays strong. Correlation is the cause; the attribution failure is the structural effect. - Not responsibility attribution.
responsibility_attributionassigns moral or causal credit/blame to agents; this prime is the statistical non-identifiability of per-source contributions when inputs share variation, a measurement structure not a normative judgement. - Not triangulation.
triangulationcombines independent sources to converge on a truth; correlated-source attribution failure is what happens when the sources are not independent — the corroboration is illusory because the shared variation hollows out each source's marginal weight. - Not selection bias.
selection_biasdistorts inference through non-random sampling; correlated-source attribution failure leaves the joint inference intact and corrupts only the attribution to individual sources, a different locus. - Not regression to the mean.
regression_to_the_meanis the tendency of extreme measurements to be followed by less extreme ones; this prime concerns the dimensionality of source variation, unrelated to extremity dynamics. - Common misclassification. Reading a confident-looking joint fit (high R², corroborated chorus) as warrant for a sharp single-source claim. Catch it by checking whether the per-source attribution survives resampling or swaps signs, and whether the sources span enough independent dimensions; a strong joint result is uninformative about marginal identifiability.
Broad Use¶
- Regression and econometrics: multicollinearity, the canonical case — correlated regressors leave the joint fit intact while each individual coefficient becomes unstable, with sign flips on resampling.
- Multi-sensor fusion: correlated noise across redundant sensors yields false precision in the joint estimate while leaving each sensor's calibration unconstrained, because the shared error contributes nothing to disagreement.
- Multi-witness testimony: witnesses who share a source — a briefing, an article, a rumour — provide weaker independent evidence than naive aggregation implies; the joint testimony sounds corroborated while each per-witness contribution is thin.
- Rater aggregation: raters trained on the same rubric, or who see each other's work, produce correlated scores that overstate inter-rater reliability and leave each rater's marginal contribution unidentifiable.
- Ensemble forecasting: an ensemble of similarly trained models has effective size smaller than its count; per-model feature attributions are weaker than per-model importance scores suggest.
- Causal mediation: when candidate mediators are correlated, total mediation is well-identified but per-mediator contributions cannot be separately quantified.
- Analytical aggregation: multiple analysts drawing on one underlying source give an illusion of corroboration no stronger than the source they share.
Clarity¶
The label separates two things that naive practice fuses: the joint inferential strength (how well the bundle predicts) and the marginal identifiability of each input (how confidently any one input can be credited). When attribution becomes unstable, the prime redirects the diagnostic question from "is my model wrong?" — which it usually is not — to "are my sources sharing variation I have not acknowledged?" The clarifying force is to relocate the problem from the estimator to the source structure: the estimator is doing exactly what it should with the variation it is given, and the failure is that the variation does not span enough independent dimensions to support the per-source claims being read off it. Naming the pattern also exposes a recurring rhetorical trap — that a high joint R², a confident-sounding bundle, or a corroborated-feeling chorus is routinely cited as evidence for a single-source claim that the data's dimensionality cannot warrant. Once the distinction is named, the right move (interrogate source independence) becomes obvious where re-tuning the estimator would have been the reflexive, wasted response.
Manages Complexity¶
The pattern compresses the per-substrate vocabularies of multicollinearity, common-method bias, common-cause inflation, ensemble redundancy, and group-think corroboration into one shape with a small, portable intervention catalogue. Decorrelate by design: instrumentation, randomisation, blinding, source-separation protocols, independent sampling, multi-trait/multi-method designs. Quantify the shared variance: variance-inflation-factor diagnostics, MTMM matrices, effective-ensemble-size metrics, source-overlap audits. Shift the attribution target: move from per-input claims ("witness A says…") to joint-bundle claims ("the testimony taken together implies…"), trading attribution sharpness for honest uncertainty. Regularise with explicit priors: ridge, lasso, or hierarchical pooling that trade unbiased per-source attribution for stable, modest pooled estimates. The compression is that these are the same four moves wearing different domain names, so an analyst who has learned them in one substrate does not re-derive them in the next. Each move trades a specific currency — attribution sharpness for stability, or independence-by-design for upstream cost — and naming the trade lets the analyst choose deliberately rather than defaulting to the estimator-tuning that the joint-fit health falsely seems to license.
Abstract Reasoning¶
The pattern follows from a dimensionality fact: attribution to N sources is identified only if the inputs supply N independent dimensions of variation; if they supply only K, the unidentified N − K dimensions admit any assignment consistent with the data. The prime trains a reasoner to separate the column space (joint fit, constrained) from the basis within it (marginal attribution, unconstrained), and to recognise that a strong joint signal carries no information about whether the per-source decomposition is determined. The diagnostic questions are: how many genuinely independent dimensions do my sources span? Does the per-source attribution survive resampling, or does it swap signs? Is what I want to claim a joint-bundle property or a marginal one? The non-obvious move the prime licenses is to distrust per-source confidence precisely when the joint inference looks healthiest, because the health of the joint fit is exactly what masks the marginal collapse. The reasoning generalises beyond linear estimators to any decomposition of an observed effect into contributions of correlated inputs, which is why the same caution applies to feature attributions, posterior weight estimates, and human credibility judgements alike.
Knowledge Transfer¶
Once recognised in one substrate, the diagnostic ports without redesign. The lab scientist who learns to check sensor cross-correlation before quoting per-channel uncertainty has acquired the same move the econometrician makes when checking variance-inflation factors and the psychometrician makes when crossing trait with method; the auditor who insists two reviewers be blinded to each other applies the same intervention as the forecaster who selects models from disjoint training corpora and the survey designer who varies item wording to break common-method variance. The role mappings transfer directly — correlated sources ↔ regressors / sensors / witnesses / raters / ensemble members / mediators; shared variance ↔ collinearity / common noise / common briefing / shared rubric / shared training data; joint inference ↔ fit / fused estimate / corroborated testimony / reliability; marginal attribution ↔ coefficient / per-sensor calibration / per-witness credibility / per-feature importance. The intervention vocabulary — assess source independence, decorrelate by design, shift the attribution target, regularise — is invariant; the only substrate-specific work is identifying what counts as a source and what mechanism creates the shared variance. The transferred and non-obvious prediction is that a confident-looking joint result is the warning sign, not the reassurance: the better the bundle predicts while the per-source story stays contested, the more likely the sources share variation that has hollowed out the individual attributions. This is also what links the prime to wisdom-of-crowds reasoning, which presupposes the very source independence the prime warns is violated — the prime is the structural explanation for why naive aggregation of correlated sources underdelivers exactly when it looks most corroborated.
Examples¶
Formal/abstract¶
Multicollinearity in linear regression is the prime's canonical worked instance, and it exhibits the column-space/null-space split with full geometric rigour. The multiple sources are two regressors x₁ and x₂ predicting an outcome y; the decomposing estimator is ordinary least squares, which attributes y's variation to per-source coefficients β₁ and β₂. Suppose x₁ and x₂ are highly correlated — shared variation. The strong joint inference is real: the fitted plane predicts y well, the joint R² is high, and predictions for new cases that respect the x₁–x₂ correlation are tight, because the predictive content lives in the column space spanned by the regressors, which the data constrain. But the marginal attribution collapses. Geometrically, when x₁ and x₂ nearly coincide, the data pin down only their sum's effect (the one well-identified direction in the column space) and leave the contrast between them — the direction along which β₁ rises as β₂ falls with the fit essentially unchanged — almost entirely unconstrained. This is the null space within the column space: any (β₁, β₂) trading off along the contrast is nearly equally consistent with the data, so the individual coefficients have enormous standard errors and flip sign on resampling. The dimensionality invariant states it exactly: identifying two coefficients requires two independent dimensions of input variation; correlated regressors supply effectively one, so one dimension of attribution lies in a null space admitting any assignment. The prime's counterintuitive diagnostic bites here — confidence in the joint fit is highest exactly when the marginal attributions have hollowed out, so a high R² is the warning sign, not the reassurance. The correct move is to interrogate the source structure (compute variance-inflation factors, check x₁–x₂ correlation) rather than re-tune the estimator, which is doing exactly what it should with the variation it was given.
Mapped back: Multicollinearity instantiates every role of the signature — correlated regressors as sources, OLS as the decomposing estimator, a constrained joint fit (column space), unconstrained sign-flipping coefficients (null space within it), and the dimensionality invariant — and shows the prime's central separation of joint inferential strength from marginal identifiability as a literal geometric fact.
Applied/industry¶
Multi-witness testimony in a legal tribunal and multi-sensor fusion in an engineered system are the same correlated-source-attribution-failure object on a forensic and an instrumentation substrate, and reading both through the prime redirects the diagnosis from the estimator to the source structure. In the testimony case the sources are several witnesses; the decomposing estimator is the tribunal's informal per-witness credibility assignment; and the shared variation arises when the witnesses drew on a common origin — the same briefing, the same news article, the same overheard rumour. The joint inference sounds powerful: a chorus of corroborating accounts feels like strong evidence. But because the accounts share their source, the marginal contribution of each is thin, and the bundle carries no more independent weight than the single shared origin — yet a naive fact-finder reads a sharp single-witness claim ("witness A's account was decisive") the dimensionality cannot support. The prime's intervention catalogue maps directly: decorrelate by design (sequester witnesses, take statements before they confer), quantify the shared variance (audit for a common source), and shift the attribution target from per-witness claims to an honest joint-bundle claim that discounts the shared origin. In the sensor case the sources are redundant sensors; the estimator is a fusion algorithm producing a joint estimate plus per-sensor calibration; and correlated noise across the sensors (a shared power supply, a common environmental disturbance) is the shared variation. The fusion yields false precision — the joint estimate looks tight because shared error contributes nothing to disagreement — while each sensor's individual calibration stays unconstrained. The same moves apply: decorrelate by design (independent power, diverse sensing modalities), quantify the shared variance (cross-correlation diagnostics before quoting per-channel uncertainty), and shift the target to joint claims. A lab engineer who has learned to check sensor cross-correlation before trusting per-channel calibration is making the identical move as the tribunal that sequesters witnesses before crediting any one.
Mapped back: Multi-witness corroboration and multi-sensor fusion are the same structural object as multicollinearity — correlated sources whose shared variation leaves a strong joint inference intact while hollowing out per-source attribution — so in each the warning sign is a confident-looking joint result, and the fix is to interrogate and break source independence rather than re-tune the estimator.
Structural Tensions¶
T1 — Joint Inference versus Marginal Attribution (Scopal). The prime's whole force is splitting these two, but the split is itself the standing hazard: a result that is rock-solid as a joint-bundle claim is read as a per-source claim it cannot support, and conversely a genuinely identified marginal effect is dismissed because "everything's collinear." The failure mode is mismatching the claim's scope to what the data constrain. Diagnostic: ask whether the assertion is about the bundle's prediction (constrained, often fine) or about one source's contribution (unconstrained under sharing); the same dataset licenses the former and forbids the latter, and conflating them is the error in both directions.
T2 — Confident Joint Fit as Reassurance versus Warning (Sign/Direction). The prime inverts the naive reading — high joint R² is the warning sign, not comfort. But over-applied, this breeds reflexive distrust of every strong joint result as secretly hollow. The failure mode is treating predictive strength as evidence of attribution collapse, when many strong joint fits rest on genuinely independent sources. Diagnostic: the joint fit's health is simply uninformative about marginal identifiability, not negatively correlated with it; check source independence directly (VIFs, cross-correlation) rather than inferring collapse from confidence, since a strong fit with independent inputs is exactly as trustworthy as it looks.
T3 — Decorrelate-by-Design versus Destroying the Signal (Coupling). The headline fix — make sources vary independently — can remove the very co-variation that carried the joint signal, or be impossible because the sources are correlated for a real reason (the witnesses saw the same event; the sensors track one true state). The failure mode is forcing independence that fractures a genuine common cause, trading non-identifiability for a model of something that no longer exists. Diagnostic: ask whether the shared variation is nuisance (common briefing, shared rubric — break it) or signal (a real common cause the sources correctly reflect — model it); where it is signal, the move is to model the shared factor explicitly, not to decorrelate it away.
T4 — Dimensionality Invariant versus Effective Dimensionality (Measurement). The clean rule — N attributions need N independent dimensions — treats independence as binary, but real sources are partially correlated, occupying a fractional effective dimensionality that shifts with the sample. The failure mode is declaring attribution "identified" because the inputs are nominally distinct while their effective rank is far below their count, so the marginal estimates are fragile without being formally non-identified. Diagnostic: measure effective dimensionality (condition number, effective sample size, eigenvalue spread), not just whether sources are technically distinct; near-collinearity hollows attribution continuously, and the binary invariant misses the dangerous middle where estimates are unstable but not provably unidentified.
T5 — Regularisation as Cure versus Hidden Bias (Sign/Evaluation). Shifting to ridge, lasso, or hierarchical pooling stabilises the per-source estimates — but it does so by imposing a prior that picks a particular point in the null space the data left open, trading honest "we can't tell" for a confident, biased answer. The failure mode is reading a regularised coefficient as if the data identified it, when the stability came from the penalty, not the evidence. Diagnostic: ask whether the stable estimate reflects the data or the prior; under collinearity the regulariser is choosing the attribution, so a lasso that zeroes one of two correlated sources has made an arbitrary selection, and the apparent resolution must be reported as prior-driven, not data-driven.
T6 — Static Source Structure versus Drifting Correlation (Temporal). The diagnosis is usually run once, but source correlation drifts — regressors that were independent become collinear as a regime changes, raters who were blinded begin conferring, sensors that diverged start sharing a developing fault. The failure mode is certifying attribution as identified at setup, then reading per-source claims long after the sources quietly began sharing variation. Diagnostic: re-audit source independence on the timescale the correlation can change, not just at design; an attribution model validated under independent inputs silently hollows out when the sources drift into sharing, and feedback monitoring of the correlation structure, not a one-time VIF check, keeps the marginal claims honest.
Structural–Framed Character¶
Correlated-Source Attribution Failure sits on the structural side of the structural–framed spectrum — mixed-structural, with an aggregate of 0.3, the structural geometry clearly dominating. At its core is a dimensionality fact: identifying N sources requires N independent dimensions of input variation, so when sources share variation the joint inference (column space) stays constrained while marginal attribution (the null space within it) collapses. Two diagnostics read fully structural and three carry partial or zero weight that reflects the prime's home in estimation practice.
Two criteria read clean structural. It carries no inherent approval or disapproval (0.0): the non-identifiability is a value-neutral geometric fact, not a normative judgment — the prime is explicit that the estimator "is doing exactly what it should with the variation it is given." And its institutional_origin is 0.0: the column-space/null-space split is a property of linear algebra and statistics with no institutional minting; it holds wherever a decomposition meets correlated inputs.
Three criteria carry partial weight, which holds the aggregate at 0.3. The vocabulary travels only partway (0.5): "column space," "null space," "marginal identifiability," "variance inflation" port across regression, sensor fusion, multi-witness testimony, and ensembles, but an estimation lexicon comes along. The human_practice_bound and import_vs_recognize scores are partial (0.5 each) because the pattern presupposes an estimator or analyst — a practice of decomposing an observed effect into per-source contributions; the geometry is substrate-free, but it manifests as a failure only where someone is attempting attribution, so invoking it imports a modicum of that estimation frame. The honest reading is that the dimensionality skeleton is genuinely substrate-free — recurring identically across regression, fusion, testimony, and rating — which is why the aggregate sits low at 0.3; but it lives wherever an estimator does, giving it the mild estimation-practice framing the grade records.
Substrate Independence¶
Correlated Source Attribution Failure is a strongly substrate-independent prime — composite 4 / 5 on the substrate-independence scale. Its domain breadth is broad: the multicollinearity-style identifiability collapse, in which correlated sources cannot be told apart so credit or blame cannot be uniquely assigned, recurs in regression (collinear predictors), sensor fusion (correlated sensor errors), multi-witness testimony (witnesses who share a source), inter-rater agreement, model ensembles (correlated base learners), causal mediation analysis, and intelligence aggregation (reports tracing to one origin). Its structural abstraction is genuine: the signature — multiple sources, hidden correlation among them, and the resulting collapse of unique attribution — is stated relationally, without domain-specific commitments. What holds the composite below ceiling is that the instances cluster on an estimation-and-inference substrate: the failure has bite wherever one is trying to attribute a quantity to distinguishable sources, so the prime leans on measurement and inference settings rather than spanning physical or biological media indifferently. That estimation-cluster centre of gravity is what keeps domain breadth, structural abstraction, and transfer evidence each at a solid 4 and fixes the composite at a strong 4.
- Composite substrate independence — 4 / 5
- Domain breadth — 4 / 5
- Structural abstraction — 4 / 5
- Transfer evidence — 4 / 5
Relationships to Other Primes¶
Parents (2) — more general patterns this builds on
-
Correlated-Source Attribution Failure is a kind of Confounding
The island's canonical hub correlation is itself detached, so the bridge is drawn to a GIANT prime: correlated_source_attribution_failure (cand) is the failure where a hidden COMMON DRIVER couples supposedly-independent sources, hollowing out per-source attribution — structurally a confounding situation (a shared latent cause inducing spurious cross-dependence). confounding is canonical and giant. child_of confounding is directionally sound (this is confounding read as an attribution/ dimensionality failure) and bridges the cluster; cross_dimensional_leakage (single shared-variance source inflating apparent cross-covariance) is its near-twin and stays internal. Medium because the prime adds an attribution/identifiability angle beyond bare confounding rather than being a textbook sub-case.
-
Correlated-Source Attribution Failure presupposes Correlation
The downstream estimation PATHOLOGY that correlation produces: when sources share variation, joint inference stays strong while per-source attribution collapses into a null space. Presupposes correlation as its cause; the file is explicit that 'correlation is the cause, the attribution failure is the structural effect'.
Path to root: Correlated-Source Attribution Failure → Correlation
Neighborhood in Abstraction Space¶
Correlated-Source Attribution Failure sits among the more crowded primes in the catalog (18th percentile for distinctiveness): several abstractions describe nearly the same structure, so a description that fits it will tend to fit its neighbors too — transporting it usually means disambiguating within this family rather than landing on it exactly.
Family — Unclustered & Miscellaneous (91 primes)
Nearest neighbors
- Correlation — 0.76
- Diversification — 0.75
- Statistical Inference — 0.73
- Confounding — 0.73
- Modifiable Areal Unit Problem — 0.72
Computed from structural-signature embeddings · 2026-06-14
Not to Be Confused With¶
The nearest neighbour is correlation, and the two are related as cause to structural consequence. Correlation is the bare statistical fact that two or more sources co-vary — a property of the data measurable on its own. Correlated-source attribution failure is the downstream estimation pathology that this co-variation produces: when sources share variation, the joint inference stays strong (the column space the data constrain) while the marginal attribution to each source collapses into a null space where any assignment is consistent with the data. Correlation is necessary for the failure but is not the failure; one can observe correlation and reason about it correctly without ever attempting the per-source decomposition that the failure corrupts. The prime's content is the dimensionality argument — that identifying N sources requires N independent dimensions of input variation, and that shared variation hollows out attribution while leaving prediction intact — which bare correlation does not supply. A practitioner who stops at "the sources are correlated" notices the association; the attribution-failure prime tells them the precise, counterintuitive consequence — that their confident joint fit is exactly the warning sign that their per-source story has become unidentifiable.
Correlated-source attribution failure is also distinct from triangulation, with which it stands in near-opposition. Triangulation is the methodological strategy of combining independent sources, methods, or vantage points so that their convergence on a conclusion increases confidence — its entire validity rests on the sources being independent. Correlated-source attribution failure is precisely what happens when that independence assumption fails: the sources appear to corroborate, but because they share variation the apparent convergence carries no more weight than the single shared origin, and a chorus of agreeing witnesses who all read the same article is not triangulation but its illusion. The prime is, in effect, the structural explanation for why naive triangulation under-delivers exactly when it looks most corroborated. A practitioner who believes they are triangulating when the sources are correlated will over-trust a conclusion that rests on one shared input wearing several costumes; recognising the attribution failure is what flags the corroboration as hollow and redirects effort to interrogating and breaking source independence.
A third confusion is with selection_bias, because both are inference pathologies and both can produce misleading conclusions from data. But they corrupt different loci. Selection bias distorts inference through non-random sampling — the data that enter the analysis are systematically unrepresentative, threatening the validity of the conclusion itself. Correlated-source attribution failure leaves the joint inference perfectly valid (the bundle predicts well, the conclusion about the aggregate may be sound) and corrupts only the attribution to individual sources. The discriminating question is whether the problem is which data got in (selection) or how a valid joint signal is decomposed across correlated inputs (attribution failure). A practitioner who diagnoses an unstable coefficient as selection bias will scrutinise the sampling frame, when the joint fit is fine and the real issue is that the regressors share variation — a different problem requiring source-independence diagnostics (variance-inflation factors, effective dimensionality), not sampling correction.
For practitioners these distinctions decide the remedy. Stop at correlation and you note the association without grasping its attribution consequence. Mistake the situation for triangulation and you trust illusory corroboration. Mistake it for selection bias and you audit the sample when the sample was fine. Naming correlated-source attribution failure correctly relocates the problem from the estimator to the source structure, and points to the right move: assess and break source independence rather than re-tune the model, with the standing reminder that a confident-looking joint result is the warning sign, not the reassurance.
Solution Archetypes¶
No catalogued solution archetypes reference this prime yet.