Partition Dependence of Aggregates

Prime #: 1054
Origin domain: Mathematics And Formal Systems
Subdomain: aggregation and coarse graining → Mathematics And Formal Systems

Core Idea

Partition dependence of aggregates is the structural commitment that any statistic computed on partition-aggregated fine-grained data is a function of the partition itself, not solely of the underlying data. Whenever an analytical pipeline collapses many point-level observations into a smaller number of partition-defined units and computes statistics on those units, the partition is a non-neutral input to the result. Repartitioning the same data shuffles variation between within-unit and between-unit components, and every observable defined on the partition — means, correlations, regression coefficients, inequality indices, modular structures, trend slopes — moves accordingly. No partition is statistically privileged a priori; the analyst's choice of partition is a free parameter that enters the output as much as the data does.

The structural force is geometric and substrate-independent. Given a base measure space and a partition into cells, the coarsened variable (the conditional expectation given the partition) collapses within-cell variation and preserves between-cell variation, so any statistic computed on the coarsened variable is sensitive to where the cell boundaries fall and how many cells there are. This is not measurement error, not sampling artifact, not model misspecification — it is intrinsic to the aggregation step. The analyst who treats partition-dependent results as if they were partition-free findings about the underlying data has silently substituted a property of the partition for a property of the world. The pattern decomposes into two recognisable effects: a scale effect (results change when the aggregation level changes — finer or coarser bins, smaller or larger units, shorter or longer time windows) and a zoning effect (results change when units of the same scale are drawn with different boundaries). In its most dramatic form the partition reverses the sign of an inferred relationship (Simpson's paradox); in its generic form it produces quantitative partition-sensitivity even without sign reversal.
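
As a minimal sketch of the scale and zoning effects, the Python below aggregates a hypothetical six-household dataset and computes the correlation of cell means under three partitions. The `income` and `disease` values, the cell layouts, and all function names are illustrative assumptions, not taken from any real study.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / sqrt(saa * sbb)

def aggregate(values, cells):
    """Replace each cell (a list of indices) by the mean of its members."""
    return [sum(values[i] for i in cell) / len(cell) for cell in cells]

def corr_on_partition(x, y, cells):
    """Correlation computed on the coarsened (cell-mean) variables."""
    return pearson(aggregate(x, cells), aggregate(y, cells))

# Hypothetical data: six households listed in street order.
income = [1, 5, 2, 6, 3, 7]
disease = [5, 1, 6, 2, 7, 3]

point_level = [[i] for i in range(6)]      # no aggregation
pairs = [[0, 1], [2, 3], [4, 5]]           # three cells
shifted = [[0], [1, 2], [3, 4, 5]]         # three cells, boundaries moved

r_point = corr_on_partition(income, disease, point_level)
r_pairs = corr_on_partition(income, disease, pairs)
r_shifted = corr_on_partition(income, disease, shifted)

print(f"point level:     {r_point:+.2f}")    # about -0.71
print(f"pairs:           {r_pairs:+.2f}")    # +1.00
print(f"shifted borders: {r_shifted:+.2f}")  # about -0.72
```

The point-level and paired values differ in sign (a scale effect), and the two three-cell layouts differ in sign as well (a zoning effect), although no observation changed.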

How would you explain it like I'm…

The Sorting-Boxes Trick

Imagine you have a big pile of marbles and you sort them into boxes, then count the average in each box. If you sort them into different boxes, the averages come out different even though you never added or took away a single marble. So the boxes you pick change the answer, not just the marbles.

Grouping Changes The Answer

When you have lots of little pieces of data and you bunch them into groups, then measure things like averages or trends, your answer depends on how you drew the groups. Make the groups bigger or smaller, or slide where the lines between them fall, and the numbers shift. Nobody added new information, you just regrouped the same stuff. So whenever someone groups data first and measures second, the grouping is secretly part of the answer.

Partition-Dependent Statistics

Partition Dependence of Aggregates says that any number you compute after lumping fine-grained data into groups is partly a fact about the groups, not purely a fact about the data. Regrouping the same data moves variation between the within-group part and the between-group part, so means, correlations, regression slopes, and inequality measures all shift. This is not measurement error or a sampling fluke, it is built into the act of lumping. Two effects show up: a scale effect when you change how big the groups are, and a zoning effect when you redraw boundaries at the same size. In its most dramatic form a relationship can even flip sign, which is Simpson's paradox.

Partition Dependence of Aggregates is the structural claim that any statistic computed on partition-aggregated data is a function of the partition itself, not solely of the underlying observations. When a pipeline collapses point-level data into a smaller set of partition-defined units and computes on those units, the partition is a non-neutral input. The mechanism is geometric: the coarsened variable is the conditional expectation given the partition, which discards within-cell variation and keeps between-cell variation, so any statistic on it is sensitive to where the cell boundaries fall and how many cells there are. No partition is statistically privileged a priori, so the analyst's choice is a free parameter entering the output as much as the data does. The pattern splits into a scale effect, where results change with aggregation level, and a zoning effect, where results change as same-size units are drawn with different boundaries. Treating partition-dependent results as partition-free findings silently substitutes a property of the partition for a property of the world. Its sharpest form is sign reversal under Simpson's paradox; its generic form is quantitative partition-sensitivity without reversal.

Structural Signature

the fine-grained base data — the imposed partition into cells — the aggregation (coarse-graining) step — the within-versus-between variance split the partition fixes — the observable computed on the coarsened units — the partition-as-free-parameter invariant (scale effect and zoning effect)

A result exhibits partition dependence of aggregates when each of the following holds:

Fine-grained base data. There is an underlying set of point-level observations on a base measure space, prior to any grouping.
An imposed partition. The data is divided into cells — areal units, time windows, histogram bins, clusters, categories, districts — by a partition that is an analyst's choice, not a property of the data.
An aggregation step. Many point-level observations are collapsed into a smaller number of partition-defined units, replacing each cell's contents with a summary (the conditional expectation given the partition).
A variance split fixed by the partition. Coarse-graining collapses within-cell variation and preserves between-cell variation; the partition determines how total variation divides into within and between components. This split is the formal engine of the dependence.
An observable on the coarsened units. A statistic — mean, correlation, regression coefficient, inequality index, modularity, trend slope — is computed on the aggregated units.
A partition-as-free-parameter invariant. The observable is a function of the partition as well as the data: it moves under a scale effect (changing the number/size of cells) and a zoning effect (redrawing boundaries at the same scale), up to and including sign reversal (Simpson's paradox).

The dependence is intrinsic to the aggregation step — not measurement error, sampling artifact, or misspecification. Treating a partition-dependent result as a partition-free finding silently substitutes a property of the partition for a property of the world; the matched response is variance decomposition plus sensitivity analysis across the refinement lattice of alternative partitions.
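
In symbols, the variance split that the partition fixes is the law of total variance. The notation is an assumption introduced here for illustration: $X$ is the fine-grained variable, $\mathcal{P}$ is a partition whose cells $c$ carry weights $w_c$, $\mu_c$ and $\sigma_c^2$ are the mean and variance of $X$ within cell $c$, and $\mu$ is the overall mean.

```latex
\operatorname{Var}(X)
  = \underbrace{\sum_{c \in \mathcal{P}} w_c \, \sigma_c^2}_{\text{within-cell}}
  + \underbrace{\sum_{c \in \mathcal{P}} w_c \, (\mu_c - \mu)^2}_{\text{between-cell}},
\qquad \sum_{c \in \mathcal{P}} w_c = 1 .
```

The left-hand side does not depend on $\mathcal{P}$; only the division between the two terms does. A statistic computed on the coarsened variable $\mathbb{E}[X \mid \mathcal{P}]$ sees only the second term, which is why it moves when the partition moves.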

What It Is Not

Not aggregation. Aggregation is the operation of summarising many observations into fewer. This prime is the structural consequence — that the operation's output depends on how the partition is drawn. The operation is the verb; the prime is a fact about the verb's output.
Not the modifiable_areal_unit_problem. MAUP is the spatial-specific child; this prime is the substrate-general umbrella covering time windows, histogram bins, network resolution, clustering, and price baskets, not only areal units.
Not simpsons_paradox. Simpson's paradox is the sign-reversal limiting case; this prime covers quantitative partition-sensitivity even when no sign reversal occurs.
Not the ecological fallacy. The ecological fallacy is the downstream inferential trap of reading individual-level relationships off aggregated statistics; this prime is the upstream structural fact about the partition that makes that trap possible.
Not selection_bias or sampling artifact. Those concern which data entered; partition dependence is intrinsic to how the data, once entered, is grouped — it persists in a perfect census with no sampling at all.
Common misclassification. Reporting a partition-dependent statistic as a partition-free finding about the world ("income predicts disease"). If the correlation flips across census tracts, counties, and ZIP codes, the headline describes the lines drawn, not the data.

Broad Use

The partition-as-co-determinant shape recurs wherever fine-grained data is aggregated and statistics are computed on the aggregated units. In spatial statistics and geography, the modifiable areal unit problem is the canonical demonstration: census-tract, ZIP-code, school-district, and watershed analyses yield substantially different correlations and inequality indices on identical point-level data. In time-series and macroeconomics, the choice of accounting period (quarterly versus annual, fiscal versus calendar) changes apparent volatility, growth rates, and regression significance. In histograms and density estimation, the same continuous data displays as unimodal, bimodal, or skewed under different bin widths or kernel bandwidths. The pattern recurs in network science (modular structure is partition-dependent, with resolution-parameterised community detection exposing different module sets on the same graph), in causal inference (Simpson's paradox as the sign-reversal symptom of collapsing or un-collapsing a covariate), in political districting (gerrymandering as deliberate exploitation of the zoning effect, identical voter distributions yielding different outcomes under different maps), in epidemiology (disease rates and exposure correlations depending on the geographic units and time windows), in price indices (the same prices yielding different inflation rates under different basket weightings), in machine-learning clustering (the same dataset yielding different clusters under different K, distance metrics, or seeds), in cognitive categorisation (frequency and similarity judgments depending on category granularity), and in accounting (cost-allocation rules and reporting-period length changing reported performance). The pattern shows up across most of applied statistics, much of empirical social science, all of spatial and temporal analysis, and any process that builds categories from continuous reality.
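
The histogram case can be illustrated with a small sketch. The data values and the helper names `histogram` and `count_peaks` are hypothetical, and counting strict local maxima is only one rough way to describe a histogram's shape.

```python
from collections import Counter

def histogram(data, width, origin=0):
    """Counts per bin; bin k covers [origin + k*width, origin + (k+1)*width)."""
    counts = Counter((v - origin) // width for v in data)
    lo, hi = min(counts), max(counts)
    return [counts.get(k, 0) for k in range(lo, hi + 1)]

def count_peaks(counts):
    """Number of strict local maxima, treating bins beyond the ends as empty."""
    padded = [0] + counts + [0]
    return sum(
        padded[i - 1] < padded[i] > padded[i + 1]
        for i in range(1, len(padded) - 1)
    )

data = [1, 1, 1, 2, 3, 3, 3, 4]  # hypothetical observations

fine = histogram(data, width=1)               # [3, 1, 3, 1]
coarse = histogram(data, width=2)             # [3, 4, 1]
shifted = histogram(data, width=2, origin=1)  # [4, 4]

for name, counts in [("width 1", fine), ("width 2", coarse),
                     ("width 2, origin 1", shifted)]:
    print(f"{name:>18}: {counts} -> {count_peaks(counts)} peak(s)")
```

With these values the same eight observations read as two-peaked at width 1, single-peaked at width 2, and flat when the width-2 bins start at 1 instead of 0.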

Clarity

Partition dependence clarifies by separating properties of the data from properties of the partition. Conventional reporting tends to present partition-aggregated statistics as if they described the underlying phenomenon; the prime makes the partition visible as a co-determinant of the result, on equal analytical footing with the data. Once named, the analyst's report acquires a new obligation: which partition did I use, why, and how do my conclusions move under plausible alternatives? This converts an implicit and unexamined modelling choice into an explicit and defensible one.

The prime also reveals the structural unity of a family of phenomena that have traditionally been catalogued separately: the modifiable areal unit problem in geography, Simpson's paradox in epidemiology, periodisation effects in macroeconomics, resolution limits in network science, bin-choice in histograms, K-choice in clustering, basket-weight choice in price indices, gerrymandering in politics. These are all instances of the same structural pattern, and the substrate-specific labels obscure the unity; naming the umbrella makes it visible that one diagnostic applies in every domain and does not need a separate name in each. The prime is also carefully distinguished from its neighbours. It is not aggregation, the operation of summarising many observations into fewer; it is the structural consequence that the operation's output depends on how the partition is drawn. The operation is the verb; the prime is the structural fact about the verb's output. It is the umbrella over the spatial-specific modifiable areal unit problem and over the sign-reversal-specific Simpson's paradox, covering quantitative partition-sensitivity even when no sign reversal occurs. And it is distinct from the downstream ecological fallacy (the inferential trap of reading point-level relationships off aggregated statistics), which is the trap the prime's upstream structural fact makes possible. Holding these apart keeps the prime from being mistaken for the operation, for one of its instances, or for its downstream consequence.

Manages Complexity

The prime compresses a sprawling family of "watch out for X" methodological warnings into a single diagnostic checklist that ports across domains:

Is there an aggregation step in the pipeline?
What partition does it use: scale, boundaries, bin widths, time windows, category definitions?
Is the partition chosen substantively (theory-driven, prior to seeing results) or empirically (chosen to produce a target result)?
What is the sensitivity of the reported conclusions to plausible alternative partitions?
Does the substantive interpretation survive at the limit of point-level data, or only at the chosen partition?

A single checklist replaces a long list of domain-specific gotchas (gerrymandering, bin-choice, period-choice, cluster-K, resolution-parameter, basket-weight) with one structural understanding and one mitigation pattern: sensitivity analysis across alternative partitions, plus explicit substantive justification for the partition chosen.

The compression is operational because the prime sits at a clean formal location that supplies the reasoning tools. Any statistic defined on the partition-coarsened representation is a function of the partition as well as the data, and the total variation in the underlying variable splits into within-partition and between-partition components whose split the partition determines — so any partition-dependent statistic is essentially reading that split, and making the variance decomposition explicit is the first managing move. For some statistics one can compute the range of values taken across all partitions at a given scale, bounding what the data permits and treating the chosen partition's value as a point in that range. The choice of partition can be read as a Bayesian prior about which sources of variation are signal and which are noise, surfacing an implicit commitment. And where point-level data is available, computing the statistic at the point-level limit and comparing it to the aggregated values is informative. Because these moves act on a single formal object, the prime turns an open-ended methodological worry into a bounded set of decompositions and sensitivity analyses.
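
For a small dataset, the range a statistic takes across all partitions at a given scale can be enumerated directly. The sketch below assumes that "a given scale" means a fixed number of contiguous cells along one dimension, which is only one possible reading; the data and every function name are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation, or None when either input has zero variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return None
    return sab / sqrt(saa * sbb)

def contiguous_partitions(n, k):
    """Yield every split of range(n) into k contiguous, non-empty cells."""
    for cuts in combinations(range(1, n), k - 1):
        edges = (0, *cuts, n)
        yield [list(range(lo, hi)) for lo, hi in zip(edges, edges[1:])]

def cell_means(values, cells):
    """Mean of the values in each cell."""
    return [sum(values[i] for i in cell) / len(cell) for cell in cells]

# Hypothetical data: six units in a fixed order.
income = [1, 5, 2, 6, 3, 7]
disease = [5, 1, 6, 2, 7, 3]

results = []
for cells in contiguous_partitions(len(income), 3):
    r = pearson(cell_means(income, cells), cell_means(disease, cells))
    if r is not None:  # skip partitions where the statistic is undefined
        results.append((r, cells))

lowest = min(results, key=lambda item: item[0])
highest = max(results, key=lambda item: item[0])
print(f"{len(results)} partitions")
print(f"min r = {lowest[0]:+.2f} at {lowest[1]}")
print(f"max r = {highest[0]:+.2f} at {highest[1]}")
```

On this toy data the ten three-cell partitions span the full interval from -1 to +1, so a single reported correlation would say more about the chosen boundaries than about the six observations. Exhaustive enumeration grows combinatorially, so larger problems would need sampling or analytical bounds instead.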

Abstract Reasoning

Partition dependence trains a reasoner to interrogate any aggregated statistic through the role of the partition. The reasoner asks: where in the pipeline does aggregation occur, and what partition does it impose? How does the total variation split between within-cell and between-cell components under this partition? And how would the statistic move under plausible alternative partitions at the same and different scales? Because these questions reference only the abstract roles — fine-grained base data, partition, aggregation step, observable — they apply to a spatial analysis, a time series, a network, or a clustering without translation, and the same reasoning that handles the modifiable areal unit problem handles periodisation effects and resolution limits.

Several reusable moves follow. The variance-decomposition move treats any partition-dependent statistic as a reading of the within-versus-between split, so the reasoner makes the split explicit rather than treating the statistic as a property of the data alone. The sensitivity-bound move treats the chosen partition's value as one point in the range the data permits across partitions at a given scale, so the reasoner reports a range rather than a single anchored point estimate. The partition-as-prior move surfaces the implicit commitment about which variation is signal, so the reasoner can defend or revise it. The refinement-lattice move organises sensitivity analyses by the lattice of partitions ordered by refinement, since coarsening averages within-cell variation away while refining preserves it. And the inversion move compares the aggregated statistic to its point-level limit where point-level data exists: divergence is informative and agreement is reassuring, and the comparison is the formal test of whether an aggregate relationship can be read back onto individuals (the ecological-fallacy inversion). The same reasoning that tells a geographer to test correlations across spatial units tells a macroeconomist to test a returns series across accounting periods, because both are reasoning about a statistic as a function of the partition.

Knowledge Transfer

The transferable content of the prime is the partition–data separation together with the sensitivity-analysis intervention family. Wherever fine-grained data is aggregated for analysis, the prime applies and the same intervention family ports with minor substrate-specific adaptation: report the partition explicitly, justify the choice substantively rather than by the result it produces, test sensitivity to alternatives, and present the partition-uncertainty alongside the data-uncertainty rather than a single anchored point estimate. The transfer has moved historically from spatial statistics into epidemiology (cluster detection, rate comparison across administrative levels), network science (resolution-limit literature, multi-resolution community detection), causal inference (Simpson's paradox, the distinction between conditioning on and marginalising over partition variables), machine learning (clustering stability, ensemble clustering, consensus partitions), macroeconometrics (time-aggregation bias, temporal disaggregation), and political science (mathematical redistricting, compactness criteria).

The transfer is deep because the intervention family is recognisably the same in each substrate. A public-health agency studying neighbourhood income and chronic-disease prevalence makes the partition-dependence concrete: aggregating individual records to census tracts yields a strongly negative income–prevalence correlation, to counties a weakly positive one, to ZIP codes a near-zero one, while the point-level relationship is moderately negative. The report can therefore support markedly different policy conclusions depending on which spatial unit it uses. The diagnostic identifies all of these as the same data viewed through different partitions: the scale effect operates because finer partitions preserve within-county heterogeneity that county aggregation smooths away, and the zoning effect operates because different boundary definitions at the same scale allocate variance differently. The identical structural pattern appears in a macroeconomist's choice of accounting period (the same returns series showing mean reversion at one period length and momentum at another), a network scientist's choice of resolution parameter (four communities at one resolution and twenty at another), a redistricting commission's choice of boundaries (different partisan outcomes from the same voter distribution), and a clustering analyst's choice of K (different segment structures from the same database). Because the intervention family (report, justify, sensitivity-test, present the range) is substrate-neutral, a practitioner who has applied it in one domain can apply it in another on first contact, and the strip-the-jargon form ("when you aggregate data by drawing lines around it, what you see depends on where the lines are") does load-bearing work across geography, epidemiology, macroeconomics, network science, causal inference, clustering, gerrymandering, price-index construction, accounting, and cognitive categorisation.

Examples

Formal/abstract

Simpson's paradox is the pattern in its sharpest, sign-reversing form, and it isolates the variance-split engine exactly. Take fine-grained base data on individuals, each with a treatment indicator, an outcome, and a disease-severity covariate. Within each severity stratum, the treatment is associated with a better outcome. Now impose a coarser partition by collapsing the confounding covariate: aggregate over disease severity so that mild and severe cases are pooled into one cell. The aggregation step replaces each cell's contents with its conditional mean, and the variance split fixed by the partition now mixes a between-group difference (severe cases both received the treatment more often and had worse outcomes) into the treatment-outcome observable. The result is a sign reversal: the aggregated correlation shows the treatment associated with a worse outcome, the exact opposite of the within-group relationship. This is not measurement error or a bad sample. It is the intrinsic consequence of the partition, demonstrable by the algebra of conditional expectations (the law of total covariance): the marginal association is a severity-weighted blend of the conditional associations plus a term driven entirely by how treatment and outcome each vary between the pooled groups. The partition-as-free-parameter invariant is starkly visible here because un-collapsing the covariate (refining the partition to separate severity strata) restores the within-stratum direction. The matched response is the prime's: decompose the variance into within-stratum and between-stratum components, and compare the aggregated statistic against its finer-grained, stratified counterpart, where divergence flags exactly this trap.

Mapped back: The individual records are the fine-grained base data, pooling over severity is the imposed partition, the severity-weighted blend is the variance split the partition fixes, and the reversed correlation is the observable on coarsened units — partition dependence in its sign-reversing limiting case, with the within-versus-between split as the formal engine.
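
The reversal and the covariance split behind it can be checked on a small table of counts. The numbers below are hypothetical, chosen only so that severe cases are treated more often and recover less often; the function names are likewise illustrative. Exact fractions are used so the identity holds without rounding.

```python
from fractions import Fraction as F

# Hypothetical counts: (severity, treated?) -> (patients, recovered)
counts = {
    ("mild", True): (10, 9),
    ("mild", False): (90, 72),
    ("severe", True): (90, 45),
    ("severe", False): (10, 4),
}
strata = ["mild", "severe"]

def moments(keep):
    """Return (n, E[T], E[Y], E[TY]) over the strata in `keep`; T, Y are 0/1."""
    cells = [(t, p, r) for (s, t), (p, r) in counts.items() if s in keep]
    n = sum(p for _, p, _ in cells)
    e_t = F(sum(p for t, p, _ in cells if t), n)
    e_y = F(sum(r for _, _, r in cells), n)
    e_ty = F(sum(r for t, _, r in cells if t), n)
    return n, e_t, e_y, e_ty

def risk_difference(keep):
    """Recovery rate of treated minus untreated patients."""
    _, e_t, e_y, e_ty = moments(keep)
    return e_ty / e_t - (e_y - e_ty) / (1 - e_t)

def covariance(keep):
    """Cov(T, Y) over the strata in `keep`."""
    _, e_t, e_y, e_ty = moments(keep)
    return e_ty - e_t * e_y

n_all, t_all, y_all, _ = moments(strata)

# Severity-weighted blend of the within-stratum covariances.
within = sum(F(moments([s])[0], n_all) * covariance([s]) for s in strata)
# Covariance of the stratum means of T and Y (the between-stratum term).
between = sum(
    F(moments([s])[0], n_all)
    * (moments([s])[1] - t_all)
    * (moments([s])[2] - y_all)
    for s in strata
)
total = covariance(strata)

for s in strata:
    print(f"{s:>6}: treated - untreated = {risk_difference([s])}")
print(f"pooled: treated - untreated = {risk_difference(strata)}")
print(f"within {within} + between {between} = total {total}")
```

With these counts the treatment looks better by 1/10 in each stratum and worse by 11/50 when pooled; the within term is positive (9/1000) and is outweighed by the negative between term (-8/125), giving a total of -11/200.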

Applied/industry

A public-health agency and a redistricting commission run the identical structure in unrelated applied substrates. The agency studies neighbourhood income and chronic-disease prevalence: aggregating individual records to census tracts yields a strongly negative income-prevalence correlation, to counties a weakly positive one, to ZIP codes a near-zero one, while the point-level relationship is moderately negative. The same data, four partitions, four substantive conclusions — and a policy report could defensibly support markedly different targeting depending on which spatial unit it silently chose. The diagnostic names all four as one dataset through different partitions: the scale effect operates because finer partitions preserve within-county heterogeneity that county aggregation smooths away, and the zoning effect operates because different boundary definitions at the same scale allocate variance differently. The matched intervention is the substrate-neutral family — report the partition explicitly, justify it substantively rather than by the correlation it yields, sensitivity-test across the refinement lattice, and present the range of correlations the data permits rather than one anchored estimate. A redistricting commission exploits the same zoning effect deliberately: an identical voter distribution, partitioned into districts under different boundary maps, yields different partisan seat outcomes — gerrymandering is partition dependence weaponised, and the defence (compactness criteria, sensitivity to alternative maps) is the same intervention family used to constrain rather than exploit the free parameter. An epidemiologist guarding against a spurious income gradient and a districting analyst auditing a map for partisan bias are applying one diagnostic to one structural fact.

Mapped back: Individual health records and individual voters are fine-grained base data; census tracts/counties/ZIP codes and district maps are imposed partitions; the shifting correlation and the shifting seat count are observables that move with the partition under the scale and zoning effects — the same prime in epidemiology and electoral geography, defended by the same report-justify-sensitivity-test response.
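
The districting half of the example can be sketched in a few lines. The nine-voter grid, the two maps, and the `seats` helper are hypothetical, and districts are decided by simple plurality.

```python
from collections import Counter

# Hypothetical electorate on a 3x3 grid, stored row by row:
#   A A B
#   B B B
#   B A A
voters = list("AABBBBBAA")

# Two district maps over the same voters, three districts of three each.
maps = {
    "rows": [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
    "columns": [[0, 3, 6], [1, 4, 7], [2, 5, 8]],
}

def seats(voters, districts):
    """Count the districts won by each party under simple plurality."""
    winners = [
        Counter(voters[i] for i in district).most_common(1)[0][0]
        for district in districts
    ]
    return Counter(winners)

vote_totals = Counter(voters)
print(f"votes: {dict(vote_totals)}")
for name, districts in maps.items():
    print(f"{name:>8}: {dict(seats(voters, districts))}")
```

Party A holds four of the nine votes under both maps, yet wins two of three seats when districts follow rows and one of three when they follow columns. Only the boundaries changed.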

Structural Tensions

T1 — Property of the Data versus Property of the Partition (scopal). The whole point is that an aggregated statistic depends on the partition as well as the data, so it is not a clean property of the underlying world. The characteristic failure is reporting a partition-dependent result as a partition-free finding — "income predicts disease" — silently substituting a feature of the chosen units for a feature of reality. The diagnostic is to ask whether the conclusion would survive a different partition: if the correlation flips sign across census tracts, counties, and ZIP codes, the headline describes the lines drawn, not the data, and must be reported as partition-conditional.

T2 — Scale Effect versus Zoning Effect (scopal). Partition dependence has two distinct components that move results separately: the scale effect (changing the number or size of cells) and the zoning effect (redrawing boundaries at the same scale). The failure is testing robustness against only one — varying aggregation level while holding boundaries fixed, or vice versa — and declaring stability that the untested dimension would destroy. The diagnostic is to perturb both independently: a result stable across scales can still be gerrymander-fragile at fixed scale, so sensitivity analysis must sweep cell size and boundary placement to characterise the full partition-dependence.

T3 — Within-Cell versus Between-Cell Variance (scalar / local-global). The formal engine is that coarse-graining collapses within-cell variation and preserves between-cell variation, so any aggregated statistic reads a partition-fixed split of total variance. The failure is interpreting a between-cell relationship as if it held within cells (or at the individual level) — the ecological-fallacy inversion — because the partition has hidden the within-cell structure. The diagnostic is to make the variance decomposition explicit and, where point-level data exists, compare the aggregated statistic to its point-level limit: divergence between the two flags that the partition is averaging away the variation the interpretation depends on.
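
Making the variance decomposition explicit takes only a short function. The sketch below uses hypothetical values and groupings, and population (not sample) variances, to show the same total splitting differently under two partitions.

```python
from fractions import Fraction as F

def decompose(values, cells):
    """Split population variance into (within, between) for a partition."""
    n = len(values)
    grand_mean = F(sum(values), n)
    within = between = F(0)
    for cell in cells:
        members = [values[i] for i in cell]
        mean = F(sum(members), len(members))
        weight = F(len(members), n)
        # Variance inside this cell, weighted by the cell's share of the data.
        within += weight * sum((v - mean) ** 2 for v in members) / len(members)
        # Squared distance of the cell mean from the overall mean.
        between += weight * (mean - grand_mean) ** 2
    return within, between

values = [1, 5, 2, 6, 3, 7]           # hypothetical observations
pairs = [[0, 1], [2, 3], [4, 5]]      # neighbours grouped together
alternating = [[0, 2, 4], [1, 3, 5]]  # every other unit grouped together

for name, cells in [("pairs", pairs), ("alternating", alternating)]:
    within, between = decompose(values, cells)
    share = between / (within + between)
    print(f"{name:>11}: within={within}, between={between}, "
          f"between share={share}")
```

Both partitions reproduce the same total variance of 14/3, but the paired grouping leaves only 1/7 of it between cells while the alternating grouping leaves 6/7, so any statistic built on cell means sees very different amounts of the original variation.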

T4 — Substantive Partition versus Result-Driven Partition (sign/direction). A partition can be chosen substantively (theory-driven, fixed before seeing results) or empirically (selected because it produces a target number), and the two have opposite epistemic standing. The failure is the result-driven choice masquerading as substantive — picking the bin width, period, or boundary map that yields the desired correlation and presenting it as neutral. The diagnostic is to ask whether the partition was committed before the statistic was computed; a partition selected to produce its result is gerrymandering whether the substrate is a district map or a histogram, and its output reflects the analyst's target, not the data.

T5 — Aggregation Operation versus Its Structural Consequence (scopal). The prime is not aggregation (the operation of summarising) but the structural fact that the operation's output depends on how the partition is drawn — the verb versus a fact about the verb's output. The failure is treating aggregation as a neutral, lossless reduction whose result inherits the data's authority, missing that the summarising step injects the partition as a co-determinant. The diagnostic is to locate the aggregation step in the pipeline and ask what was chosen there: every collapse of point-level data into units is a partition commitment, and a pipeline that reports the aggregate without surfacing that choice has buried a free parameter.

T6 — Single Point Estimate versus Partition-Induced Range (measurement). For many statistics the data permits a range of values across admissible partitions at a given scale, so the chosen partition's value is one point in that range rather than the answer. The failure is reporting a single anchored estimate with data-uncertainty bands while omitting the often-larger partition-uncertainty, overstating precision. The diagnostic is to compute or bound the statistic across the refinement lattice of alternative partitions and present the resulting range alongside the sampling error: a point estimate that ignores how far the number moves under reasonable repartitioning misrepresents what the data actually determines.

Structural–Framed Character

Partition dependence of aggregates sits firmly at the structural end of the structural–framed spectrum, consistent with its structural label and aggregate of 0.0. It is a bare measure-theoretic fact — any statistic computed on partition-aggregated data is a function of the partition, not solely of the underlying data — and every diagnostic reads structural.

No home vocabulary travels with it: the same coarse-graining fact is recognised as the modifiable areal unit problem in spatial statistics, binning sensitivity in histograms, window choice in time-series, resolution dependence in network community detection, Simpson's paradox in causal inference, and gerrymandering sensitivity in political districting — each told in its own field's words, with the scale-effect/zoning-effect decomposition describing the same geometric fact under all of them (vocab_travels 0). It carries no inherent approval or disapproval: a partition-dependent result is neither good nor bad, only partition-relative; the prime is explicit that this is intrinsic to aggregation, not error or misconduct (evaluative_weight 0). Its origin is mathematical and measure-theoretic, statable purely in terms of a base measure space, a partition into cells, and the within-versus-between variance split, with no appeal to human institutions (institutional_origin 0). It runs indifferently across any substrate where fine-grained data is coarse-grained — pixels into regions, events into time windows, nodes into communities — requiring no human practice to obtain (human_practice_bound 0). And invoking it merely recognises a sensitivity already wired into the aggregation step rather than importing an interpretive frame; the partition is a free parameter whether or not the analyst notices (import_vs_recognize 0). On every criterion it reads structural, with no inherited frame beneath the formal coarse-graining skeleton.

Substrate Independence

Partition dependence of aggregates is a maximally substrate-independent prime — composite 5 / 5 on the substrate-independence scale. Its domain breadth is total: the partition-as-co-determinant fact recurs, recognised rather than translated, across spatial statistics (the modifiable areal unit problem), time-series and macroeconomics (accounting-period choice), histograms and density estimation (bin width and bandwidth), network science (resolution-dependent community detection), causal inference (Simpson's paradox as the sign-reversal symptom), political districting (gerrymandering), epidemiology, price indices, machine-learning clustering, cognitive categorisation, and accounting — anywhere fine-grained reality is coarse-grained into units. Its structural abstraction is complete because the prime is a measure-theoretic fact about a base measure space, a partition into cells, and the within-versus-between variance split — carrying no domain content, so pixels into regions, events into time windows, and nodes into communities all instantiate the identical scale-effect/zoning-effect decomposition with no human practice required. Its transfer evidence is the strongest kind: the same geometric fact is proved and demonstrated across these fields with formal models that carry across — the MAUP decomposition, Simpson's reversal, gerrymandering's identical-voters-different-maps construction are the same theorem in different dress — and the prime is explicit that this is intrinsic to aggregation rather than error, so the diagnostic ports without modification. Recognised everywhere, translated nowhere, and grounded in a single measure-theoretic invariant, the composite of 5 is fully earned.

Composite substrate independence — 5 / 5
Domain breadth — 5 / 5
Structural abstraction — 5 / 5
Transfer evidence — 5 / 5

Relationships to Other Primes¶

Parents (1) — more general patterns this builds on

Partition Dependence of Aggregates presupposes Aggregation

This prime is the structural consequence of the aggregation operation: the operation's output depends on how the partition is drawn. It presupposes aggregation as the collapsing step (aggregation is the verb; this prime is a fact about the verb's output). Its nearest neighbor, at 0.921, is aggregation.

Path to root: Partition Dependence of Aggregates → Aggregation → Micro Macro Linkage

Neighborhood in Abstraction Space¶

Partition Dependence of Aggregates sits among the more crowded primes in the catalog (7th percentile for distinctiveness): several abstractions describe nearly the same structure, so a description that fits it will tend to fit its neighbors too. Transporting it usually means disambiguating within this family rather than landing on it exactly.

Family — Aggregation & Scale Artifacts (16 primes)

Nearest neighbors

Computed from structural-signature embeddings · 2026-06-14

Not to Be Confused With¶

The nearest confusion is with aggregation itself, and the distinction is the difference between an operation and a structural fact about that operation. Aggregation is the verb — the act of collapsing many point-level observations into a smaller number of partition-defined units and computing a summary on them. Partition dependence of aggregates is not that act but the consequence of it: that the summary so computed is a function of the partition as well as of the underlying data, so repartitioning the same data shifts the result. A reader who conflates the two treats aggregation as a neutral, lossless reduction whose output inherits the data's authority, missing that the summarising step injects the partition as a co-determinant on equal footing with the data. The discriminating move is to locate the aggregation step in the pipeline and ask what was chosen there — every collapse of point-level data into units is a partition commitment, and the prime is precisely the warning that this commitment governs the result. Naming aggregation describes what was done; naming partition dependence flags that what was done has an unexamined free parameter.
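
One way to make the discriminating move concrete is to write the aggregation step so that the partition is an explicit argument. The following is a minimal sketch, not a reference implementation: the data, the two zonings, and the function names `aggregate` and `cells_with_majority` are all hypothetical, chosen only to show the same points producing different unit-level results.

```python
def aggregate(values, partition):
    """Collapse point-level values into one mean per cell.

    `partition` is a list of cells; each cell is a list of indices into `values`.
    """
    return [sum(values[i] for i in cell) / len(cell) for cell in partition]


def cells_with_majority(values, partition):
    """Count cells whose mean exceeds one half."""
    return sum(1 for m in aggregate(values, partition) if m > 0.5)


# Nine hypothetical point-level observations: 1 = yes, 0 = no.
votes = [1, 1, 1, 1, 0, 0, 0, 0, 0]

# Two partitions at the same scale: three cells of three points each.
zoning_a = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
zoning_b = [[0, 1, 4], [2, 3, 5], [6, 7, 8]]

majority_a = cells_with_majority(votes, zoning_a)  # one cell has a yes-majority
majority_b = cells_with_majority(votes, zoning_b)  # two cells have a yes-majority
```

The point-level data and the overall share of yes votes (4 of 9) are identical in both calls; only the `partition` argument differs, and the unit-level count changes from one to two.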

A second confusion is with the prime's own named children — the modifiable_areal_unit_problem and simpsons_paradox — each of which is a special case that the umbrella subsumes. MAUP is the spatial instance: results change when areal units are redrawn (zoning) or rescaled (scale), and it is catalogued in geography as if it were a geography-specific gotcha. Simpson's paradox is the sign-reversal limiting case: the partition does not merely perturb a statistic but flips the direction of an inferred relationship, the most dramatic symptom of the same engine. The prime's contribution is to recognise that these are not different phenomena needing different primes in each domain but one structural pattern indexed by substrate and severity — MAUP is partition dependence in space, periodisation effects are partition dependence in time, resolution limits are partition dependence in networks, and Simpson's paradox is the case where the partition-sensitivity is large enough to reverse a sign. Treating the prime as MAUP ties it wrongly to geography; treating it as Simpson's paradox restricts it to sign reversals and misses the generic quantitative sensitivity that occurs without any reversal. The discriminating fact is that the prime covers partition-sensitivity that merely moves a statistic, not only the spectacular cases that flip it.
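
The sign-reversal case can be reproduced with a few counts. The sketch below uses hypothetical success counts for two arms, A and B, split into two strata; the names `counts`, `stratified_rates`, and `pooled_rate` are illustrative assumptions, not part of any catalogued method. It compares the two-cell partition with the trivial one-cell partition of the same data.

```python
# (successes, trials) per stratum for two arms; all counts are hypothetical.
counts = {
    "A": {"small": (9, 10), "large": (30, 100)},
    "B": {"small": (80, 100), "large": (2, 10)},
}


def stratified_rates(arm):
    """Success rate within each stratum (the two-cell partition)."""
    return {s: succ / trials for s, (succ, trials) in counts[arm].items()}


def pooled_rate(arm):
    """Success rate after collapsing the strata (the one-cell partition)."""
    succ = sum(s for s, _ in counts[arm].values())
    trials = sum(t for _, t in counts[arm].values())
    return succ / trials


rates_a, rates_b = stratified_rates("A"), stratified_rates("B")

# A is ahead inside every stratum, yet B is ahead once the strata are pooled.
a_wins_each_stratum = all(rates_a[s] > rates_b[s] for s in rates_a)
b_wins_pooled = pooled_rate("B") > pooled_rate("A")
```

Within strata A leads 0.9 to 0.8 and 0.3 to 0.2, while pooled A has 39 of 110 and B has 82 of 110. Shifting the counts so that the pooled ordering no longer flips would still leave the pooled gap different from the stratified gaps, which is the generic, non-reversing sensitivity the prime also covers.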

A third confusion, downstream, is with the ecological fallacy. The ecological fallacy is an inferential trap: reading an individual-level relationship off an aggregated statistic, concluding something about persons from a correlation computed on groups. Partition dependence of aggregates is the upstream structural fact that makes that trap possible — because the partition fixes how total variation splits into within-cell and between-cell components, the between-cell relationship the aggregate reports need not match the within-cell or individual-level one. The relationship is cause-and-trap: the prime is the geometric fact about coarse-graining, and the ecological fallacy is one inferential error that fact enables. Conflating them mislocates the fix. The remedy for the ecological fallacy is "do not infer individual relationships from aggregates"; the remedy for partition dependence is broader — decompose the variance, sensitivity-test across partitions, and report the range — and it applies to every partition-dependent statistic, not only to the specific inferential leap from group to individual.
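
A small numerical sketch can show how the between-cell relationship and the within-cell relationship come apart. The data below are hypothetical and deliberately extreme, and the helper names `pearson` and `column` are assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def column(points, j):
    """Extract the j-th coordinate from a list of (x, y) pairs."""
    return [p[j] for p in points]


# Hypothetical (x, y) observations grouped into three cells.
cells = [
    [(1, 3), (2, 2), (3, 1)],
    [(4, 6), (5, 5), (6, 4)],
    [(7, 9), (8, 8), (9, 7)],
]

# Correlation inside each cell (within-cell relationship).
within = [pearson(column(c, 0), column(c, 1)) for c in cells]

# Correlation of the cell means (what the aggregate reports).
cell_means = [(sum(column(c, 0)) / len(c), sum(column(c, 1)) / len(c)) for c in cells]
between = pearson(column(cell_means, 0), column(cell_means, 1))

# Correlation over all points with no partition applied.
pooled_points = [p for c in cells for p in c]
pooled = pearson(column(pooled_points, 0), column(pooled_points, 1))
```

Here the relationship is perfectly negative inside every cell, perfectly positive across the cell means, and 0.8 over the pooled points. Reading the between-cell value as a statement about individuals is the ecological fallacy; the fact that the three numbers can differ at all is the partition dependence underneath it.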

These distinctions matter because they fix what to report and what to remedy. An aggregation framing treats the result as a clean property of the data; a MAUP or Simpson framing ties the issue to one substrate or one severity; an ecological-fallacy framing addresses only the group-to-individual leap — whereas the prime's response (variance decomposition plus sensitivity analysis across the refinement lattice, presenting the partition-induced range) applies to any statistic computed on partition-aggregated data, whatever the substrate and whether or not a sign reverses.
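
The sensitivity-analysis half of that response can be sketched as a loop over candidate partitions that reports the range of a statistic. This is one possible shape under stated assumptions, not a prescribed procedure: it restricts attention to contiguous partitions of an ordered sequence, and the names `contiguous_partitions`, `cell_means`, `partition_range`, and `spread`, along with the data, are hypothetical.

```python
from itertools import combinations


def contiguous_partitions(n, k):
    """Yield every way to cut range(n) into k non-empty contiguous cells."""
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        yield [list(range(bounds[i], bounds[i + 1])) for i in range(k)]


def cell_means(values, partition):
    """Mean of the values in each cell."""
    return [sum(values[i] for i in cell) / len(cell) for cell in partition]


def partition_range(values, k, statistic):
    """Minimum and maximum of `statistic` over all k-cell contiguous partitions."""
    results = [
        statistic(cell_means(values, p))
        for p in contiguous_partitions(len(values), k)
    ]
    return min(results), max(results)


def spread(means):
    """Gap between the largest and smallest cell mean."""
    return max(means) - min(means)


values = [2, 8, 3, 7, 1, 9]  # hypothetical ordered observations
low, high = partition_range(values, 3, spread)
```

For these six points and three cells, the spread of cell means runs from 0.0 (cells of two points each) to 8.0 (cuts after the fourth and fifth points), so a single reported spread says as much about the cut points as about the data. Holding `k` fixed probes the zoning effect; repeating the call for several values of `k` probes the scale effect.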

Solution Archetypes¶

No catalogued solution archetypes reference this prime yet.