Partition Dependence of Aggregates

Prime #: 1054
Origin domain: Mathematics And Formal Systems
Subdomain: aggregation and coarse graining → Mathematics And Formal Systems

Core Idea

Any statistic computed on partition-aggregated data is a function of the partition itself, not solely of the underlying data. Repartitioning shuffles variation between within-cell and between-cell components, so means, correlations, and slopes all move — the analyst's choice of partition is a free parameter that enters the output as much as the data does.

How would you explain it like I'm…

The Sorting-Boxes Trick

Imagine you have a big pile of marbles and you sort them into boxes, then count the average in each box. If you sort them into different boxes, the averages come out different even though you never added or took away a single marble. So the boxes you pick change the answer, not just the marbles.

Grouping Changes The Answer

When you have lots of little pieces of data and you bunch them into groups, then measure things like averages or trends, your answer depends on how you drew the groups. Make the groups bigger or smaller, or slide where the lines between them fall, and the numbers shift. Nobody added new information; you just regrouped the same stuff. So whenever someone groups data first and measures second, the grouping is secretly part of the answer.

Partition-Dependent Statistics

Partition Dependence of Aggregates says that any number you compute after lumping fine-grained data into groups is partly a fact about the groups, not purely a fact about the data. Regrouping the same data moves variation between the within-group part and the between-group part, so means, correlations, regression slopes, and inequality measures all shift. This is not measurement error or a sampling fluke; it is built into the act of lumping. Two effects show up: a scale effect when you change how big the groups are, and a zoning effect when you redraw boundaries at the same size. In its most dramatic form a relationship can even flip sign, which is Simpson's paradox.

Partition Dependence of Aggregates is the structural claim that any statistic computed on partition-aggregated data is a function of the partition itself, not solely of the underlying observations. When a pipeline collapses point-level data into a smaller set of partition-defined units and computes on those units, the partition is a non-neutral input. The mechanism is geometric: the coarsened variable is the conditional expectation given the partition, which discards within-cell variation and keeps between-cell variation, so any statistic on it is sensitive to where the cell boundaries fall and how many cells there are. No partition is statistically privileged a priori, so the analyst's choice is a free parameter entering the output as much as the data does. The pattern splits into a scale effect, where results change with aggregation level, and a zoning effect, where results change as same-size units are drawn with different boundaries. Treating partition-dependent results as partition-free findings silently substitutes a property of the partition for a property of the world. Its sharpest form is sign reversal under Simpson's paradox; its generic form is quantitative partition-sensitivity without reversal.
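The within-versus-between mechanism described above can be written out with the laws of total variance and total covariance. This is a sketch under assumed notation that does not appear in the original text: X and Y are unit-level variables, C is the cell label assigned by the chosen partition, and the coarsened variable is written with a tilde.

```latex
% Assumed notation: X, Y are unit-level variables; C is the cell label
% induced by the partition; \tilde{X} is the coarsened (cell-mean) variable.
\begin{align}
\tilde{X} &= \mathbb{E}[X \mid C] \\
\operatorname{Var}(X) &=
  \underbrace{\mathbb{E}\big[\operatorname{Var}(X \mid C)\big]}_{\text{within-cell}}
  + \underbrace{\operatorname{Var}\big(\mathbb{E}[X \mid C]\big)}_{\text{between-cell}} \\
\operatorname{Cov}(X, Y) &=
  \mathbb{E}\big[\operatorname{Cov}(X, Y \mid C)\big]
  + \operatorname{Cov}\big(\mathbb{E}[X \mid C],\, \mathbb{E}[Y \mid C]\big)
\end{align}
```

The left-hand sides are fixed by the data, but the split between the two right-hand terms depends on C. A statistic computed on the coarsened variables sees only the between-cell term, so redrawing the partition changes what it measures.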

Broad Use

Spatial statistics: the modifiable areal unit problem — census-tract, ZIP-code, and watershed analyses yield different correlations on identical point data.
Macroeconomics: choice of accounting period (quarterly vs. annual) changes apparent volatility and regression significance.
Histograms: the same data displays as unimodal or bimodal under different bin widths.
Network science: modular structure is resolution-dependent; different community sets emerge from one graph.
Causal inference: Simpson's paradox as the sign-reversal symptom of collapsing a covariate.
Political districting: gerrymandering as deliberate exploitation of the zoning effect.
Epidemiology and price indices: rates and inflation depend on the units, windows, or basket weights chosen.
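The histogram case in the list above is the easiest to reproduce. The following is a minimal sketch using only the standard library; the data values are synthetic and chosen for illustration, and `histogram` and `count_modes` are hypothetical helper names, not part of any library.

```python
def histogram(data, width, lo, hi):
    """Count points in equal-width bins covering [lo, hi)."""
    n_bins = round((hi - lo) / width)
    counts = [0] * n_bins
    for x in data:
        counts[int((x - lo) // width)] += 1
    return counts


def count_modes(counts):
    """Count strict local maxima; flat plateaus are deliberately not counted."""
    padded = [0] + counts + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


# Synthetic toy data (an assumption for illustration): two loose clumps.
data = [3.5, 4.2, 4.5, 4.8, 5.5, 6.5, 7.2, 7.5, 7.8, 8.5]

fine = histogram(data, width=1.0, lo=0.0, hi=12.0)
coarse = histogram(data, width=4.0, lo=0.0, hi=12.0)

print(fine, count_modes(fine))      # two peaks at bin width 1
print(coarse, count_modes(coarse))  # one peak at bin width 4
```

The ten data points never change; only the bin width does, and the apparent number of modes changes with it.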

Clarity

It separates a property of the data from a property of the partition, and reveals that MAUP, Simpson's paradox, bin-choice, and gerrymandering are one structural pattern indexed by substrate and severity.

Manages Complexity

It replaces a long list of domain-specific gotchas with one diagnostic checklist and one mitigation: sensitivity analysis across alternative partitions plus explicit substantive justification.

Abstract Reasoning

It decomposes into a scale effect (changing cell size) and a zoning effect (redrawing boundaries at fixed scale), and treats any aggregated statistic as a reading of the within-versus-between variance split.
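The two effects can be told apart numerically by tracking the share of total variance that survives aggregation. The sketch below is illustrative only: the values are synthetic, the units are assumed to sit on a ring so that shifted cells stay the same size, and `contiguous_labels` and `between_share` are hypothetical helper names.

```python
def contiguous_labels(n, size, offset=0):
    """Label n units on a ring into contiguous cells of `size`, starting at `offset`."""
    return [((i - offset) % n) // size for i in range(n)]


def between_share(values, labels):
    """Fraction of total (population) variance carried by the cell means."""
    n = len(values)
    grand = sum(values) / n
    total = sum((v - grand) ** 2 for v in values) / n
    cells = {}
    for v, c in zip(values, labels):
        cells.setdefault(c, []).append(v)
    between = sum(
        len(vs) * (sum(vs) / len(vs) - grand) ** 2 for vs in cells.values()
    ) / n
    return between / total


# Synthetic toy values (an assumption for illustration).
values = [1, 2, 3, 10, 11, 12, 1, 2, 3, 10, 11, 12]
n = len(values)

baseline = between_share(values, contiguous_labels(n, size=3))
rezoned = between_share(values, contiguous_labels(n, size=3, offset=1))  # zoning effect
rescaled = between_share(values, contiguous_labels(n, size=6))           # scale effect

print(f"size 3, offset 0: {baseline:.3f}")
print(f"size 3, offset 1: {rezoned:.3f}")
print(f"size 6, offset 0: {rescaled:.3f}")
```

With these toy values, shifting the boundaries by one unit at fixed cell size moves most of the variance from the between-cell part to the within-cell part, and doubling the cell size removes the between-cell part entirely.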

Knowledge Transfer

Epidemiology: the spatial-statistics diagnostic carries to rate comparison across administrative levels.
Machine learning: it carries to clustering stability and consensus partitions over choice of K.
Political science: it carries to mathematical redistricting and compactness criteria.

Example

A health agency correlates income with disease prevalence on the same individual records and finds the relationship strongly negative by census tract, weakly positive by county, and near zero by ZIP code: three partitions, three defensible policy conclusions.
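A toy version of this situation can be sketched in a few lines. The records below are synthetic and not drawn from any real agency; the column layout, the `tract` and `county` labels, and the helpers `pearson` and `cell_means` are assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def cell_means(records, col):
    """Average income and disease score within each cell of the partition in column `col`."""
    cells = {}
    for rec in records:
        cells.setdefault(rec[col], []).append(rec)
    xs = [sum(r[0] for r in rs) / len(rs) for rs in cells.values()]
    ys = [sum(r[1] for r in rs) / len(rs) for rs in cells.values()]
    return xs, ys


# Synthetic records: (income, disease score, tract, county).
records = [
    (0, 7, "t1", "A"), (2, 5, "t1", "A"),
    (4, 3, "t2", "A"), (6, 1, "t2", "A"),
    (2, 9, "t3", "B"), (4, 7, "t3", "B"),
    (6, 5, "t4", "B"), (8, 3, "t4", "B"),
]

r_tract = pearson(*cell_means(records, col=2))
r_county = pearson(*cell_means(records, col=3))

print(f"by tract:  {r_tract:+.2f}")
print(f"by county: {r_county:+.2f}")
```

On these eight records the tract-level correlation is negative and the county-level correlation is positive, even though both are computed from the same rows. Running the same comparison across every candidate partition is the sensitivity analysis the entry recommends.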

Relationships to Other Primes

Parents (1) — more general patterns this builds on

Partition Dependence of Aggregates presupposes Aggregation: this prime is the structural consequence of the aggregation operation, namely that the operation's output depends on how the partition is drawn. Aggregation is the collapsing step (the verb); this prime is a fact about that step's output. Aggregation is also the nearest prime, at 0.921.

Path to root: Partition Dependence of Aggregates → Aggregation → Micro Macro Linkage

Not to Be Confused With

Partition Dependence is not Aggregation because aggregation is the operation of summarising many observations into fewer, whereas this prime is the structural consequence that the operation's output depends on how the partition is drawn.
Partition Dependence is not the Modifiable Areal Unit Problem because MAUP is the spatial child, whereas this prime is the substrate-general umbrella covering time windows, bins, clusters, and baskets too.
Partition Dependence is not Simpson's Paradox because Simpson's paradox is the sign-reversal limiting case, whereas this prime covers quantitative partition-sensitivity even when no sign reverses.