Partition Dependence of Aggregates¶
Core Idea¶
Any statistic computed on partition-aggregated data is a function of the partition itself, not solely of the underlying data. Repartitioning shuffles variation between within-cell and between-cell components, so means, correlations, and slopes all move — the analyst's choice of partition is a free parameter that enters the output as much as the data does.
Broad Use¶
- Spatial statistics: the modifiable areal unit problem — census-tract, ZIP-code, and watershed analyses yield different correlations on identical point data.
- Macroeconomics: choice of accounting period (quarterly vs. annual) changes apparent volatility and regression significance.
- Histograms: the same data displays as unimodal or bimodal under different bin widths.
- Network science: modular structure is resolution-dependent; different community sets emerge from one graph.
- Causal inference: Simpson's paradox as the sign-reversal symptom of collapsing a covariate.
- Political districting: gerrymandering as deliberate exploitation of the zoning effect.
- Epidemiology and price indices: rates and inflation depending on the units, windows, or basket weights chosen.
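The histogram case in the list above is easy to reproduce by hand. What follows is a minimal sketch in plain Python, assuming a made-up nine-value sample; `bin_counts` and `count_modes` are hypothetical helpers written for this illustration, not functions from any library. It bins the same values at two widths and counts the strict local peaks.

```python
def bin_counts(values, lo, hi, width):
    """Count values in each half-open bin [lo + k*width, lo + (k+1)*width)."""
    n_bins = int((hi - lo) / width)
    counts = [0] * n_bins
    for v in values:
        k = int((v - lo) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


def count_modes(counts):
    """Count bins strictly higher than both neighbours (edges compare against 0)."""
    padded = [0] + counts + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


# Hypothetical sample with clusters around 3 and 5 (illustrative only).
sample = [2, 3, 3, 3, 4, 5, 5, 5, 6]

narrow = bin_counts(sample, lo=0, hi=9, width=1)  # nine bins of width 1
wide = bin_counts(sample, lo=0, hi=9, width=3)    # three bins of width 3

print(narrow, count_modes(narrow))
print(wide, count_modes(wide))
```

With width 1 the two clusters appear as separate peaks; with width 3 both fall into the middle bin and the display becomes unimodal, although no data value has changed.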
Clarity¶
It separates a property of the data from a property of the partition, and reveals that MAUP, Simpson's paradox, bin-choice, and gerrymandering are one structural pattern indexed by substrate and severity.
Manages Complexity¶
It replaces a long list of domain-specific gotchas with one diagnostic checklist and one mitigation: sensitivity analysis across alternative partitions plus explicit substantive justification.
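A sensitivity analysis of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not a prescribed procedure: the partition is a choice of accounting window, the statistic is the population standard deviation of window means, and the series is a hypothetical alternating sequence. `window_means` and `partition_sensitivity` are names invented for this sketch.

```python
from statistics import fmean, pstdev


def window_means(series, width):
    """Aggregate a series into consecutive non-overlapping windows of `width`.

    A trailing partial window, if any, is dropped.
    """
    return [
        fmean(series[i:i + width])
        for i in range(0, len(series) - width + 1, width)
    ]


def partition_sensitivity(series, widths, statistic):
    """Evaluate `statistic` on the series aggregated at each window width."""
    results = {w: statistic(window_means(series, w)) for w in widths}
    spread = max(results.values()) - min(results.values())
    return results, spread


# Hypothetical alternating series (illustrative only).
series = [1, -1] * 6

results, spread = partition_sensitivity(series, widths=[1, 2, 3], statistic=pstdev)
for width, value in results.items():
    print(f"window={width}: volatility={value:.3f}")
print(f"spread across partitions: {spread:.3f}")
```

Reporting the spread alongside the headline figure shows how much of the result is owed to the window choice; the substantive justification for the window actually used still has to be argued separately.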
Abstract Reasoning¶
It decomposes into a scale effect (changing cell size) and a zoning effect (redrawing boundaries at fixed scale), and treats any aggregated statistic as a reading of the within-versus-between variance split.
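The within-versus-between split can be made concrete with a small sketch. It assumes eight hypothetical observations and size-weighted population variances; `variance_split` is an illustrative helper, not a library function. One partition is coarsened to show the scale effect and one is redrawn at the same cell size to show the zoning effect.

```python
from statistics import fmean


def variance_split(values, labels):
    """Return (within, between) population variance components for a partition."""
    cells = {}
    for v, lab in zip(values, labels):
        cells.setdefault(lab, []).append(v)
    n = len(values)
    grand = fmean(values)
    within = 0.0
    between = 0.0
    for members in cells.values():
        weight = len(members) / n  # cell's share of all observations
        mean = fmean(members)
        within += weight * fmean([(v - mean) ** 2 for v in members])
        between += weight * (mean - grand) ** 2
    return within, between


values = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical observations

partitions = {
    "fine (4 cells of 2)": [0, 0, 1, 1, 2, 2, 3, 3],
    "coarse (2 cells of 4)": [0, 0, 0, 0, 1, 1, 1, 1],   # scale effect
    "rezoned (4 cells of 2)": [0, 1, 2, 3, 3, 2, 1, 0],  # zoning effect
}

splits = {name: variance_split(values, labels) for name, labels in partitions.items()}
for name, (within, between) in splits.items():
    print(f"{name}: within={within:.2f} between={between:.2f} total={within + between:.2f}")
```

The total stays fixed in all three cases because it belongs to the data; only the allocation between the two components moves, and any statistic computed on cell means sees the between-cell part alone.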
Knowledge Transfer¶
- Epidemiology: the spatial-statistics diagnostic carries to rate comparison across administrative levels.
- Machine learning: it carries to clustering stability and consensus partitions over choice of K.
- Political science: it carries to mathematical redistricting and compactness criteria.
Example¶
A health agency finds that the correlation between neighborhood income and disease prevalence is strongly negative by census tract, weakly positive by county, and near zero by ZIP code, all on the same individual records: three partitions, three defensible policy conclusions.
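A toy version of this situation can be built by hand. The sketch below does not reproduce the agency's figures; it assumes six hypothetical (income, prevalence) records in arbitrary units and two made-up zonings with equal cell sizes. `pearson` and `cell_mean_correlation` are helpers written for this illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def cell_mean_correlation(points, labels):
    """Correlate per-cell mean x with per-cell mean y under one partition."""
    cells = {}
    for point, lab in zip(points, labels):
        cells.setdefault(lab, []).append(point)
    mean_x = [sum(x for x, _ in c) / len(c) for c in cells.values()]
    mean_y = [sum(y for _, y in c) / len(c) for c in cells.values()]
    return pearson(mean_x, mean_y)


# Hypothetical (income, prevalence) records, arbitrary units.
records = [(1, 2), (1, 0), (1, 4), (3, 0), (3, 4), (3, 2)]

zoning_a = [0, 0, 1, 1, 2, 2]  # three cells of two records each
zoning_b = [0, 1, 0, 2, 1, 2]  # same cell sizes, boundaries redrawn

r_individual = pearson([x for x, _ in records], [y for _, y in records])
r_a = cell_mean_correlation(records, zoning_a)
r_b = cell_mean_correlation(records, zoning_b)

print(r_individual, r_a, r_b)
```

In this constructed case the individual-level correlation is zero, yet one zoning yields a perfect positive correlation between cell means and the other a perfect negative one. Real data rarely behave this cleanly, but the mechanism is the same.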
Relationships to Other Primes¶
Parents (1) — more general patterns this builds on
- Partition Dependence of Aggregates presupposes Aggregation: this prime is the structural consequence of the aggregation operation, namely that the operation's output depends on how the partition is drawn. Aggregation is the collapsing step (the verb); this prime is a fact about the verb's output. Aggregation is also the nearest prime, at 0.921.
Path to root: Partition Dependence of Aggregates → Aggregation → Micro Macro Linkage
Not to Be Confused With¶
- Partition Dependence is not Aggregation because aggregation is the operation of summarising many observations into fewer, whereas this prime is the structural consequence that the operation's output depends on how the partition is drawn.
- Partition Dependence is not the Modifiable Areal Unit Problem because MAUP is the spatial child, whereas this prime is the substrate-general umbrella covering time windows, bins, clusters, and baskets too.
- Partition Dependence is not Simpson's Paradox because Simpson's paradox is the sign-reversal limiting case, whereas this prime covers quantitative partition-sensitivity even when no sign reverses.