Distinctiveness and the Neighborhood Structure of Abstraction Space
Why some primes are easy to find and others get crowded out
A conceptual paper for the Encyclopedia of Abstractions project
Abstract
Two primes can be equally apt for a problem and yet differ enormously in how easily a reasoner finds them. The reason is geometric: the primes do not sit in isolation but in a space of neighbors, and some regions of that space are crowded while others are sparse. A prime in a crowded region, surrounded by near-synonyms that describe almost the same structure, is hard to pin down, because a description that fits it fits its neighbors too. A prime in a sparse region stands alone, so a faithful description lands on it precisely. This paper names that property distinctiveness (the inverse of neighborhood density), explains why it, rather than a prime's position on the structural–framed spectrum, governs cross-domain retrievability, and describes how the Encyclopedia measures it and surfaces it on each prime's page.
1. The question distinctiveness answers
The structural–framed distinction asks how a prime travels, that is, whether it carries an interpretive frame with it. Distinctiveness asks a different and largely independent question: once a prime is the right answer, how easily can it be found?
These come apart in practice. Reasoning that starts from a concrete, far-domain situation has to recognize which abstraction is operating. If the operative pattern is recursion, a domain-neutral description of "a process defined in terms of smaller instances of itself" fits iteration, self-similarity, and divide-and-conquer nearly as well. The description lands in the right neighborhood but cannot single out the exact prime, because several primes share that neighborhood. By contrast, a description of sovereignty (final authority within a bounded domain, recognized from outside) has few close competitors, so it retrieves sovereignty and little else. Recursion is crowded; sovereignty is distinctive.
2. What "neighborhood in abstraction space" means
The Encyclopedia represents each prime as a point in a high-dimensional space, placed by an embedding of its structural signature — the substrate-neutral description of the pattern, not its prose definition. Primes whose signatures describe similar structure end up near each other; primes describing different structure end up far apart. "Neighborhood" is then literal: a prime's neighbors are the primes closest to it in this space, and its distinctiveness is how far away those neighbors are.
- A crowded prime sits in a dense cluster of near-synonyms. Its nearest neighbors are very close; many primes describe nearly the same thing.
- A distinctive prime sits in a sparse region. Its nearest neighbor is comparatively far; little else describes its structure.
This is not a defect of the crowded primes. It is a fact about the geometry of the lexicon — that the catalog contains families of closely related patterns — and it is exactly the kind of relational fact the catalog should make explicit. (It is also the Yoneda intuition from category theory in plain dress: an object is characterized by its relationships to the others. A prime's findability turns out to depend on its neighbors, not on its intrinsic content.)
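To make "neighborhood" concrete, here is a minimal sketch, assuming each prime's structural signature has already been embedded as a vector. The vectors below are hypothetical three-dimensional stand-ins, not real model output; the sketch ranks every other prime by cosine similarity and returns the closest ones.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def nearest_neighbors(slug, vectors, k=3):
    """The k primes closest to `slug` by cosine similarity, closest first."""
    target = vectors[slug]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != slug]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-D stand-ins for signature embeddings (not real model output).
vectors = {
    "recursion": [0.90, 0.40, 0.10],
    "iteration": [0.88, 0.45, 0.12],
    "self-similarity": [0.85, 0.35, 0.20],
    "sovereignty": [0.05, 0.20, 0.98],
}

print(nearest_neighbors("recursion", vectors, k=2))    # crowded: very close neighbors
print(nearest_neighbors("sovereignty", vectors, k=1))  # distinctive: a distant nearest neighbor
```

With these toy values, recursion's two nearest neighbors (iteration and self-similarity) both sit above 0.99 cosine similarity, while sovereignty's nearest neighbor is around 0.33. That gap is what "crowded" versus "distinctive" means geometrically.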
3. Why distinctiveness, not framing, governs retrieval
The project tested the natural hypothesis that structural primes, being domain-neutral, would be easy to retrieve across domains, while framed primes would resist travel. Direct experiment refuted it. When far-domain instances were used to retrieve their own primes, the framed primes were retrieved better, not worse (median rank near the top), while several structural primes were retrieved poorly, with recursion and symmetry buried far down the ranking. The cause was not portability but neighborhood density: the structural primes that failed were generic ones sitting in dense clusters (recursion among iteration and self-similarity; symmetry among invariance and conservation), so an instance landed in the right region but the exact prime was crowded out by its siblings. The framed primes that succeeded happened to be sparsely neighbored, so instances landed precisely on them. The lone framed prime that retrieved poorly, authority, did so for the same reason the structural ones did: its near-twin, sovereignty, outranked it.
This re-derives, in embedding space, a dissociation that the cognitive science of analogy established decades ago: in Gentner's MAC/FAC account, retrieval is driven by surface and topical similarity while the structural mapping that follows is driven by deeper structure. Distinctiveness is the catalog's measure of the first stage — how surface-separable a prime is from its neighbors. The practical consequence is a design rule the search obeys: for a crowded prime, the right unit of retrieval is the family, not a single best match. The reasoner can recognize the neighborhood reliably and then disambiguate within it; insisting on one exact slug is what fails.
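Both the rank-based test and the family-first rule can be sketched on toy data. This is a hypothetical reconstruction, not the project's evaluation harness or search code: each far-domain instance vector ranks all primes by cosine similarity, the 1-based rank of its true prime is recorded, and retrieval returns the best match alone only when it is distinctive, otherwise its whole family in similarity order. The function names, the 0.33 cutoff, the family labels, and every vector are invented for illustration; the recursion instance is deliberately placed nearer iteration than recursion to reproduce crowding-out in miniature.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_primes(query, prime_vectors):
    """All prime slugs, most similar to the query first."""
    return sorted(prime_vectors, key=lambda p: cosine(query, prime_vectors[p]), reverse=True)


def rank_of(query, target, prime_vectors):
    """1-based rank of the true prime when a far-domain instance is the query."""
    return rank_primes(query, prime_vectors).index(target) + 1


def retrieve(query, prime_vectors, distinctiveness, family_of, crowded_below=0.33):
    """Family-first retrieval (hypothetical): a single prime only if the best match is distinctive."""
    ranked = rank_primes(query, prime_vectors)
    best = ranked[0]
    if distinctiveness[best] >= crowded_below:
        return [best]
    # Crowded best match: hand back its whole family for the reasoner to disambiguate.
    return [p for p in ranked if family_of[p] == family_of[best]]


# Toy data; all values and labels are hypothetical.
prime_vectors = {
    "recursion": [0.90, 0.40, 0.10],
    "iteration": [0.88, 0.45, 0.12],
    "self-similarity": [0.85, 0.35, 0.20],
    "sovereignty": [0.05, 0.20, 0.98],
}
distinctiveness = {"recursion": 0.0, "iteration": 0.1, "self-similarity": 0.2, "sovereignty": 1.0}
family_of = {
    "recursion": "process-structure",
    "iteration": "process-structure",
    "self-similarity": "process-structure",
    "sovereignty": "authority",
}
instances = [([0.87, 0.44, 0.12], "recursion"), ([0.10, 0.15, 0.95], "sovereignty")]

ranks = [rank_of(q, true, prime_vectors) for q, true in instances]
print(ranks, statistics.median(ranks))
for q, true in instances:
    print(true, "->", retrieve(q, prime_vectors, distinctiveness, family_of))
```

Here the recursion instance ranks its true prime second (iteration wins), so single-best-match retrieval misses it; family-first retrieval still returns recursion inside the family. The sovereignty instance ranks its prime first and comes back alone.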
4. How it is measured
Distinctiveness is computed, versioned, and recomputable — not authored by hand. For each prime:
- Its structural signature is embedded with a local sentence-embedding model; the embedding is compared to every other prime's by cosine similarity.
- The headline number is a percentile, not a raw similarity. In this embedding space the absolute similarities are highly compressed (almost everything looks fairly similar), so a raw score barely separates crowded primes from typical ones. Ranking each prime's mean similarity to its nearest neighbors against the whole catalog, with higher mean similarity counting as more crowded, turns that weak signal into an interpretable 0–1 scale: 0 = the most crowded prime in the catalog, 1 = the most distinctive.
- Two complementary signals are kept alongside the percentile, because crowding has two flavors: sitting in a broadly dense region (captured by the mean over the nearest neighbors) versus having a few very close twins in an otherwise sparse region (captured by the single nearest similarity and a count of neighbors above a high threshold). A prime can be one without the other.
- Finally, the catalog is partitioned into families — clusters of mutually-near primes — so that each prime's neighborhood has a name and a page.
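The per-prime signals in the list above can be sketched in a few lines. This is a toy reconstruction, not prime_labeling.py itself: it uses k = 2 instead of the catalog's k = 10 (the toy has only four primes), and the 0.99 near-twin threshold is a hypothetical stand-in for "near the top of the pairwise-similarity distribution". The percentile is computed as the share of other primes that are more crowded, so the most crowded prime scores 0 and the most distinctive scores 1.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def neighborhood_stats(vectors, k=2, twin_threshold=0.99):
    """Distinctiveness percentile, nearest-neighbor similarity, and near-twin count per prime.

    k and twin_threshold are toy values for this four-prime example.
    """
    slugs = list(vectors)
    # Similarities from each prime to every other prime, highest first.
    sims = {
        a: sorted((cosine(vectors[a], vectors[b]) for b in slugs if b != a), reverse=True)
        for a in slugs
    }
    # Broad crowding: mean similarity to the k nearest neighbors.
    mean_knn = {a: sum(s[:k]) / len(s[:k]) for a, s in sims.items()}
    stats = {}
    for a in slugs:
        # Share of other primes that are MORE crowded: most crowded -> 0.0, most distinctive -> 1.0.
        more_crowded = sum(1 for b in slugs if b != a and mean_knn[b] > mean_knn[a])
        stats[a] = {
            "distinctiveness": more_crowded / (len(slugs) - 1),
            "mean_knn": mean_knn[a],
            # Close-twin crowding: single nearest similarity and count above the threshold.
            "nearest_sim": sims[a][0],
            "near_twins": sum(1 for s in sims[a] if s >= twin_threshold),
        }
    return stats


# Hypothetical 3-D stand-ins for signature embeddings.
vectors = {
    "recursion": [0.90, 0.40, 0.10],
    "iteration": [0.88, 0.45, 0.12],
    "self-similarity": [0.85, 0.35, 0.20],
    "sovereignty": [0.05, 0.20, 0.98],
}

stats = neighborhood_stats(vectors)
for slug, row in stats.items():
    print(slug, round(row["distinctiveness"], 2), round(row["nearest_sim"], 3), row["near_twins"])
```

On these vectors recursion scores 0.0 with two near-twins and sovereignty scores 1.0 with none, while iteration and self-similarity land at 1/3 and 2/3. Their mean k-NN similarities (roughly 0.996, 0.995, and 0.993) are nearly indistinguishable as raw numbers yet still spread across the percentile scale, which is the reason for reporting a rank rather than a raw score.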
The full procedure is a single re-runnable script, prime_labeling.py, which versions its output by corpus snapshot and runs in both incremental and batch-recompute modes. In brief: signatures are embedded with a local bge-small model; distinctiveness is the corpus percentile of each prime's mean cosine similarity to its k = 10 nearest neighbors; the near-twin count uses a threshold set near the top of the pairwise-similarity distribution; and families come from k-means, chosen deliberately over threshold-graph clustering, which chains the dense core of this space into a single blob.
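The chaining problem shows up even on a toy "chain" of six unit vectors spaced 10 degrees apart. This sketch uses plain Lloyd's k-means on L2-normalized vectors with a deterministic farthest-point initialization; the script's actual k-means settings and number of families are not given here, so those choices are assumptions. Linking every pair whose cosine similarity is at least 0.98 connects each point to its neighbor and fuses the whole chain into one component, while k-means with k = 2 still splits it.

```python
import math


def normalize(v):
    """Scale to unit length, so Euclidean distance tracks cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dist2(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=50):
    """Lloyd's k-means on normalized vectors with deterministic farthest-point initialization."""
    pts = [normalize(v) for v in vectors]
    centers = [pts[0]]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        centers.append(max(pts, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in pts]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def threshold_components(vectors, threshold):
    """Connected components when every pair with cosine >= threshold is linked (union-find)."""
    pts = [normalize(v) for v in vectors]
    parent = list(range(len(pts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if sum(a * b for a, b in zip(pts[i], pts[j])) >= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(pts))})


# Hypothetical chain: six unit vectors at 0, 10, ..., 50 degrees.
chain = [[math.cos(math.radians(d)), math.sin(math.radians(d))] for d in range(0, 60, 10)]

print(threshold_components(chain, 0.98))  # adjacent points clear the threshold, so the chain fuses
print(kmeans(chain, 2))
```

Because each adjacent pair clears the threshold (cos 10° ≈ 0.985), the graph collapses to a single component even though the two ends are 50 degrees apart; k-means returns the labels [0, 0, 0, 1, 1, 1].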
Two honesty notes. First, the embedding is an off-the-shelf model; it captures the "surface" layer of retrieval well but not the deep structural layer, so distinctiveness is a useful but imperfect proxy. Second, distinctiveness is relative and drifts: a once-distinctive prime becomes crowded when near-synonyms are added to the catalog, so the numbers are recomputed as the corpus grows and every figure is stamped with the snapshot it was computed from.
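Drift and snapshot stamping can be illustrated together. The stamp here is a hypothetical scheme (a truncated SHA-256 of the sorted slug-to-signature pairs), and the "autonomy" prime and all vectors are invented for illustration. The point is only that adding one near-synonym sharply raises sovereignty's nearest-neighbor similarity, and that each figure stays tied to the corpus it was computed from.

```python
import hashlib
import json
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def snapshot_id(signatures):
    """Hypothetical stamp: short SHA-256 of the sorted (slug, signature) pairs in the corpus."""
    payload = json.dumps(sorted(signatures.items()), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def nearest_sim(slug, vectors):
    """Similarity of `slug` to its single nearest neighbor."""
    return max(cosine(vectors[slug], v) for other, v in vectors.items() if other != slug)


def stamped_report(slug, signatures, vectors):
    """A figure paired with the snapshot of the corpus it was computed from."""
    return {"prime": slug, "nearest_sim": nearest_sim(slug, vectors), "snapshot": snapshot_id(signatures)}


signatures = {
    "sovereignty": "final authority within a bounded domain, recognized from outside",
    "recursion": "a process defined in terms of smaller instances of itself",
}
vectors = {"sovereignty": [0.05, 0.20, 0.98], "recursion": [0.90, 0.40, 0.10]}  # toy values
before = stamped_report("sovereignty", signatures, vectors)

# Add a hypothetical near-synonym: the corpus, and therefore the snapshot, changes.
signatures["autonomy"] = "self-governance within a bounded domain"
vectors["autonomy"] = [0.10, 0.25, 0.96]
after = stamped_report("sovereignty", signatures, vectors)

print(before)
print(after)
```

Sovereignty's nearest-neighbor similarity rises from roughly 0.2 to above 0.99, and the snapshot stamp changes with it, so the old and new figures cannot be confused.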
5. Orthogonal to structural ↔ framed
Distinctiveness and the structural–framed character are independent axes, and the clearest evidence is that all four combinations occur:
- Structural and crowded — symmetry (a pure relational pattern, but ringed by invariance, scale, and duality).
- Structural and distinctive — conservation laws (equally substrate-neutral, but sparsely neighbored).
- Framed and crowded — authority (sitting amid governance, legitimacy, and due process).
- Framed and distinctive — sovereignty (a heavy frame, but few structural neighbors).
So a prime's page reports the two properties separately and never infers one from the other. Knowing that a prime is framed tells you how to treat it when transporting it; knowing that it is crowded tells you whether to expect a clean single match or a family when looking for it.
6. Reading it on a prime's page
Each prime's Neighborhood in Abstraction Space section reports its distinctiveness percentile (as a plain band — crowded, mid, or distinctive), the family it belongs to (with a link to that family's page and its other members), and its nearest neighbors with similarity scores. The reading is practical:
- Distinctive primes can be matched directly: a faithful description will retrieve the prime itself.
- Crowded primes are best approached through their family: expect to recognize the neighborhood first and disambiguate among the listed neighbors, since a short description will not separate the prime from its siblings.
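The plain bands on a page can be derived from the percentile with a simple cut. The tercile cut points, the one-line summary format, and the family name "process-structure" are hypothetical; the Encyclopedia's actual cut points are not stated here.

```python
def band(percentile, crowded_below=1 / 3, distinctive_from=2 / 3):
    """Map a 0-1 distinctiveness percentile to a plain band.

    The tercile cut points are hypothetical placeholders.
    """
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must lie in [0, 1]")
    if percentile < crowded_below:
        return "crowded"
    if percentile < distinctive_from:
        return "mid"
    return "distinctive"


def neighborhood_summary(slug, percentile, family, neighbors):
    """One-line rendering of a prime's neighborhood section (format is illustrative only)."""
    listed = ", ".join(f"{name} ({sim:.3f})" for name, sim in neighbors)
    return f"{slug}: {band(percentile)} (percentile {percentile:.2f}); family: {family}; nearest: {listed}"


print(neighborhood_summary("recursion", 0.0, "process-structure",
                           [("iteration", 0.998), ("self-similarity", 0.993)]))
```

A "crowded" result is the cue to read the family and neighbor list rather than expect a single clean match.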
In short, distinctiveness converts a hidden property of the lexicon's geometry into usage-ready guidance — and it is the variable the Encyclopedia's cross-domain search is built around. For the broader argument about what kind of object a prime is and why retrieval, not catalog content, is the binding constraint on cross-domain reasoning, see the companion paper The Calculus of Abstraction.