From Candidate to Catalog
How Prime Abstractions Are Identified, Drafted, and Refined
A methodology companion to How Prime Abstractions Are Identified: that paper states the criteria, and this one describes the procedure through which they are applied. Sibling to Structural and Framed Primes, Substrate Independence, The Hierarchy DAG, and Curriculum Construction over a Prime Catalog, each of which develops one element of the broader pipeline this paper documents.
Abstract
The Encyclopedia of Abstractions is a catalog of prime abstractions — recurring structural patterns that travel across at least three domains of human knowledge. The catalog's force depends on more than the entries themselves; it depends on the procedure by which a candidate concept becomes an accepted, fully specified entry. This paper documents that procedure. We describe how a candidate is evaluated against the seven inclusion criteria and the six guidance rules for distinguishing prime from domain-specific patterns; how an accepted candidate is drafted in a concise form that states the thesis; how that draft is then elaborated into a long-form entry with structural signature, neighbor distinctions, formal and applied examples, structural tensions, and verified citations; and how the entry is integrated into the catalog's category, origin-domain, hierarchy, and learnability views. The procedure has evolved across roughly fifty drafting cycles, has been operated end-to-end by a single curator (Kurt Zoglmann) with the substantial assistance of triangulated LLM agents, and remains in tension with itself in places. The point is not that the procedure is finished; the point is that the catalog is the output of something articulable, repeatable, and improvable, rather than the output of taste.
1. Why a methodology paper exists
A catalog of abstractions is only as strong as the method that built it. The criteria for inclusion can be stated in the abstract — and they are, in How Prime Abstractions Are Identified — but criteria do not by themselves produce a catalog. A procedure produces a catalog. The procedure decides which candidates clear the bar, which prose conventions the entries share, what counts as a sufficient set of examples, what counts as a sufficient distinction from a neighboring prime, and how a claim made in an entry gets backed by a citation that survives verification. A reader who wants to evaluate the catalog has to evaluate the procedure that produced it, and to do that the procedure has to be visible.
A few facts about the project frame everything below. The catalog currently holds 655 primes and 625 solution archetypes. The procedure has been operated across roughly fifty drafting cycles (the three most recent covered 42, 57, and 75 primes) and has produced the templates, rubrics, and safeguards documented here. The catalog is the work of a single curator, Kurt Zoglmann, who is also the author of this paper. LLM agents are the primary producers of per-prime prose under the curator's direction; the curator writes no per-prime prose at scale and instead authors the screening decisions, the templates, the rubrics, the dispatch and audit scripts, and the rulings on the cases the procedure flags for human judgment. We return to this division of labor in §9.
Three commitments shape the procedure that follows, and each shows up in specific procedural choices below. First, the catalog is intended to be extendable. A procedure that lives only in one person's head cannot be handed off; whether or not other contributors join, the procedure's articulability is the precondition for them being able to. Second, the procedure is the mid-2026 state of something still evolving. Several elements that are now load-bearing were added in response to specific failures the prior version had no defense against; future cycles will surface new failure modes and new safeguards in turn. A reader should not treat this paper as a canonical specification; they should treat it as an articulation of what we have learned so far. Third, the procedure makes heavy use of large language models, not as a stylistic crutch but as the primary producers of per-prime prose under tight specification and disciplined audit. The reader is owed an accounting of where machine generation happens, where curator decisions concentrate, and what discipline binds the two together so that machine output reaches a publishable bar.
2. The bar: when does a candidate become a prime?
The substantive question this section answers is: what makes a concept eligible to be a prime, and what disqualifies it? The procedure begins here because a procedure that drafts unqualified candidates well still produces a worse catalog than a procedure that rejects them at the screening step. Screening also recurs at two later points in the pipeline with progressively more evidence available; we describe the second and third gates in §3.
The seven inclusion criteria — and the operational translation
A candidate is evaluated against seven criteria established in How Prime Abstractions Are Identified. The criteria are familiar (Core Idea, Broad Use, Provides Clarity, Manages Complexity, Facilitates Abstract Reasoning, Enables Knowledge Transfer, Example); the procedure's contribution is to translate them into screening questions a curator can apply in a few minutes. Four of them carry most of the screening weight:
- Broad Use asks whether the curator can name three distinct domains in which the pattern operates with the same structural force, not one domain plus metaphorical extensions.
- Knowledge Transfer asks whether the pattern, transported to a new field, carries enough structure to suggest interventions, or merely supplies vocabulary.
- Provides Clarity asks whether the abstraction changes what a reader can see in a system, not merely what they can call it.
- Example asks whether at least one example simultaneously satisfies all the other six criteria. An example that only demonstrates Core Idea is doing the entry a disservice; it must do every job at once.
The remaining three criteria (Manages Complexity, Facilitates Abstract Reasoning, Core Idea) do most of their work at the elaboration stage. They are checked at screening, but they are rarely the single basis on which a candidate passes or fails.
The seven-criteria check, applied honestly, rejects more candidates than it admits. A representative recent rejection was a proposed boundary maintenance candidate that, on inspection, was a label for what boundary and segmentation already do together. It failed Broad Use (no third domain offered a usage distinct from the existing primes) and Provides Clarity (it added a word, not a way of seeing). The rejection is the point.
We do not currently maintain a rejection ledger. We know the rejection rate is substantial but cannot quote a defensible number; "more rejected than admitted" is an informal estimate, not a measured statistic. This is a methodological limit, and we name it as such.
Six guidance rules for prime vs. domain-specific
The harder question is the boundary case — the candidate that names a concept everyone has heard of, appears in multiple fields, and yet on close inspection is domain-specific dressed up in transferable language. How Prime Abstractions Are Identified sets out six guidance rules, summarized briefly here (the canonical statement is in that companion):
- Structural mechanism versus mere metaphor — does the candidate name a pattern that reappears in different fields without heavy reinterpretation, or is it invoked elsewhere primarily for illustrative language?
- Genuine cross-domain usage — does it appear in multiple fields with a shared structural blueprint, or only under the heavy terminology of its origin?
- Strip away the domain terminology — does the abstraction still hold if we remove the specialized jargon ("constitutional law," "defendant," "hardware/firmware")? If rewriting it in generic terms dissolves the clarity, it is likely domain-specific.
- Mapping strength — do the same mechanisms and roles (not just the same words) appear across the candidate's claimed domains?
- Core pattern versus domain-specific instantiation — does a specialized label point to a more general higher-order abstraction that should be the prime instead?
- Domain-accented primes — for abstractions tied to sociocultural institutions (accountability, consent), can we show explicit instances of the structural blueprint working in non-legal or non-political realms?
The rule that does the most work in practice is rule 3, the strip-the-jargon test. It is the cheapest to apply, the hardest to game, and the most likely to expose a candidate that looked prime under casual reading. The current procedure errs toward this test's verdict; we return to this in §9 as one of the procedure's standing limits.
Where candidates come from — and a blind-spot disclosure
In our practice, candidates come from two sources. The first is archetype-pipeline backflow. The catalog also holds solution archetypes, recurring intervention patterns that name source primes as their structural anchors; when work on a solution archetype names a prime in its source_primes field that does not yet exist as a canonical entry, that gap surfaces as a candidate. The second source is curator-authored screening: candidates are proposed from broad surveys of domain literatures and evaluated against the seven criteria and six rules, and the survivors enter the catalog. Both sources produce candidates that fail screening, and the failures are themselves informative, because they teach us where our intuitions about "this feels prime" reliably miss the bar.
We should be direct about what this candidate-discovery procedure does and does not cover. The current corpus draws primarily on Anglophone scientific, technical, and institutional literatures, weighted toward the curator's own training in mathematics, software engineering, and systems thinking. Candidates from non-Western intellectual traditions, from oral cultures, from disciplinary traditions the curator has not systematically surveyed, and from the natural sciences outside the curator's reading range, are under-represented. The catalog's claim to cross-domain coverage should be read with that caveat. We expect future contributors with different backgrounds — and future cycles aimed at deliberately uncovered regions — to surface candidates this curator has not seen.
What gets rejected, and the close-but-distinct principle
We reject candidates that are slug-variants of existing primes, candidates that are prescriptive rather than structural (those belong in the solution-archetype catalog), candidates that name a property of a system rather than a reusable structure, candidates that operate in only one domain however prominent, and candidates whose distinction from a near-existing prime, when written out, dissolves on inspection.
A principle that often surprises new contributors: close-but-distinct primes are intentional. The rejection above turns on whether the distinction dissolves on inspection; the principle here turns on cases where the distinction holds even though the textual definitions overlap. Closeness in language is not the test; closeness in cross-domain pattern is. The catalog deliberately holds primes that sit near each other — feedback near damping, signaling near information_asymmetry, modularity near composition — because each picks up a genuinely different cross-domain pattern. The screening question is "does the difference correspond to a different structural pattern that travels?" When the answer is yes, the new candidate is admitted, and the procedure makes the Distinction from Neighbors section of the elaborated entry carry the discrimination explicitly.
3. The pipeline at a glance
A candidate that clears screening moves through a sequence of stages, each with its own discipline and its own deliverable:
screen-1 → admit candidate against criteria + guidance rules
draft (concise) → 8-section thesis-form entry with carry-forward metadata
screen-2 → re-screen with the clarity the concise draft now provides
grade → structural-framed character + substrate independence
screen-3 → re-screen against grading signal
elaborate → 16-section detailed entry written by drafting agent
audit → structural conformance + parity checks per wave
verify → 3-round citation resolution (independent flag · resolve · check)
promote → byte-for-byte copy from staging into canonical store
integrate → category, origin-domain, hierarchy, learnability wiring
A few terms recur from here on, worth defining once at the outset: a wave is one parallel dispatch of seven or eight LLM agents, each working on a different prime; staging refers to the directory entries live in until they pass every check (we promote from staging to the canonical store via byte-for-byte copy in the promotion step); a slug is a prime's canonical machine-name (lowercase, underscored — feedback, due_process, tragedy_of_the_commons).
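As an illustration of the slug convention only, the check below encodes "lowercase, underscored" as a regular expression. The exact pattern (a leading letter, then letters and digits, words joined by single underscores) is our assumption for the sketch rather than the catalog's published rule, and `is_valid_slug` is a hypothetical helper.

```python
import re

# Hypothetical rule inferred from "lowercase, underscored": a leading letter,
# then lowercase letters/digits, with words joined by single underscores.
SLUG_PATTERN = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")

def is_valid_slug(slug: str) -> bool:
    """Return True if `slug` looks like a canonical prime machine-name."""
    return SLUG_PATTERN.fullmatch(slug) is not None

for candidate in ["feedback", "due_process", "tragedy_of_the_commons", "Due-Process", "due__process"]:
    print(candidate, is_valid_slug(candidate))
```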
Screening recurs three times because each new artifact in the pipeline gives the curator more evidence to decide on. The first screen admits or rejects the bare candidate against the criteria and guidance rules. The second screen runs after the concise draft, when the Core Idea and Distinction from Neighbors sections have sharpened what the candidate actually is — borderline cases sometimes look different at this point, and a candidate that survived screen-1 by a hair sometimes fails screen-2 because the draft made the resemblance to an existing prime visible. The third screen runs after grading, when the structural-framed and substrate-independence assessments occasionally surface a domain-anchoring that the initial screening missed. Screens 2 and 3 reject few candidates in practice — the first screen does most of the work — but they catch what the first screen could not have seen yet. In what follows we treat screening as a single discipline that recurs, not three separate disciplines.
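A minimal sketch of the stage order and its three screening gates, following the pipeline listing above. `run_pipeline` and the `screen` callback are hypothetical stand-ins for curator judgment, and every non-screening stage is assumed to complete, which real drafting, audit, and verification stages do not always do.

```python
from enum import Enum
from typing import Callable

class Stage(Enum):
    SCREEN_1 = "screen-1"
    DRAFT = "draft"
    SCREEN_2 = "screen-2"
    GRADE = "grade"
    SCREEN_3 = "screen-3"
    ELABORATE = "elaborate"
    AUDIT = "audit"
    VERIFY = "verify"
    PROMOTE = "promote"
    INTEGRATE = "integrate"

PIPELINE = list(Stage)  # iteration follows definition order
SCREENS = {Stage.SCREEN_1, Stage.SCREEN_2, Stage.SCREEN_3}

def run_pipeline(slug: str, screen: Callable[[str, Stage], bool]) -> tuple[str, Stage]:
    """Walk a candidate through the stages, stopping at the first failed screen.

    `screen(slug, stage)` stands in for the curator's decision at each gate;
    other stages are assumed to succeed (a simplification for the sketch).
    """
    for stage in PIPELINE:
        if stage in SCREENS and not screen(slug, stage):
            return ("rejected", stage)
    return ("integrated", Stage.INTEGRATE)

def example_screen(slug: str, stage: Stage) -> bool:
    # Hypothetical candidate: clears screen-1, fails once the concise draft exists.
    return stage is not Stage.SCREEN_2

print(run_pipeline("hypothetical_candidate", example_screen))
```

The same discipline recurs at each gate; only the evidence available to `screen` grows as the candidate moves down the list.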
Why the stages are separate
The pipeline is staged this way because conflating any two stages produces a characteristic failure mode. Three are worth naming explicitly because several §7 safeguards are downstream of them:
- Mix the elaboration and verification stages and you get fabricated citations. A drafting agent asked to also resolve citations will invent a plausible-sounding source rather than admit it cannot find one.
- Mix the elaboration and audit stages and you get author-trusted self-reports that turn out wrong. Drafting agents routinely claim "all 15 anchors placed" when the file contains 13.
- Mix the audit and verification stages and you get parity-clean entries with no real citations. The anchors line up; the sources don't exist yet.
Each stage's discipline is what its prior stage's output cannot encode: screening is editorial judgment against criteria; drafting is "state the thesis"; grading is independent multi-rater assessment; elaboration is "substantiate the thesis"; audit is mechanical conformance checking that does not trust author self-reports; verification is independent substantive review plus real-source research, not prose extension; promotion is byte-for-byte copy plus diff confirmation; integration is editorial placement in the catalog's browse views.
The stages also produce separately useful artifacts. The concise draft is the thesis form of the entry; many readers will prefer it. The elaborated entry is the defense; the reader who wants to evaluate the claim wants the elaboration. The grades are usable independently for search filtering and curriculum ordering. The hierarchy edges (developed in The Hierarchy DAG) layer on top. The learnability tiering (developed in Curriculum Construction over a Prime Catalog) sits at the end of the pipeline because it consumes signals from every prior stage.
4. The concise draft
The concise draft is the thesis-form entry. It is about 250–600 words and contains eight sections:
## Core Idea
## Broad Use
## Clarity
## Manages Complexity
## Abstract Reasoning
## Knowledge Transfer
## Example
## Distinction from Neighbors
The order is chosen so the entry reads as a single argument. Core Idea states what the pattern is; Broad Use establishes that it travels; the next four sections (Clarity, Manages Complexity, Abstract Reasoning, Knowledge Transfer) take up four more of the screening criteria, now answered for this specific prime; Example gives the load-bearing demonstration; and Distinction from Neighbors, the section the elaborated entry will substantially expand, names which primes sit nearest in abstraction space and what the difference is, neighbor by neighbor.
The frontmatter carries fields that travel into the elaborated entry verbatim: the slug, display name, categorical placement (the prime's location in the curated ontology), origin domain, aliases, the substrate_independence grade and reasoning (a composite 1–5 score plus sub-axis scores and a reasoning narrative, discussed in §5), and the similarity_to_nearest_existing_prime record (which existing prime the candidate is closest to, and the cosine similarity score).
The last two fields exist because they encode the screening evidence; preserving them into the elaborated entry preserves the audit trail. Drafting cycles are numbered internally as DP-NN, where DP stands for drafting pass. In one cycle, DP-53, covering 57 primes, the drafters mistakenly stripped these two frontmatter fields as "curator-review aids," and every elaborated entry in the cycle had to be repaired by re-injecting the blocks from the concise draft. The current procedure copies all concise-draft frontmatter verbatim; only schema_version changes (1 → 2).
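The carry-forward rule fits in a few lines: deep-copy the concise frontmatter, then bump schema_version. Only the field names come from the text; the dictionary layout, the nested sub-keys, and the sample values are illustrative assumptions, and `carry_forward_frontmatter` is a hypothetical helper.

```python
import copy

def carry_forward_frontmatter(concise_fm: dict) -> dict:
    """Copy every concise-draft frontmatter field into the elaborated entry.

    Nothing is stripped (including the screening-evidence fields); only
    schema_version changes, from 1 to 2.
    """
    if concise_fm.get("schema_version") != 1:
        raise ValueError("expected a concise (schema_version 1) draft")
    elaborated = copy.deepcopy(concise_fm)  # nested blocks are copied, not shared
    elaborated["schema_version"] = 2
    return elaborated

# Illustrative values only; the sub-keys under each block are assumptions.
concise = {
    "schema_version": 1,
    "slug": "feedback",
    "substrate_independence": {"composite": 5, "reasoning": "placeholder"},
    "similarity_to_nearest_existing_prime": {"prime": "damping", "score": 0.81},
}
elaborated = carry_forward_frontmatter(concise)
print(elaborated["schema_version"], sorted(elaborated))
```

Refusing a draft that is already at schema_version 2 keeps the function from being run twice on the same entry.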
We keep both forms, concise and elaborated, deliberately. The concise form is the thesis; the elaborated form is the defense. They serve different readers: a learner approaching the catalog wants the thesis, while an analyst evaluating whether a prime fits a problem wants the defense. On the rendered site they appear as alternative views of the same prime, and the reader chooses which to read. The modest cost of keeping the two in sync is worth the friction it saves readers.
The eight sections are mostly the right eight. One present limit, named in §9: the Example section can come out generic when the drafter is rushed. The procedure does not yet catch a weak example mechanically, so curator review remains the check here.
5. Structural-framed character and substrate independence
Two graded portability assessments live alongside the prose entry. Each is machine-injected — written by a separate grading pipeline and inserted into the entry by a script, not authored by the drafting agent. (The reason for the separation is in §7; the short version is that asking the drafter to both produce and explain the grade would let the audit be re-narrated by the auditee.)
A note on what "multiple rater passes" means throughout this paper. In current practice the raters are independent invocations of the same LLM, not different model families or human raters. The independence we get is run-to-run sampling variance plus prompt-level rater-role assignment, not model diversity. This matters for the epistemic value of the triangulation: it disciplines against single-roll volatility and against the agent's tendency to commit early to a label, but it does not give the assurance that human inter-rater agreement, or cross-model agreement, would. The procedure makes the triangulation as strong as the available infrastructure permits; the reader should weight conclusions accordingly.
Structural-framed character
Every prime is graded on where it sits along the structural-framed spectrum. The full theoretical development is in Structural and Framed Primes; the operational summary is that some primes (feedback, threshold, equilibrium, recursion) are pure relational patterns that travel light across domains, while others (sovereignty, procedural_fairness_due_process, legitimacy, property_rights) carry an institutional or normative frame that travels with them and resists clean separation. Which kind a prime is — and how strongly — substantially shapes how the prime should be used and what to expect when transporting it to a far field.
Each prime is scored independently by multiple rater passes against a five-criterion rubric (documented in the companion paper). The aggregate score maps to one of four labels (structural, mixed-structural, mixed-framed, framed) by threshold. Two flags accompany the label: a rater-agreement score recording how often individual raters reached the same label as the consensus, and a boundary flag set when raters disagreed enough that the label could plausibly have gone the other way. Boundary cases are surfaced to the curator as the input queue for human judgment.
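To make the label-plus-flags output concrete, here is a sketch under loudly stated assumptions: each rater pass scores the five criteria from 0 to 2 toward the framed end, the four labels are cut at hypothetical thresholds on the 0 to 10 total, and the boundary flag fires when agreement is low or the mean sits near a cut. None of these numbers are the companion paper's actual rubric values, and `grade` is a hypothetical function.

```python
from statistics import mean

# Hypothetical cut-points on a 0-10 "framedness" total (five criteria, each 0-2).
CUTS = [(2.5, "structural"), (5.0, "mixed-structural"), (7.5, "mixed-framed")]

def label_for(total: float) -> str:
    """Map a framedness total to one of the four labels."""
    for cut, label in CUTS:
        if total < cut:
            return label
    return "framed"

def grade(rater_scores: list[list[int]], margin: float = 0.5, min_agreement: float = 0.75) -> dict:
    """Aggregate independent rater passes into a label plus two flags."""
    totals = [sum(scores) for scores in rater_scores]  # one total per rater pass
    avg = mean(totals)
    consensus = label_for(avg)
    # Share of rater passes whose own label matches the consensus label.
    agreement = sum(label_for(t) == consensus for t in totals) / len(totals)
    # Boundary: raters split, or the mean lies within `margin` of a cut-point.
    near_cut = any(abs(avg - cut) < margin for cut, _ in CUTS)
    return {"label": consensus, "agreement": agreement, "boundary": agreement < min_agreement or near_cut}

print(grade([[1, 1, 1, 1, 1], [1, 1, 1, 0, 1], [1, 1, 1, 1, 2]]))
```

In the printed case the mean lands exactly on a cut and one of three passes disagrees, so the prime would join the curator's boundary queue.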
This grading was added partway through the project, after we observed in practice that feedback and procedural_fairness_due_process behaved differently when transported across domains: feedback transferred cleanly, while due process kept dragging its institutional setting along. The structural-framed grade is the operationalization of that observation. It has been retroactively applied to the entire corpus, so every prime now carries the grade, and future cycles grade as part of routine work.
Substrate independence
Substrate independence is a second portability assessment, distinct in the question it asks: how widely does the prime's logic transfer across the substrates a system can be implemented in (physical, biological, computational, social, cognitive, formal)? The answer is recorded as a composite 1–5 score, sub-axis scores (domain_breadth, structural_abstraction, transfer_evidence) with a reasoning narrative, and a generated reader-friendly paragraph that sits in the entry as its own section. The rubric and the calibration anchors are described in the Substrate Independence explainer.
The two assessments answer different questions: structural-framed asks what the prime carries with it when transported; substrate-independence asks how far it transports.
6. The detailed entry
The elaborated entry is the long-form defense of the thesis the concise entry stated. It is typically between 3,500 and 5,500 words and contains sixteen sections in a fixed order:
## Core Idea
## Structural Signature
## What It Is Not
## Broad Use
## Clarity
## Manages Complexity
## Abstract Reasoning
## Knowledge Transfer
## Examples (with ### Formal/abstract and ### Applied/industry subsections)
## Structural Tensions
## Structural–Framed Character
## Substrate Independence
## Distinction from Neighbors
## Solution Archetypes
## Notes
## References
The six criteria sections (Core Idea, Broad Use, Clarity, Manages Complexity, Abstract Reasoning, Knowledge Transfer) expand the concise form's content with substantive new material rather than padding. Each must do more than restate its criterion; it must offer specific evidence (named domains, named patterns, named transfers) that the reader can evaluate.
The sections that are new to the elaborated form, or substantially expanded in it, each carry distinct work:
Structural Signature
A new section, present only in the elaborated form, naming the role-structure of the prime via a Sig role-phrases block of five to seven bullets. Each bullet names a role and what it does: "the receptor responds to the signal," "the amplifier increases magnitude without changing form," "the comparator contrasts state against reference." The signature is what allows the prime to be recognized in an unfamiliar context: a reader who can see the roles can identify the pattern in a new system whether or not the system uses the prime's vocabulary. We have found that a strong Structural Signature tends to pull the rest of the entry into focus; a weak one usually signals the entry's Core Idea hadn't quite landed yet.
What It Is Not
Scope-clarification work in plain language: common misreadings, conflations, and adjacent concepts the prime is sometimes mistaken for. It is not a neighbor-by-neighbor comparison. The two roles are distinct enough that mixing them broke an entire drafting pass (DP-53 again): drafters used the concise draft's Distinction from Neighbors content to fill the elaborated What It Is Not section, leaving no Distinction from Neighbors section at all and a What It Is Not section mis-scoped to close primes rather than misreadings. The repair was corpus-wide.
Examples — formal/abstract + applied/industry, each with a "Mapped back"
The Examples section is split into two ### subsections. Formal/abstract gives an example in mathematical, scientific, or structurally pure form. Applied/industry gives one in a domain where the prime is doing actual decision-relevant work. Each subsection closes with a Mapped back paragraph that explicitly returns the example to the role-structure named in Structural Signature.
The split is not cosmetic. A prime that admits a clean formal example but no applied one (or vice versa) is a prime whose claim to cross-domain status is weaker than its other sections suggest. The Mapped back discipline is what enforces the connection between an example and the signature; without it, examples tend to drift toward "an interesting instance" rather than "an instance the signature predicts."
Structural Tensions — exactly six
Exactly six tensions, each written in the form `**T1: Title.** Body…` starting at column 0. The exact-six rule sounds arbitrary but is calibrated: fewer than six tends to mean the drafter found one or two obvious tensions and stopped; more than six tends to mean the drafter padded with second-order observations that belong in Notes. Six is the point at which the drafter has had to find the non-obvious tensions, which are usually the most useful.
Each tension names a live, productive pressure point the prime is in contact with: a place where the prime is not the whole story, where a competing prime applies, or where a downstream decision must be made. For feedback, this might include **T1: Speed vs. stability.** (faster sensing shortens the response but risks oscillation), **T3: Local vs. global.** (loop-local optima can fail at the system level), and so on. The exact-six discipline forces the drafter to find tensions of multiple kinds (temporal, scalar, scopal) rather than six instances of the same kind dressed differently.
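The exact-six and column-0 rules are mechanically checkable. A sketch, assuming each tension line begins with the bold `**Tn: Title.**` prefix and that numbering runs T1 through T6 in order; `check_tensions` is a hypothetical name, not the project's script.

```python
import re

# A tension line at column 0, e.g. "**T1: Speed vs. stability.** Body..."
TENSION = re.compile(r"^\*\*T(\d+): [^*]+\.\*\*", re.MULTILINE)

def check_tensions(section_text: str) -> list[str]:
    """Return problems with a Structural Tensions section (empty list means pass)."""
    numbers = [int(n) for n in TENSION.findall(section_text)]
    problems = []
    if len(numbers) != 6:
        problems.append(f"expected exactly 6 tensions, found {len(numbers)}")
    if numbers != list(range(1, len(numbers) + 1)):
        problems.append(f"tensions not numbered T1..Tn in order: {numbers}")
    return problems

sample = "\n".join(f"**T{i}: Tension {i}.** Body." for i in range(1, 7))
print(check_tensions(sample))
```

An indented tension line does not match the pattern, so drift away from column 0 shows up as a missing tension rather than passing silently.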
Distinction from Neighbors — sharp when similarity is high
The concise form's terse neighbor-by-neighbor comparison expands here to 400–700 words of prose paragraphs, one to three per named neighbor. If the prime's similarity_to_nearest_existing_prime score is high (the concise screening evidence that this prime sits close to an existing one), the section gets especially sharp treatment, because the closer the neighbor, the more carefully the difference has to be drawn for the catalog to make its work visible.
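One way a score of this kind can be computed: the sketch below takes embedding vectors as given and returns the nearest existing prime with its cosine similarity. The text does not say which embedding model produces the vectors, the three toy vectors are invented for illustration, and `nearest_existing_prime` is a hypothetical helper.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest_existing_prime(candidate: list[float], catalog: dict[str, list[float]]) -> tuple[str, float]:
    """Return (slug, cosine similarity) for the catalog prime closest to the candidate."""
    best = max(catalog, key=lambda slug: cosine(candidate, catalog[slug]))
    return best, cosine(candidate, catalog[best])

# Toy three-dimensional embeddings, invented for illustration.
catalog = {
    "feedback": [0.9, 0.1, 0.0],
    "damping": [0.7, 0.6, 0.1],
    "modularity": [0.0, 0.2, 0.95],
}
slug, score = nearest_existing_prime([0.85, 0.3, 0.05], catalog)
print(slug, round(score, 3))
```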
The remaining five sections (Structural–Framed Character, Substrate Independence, Solution Archetypes, Notes, References) are either machine-injected (the two grading sections, per §5) or straightforward inventory: which solution archetypes the prime anchors or contributes to; curator observations that don't fit cleanly elsewhere; and the bibliographic backing for the body's claims, structured as described in §7.
Of the 3,500–5,500-word target, the lower bound is the more important number: entries below it almost always read as inflated concise drafts, while entries above it almost always either repeat themselves or wander.
7. Quality controls
The procedure incorporates several safeguards. Each was added in response to a specific failure mode that the procedure's earlier version had no defense against. The failure modes are what justify the safeguards; without them in view, the procedure reads as overengineered.
The audit-don't-trust principle
LLM drafting agents are asked to self-report a small set of structural metrics: the number of sections, the number of citation anchors, the word count, the number of structural tensions. These self-reports are routinely wrong in a characteristic way: the agent says "all 15 citation anchors placed" when the file contains 13. In an early cycle, an agent assigned seven primes claimed success on all seven and had actually densified one; the other six were near their concise-draft length. The procedure now runs a structural audit after every drafting wave — the curator's own scripts (not the drafting agent) grep for section count, tension count, citation-anchor count, word count, and heading set — and any entry that fails dispatches a focused fixup before the next wave starts.
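A sketch of such a post-wave audit, recounting structure from the file rather than from the agent's report. The 16 headings and the 3,500 to 5,500 word range come from §6; the anchor syntax (`<!-- TAG-NNN -->`) is a hypothetical stand-in for the real batch-tag-plus-sequence-number comment, and `audit` is not the curator's actual script.

```python
import re

# Section order from the 16-section template.
EXPECTED_HEADINGS = [
    "Core Idea", "Structural Signature", "What It Is Not", "Broad Use", "Clarity",
    "Manages Complexity", "Abstract Reasoning", "Knowledge Transfer", "Examples",
    "Structural Tensions", "Structural–Framed Character", "Substrate Independence",
    "Distinction from Neighbors", "Solution Archetypes", "Notes", "References",
]
# Hypothetical anchor syntax: an HTML comment holding a tag plus a sequence number.
ANCHOR = re.compile(r"<!--\s*[A-Za-z0-9]+-\d+\s*-->")
TENSION = re.compile(r"^\*\*T\d+: ", re.MULTILINE)
HEADING = re.compile(r"^## (.+?)[ \t]*$", re.MULTILINE)  # "### " subsections don't match

def audit(entry: str, claimed_anchors: int) -> list[str]:
    """Recount structure from the entry text itself; never trust the drafter's report."""
    problems = []
    if HEADING.findall(entry) != EXPECTED_HEADINGS:
        problems.append("heading set or order differs from the 16-section template")
    words = len(ANCHOR.sub("", entry).split())  # anchors are invisible to readers
    if not 3500 <= words <= 5500:
        problems.append(f"word count {words} outside 3,500-5,500")
    if len(TENSION.findall(entry)) != 6:
        problems.append("structural tensions != 6")
    found = len(ANCHOR.findall(entry))
    if found != claimed_anchors:
        problems.append(f"agent claimed {claimed_anchors} anchors; file contains {found}")
    return problems
```

A non-empty result is what would trigger the focused fixup before the next wave starts.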
The principle has two specializations. Where a single agent produces structure, a mechanical audit follows and catches what the agent's self-report missed. Where a structural attribute could be re-narrated by the agent that authored it, as with the structural-framed and substrate-independence sections, a separate generator authors the attribute directly and a script injects its output, so that the audit (the rubric scores) and the prose explaining it cannot diverge under the drafting agent's pen.
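Machine injection can be as simple as replacing whatever sits under a given heading with the generator's output. A minimal sketch, assuming each section runs until the next `## ` heading or the end of the file; `inject_section` is a hypothetical helper, not the project's script.

```python
import re

def inject_section(entry: str, heading: str, generated: str) -> str:
    """Replace the body under `## {heading}` with generator output.

    The drafting agent never writes this text; the script does, so the prose
    cannot drift from the grader's scores.
    """
    pattern = re.compile(
        rf"(^## {re.escape(heading)}[ \t]*\n)(.*?)(?=^## |\Z)", re.MULTILINE | re.DOTALL
    )
    # A function replacement avoids backslash interpretation in generated text.
    new_entry, count = pattern.subn(lambda m: m.group(1) + generated.rstrip() + "\n\n", entry, count=1)
    if count != 1:
        raise ValueError(f"section {heading!r} not found")
    return new_entry

entry = "## Core Idea\nText.\n\n## Substrate Independence\nold prose\n\n## Distinction from Neighbors\nMore.\n"
print(inject_section(entry, "Substrate Independence", "Generated paragraph."))
```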
The principle has limits we should be specific about. Mechanical audits catch structural defects — missing anchors, drifted headings, out-of-range IDs — but the audit-don't-trust principle as just described is not how substantive defects get caught. Substantive defects — confidently-asserted claims in the body of an entry that are wrong on the merits — are caught (when they are caught) by a different mechanism: an independent review pass that runs as the first round of the citation-resolution stage described below.
What that mechanism catches: every claim a separate review agent, reading the entry in a fresh context window with no exposure to the drafter's confidence, judged to be possibly hallucinated, less certain than the prose admits, or the kind of statement a skeptical audience would want backing for. The independence is what gives the flagging its bite — the reviewer's calibration is not corrupted by the drafter's. The flagged claims become the inputs to the resolution stage, which either finds a real source for each or fails to find one; failure surfaces the claim as suspect and the curator or a follow-up agent rewrites it.
What it doesn't catch: claims the drafting agent was confident-but-wrong about and that the reviewing agent also read as plausible. A prime entry that confidently states a misleading fact about feedback in cellular biology — and that the independent reviewer reads as plausible and does not flag — will pass every check and ship with the error. This residual error class is where curator spot-checks, reader reports, and re-reading in subsequent drafting cycles still bind. The two-agent review reduces the substantive-error rate substantially; it does not eliminate it. We name this as the procedure's most important remaining substantive-error class.
Dual-placement parity
Each substantive claim in the body of an elaborated entry carries a unique anchor that ties the body sentence to an entry in the References section. In our scheme the anchor is an HTML comment at the end of the sentence the claim occupies; it carries a batch tag plus a sequence number, useful to the curator and invisible to the reader. The parity check, run once per drafting cycle, verifies that every ID appears exactly once inline and exactly once in References, falls within the ID range assigned to the entry's slug, and has no duplicates.
The check catches subtler failures than missing anchors. In one cycle, an entry for convexity_and_non_linearity (since renamed) used an ID that belonged to a different prime, coordination_problem_and_equilibrium_selection: the drafter had drifted out of its assigned range. The parity check caught it, and a script stripped the stray anchor from both the body and the References stub. Without the check, the error would have spanned two unrelated entries at once: the convexity entry would have asserted a claim its References no longer backed, and an ID reserved for the coordination entry would have pointed at a claim that entry never made.
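The parity rule reduces to counting IDs on each side and checking the slug's range. In this sketch the anchor syntax is a hypothetical `<!-- TAG-NNN -->` comment and the per-slug range is passed in as a Python `range`; both are assumptions, and `parity_problems` is an illustrative name.

```python
import re
from collections import Counter

# Hypothetical anchor syntax: tag, hyphen, sequence number inside an HTML comment.
ANCHOR_ID = re.compile(r"<!--\s*([A-Za-z0-9]+)-(\d+)\s*-->")

def parity_problems(body: str, references: str, allowed: range) -> list[str]:
    """Each ID must appear exactly once inline and once in References, within `allowed`."""
    inline = Counter(int(n) for _, n in ANCHOR_ID.findall(body))
    refs = Counter(int(n) for _, n in ANCHOR_ID.findall(references))
    problems = []
    for i in sorted(set(inline) | set(refs)):  # Counter returns 0 for absent IDs
        if inline[i] != 1 or refs[i] != 1:
            problems.append(f"ID {i}: {inline[i]} inline, {refs[i]} in References")
        if i not in allowed:
            problems.append(f"ID {i}: outside this slug's range {allowed.start}-{allowed.stop - 1}")
    return problems

body = "Claim A. <!-- dp-101 --> Claim B. <!-- dp-102 --> Stray. <!-- dp-305 -->"
refs = "<!-- dp-101 --> Source A\n<!-- dp-102 --> Source B"
print(parity_problems(body, refs, range(100, 120)))
```

The stray ID in the usage line is reported twice, once for the missing References entry and once for falling outside the slug's range, which mirrors the drift case described above.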
Citation resolution is a separate, three-round stage
Citation resolution runs after the drafting agent has finished writing the elaborated entry. The stage has three rounds, each with its own discipline, run by independent agents (different context windows, different prompts).
Round 1 — Independent flagging. A separate review agent in a fresh context window reads the v2 entry and places FACT anchors at any sentence whose claim the reviewer judges to be possibly hallucinated, less certain than the prose admits, or of the kind a skeptical audience would want substantiated. The reviewer's independence from the drafter is the round's load-bearing property: the reviewer has not produced the prose, so the reviewer's confidence is not corrupted by the drafter's. Each anchored sentence becomes a stub in References marked "pending verification" — a descriptor of what the claim says, ready for Round 2 to resolve.
Round 2 — Reference-finding. Separate agents work through the stubs one at a time, web-searching a verifiable source for each claim, writing the proper bibliographic entry, and placing the markdown footnote reference inline at the end of the sentence the claim occupies, attached to the existing anchor. Agents that cannot find a real source must log the failure rather than fabricate one; the failure surfaces as a substantive-error signal that the curator or a follow-up agent acts on (typically by rewriting the sentence to make a weaker claim the available evidence does support).
Round 3 — Verification. A third set of agents re-checks the resolved citations via independent web search, looking for fabrications that slipped through Round 2 and for citations that don't actually back the specific claim attached to them.
The three-round structure exists because the stage targets three distinct kinds of failure. Round 1 catches unflagged substantive uncertainty: claims the drafter was confident in that don't survive an independent read. Round 2 catches unresolvable claims: flagged sentences for which no real source can be found, which usually means the claim is wrong or overstated. Round 3 catches plausible-but-wrong citations: sources found in Round 2 that, on inspection, don't back what the sentence asserts.
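The three rounds can be read as a small state machine over each flagged claim. The Python sketch below assumes hypothetical `find_source` and `backs_claim` callables standing in for the Round 2 and Round 3 agents; apart from "pending verification", the status names are illustrative rather than the catalog's own.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Status(Enum):
    PENDING = "pending verification"  # Round 1 placed a stub for this claim
    RESOLVED = "resolved"             # Round 2 found a candidate source
    UNRESOLVABLE = "unresolvable"     # Round 2 logged a failed search instead of inventing one
    VERIFIED = "verified"             # Round 3 confirmed the source backs the claim
    REJECTED = "rejected"             # Round 3 found the source does not back the claim


@dataclass
class Claim:
    anchor: int
    descriptor: str  # what the flagged sentence asserts
    status: Status = Status.PENDING
    source: Optional[str] = None
    search_log: List[str] = field(default_factory=list)


def round2(claim: Claim, find_source: Callable[[str], Tuple[Optional[str], List[str]]]) -> Claim:
    """Resolve one stub; find_source stands in for a search agent and returns (source, log)."""
    source, log = find_source(claim.descriptor)
    claim.search_log.extend(log)  # every search is logged, so failures stay visible
    if source is None:
        claim.status = Status.UNRESOLVABLE
    else:
        claim.source, claim.status = source, Status.RESOLVED
    return claim


def round3(claim: Claim, backs_claim: Callable[[str, str], bool]) -> Claim:
    """Independently re-check a resolved citation against the claim it is attached to."""
    if claim.status is Status.RESOLVED:
        ok = backs_claim(claim.source, claim.descriptor)
        claim.status = Status.VERIFIED if ok else Status.REJECTED
    return claim


def needs_rewrite(claims: List[Claim]) -> List[Claim]:
    """Claims that go back to the curator or a follow-up agent for a weaker rewrite."""
    return [c for c in claims if c.status in (Status.UNRESOLVABLE, Status.REJECTED)]
```

Keeping the rounds as separate functions with separate callables mirrors the separation of agents: no single step both finds a source and certifies it.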
Mixing any of these rounds with drafting invites fabricated citations. A drafting agent asked to also resolve its own citations tends to invent a plausible-sounding source (author, year, title, venue) rather than admit it cannot find one. In one resolution run, an agent fabricated a source even with the verification prompt in place; we caught it on review and reverted. The procedure now requires that resolving agents log their search results, which makes fabrication visible in the audit trail.
A related, subtler failure: a drafting agent can fabricate a prose-level source, naming "Kuznetsov and Wallace (2021)" in the body of an entry, which the Round 2 agent will then sensibly decline to cite, leaving the bogus author names stranded in the sentence. This happened with the entry for stressor_induced_adaptation: the Round 2 agent correctly cited a real source but left the fake names in place. The procedure now greps the post-resolution prose for author-year mentions that don't correspond to a real footnote and fixes any sentence that names a source its footnote doesn't back.
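A minimal Python sketch of that grep step. The regular expression is a heuristic assumption that recognizes only "Surname (Year)", "Surname and Surname (Year)", "Surname & Surname (Year)", and "Surname et al. (Year)"; the rule for what counts as "backed" is likewise a simplification, and the function name is hypothetical.

```python
import re

# Heuristic pattern for prose citations such as "Kuznetsov and Wallace (2021)",
# "Smith & Jones (2020)", or "Smith et al. (1999)". Other styles will be missed.
AUTHOR_YEAR = re.compile(
    r"\b([A-Z][A-Za-z'\-]+(?:\s+(?:and|&)\s+[A-Z][A-Za-z'\-]+|\s+et al\.)?)\s+\((\d{4})\)"
)


def stranded_author_mentions(prose, footnote_texts):
    """Return author-year mentions in `prose` that no footnote appears to back.

    A mention counts as backed if some footnote contains both the first author's
    surname and the year; that matching rule is a simplifying assumption.
    """
    stranded = []
    for match in AUTHOR_YEAR.finditer(prose):
        names, year = match.group(1), match.group(2)
        surname = names.split()[0]  # first author's surname
        if not any(surname in note and year in note for note in footnote_texts):
            stranded.append(match.group(0))
    return stranded
```

Each returned string is a sentence fragment for the curator or a fix-it agent to rewrite; the check only surfaces candidates and does not decide whether a source is real.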
Stage-specific concurrency caps (environmental)
We audit and sync staging after every wave before launching the next; that discipline is the part that transfers. The specific concurrency numbers are environmental and time-bound. As of mid-2026, on the platform this procedure runs on, drafting waves dispatch seven to eight parallel agents and citation-resolution waves stay at five — we learned the cap on resolution by exceeding it, when ten citation agents firing together produced a 529 rate-limit response and eight of the ten errored out before writing anything (no corruption, but a wasted wave). The specific numbers will change as APIs and rate limits evolve; the discipline of running waves rather than blind concurrency, and auditing before scaling, is what we expect to persist.
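The wave discipline can be sketched with Python's `asyncio`. Here `run_one` and `audit` are hypothetical stand-ins for agent dispatch and the post-wave audit, and the cap is a parameter precisely because the right number is environmental. The cap comes from slicing the queue into waves, not from `asyncio.gather` itself; failed tasks are re-queued into later waves rather than widening the current one.

```python
import asyncio


async def run_in_waves(tasks, cap, run_one, audit, max_waves=20):
    """Dispatch `tasks` in waves of at most `cap` agents, auditing after each wave.

    run_one: async callable running one agent on one task (hypothetical stand-in).
    audit:   callable(wave, results) -> tasks to retry; it inspects results directly
             instead of trusting the agents' own success reports.
    Returns (completed tasks, tasks still pending after max_waves).
    """
    pending = list(tasks)
    completed = []
    for _ in range(max_waves):
        if not pending:
            break
        wave, pending = pending[:cap], pending[cap:]
        # return_exceptions=True keeps one errored agent from cancelling the wave.
        results = await asyncio.gather(*(run_one(t) for t in wave), return_exceptions=True)
        retry = audit(wave, results)
        completed.extend(t for t in wave if t not in retry)
        pending.extend(retry)  # retried in a later wave, never by raising the cap
    return completed, pending
```

The `max_waves` bound is an added assumption so that a task which always fails cannot loop forever; whatever remains pending is returned for the curator to inspect.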
Reconcile, don't restart
Some entries arrive at the citation-resolution stage in a partial state: the drafting agent placed some real citations and left others as pending stubs. The naive procedure, restarting resolution from the pending stubs, handles the stubs but also clobbers or duplicates the real citations the drafter already produced. The current procedure detects partial-state entries (any entry with both pending stubs and resolved references) and dispatches a reconciliation agent under a different prompt: verify that the existing citations are not fabricated, fill the missing anchors, and remove the leftover stubs. The result is less rework and less risk of clobbering good work. The pattern surfaced when five entries in one cycle turned out to be in this mixed state simultaneously; the from-scratch resolution prompt would have made things worse.
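Detecting the partial state is mechanical. A minimal Python sketch, assuming (hypothetically) that each References entry is a markdown footnote definition on its own line and that stubs carry the literal marker "pending verification"; the route names are illustrative.

```python
import re

# Assumed shape of one References entry: a markdown footnote definition on its own
# line, e.g. "[^12]: Author (Year). Title." Stubs carry the marker below.
FOOTNOTE_DEF = re.compile(r"\[\^[^\]]+\]:")
STUB_MARKER = "pending verification"


def route_for_resolution(references_text):
    """Decide which prompt an entry's References section should be dispatched under."""
    entries = [ln for ln in references_text.splitlines() if FOOTNOTE_DEF.match(ln.strip())]
    stubs = sum(1 for e in entries if STUB_MARKER in e)
    resolved = len(entries) - stubs
    if stubs and resolved:
        return "reconcile"  # verify existing citations, fill anchors, remove stubs
    if stubs:
        return "resolve"    # from-scratch resolution: nothing real to clobber
    if resolved:
        return "verify"     # fully resolved; ready for Round 3
    return "none"           # no references recorded yet
```

The design choice is that routing happens before any agent runs, so the from-scratch prompt never sees an entry it could damage.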
Heading-set audits, not just counts
An audit that checks only that an entry has sixteen `##` headings will pass entries that have sixteen non-canonical headings: "Examples in Context" instead of "Examples," "Design Implications" instead of "Notes." Two such entries survived early audits because those audits only counted heading totals. One placed `## Notes` after `## References` and used `## Structural Tensions and Failure Modes` instead of `## Structural Tensions`; both deviations quietly broke catalog compatibility. The fix is to audit the heading set against the canonical sixteen, not just the count. Fix-it agents handle cases that need a rename with the content remapped; the mechanical part of the audit handles the rest.
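A minimal Python sketch of a heading-set audit. It takes the canonical heading list as a parameter (the catalog's sixteen names are not reproduced here), reports missing, non-canonical, and duplicated `##` headings, and checks that the canonical headings that are present appear in canonical order, which is how the `## Notes` after `## References` case would surface. The function name is hypothetical.

```python
import re

# Level-2 markdown headings only; "### ..." does not match because a space must follow "##".
H2 = re.compile(r"^## (.+?)[ \t]*$", re.MULTILINE)


def heading_set_problems(markdown, canonical):
    """Audit an entry's level-2 headings against an ordered canonical list.

    A count-only audit would pass sixteen wrong headings; this compares names and order.
    """
    found = H2.findall(markdown)
    problems = []
    missing = [h for h in canonical if h not in found]
    extra = [h for h in found if h not in canonical]
    dupes = sorted({h for h in found if found.count(h) > 1})
    if missing:
        problems.append(f"missing: {missing}")
    if extra:
        problems.append(f"non-canonical: {extra}")
    if dupes:
        problems.append(f"duplicated: {dupes}")
    # Canonical headings that are present must appear in canonical order.
    present = [h for h in found if h in canonical]
    if present != sorted(present, key=canonical.index):
        problems.append(f"out of order: {present}")
    return problems
```

A "non-canonical" finding paired with a "missing" one usually signals a rename with content to remap, which is the case the text hands to fix-it agents.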
8. Integration into the catalog¶
A clean entry that has cleared drafting, grading, elaboration, audit, parity, and citation resolution is promoted: copied byte-for-byte from staging into the canonical prime store. Promotion is the moment the entry becomes visible in catalog browse views. Promotion itself is mechanical; the integration steps that follow matter more and are easier to get wrong.
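A minimal sketch of a byte-for-byte promotion in Python, assuming a flat staging directory and a flat canonical store; the directory layout and the `promote` name are hypothetical. The hash comparison after the copy is an added safeguard in the spirit of auditing rather than trusting.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path):
    """Hex digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def promote(staged, canonical_dir):
    """Copy a staged entry into the canonical store and confirm the bytes match."""
    target = Path(canonical_dir) / Path(staged).name
    shutil.copyfile(staged, target)  # raw byte copy: no re-encoding or newline changes
    if sha256(staged) != sha256(target):
        raise RuntimeError(f"promotion of {staged} did not copy byte-for-byte")
    return target
```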
Three integration steps run after promotion. Ontology placement and learnability tiering work cleanly in the current procedure; origin-domain classification currently exposes a known vocabulary-drift problem that we name as an open item in §9.
Ontology placement
Every prime occupies a position in the curated category tree (meta/02_ontology.md). The position determines which Category the prime appears under in the categorical browse view. The procedure adds a one-line entry — the prime's slug, with an asterisk suffix if the categorical placement is secondary rather than primary — under the right category section, in the right order. A reconciliation script then propagates the category back into the prime's frontmatter, so that the frontmatter and the category tree never drift. The characteristic failure when this step is skipped: the prime disappears silently from the categorical browse view despite existing in every other view.
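The placement-and-propagation step can be sketched in Python. The layout of meta/02_ontology.md is assumed here, not taken from the catalog: `## Category` section headers followed by `- slug` lines, with `- slug*` marking a secondary placement. The function names are hypothetical.

```python
import re

SECTION = re.compile(r"^##\s+(.+?)\s*$")            # assumed category header line
ITEM = re.compile(r"^-\s+([a-z0-9_]+)(\*?)\s*$")   # assumed "- slug" or "- slug*" line


def parse_ontology(text):
    """Return ({slug: primary category}, {slug: [secondary categories]})."""
    primary, secondary = {}, {}
    category = None
    for line in text.splitlines():
        section = SECTION.match(line)
        item = ITEM.match(line)
        if section:
            category = section.group(1)
        elif item and category is not None:
            slug, is_secondary = item.group(1), item.group(2) == "*"
            if is_secondary:
                secondary.setdefault(slug, []).append(category)
            else:
                primary[slug] = category
    return primary, secondary


def reconcile_frontmatter(primary, frontmatter_category):
    """Return (frontmatter updates implied by the tree, slugs the tree never places)."""
    updates = {slug: primary[slug] for slug, cat in frontmatter_category.items()
               if slug in primary and primary[slug] != cat}
    unplaced = sorted(slug for slug in frontmatter_category if slug not in primary)
    return updates, unplaced
```

The tree is treated as the source of truth and the frontmatter follows it; the `unplaced` list is exactly the set of primes that would silently vanish from the categorical browse view.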
Origin-domain classification
Each prime also has an origin domain — the field of human knowledge it is most strongly anchored in. The origin domain is one of the sixty-six canonical domains documented in the catalog's domains-of-human-knowledge inventory and recorded in a per-domain origin registry. The by-domain browse view is built from that registry.
The characteristic failure here is subtler than missing entries — it is vocabulary drift. Three slug vocabularies for the same domains are currently in tension: the canonical domain slugs in the inventory (computer_science, psychology); the origin-registry values, which sometimes use longer compound forms (computer_science_software_engineering); and the per-prime origin_domain frontmatter, which has sometimes drifted to sub-domains (decision_theory, quantum_mechanics, game_theory).
This is not a small alignment problem; it is a current-state corpus-integrity issue. As of mid-2026 the per-domain registry covers 468 of the corpus's primes; the remaining several hundred are silently absent from the by-domain browse view until the reconciliation completes. We name this plainly rather than gesturing at "in progress"; a reader evaluating whether the catalog is fit for systematic use should know the current state.
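Measuring the drift is simpler than fixing it. A minimal Python sketch that reports how the three vocabularies disagree, assuming each has been loaded into a plain dict or set; it changes nothing, and the mapping from compound or sub-domain slugs back to canonical ones is deliberately left out because that is the curatorial part. The function name is hypothetical.

```python
def origin_domain_report(frontmatter_domain, registry, canonical):
    """Measure drift among the three vocabularies; reports only, changes nothing.

    frontmatter_domain: {slug: origin_domain value in the prime's frontmatter}
    registry:           {slug: domain value recorded in the per-domain origin registry}
    canonical:          set of canonical domain slugs from the domains inventory
    """
    report = {
        "in_registry": [],
        "missing_from_registry": [],       # absent from the by-domain browse view
        "frontmatter_off_vocabulary": [],  # e.g. a sub-domain such as game_theory
        "registry_off_vocabulary": [],     # e.g. a longer compound form
    }
    for slug, domain in sorted(frontmatter_domain.items()):
        report["in_registry" if slug in registry else "missing_from_registry"].append(slug)
        if domain not in canonical:
            report["frontmatter_off_vocabulary"].append((slug, domain))
        if slug in registry and registry[slug] not in canonical:
            report["registry_off_vocabulary"].append((slug, registry[slug]))
    return report
```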
Learnability tiering
The final integration step is per-prime learnability. For each new prime, three independent LLM generators plus one judge produce an ELI ladder: explanations at five levels (ELI5/ELI10/ELI15/ELI18/specialist) plus everyday-language names at the first three levels. An ELI5 entry is a kindergarten-vocabulary explanation in 30–60 words; ELI10, a fifth-grader explanation; ELI15, a high-school freshman explanation; and so on up to specialist, which assumes home-domain background and runs 100–250 polished words. The judge selects the best generator output at each level, or marks a level "not available" if at least two generators independently said no faithful explanation at that level was possible.
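The judge's per-level rule can be sketched in Python. The sketch assumes each generator returns a dict from level to text, with `None` meaning "no faithful explanation possible", and takes the judge as a hypothetical callable; everyday-language names are omitted for brevity.

```python
from typing import Callable, Dict, List, Optional

LEVELS = ["ELI5", "ELI10", "ELI15", "ELI18", "specialist"]
NOT_AVAILABLE = "not available"


def assemble_ladder(
    generator_outputs: List[Dict[str, Optional[str]]],
    judge: Callable[[str, List[str]], str],
) -> Dict[str, str]:
    """Build one prime's ELI ladder from several generators' outputs.

    generator_outputs: one dict per generator, mapping level -> text, or
                       level -> None when that generator declined the level.
    judge: callable(level, candidates) -> chosen text (stand-in for the judge model).
    """
    ladder = {}
    for level in LEVELS:
        candidates = [out.get(level) for out in generator_outputs]
        declined = sum(1 for c in candidates if c is None)
        if declined >= 2:
            # At least two generators independently said no faithful explanation exists.
            ladder[level] = NOT_AVAILABLE
        else:
            ladder[level] = judge(level, [c for c in candidates if c is not None])
    return ladder
```

With three generators, a single decline is outvoted and the judge chooses among the remaining two; two declines mark the level "not available" without consulting the judge at all.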
The ladder feeds the curriculum scorer, which re-ranks the entire corpus into five learnability tiers via a difficulty-weighted topological sort that honors the Hierarchy DAG's prerequisite edges. The output is the new prime's tier, which appears in the by-learnability browse view and on the prime's own page. The characteristic failure when this step is skipped: the new prime shows up in the by-learnability view with no kid-friendly explanation, breaking the curriculum view for any reader treating the catalog as a learning sequence.
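The tiering step can be sketched as Kahn's algorithm with a difficulty-keyed priority queue. Everything specific here is an assumption rather than the curriculum scorer's actual method: difficulty scores are taken as given, ties break on slug name, and the resulting order is cut into equal-size tiers, which may not be how the real tier boundaries are drawn. Because tiers are assigned monotonically along a prerequisite-respecting order, no prime lands in an earlier tier than any of its prerequisites.

```python
import heapq


def learnability_tiers(difficulty, prerequisites, n_tiers=5):
    """Order primes prerequisites-first, easiest-available-first, then cut into tiers.

    difficulty:    {slug: numeric difficulty score}; every slug must appear here.
    prerequisites: {slug: set of slugs that must be learned first} (DAG edges).
    Returns {slug: tier} with tiers 1..n_tiers, assigned by equal-size slices of the order.
    """
    indegree = {s: 0 for s in difficulty}
    dependents = {s: [] for s in difficulty}
    for slug, prereqs in prerequisites.items():
        for p in prereqs:
            indegree[slug] += 1
            dependents[p].append(slug)
    # Min-heap keyed on difficulty: among primes whose prerequisites are all
    # placed, take the easiest next.
    ready = [(difficulty[s], s) for s, d in indegree.items() if d == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, slug = heapq.heappop(ready)
        order.append(slug)
        for nxt in dependents[slug]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, (difficulty[nxt], nxt))
    if len(order) != len(difficulty):
        raise ValueError("prerequisite graph has a cycle")
    # Equal-size slicing is a placeholder for the scorer's real tier boundaries.
    return {slug: 1 + i * n_tiers // len(order) for i, slug in enumerate(order)}
```

Re-running the whole sort when a prime is added, rather than slotting the new prime in, matches the text's description of re-ranking the entire corpus: one new prerequisite edge can shift the positions of many primes.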
9. How the procedure has evolved, and where it is still evolving¶
The procedure described above was not designed in advance and then operated. It was discovered, cycle by cycle, across roughly fifty drafting passes. For a sense of scale, DP-52 covered 42 primes; DP-53, 57; DP-54, 75. Several elements that are now load-bearing (the structural-framed grading, the substrate-independence section, the learnability tiering, the dual-placement parity check, the citation-resolution-as-separate-stage discipline) were added in response to failures the prior version had no defense against. The version of the procedure documented here is what works as of mid-2026.
Tooling gaps with concrete next steps
Three open items are concrete, scoped, and likely to close in the next few cycles:
- Origin-domain vocabulary reconciliation. The three slug vocabularies described in §8 are not yet aligned; 468 of the corpus's primes are currently in the per-domain registry, with the rest absent from the by-domain browse view until the reconciliation completes.
- Substrate-independence rubric recalibration. The rubric is calibrated against a small set of anchor primes; as the catalog grows into regions of substrate-independence space the anchors don't cover well, the rubric needs explicit recalibration. The procedure does not yet have a recalibration step.
- Structural-framed boundary-queue cadence. The grading produces a rater-agreement score and a boundary flag for primes where raters disagreed; these get surfaced to the curator, but the review queue is worked through ad hoc rather than on a cadence.
A separate present limit, named in §4: the concise draft's Example section can land generic when the drafter is rushed. The procedure does not yet mechanically catch a weak example.
Structural limits we are not planning to fix in the short term
Four limits are deeper than tooling gaps and the procedure does not have a near-term plan for closing them.
The audit asymmetry. The audit-don't-trust principle (§7) applies exclusively to LLM agents. The curator's own judgment, at the screening step, at boundary cases in the structural-framed grade, at ontology placement, and at ELI-ladder "not available" flags, is not independently audited. There is no second-curator review. This matters because §7 elevates audit discipline to the procedure's central commitment; in practice, that commitment is asymmetric.
Inter-curator agreement is unmeasured. The procedure has not been operated end-to-end by anyone other than its author. The inter-curator agreement that would tell us whether the procedure is reproducible — whether a second curator trained on this paper would produce a similar catalog — has not been measured. The catalog is therefore one curator's procedurally-disciplined catalog, not a tested-reproducible catalog. We name this as a structural limit rather than as a planned step.
The candidate-screening kernel. The candidate-screening discipline (§2) is where human judgment concentrates most heavily. We have not found a way to mechanize the rejection of close-but-fundamentally-domain-specific candidates that doesn't either over-reject or under-reject in ways the curator's judgment doesn't. We expect this kernel to shrink as tooling matures — better-calibrated similarity classifiers, written rejection ledgers usable as training data, multiple-curator workflows — but the current state is that human judgment is binding here, and we do not see a clear short-term path to reducing it without unacceptable error rates.
Conflict-of-interest disclosure. The catalog is the work of a single curator whose intellectual interests shape what feels prime. The current corpus over-represents the curator's reading range — systems thinking, software engineering, cybernetics-adjacent literatures, formal methods — and under-represents traditions the curator has not systematically engaged with. Readers should weight cross-domain claims accordingly. A second curator with different intellectual interests would likely surface candidates this curator has not seen, and would likely judge close-but-distinct cases differently at the margin.
Versioning, retraction, and citability
The catalog is git-versioned and the public site shows current HEAD. A reader citing an entry today cannot easily retrieve the state of that entry from six months ago; we do not currently surface per-entry version history to readers. We treat substantive errors as edits-in-place rather than as retractions-with-history; there is no public retraction log. For a reader using the catalog as a citable source, this is a real limit, and a future version of the procedure will likely need to address it (per-entry version chips, a public revision log, a retraction policy with public records). We are naming the gap rather than gesturing at a plan.
The role of LLM agents — accounting
The role of LLM agents in the procedure is what we have found works for the present generation of models; it will need recalibration as models change. Drafting, grading, and citation resolution are carried out predominantly by LLM agents under tight prompts and disciplined audits. The curator does not author per-prime prose at scale (almost every word a reader encounters in a prime entry was generated by an LLM agent under a specific drafting prompt); instead, the curator authors the templates, rubrics, dispatch and audit scripts, screening decisions, and rulings on the cases the procedure flags for human judgment. Curator judgment concentrates at four points: candidate acceptance (§2), ontology and origin-domain placement (§8), boundary cases in the structural-framed grade (§5), and ELI-ladder "not available" flags (§8). The triangulation pattern (multiple independent generators plus a judge step) is the single most useful piece of discipline we have found; we noted in §5 the limits on what "independence" means when the raters are the same model.
A short stance on authorship attribution. The catalog is published under the curator's name; the drafting agents are tools, not co-authors, in the way that a word processor is a tool. We do not currently attach per-sentence provenance metadata identifying which agent produced which prose. This is a defensible editorial choice but it is a choice; a future version of the procedure may add provenance tags at the section or sentence level, particularly as norms around AI-assisted scholarship mature.
10. What's transferable, and what isn't¶
A reader who wants to build something analogous — a catalog of mathematical patterns, of organizational anti-patterns, of historical analogies, of cognitive biases — should not adopt the present procedure wholesale. The principles transfer; the tooling mostly does not; and even the principles are conditional on infrastructure the paper has so far described in passing.
The central principle, with its exception
The principle is this: any stage where a generative agent produces checkable structure must be followed by a mechanical audit it didn't author. Most §7 safeguards are specializations of that principle — audit-don't-trust on heading counts, dual-placement parity on citations, separation of drafting from verification on sources, machine injection of grades the drafter could have re-narrated, heading-set checks against canonical inventories, concurrency caps that don't depend on the agent reporting whether the wave succeeded.
The principle has an exception we should be explicit about. For stages where the agent's output is substantive prose rather than checkable structure — the per-prime body claims, the ELI-ladder content, the structural-framed labels for boundary cases — mechanical audit is not available. The closest substitute is the triangulated-generators-plus-judge pattern, with the understanding that this is weaker than mechanical audit and is the place where the procedure remains most vulnerable. As we noted in §5 and §7, our triangulation uses the same model in parallel, which gives sampling variance but not model diversity. Treat the central principle as universal where structure is checkable; treat the substitute as our best current approach where it is not.
Supporting disciplines
Several disciplines transfer alongside the central principle:
- The separation of drafting from citation resolution. Mixing them produces fabricated sources.
- The triangulated-generator-plus-judge pattern. It is the most useful tool we have for any subjective grade, with the caveats above on what its independence assumes.
- The reconcile-don't-restart principle. Partial-state entries deserve reconciliation, not restart, when generation is expensive.
- The two-form policy — preserve the thesis form alongside the defense form — applies to any catalog where readers will arrive with mixed depth of engagement.
- The practice of pinning down a fixed template for each form. A different catalog will have a different template; the act of fixing one and treating drift from it as a defect is what transfers.
- The practice of calibrating subjective rubrics against named anchor cases. The anchor set will be different in a different catalog; the discipline of choosing anchors before raters start grading is the part that survives.
The infrastructure these disciplines assume
The disciplines above presume infrastructure: LLM API access at scale, agent-orchestration tooling, the ability to dispatch and audit waves of parallel agents, version-controlled staging directories, and scripts the curator can write and maintain. For a project without this infrastructure, the disciplines collapse to "use multiple AI calls and reconcile them by hand," which is not the same procedure. The principles transfer; the infrastructure is part of the procedure, not background. A reader at an organization considering whether to adopt this approach should treat the infrastructure cost as the first question, not the last.
What does not transfer cleanly are the specific instances of the patterns: our citation-anchor scheme, our section template, our particular rubric calibrations, our directory layout, our scripts, our wave-dispatch tooling. Each is coupled to a single curator's working environment; you will have your own.
The contributor-facing runbook (PRIME_DENSIFICATION_RUNBOOK.md in the repo root) holds the operational depth a serious adapter would want — exact agent prompts, failure-mode catalog, repair scripts, tooling references. The conceptual paper you are reading is the why; the runbook is the what we actually run.
11. Companion documents¶
- How Prime Abstractions Are Identified — the criteria and guidance rules referenced in §2.
- Structural and Framed Primes — the typology behind the structural-framed grade (§5).
- Substrate Independence — the rubric behind the substrate-independence grade (§5).
- The Hierarchy DAG — how typed relations between primes are recorded and used.
- Curriculum Construction over a Prime Catalog — how the ELI ladder and tiers are built (§8).
- The Calculus of Abstraction — the framing of primes as a lexicon and operations on them as a grammar.
- The Limits of Runtime Scaffolding — the empirical retrospective on whether the catalog plus a runtime pipeline improves frontier-model reasoning.
The contributor-facing runbook (PRIME_DENSIFICATION_RUNBOOK.md, in the repo root) is the operational document this paper distills.