Substrate-Independence Framework and Scoring Methodology
This document describes the substrate-independence framework — what it is, why it matters for the Encyclopedia of Abstractions' central claim, and the method used to score the catalog's 655 primes on it. The methodology is reproducible: it can be re-run as the catalog grows, or if the rubric needs to be re-tuned.
Companion documents:
- The agent-facing rubric and anchor examples (input to the dual-pass scoring agents).
- A human-readable summary of the scoring results with all 655 primes grouped by composite score.
- The canonical scores file.
Hypothesis
The Encyclopedia of Abstractions' central claim — that catalog-augmented reasoning complements chain-of-thought — is strongest for problems anchored in substrate-independent primes (recurring structural patterns that transfer with the same logic across radically different substrates: physical, biological, computational, social, cognitive, formal). For domain-flavored primes, the catalog's leverage degrades because the "transfer" is often metaphorical rather than structural. Substrate independence is therefore a property worth scoring, both to (a) prioritize where archetype-drafting effort produces the most cross-domain reasoning value, and (b) make the catalog's claim falsifiable per problem class.
Scoring rubric
Each prime is scored on four axes, each 1-5:
- domain_breadth: how many distinct substrate types (not just industries) the prime genuinely spans. Substrate types: physical, biological, computational, social, cognitive, formal.
- structural_abstraction: whether the structural signature uses substrate-agnostic vocabulary, or imports domain-specific concepts.
- transfer_evidence: whether the V2 examples cross substrates, or just cross industries within one substrate.
- composite_substrate_independence: overall judgment, defaulting to the average of the three axes but with documented agent override permitted.
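The default-with-override rule for the composite can be sketched as follows. This is a minimal illustration, not the project's scoring code: the `SubstrateScore` class, the `composite_override` and `override_reason` fields, and the choice to round the average to the nearest integer are all assumptions. (With three integer axes the average is always a multiple of 1/3, so no rounding tie can occur.)

```python
from dataclasses import dataclass
from typing import Optional

AXES = ("domain_breadth", "structural_abstraction", "transfer_evidence")


@dataclass
class SubstrateScore:
    """Hypothetical container for one prime's rubric scores (each 1-5)."""
    domain_breadth: int
    structural_abstraction: int
    transfer_evidence: int
    composite_override: Optional[int] = None  # assumed field: agent override
    override_reason: str = ""                 # assumed field: the documentation

    def __post_init__(self) -> None:
        for name in AXES:
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be in 1-5, got {value}")
        if self.composite_override is not None:
            if not 1 <= self.composite_override <= 5:
                raise ValueError("composite_override must be in 1-5")
            if not self.override_reason.strip():
                raise ValueError("an override must be documented")

    @property
    def composite(self) -> int:
        """Override if present, else the rounded average of the three axes."""
        if self.composite_override is not None:
            return self.composite_override
        total = sum(getattr(self, name) for name in AXES)
        # Assumption: round to nearest integer; thirds never land on .5.
        return round(total / len(AXES))


if __name__ == "__main__":
    print(SubstrateScore(5, 4, 4).composite)  # average 4.33
    print(SubstrateScore(2, 2, 3, composite_override=3,
                         override_reason="examples cross substrates").composite)
```

Requiring a non-empty reason alongside any override keeps the "documented" part of the rule enforceable rather than advisory.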
Anchor calibration
Without anchors, agents gravitate toward 3-4 and the scores become useless for prioritization. The full set of anchor examples is recorded in SCORING_PROMPT_TEMPLATE.md. Key anchors:
- 5 (universal): feedback, boundary, causality, relation
- 4 (genuine cross-substrate): tragedy_of_the_commons, tipping_points_or_phase_transitions, monitoring, equity
- 3 (multi-domain, anchored): bayesian_updating, network_effect, markov_decision_processes_mdps
- 2 (mostly domain-bounded): confounding, selection_bias, delphi_method
- 1 (domain technique with structural framing): the catalog's domain-specific entries that are essentially methodologies
Dual-pass protocol
Each prime is scored independently by two agents (Pass A and Pass B) operating from identical inputs. Pass agreement is then evaluated:
| \|Δ\| in composite | Reconciliation method |
|---|---|
| 0 | Pass A's record retained, with the longer reasoning text preserved. |
| 1 | The pass with the longer (more substantive) reasoning is taken. The other pass's composite score is recorded as _alternate_pass_score for traceability. |
| 2 or more | A tiebreaker agent reads both reasonings and produces a final score with explicit engagement of both arguments. |
The dual pass exists to (a) catch single-agent calibration drift and (b) surface genuinely ambiguous primes for explicit resolution rather than averaging away the disagreement.
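The reconciliation table above can be sketched as a small function. This is an illustrative sketch only, not the contents of substrate_reconcile.py: the `PassRecord` and `Reconciled` types and the `method` labels are hypothetical, "longer reasoning" is assumed to mean character count, and length ties are assumed to go to Pass A.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PassRecord:
    """Hypothetical record of one pass's result for one prime."""
    composite: int
    reasoning: str


@dataclass
class Reconciled:
    composite: Optional[int]  # None means "send to the tiebreaker agent"
    reasoning: str
    method: str
    alternate_pass_score: Optional[int] = None  # mirrors _alternate_pass_score


def reconcile(pass_a: PassRecord, pass_b: PassRecord) -> Reconciled:
    gap = abs(pass_a.composite - pass_b.composite)
    # Assumption: "longer" is measured in characters; ties favor Pass A.
    a_is_longer = len(pass_a.reasoning) >= len(pass_b.reasoning)
    longer, other = (pass_a, pass_b) if a_is_longer else (pass_b, pass_a)

    if gap == 0:
        # Pass A's record is retained, carrying the longer reasoning text.
        return Reconciled(pass_a.composite, longer.reasoning, "agree")
    if gap == 1:
        # Take the pass with the longer reasoning; keep the other score.
        return Reconciled(longer.composite, longer.reasoning,
                          "longer_reasoning", other.composite)
    # Gap of 2 or more: no programmatic answer, defer to the tiebreaker.
    return Reconciled(None, "", "tiebreaker")


if __name__ == "__main__":
    a = PassRecord(4, "Spans physical and social substrates.")
    b = PassRecord(3, "Mostly social; the physical examples are metaphorical in V2.")
    print(reconcile(a, b))
```

Returning `None` for the composite in the tiebreaker case forces the caller to handle those primes explicitly instead of silently averaging the two passes.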
In the May 2026 run, the gap distribution across 509 dual-scored primes was:
| \|Δ\| | Primes |
|---|---|
| 0 | 284 (56%) |
| 1 | 197 (39%) |
| 2 | 25 (4.9%) |
| 3 | 2 (0.4%) |
Total tiebreaker cases: 27.
Score-distribution characteristics
The May 2026 run produced this distribution across all 511 primes:
| Composite | Count | % |
|---|---|---|
| 0 (no data) | 1 | 0.2% |
| 1 | 43 | 8.4% |
| 2 | 107 | 20.9% |
| 3 | 126 | 24.7% |
| 4 | 154 | 30.1% |
| 5 | 80 | 15.7% |
This is a healthy distribution: agents used the full 1-5 range, and the share at 4-5 (45.8%) roughly matches the hypothesis, formed before the run, that 40-50% of catalog primes carry the cross-domain transfer load.
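The 45.8% figure follows directly from the table counts. A short check, using only the counts reported above (the helper name `band_share` is illustrative):

```python
# Composite score -> number of primes, copied from the May 2026 table.
counts = {0: 1, 1: 43, 2: 107, 3: 126, 4: 154, 5: 80}


def band_share(counts: dict, scores: tuple) -> float:
    """Fraction of all primes whose composite falls in `scores`."""
    total = sum(counts.values())
    return sum(counts[s] for s in scores) / total


if __name__ == "__main__":
    print(sum(counts.values()))                 # total primes in the table
    print(f"{band_share(counts, (4, 5)):.1%}")  # share scored 4 or 5
```

The same helper gives the share for any other band, for example `band_share(counts, (1, 2))` for the low end of the range.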
Frontmatter integration
The reconciled scores are written into each prime_abstractions/v2/{slug}.md frontmatter as:
    substrate_independence:
      composite_substrate_independence: <int 1-5>
      domain_breadth: <int 1-5>
      structural_abstraction: <int 1-5>
      transfer_evidence: <int 1-5>
      reasoning: |
        <multi-sentence paragraph from the chosen pass or tiebreaker>
Eleven primes were skipped during this run because they have no frontmatter (data-quality issues): arbitrage_finance, backcasting, causal_layered_analysis_cla, conflict_of_interest, local_autonomy_tiered_escalation, no_one_is_above_the_rules, procedural_fairness_due_process, proportionality, separation_of_powers, plus 2 others. These should be repaired in a separate catalog cleanup pass.
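A minimal sketch of the write step, including the skip for files without frontmatter, might look like this. It is not the code in substrate_finalize.py. It assumes the frontmatter is delimited by `---` lines at the top of the file and that no substrate_independence key exists yet; the function names are hypothetical, and it uses plain string handling rather than a YAML library.

```python
from typing import Optional

AXES = (
    "composite_substrate_independence",
    "domain_breadth",
    "structural_abstraction",
    "transfer_evidence",
)


def render_block(scores: dict, reasoning: str) -> str:
    """Render the substrate_independence block as YAML text."""
    lines = ["substrate_independence:"]
    for axis in AXES:
        lines.append(f"  {axis}: {int(scores[axis])}")
    lines.append("  reasoning: |")
    for line in reasoning.strip().splitlines():
        lines.append(f"    {line}".rstrip())
    return "\n".join(lines) + "\n"


def add_to_frontmatter(text: str, block: str) -> Optional[str]:
    """Append `block` inside the frontmatter; None if there is none.

    Assumption: frontmatter opens with '---' on the first line and
    closes at the next line that starts with '---'.
    """
    if not text.startswith("---\n"):
        return None  # no frontmatter: caller should skip and report
    end = text.find("\n---", 3)
    if end == -1:
        return None  # opening delimiter without a closing one
    head, rest = text[: end + 1], text[end + 1:]
    return head + block + rest


if __name__ == "__main__":
    doc = "---\ntitle: Feedback\n---\nBody text.\n"
    scores = dict.fromkeys(AXES, 5)
    print(add_to_frontmatter(doc, render_block(scores, "Spans all six substrates.")))
```

Returning `None` instead of raising lets a batch run collect the skipped slugs in one list for the later cleanup pass.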
Re-prioritization formula
Once substrate scores exist, candidate archetype priority is re-computed:
    substrate_factor = (mean_substrate_independence_of_source_primes - 1) / 4   # maps 1-5 to 0-1
    substrate_weighted_score = base_score × (1 + α × substrate_factor)
with α = 0.4 (chosen so that a candidate anchored on all-5 primes gets a 40% boost over the same candidate anchored on all-1 primes; α = 0.3-0.5 produces qualitatively similar tier shifts).
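As a worked illustration of the formula, the sketch below computes the weighted score for a few source-prime sets. The function names are hypothetical and the base score of 10 is an arbitrary example value; only the formula and α = 0.4 come from the text above.

```python
from statistics import mean

ALPHA = 0.4  # substrate weight from the text


def substrate_factor(source_scores: list) -> float:
    """Map the mean composite of the source primes from 1-5 onto 0-1."""
    return (mean(source_scores) - 1) / 4


def substrate_weighted_score(base_score: float, source_scores: list,
                             alpha: float = ALPHA) -> float:
    return base_score * (1 + alpha * substrate_factor(source_scores))


if __name__ == "__main__":
    base = 10.0  # arbitrary example base score
    for scores in ([1, 1], [3, 3], [5, 5], [2, 5]):
        print(scores, round(substrate_weighted_score(base, scores), 2))
```

An all-1 candidate keeps its base score, while an all-5 candidate is multiplied by 1 + α, which is where the 40% boost comes from. Passing a different `alpha` shows how sensitive the ordering is to the weight.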
Minimum-coverage rule
After re-tiering, a coverage check ensures every prime that currently has zero existing archetypes lands at least one Tier-1 candidate. For each gap-zero prime, the highest substrate-weighted candidate that targets it is boosted into Tier 1 if not already there. In the May 2026 run, only 1 boost was needed; almost all gap-zero primes already had a Tier-1 candidate naturally.
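The coverage check can be sketched as below. This is an illustration under assumptions, not the logic in substrate_reprioritize.py: candidates are modeled as dicts with hypothetical keys `id`, `targets`, `score` (the substrate-weighted score), and `tier`, and a gap-zero prime that no candidate targets is simply left alone.

```python
def apply_minimum_coverage(candidates: list, gap_zero_primes: list) -> list:
    """Boost the best candidate per gap-zero prime into Tier 1.

    Mutates `candidates` in place and returns the ids that were boosted.
    """
    boosted = []
    for prime in gap_zero_primes:
        targeting = [c for c in candidates if prime in c["targets"]]
        if not targeting:
            continue  # assumption: nothing to boost if no candidate targets it
        best = max(targeting, key=lambda c: c["score"])
        if best["tier"] != 1:
            best["tier"] = 1
            boosted.append(best["id"])
    return boosted


if __name__ == "__main__":
    candidates = [
        {"id": "c1", "targets": {"feedback"}, "score": 9.1, "tier": 1},
        {"id": "c2", "targets": {"equity"}, "score": 5.2, "tier": 2},
        {"id": "c3", "targets": {"equity"}, "score": 4.0, "tier": 3},
    ]
    print(apply_minimum_coverage(candidates, ["feedback", "equity"]))
```

Returning the list of boosted ids makes it straightforward to record the boosts in a log.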
Reproducibility
To re-run the scoring (e.g., after catalog updates):
    # Phase 0: setup
    python3 scripts/substrate_setup.py

    # Phase 1: dual pass
    # (launch 26 agents, 13 batches × 2 passes, via the orchestration in this conversation)

    # Phase 2: reconciliation
    python3 scripts/substrate_reconcile.py
    # (then launch tiebreaker agent for cases in _tiebreaker_input.yaml)

    # Phase 3: finalize and write to v2 frontmatter
    python3 scripts/substrate_finalize.py

    # Phase 4: re-prioritize candidates
    python3 scripts/substrate_reprioritize.py
Inputs:
- dist/encyclopedia.primes.jsonl
- dist/encyclopedia.archetypes.jsonl
- solution_archetypes/_search_space/consolidated_proposals.yaml (or prioritized_proposals_moderate_v2.yaml if available)
- solution_archetypes/_search_space/substrate_independence/SCORING_PROMPT_TEMPLATE.md (rubric and anchors)
Outputs:
- solution_archetypes/_search_space/substrate_independence/scores_final.yaml — canonical reconciled scores
- solution_archetypes/_search_space/substrate_independence/_reconciliation_report.md — what got reconciled how
- solution_archetypes/_search_space/prioritized_proposals_substrate_weighted.yaml — re-tiered candidates
- solution_archetypes/_search_space/SUBSTRATE_WEIGHTED_REPORT.md — top candidates and boost log
- Modified prime_abstractions/v2/{slug}.md files (now include substrate_independence block)
Tunable parameters
If you want to re-tune without re-running the dual-pass:
- α (substrate weight) in substrate_reprioritize.py: lower α reduces substrate's influence on tiering; higher α prioritizes substrate-independent primes more aggressively.
- MIN_BOOST_TARGET_TIER1: the score threshold for Tier 1. Lowering it makes Tier 1 broader.
- Tiebreaker thresholds in substrate_reconcile.py: the gap at which programmatic vs. agent reconciliation kicks in (currently 1 vs. 2+).
Known limitations
- Single grader-agent class. All scoring agents (and the tiebreaker) are Claude-class. Systematic biases shared across model class cannot be ruled out. A human-expert calibration round on a sample (e.g., 50 primes) would be valuable evidence that the scoring tracks expert judgment.
- Anchor-set sensitivity. The 14 anchor examples shape the entire distribution. If those anchors are mis-calibrated, the whole score range is shifted. The anchors were chosen by the project author plus me; an external review of the anchor list would strengthen the methodology.
- Static at this snapshot. As the catalog grows, primes' transfer evidence may strengthen (more cross-domain examples added). Re-running after each major content update is intended.
- Empty-frontmatter primes excluded. The 11 primes without frontmatter could not be scored. They should either be repaired (frontmatter authored) or formally retired from the catalog.
- Dual-pass is not triple-pass. A single tiebreaker agent makes the call on 27 ambiguous primes. For high-stakes downstream uses, those 27 cases should get a third independent score before being treated as authoritative.
What this enables
- Per-prime priority-of-attention signal: the highest-substrate-independence primes are the ones whose archetype-drafting investment produces the most cross-domain leverage.
- Catalog quality auditing: low-substrate-independence primes whose claimed "cross-domain" status is exaggerated can be re-classified as domain concepts (with appropriate aliasing) rather than primes.
- Pipeline-claim sharpening: the "complement to CoT" hypothesis can now be tested per-prime-substrate-band, instead of as one undifferentiated claim. The expectation is that the catalog's edge over bare CoT is largest on problems anchored on Band-1 (composite 4-5) primes and smallest on problems anchored on Band-3 (composite 1-2) primes.
- Falsifiability for the encyclopedia's central claim: if downstream experiments show no dependence on substrate band, that result would count as evidence against the catalog's central claim, not for it.