Substrate-Independence Framework and Scoring Methodology

This document describes the substrate-independence framework — what it is, why it matters for the Encyclopedia of Abstractions' central claim, and the method used to score the catalog's 655 primes on it. The methodology is reproducible: it can be re-run as the catalog grows, or if the rubric needs to be re-tuned.

Companion documents:

  • The agent-facing rubric and anchor examples (input to the dual-pass scoring agents).
  • A human-readable summary of the scoring results, with all 655 primes grouped by composite score.
  • The canonical scores file.

Hypothesis

The Encyclopedia of Abstractions' central claim — that catalog-augmented reasoning complements chain-of-thought — is strongest for problems anchored in substrate-independent primes (recurring structural patterns that transfer with the same logic across radically different substrates: physical, biological, computational, social, cognitive, formal). For domain-flavored primes, the catalog's leverage degrades because the "transfer" is often metaphorical rather than structural. Substrate independence is therefore a property worth scoring, both to (a) prioritize where archetype-drafting effort produces the most cross-domain reasoning value, and (b) make the catalog's claim falsifiable per problem class.

Scoring rubric

Each prime is scored on four axes, each 1-5:

  1. domain_breadth: how many distinct substrate types (not just industries) the prime genuinely spans. Substrate types: physical, biological, computational, social, cognitive, formal.
  2. structural_abstraction: whether the structural signature uses substrate-agnostic vocabulary, or imports domain-specific concepts.
  3. transfer_evidence: whether the V2 examples cross substrates, or just cross industries within one substrate.
  4. composite_substrate_independence: overall judgment, defaulting to the average of the three axes above, with a documented agent override permitted.
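As a minimal sketch of how the default composite could be derived from the three component axes: the rubric does not state a rounding rule, so rounding the mean to the nearest integer is an assumption here, and the function name `default_composite` is hypothetical rather than taken from the project's scripts.

```python
from typing import Optional


def default_composite(domain_breadth: int,
                      structural_abstraction: int,
                      transfer_evidence: int,
                      override: Optional[int] = None) -> int:
    """Return the composite score (1-5) for one prime."""
    axes = (domain_breadth, structural_abstraction, transfer_evidence)
    to_check = axes if override is None else axes + (override,)
    for value in to_check:
        if not 1 <= value <= 5:
            raise ValueError(f"score out of range 1-5: {value}")
    if override is not None:
        # A documented agent override replaces the default outright.
        return override
    # Assumption: round the mean to the nearest integer. The mean of three
    # integers is never exactly x.5, so no tie-breaking rule is needed.
    return round(sum(axes) / len(axes))
```

For example, axis scores of 5, 4, 4 give a default composite of 4, while 5, 5, 4 give 5.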

Anchor calibration

Without anchors, agents gravitate toward 3-4 and the scores become useless for prioritization. The full set of anchor examples is recorded in SCORING_PROMPT_TEMPLATE.md. Key anchors:

  • 5 (universal): feedback, boundary, causality, relation
  • 4 (genuine cross-substrate): tragedy_of_the_commons, tipping_points_or_phase_transitions, monitoring, equity
  • 3 (multi-domain, anchored): bayesian_updating, network_effect, markov_decision_processes_mdps
  • 2 (mostly domain-bounded): confounding, selection_bias, delphi_method
  • 1 (domain technique with structural framing): the catalog's domain-specific entries that are essentially methodologies

Dual-pass protocol

Each prime is scored independently by two agents (Pass A and Pass B) operating from identical inputs. Pass agreement is then evaluated:

| Composite gap (absolute Δ) | Reconciliation method |
|---|---|
| 0 | Pass A's record is retained, with the longer reasoning text preserved. |
| 1 | The pass with the longer (more substantive) reasoning is taken. The other pass's composite score is recorded as _alternate_pass_score for traceability. |
| 2 or more | A tiebreaker agent reads both reasonings and produces a final score that explicitly engages both arguments. |

The dual pass exists to (a) catch single-agent calibration drift and (b) surface genuinely ambiguous primes for explicit resolution rather than averaging away the disagreement.
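The reconciliation rules can be sketched as a small function. This is an illustration, not the contents of substrate_reconcile.py: the `PassRecord` type and the `reconcile` name are hypothetical, "longer reasoning" is assumed to mean character count, and ties in length are assumed to go to Pass A.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PassRecord:
    composite: int   # composite_substrate_independence from one pass
    reasoning: str   # that pass's reasoning text


def reconcile(pass_a: PassRecord, pass_b: PassRecord) -> Optional[dict]:
    """Apply the gap rules; return None when a tiebreaker agent is needed."""
    gap = abs(pass_a.composite - pass_b.composite)
    # Assumption: length is measured in characters; ties favour Pass A.
    longer = pass_a if len(pass_a.reasoning) >= len(pass_b.reasoning) else pass_b
    if gap == 0:
        # Scores agree: keep Pass A's score with the longer reasoning text.
        return {"composite": pass_a.composite, "reasoning": longer.reasoning}
    if gap == 1:
        other = pass_b if longer is pass_a else pass_a
        return {
            "composite": longer.composite,
            "reasoning": longer.reasoning,
            "_alternate_pass_score": other.composite,
        }
    # Gap of 2 or more: defer to the tiebreaker agent.
    return None
```

Returning `None` for large gaps keeps the programmatic step separate from the agent step, so the deferred cases can be collected into a tiebreaker input file.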

In the May 2026 run, the gap distribution across 509 dual-scored primes was:

| Composite gap (absolute Δ) | Primes |
|---|---|
| 0 | 284 (56%) |
| 1 | 197 (39%) |
| 2 | 25 (4.9%) |
| 3 | 2 (0.4%) |

Total tiebreaker cases: 27.

Score-distribution characteristics

The May 2026 run produced this distribution across all 511 primes:

| Composite | Count | % |
|---|---|---|
| 0 (no data) | 1 | 0.2% |
| 1 | 43 | 8.4% |
| 2 | 107 | 20.9% |
| 3 | 126 | 24.7% |
| 4 | 154 | 30.1% |
| 5 | 80 | 15.7% |

The distribution is healthy: agents used the full range, and the share at 4-5 (45.8%) falls within my earlier hypothesis that 40-50% of catalog primes carry the cross-domain transfer load.
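The percentages and the 4-5 share can be re-derived from the counts reported for this run. The short check below uses only those figures; the variable names are illustrative.

```python
# Composite score -> number of primes, as reported for the May 2026 run.
counts = {0: 1, 1: 43, 2: 107, 3: 126, 4: 154, 5: 80}

total = sum(counts.values())

# Per-score percentage, rounded to one decimal place as in the table.
percentages = {score: round(100 * n / total, 1) for score, n in counts.items()}

# Share of primes with a composite of 4 or 5.
share_4_5 = round(100 * (counts[4] + counts[5]) / total, 1)

print(total, percentages, share_4_5)
```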

Frontmatter integration

The reconciled scores are written into each prime_abstractions/v2/{slug}.md frontmatter as:

substrate_independence:
  composite_substrate_independence: <int 1-5>
  domain_breadth: <int 1-5>
  structural_abstraction: <int 1-5>
  transfer_evidence: <int 1-5>
  reasoning: |
    <multi-sentence paragraph from the chosen pass or tiebreaker>

Eleven primes were skipped during this run because they have no frontmatter (data-quality issues): arbitrage_finance, backcasting, causal_layered_analysis_cla, conflict_of_interest, local_autonomy_tiered_escalation, no_one_is_above_the_rules, procedural_fairness_due_process, proportionality, separation_of_powers, plus 2 others. These should be repaired in a separate catalog cleanup pass.
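A minimal sketch of the two string-level steps involved: detecting whether a prime file has frontmatter at all, and rendering the substrate_independence block. The helper names are hypothetical, and the check assumes frontmatter is delimited by a leading `---` line, which is the common Markdown convention but is not confirmed for these files. The actual substrate_finalize.py may work differently.

```python
import textwrap

AXES = (
    "composite_substrate_independence",
    "domain_breadth",
    "structural_abstraction",
    "transfer_evidence",
)


def has_frontmatter(markdown_text: str) -> bool:
    # Assumption: frontmatter starts with a '---' line at the top of the file.
    return markdown_text.lstrip("\ufeff").startswith("---\n")


def render_block(scores: dict, reasoning: str) -> str:
    """Render the substrate_independence block as YAML text."""
    lines = ["substrate_independence:"]
    for key in AXES:
        lines.append(f"  {key}: {int(scores[key])}")
    # The reasoning goes into a literal block scalar, indented four spaces.
    lines.append("  reasoning: |")
    lines.append(textwrap.indent(reasoning.strip(), "    "))
    return "\n".join(lines) + "\n"
```

Files for which `has_frontmatter` returns False would be reported and skipped rather than modified, matching the handling of the eleven primes above.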

Re-prioritization formula

Once substrate scores exist, candidate archetype priority is re-computed:

substrate_factor = (mean_substrate_independence_of_source_primes - 1) / 4   # maps 1-5 to 0-1
substrate_weighted_score = base_score × (1 + α × substrate_factor)

with α = 0.4 (chosen so that a candidate anchored on all-5 primes gets a 40% boost over the same candidate anchored on all-1 primes; α = 0.3-0.5 produces qualitatively similar tier shifts).
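The formula translates directly into code. The sketch below is illustrative and not copied from substrate_reprioritize.py; the function name and argument names are assumptions.

```python
from statistics import mean
from typing import Sequence

ALPHA = 0.4  # substrate weight, as chosen in the text


def substrate_weighted_score(base_score: float,
                             source_prime_scores: Sequence[int],
                             alpha: float = ALPHA) -> float:
    """Boost a candidate's base score by its source primes' substrate scores."""
    # Map the mean composite from the 1-5 scale onto 0-1.
    substrate_factor = (mean(source_prime_scores) - 1) / 4
    return base_score * (1 + alpha * substrate_factor)
```

With a base score of 10, a candidate anchored on all-5 primes scores 14, one anchored on all-3 primes scores 12, and one anchored on all-1 primes stays at 10, which is the 40% spread described above.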

Minimum-coverage rule

After re-tiering, a coverage check ensures every prime that currently has zero existing archetypes lands at least one Tier-1 candidate. For each gap-zero prime, the highest substrate-weighted candidate that targets it is boosted into Tier 1 if not already there. In the May 2026 run, only 1 boost was needed; almost all gap-zero primes already had a Tier-1 candidate naturally.
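One way to read the coverage rule is sketched below. The candidate record shape (`id`, `tier`, `score`, `targets`) and the function name are hypothetical, and the sketch interprets "if not already there" as "skip the prime if any candidate targeting it is already in Tier 1".

```python
def apply_minimum_coverage(candidates: list, gap_zero_primes: list) -> list:
    """Boost one candidate into Tier 1 for each uncovered gap-zero prime.

    Mutates the candidate dicts in place and returns a boost log of
    (prime, candidate_id) pairs.
    """
    boosted = []
    for prime in gap_zero_primes:
        targeting = [c for c in candidates if prime in c["targets"]]
        if not targeting:
            continue  # no candidate targets this prime; nothing to boost
        if any(c["tier"] == 1 for c in targeting):
            continue  # already covered by a Tier-1 candidate
        # Pick the highest substrate-weighted candidate for this prime.
        best = max(targeting, key=lambda c: c["score"])
        best["tier"] = 1
        boosted.append((prime, best["id"]))
    return boosted
```

The returned log corresponds to the kind of boost record the report is described as containing; in the May 2026 run it would have held a single entry.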

Reproducibility

To re-run the scoring (e.g., after catalog updates):

# Phase 0: setup
python3 scripts/substrate_setup.py

# Phase 1: dual pass
# (launch 26 agents: 13 batches × 2 passes, using the same orchestration as the original run)

# Phase 2: reconciliation
python3 scripts/substrate_reconcile.py
# (then launch tiebreaker agent for cases in _tiebreaker_input.yaml)

# Phase 3: finalize and write to v2 frontmatter
python3 scripts/substrate_finalize.py

# Phase 4: re-prioritize candidates
python3 scripts/substrate_reprioritize.py

Inputs:

  • dist/encyclopedia.primes.jsonl
  • dist/encyclopedia.archetypes.jsonl
  • solution_archetypes/_search_space/consolidated_proposals.yaml (or prioritized_proposals_moderate_v2.yaml if available)
  • solution_archetypes/_search_space/substrate_independence/SCORING_PROMPT_TEMPLATE.md (rubric and anchors)

Outputs:

  • solution_archetypes/_search_space/substrate_independence/scores_final.yaml: canonical reconciled scores
  • solution_archetypes/_search_space/substrate_independence/_reconciliation_report.md: what was reconciled, and how
  • solution_archetypes/_search_space/prioritized_proposals_substrate_weighted.yaml: re-tiered candidates
  • solution_archetypes/_search_space/SUBSTRATE_WEIGHTED_REPORT.md: top candidates and boost log
  • Modified prime_abstractions/v2/{slug}.md files (now including the substrate_independence block)

Tunable parameters

If you want to re-tune without re-running the dual-pass:

  • α (substrate weight) in substrate_reprioritize.py: lower α reduces substrate's influence on tiering; higher α prioritizes substrate-independent primes more aggressively.
  • MIN_BOOST_TARGET_TIER1: the score threshold for Tier 1. Lowering it makes Tier 1 broader.
  • Tiebreaker thresholds in substrate_reconcile.py: the gap at which programmatic vs. agent reconciliation kicks in (currently 1 vs. 2+).

Known limitations

  1. Single grader-agent class. All scoring agents (and the tiebreaker) are Claude-class. Systematic biases shared across model class cannot be ruled out. A human-expert calibration round on a sample (e.g., 50 primes) would be valuable evidence that the scoring tracks expert judgment.

  2. Anchor-set sensitivity. The 14 anchor examples shape the entire distribution. If those anchors are mis-calibrated, the whole score range is shifted. The anchors were chosen by the project author plus me; an external review of the anchor list would strengthen the methodology.

  3. Static at this snapshot. As the catalog grows, primes' transfer evidence may strengthen (more cross-domain examples added). Re-running after each major content update is intended.

  4. Empty-frontmatter primes excluded. The 11 primes without frontmatter could not be scored. They should either be repaired (frontmatter authored) or formally retired from the catalog.

  5. Dual-pass is not triple-pass. A single tiebreaker agent makes the call on 27 ambiguous primes. For high-stakes downstream uses, those 27 cases should get a third independent score before being treated as authoritative.

What this enables

  • Per-prime priority-of-attention signal: the highest-substrate-independence primes are the ones whose archetype-drafting investment produces the most cross-domain leverage.
  • Catalog quality auditing: low-substrate-independence primes whose claimed "cross-domain" status is exaggerated can be re-classified as domain concepts (with appropriate aliasing) rather than primes.
  • Pipeline-claim sharpening: the "complement to CoT" hypothesis can now be tested per-prime-substrate-band, instead of as one undifferentiated claim. The expectation is that the catalog's edge over bare CoT is largest on problems anchored on Band-1 (composite 4-5) primes and smallest on problems anchored on Band-3 (composite 1-2) primes.
  • Falsifiability for the encyclopedia's central claim: if downstream experiments show no dependence on substrate band, that result counts as evidence against the catalog's central claim, not for it.
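For the per-band test, problems first need to be grouped by the band of their anchoring prime. The sketch below shows one way to do that. The text defines Band-1 (composite 4-5) and Band-3 (composite 1-2); treating composite 3 as Band-2 is an inference, and the function names and input shape are hypothetical.

```python
from collections import defaultdict


def substrate_band(composite: int) -> str:
    """Map a composite score (1-5) to its substrate band."""
    if composite in (4, 5):
        return "Band-1"
    if composite == 3:
        return "Band-2"  # assumption: the text does not name this band
    if composite in (1, 2):
        return "Band-3"
    raise ValueError(f"no band for composite score {composite}")


def group_by_band(problem_anchor_scores: dict) -> dict:
    """Group problem ids by the band of their anchoring prime's composite."""
    bands = defaultdict(list)
    for problem_id, composite in problem_anchor_scores.items():
        bands[substrate_band(composite)].append(problem_id)
    return dict(bands)
```

Comparing catalog-augmented and bare chain-of-thought results within each group would then show whether the expected band dependence appears.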