Related Work & References: verified prior-art map
Source material for the retrospective's Related Work section. Built from the focused literature scan (2026-05-26) after a citation-verification pass. Each entry carries a relation flag and a verification status.
Verification key:
- ✅ verified: link/claim checked against the primary source on 2026-05-26.
- ◑ recognized: pre-cutoff work the verifier can vouch for from training knowledge (ID given where confident).
- ⚠ unverified: appeared in the scan and is plausible, but has not yet been checked against the source; do not cite as load-bearing until confirmed.
Relation key:
- [CONTRADICTS] undercuts a strong claim the project makes.
- [PRE-EMPTS] someone did a neighboring version first.
- [SUPPORTS] backs the project's skeptical/null finding.
- [WHITESPACE] genuinely unaddressed.
1. Prior catalogs & pattern libraries (positioning arm 0: the corpus itself)
- Cyc: Lenat & Guha, Building Large Knowledge-Based Systems, 1989. ◑ [PRE-EMPTS] the "manual concept engineering" ambition; not a transfer-oriented prime-abstraction + archetype corpus.
- ConceptNet: Speer, Chin & Havasi, AAAI 2017, arXiv 1612.03975. ✅ Commonsense term/relation graph; [PRE-EMPTS] the "structured concept inventory", not deep-structure transfer units.
- WordNet / SUMO / Wikidata: Princeton WordNet; Niles & Pease 2001 (FOIS); Vrandečić & Krötzsch 2014, CACM. ◑ Lexical, formal, and linked-data resources; [WHITESPACE] for the transfer-oriented corpus design.
- TRIZ inventive principles: Altshuller tradition. ◑ [PRE-EMPTS] the idea of a catalog of reusable, cross-case solution principles (the closest non-AI antecedent to "solution archetypes"); scoped to engineering contradictions, not general prime abstractions.
- Software design patterns: Gamma, Helm, Johnson & Vlissides, Design Patterns, 1994. ◑ [PRE-EMPTS] the named pattern catalog within one technical domain; not cross-domain abstraction mining.
- Systems archetypes: Senge / Meadows lineage. ◑ [PRE-EMPTS] a meaningful subspace (feedback/dynamic motifs across domains); the strongest historical pre-emption for the feedback/equilibrium slice.
- Ologs: Spivak & Kent, PLoS ONE, 2012. ✅ [PRE-EMPTS] typed-relational modeling/alignment as a formalism; [WHITESPACE] for a curated transfer corpus built on top. (Already engaged in calculus paper §6.)
- Analogy databases / benchmarks: Ichien et al. 2020 (◑); AnaloBench (Ye et al., EMNLP 2024; ✅). Measurement resources, not transfer libraries. [WHITESPACE]
- Positioning: the corpus is not the first concept, pattern, or ontology catalog. Its narrow novelty is assembling these instincts into one hand-curated, transfer-oriented prime-abstraction + archetype corpus. The strongest comparisons are TRIZ, systems archetypes, and ologs, not Cyc/WordNet.
2. Prompted scaffolds at inference time (positioning arm 1: runtime)
- Take a Step Back: Zheng et al. 2023, arXiv 2310.06117 (ICLR 2024). ✅ [CONTRADICTS] a blanket "abstraction-first prompting doesn't help" claim (reported gains on PaLM-2L, GPT-4, Llama2-70B). This is the cleanest pre-emption of the project's idea. The project tests a stronger (typed, cross-domain) version and finds it doesn't pay off at the frontier.
- LLMs as Analogical Reasoners: Yasunaga et al. 2023, arXiv 2310.01714 (ICLR 2024). ✅ [CONTRADICTS]/[PRE-EMPTS] analogical transfer as prompting; reported gains over CoT.
- SELF-DISCOVER: Zhou et al. 2024, arXiv 2402.03620. ✅ [CONTRADICTS] Runtime structure helps on some benchmarks with a pipeline far lighter than this project's. The bottleneck may therefore be the cost or rigidity of rich structure.
- Least-to-Most / Plan-and-Solve / Graph of Thoughts: Zhou et al. 2022 (2205.10625); Wang et al., ACL 2023 (Plan-and-Solve, 2305.04091); Besta et al., AAAI 2024 (2308.09687). ✅ [CONTRADICTS] any claim of structured-prompting novelty. Gains concentrate in symbolic/procedural tasks, which narrows where structure helps.
- "To CoT or not to CoT?": Sprague et al. 2024, arXiv 2409.12183 (ICLR 2025). ✅ [SUPPORTS] Meta-analysis of 100+ papers, 20 datasets, and 14 models. CoT mainly helps on math/logic. On MMLU it is near-identical to direct answering except on symbolic items. This is key support for the project's broad-domain null.
- Mind Your Step: Liu et al. 2024, arXiv 2410.21333. ✅ [SUPPORTS] CoT can reduce performance outside its symbolic sweet spot (drops of up to ~36% in one o1-preview task family).
- Revisiting CoT (zero- vs few-shot): Cheng et al. 2025, arXiv 2506.14641 (EMNLP 2025 Findings). ✅ [SUPPORTS] a receding horizon: exemplars help weaker/older models but not the strong Qwen2.5 series. The evaluation is math-heavy, so this is suggestive only.
- TITAN ("Task-oriented Prompt Enhancement via Script Generation"): Wang, DaghighFarsoodeh & Pham 2024, arXiv 2409.16418. ✅ [SUPPORTS] a receding horizon (bigger gains on GPT-3.5 than GPT-4; minimal or negative return for o1-mini); domain-limited.
3. Synthetic data, distillation, process supervision (positioning arm 2: training)
- Distilling Step-by-Step: Hsieh et al., ACL Findings 2023, arXiv 2305.02301. ✅ [PRE-EMPTS] the broad runtime-trace → training-supervision move (rationales train smaller models to beat larger prompted ones).
- SCOTT / Implicit CoT: Wang et al. 2023 (2305.01879); Deng et al. 2023 (2311.01460). ✅ [PRE-EMPTS] the general logic that a scaffold can fail as a runtime display yet work as a training signal.
- Let's Verify Step by Step / OmegaPRM: Lightman et al. 2023 (2305.20050); Luo et al. 2024 (2406.06592). ✅ [PRE-EMPTS] process-supervised reasoning training; [WHITESPACE] for non-math prime-abstraction curricula.
- VersaPRM: Zeng et al. 2025, arXiv 2502.06737, ICML 2025. ✅ [PRE-EMPTS] multi-domain process supervision (MMLU-Pro Law +7.9%); generic multi-domain data, not an abstraction ontology.
- SAL (Self-supervised Analogical Learning): Ben Zhou et al. 2025, arXiv 2502.00996 (Amazon). ✅ Closest to the project's training thesis. It trains models to transfer symbolic solutions from solvable to failing cases, with +2–20% on StrategyQA/GSM8K/HotpotQA. [PRE-EMPTS] the general "train analogical transfer" idea; uses self-generated analogs, not a curated abstraction inventory.
- Nemotron-CrossThink: Akter et al. 2025, arXiv 2504.13941 (NVIDIA/CMU/UW). ✅ [PRE-EMPTS] broad multi-domain reasoning post-training (+13.36%); no principled abstraction vocabulary.
- Small Models Struggle to Learn from Strong Reasoners: Li et al. 2025, arXiv 2502.12143 (ACL Findings 2025). ✅ [CONTRADICTS] a naive receding-horizon → distillation move. Because of the small-model "learnability gap", rich, verbose traces don't reliably transfer to small models.
- Positioning: arm 2 is crowded, not open. The narrow, defensible whitespace is a curriculum whose supervision target is a canonical, human-curated abstraction inventory (labels = supervision). SAL, VersaPRM, and Nemotron do not use such an inventory.
4. Human transfer & analogical instruction (positioning arm 3: pedagogy)
- Perkins & Salomon: "high road / low road" transfer, 1988 Educational Leadership / 1989 Educational Researcher. ◑ [SUPPORTS] Mindful abstraction is the classical prescription for high-road transfer. Transfer is not automatic and needs pedagogical design.
- Barnett & Ceci: Psychological Bulletin, 2002. ✅ [SUPPORTS]/caution. A far-transfer taxonomy: the more dimensions on which contexts differ, the harder transfer becomes. This guards against overclaiming.
- Analogical encoding: Gentner, Loewenstein & Thompson, Journal of Educational Psychology, 2003. ✅ [SUPPORTS] Comparison induces schemas and improves transfer; the most aligned positive precedent for arm 3.
- Teaching by Analogy: Gray & Holyoak, 2021. ◑ [SUPPORTS] guided pedagogy (causal-relational focus, load management). Encouraging for the curriculum arm, not the unguided-prompt arm.
- Schema-broadening: Fuchs et al., Elementary School Journal, 2010. ✅ [SUPPORTS] Direct schema instruction works in bounded near-transfer settings.
- Teaching for near transfer: Jerrim, Lopez-Agudo, Sims & Marcenaro-Gutierrez, Learning and Individual Differences, 2025. ✅ [CONTRADICTS] TIMSS 2019, ~280k students: no evidence that abstraction/schema-oriented maths teaching improves performance on unfamiliar questions. This is the serious threat to an overconfident arm 3.
- Structure-mapping in K-12: Mix & Gentner, Discover Education, 2026. ⚠ [SUPPORTS] (perspective/review) Students rarely find deep structure unaided, so direct instruction is plausible; not a new far-transfer demonstration.
- Positioning: the honest pitch is "serious theory + bounded empirical support, but broad educational transfer remains hard; a dedicated abstraction curriculum is still an open problem," not "direct instruction works."
5. Benchmarks on abstraction & cross-domain transfer (the "is the block real?" evidence)
- Emergent analogical reasoning in LLMs: Webb, Holyoak & Lu, Nature Human Behaviour, 2023. ✅ [CONTRADICTS] a simplistic "frontier can't do analogy" story (emergent analogy that rises with scale).
- Counterfactual analogy / robustness of analogical reasoning: Lewis & Mitchell, 2024 (2402.08955 + a robustness companion). ✅ [SUPPORTS] Humans hold up, but GPT drops sharply on counterfactual variants. Apparent analogy may be shallow or surface-level.
- ARN (Analogical Reasoning on Narratives): Sourati et al., TACL 2024. ✅ [SUPPORTS] LLMs do acceptably on near analogies, but on far analogies they score near or below random (GPT-4 is below random zero-shot on far analogies). This directly mirrors the project's near-vs-far instrument gap.
- AnaloBench: Ye et al., EMNLP 2024. ✅ [SUPPORTS] Scaling barely helps on long-context/retrieval analogies (the hard part the project's scaffold targets).
- "Relevant or Random?": Qin et al., ACL Findings 2025. ✅ [SUPPORTS] Random exemplars perform about as well as "relevant" ones, because correctness, not relevance, drives the gains. The paper questions whether LLMs do genuine analogical reasoning.
- ARC-AGI-2 / ARC Prize 2025: Chollet et al., ARC-AGI-2 paper (arXiv 2505.11831); ARC Prize 2025 tech report (arXiv 2601.10904). ✅ [SUPPORTS] The top private-eval score is 24% (vs. the 85% human target), so the frontier is still weak on novel abstract reasoning. (Secondary 37.6%/54% commercial figures: confirm against the ARC Prize results-analysis blog before citing.)
6. CoT faithfulness & post-hoc rationalization (positioning arm 4)
- LMs Don't Always Say What They Think: Turpin et al., NeurIPS 2023, arXiv 2305.04388. ✅ [SUPPORTS] CoT can omit the real biasing factor and rationalize; foundational for the project's biasing-cue probe.
- Measuring Faithfulness in CoT: Lanham et al. 2023, arXiv 2307.13702. ✅ [SUPPORTS] The intervene-and-observe paradigm that the project's ablation probe instantiates.
- CoT Unfaithfulness as Disguised Accuracy: Bentham, Stringham & Marasović, arXiv 2402.14897 (TMLR 2024). ✅ [CONTRADICTS] (methodological) Lanham-style proxies can be confounded with accuracy. The project's "faithful" reading of its ablation/cue probes must therefore not be over-interpreted.
- CoT in the Wild Is Not Always Faithful: Arcuschin et al. 2025, arXiv 2503.08679. ✅ [SUPPORTS] Unfaithful CoT appears on realistic prompts without injected bias.
- Reasoning Models Don't Always Say What They Think: Anthropic, 2025, arXiv 2505.05410. ✅ [SUPPORTS]/caution. Reasoning models are more faithful, but they often verbalize the hints they actually use less than 20% of the time. Faithfulness is low and falls on harder tasks. The project's "transparent, cue-naming" result is likely specific to soft social cues.
7. Adjacent architectures & closest competitors
- A Path Towards Autonomous Machine Intelligence: LeCun, 2022. ✅ [WHITESPACE] JEPA/world models pursue latent predictive structure, not explicit language-level cross-domain abstraction transport, so they are orthogonal.
- V-JEPA / V-JEPA 2: Bardes et al. 2024 (arXiv 2404.08471); Assran et al. 2025 (arXiv 2506.09985). ✅ [WHITESPACE] Abstraction in latent predictive space for physical-world planning; neighboring, not competing.
- Neuro-symbolic AI surveys: Wan et al. 2024 (arXiv 2401.01040); Colelough & Regli 2025 (arXiv 2501.05435). ✅ Adjacent prior art, mostly logical inference / KG reasoning, not a human-meaningful cross-domain abstraction inventory.
- DIN-Retrieval: Yan et al. 2026, arXiv 2604.05383 (HIT). ✅ Close competitor. [PRE-EMPTS] Retrieves structurally compatible cross-domain demonstrations via domain-invariant neurons (~1.8% average gain). This is latent retrieval, not an explicit abstraction corpus.
- CoDA: Yan et al. 2026, arXiv 2604.19488. ✅ The strongest cross-domain-transfer pre-emption: a CoT-guided latent adapter (MSE+MMD) transfers reasoning across domains zero-shot. [PRE-EMPTS] the transport idea via latent alignment. It also suggests latent alignment may beat explicit textual scaffolds, which is consistent with the project's null.
Bottom line for positioning
Novelty is narrow and combinatorial. It has four parts:
- a hand-curated cross-domain prime-abstraction + solution-archetype corpus;
- a typed relational / domain-stripped meta-model layer;
- an explicit source→target transport step;
- a sober, blinded finding that this rich runtime scaffold is largely null at the frontier.

The biggest threats come from the modern prompting literature (Step-Back, Analogical, SELF-DISCOVER, GoT) and the post-training literature (SAL, VersaPRM, Nemotron-CrossThink, CoDA, DIN-Retrieval), not from old ontologies. The strongest support for the null comes from Sprague, Liu, Lewis & Mitchell, ARN, AnaloBench, ARC-AGI-2, and the faithfulness papers.

Arm 1 is therefore best presented as a stringent negative test of an already-popular idea. Arms 2 and 3 are the forward whitespace, each honestly caveated. Arm 2 is crowded, so its defensible angle is a curated ontology used as supervision. Arm 3 has Jerrim 2025 against it.