
Related Work & References — verified prior-art map

Source material for the retrospective's Related Work section. Built from the focused literature scan (2026-05-26) after a citation-verification pass. Each entry carries a relation flag and a verification status.

Verification key. ✅ verified = link/claim checked against the primary source on 2026-05-26. ◑ recognized = pre-cutoff work the verifier can vouch for from training knowledge (ID given where confident). ⚠ unverified = appeared in the scan, plausible, not yet checked against source; do not cite as load-bearing until confirmed.

Relation key. [CONTRADICTS] undercuts a strong claim the project makes · [PRE-EMPTS] someone did a neighboring version first · [SUPPORTS] backs the project's skeptical/null finding · [WHITESPACE] genuinely unaddressed.


1. Prior catalogs & pattern libraries (positioning arm 0: the corpus itself)

  • Cyc — Lenat & Guha, Building Large Knowledge-Based Systems, 1989. ◑ [PRE-EMPTS] the "manual concept engineering" ambition; not a transfer-oriented prime-abstraction + archetype corpus.
  • ConceptNet — Speer, Chin & Havasi, AAAI 2017, arXiv 1612.03975. ✅ Commonsense term/relation graph; [PRE-EMPTS] "structured concept inventory", not deep-structure transfer units.
  • WordNet / SUMO / Wikidata – Princeton WordNet; Niles & Pease 2001 (FOIS); Vrandečić & Krötzsch 2014, CACM. ◑ Lexical / formal / linked-data resources; [WHITESPACE] for the transfer-oriented corpus design.
  • TRIZ inventive principles – Altshuller tradition. ◑ [PRE-EMPTS] the idea of a catalog of reusable, cross-case solution principles (the closest non-AI antecedent to "solution archetypes"); scoped to engineering contradictions, not general prime abstractions.
  • Software design patterns — Gamma, Helm, Johnson & Vlissides, Design Patterns, 1994. ◑ [PRE-EMPTS] named pattern catalog within one technical domain; not cross-domain abstraction mining.
  • Systems archetypes — Senge / Meadows lineage. ◑ [PRE-EMPTS] a meaningful subspace (feedback/dynamic motifs across domains); strongest historical pre-emption for the feedback/equilibrium slice.
  • Ologs — Spivak & Kent, PLoS ONE, 2012. ✅ [PRE-EMPTS] typed-relational modeling/alignment as a formalism; [WHITESPACE] for a curated transfer corpus built on top. (Already engaged in calculus paper §6.)
  • Analogy databases / benchmarks — Ichien et al. 2020; AnaloBench (Ye et al., EMNLP 2024). ◑/✅ Measurement resources, not transfer libraries. [WHITESPACE].
  • Positioning: the corpus is not the first concept/pattern/ontology catalog; its narrow novelty is assembling these instincts into one hand-curated, transfer-oriented prime-abstraction + archetype corpus. Strongest comparisons: TRIZ, systems archetypes and ologs, not Cyc/WordNet.

2. Prompted scaffolds at inference time (positioning arm 1: runtime)

  • Take a Step Back — Zheng et al. 2023, arXiv 2310.06117 (ICLR 2024). ✅ [CONTRADICTS] a blanket "abstraction-first prompting doesn't help" claim (reported gains on PaLM-2L, GPT-4, Llama2-70B). The cleanest pre-emption of the project's idea; the project tests a stronger/typed/cross-domain version and finds it doesn't pay at the frontier.
  • LLMs as Analogical Reasoners — Yasunaga et al. 2023, arXiv 2310.01714 (ICLR 2024). ✅ [CONTRADICTS]/[PRE-EMPTS] analogical-transfer-as-prompting; reported gains over CoT.
  • SELF-DISCOVER – Zhou et al. 2024, arXiv 2402.03620. ✅ [CONTRADICTS] (runtime structure helps on some benchmarks with far lighter structure than this project's pipeline → the bottleneck may be the cost/rigidity of rich structure).
  • Least-to-Most / Plan-and-Solve / Graph of Thoughts — Zhou et al. 2022 (2205.10625); Wang et al. ACL 2023 (Plan-and-Solve, 2305.04091); Besta et al. AAAI 2024 (2308.09687). ✅ [CONTRADICTS] for structured-prompting novelty; gains concentrated in symbolic/procedural tasks — narrows where structure helps.
  • To CoT or not to CoT? — Sprague et al. 2024, arXiv 2409.12183 (ICLR 2025). ✅ [SUPPORTS] — meta-analysis (100+ papers, 20 datasets, 14 models): CoT mainly helps math/logic; near-identical to direct on MMLU except symbolic. Key support for the project's broad-domain null.
  • Mind Your Step — Liu et al. 2024, arXiv 2410.21333. ✅ [SUPPORTS] — CoT can reduce performance off its symbolic sweet spot (drops up to ~36% in one o1-preview task family).
  • Revisiting CoT (zero- vs few-shot) – Cheng et al. 2025, arXiv 2506.14641 (EMNLP 2025 Findings). ✅ [SUPPORTS] receding horizon (exemplars help weaker/older models, not the strong Qwen2.5 series); math-heavy preprint, so suggestive only.
  • TITAN — "Task-oriented Prompt Enhancement via Script Generation" — Wang, DaghighFarsoodeh & Pham 2024, arXiv 2409.16418. ✅ [SUPPORTS] receding horizon (bigger gains on GPT-3.5 than GPT-4; minimal/negative return for o1-mini) — domain-limited.

3. Synthetic data, distillation, process supervision (positioning arm 2: training)

  • Distilling Step-by-Step — Hsieh et al., ACL Findings 2023, arXiv 2305.02301. ✅ [PRE-EMPTS] the broad runtime-trace → training-supervision move (rationales train smaller models to beat larger prompted ones).
  • SCOTT / Implicit CoT — Wang et al. 2023 (2305.01879); Deng et al. 2023 (2311.01460). ✅ [PRE-EMPTS] the general logic that a scaffold can fail as a runtime display yet work as a training signal.
  • Let's Verify Step by Step / OmegaPRM — Lightman et al. 2023 (2305.20050); Luo et al. 2024 (2406.06592). ✅ [PRE-EMPTS] process-supervised reasoning training; [WHITESPACE] for non-math prime-abstraction curricula.
  • VersaPRM — Zeng et al. 2025, arXiv 2502.06737, ICML 2025. ✅ [PRE-EMPTS] multi-domain process supervision (MMLU-Pro Law +7.9%); generic multi-domain data, not an abstraction ontology.
  • SAL: Self-supervised Analogical Learning — Ben Zhou et al. 2025, arXiv 2502.00996 (Amazon). ✅ closest to the project's training thesis: trains models to transfer symbolic solutions from solvable→failing cases; +2–20% on StrategyQA/GSM8K/HotpotQA. [PRE-EMPTS] the general "train analogical transfer" idea; uses self-generated analogs, not a curated abstraction inventory.
  • Nemotron-CrossThink — Akter et al. 2025, arXiv 2504.13941 (NVIDIA/CMU/UW). ✅ [PRE-EMPTS] broad multi-domain reasoning post-training (+13.36%); no principled abstraction vocabulary.
  • Small Models Struggle to Learn from Strong Reasoners – Li et al. 2025, arXiv 2502.12143 (ACL Findings 2025). ✅ [CONTRADICTS] a naive receding-horizon → distillation move (small-model "learnability gap"): rich/verbose traces don't reliably transfer to small models.
  • Positioning: arm 2 is crowded, not open. The narrow defensible whitespace: a curriculum whose supervision target is a canonical human-curated abstraction inventory (labels = supervision), which SAL/VersaPRM/Nemotron do not use.

4. Human transfer & analogical instruction (positioning arm 3: pedagogy)

  • Perkins & Salomon — "high road / low road" transfer, 1988 Educational Leadership / 1989 Educational Researcher. ◑ [SUPPORTS] — mindful abstraction is the classical prescription for high-road transfer; transfer is not automatic (needs pedagogical design).
  • Barnett & Ceci – Psychological Bulletin, 2002. ✅ [SUPPORTS]/caution: far-transfer taxonomy; the more dimensions along which the learning and transfer contexts differ, the harder transfer becomes, which guards against overclaiming.
  • Analogical encoding – Gentner, Loewenstein & Thompson, Journal of Educational Psychology, 2003. ✅ [SUPPORTS]: comparison induces a schema and improves transfer; the most aligned positive precedent for arm 3.
  • Teaching by Analogy — Gray & Holyoak, 2021. ◑ [SUPPORTS] guided pedagogy (causal-relational focus, load management) — encouraging for the curriculum arm, not the unguided-prompt arm.
  • Schema-broadening – Fuchs et al., Elementary School Journal, 2010. ✅ [SUPPORTS] direct schema instruction works in bounded near-transfer settings.
  • Teaching for near transfer – Jerrim, Lopez-Agudo, Sims & Marcenaro-Gutierrez, Learning and Individual Differences, 2025. ✅ [CONTRADICTS] (TIMSS 2019, ~280k students): no evidence that abstraction/schema-oriented maths teaching improves performance on unfamiliar questions. The serious threat to an overconfident arm 3.
  • Structure-mapping in K-12 — Mix & Gentner, Discover Education, 2026. ⚠ [SUPPORTS] (perspective/review): students rarely find deep structure unaided → direct instruction is plausible; not a new far-transfer demo.
  • Positioning: the honest pitch is "serious theory + bounded empirical support, but broad educational transfer remains hard; a dedicated abstraction curriculum is still an open problem," not "direct instruction works."

5. Benchmarks on abstraction & cross-domain transfer (the "is the block real?" evidence)

  • Emergent analogical reasoning in LLMs — Webb, Holyoak & Lu, Nature Human Behaviour, 2023. ✅ [CONTRADICTS] a simplistic "frontier can't do analogy" story (emergent analogy, rising with scale).
  • Counterfactual analogy / robustness of analogical reasoning – Lewis & Mitchell, 2024 (2402.08955 + a robustness companion). ✅ [SUPPORTS]: human performance holds on counterfactual variants while GPT models drop sharply, so apparent analogy may be shallow/surface-level.
  • ARN: Analogical Reasoning on Narratives – Sourati et al., TACL 2024. ✅ [SUPPORTS]: LLMs OK on near analogies, far analogies near/below random (GPT-4 below random zero-shot on far analogies). Directly mirrors the project's near-vs-far instrument gap.
  • AnaloBench – Ye et al., EMNLP 2024. ✅ [SUPPORTS]: scaling barely helps long-context/retrieval analogies (the hard part the project's scaffold targets).
  • Relevant or Random? – Qin et al., ACL Findings 2025. ✅ [SUPPORTS]: random exemplars ≈ "relevant" ones; correctness, not relevance, drives gains, which questions whether LLMs do genuine analogical reasoning.
  • ARC-AGI-2 / ARC Prize 2025 — Chollet et al., ARC-AGI-2 paper (arXiv 2505.11831); ARC Prize 2025 tech report (arXiv 2601.10904). ✅ [SUPPORTS] — top private-eval 24% (vs 85% human target); frontier still weak on novel abstract reasoning. (Secondary 37.6%/54% commercial figures: confirm against the ARC Prize results-analysis blog before citing.)

6. CoT faithfulness & post-hoc rationalization (positioning arm 4)

  • LMs Don't Always Say What They Think — Turpin et al., NeurIPS 2023, arXiv 2305.04388. ✅ [SUPPORTS] — CoT can omit the real biasing factor and rationalize; foundational for the project's biasing-cue probe.
  • Measuring Faithfulness in CoT – Lanham et al. 2023, arXiv 2307.13702. ✅ [SUPPORTS]: the intervene-and-observe paradigm the project's ablation probe instantiates.
  • CoT Unfaithfulness as Disguised Accuracy — Bentham, Stringham & Marasović, arXiv 2402.14897 (TMLR 2024). ✅ [CONTRADICTS] (methodological) — Lanham-style proxies can be confounded with accuracy; so the project's ablation/cue "faithful" reading must not be over-interpreted.
  • CoT in the Wild Is Not Always Faithful — Arcuschin et al. 2025, arXiv 2503.08679. ✅ [SUPPORTS] — unfaithful CoT on realistic prompts without injected bias.
  • Reasoning Models Don't Always Say What They Think – Anthropic, 2025, arXiv 2505.05410. ✅ [SUPPORTS]/caution: reasoning models are more faithful but still verbalize the hints they actually use <20% of the time; faithfulness is low and falls on harder tasks. The project's "transparent, cue-naming" result is likely specific to soft social cues.

7. Adjacent architectures & closest competitors

  • A Path Towards Autonomous Machine Intelligence — LeCun, 2022. ✅ [WHITESPACE] — JEPA/world-models pursue latent predictive structure, not explicit language-level cross-domain abstraction transport — orthogonal.
  • V-JEPA / V-JEPA 2 — Bardes et al. 2024 (arXiv 2404.08471); Assran et al. 2025 (arXiv 2506.09985). ✅ [WHITESPACE] — abstraction in latent predictive space for physical-world planning; neighboring, not competing.
  • Neuro-symbolic AI surveys — Wan et al. 2024 (arXiv 2401.01040); Colelough & Regli 2025 (arXiv 2501.05435). ✅ adjacent prior art; mostly logical inference / KG reasoning, not a human-meaningful cross-domain abstraction inventory.
  • DIN-Retrieval — Yan et al. 2026, arXiv 2604.05383 (HIT). ✅ close competitor [PRE-EMPTS] — retrieves structurally-compatible cross-domain demos via domain-invariant neurons (~1.8% avg gain); latent retrieval, not an explicit abstraction corpus.
  • CoDA — Yan et al. 2026, arXiv 2604.19488. ✅ strongest cross-domain-transfer pre-emption — CoT-guided latent adapter (MSE+MMD) transfers reasoning across domains zero-shot. [PRE-EMPTS] the transport idea via latent alignment — and suggests latent alignment may beat explicit textual scaffolds (consistent with the project's null).

Bottom line for positioning

Novelty is narrow & combinatorial: a hand-curated cross-domain prime-abstraction + solution-archetype corpus, a typed relational / domain-stripped meta-model layer, an explicit source→target transport step, and a sober blinded finding that this rich runtime scaffold is largely null at the frontier. The biggest threats are the modern prompting (Step-Back, Analogical, SELF-DISCOVER, GoT) and post-training (SAL, VersaPRM, Nemotron-CrossThink, CoDA, DIN-Retrieval) literatures — not old ontologies. The strongest support for the null comes from Sprague, Liu, Lewis & Mitchell, ARN, AnaloBench, ARC-AGI-2, and the faithfulness papers. Arm 1 is therefore best presented as a stringent negative test of an already-popular idea, with arms 2 and 3 as the forward whitespace — each honestly caveated (arm 2 crowded → the curated-ontology-as-supervision angle; arm 3 has Jerrim 2025 against it).