Teaching Abstract Reasoning Directly: An Exploratory Proposal

Prologue

We teach formal logic. We teach mathematics. We teach the methods of historical inquiry, the conventions of literary analysis, the experimental protocols of the natural sciences. Across the K–12 curriculum and into university, we devote substantial time to teaching reasoning within domains — the way a chemist reasons about reactions, the way a historian reasons about evidence, the way a programmer reasons about computation.

We do not, in any organized way, teach the cognitive operation that lets a student see that a chemical equilibrium and an ecological equilibrium and a market equilibrium are the same kind of thing — that the structural pattern at work in one case is operative in the others, and that knowing one well gives some usable purchase on the others. That cognitive operation, in the analogical-reasoning tradition that runs through Polya, Hofstadter, Gentner, Holyoak, and others, involves the perception of structural similarity across domains and the licensing of inferences from one to another.^[1]^[2]^[3] Cognitive scientists differ on the underlying mechanisms; the phenomenon itself is robust, and several decades of empirical work indicate that explicit instruction in cross-domain mapping can improve transfer in laboratory settings.^[4]^[5]^[6] The question this chapter takes up is whether such instruction can be scaled into a sustained curriculum.

The standard educational story is that abstract reasoning emerges as a byproduct of sustained engagement with multiple domains. Read enough physics and enough biology and enough history, and eventually the cross-domain pattern recognition begins to develop on its own. For some students this seems to work. For many it does not — or it works weakly, producing a vague sense of "everything connects" without the structural precision that licenses real inference. This is not surprising; the empirical literature on transfer is famously sobering, and explicit attempts to teach general thinking skills have a mixed track record.^[7]^[8]^[9] The dominant lesson of that literature is not that transfer is impossible but that transfer is harder than its proponents have historically assumed, and that getting transfer to occur reliably requires specific instructional conditions that ordinary curricula do not typically supply.

Several developments in the last few years make it worth revisiting the question of whether direct instruction in cross-domain abstract reasoning could be made workable. First, the Encyclopedia of Abstractions — the corpus of which this chapter is part — provides a structured catalog of patterns that recur across domains, with cross-domain examples and structural signatures attached to each one. Whether the specific patterns it catalogs are the right ones, and whether the catalog as a whole carves the territory at the joints, are open questions; the catalog is offered as a working hypothesis rather than as a final inventory. Second, large language models have matured to the point where they can perform certain cognitively demanding sub-operations of cross-domain reasoning (surfacing candidate patterns across unfamiliar domains; helping externalize a situation as a relational structure) on demand, in ways that were not technically feasible even five years ago. Third, the From Abstractions to Interventions layer of this work proposes an explicit nine-step procedure for cross-domain reasoning that can be taught as a procedure, practiced as a procedure, and (with effort) assessed as a procedure.

Combined, these three developments suggest that an explicit abstract-reasoning curriculum is worth seriously exploring. The chapter is not a claim that such a curriculum will succeed, nor a claim that the components individually represent unprecedented breakthroughs. Each component sits in a long lineage of prior attempts, surveyed below. What is new is mostly the integration: a catalog at this scale, paired with an explicit procedure, with language-model scaffolding that can carry the most cognitively demanding sub-operations. Whether this integration produces measurably better cross-domain transfer than existing approaches is an empirical question that no one has yet answered. The chapter argues that the question has now become tractable in a way it previously was not, and that asking it carefully is a research program worth pursuing.

The remainder of the chapter has three jobs: situating the proposal within the history of similar attempts (Leibniz's universal characteristic, Carnap's Aufbau, Adler's Propædia, Senge's system archetypes, Alexander's pattern languages); engaging seriously with the transfer problem and what the cognitive-science literature actually licenses about the prospects for general-skills instruction; and sketching the operational shape of a curriculum that takes the Encyclopedia as substrate and the pipeline as procedure, with the candor about what would and would not work that the prior literature warrants.

What "abstract reasoning" might mean

The phrase abstract reasoning is used loosely in popular discourse to mean any thinking that is not concrete, immediate, or sensorimotor. In educational and cognitive-science contexts, it has a sharper but contested meaning. This section tries to disambiguate without claiming more settledness than the field actually contains.

The cognitive-science literature on analogical reasoning treats the phenomenon as the perception of structural correspondence between situations that share little or no surface similarity, paired with the licensing of inferences from one to another. Three components figure in most accounts. First, structural correspondence: the mapping operates on roles in relations rather than on surface features. Second, cross-domain reach: the correspondence is most cognitively useful when the situations come from different domains, since within-domain analogies often reduce to instances of explicit domain principles. Third, inference licensing: the cognitive operation must generate new inferences about one situation through the correspondence.^[10]^[11]^[1]

These components are not disputed in their broad outline. What is disputed is the underlying mechanism. Several frameworks compete.

Gentner's structure-mapping theory formalizes the cognitive operation as the maximal alignment of relational predicates between two situations, with surface attributes weighted lightly compared to relational structure.^[2]^[11] This framework has substantial empirical support, particularly for laboratory-tractable analogical reasoning tasks, and it is the model that most directly motivates the pipeline approach proposed in this chapter. It is also one model among several. Gentner herself has been careful about its scope, distinguishing it from accounts that treat analogy as a process of constraint satisfaction (Holyoak's multiconstraint theory^[3]) or as a process driven by similarity computation in distributed representations.^[12] None of these frameworks is settled; their empirical predictions overlap substantially, and the field continues to debate which best captures the phenomenon.
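The idea of aligning relational predicates can be made concrete with a toy sketch. What follows is not the Structure-Mapping Engine or any published implementation: it is a brute-force illustration that assumes each situation is a flat set of hypothetical `(relation, arg1, arg2)` facts, that the target has at least as many entities as the base, and that the best mapping is simply the one preserving the most relations. It ignores systematicity, higher-order relations, and attribute weighting, all of which matter in the actual theory. The solar-system and atom facts are illustrative only.

```python
from itertools import permutations

# Hypothetical relational descriptions: (relation, arg1, arg2).
solar_system = {
    ("attracts", "sun", "planet"),
    ("more_massive_than", "sun", "planet"),
    ("revolves_around", "planet", "sun"),
    ("held_in_orbit_by", "planet", "sun"),  # known in the base only
}
atom = {
    ("attracts", "nucleus", "electron"),
    ("more_massive_than", "nucleus", "electron"),
    ("revolves_around", "electron", "nucleus"),
}


def entities(facts):
    """Return the sorted list of entities mentioned in a set of facts."""
    return sorted({arg for _, a, b in facts for arg in (a, b)})


def best_mapping(base, target):
    """Try every one-to-one entity mapping; keep the one that preserves
    the most relations. Assumes len(target entities) >= len(base entities)."""
    base_ents, target_ents = entities(base), entities(target)
    best, best_score = {}, -1
    for perm in permutations(target_ents, len(base_ents)):
        mapping = dict(zip(base_ents, perm))
        score = sum(
            (rel, mapping[a], mapping[b]) in target for rel, a, b in base
        )
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score


def candidate_inferences(base, target, mapping):
    """Project base facts through the mapping; those absent from the
    target are candidate inferences (hypotheses, not conclusions)."""
    projected = {(rel, mapping[a], mapping[b]) for rel, a, b in base}
    return sorted(projected - target)


mapping, score = best_mapping(solar_system, atom)
inferences = candidate_inferences(solar_system, atom, mapping)
```

In this toy case the mapping pairs `sun` with `nucleus` and `planet` with `electron`, and the one unmatched base fact is carried over as a candidate inference about the atom. The point of the sketch is only that the alignment is driven by relations and roles, not by any surface resemblance between the entities.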

Other accounts treat abstract reasoning as fundamentally embodied rather than propositional. Lakoff and Johnson's program on conceptual metaphor argues that abstract thought is largely structured by metaphorical projections from sensorimotor experience.^[13]^[14] Barsalou's grounded-cognition framework develops the empirical case further, with substantial evidence that even highly abstract concepts retain perceptual residue.^[15] On these accounts, what looks like propositional structure-mapping may be the visible surface of an underlying embodied-simulation process; the propositional description is correct as a model of what the reasoner produces but incomplete as a model of how the reasoner produces it.

Yet other accounts emphasize intuitive pattern recognition as the primary cognitive operation underlying expert cross-domain reasoning. Klein's recognition-primed decision model documents this profile in naturalistic-decision-making contexts;^[16] Kahneman's dual-process framework places it within the System 1 / System 2 distinction.^[17]^[18] On these accounts, explicit pipeline-like procedures may be useful pedagogical scaffolding for novices but are not the cognitive operation that expert reasoners themselves perform.

The chapter takes the position that these accounts are not strictly incompatible. They emphasize different aspects of what is plausibly a heterogeneous phenomenon. Mature abstract reasoners likely use a mix of structural mapping, embodied simulation, and pattern recognition, with the mix varying by problem type, by individual cognitive style, and by stage of expertise. The pipeline-based curriculum proposed here primarily exercises the propositional/structural mode because that mode is the most directly teachable through explicit externalized procedure. It does not assume that the propositional mode is the canonical or only form of abstract reasoning; section 7 treats the alternative modes more carefully, and the curriculum's integration with visual, embodied, and intuitive instruction is part of the implementation question.

It is worth being explicit, as a final disambiguation, about what the chapter does not mean by abstract reasoning. It does not mean formal logic, which operates on propositions of specified syntactic form. It does not mean pure deduction, which is forced rather than constructed. It does not mean visual pattern recognition operating purely on perceptual features. And it does not mean a single cognitive operation in any homogeneous sense. The phrase, as used here, refers to a family of cognitive operations that share the generic shape of perceiving structure-preserving correspondence across surface-different situations and using the correspondence to license inferences. The chapter's argument applies more strongly to the propositional members of this family and more weakly to the others; the curriculum proposes one approach to teaching what is plausibly one mode within a heterogeneous capacity.

Why direct instruction has been difficult

The reasons abstract reasoning has remained outside direct instruction are partly structural and partly principled. Some reasons reflect historical and infrastructural barriers that are now diminishing. Others reflect genuine difficulties about the nature of the phenomenon that no curriculum can wave away. Both kinds of reasons matter for assessing the present proposal honestly.

The structural and infrastructural barriers include vocabulary scarcity, example scarcity, scaffolding scarcity, and the absence of operational tooling. Until recently, direct instruction in cross-domain abstract reasoning lacked a stable vocabulary of structural patterns; lacked a catalog of cross-domain examples organized for teaching rather than for academic argument; lacked graded exercises of the kind cognitive science has shown to support analogical learning;^[5]^[6] and lacked the operational tooling that would have made instruction practical at sufficient scale. Each of these barriers has been substantial. Each is, to varying degrees, now diminishing. The Encyclopedia and the pipeline are part of that diminishment, as are the language-model capabilities that allow individual practitioners to generate and evaluate large quantities of instructional material that previously required institutional investment.

But there are also more principled difficulties that no operational tooling can dissolve. Three deserve particular attention.

The first is the tacit-knowledge problem. Experts who reason abstractly across domains often cannot articulate what they do. Polanyi's argument that tacit knowing is constitutively prior to explicit knowing^[19] applies in full force here, and it has practical pedagogical consequences: the expert may produce brilliant cross-domain insights but be unable to teach the procedure that generated them. The pipeline architecture is one attempt to address this by externalizing the procedure into explicit artifacts. Whether the externalization captures what the expert is actually doing, or whether it captures only the visible surface while missing the cognitive substance, is itself an open question. Polanyi's view, seriously held, is that the latter is more likely than the former.

The second is the transfer problem. The empirical literature on whether explicit instruction in general thinking skills produces transferable capability is, to a first approximation, discouraging. This deserves an extended treatment of its own, given below. Curricula that have promised transferable thinking skills have historically delivered limited transfer in practice, and the present proposal must be honest that it does not yet have evidence to distinguish itself from those prior attempts.

The third is the primeness problem. The catalog the curriculum relies on calls its ~650 entries "primes," suggesting they are atomic in some principled sense — irreducible, mutually distinct, covering the structural-pattern space. The catalog does not in fact defend this claim formally. Primeness, as used in the Encyclopedia, is operationally rather than logically defined: a pattern is treated as prime if it recurs across multiple domains, has a discernible structural signature, and resists reduction to other patterns already in the catalog. This is a working criterion, not a proof. Whether the catalog's patterns are actually distinct, whether some are specializations of others that should be merged, and whether the collection covers the structural-pattern space adequately are questions the catalog does not settle. They are testable through use: deployment will surface gaps, redundancies, and miscategorizations, and the catalog is in any case treated as a versioned artifact subject to revision rather than a fixed canon.
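The working criterion can be written down as a simple check, which makes its limits easier to see. The sketch below is an assumption-laden illustration, not the Encyclopedia's actual tooling: the `CandidatePattern` fields, the `min_domains=2` reading of "multiple domains", and the idea that a human reviewer records possible reductions in `reduces_to` are all hypothetical, and the example signatures are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CandidatePattern:
    """Hypothetical record for a pattern proposed for the catalog."""
    name: str
    domains: set = field(default_factory=set)      # where it has been observed
    signature: str = ""                            # structural signature, if any
    reduces_to: set = field(default_factory=set)   # reviewer-judged reductions


def is_operationally_prime(candidate, catalog_names, min_domains=2):
    """Apply the three-part working criterion; this is a filter, not a proof."""
    recurs = len(candidate.domains) >= min_domains
    has_signature = bool(candidate.signature.strip())
    resists_reduction = not (candidate.reduces_to & set(catalog_names))
    return recurs and has_signature and resists_reduction


catalog_names = {"feedback", "equilibrium"}

bottleneck = CandidatePattern(
    name="bottleneck",
    domains={"manufacturing", "networking", "metabolism"},
    signature="throughput of a chain is capped by its slowest stage",
)
thermostat_loop = CandidatePattern(
    name="thermostat loop",
    domains={"hvac", "physiology"},
    signature="deviation from a set point triggers a correcting response",
    reduces_to={"feedback"},  # reviewer judges it a special case of feedback
)

bottleneck_is_prime = is_operationally_prime(bottleneck, catalog_names)
thermostat_is_prime = is_operationally_prime(thermostat_loop, catalog_names)
```

Written this way, the criterion's dependence on judgment is explicit: the `reduces_to` field is filled in by a reviewer, and the verdict changes as the catalog grows, which is consistent with treating the catalog as a versioned artifact.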

These three principled difficulties (tacit knowledge, transfer failure, and primeness uncertainty) are not disqualifying. But they are reasons for genuine caution about the strongest forms of the proposal, and they shape how the chapter frames what is being claimed. The claim is not that the curriculum will produce reliable cross-domain reasoning capability in students. The claim is that the curriculum is now coherent enough to be tested, in a way it has not been before, and that the testing is likely to produce useful information regardless of the result.

Historical precedents and their limits

The aspiration to systematize abstract thought has a long lineage, and any current proposal sits within it whether or not the proposal acknowledges the prior history. The chapter does best by acknowledging it explicitly, since the lineage contains both cautionary tales and constructive precedents. Five precedents in particular deserve examination.

Leibniz's characteristica universalis

In the 1660s and 1670s Leibniz proposed a universal characteristic — a system of symbols that would represent all concepts unambiguously and would allow disputes to be settled by calculation rather than argument.^[20]^[21] The aspiration was operational: to externalize abstract thought into a manipulable symbolic system in which inference could be mechanized. Leibniz never built it. His attempts produced extensive notebook material on combinatorial reasoning and an early calculator, but the universal characteristic itself remained programmatic. Subsequent work in the tradition (Frege's Begriffsschrift; the logical empiricism of the Vienna Circle; the philosophical project that culminated in Russell and Whitehead's Principia Mathematica) achieved formal logic and mathematical foundations, which are real and significant, but did not deliver the broader operational mechanization Leibniz had imagined. The reasons are debated; the failure to deliver, despite significant intellectual investment by significant intellects, is not.

What this teaches: programmatic ambitions for systematizing abstract thought have repeatedly produced narrower successes than the original ambition envisioned. Formal logic is the residue of Leibniz's characteristica; it is enormously valuable and not what Leibniz was after. The present proposal should expect a similar pattern. Whatever it accomplishes is likely to be a narrower thing than the prologue suggests, and the narrowing will probably be along dimensions that are not yet visible.

Carnap's Aufbau and the encyclopedia movement

Rudolf Carnap's Der logische Aufbau der Welt (1928) attempted to reconstruct the world's concepts on a unified logical foundation, using a system of constitutive definitions that built complex concepts from elementary perceptual experiences.^[22] The project was a high-water mark of logical empiricism and was closely related to the broader encyclopedia movement — the ambition, realized in part by the International Encyclopedia of Unified Science, to integrate the sciences through shared logical foundations.^[23] The technical project ran into difficulties (most famously the problem of theoretical terms and the underdetermination of theory by observation) that contributed to logical empiricism's mid-century retreat. The ambition itself — that there is a unified structure beneath disciplinary diversity — survived in attenuated forms (general systems theory; cybernetics; complexity science) but never quite recovered the optimism of the Aufbau period.

What this teaches: cross-disciplinary unification projects tend to underestimate domain-specific knowledge that does not reduce to the proposed unifying scheme. The Encyclopedia of Abstractions is not making strong reductive claims — it is offering a vocabulary of recurring patterns, not a foundational reduction — but the temptation to over-claim is real, and the lineage suggests caution.

Adler's Propædia

Mortimer Adler's Propædia, published as part of the 1974 fifteenth edition of Encyclopædia Britannica, is the closest direct precedent to the Encyclopedia of Abstractions.^[24] The Propædia attempted an "outline of knowledge" — a systematic conceptual organization of the territory the Britannica covered, with an elaborate hierarchical structure of categories and sub-categories intended to make the encyclopedia browsable by topic structure rather than only by alphabetical entry. Adler was explicit that his structure was meant to support cross-domain thinking and to teach students how knowledge is organized.

The Propædia is a useful precedent in two ways. It demonstrates that systematic conceptual organization at encyclopedia scale is feasible. It also demonstrates that systematic conceptual organization, by itself, is not curriculum. Adler's Propædia was admired but little used as a teaching tool; students did not read the Britannica through Adler's outline, and educators did not build curricula on it. The pieces that turn an organized substrate into operational teaching — graded exercises, scaffolded procedures, assessment infrastructure, teacher training — were not part of the Propædia offering, and those pieces, as it turned out, were necessary.

What this teaches: a structured catalog by itself does not produce educational outcomes. The catalog is necessary but radically insufficient. The present proposal should be honest that the Encyclopedia's contribution is, at best, the substrate; the operational pedagogy, the assessment infrastructure, and the teacher preparation are separate layers, and each represents substantial work not yet done.

Senge's system archetypes and Alexander's pattern languages

Two more recent precedents are worth noting because they are closer in spirit to the present proposal than the older lineage. Peter Senge's system archetypes^[25] catalog about a dozen recurring structural patterns in dynamic systems (limits to growth; tragedy of the commons; shifting the burden; success to the successful) and provide explicit guidance about how to recognize and intervene in each. Christopher Alexander's pattern languages in architecture^[26] catalog hundreds of recurring patterns in built environments, with explicit specifications of when each pattern applies, what problems it solves, and how it composes with other patterns.

Both works have had real influence. Senge's archetypes are widely taught in management education and have produced visible changes in how managers reason about organizational systems. Alexander's pattern languages directly inspired the design-patterns movement in software engineering^[27] and have indirect influence in urban planning, education design, and adjacent fields. Both have also been consistently critiqued: Senge's archetypes are accused of being too few and too general (managers learn to recognize the patterns but the patterns underdetermine intervention); Alexander's pattern languages are accused of being so domain-specific that the abstract structure of pattern itself does not transfer across domains. Both critiques contain truth.

What this teaches: structured catalogs of recurring patterns can have real educational and operational impact, but the impact is typically more modest than the catalog's authors hoped. The catalogs become reference materials and shared vocabulary; they do not become, in themselves, operational reasoning systems. The Encyclopedia of Abstractions should expect a similar trajectory in its current form. What the pipeline architecture adds — explicit operational procedure, LLM scaffolding, structured artifacts at each step — is the part of the proposal that has no direct precedent at scale. Whether that addition produces qualitatively different educational outcomes than the catalog-only precedents achieved is, again, the empirical question.

What the lineage suggests

Across these five precedents a few stable patterns emerge.

First, programmatic ambitions to systematize abstract thought have consistently delivered narrower successes than originally envisioned, and the narrowing has typically not been predictable in advance. This argues for modest framing of present claims.

Second, structured catalogs can be valuable as shared vocabulary and reference material even when they do not function as their authors intended. The Encyclopedia is plausibly worthwhile in this minimal sense regardless of whether the curriculum proposal succeeds.

Third, the gap between substrate and curriculum is wider than catalog authors typically acknowledge. Adler's Propædia is the cautionary case here. The pipeline, the scaffolding, the assessment, and the teacher preparation each represent work that the substrate does not itself accomplish.

Fourth, the cases where structured catalogs have had operational impact (Senge in management, Alexander in software via design patterns) have all been cases where a tightly specified operational procedure for using the catalog accompanied the catalog itself. The pipeline architecture is meant to play that role for the Encyclopedia; whether it does so effectively is the next question.

The proposal sits in this lineage and is best read as a contemporary contribution to it rather than as a categorically new departure. The infrastructure available now (LLM scaffolding; large-scale catalog construction at modest cost) is genuinely different from what was available to Leibniz, Carnap, Adler, Senge, or Alexander. Whether the infrastructure changes the outcomes enough to escape the lineage's characteristic limits is precisely what is uncertain.

The transfer problem and what we know

The most damaging critique any explicit-instruction proposal in abstract reasoning faces is the transfer problem: the empirical finding that explicit instruction in general thinking skills frequently fails to produce capability that transfers to new domains beyond the trained context. The chapter must engage this critique seriously rather than waving at it. This section attempts the engagement.

The classical transfer literature begins with Edward Thorndike's identical-elements theory (1903): transfer occurs to the extent that two situations share specific elements, and there is no general faculty of "transfer" that operates beyond shared specifics.^[28] Thorndike's empirical work, focused on the question of whether studying Latin makes one a better thinker generally (a popular educational claim of the period), found that it did not — students who studied Latin showed no general transfer to other domains, only narrow transfer to specifically related tasks. The conclusion was that the "mental discipline" tradition in education, which had argued for studying difficult subjects as a way of training general intellectual capacity, was empirically unsupported.

A century of subsequent research has complicated the picture but has not overturned Thorndike's basic finding. Robert Glaser's syntheses of expertise research^[29] showed that expert reasoning is substantially domain-specific: chess masters have superb chess intuition that does not transfer to bridge; physicists have powerful physics intuition that does not transfer to economics. The information-processing tradition (Newell-Simon-style problem-solving research) similarly found that general problem-solving heuristics are real but their power is heavily mediated by domain-specific knowledge schemas.^[30] David Perkins and Gabriel Salomon's extensive work on transfer concludes that transfer is real, but that "hugged" transfer (close transfer to similar contexts) is much more reliable than "bridged" transfer (far transfer to dissimilar contexts), and that bridged transfer requires very specific instructional conditions that ordinary curricula do not provide.^[8]^[31]

In the specific domain of critical-thinking instruction, the picture is similarly cautionary. Daniel Willingham's review of the empirical literature on critical-thinking instruction reaches the conclusion that critical-thinking skills are substantially domain-bound: a student who learns to reason critically about historical evidence does not thereby become better at reasoning critically about scientific evidence, except weakly.^[9] Diane Halpern's research, while more optimistic about the trainability of critical thinking, acknowledges that transfer requires explicit and sustained instruction over time and that brief interventions do not produce durable cross-context capability.^[32]

This is the empirical context the present proposal must accept. The default expectation, given the literature, should be that explicit instruction in cross-domain abstract reasoning will produce some transfer within hugged contexts (situations close to the trained examples) and limited transfer to bridged contexts (situations substantively different from the trained examples). It should not be expected to produce a free-floating "abstract-reasoning ability" that applies anywhere the student needs it.

Within this default, however, several conditions have been identified that improve transfer reliability when they are present. The proposal takes these seriously.

The first condition is multiple-source comparison. Students who see two or more analogous examples and are asked to extract the underlying shared structure transfer more reliably than students who see a single example or who see multiple examples without comparison prompts.^[5]^[6] The Encyclopedia's cross-domain example collections are organized around this principle: each prime is illustrated with multiple domain instances precisely so that comparison is possible.

The second condition is explicit schema induction. Students who are asked to produce an abstract characterization of the underlying pattern (rather than merely to solve a transfer problem) transfer more reliably.^[5] The pipeline's meta-model step is in effect an instance of explicit schema induction, with the externalized meta-model serving as the abstract characterization.

The third condition is deep processing during initial learning. Students who engage with the structural features of an example during initial study transfer better than students who engage only with surface features.^[33] Curriculum materials that direct attention to structural features during the encoding step are more likely to support later transfer than those that allow students to encode at a surface level.

The fourth condition is spaced practice with retrieval. Students who revisit material at increasing intervals retain it better than students who experience massed practice.^[34]^[35] A curriculum that revisits primes from earlier units as the catalog grows, with each revisit at increased depth, exploits this finding directly.
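A minimal sketch of what "revisits at increasing intervals" could look like as a schedule follows. The parameters are assumptions chosen for illustration (a first gap of one day, each gap doubling); the source literature does not prescribe these values, and a real curriculum would tune them empirically. The function name `review_dates` is hypothetical.

```python
from datetime import date, timedelta


def review_dates(first_study, reviews=4, first_gap_days=1, growth=2.0):
    """Return review dates with expanding gaps between successive visits.

    first_gap_days and growth are illustrative defaults, not recommendations.
    """
    dates = []
    gap = first_gap_days
    current = first_study
    for _ in range(reviews):
        current = current + timedelta(days=round(gap))
        dates.append(current)
        gap *= growth  # each later revisit waits longer than the one before
    return dates


# Example: a prime first studied on 6 January 2025 is revisited after
# gaps of 1, 2, 4, and 8 days.
schedule = review_dates(date(2025, 1, 6))
```

With these defaults the example yields revisits on 7, 9, 13, and 21 January 2025. Each revisit would ideally be a retrieval exercise at greater depth rather than a re-reading, in line with the condition described above.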

The fifth condition, perhaps the most pertinent to the present proposal, is what Bransford and Schwartz call preparation for future learning.^[36] On their account, transfer is better measured not by performance on a transfer task in isolation but by the rate at which a student can learn to perform a new task when given resources. Students who have learned to abstract structural patterns may be no faster than untrained peers at solving a single transfer problem cold, but they may be substantially faster at learning a new domain when introduced to it. This is a different account of what transfer means and what successful transfer instruction would look like, and it is more sympathetic to the kind of broad structural literacy the present proposal aims at.

The honest synthesis of this literature, applied to the present proposal, is roughly:

The proposal is unlikely to produce a free-floating general capacity to reason abstractly across all domains. The transfer literature gives no warrant for that hope.

It might produce reliable transfer within hugged contexts, where the trained patterns recur in situations close to the trained examples. This is a real but modest achievement.

It might, more interestingly, produce preparation for future learning: students who have worked through the curriculum may learn new domains faster, frame new problems more productively, and recognize structural patterns in unfamiliar territory more readily. This is the strongest plausible benefit, and it is testable through specifically designed assessment instruments.

It might produce structural vocabulary that is useful in itself, independent of transfer. A student who can name and recognize feedback, equilibrium, bottleneck, and emergence has analytical resources that students without this vocabulary lack, even if their abstract-reasoning performance on novel problems is not measurably superior. Vocabulary is not nothing; substantial parts of intellectual capability consist in having the words for things.

The proposal, in its honest form, is that some combination of these modest effects is plausible, and that the combination might be worthwhile educationally even if it falls short of the transfer literature's most demanding criteria. The proposal should not be read as overturning the transfer literature. It should be read as engaging the literature seriously and proposing an instructional approach that takes the literature's identified conditions for successful transfer (multiple-source comparison, schema induction, structural encoding, spaced practice, preparation for future learning) into account explicitly.

A separate but related concern is differentiation from existing critical-thinking curricula. The Paul-Elder framework of intellectual standards;^[37] Halpern's Thought and Knowledge textbook tradition;^[32] Ennis's critical-thinking dispositions;^[38] McMaster University's Problem-Based Learning approach^[39] — these and related approaches have been deployed for decades, with mixed results. The present proposal differs from these in three identifiable ways.

First, vocabulary. The existing critical-thinking curricula provide relatively few named structural patterns; they emphasize procedural norms (consider alternatives; ask about evidence; identify assumptions) rather than a catalog of patterns to recognize. The Encyclopedia provides explicit vocabulary at scale.

Second, cross-domain rather than discipline-specific. Existing critical-thinking curricula tend to teach discipline-bound reasoning (scientific reasoning; historical reasoning; literary analysis) with the hope that transfer will occur. The proposal here is explicitly cross-domain: the trained operation is structural mapping across domains, with multiple domains presented in proximity.

Third, AI-scaffolded procedure. Existing critical-thinking curricula do not have access to language-model assistance; the present proposal incorporates LLM scaffolding into the curriculum design from the start.

Whether these three differences are enough to produce different empirical outcomes is the open question. They might not be. The prior critical-thinking literature is sobering enough that one should not expect the proposal to dramatically outperform it. The proposal is best read as an attempt to apply the cognitive-science literature on transfer more systematically than prior critical-thinking curricula have, with the catalog and pipeline as the mechanisms by which the application becomes operational.

The Encyclopedia as one possible substrate¶

A curriculum substrate is the material from which a curriculum is composed: vocabulary, examples, structural relationships among concepts, entry points at varying levels of difficulty. The Encyclopedia of Abstractions is offered here as one possible substrate, not as the only or canonical one. Other substrates could serve similar curricular purposes (Senge's archetypes for management; Alexander's pattern languages for architecture; the systems-thinking tradition more broadly), and a mature pedagogical research program would compare across substrates rather than assuming any one is the right one.

The catalog's entries as teaching vocabulary

Each entry in the Encyclopedia is a candidate vocabulary item: a named pattern that the catalog claims recurs across multiple domains. The current catalog covers conceptual ground spanning elementary patterns (cause and effect, balance, pattern, containment), intermediate ones (equilibrium, feedback, bottleneck, emergence), and highly abstract ones (isomorphism, renormalization, universality in critical phenomena). This range matters for curriculum design: students at different developmental stages can engage with different layers of the catalog without leaving the substrate.

Whether the catalog's current entries are the right set is, as already noted, an open question. The catalog is constructed by an iterative process that is operationally workable but not formally principled. A pattern is treated as a candidate prime if it recurs across multiple domains and has a discernible structural signature; it is admitted to the catalog if it survives review for distinctness from existing entries. This procedure has produced a usable catalog. It has not produced a proof that the entries are atomic, mutually distinct, or jointly sufficient. They probably are not. Some entries are likely specializations of others that should be merged; some may be missing that future iterations will surface; some categorizations will turn out to be wrong.

For curricular purposes the underdetermination is less damaging than it might seem. A student learning the vocabulary is learning a working vocabulary rather than a canonical inventory. The vocabulary becomes useful through use; the canonical inventory, if there is one, is constructed retrospectively after extensive deployment has surfaced its actual structure. This is how most vocabulary develops in intellectual fields. The catalog should be presented to students as a working resource that they may eventually contribute to revising, rather than as a fixed canon.

v1 and v2 as developmental scaffolding

Each entry exists in two forms in the corpus. The v1 baseline is a short entry — typically 200–400 words — written for an introductory reader. The v2 density-pass is a longer entry — typically 3,500–5,000 words — that develops the pattern's structural signature in detail. For curriculum purposes this dual treatment functions as Bruner's spiral curriculum^[40] made operational: a student encounters a pattern first in its v1 form, returns later to the v2 form as understanding deepens, and may return again at a still later stage to the cross-references and category structures that connect this pattern to others.

Spiral curricula have empirical support across mathematics, science, and language education,^[41] and the spiral structure is consistent with the cognitive-science findings on spaced retrieval and depth-of-processing cited above. The Encyclopedia's v1/v2 split makes the spiral structurally explicit, but the spiral structure itself is the durable contribution; alternative substrates with similar layered treatment would serve curricular purposes equally well.

Categories and ontology as structural progression

The corpus's categorization layer (Structural, Relational, Dynamic, Cognitive, Social, etc.) provides one defensible progression order for introducing patterns. Structural patterns (containment, scale, proportion, balance) are perceptually grounded and accessible to younger students. Relational patterns (cause-effect, dependence, isomorphism) require facility with multi-element systems. Dynamic patterns (feedback, equilibrium, emergence, phase transition) require facility with change over time. Cognitive and social patterns (cognitive bias, agency problem, common-pool resource) require facility with mental and institutional models.

This progression is consistent with developmental findings in cognitive psychology, though the developmental literature itself has been substantially revised since Piaget. The revisionist literature (modern accounts of children's theory-of-mind, of causal reasoning, of mathematical cognition) has shown that children develop many abstract capacities earlier and more domain-specifically than Piaget's stage theory suggested.^[42]^[43] Piaget's broad developmental sequence is roughly preserved; the specifics are contested. For curricular purposes the implication is that the progression should be treated as a default order subject to empirical refinement, and that certain patterns may be introduced earlier than the conservative reading would suggest.

Solution archetypes as the intervention layer

The 230 solution archetypes are a separate but linked layer of the corpus. Where primes describe structural patterns, archetypes describe interventions: named compositions of two to five primes that solve recurring structural problems. Each archetype is documented with explicit problem signature, intervention logic, and failure modes.
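
One way to picture the archetype layer is as a small record type. The sketch below is illustrative only: the field names, the example archetype, and the subset-matching rule in `archetypes_for` are assumptions made for this example, not the corpus's actual schema or retrieval logic. Only the two-to-five-prime constraint and the three documented facets (problem signature, intervention logic, failure modes) come from the description above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Archetype:
    """Hypothetical record for one solution archetype (field names are assumed)."""
    name: str
    primes: Tuple[str, ...]        # the primes this archetype composes
    problem_signature: str
    intervention_logic: str
    failure_modes: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # The text describes archetypes as compositions of two to five primes.
        distinct = set(self.primes)
        if len(distinct) != len(self.primes) or not 2 <= len(distinct) <= 5:
            raise ValueError("an archetype composes two to five distinct primes")


def archetypes_for(operative_primes: List[str], library: List[Archetype]) -> List[Archetype]:
    """Assumed matching rule: keep archetypes whose primes are all operative."""
    operative = set(operative_primes)
    return [a for a in library if operative.issuperset(a.primes)]


# Invented example entry, not taken from the catalog.
relief_valve = Archetype(
    name="relief valve",
    primes=("bottleneck", "feedback"),
    problem_signature="flow backs up behind a single constrained stage",
    intervention_logic="sense the backlog and divert load before it accumulates",
    failure_modes=("diverted load overwhelms the alternate path",),
)
matches = archetypes_for(["bottleneck", "feedback", "equilibrium"], [relief_valve])
```

Other matching rules (any shared prime, or a ranked overlap score) would be equally plausible readings of "associated with the relevant primes"; the strict subset rule is used here only because it is the simplest to state.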

For curricular purposes archetypes belong at the late-secondary or early-tertiary level, after the prime vocabulary is firm. Earlier introduction tends to produce memorization of intervention recipes rather than principled application, the failure mode that critical-thinking-curriculum research has documented many times in adjacent contexts.^[9] The intervention layer addresses the "what now?" problem that pure pattern-recognition instruction leaves open, but it does so at a cost in cognitive load that the curriculum must respect.

What the substrate does and does not provide

The substrate provides vocabulary, layered examples at multiple density levels, structural relationships among entries, and an intervention library for advanced students. It does not provide graded exercises (these must be constructed); does not provide assessment instruments (these must be developed and validated); does not provide teacher-preparation materials (these must be written); and does not provide the integration with substantive domain content that turns abstract-reasoning instruction into educationally useful content. Each of these is significant additional work, and treating the substrate as if it were a curriculum is precisely the Propædia mistake the historical lineage warns against.

The pipeline as one possible procedure¶

The augmented-abstract-reasoning pipeline described elsewhere in this work — the nine-step procedure that takes a problem statement through to a recommended intervention — is offered here as one possible procedural backbone for the curriculum. Its theoretical warrant draws primarily on the structure-mapping tradition; its practical structure derives from operational experimentation in the corpus's own development.

The nine steps are:

1. Specify the problem statement(s).
2. Identify the prime abstractions present.
3. Salience-rank the identified primes.
4. Prune the superfluous.
5. Build a context-specific model (primes plus key objects, subjects, phenomena, and their relationships).
6. Construct a meta-model (abstract the context-specific model into a portable structural form).
7. Query solution archetypes associated with the relevant primes.
8. Reason via both views (context-specific and meta-model), typically as separate passes.
9. Evaluate archetypes for fit in the specific context.

For pedagogical purposes the steps cluster into four phases.

Phase one: perception (steps 1–2)

The perceptual phase concerns the ability to articulate a situation precisely enough that structural patterns can be recognized in it, and then to recognize candidate patterns. This is what cognitive-science accounts of analogy call the encoding step.^[5] It is the gateway operation; without it, no mapping or transfer is possible.

In curricular terms, the encoding step is the operation most amenable to practice in isolation. A teacher can present situations of varying complexity and ask students to identify patterns, with no further reasoning required. The cognitive-science literature on perceptual training in expert domains (radiology,^[44] chess,^[45] firefighting^[16]) is consistent on a key finding: deliberate practice with feedback produces faster and more reliable pattern recognition than unstructured exposure. The Encyclopedia provides the labels for the patterns; the educational task is constructing graded exercises and providing reliable feedback. Language models can participate in feedback substantially, presenting candidate identifications for the student to verify against the canonical catalog.

Phase two: discernment (steps 3–4)

The discernment phase concerns which patterns matter in a given situation and which are peripheral. A complex real-world situation typically instantiates many patterns, only some of which are load-bearing for the question at hand. The cognitive operation is one of relevance judgment.

This operation has not been studied as systematically in the analogical-reasoning literature as encoding has, but its analogue in problem-solving research is well documented (Polya;^[46] information-foraging^[47]). In the present context the operation is taught through worked examples in which a fully populated pattern list is whittled to the operative subset, with explicit reasoning for each inclusion and exclusion. The student's practice mirrors this: identify a generous candidate set, then prune with stated rationale.

Discernment is harder to assess than perception because it requires domain knowledge. A student pruning patterns from a situation in medieval economic history needs enough familiarity with the period for the relevance judgment to be defensible. This is not a defect in the curriculum; it is a reminder that abstract reasoning is not domain-free, even when the structural patterns themselves are. The Encyclopedia's curriculum should be deployed alongside, not in place of, substantive domain instruction. This is the same point the transfer literature makes: domain-specific knowledge is a necessary input to the abstract-reasoning operation, not a replaceable layer.

Phase three: structural externalization and abstraction (steps 5–6)

The third phase is the cognitive heart of the pipeline. Step 5 asks the student to externalize the situation as an explicit structural model: operative patterns as nodes, relevant objects/subjects/phenomena as additional nodes, relationships between them as labeled edges. Step 6 asks the student to abstract that model into a portable form, stripping the specifics and retaining the structure.

This is the operation that structure-mapping theory takes as central, and it is the operation that licenses cross-domain transfer in the empirical literature.^[5]^[6] Two findings from that literature matter here. Students who can externalize a situation as a relational graph and then strip it to its structural skeleton transfer better than students who cannot. And students do not develop this skill spontaneously; it must be taught explicitly, with worked examples and graded practice.

The pedagogical task in this phase is twofold. First, the student must acquire the mechanical skill of producing a relational diagram from a verbal situation description. Second, the student must acquire the judgment skill of deciding which features are structural (preserved under abstraction) and which are merely incidental (dropped). The first is teachable through repetition with feedback. The second is harder and requires considerable practice across many situations; it is also the cognitive skill that distinguishes proficient cross-domain reasoners from beginners.
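
The mechanical half of this task can be sketched in code. The example below assumes a context-specific model is stored as a list of (source, label, target) edges and treats abstraction as nothing more than replacing every non-prime node with a placeholder. The two situations, the edge labels, and the function name are invented for illustration. The judgment half, deciding which features belong in the model at all, is not automated here.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, relationship label, target node)


def abstract_model(edges: List[Edge], primes: Set[str]) -> List[Edge]:
    """Replace each non-prime node with a placeholder, keeping primes and labels."""
    placeholders: Dict[str, str] = {}

    def rename(node: str) -> str:
        if node in primes:
            return node  # pattern names survive abstraction unchanged
        if node not in placeholders:
            placeholders[node] = f"X{len(placeholders) + 1}"
        return placeholders[node]

    return [(rename(src), label, rename(dst)) for src, label, dst in edges]


PRIMES = {"feedback"}

# Two invented context-specific models (step 5) from different domains.
thermostat = [
    ("thermostat", "measures", "room temperature"),
    ("thermostat", "drives", "heater"),
    ("heater", "changes", "room temperature"),
    ("feedback", "governs", "room temperature"),
]
central_bank = [
    ("central bank", "measures", "inflation"),
    ("central bank", "drives", "interest rate"),
    ("interest rate", "changes", "inflation"),
    ("feedback", "governs", "inflation"),
]

# Step 6: both situations reduce to the same structural skeleton.
meta_a = abstract_model(thermostat, PRIMES)
meta_b = abstract_model(central_bank, PRIMES)
same_structure = meta_a == meta_b
```

Because placeholders are assigned in order of first appearance, the comparison works only when the two edge lists are written in corresponding order and already use shared relationship labels. General graph matching is a harder problem that this sketch does not attempt.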

Language-model assistance in this phase is particularly relevant because the model can produce a candidate context-specific model that the student verifies, modifies, and re-executes. The student is not asked to start from a blank page; they are asked to evaluate and edit a starting structure. The risks associated with this assistance are substantial and discussed in the next section.

Phase four: transfer and evaluation (steps 7–9)

The fourth phase is where structural correspondence cashes out as licensed inference. Step 7 retrieves known interventions associated with the operative patterns. Step 8 applies the intervention logic in both the context-specific and the meta-model views. Step 9 judges which candidate interventions actually fit the situation.

The dual-view requirement (step 8) deserves emphasis. Reasoning purely within the context-specific model tends to favor incremental, domain-conventional interventions. Reasoning purely within the meta-model tends to favor structurally elegant interventions that may be infeasible in the actual situation. Running both passes and reconciling them captures both kinds of insight while filtering each mode's characteristic failure. This dual-pass discipline parallels the dual-process tradition in cognitive psychology and the explicit advice in design and decision-making literatures to alternate between immersive and detached perspectives.^[17]^[48]

The pipeline is not the only possible procedural structure. Other structures could be designed around alternative cognitive-science frameworks (an embodied-simulation pipeline; a constraint-satisfaction pipeline; a dual-process pipeline that explicitly manages System-1/System-2 transitions). The proposal here is that some explicit procedure is necessary if abstract reasoning is to be taught directly; the specific procedure is open to refinement and to comparison against alternatives. Empirical research that compares pipeline-based instruction to alternative procedural backbones would be valuable.

AI as a provisional scaffold¶

Language models can serve as cognitive scaffolds within the pipeline, performing some of the more demanding sub-operations on demand and allowing students to engage with cognitive complexity beyond what they could produce unaided. This is potentially valuable; it is also the part of the proposal where the most caution is warranted, and where over-confidence about the technology is most costly.

What LLM scaffolding can plausibly contribute

The cross-domain pattern-surfacing asymmetry is real and useful. Language models trained on broad corpora have broader but shallower coverage of human knowledge than any individual student or teacher. A student in a chemistry class has deep familiarity with chemistry-domain patterns but may recognize a quarter of the economics catalog and almost none of the sociology one. A model, conversely, has weaker grasp of any single domain but recognizes structural patterns from many domains it has seen instantiated in training. When a student is asked to identify patterns in an unfamiliar situation, the model can surface candidates the student would not have produced on their own, and the student's task becomes verification rather than generation.

This is consistent with Vygotsky's zone of proximal development:^[49] students can perform, with appropriate scaffolding, tasks they cannot yet perform independently. The LLM, used pedagogically, is one such scaffold. The question is not whether scaffolding is valuable (ample literature supports the value of scaffolding^[50]^[51]) but whether language models specifically can serve as effective scaffolds in this domain.

The pipeline architecture also produces inspectable artifacts, in contrast with free-form chain-of-thought. Each step yields a discrete artifact that the student or instructor can examine: the problem statement, the pattern list, the salience ranking, the pruned subset, the context-specific model, the meta-model, the archetype candidates, the dual-view reasoning, the recommendation. The inspectability is real and useful. It allows students to verify the model's claims at each step against the canonical catalog, and it allows instructors to diagnose where a student's reasoning has gone wrong.
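
A minimal sketch of what these artifacts might look like as a record follows. The class, the field types, and the helper functions are assumptions for illustration; the text specifies only the nine artifacts and the practice of checking pattern names against the catalog.

```python
from dataclasses import dataclass, fields
from typing import Any, List, Optional, Set, Tuple


@dataclass
class PipelineRun:
    """One artifact per pipeline step, in step order (types are assumed)."""
    problem_statement: Optional[str] = None
    pattern_list: Optional[List[str]] = None
    salience_ranking: Optional[List[str]] = None
    pruned_subset: Optional[List[str]] = None
    context_model: Optional[Any] = None
    meta_model: Optional[Any] = None
    archetype_candidates: Optional[List[str]] = None
    dual_view_reasoning: Optional[str] = None
    recommendation: Optional[str] = None


def next_missing(run: PipelineRun) -> Optional[Tuple[int, str]]:
    """Return (step number, artifact name) for the first artifact not yet produced."""
    for step, f in enumerate(fields(run), start=1):
        if getattr(run, f.name) is None:
            return step, f.name
    return None


def unlisted_patterns(run: PipelineRun, catalog: Set[str]) -> List[str]:
    """Pattern names in the run that the catalog does not contain."""
    return [p for p in (run.pattern_list or []) if p not in catalog]


# "queue resonance" is a made-up name, included to show how it gets flagged.
run = PipelineRun(
    problem_statement="Why do clinic wait times keep growing?",
    pattern_list=["bottleneck", "feedback", "queue resonance"],
)
print(next_missing(run))                                   # (3, 'salience_ranking')
print(unlisted_patterns(run, {"bottleneck", "feedback"}))  # ['queue resonance']
```

A name missing from the catalog is not necessarily an error, since the catalog is itself provisional, but it marks a point where the student or instructor should look more closely.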

What LLM scaffolding cannot reliably do

Several limits on LLM scaffolding deserve explicit acknowledgment.

The auditability of LLM reasoning is more limited than the inspectable-artifact framing suggests. Recent research has documented that the chain-of-thought outputs language models produce are not always faithful to their underlying computation. Models sometimes determine an answer through pattern matching early in the inference and then generate a chain of thought that rationalizes the predetermined answer.^[52] This is analogous to the human phenomenon Nisbett and Wilson documented in introspective reports.^[53] The pipeline architecture mitigates this risk partially: producing structured artifacts at each step provides more inspection points than free-form reasoning offers, and inconsistencies between the artifacts and the canonical catalog can be detected. But the mitigation is not airtight. The model can still produce internally inconsistent reasoning that appears coherent at first inspection. Treating the structured form as a guarantee of reasoning quality would be a mistake.

A separate concern is anchoring. Empirical work on human-AI collaboration shows consistently that human reviewers anchor heavily on AI-produced suggestions, treating them as defaults to be modified rather than proposals to be evaluated against alternatives the human might generate.^[54] In curricular use, anchoring threatens the development of the underlying skill: students who only ever evaluate LLM-produced pattern lists may never develop the skill of generating pattern lists from scratch. The mitigation, well established in the cognitive-load and worked-example literatures, is scaffolding fade: the LLM does heavy lifting at the novice stage and progressively less as the student develops competence.^[55]^[56] This is testable but not yet tested in the present context.

A third concern is reliability across domains. Language models' quality varies substantially by domain. A model that produces trustworthy pattern identifications in mainstream physics or economics may produce confabulated pattern identifications in specialized fields where its training data is sparse. The pipeline provides no internal mechanism for distinguishing reliable from unreliable output; that mechanism must come from external verification, either by the student against the canonical catalog or by an instructor with domain knowledge. This makes the curriculum's deployment more demanding than it might appear: students cannot simply trust the model, and the verification work is non-trivial.

A fourth concern is correlated errors across models. The peer-review pattern (one model evaluates another model's output) is a useful safeguard but has known limitations. Models trained on overlapping corpora share biases; a second model is unlikely to catch errors that arise from training-data gaps shared between the two. The mitigation is heterogeneous review (different model families, with explicit human spot-checking), not naive peer review.

These limits are not disqualifying. They are conditions under which the LLM-scaffolded curriculum must be carefully deployed. A curriculum that ignores them risks teaching the student to trust the model's output uncritically — the opposite of what abstract-reasoning instruction should accomplish. A curriculum that incorporates them explicitly, with instruction in verification practices and a fading regime that produces independent reasoners over time, can plausibly use LLM scaffolding as a productive component.

A staged relationship between student and model

The curriculum proposes a multi-stage relationship that fades the scaffold over time:

Stage 1 (introduction): the LLM produces all artifacts; the student evaluates each and asks questions. The student's task is verification and pattern absorption.
Stage 2 (assisted production): the LLM produces artifacts; the student modifies them substantially and re-executes the pipeline. The student's task is active editing and consequence tracing.
Stage 3 (parallel production): student and LLM produce artifacts independently and compare. The student's task is independent generation followed by reconciliation.
Stage 4 (independent production): the student produces artifacts unaided; the LLM serves as evaluator only. The student's task is unscaffolded reasoning.

This progression mirrors classical scaffolding-fade curricula in mathematics and writing instruction. It is empirically testable whether students taught this progression develop more robust abstract-reasoning skill than students who use LLM scaffolding without a fading regime. The chapter does not claim the answer is known; it claims the question can now be asked.
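
The fading regime can be written down as a small lookup table, which makes the division of labor at each stage explicit. The names below are assumptions for illustration. The text does not specify criteria for moving a student from one stage to the next, so none are encoded.

```python
from enum import IntEnum


class Stage(IntEnum):
    INTRODUCTION = 1
    ASSISTED_PRODUCTION = 2
    PARALLEL_PRODUCTION = 3
    INDEPENDENT_PRODUCTION = 4


# Who drafts artifacts at each stage, and what the student is asked to do.
ROLES = {
    Stage.INTRODUCTION: {"drafts": {"llm"}, "student_task": "verify and ask questions"},
    Stage.ASSISTED_PRODUCTION: {"drafts": {"llm"}, "student_task": "modify and re-execute"},
    Stage.PARALLEL_PRODUCTION: {"drafts": {"llm", "student"}, "student_task": "generate, then reconcile"},
    Stage.INDEPENDENT_PRODUCTION: {"drafts": {"student"}, "student_task": "reason unaided"},
}


def llm_may_draft(stage: Stage) -> bool:
    """True while the model still produces artifacts; at stage 4 it only evaluates."""
    return "llm" in ROLES[stage]["drafts"]


def student_drafts(stage: Stage) -> bool:
    """True once the student generates artifacts independently (stages 3 and 4)."""
    return "student" in ROLES[stage]["drafts"]
```

A comparison study of the kind described above would hold the task constant and vary only whether `llm_may_draft` ever becomes false for a given student.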

Alternative modes and limitations¶

The pipeline-based curriculum primarily exercises the propositional mode of abstract reasoning. Several other modes are real, well documented in the cognitive-science literature, and not directly addressed by the pipeline. A complete pedagogy of abstract reasoning would address them. This section sketches what they are and how they might be incorporated.

Visual / diagrammatic abstraction

Many structural patterns are perceived more efficiently in visual or diagrammatic form than in propositional form.^[57]^[58] For curricular purposes this means that several pipeline steps — particularly the structural-externalization steps (5 and 6) — benefit from explicit diagrammatic representation rather than verbal description alone. A student building a context-specific model should typically draw a graph: nodes for patterns and salient entities, labeled edges for relationships. The graph is the artifact; verbal description is caption rather than substance. Curriculum materials should include both propositional templates and visual templates appropriate to different problem classes.^[59]

Embodied / metaphorical abstraction

Lakoff and Johnson's work on conceptual metaphor argues that abstract thought is largely structured by metaphorical projection from sensorimotor experience.^[13]^[14] Barsalou's grounded-cognition framework develops the empirical case further.^[15] These frameworks are themselves contested within cognitive science — the strong embodiment hypothesis has been challenged, and the moderate version is closer to consensus than the strong version^[60] — but the moderate version is sufficient to motivate pedagogical attention to the embodied substrate. A teacher who introduces feedback through embodied analogues (the heating up of a room, the trembling of a hand learning a new motor skill) gives the student more cognitive purchase on the structural pattern than a teacher who introduces it through propositional definition alone. The curriculum should treat embodied metaphor as a deliberate pedagogical resource.

Intuitive / pattern-recognition abstraction

The chess-master, the experienced clinician, and the senior firefighter all share a cognitive profile that the explicit pipeline does not capture: they perceive structural patterns rapidly, with little or no externalization, on the basis of large prior practice.^[16]^[45] For curricular purposes, intuitive abstract reasoning is the target endpoint, not the starting point. A student who has been through the explicit pipeline many times develops over time the capacity to recognize structural patterns without externalizing them. The pipeline is scaffolding; intuition is the post-scaffolding state. This is consistent with how chess training, mathematical training, and clinical training all operate.

What the pipeline cannot do

It is worth being explicit about what the pipeline-based curriculum cannot accomplish, even when fully successful.

It cannot teach domain knowledge. A student who has learned to identify patterns and run the pipeline cannot, by virtue of that training alone, evaluate whether a candidate intervention is medically safe, economically viable, ethically acceptable, or historically accurate. The pipeline produces structurally licensed candidate inferences; whether those inferences are correct depends on domain facts the pipeline does not supply. The curriculum is a complement to substantive domain instruction, not a substitute.

It cannot teach moral judgment. The evaluation step (9) checks structural fit and failure-mode conditions. It does not, and cannot, evaluate whether a proposed intervention is ethical. A student who has learned to apply containment archetypes across domains may apply them in ways that are structurally sound but morally objectionable. The curriculum must be embedded within ethical instruction.

It cannot replace creativity in the strong sense. The pipeline operates over a catalog of known patterns. It may produce novel applications of those patterns to new domains, which is itself a form of creativity, but it does not generate fundamentally new patterns. A student capable of inventing a new pattern has gone beyond the pipeline into the territory of catalog construction itself.

These limitations are not defects to be remedied by curriculum revision. They are honest scope conditions on what explicit abstract-reasoning instruction can accomplish. The curriculum, if it works at all, will be a contribution to the educational toolkit, not a complete educational theory.

Implementation: what a pilot might look like¶

Translating the substrate and the pipeline into deployed curricular material requires decisions about age progression, unit structure, exercise design, assessment, integration with existing curricula, and teacher preparation. Given the lineage of similar proposals and the transfer-literature caveats, a sober implementation strategy begins with small-scale pilot deployment rather than broad rollout. This section sketches the design space for such a pilot; the design choices are open to refinement.

Age and developmental progression

The most defensible introduction sequence aligns the Encyclopedia's categorization with developmental cognitive stages. In late primary school (approximately ages 9–11), students can begin working with elementary structural patterns — cause and effect, balance, pattern, containment, scale, symmetry. These patterns are perceptually grounded and susceptible to embodied demonstration. The pedagogical mode at this stage is recognition with visual/embodied scaffolding; the full pipeline is not yet appropriate.

In middle school (approximately ages 11–14), students can extend the catalog into relational and dynamic patterns. The encoding step of the pipeline can be introduced as a procedure; context-specific modeling can be introduced for simple situations. In secondary school (approximately ages 14–18), the full pipeline becomes appropriate, and solution archetypes can enter at the late stages of secondary instruction. In tertiary and adult education, the full archetype catalog and advanced patterns become accessible, and the curriculum shades toward research methodology in which students contribute to the catalog by identifying candidate patterns the existing catalog does not name.

These stage assignments are heuristics rather than prescriptions, and the developmental literature has substantially complicated Piaget's strong stage theory in recent decades.^[42]^[43] Individual variation is large, and pilot deployment will produce empirical data on which the conservative defaults can be refined.

Unit structure

A reasonable unit structure for a single pattern, at the secondary level, looks roughly as follows: Encounter (paradigm example, typically embodied or perceptually concrete); Definition and structural signature (the v1 baseline read; the structural signature articulated); Cross-domain expansion (multiple examples from different domains); Near-misses and not-this (examples that look similar but instantiate different patterns); Pipeline practice (small situations worked through the pipeline); Transfer assessment (novel situations from domains not covered in the unit).

This structure can be applied at different scales (a class period for elementary patterns, a multi-week unit for complex ones) and integrated with substantive content from whatever subject the student is studying. A bottleneck unit can be taught through queue networks in math, traffic flow in geography, supply-chain examples in economics, and cellular metabolism in biology. The structural pattern is the same; only the domain content varies.

Exercise design

Curriculum exercises should reflect the cognitive-science findings on transfer reviewed earlier: multiple-source comparison; explicit schema induction; deep processing during initial learning; spaced practice with retrieval; preparation for future learning. Worked examples should precede independent practice for novices, with fading as expertise develops.^[61]^[62] Productive-failure tasks should be incorporated as students develop sufficient prior knowledge to benefit from initial struggle.^[63]

These design principles are not abstract; they are operationally specific and well validated. A curriculum that ignores them loses much of what cognitive science has learned about transferable instruction. A curriculum that follows them is not thereby guaranteed to produce transfer, because the transfer problem remains hard, but it pursues transfer under conditions the literature identifies as more favorable.

Assessment

Assessment is among the hardest curricular problems abstract-reasoning instruction faces. The standardized assessments most often used in educational settings measure recognition and recall, not transfer. The construct the curriculum targets — the ability to perceive structural patterns in novel situations and apply them productively — is intrinsically resistant to multiple-choice assessment.

Several assessment strategies bear investigation: novel-situation classification (present situations the student has not encountered and ask which patterns are operative); transfer construction (present a structural pattern and a target domain, ask the student to construct a candidate situation in that domain); pipeline-as-portfolio (collect the artifacts of full pipeline applications); long-form analytical essay (analyze a substantive real-world situation using the framework). Each has trade-offs. Inter-rater reliability is the hardest sub-problem, and rubric development is where assessment research must concentrate. The Encyclopedia's structural-signature sections provide a starting point for rubrics: an identification of a pattern is correct to the extent that the named elements of the structural signature are present in the situation.
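
The rubric idea in the last sentence can be made concrete. The sketch below scores an identification as the fraction of signature elements a rater finds in the situation, and computes Cohen's kappa, a standard chance-corrected agreement statistic, for two raters. The function names and the example signature for feedback are assumptions for illustration, not the Encyclopedia's actual structural-signature text.

```python
from collections import Counter
from typing import Iterable, Sequence


def signature_score(signature: Iterable[str], found: Iterable[str]) -> float:
    """Fraction of the pattern's signature elements that a rater found in the situation."""
    required = set(signature)
    if not required:
        raise ValueError("signature must name at least one element")
    return len(required & set(found)) / len(required)


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented signature elements for "feedback".
feedback_signature = ["output", "return path", "effect on input", "loop closure"]
print(signature_score(feedback_signature, ["output", "return path"]))  # 0.5

# Two raters classifying the same four student responses.
a = ["feedback", "feedback", "bottleneck", "bottleneck"]
b = ["feedback", "bottleneck", "bottleneck", "bottleneck"]
print(cohens_kappa(a, b))  # 0.5
```

A fractional score treats every signature element as equally important, which is itself a rubric decision; a validated instrument might weight elements differently.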

A particularly interesting assessment direction, given Bransford and Schwartz's preparation for future learning framework,^[36] is to measure not transfer-task performance in isolation but the rate at which a student learns a new domain when introduced to it with appropriate resources. This is a different research design from classical transfer assessment and may better capture the kind of benefit the curriculum could plausibly produce.
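
One way to operationalize "rate of learning a new domain" is the slope of a student's scores across successive sessions in that domain. The sketch below assumes evenly spaced sessions and a single numeric score per session; both are simplifying assumptions, and the function name and the scores are invented for illustration.

```python
def learning_rate(session_scores):
    """Least-squares slope of score against session index (score units per session)."""
    n = len(session_scores)
    if n < 2:
        raise ValueError("need at least two sessions")
    mean_x = (n - 1) / 2
    mean_y = sum(session_scores) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(session_scores))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Invented scores for two students who start a new domain at the same level.
curriculum_student = [40, 52, 61, 70]
comparison_student = [40, 46, 52, 58]
rate_curriculum = learning_rate(curriculum_student)
rate_comparison = learning_rate(comparison_student)
```

Under this design the quantity compared between conditions is the slope rather than the final score, so two students with identical starting points can still be distinguished by how quickly they improve.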

Integration with existing curricula

The curriculum is not best deployed as a standalone subject. Abstract reasoning gains its leverage from being applied across substantive content. Several integration models are plausible: meta-skill modules within an existing subject; cross-curricular threading where the same patterns are introduced across multiple subjects in coordination; a standalone introductory course; capstone integration projects in late-secondary or tertiary education. The substrate supports any of these. Selection is a curricular-design choice that pilot deployment can inform.

Teacher preparation

Teachers cannot teach what they do not themselves understand. A curriculum that asks teachers to introduce patterns from domains outside their training imposes a significant preparation burden. The Encyclopedia's entries are written to be self-contained and accessible, but teachers will need to develop the same pipeline fluency the curriculum aims to produce in students. This is a substantial requirement; it is also an opportunity. A teacher-preparation program that puts teachers through the curriculum themselves before they teach it produces, in effect, a self-replicating cohort of cross-domain reasoners. Such a program has its own design problems but is structurally tractable.

A near-term implication is that initial deployment should focus on contexts where teacher preparation can be supported intensively: laboratory schools, magnet programs, university-affiliated secondary schools, and similar settings with the institutional capacity to invest in teacher development. Broader deployment follows once teacher-development materials have been refined through these contexts.

What concrete-lesson-level translation requires

This chapter is pitched at the level of curriculum-design philosophy rather than classroom practice. Practical deployment requires translating the high-level structures described here into specific lesson plans, exercise sequences, assessment items, and feedback protocols. This translation is non-trivial and represents substantial pedagogical work that the chapter does not undertake. Like the implementation choices above, it is best informed by pilot deployment rather than designed in the abstract.

Open questions and a research agenda¶

The proposal is offered as a hypothesis, and several empirical questions must be addressed before it can be evaluated as a serious educational intervention.

The first and most important question is transfer. Does explicit pipeline-based instruction produce measurably better cross-domain transfer than the current osmotic approach, or than existing critical-thinking curricula? The transfer literature reviewed in section 5 gives no warrant for confident expectations. A serious test would involve matched cohorts, random assignment to instructional condition, and transfer assessment on novel situations from domains not covered in either condition. The default expectation, given the literature, should be modest effects rather than dramatic improvements; the research question is whether modest effects materialize and whether they accumulate to educationally meaningful capability over time.
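
The design elements named above (matched cohorts, random assignment, and a transfer measure summarized as an effect size) can be sketched as follows. This is an illustration under assumptions, not a study protocol: the function names, student identifiers, and scores are hypothetical, and Cohen's d with a pooled standard deviation is only one common way to express an effect's size.

```python
import random
import statistics


def assign_matched_pairs(pairs, seed=0):
    """Randomly send one member of each matched pair to each condition."""
    rng = random.Random(seed)
    pipeline, comparison = [], []
    for first, second in pairs:
        if rng.random() < 0.5:
            first, second = second, first
        pipeline.append(first)
        comparison.append(second)
    return pipeline, comparison


def cohens_d(treatment_scores, control_scores):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_t, n_c = len(treatment_scores), len(control_scores)
    var_t = statistics.variance(treatment_scores)
    var_c = statistics.variance(control_scores)
    pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5
    return (statistics.mean(treatment_scores) - statistics.mean(control_scores)) / pooled_sd


# Hypothetical students, already matched on prior attainment.
matched_pairs = [("s1", "s2"), ("s3", "s4"), ("s5", "s6")]
pipeline_group, comparison_group = assign_matched_pairs(matched_pairs, seed=42)

# Invented transfer-task scores on situations from domains neither condition covered.
pipeline_scores = [12, 14, 11, 15, 13]
comparison_scores = [11, 13, 10, 14, 12]
effect = cohens_d(pipeline_scores, comparison_scores)
```

Fixing the seed makes the assignment reproducible for audit. The invented scores carry no evidential weight; they only show the computation.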

The second question is differentiation. Does the proposal outperform existing critical-thinking curricula (Paul-Elder, Halpern, Ennis, problem-based learning approaches), and if so under what conditions? This requires direct comparison rather than internal evaluation. The literature on critical-thinking curricula has been sobering; the proposal's distinguishing features (explicit cross-domain vocabulary, pipeline procedure, AI scaffolding) are not guaranteed to produce different outcomes, and demonstration that they do would itself be a research contribution.

The third question is developmental sequencing. What is the optimal age progression for introducing different categories of pattern? The proposed sequence is defensible from the developmental-psychology literature but has not been empirically validated for this particular catalog.

The fourth question is AI-scaffolding parameters. What is the optimal fading schedule for LLM assistance? The cognitive-load and worked-example literatures suggest the rough shape but the specific parameters are not established.
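
To make concrete what "specific parameters" would mean here, the sketch below shows one hypothetical performance-contingent fading rule. Everything in it is an assumption made for illustration: the four support levels, the five-task window, and the promotion and demotion thresholds are exactly the kind of values the literature has not yet established.

```python
SUPPORT_LEVELS = ("worked example", "partial hints", "feedback only", "no assistance")


def next_support_level(current_index, recent_outcomes, window=5,
                       promote_at=0.8, demote_at=0.4):
    """Return the index into SUPPORT_LEVELS to use for the next task."""
    recent = recent_outcomes[-window:]
    if len(recent) < window:
        return current_index  # too little evidence to change support yet
    success_rate = sum(recent) / len(recent)
    if success_rate >= promote_at:
        # Student is succeeding: fade to the next, lighter level of support.
        return min(current_index + 1, len(SUPPORT_LEVELS) - 1)
    if success_rate <= demote_at:
        # Student is struggling: restore the previous, heavier level of support.
        return max(current_index - 1, 0)
    return current_index


# A student on "partial hints" who solved the last five tasks moves to "feedback only".
level = next_support_level(1, [True, True, True, True, True])
```

An empirical study of fading would vary `window`, `promote_at`, and `demote_at` across conditions and compare later unaided performance.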

The fifth question is equity and accessibility. Who benefits from the curriculum, and who does not? Differential effects on language-minority students, on students with various learning differences, on students from under-resourced school settings, and on neurodivergent students all require investigation. The risk of "abstract reasoning curriculum" becoming yet another marker of educational privilege is real and must be guarded against in deployment design.

The sixth question is AI dependency. Does heavy use of LLM scaffolding in the early phases produce students who cannot abstract-reason without LLM assistance? The cognitive-load literature predicts that fading discipline mitigates this, but the empirical question is open.

The seventh question concerns the catalog itself. The current entries are a working catalog, not a final one. Curriculum deployment will surface gaps, redundancies, and miscategorizations. The catalog should be treated as a versioned artifact subject to revision, with revision processes that preserve coherence while admitting refinement. A specific empirical sub-question: do students taught with the catalog outperform students taught with alternative substrates (Senge's archetypes; Alexander's pattern languages; the systems-thinking tradition more broadly), and if so under what conditions?

The eighth question is cross-cultural and cross-linguistic adaptability. The current catalog is constructed in English, drawing on largely English-language and Western-academic sources. Whether the catalog generalizes across languages and cultural contexts, and what changes are needed for other contexts, is unknown. The underlying claim — that structural patterns recur across domains — is plausibly culture-independent, but the specific catalog and naming may not be.

The ninth question concerns the primeness claim itself. By what criterion are the catalog's entries actually primes, and is the criterion defensible? This is a meta-question about the catalog's foundations that the catalog does not currently answer formally. Empirical and theoretical work on the criterion would strengthen or weaken the catalog's claims to atomicity and completeness.

These nine questions do not exhaust the research agenda. They identify the central loci where the proposal must be tested. The infrastructure to test them — the corpus, the pipeline, and increasingly capable AI tooling — exists now in a way it did not before. The work of testing them is the next phase, beyond this chapter's scope.

Conclusion: an idea worth exploring¶

The argument of this chapter has been that direct instruction in abstract reasoning across domains, long treated as a tacit byproduct of education and historically attempted with limited success, may now be worth attempting again under conditions that are different from those of prior attempts. Three components are different. The Encyclopedia of Abstractions provides a candidate vocabulary at unusual scale, with explicit structural signatures and cross-domain examples. The augmented-abstract-reasoning pipeline provides an explicit operational procedure that externalizes the otherwise tacit cognitive operations of cross-domain transfer. Capable language models provide scaffolding that supports student learning of operations the student cannot yet perform unaided.

None of this is novel in its parts. Each component sits in a long lineage of similar attempts (Leibniz, Carnap, Adler, Senge, Alexander), and the lineage is sobering: programmatic ambitions to systematize abstract thought have consistently delivered narrower successes than their authors envisioned. The transfer literature is also sobering: explicit instruction in general thinking skills has a mixed empirical track record, and the default expectation for any new proposal in this space should be modest effects rather than transformation.

What the proposal contributes, if anything, is the integration of components at scale, deployed under conditions the cognitive-science literature on transfer identifies as favorable: multiple-source comparison, explicit schema induction, structural encoding, spaced practice, scaffolding fade. Whether the integration produces measurably better outcomes than prior approaches is an empirical question that no one has yet answered. The chapter's claim is that the question is now askable, with infrastructure that did not previously exist, and that asking it carefully is worthwhile.

Several broader observations emerge from the analysis that bear on ongoing debates beyond curriculum design.

The first concerns what AI is for in education. Much of the public conversation about LLMs in classrooms has focused on LLMs as cheaters or replacements for student thinking. The pipeline approach proposes a different framing: the LLM as scaffold for cognitive operations the student cannot yet perform unaided, with explicit fading as the student develops capability. This framing is closer to the role of a seasoned tutor than to that of a substitute author, and it is the framing that the cognitive-load and worked-example literatures support. Whether it survives empirical contact with classroom realities is a separate question.

The second concerns the structure of expert thinking. The pipeline makes explicit operations that experts perform implicitly, often unconsciously. The explicitization is itself a contribution: a vocabulary and procedure for talking about cross-domain expert thinking. Whether the explicit procedure is the only or best way to develop the capacity remains genuinely open, and Polanyi's tacit-knowledge argument suggests caution about how much the explicitization actually captures.

The third concerns the relationship between human and machine reasoning. The curriculum treats students and language models as collaborators with distinct strengths and complementary roles. The student supplies domain context, judgment, and evaluative discipline; the model supplies cross-domain pattern surfacing and structural-externalization support. This is the cognitive-partnership image of the human-AI relationship, present in the extended-cognition literatures.^[64]^[65] Whether it is the right image generally is a substantial open question; it is one image among several that the empirical work of the next decade will sort among.

The proposal is offered, in its honest form, as an idea worth exploring rather than as a settled approach. The substrate is real, the architecture is workable, and the empirical infrastructure to test the proposal exists. Whether the test produces the outcomes the chapter argues are plausible is for empirical research to determine. The chapter's contribution is to articulate the proposal carefully enough that the empirical research can begin, with appropriate acknowledgment of the historical lineage in which it sits and the empirical literature on transfer that constrains what the proposal can reasonably claim.

Whatever the empirical research determines, the Encyclopedia itself is plausibly worthwhile as shared vocabulary and reference material — that minimal contribution does not depend on the curriculum proposal succeeding. The pipeline architecture is plausibly worthwhile as a tool for individual cross-domain reasoning, regardless of its educational deployment. The combination might be more than the sum of its parts, or it might not. The chapter's argument is that finding out is worth the work.

References¶

[1] Hofstadter, D. R. (2001). Analogy as the core of cognition. In D. Gentner, K. J. Holyoak, & B. N. Kokinov (Eds.), The Analogical Mind: Perspectives from Cognitive Science (pp. 499–538). MIT Press.

[2] Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7(2), 155–170.

[3] Holyoak, K. J., & Thagard, P. (1995). Mental Leaps: Analogy in Creative Thought. MIT Press.

[4] Gick, M. L., & Holyoak, K. J. (1980). Analogical problem solving. Cognitive Psychology, 12(3), 306–355.

[5] Gick, M. L., & Holyoak, K. J. (1983). Schema induction and analogical transfer. Cognitive Psychology, 15(1), 1–38.

[6] Gentner, D., Loewenstein, J., & Thompson, L. (2003). Learning and transfer: A general role for analogical encoding. Journal of Educational Psychology, 95(2), 393–408.

[7] Detterman, D. K. (1993). The case for the prosecution: Transfer as an epiphenomenon. In D. K. Detterman & R. J. Sternberg (Eds.), Transfer on Trial: Intelligence, Cognition, and Instruction (pp. 1–24). Ablex.

[8] Perkins, D. N., & Salomon, G. (1992). Transfer of learning. In International Encyclopedia of Education (2nd ed., Vol. 11, pp. 6452–6457). Pergamon.

[9] Willingham, D. T. (2007). Critical thinking: Why is it so hard to teach? American Educator, 31(2), 8–19.

[10] Gentner, D. (1989). The mechanisms of analogical learning. In S. Vosniadou & A. Ortony (Eds.), Similarity and Analogical Reasoning (pp. 199–241). Cambridge University Press.

[11] Gentner, D., & Markman, A. B. (1997). Structure mapping in analogy and similarity. American Psychologist, 52(1), 45–56.

[12] Hummel, J. E., & Holyoak, K. J. (1997). Distributed representations of structure: A theory of analogical access and mapping. Psychological Review, 104(3), 427–466.

[13] Lakoff, G., & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press.

[14] Lakoff, G., & Johnson, M. (1999). Philosophy in the Flesh: The Embodied Mind and Its Challenge to Western Thought. Basic Books.

[15] Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59, 617–645.

[16] Klein, G. A. (1998). Sources of Power: How People Make Decisions. MIT Press.

[17] Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.

[18] Evans, J. S. B. T., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8(3), 223–241.

[19] Polanyi, M. (1966). The Tacit Dimension. Doubleday.

[20] Leibniz, G. W. (1666). Dissertatio de arte combinatoria. Leipzig.

[21] Eco, U. (1995). The Search for the Perfect Language. Blackwell.

[22] Carnap, R. (1928). Der logische Aufbau der Welt. Weltkreis-Verlag.

[23] Neurath, O., Carnap, R., & Morris, C. W. (Eds.). (1938). International Encyclopedia of Unified Science. University of Chicago Press.

[24] Adler, M. J. (1974). Propaedia: Outline of knowledge. In The New Encyclopædia Britannica (15th ed.). Encyclopædia Britannica.

[25] Senge, P. M. (1990). The Fifth Discipline: The Art and Practice of the Learning Organization. Doubleday.

[26] Alexander, C., Ishikawa, S., Silverstein, M., Jacobson, M., Fiksdahl-King, I., & Angel, S. (1977). A Pattern Language: Towns, Buildings, Construction. Oxford University Press.

[27] Gamma, E., Helm, R., Johnson, R., & Vlissides, J. (1994). Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley.

[28] Thorndike, E. L. (1903). Educational Psychology. Lemcke & Buechner.

[29] Glaser, R. (1984). Education and thinking: The role of knowledge. American Psychologist, 39(2), 93–104.

[30] Chi, M. T. H., Glaser, R., & Rees, E. (1982). Expertise in problem solving. In R. J. Sternberg (Ed.), Advances in the Psychology of Human Intelligence (Vol. 1, pp. 7–75). Erlbaum.

[31] Salomon, G., & Perkins, D. N. (1989). Rocky roads to transfer: Rethinking mechanisms of a neglected phenomenon. Educational Psychologist, 24(2), 113–142.

[32] Halpern, D. F. (2014). Thought and Knowledge: An Introduction to Critical Thinking (5th ed.). Psychology Press.

[33] Bransford, J. D., & McCarrell, N. S. (1974). A sketch of a cognitive approach to comprehension. In W. B. Weimer & D. S. Palermo (Eds.), Cognition and the Symbolic Processes (pp. 189–229). Erlbaum.

[34] Cepeda, N. J., Vul, E., Rohrer, D., Wixted, J. T., & Pashler, H. (2008). Spacing effects in learning: A temporal ridgeline of optimal retention. Psychological Science, 19(11), 1095–1102.

[35] Roediger, H. L., & Butler, A. C. (2011). The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences, 15(1), 20–27.

[36] Bransford, J. D., & Schwartz, D. L. (1999). Rethinking transfer: A simple proposal with multiple implications. Review of Research in Education, 24, 61–100.

[37] Paul, R., & Elder, L. (2002). Critical Thinking: Tools for Taking Charge of Your Professional and Personal Life. Financial Times Prentice Hall.

[38] Ennis, R. H. (1987). A taxonomy of critical thinking dispositions and abilities. In J. B. Baron & R. J. Sternberg (Eds.), Teaching Thinking Skills: Theory and Practice (pp. 9–26). Freeman.

[39] Barrows, H. S., & Tamblyn, R. M. (1980). Problem-Based Learning: An Approach to Medical Education. Springer.

[40] Bruner, J. S. (1960). The Process of Education. Harvard University Press.

[41] Harden, R. M., & Stamper, N. (1999). What is a spiral curriculum? Medical Teacher, 21(2), 141–143.

[42] Gopnik, A., & Meltzoff, A. N. (1997). Words, Thoughts, and Theories. MIT Press.

[43] Carey, S. (2009). The Origin of Concepts. Oxford University Press.

[44] Krupinski, E. A. (2010). Current perspectives in medical image perception. Attention, Perception, & Psychophysics, 72(5), 1205–1217.

[45] Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4(1), 55–81.

[46] Polya, G. (1945). How to Solve It: A New Aspect of Mathematical Method. Princeton University Press.

[47] Pirolli, P., & Card, S. (1999). Information foraging. Psychological Review, 106(4), 643–675.

[48] Stanovich, K. E. (2011). Rationality and the Reflective Mind. Oxford University Press.

[49] Vygotsky, L. S. (1978). Mind in Society: The Development of Higher Psychological Processes. Harvard University Press.

[50] Pea, R. D. (2004). The social and technological dimensions of scaffolding and related theoretical concepts for learning, education, and human activity. The Journal of the Learning Sciences, 13(3), 423–451.

[51] Van de Pol, J., Volman, M., & Beishuizen, J. (2010). Scaffolding in teacher–student interaction: A decade of research. Educational Psychology Review, 22(3), 271–296.

[52] Turpin, M., Michael, J., Perez, E., & Bowman, S. R. (2023). Language models don't always say what they think: Unfaithful explanations in chain-of-thought prompting. In Advances in Neural Information Processing Systems 36.

[53] Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84(3), 231–259.

[54] Bansal, G., Nushi, B., Kamar, E., Horvitz, E., & Weld, D. S. (2021). Is the most accurate AI the best teammate? Optimizing AI for teamwork. In Proceedings of the AAAI Conference on Artificial Intelligence, 35(13), 11405–11414.

[55] Kalyuga, S., Ayres, P., Chandler, P., & Sweller, J. (2003). The expertise reversal effect. Educational Psychologist, 38(1), 23–31.

[56] Van Merriënboer, J. J. G., Kirschner, P. A., & Kester, L. (2003). Taking the load off a learner's mind: Instructional design for complex learning. Educational Psychologist, 38(1), 5–13.

[57] Larkin, J. H., & Simon, H. A. (1987). Why a diagram is (sometimes) worth ten thousand words. Cognitive Science, 11(1), 65–100.

[58] Tversky, B. (2011). Visualizing thought. Topics in Cognitive Science, 3(3), 499–535.

[59] Ainsworth, S. (2006). DeFT: A conceptual framework for considering learning with multiple representations. Learning and Instruction, 16(3), 183–198.

[60] Mahon, B. Z., & Caramazza, A. (2008). A critical look at the embodied cognition hypothesis and a new proposal for grounding conceptual content. Journal of Physiology – Paris, 102(1–3), 59–70.

[61] Sweller, J., van Merriënboer, J. J. G., & Paas, F. G. W. C. (1998). Cognitive architecture and instructional design. Educational Psychology Review, 10(3), 251–296.

[62] Renkl, A. (2014). Toward an instructionally oriented theory of example-based learning. Cognitive Science, 38(1), 1–37.

[63] Kapur, M. (2008). Productive failure. Cognition and Instruction, 26(3), 379–424.

[64] Clark, A., & Chalmers, D. J. (1998). The extended mind. Analysis, 58(1), 7–19.

[65] Hutchins, E. (1995). Cognition in the Wild. MIT Press.

[66] Kahneman, D., & Klein, G. (2009). Conditions for intuitive expertise: A failure to disagree. American Psychologist, 64(6), 515–526.

[67] Richland, L. E., Zur, O., & Holyoak, K. J. (2007). Cognitive supports for analogies in the mathematics classroom. Science, 316(5828), 1128–1129.

[68] Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285.