The Verb Grammar of Abstraction Operations — a reference¶

Companion reference to The Limits of Runtime Scaffolding and to The Calculus of Abstraction (§5). This document specifies the operations the verb engine performs on abstractions, and grades each one honestly.

What this is, and the honesty it owes the reader¶

The project reframed the fixed nine-step Augmented Abstract Reasoning (AAR) pipeline as a small grammar of operations on abstractions — verbs that take abstractions (or abstraction-laden structures) as input and yield abstractions as output, plus rules for how they legally compose. This reference catalogs that grammar.

It is not a validated calculus. Exactly one verb (lift) was empirically stress-tested across the project's experiments — and the news there was largely negative (see the scorecard). The other verbs are specified and graded, not validated. We therefore evaluate every verb on three independent axes, a distinction that matters more than any single verdict:

Coherent — is the verb well-defined, with stable I/O types and legal composition behavior? (A property of the specification.)
Actionable — can a reasoner (human or model) actually execute it as a discrete, repeatable move with a checkable post-condition? We classify each verb's checkable core as mechanical (a real validator/algorithm could decide it) or soft (only a model's self-judgment).
Valuable — does executing it improve the outcome? And critically, for which use:
1. Runtime scaffolding for a frontier model — the one use we tested, and largely null.
2. Synthetic-data signal to train the skill into a model — untested.
3. Curriculum primitives to teach humans the moves — untested.

A verb can be coherent and actionable yet inert as runtime scaffolding for a strong model, while still being valuable as a teaching move or a training schema. "Does it have value?" is not answerable in the abstract — only "value for what." Keep that separation in mind throughout.

The STATE the verbs act on¶

The verbs read and write fields of a single typed object:

problem — statement, constraints, success/failure criteria.
operative_primes — [{slug, type: structural|framed|mixed, framedness, salience: load_bearing|context, rationale}].
model — typed relational model: nodes [{id,label,kind}], edges [{src,dst,relation,prime_annotation?}], prime_annotations.
meta_model — domain-stripped skeleton: roles, relations, invariants.
candidates — [{archetype, source_prime_overlap, retrieval_score}].
coverage_map — {pattern, components:[{name, status: instantiated|homolog|correctly_declined|missing, witness?, decline_reason?, gap_note?}]}.
gate_log — [{candidate, anti_signatures:[{slug,tripped,reason}], verdict}].
recommendation — the final design / decision (emitted only at stop).
history — [{step, verb, args, rationale}].

The grammar at a glance¶

The verbs compose, and several check each other — it is a small typed DSL, not just a list. Three things orient the rest of this reference:

The fixed pipeline is one sentence in the grammar. The nine-step AAR pipeline is a single legal path; the verb-engine experiments varied exactly that path (A/B/C/B′) and found the control structure didn't move outcomes at the frontier. Other orderings, subsets, and a free planner are all legal sentences.
Pairings. lift ⊣ lower (abstraction / instantiation — adjoint-like, not strict inverse); decompose ↔ frame (strip a frame / add one); salience-rank ↔ prune (generate / filter).
Verbs verify verbs, and the grammar is typed. Several verbs have another verb as their check (spelled out after the lexicon, where the names mean something), and framed/structural typing gates decompose but — importantly — not lift.

The retrieval substrate (what `match`/`transport` actually consume)¶

Because several verbs are only as good as the search beneath them, here is concretely what the catalog returned to the model on each call (the project-01 redesign; see the retrospective's Methods for detail):

search_prime(query) — semantic search over structural-signature embeddings, queried with the problem's domain-stripped meta-model (not raw text). Returns ranked primes, each with a cosine score, a distinctiveness_percentile + band (distinctive / mid / crowded), a structural|framed|mixed label, a return_unit, and — for crowded hits — a near-synonym family so the reasoner can recognize the neighborhood and disambiguate.
search_by_facets([facet, …]) — for oblique, multi-structure problems: each short domain-stripped facet phrase is retrieved separately; returns per-facet ranked lists plus a fused (reciprocal-rank) ranking. Surfaces latent patterns a single blended query buries.
get_prime / get_archetype — full records: a prime's structural signature, core idea, and "what it is not"; an archetype's essence, structural problem, component set, action-logic steps, and anti-signatures (the negative knowledge the gate consumes).
get_prime_neighborhood(slug) — the prime's explicit typed-graph neighbors (near-synonym, related-prime, source-/related-prime-of edges).
Plus find_archetypes_for_prime, find_related_primes, list_components, find_archetypes_using_component, corpus_stats.

The verbs¶

Each entry: definition · I/O · pre-condition · post-condition · checkable core (mechanical/soft) · failure modes · typing · grade.

match (retrieve)¶

Def. Anchor the problem to the catalog; return candidate primes + archetype neighborhoods.
I/O. problem.statement | meta_model → operative_primes (candidates), candidates (archetypes).
Pre. A problem statement exists. Post. ≥1 candidate with recorded source-prime overlap; oblique queries facet-decomposed.
Checkable core: mechanical (the retrieval is a reproducible function of the query; what is soft is the upstream choice of query/facets).
Failure. Retrieval miss (surface-distant pattern never surfaced); neighborhood crowding.
Grade. Coherent ✓ · Actionable ✓ (mechanical) · Value: the live one. Project-03 located the binding constraint here — pattern selection / retrieval, not downstream enforcement. The demonstrable value lives in the search engineering (semantic + facet beats lexical), not in the "verb" abstraction per se.

salience-rank¶

Def. Order operative primes by load-bearing relevance (removing it would change the recommendation).
I/O. operative_primes → same, tagged load_bearing|context + rationale. Pre. ≥1 candidate prime.
Checkable core: soft at runtime; the principled check is ablation (drop a prime, re-derive, see if the answer changes) — mechanical but expensive, and not run at scale.
Failure. De-prioritizing a load-bearing prime. Grade. Coherent ✓ · Actionable ~ (soft; ablatable) · Value untested in isolation.

prune¶

Def. Cut the ranked set to the operative subset (dual of salience-rank). I/O. ranked operative_primes → subset (~3–9), each cut justified.
Checkable core: soft. Failure. Premature pruning (cut a load-bearing prime); failure to prune (undifferentiated mush). Grade. Coherent ✓ · Actionable ~ · Value untested.

compose¶

Def. Assemble operative primes into the typed relational model. I/O. operative_primes + problem → model. Pre. ≥2 operative primes.
Post / checkable core: MECHANICAL — referential integrity: every edge references declared nodes; every operative prime appears as an annotation (a graph validator decides this).
Failure. Spurious coupling (invented edges); missing coupling (dropped load-bearing edge).
Grade. Coherent ✓ · Actionable ✓ (mechanical) · Value untested in isolation (imposing the typed structure plausibly tightened reasoning in early qualitative work, but the A/B/C ablation did not isolate it).

lift — coverage-enforcement (the one verb we stress-tested)¶

Def. Two modes sharing one artifact. L1 (lift the situation): model → meta_model (domain-stripped skeleton). L2 (lift a pattern's components onto the design): name the operative pattern's components, pose each as a candidate requirement, check whether the design instantiates each, and keep gap-closers / decline inapplicable-or-over-built ones.
I/O. L1: model → meta_model. L2: pattern + model → coverage_map.
Post / checkable core: MECHANICAL. L1: witness + neutrality (every skeleton element has a witness in model; no domain terms). L2: the coverage map — every component marked instantiated / homolog / correctly_declined / missing, with a witness or reason; may not finalize while any component is missing.
Failure. Over-lift (forcing an abstract frame the problem didn't need); mis-lift (wrong invariant); over-build (instantiating components a strong model didn't need — observed).
Typing. NOT gated on framed/structural — that gate was tested across 13 primes and refuted (conservation_laws, pure structural, got the largest lift-effect; symmetry, pure structural, went negative). The lift fires by coverage, not type.
Grade. Coherent ✓ · Actionable ✓ (mechanical) · Value: tested, and largely NULL. For a frontier model on within-reach problems it was inert; forcing it (condition C) over-built and slightly hurt; removing its scaffolding (B′) cost no quality; and in the low-coverage / weak-solver regime it still did not raise coverage, because it only verifies the components of a pattern the solver already selected. (See retrospective Investigations 2–3.) Value for training/teaching uses: untested.

lower (instantiation)¶

Def. Realize a transported pattern/role-set in the target domain. Typed mode — lower-to-homolog: on framed roles, find a substrate mechanism that discharges the same function even when the literal form is unavailable, and justify its independence.
I/O. pattern/role-set + model → concrete moves in recommendation (and entries in coverage_map).
Post / checkable core: semi-mechanical — every posed component/role maps to a named target mechanism (instantiated/homolog) or is explicitly declined.
Failure. Forced lowering; under-specification; naive role-dropping ("independent review" → the unit re-checking itself). Typing. lower-to-homolog is meaningful only when there is a frame to re-instantiate.
Grade. Coherent ✓ · Actionable ~ (the map is checkable; the quality of a homolog is soft) · Value untested in isolation; the homolog-search mode added robustness, not feasibility (Protocol 0/1) — a second-order, unproven gain.

transport (= retrieve + map)¶

Def. Carry a pattern across a domain gap: match (retrieve) + map (align the pattern's roles to the situation's entities and report where the correspondence breaks).
I/O. meta_model | operative_primes → an aligned candidate + a correspondence with break-points.
Checkable core: SPLIT. retrieve = mechanical; map has no implemented check — the calculus paper flags Gentner's Structure-Mapping Engine as the natural verifier, but we did not build it. So transport is only partially actionable.
Failure. Retrieval failure; forced mapping (aligning a superficially-similar pattern whose structure doesn't fit). Typing. Structural → direct recognitional (risk: neighborhood crowding); framed → lift-roles → map → lower-to-homolog (risk: smuggled frame on mapping — empirically near-zero).
Grade. Coherent ✓ · Actionable ~ (map unverified) · Value: the heart of the project's ambition and the hardest, least reliable operation; not isolated in our experiments.

decompose (the verb the framed/structural gate genuinely governs)¶

Def. Given a framed prime, strip the institutional/normative frame to expose a structural core, with three fates (unifies with an existing structural prime / is a new structural prime / is constitutively framed).
I/O. framed prime → {structural_core, frame} (or no-op on a structural prime).
Pre (TYPED GATE — load-bearing here, unlike lift). prime.type == framed.
Checkable core: semi-mechanical — fate-2 ("a new projectable prime") is verified by transport: does the extracted core project into ≥3 domains where the framed prime looked alien? (a runnable test).
Failure. False fate-two (a thin residue mistaken for a projectable pattern). Typing. This is the place framed/structural typing earns its keep.
Grade. Coherent ✓ · Actionable ~ (transport-verifiable) · Value: a catalog-construction / theory verb, not a problem-solving one; untested empirically in this project.

evaluate-fit (gate)¶

Def. Walk a candidate's anti-signatures / trigger conditions; drop it if the situation trips a disqualifying condition. I/O. candidate + model → gate_log entry + keep/drop.
Pre. Candidate carries anti-signatures (well-typed over archetypes; under-typed over bare primes).
Post / checkable core: MECHANICAL in principle — anti-signatures encoded as explicit conditions over the typed model (the clearest "scaffold-to-verifier" target, calculus §8.4).
Failure. Hollow gate (nodding at anti-signatures without testing them).
Grade. Coherent ✓ · Actionable ✓ (mechanizable) · Value: the most promising for quality — the AAR-vs-CoT work suggested the catalog's negative knowledge was where it added something — but the A/B/C and faithfulness runs did not isolate a quality gain, and the gate ran without changing choices. Strongest as a verifier (legibility/audit value), value-for-quality untested/weak.

reconcile¶

Def. Apply a candidate's logic to the concrete model and the meta_model separately; resolve the (characteristically different) inferences, preferring the reading consistent with both.
Checkable core: SOFT. Diagnostic only: if the two views never disagree, the meta-model is decorative.
Failure. Collapse (one view silently dominates). Grade. Coherent ✓ · Actionable ✗ (no real check) · Value untested; the diagnostic suggests it is often decorative.

Frontier / candidate verbs (named, not specified)¶

The grammar is open. The calculus paper (§5.5) lists operations that visibly belong but are not specified or graded here, offered only as a search frontier: map (structure-mapping pulled out of transport and made first-class, à la SME), generalize / specialize (move along a subsumption hierarchy), dualize (swap a pattern for its opposite-valence twin), negate / anti-pattern (construct a failure mode from a pattern), blend / merge (conceptual blending of two abstractions), factor / refactor (decompose or re-express a model), project / restrict (view a model through one prime's lens). A productive way to discover verbs is to mine each existing verb's failure mode for the verb that would have caught it.

Composition grammar: how the verbs chain and check each other¶

(The glance above previewed this; here is the detail, now that the lexicon makes the names mean something.)

The pipeline, written out. As one composition the fixed AAR pipeline is lower ∘ reconcile ∘ evaluate-fit ∘ transport ∘ lift ∘ compose ∘ prune ∘ salience-rank ∘ match — one legal path through the grammar, not the only one.
Verbs check verbs (the cross-check web). This is the part that needed the lexicon first: decompose is verified by transport (does the extracted core actually project into ≥3 alien domains?); salience-rank by ablation (does removing the prime change the recommendation?); lift by witness-checking (L1) and its coverage map (L2); compose by referential integrity. A grammar is not just a list of operations — it is which compositions are legal and which verb verifies which other.
Typing, in detail. framed/structural typing gates decompose (only a framed prime has a frame to strip — a no-op on a structural one) and shapes transport (recognitional for structural inputs, interpretive for framed); it does not gate lift — that gate was tested and refuted.

Summary scorecard¶

verb	coherent	actionable (checkable core)	value — tested?
match (retrieve)	✓	✓ mechanical	live bottleneck; value is in the search engineering, not the verb
salience-rank	✓	~ soft (ablatable)	untested in isolation
prune	✓	~ soft	untested
compose	✓	✓ mechanical (ref-integrity)	untested in isolation
lift	✓	✓ mechanical (coverage map)	tested → largely NULL at the frontier; framed/structural gate refuted
lower (+homolog)	✓	~ semi-mechanical	untested; homolog-mode adds robustness not feasibility
transport (retrieve+map)	✓	~ split (map unverified)	untested; the hardest operation
decompose	✓	~ transport-verifiable	untested; framed/structural gate genuinely applies here
evaluate-fit (gate)	✓	✓ mechanizable	most promising for quality / strongest as a verifier; not isolated
reconcile	✓	✗ soft only	untested; diagnostic suggests often decorative

What this grammar is, and is not¶

It is: a coherent, typed vocabulary of operations on abstractions, with explicit composition rules and a web of which-verb-checks-which — a small, legible DSL of abstraction-manipulation, and a set of teachable primitives. Four verbs (compose, lift, evaluate-fit, and decompose-via-transport) have genuinely mechanical checkable cores; the rest rest on soft judgment.

It is not: a validated calculus. Only lift was empirically stress-tested, and for the one use we tested — runtime scaffolding for a frontier model — its measured value was null (and its framed/structural gate was refuted). Whether this grammar earns its keep as training signal (teaching a model the moves) or as a human curriculum (teaching people to lift, transport, decompose, gate) remains open, and is — per the retrospective — the more promising place to look.