Equivalence-Preserving Rewriting¶
Core Idea¶
Equivalence-preserving rewriting is the structural move of transforming an expression, specification, derivation, or document into a behaviourally equivalent but operationally different form under an explicit equivalence relation, then selecting among the candidate rewrites by a cost criterion that the equivalence relation itself does not adjudicate. The pattern has three load-bearing parts: an object with a specified meaning (a query, a program fragment, a proof step, an algebraic term, a sentence, a clause); an equivalence relation that names which transformations are allowed because they preserve that meaning (relational-algebra equivalence, operational equivalence, logical equivalence, meaning-preserving paraphrase, unchanged-legal-effect rewording); and a cost criterion orthogonal to the equivalence (estimated execution time, instruction count, proof length, expression depth, readability, ambiguity reduction) under which one of the many admissible rewrites is chosen.
What makes the move structurally distinct from optimisation, simplification, or rewording-in-general is that the equivalence and the cost are factored: the equivalence relation defines a space of safe moves, and the cost criterion picks one point in that space. The discipline lives in keeping the two separate. Slip in a transformation outside the equivalence, and the rewrite no longer means what the original meant; slip in cost-driven shortcuts that the equivalence happens to permit but the user did not intend to permit, and the system silently changes behaviour the user expected to hold. The factoring also exposes a recurring failure mode keyed to the equivalence's grain: an equivalence relation that is too coarse admits rewrites that change behaviour users care about, and one that is too fine excludes rewrites that would be safe. The canonical instance is the compiler optimisation that is "correct" under the language-standard equivalence but "wrong" under the programmer's intuitive model — the equivalence relation is doing work the user did not notice. The disciplined remedy is to write the equivalence down, because an implicit equivalence drifts under maintenance and accumulates "obvious" rewrites it never actually permitted.
How would you explain it like I'm…
Same Thing, Easier Way
Swap Without Changing Meaning
Meaning-Safe Rewrites
Structural Signature¶
the object with a specified meaning — the equivalence relation defining the space of safe moves — the cost criterion orthogonal to the equivalence — the admissible-rewrite space the equivalence carves out — the within-space selection the cost performs — the grain of the equivalence (too coarse admits surprises, too fine forbids safe moves)
A configuration exhibits equivalence-preserving rewriting when each of the following holds:
- An object with specified meaning. Something carries a meaning to be held invariant: a query, a program fragment, a proof step, an algebraic term, a sentence, a clause, a circuit.
- An equivalence relation. An explicit relation names which transformations are admissible because they preserve that meaning — relational-algebra equivalence, operational equivalence, logical equivalence, meaning-preserving paraphrase, unchanged-legal-effect rewording.
- A cost criterion. A separate measure, orthogonal to the equivalence and not adjudicated by it, ranks admissible forms: estimated execution time, instruction count, proof length, readability, gate count.
- An admissible-rewrite space. The equivalence relation carves out a space of safe moves — the set of forms behaviourally equivalent to the original — within which any choice preserves meaning.
- A within-space selection. The cost criterion picks one point in that space, factoring the move into safe-space construction (the equivalence) and within-space selection (the cost), which must be kept separate.
- A grain. The equivalence has a grain that is the locus of failure: too coarse, it admits rewrites that change behaviour users care about ("correct by the spec, wrong by the user"); too fine, it excludes rewrites that would be safe. An implicit equivalence drifts under maintenance.
The components compose so that the equivalence relation is the primary object of scrutiny — the cost is where visible work happens, but correctness lives in the equivalence — and the disciplined architecture is generate-verify-select with a swappable cost model over a declared, written-down equivalence.
What It Is Not¶
- Not
transformationin general. A transformation may change meaning freely; this prime is the meaning-preserving subclass, constrained by an explicit equivalence relation and a separate cost criterion. The constraint is the whole point. - Not
relationas such.relationis the bare abstraction of a relationship between elements; this prime uses one specific relation (an equivalence) as a tool to carve a safe-move space, plus an orthogonal cost. It is an application, not the abstraction. - Not
refinement. Refinement moves toward a more specified or improved object, changing what it commits to; equivalence-preserving rewriting holds meaning fixed and changes only operational form. One narrows; the other relabels within an equivalence class. - Not
optimizationgenerically. Optimization is search for a best point under a cost; this prime factors that search into a safe-space construction (the equivalence) and a within-space selection (the cost), which generic optimization fuses. - Not
canonical_form. A canonical form is a single distinguished representative of an equivalence class; this prime is the cost-driven navigation among many admissible forms, which need not converge on any canonical one. - Common misclassification. Calling a "cleanup," "simplification," or "rewording" equivalence-preserving without stating the relation. If the preservation invariant is unwritten, the cost work runs over an undefined safe space; catch it by demanding "preserved under what relation?" before any rewrite is trusted.
Broad Use¶
- Database query optimisation: relational-algebra rewrites — selection push-down, join reordering, predicate rewriting — transform a declarative query into operationally different but result-equivalent plans, and the optimiser picks the lowest estimated cost.
- Compiler optimisation: loop unrolling, dead-code elimination, common-subexpression elimination, instruction selection, peephole rewrites — each a program-equivalence-preserving transformation scored by an execution-cost model.
- Theorem proving: normal-form rewriting in term-rewriting systems, rewriting a proof for shorter or more standard form, tactic-based proof restructuring under preserved logical content.
- Algebra and symbolic computation: factoring, expanding, simplifying, rewriting integrals into standard form — symbolic-equivalence rewrites scored by length, depth, or standard-form-ness.
- Natural-language paraphrase: meaning-preserving rewordings scored by clarity, register, or length; controlled-language editing for technical writing.
- Legal drafting: restating a clause to clarify, consolidate cross-references, or remove ambiguity without changing its legal effect; consolidation acts and uniform codes are large-scale instances.
- Hardware synthesis: equivalent-circuit rewrites under logical equivalence, scored by gate count, depth, or power.
Clarity¶
Naming the move forces two questions that ordinary language conflates: under what equivalence relation are these rewrites admissible? and what cost are we minimising over the admissible set? A "simplification" that does not state its equivalence relation is undisciplined; a "rewording" that does not state what is preserved is a covert content change. The clarifying separation is between the safe-space construction (the equivalence) and the within-space selection (the cost), which ordinary usage of words like "optimise," "clean up," or "restate" fuses and thereby hides. The factoring also surfaces the grain problem explicitly: an equivalence that is too coarse admits behaviour-changing rewrites, one that is too fine forbids safe ones, and the dangerous case is the equivalence quietly doing work the user did not authorise — "correct by the spec, wrong by the user." Once the move is named, every claimed equivalence-preserving transformation invites the audit "what exactly is preserved, and by what relation?", which converts a vague appeal to having kept the meaning into a checkable commitment.
Manages Complexity¶
The pattern manages complexity by separating the safe-space construction from the cost minimisation. Generating one good rewrite is hard; checking that a rewrite preserves meaning is often much easier and is mechanisable; choosing among many checked-safe rewrites by cost is a separate, equally mechanisable problem. This factoring is what makes industrial query optimisers, production compilers, and proof assistants tractable: the equivalence checker is a verifier, the cost-driven search is a planner, and the two halves can be improved independently. A second compression follows: the equivalence relation, once made explicit, is reusable across many cost criteria — the same relational-algebra equivalences serve cost models for spinning disk, SSD, distributed shuffle, and in-memory engines, and the same logical-equivalence laws serve proof shortening, proof clarification, and prover-friendly rewriting. The compression is that a database engineer, a compiler writer, a proof engineer, and a legal drafter all maintain a rewrite-rule database under a declared equivalence and a swappable cost model, so the architecture learned in one substrate transfers as a template in the next, and re-tuning a system to a new objective becomes a matter of swapping the cost model rather than rebuilding the safe space.
Abstract Reasoning¶
Recognising the pattern enables several reasoning moves. Equivalence as design choice: which equivalence relation one commits to determines what rewrites are admissible and where surprise can leak in, so choosing the wrong equivalence is the silent root cause of "correct by the spec, wrong by the user" defects. Verification asymmetry: it is often easier to check that a rewrite preserves meaning than to generate it, which is why search-then-verify architectures are common — generate candidates freely, filter by the equivalence checker, then score by cost. Cost-equivalence orthogonality: changing the cost criterion changes which rewrite wins without changing the equivalence, so a system architected around the factoring can be re-tuned by swapping cost models. Composable rewrites: chains of admissible rewrites compose under the same equivalence, which underwrites the rewrite-rule databases that optimisers, term-rewriting systems, and drafting style guides all maintain. Equivalence drift: when the equivalence is implicit it drifts under maintenance, accepting "obvious" rewrites it did not actually permit, so the discipline is to write it down and check rewrites against it. The non-obvious move the prime licenses is to treat the equivalence relation as the primary object of scrutiny — the cost criterion is where the visible work happens, but the equivalence is where correctness lives, and most failures trace to an equivalence that was too coarse, too fine, or never written down.
Knowledge Transfer¶
The factoring transfers across substrates that share no surface vocabulary. From compiler to legal drafting: the discipline of declaring the preservation invariant explicitly before allowing a rewrite ports directly — a consolidation project that articulates the "preserved legal effects" relation up front and screens proposed rewrites against it behaves more like a compiler optimiser than like ad-hoc redrafting. From database optimiser to natural-language paraphrase: the search-then-verify architecture — generate many candidate paraphrases, filter by a meaning-equivalence check, pick by stylistic cost — transfers to controlled-language tooling and assisted-editing pipelines. From term rewriting to product design: the idea that a normal form exists, a canonical rewrite to which all admissible rewrites should converge, underwrites style guides, code formatters, and design-system standardisation. The role mappings transfer directly — object ↔ query / program / proof / term / sentence / clause / circuit; equivalence relation ↔ relational-algebra equivalence / operational equivalence / logical equivalence / paraphrase / preserved legal effect / logic equivalence; cost criterion ↔ estimated I/O / instruction count / proof length / readability / clarity / gate count; candidate generator and checker ↔ rewrite-rule database plus verifier. The transferred and non-obvious lesson is that "preservation" and "preference" are two separable disciplines, and conflating them is the recurring source of defects: the legal team that optimises a statute for clarity without first writing down what legal effects must be preserved is making the same error as the compiler that applies a standard-legal but intuition-violating optimisation. A practitioner who has internalised the factoring in one substrate carries into every other the same two questions — what relation makes these rewrites safe, and what orthogonal cost picks among them — and the same architecture, generate-verify-select with a swappable cost model over a declared equivalence, which is exactly what makes the move transfer intact from query plans to statutory consolidation.
Examples¶
Formal/abstract¶
The database query optimiser is the prime's defining formal instance, where the factoring of equivalence from cost is fully mechanised. The object is a declarative SQL query with a specified result-set meaning. The equivalence relation is relational-algebra equivalence: the set of rewrites — selection push-down, join reordering, predicate simplification, subquery flattening — that provably preserve the result set under set semantics. This relation carves out the admissible-rewrite space, the family of physical plans all guaranteed to return the same rows. The cost criterion is orthogonal to that relation: estimated I/O, CPU, and memory under a cost model parameterised by table statistics, index availability, and the storage engine. The within-space selection is the optimiser's search choosing the lowest-estimated-cost plan. The architecture is exactly generate-verify-select: rewrite rules generate candidates, each provably inside the equivalence so correctness is never at issue, and the cost model ranks them. The prime's grain warning is concrete here: if the equivalence relation is stated for set semantics but the query expects ordered or duplicate-preserving (bag) semantics, a "correct" reorder can change behaviour the user relied on — the equivalence was too coarse. And the reusability point is sharp: the same relational-algebra equivalences serve completely different cost models — spinning disk, SSD, distributed shuffle, in-memory — so re-tuning to new hardware is swapping the cost model over a fixed, declared equivalence, not rebuilding the safe space. Mapped back: the SQL query is the object, relational-algebra equivalence is the declared relation defining safe moves, estimated cost is the orthogonal criterion, and the optimiser is the generate-verify-select architecture — with grain failures tracing to a set-versus-bag mismatch in the equivalence, never to the cost model.
Applied/industry¶
Two applied instances share the factoring across substrates with no shared vocabulary. First, legal drafting and statutory consolidation: the object is a clause or statute with a specified legal effect. The equivalence relation is "unchanged legal effect" — the rewrites (renaming a defined term, promoting a recital into an operative clause, splitting a tangled section, consolidating cross-references) that a drafter asserts preserve the obligations, rights, and liabilities. The cost criterion is orthogonal: readability, brevity, ambiguity reduction. The prime's central discipline — write the equivalence down — is the load-bearing move: a consolidation project that first articulates the "preserved legal effects" relation and screens each proposed rewrite against it behaves like a compiler optimiser, while one that "cleans up for clarity" without stating what must be preserved is making the classic error — optimising the cost while silently slipping outside the equivalence and changing the law. Second, natural-language paraphrase tooling: the object is a sentence with a meaning to hold invariant; the equivalence relation is meaning-preserving paraphrase; the cost criterion is clarity, register, or length. The portable architecture is search-then-verify: generate many candidate rewordings freely, filter by a meaning-equivalence check, then score by stylistic cost — the same shape as the query optimiser, imported into controlled-language and assisted-editing pipelines. In both, conflating "preservation" with "preference" is the recurring defect: the legal team that optimises for clarity without writing down the preserved effects makes the same mistake as a compiler applying a standard-but-intuition-violating optimisation. Mapped back: clauses and sentences are objects; unchanged-legal-effect and meaning-preserving-paraphrase are the declared equivalences; clarity and brevity are the orthogonal costs; and the generate-verify-select architecture with a written-down equivalence transfers intact from the optimiser to statutory consolidation to paraphrase tooling.
Structural Tensions¶
T1 — Equivalence Preservation versus Cost Preference (scopal). The prime's load-bearing factoring is that the equivalence defines what is safe and the cost picks among the safe set; correctness lives in the equivalence, visible work in the cost. The tension is that ordinary language ("optimise," "clean up," "restate") fuses them. The characteristic failure mode is conflation: optimising for clarity, speed, or brevity while silently slipping outside the equivalence and changing meaning — the legal team that polishes a statute without first writing down what legal effects must be preserved. The diagnostic: for any rewrite, ask separately "what relation makes this safe?" and "what orthogonal cost picks it?"; if the preservation invariant is unstated, the cost work is operating over an undefined safe space.
T2 — Coarse Equivalence versus Fine Equivalence (grain). The equivalence has a grain: too coarse, it admits rewrites that change behaviour users care about; too fine, it forbids rewrites that would be safe. The tension is that both errors hide in the relation's definition. The failure mode is the "correct by the spec, wrong by the user" defect — a reorder valid under set semantics that breaks a query expecting ordered or bag semantics, the equivalence quietly doing work the user did not authorise. The diagnostic: probe the boundary of the equivalence against the user's actual intent — does the relation admit any transformation the user would object to (too coarse), or forbid one they would welcome (too fine)?
T3 — Implicit Equivalence versus Declared Equivalence (temporal). An equivalence can be written down or left implicit; an implicit one drifts under maintenance, accreting "obvious" rewrites it never actually permitted. The tension is between the convenience of an unstated relation and the discipline of a declared one. The failure mode is equivalence drift: over successive edits, each individually plausible, the safe space silently expands until a rewrite changes behaviour no one authorised. The diagnostic: ask whether the preservation invariant is written down and checkable; if rewrites are screened against a maintainer's intuition rather than a declared relation, the equivalence is drifting and the next "harmless" rewrite may breach it.
T4 — Generation versus Verification (coupling). The architecture factors into generating candidate rewrites and verifying they preserve meaning, and the two have asymmetric difficulty — checking is often far easier and more mechanisable than generating. The tension is between trusting a generator and independently verifying its output. The failure mode is skipping verification because the generator is "known good," so a generator bug or an out-of-equivalence heuristic ships unchecked. The diagnostic: confirm a verifier independent of the generator screens every rewrite against the declared equivalence — a generate-then-verify pipeline is robust to generator error precisely because correctness rests on the check, not on the generator's good behaviour.
T5 — Cost-Model Swap versus Equivalence Stability (coupling). Because cost is orthogonal to equivalence, a system can be re-tuned by swapping cost models over a fixed safe space — the same relational-algebra equivalences serve disk, SSD, and in-memory cost models. The tension is the temptation to "fix" a cost-driven surprise by quietly loosening the equivalence instead of the cost. The failure mode is editing the equivalence to admit a fast-but-unsafe rewrite when the real problem was the cost model, contaminating the safe space to serve a preference. The diagnostic: when a rewrite is wanted but unsafe, ask whether the fix belongs in the cost model (re-rank) or the equivalence (re-define safety) — pushing preference changes into the equivalence is how the safe space silently degrades.
T6 — Local Rewrite Safety versus Composed-Chain Safety (scalar). Individually admissible rewrites compose under the same equivalence, which underwrites rewrite-rule databases — but composition can interact with context-dependent side conditions a single rule checked only locally. The tension is between per-rewrite safety and whole-chain safety. The failure mode is assuming a chain of locally-verified rewrites is globally safe when one rewrite's precondition was invalidated by an earlier rewrite in the chain. The diagnostic: verify the composed result against the equivalence, not just each step — when rules carry context-sensitive side conditions, a sequence of individually safe moves can land outside the equivalence even though every link checked out in isolation.
Structural–Framed Character¶
Equivalence-preserving rewriting sits at the structural end of the structural–framed spectrum, with an aggregate of 0.0: it is a pure relational pattern — an equivalence relation carving a space of safe moves, plus a cost criterion orthogonal to it that selects within that space — that holds wherever an object's meaning is held invariant while its operational form is changed. The structure lives in the factoring of equivalence from cost, not in any field's vocabulary.
Every diagnostic reads structural. The pattern carries no home vocabulary that must travel: it is told as query optimisation in databases, loop unrolling in compilers, normal-form rewriting in theorem proving, simplification in symbolic algebra, meaning-preserving paraphrase in language, unchanged-legal-effect rewording in statutory drafting, and equivalent-circuit synthesis in hardware, each substrate naming its own equivalence while the equivalence-plus-orthogonal-cost skeleton stays invariant. It carries no inherent approval or disapproval — a rewrite is neither good nor bad until you specify the cost; preservation and preference are both value-neutral structural roles. Its origin is formal — an equivalence relation (a reflexive, symmetric, transitive relation) and a separate cost functional over its classes — and it runs indifferently in a relational-algebra engine, a term-rewriting system, and a logic-synthesis tool, none of which requires a human institution. And invoking it RECOGNISES an equivalence class already latent in the object's meaning rather than IMPORTING a frame: the audit move is "preserved under what relation?", not an overlaid interpretation. On every diagnostic the prime reads structural, consistent with the 0.0 aggregate.
Substrate Independence¶
Equivalence-preserving rewriting is a maximally substrate-independent prime — composite 5 / 5 on the substrate-independence scale. Its domain breadth is wide: navigating an equivalence class to minimise a cost while preserving meaning recurs as relational-algebra rewrites in database query optimisation, loop unrolling and dead-code elimination in compilers, normal-form rewriting in theorem proving, factoring and simplification in symbolic computation, meaning-preserving paraphrase in natural language, clause restatement in legal drafting, and equivalent-circuit rewrites in hardware synthesis — computational, mathematical, linguistic, legal, and engineering substrates. Its structural abstraction is maximal: the signature is a clean relational object — an equivalence relation partitioning a space into classes, a set of meaning-preserving transformations, and a cost functional minimised within a class — with no domain content, so it is read straight off any "rewrite to a cheaper equivalent form" task. The transfer evidence is heavy and concrete: the role mappings (the equivalence relation, the rewrite rules, the cost model, the search over equivalents) carry intact, and a compiler engineer recognises a query optimiser and a theorem-prover's normalisation as the same equivalence-class search under a different cost criterion, with named formal instances (term-rewriting systems, cost-based query planners) in each. Maximal breadth, maximal abstraction, and documented transfer converge on a canonical 5.
- Composite substrate independence — 5 / 5
- Domain breadth — 5 / 5
- Structural abstraction — 5 / 5
- Transfer evidence — 5 / 5
Relationships to Other Primes¶
Parents (1) — more general patterns this builds on
-
Equivalence-Preserving Rewriting is a kind of Transformation
The MEANING-PRESERVING subclass of transformation: an explicit equivalence relation carves a space of safe moves, plus an orthogonal cost criterion that selects within it. The file: 'this prime is the disciplined subclass... preserved under what relation?' Generic transformation may change meaning freely; this constrains it by a named equivalence.
Path to root: Equivalence-Preserving Rewriting → Transformation
Neighborhood in Abstraction Space¶
Equivalence-Preserving Rewriting sits among the more crowded primes in the catalog (36th percentile for distinctiveness): several abstractions describe nearly the same structure, so a description that fits it will tend to fit its neighbors too — transporting it usually means disambiguating within this family rather than landing on it exactly.
Family — Logical Moves & Precondition Gating (10 primes)
Nearest neighbors
- Canonical Form — 0.73
- Inversion — 0.72
- Verifier-Prover Asymmetry — 0.72
- Contraposition — 0.71
- Complement — 0.71
Computed from structural-signature embeddings · 2026-06-14
Not to Be Confused With¶
The most important confusion to dissolve is with generic transformation. A transformation is any structured mapping of one object to another, and it is silent about whether meaning is preserved — compilation to machine code, lossy compression, and translation that shifts nuance are all transformations. Equivalence-preserving rewriting is the disciplined subclass that adds two commitments a bare transformation lacks: an explicit equivalence relation declaring exactly what must be held invariant, and an orthogonal cost criterion that selects among the meaning-preserving forms. The invariants differ at the root. A transformation's invariant is whatever its definition happens to fix (possibly nothing about meaning); this prime's invariant is behavioural equivalence under a named relation, with the cost forbidden from adjudicating safety. The practical hazard of conflation is precisely the prime's central defect: treating a meaning-changing transformation as if it were meaning-preserving because the output "looks the same" — the compiler optimisation correct under the spec but wrong under the programmer's model, the legal redraft that "cleans up" while silently altering effect. The prime exists to force the question a generic transformation never asks: preserved under what relation?
A second genuine confusion is with optimization. Both end in choosing a form that minimises a cost, so it is tempting to call equivalence-preserving rewriting "just optimisation." The distinction is the factoring. Generic optimisation searches a space for a best point under an objective, with the feasible region and the objective often entangled. This prime insists on a clean separation: the equivalence relation constructs the feasible region (the safe-move space), and the cost criterion selects within it, and the two must never leak into each other. The invariant of optimisation is "best under the objective"; the invariant here is "best among the provably-safe, where safety is defined independently of best." Why it matters: when a desired rewrite is fast but unsafe, optimisation thinking tempts one to admit it by relaxing the constraint; the prime's factoring says the fix belongs in the cost model (re-rank) or is forbidden, never in quietly loosening the equivalence to serve a preference. An optimiser that contaminates its feasible region to win on cost is exactly the failure the prime's discipline prevents.
A third confusion worth drawing is with canonical_form. Both live in the world of equivalence classes, and a reader may assume rewriting is just "putting the object in normal form." But a canonical form is a single distinguished representative — the unique normal form to which all admissible rewrites should converge — whereas equivalence-preserving rewriting is the cost-driven navigation among the many admissible forms, which in general does not converge on any one representative. The fastest query plan, the shortest proof, and the most readable clause are different points in the same equivalence class, chosen by different cost criteria; none is "the" canonical form. Canonicalisation is a special case where the cost criterion is "standardness" and the target is uniqueness; the general prime is multi-objective and multi-destination. Conflating them leads to expecting a unique answer where the cost actually selects different winners under different objectives, or to ignoring the cost dimension entirely because "there's a normal form."
For a practitioner these distinctions converge on the prime's two-question discipline. Confronted with a rewrite, do not file it under transformation (which would skip the preservation question), optimisation (which would let cost adjudicate safety), or canonical form (which would assume a unique target). Ask instead: under what explicit equivalence relation is this rewrite safe, and what orthogonal cost selects it among the safe set? The prime's value is in holding preservation and preference apart — because each neighbour, taken alone, fuses exactly the two things whose separation makes the move correct.
Solution Archetypes¶
No catalogued solution archetypes reference this prime yet.