Data Integrity Preservation

Preserve the accuracy, consistency, and traceability of data or records across their lifecycle.

Essence

Data Integrity Preservation is the pattern of keeping data trustworthy while it is created, changed, copied, interpreted, migrated, archived, and restored. It does not assume that a value remains reliable merely because it is stored somewhere. Instead, it asks what must stay true for the data to support action: accuracy, consistency, attribution, freshness, provenance, and recoverability.

The archetype is useful whenever data becomes a basis for decisions, obligations, rights, safety, money, memory, or evidence. In those cases, integrity is not a decorative quality-control layer. It is the structure that lets a system tell the difference between a trusted record, a suspect record, a stale record, a disputed record, and a repaired record.

Compression statement

When decisions depend on data that can be corrupted, duplicated, desynchronized, altered, stale, or detached from its source, Data Integrity Preservation defines integrity invariants, constrains authorized mutation paths, validates lifecycle transitions, records provenance, reconciles mismatches, and maintains recovery paths so trusted data state remains usable.

Canonical formula: trusted_data_state = integrity_invariants + authorized_mutation_paths + validation + provenance + reconciliation + recovery

When to Use This Archetype

Use this archetype when decisions depend on records that can be corrupted, duplicated, desynchronized, altered without authority, detached from provenance, or restored from an uncertain state. It fits databases, ledgers, case files, research datasets, safety logs, supply-chain records, documentation systems, and any workflow where downstream users need to know whether the data can be relied on.

It is especially relevant at lifecycle boundaries: intake, import, transformation, write, approval, migration, synchronization, export, archival, restoration, and deletion. These are the moments when a record can silently lose meaning even if it still looks formally valid.

Structural Problem

The structural problem is that data must remain stable enough to guide action while also moving through imperfect people, systems, formats, permissions, transformations, and time. A record that was once accurate can become stale. A field that is valid in isolation can contradict another system. A file can be unchanged but wrong at source. A log can exist while the real change path went around it.

This produces a trust gap: the system has data, but it cannot confidently say what the data means, where it came from, whether it is current, who changed it, or how to repair it when it disagrees with another representation.

Intervention Logic

The intervention begins by naming the data or record whose trust matters and defining the integrity invariants that must be preserved. The designer then maps the record lifecycle to locate vulnerable transitions, constrains who or what may mutate protected state, places validation and evidence mechanisms at the right boundaries, and defines how discrepancies are reconciled.

The most important move is to connect controls to decisions. A checksum, audit log, schema, permission table, or backup only becomes part of Data Integrity Preservation when it protects a named integrity invariant and feeds a correction, review, or recovery path. Otherwise it is a mechanism without an archetype.

Key Components

Data Integrity Preservation keeps records trustworthy across their full lifecycle by stating what trust means, locating where it is vulnerable, and connecting controls to decisions rather than to ceremony. The work begins with an Integrity Invariant — the condition that must remain true, whether a correct balance, a valid identity relation, an attributable change history, or a nonduplicated transaction — without which teams cannot tell actual corruption from mere difference. The Record Lifecycle Boundary then locates where data is born, transformed, copied, approved, migrated, archived, restored, or retired, because integrity failures concentrate at handoffs and lifecycle transitions rather than at rest. The Validation Rule translates the invariant into checks at those boundaries — schemas, range checks, referential constraints, plausibility tests, or human review criteria.

Four components govern authority, change, and history. The Authoritative State Relation names which representation has authority for which claim, since many integrity failures are not isolated errors but conflicts between systems that each look plausible. The Authorized Mutation Path restricts legitimate creation, update, deletion, migration, and restoration to known routes, protecting against hidden side paths that alter records without evidence. The Provenance Record preserves origin and transformation history so users can understand where a value came from, how it changed, and whether its path supports the use being proposed. The Consistency Relation defines how separate records, fields, ledgers, or versions should agree — through equality, derivation, ordering, referential linkage, or a tolerated lag window.

Three final components handle discrepancy, failure, and ambiguity. The Reconciliation Rule converts mismatches into decisions by specifying when to repair, escalate, accept, quarantine, or document divergence, rather than letting each conflict be handled ad hoc. The Recovery Path defines how trusted state is restored after corruption, loss, failed migration, or unauthorized change — and crucially, recovery must be verified against the integrity invariants rather than treated as adequate just because a backup exists. The Integrity Exception Channel keeps suspicious, stale, disputed, or unreconciled records visibly flagged until review is complete, preventing unresolved anomalies from being silently treated as settled truth.

Component	Description
Integrity Invariant	Defines the condition that must remain true. It might be a correct balance, a valid identity relation, an up-to-date status, a nonduplicated transaction, or an attributable change history. Without this component, teams cannot distinguish actual corruption from mere difference.
Record Lifecycle Boundary	Shows where the data is born, transformed, copied, approved, read, archived, restored, or retired. Integrity failures often happen at handoffs, imports, migrations, exports, and restores, so the lifecycle boundary tells the system where protection must be placed.
Validation Rule	Translates the invariant into checks. A validation rule can be a schema, range check, semantic rule, referential constraint, plausibility test, human review criterion, or cross-record comparison.
Authoritative State Relation	Identifies which representation has authority for which claim. This component is needed because many integrity failures are not isolated errors; they are conflicts between systems that each appear plausible.
Provenance Record	Preserves origin and transformation history. It helps users understand where a value came from, how it changed, and whether its path supports the use being proposed.
Authorized Mutation Path	Restricts legitimate creation, update, deletion, migration, and restoration. It protects the system from hidden side paths that alter records without evidence or review.
Consistency Relation	Defines how separate records, fields, ledgers, systems, or versions should agree. It can express equality, derivation, ordering, referential linkage, conservation of totals, or a tolerated lag window.
Reconciliation Rule	Explains how mismatches become decisions. It prevents discrepancies from being handled ad hoc by defining when to repair, escalate, accept, quarantine, or document divergence.
Recovery Path	Defines how trusted state can be restored after corruption, loss, failed migration, unauthorized change, or incident response. A backup is not enough; recovery must be verified against the integrity invariants.
Integrity Exception Channel	Keeps unresolved anomalies from being silently treated as settled truth. It gives suspicious, stale, disputed, or unreconciled records a visible status until review is complete.
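A minimal Python sketch of an Integrity Exception Channel, assuming each record carries an explicit status drawn from the states listed under Target Outcomes. Which statuses count as open exceptions and which are decision-grade is a hypothetical policy; `RESTORED` is deliberately left out of the decision-grade set until a restore has been verified.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    FINAL = "final"
    PENDING = "pending"
    CORRECTED = "corrected"
    STALE = "stale"
    SUSPECT = "suspect"
    DISPUTED = "disputed"
    RESTORED = "restored"

# Hypothetical policy: open exceptions vs. statuses that may back a decision.
OPEN_EXCEPTIONS = {Status.STALE, Status.SUSPECT, Status.DISPUTED}
DECISION_GRADE = {Status.FINAL, Status.CORRECTED}

@dataclass
class Record:
    record_id: str
    value: object
    status: Status = Status.PENDING
    history: list = field(default_factory=list)  # (old status, new status, note)

def open_exception(record: Record, status: Status, reason: str) -> None:
    """Move a record into a visible exception status instead of silently keeping it."""
    if status not in OPEN_EXCEPTIONS:
        raise ValueError(f"{status} is not an exception status")
    record.history.append((record.status, status, reason))
    record.status = status

def close_exception(record: Record, outcome: Status, reviewer: str) -> None:
    """Close an open exception only through an explicit, attributed review outcome."""
    if record.status not in OPEN_EXCEPTIONS:
        raise ValueError(f"{record.record_id} has no open exception")
    if outcome not in DECISION_GRADE:
        raise ValueError("review must end in a decision-grade status")
    record.history.append((record.status, outcome, f"reviewed by {reviewer}"))
    record.status = outcome

def usable_for_decision(record: Record) -> bool:
    return record.status in DECISION_GRADE

rec = Record("lab-42", 5.4, Status.FINAL)
open_exception(rec, Status.SUSPECT, "value outside plausible range")
usable_while_open = usable_for_decision(rec)
close_exception(rec, Status.CORRECTED, "reviewer-1")
usable_after_review = usable_for_decision(rec)
```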

Common Mechanisms

Checksum or Hash Validation implements integrity preservation by detecting some forms of unintended alteration or file corruption. It is powerful for byte-level change detection, but it cannot prove that the original data was semantically correct.
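A minimal Python sketch of hash validation, assuming the reference digest was recorded when the data was first accepted and is stored where the protected data cannot overwrite it. The helper names `sha256_hex` and `matches_reference` are illustrative; `hashlib.sha256` and `hmac.compare_digest` are standard-library calls. The last lines show the limitation stated above: a value that was wrong at source still verifies.

```python
import hashlib
import hmac

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

def matches_reference(data: bytes, reference_hex: str) -> bool:
    """True if the bytes still hash to the reference value recorded earlier."""
    # compare_digest avoids short-circuiting on the first differing character
    return hmac.compare_digest(sha256_hex(data), reference_hex)

# Record the reference digest when the data is first accepted as trusted.
original = b"account=7;balance=100.00\n"
reference = sha256_hex(original)

unchanged_ok = matches_reference(original, reference)
altered_ok = matches_reference(b"account=7;balance=900.00\n", reference)

# Limitation: bytes that were wrong at the source hash just as consistently.
wrong_at_source = b"account=7;balance=-5.00\n"
wrong_ok = matches_reference(wrong_at_source, sha256_hex(wrong_at_source))
```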

Data Validation Schema implements the archetype by making expected structure and allowed values explicit. Schemas are useful when format and basic relationships can be formalized, but they do not replace authority, provenance, or recovery design.
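A hand-rolled Python sketch of schema validation at an intake boundary. The payment fields, allowed currencies, and value rules in `PAYMENT_SCHEMA` are hypothetical; in practice a schema language or database constraints would usually carry the same rules, but the idea is the same: expected structure and allowed values are explicit and checked before the record is trusted.

```python
from datetime import date

# Hypothetical schema for a payment record: field -> (required type, value rule)
PAYMENT_SCHEMA = {
    "payment_id": (str, lambda v: v != ""),
    "amount_cents": (int, lambda v: v > 0),
    "currency": (str, lambda v: v in {"EUR", "USD", "GBP"}),
    "paid_on": (date, lambda v: v <= date.today()),
}

def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of violations; an empty list means the record passes."""
    errors = []
    for name, (expected_type, rule) in schema.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        # Check the type first so the value rule only sees well-typed input.
        if not isinstance(value, expected_type):
            errors.append(f"{name}: expected {expected_type.__name__}")
        elif not rule(value):
            errors.append(f"{name}: value {value!r} not allowed")
    # Reject fields the schema does not know about instead of silently keeping them.
    for name in sorted(set(record) - set(schema)):
        errors.append(f"unexpected field: {name}")
    return errors

good = {"payment_id": "p-1", "amount_cents": 1250, "currency": "EUR", "paid_on": date(2024, 1, 5)}
bad = {"payment_id": "p-2", "amount_cents": -3, "currency": "XYZ", "note": "manual"}

good_errors = validate(good, PAYMENT_SCHEMA)
bad_errors = validate(bad, PAYMENT_SCHEMA)
```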

Referential Integrity Constraint protects consistency among linked records. It is common in databases and registries where one record should not point to a nonexistent or invalid related record.
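A minimal sketch using Python's built-in `sqlite3` module, assuming a hypothetical `patient`/`lab_result` pair of tables. Note that SQLite only enforces `REFERENCES` clauses after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE lab_result (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(id),
    value REAL NOT NULL
);
INSERT INTO patient (id, name) VALUES (1, 'Example Patient');
INSERT INTO lab_result (patient_id, value) VALUES (1, 5.4);
""")

def add_result(patient_id: int, value: float) -> bool:
    """Insert a lab result; return False if it would point at a missing patient."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO lab_result (patient_id, value) VALUES (?, ?)",
                (patient_id, value),
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = add_result(1, 6.1)    # patient 1 exists
dangling = add_result(99, 1.0)   # no patient 99, so the link would dangle
```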

Access Control Enforcement protects authorized mutation paths. It reduces unauthorized change but does not by itself prove that authorized changes are correct.
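A minimal sketch of default-deny enforcement on the mutation path, assuming a hypothetical in-memory role policy. The `POLICY` table, role names, and `apply_mutation` helper are illustrative and do not stand for any particular access-control product.

```python
# Hypothetical policy table: (record type, action) -> roles allowed to perform it.
POLICY = {
    ("case_record", "update"): {"caseworker", "supervisor"},
    ("case_record", "delete"): {"supervisor"},
}

class UnauthorizedMutation(Exception):
    """Raised when a change arrives through a path the policy does not allow."""

def apply_mutation(store: dict, record_type: str, record_id: str,
                   action: str, role: str, value=None) -> None:
    # Default deny: a (type, action) pair missing from the policy has no allowed roles.
    allowed_roles = POLICY.get((record_type, action), set())
    if role not in allowed_roles:
        raise UnauthorizedMutation(f"{role} may not {action} {record_type} {record_id}")
    if action == "update":
        store[record_id] = value
    elif action == "delete":
        del store[record_id]

cases = {"c-1": "open"}
apply_mutation(cases, "case_record", "c-1", "update", "caseworker", "closed")

try:
    apply_mutation(cases, "case_record", "c-1", "delete", "caseworker")
    delete_denied = False
except UnauthorizedMutation:
    delete_denied = True
```

The check confirms only that a caseworker may update the record; it says nothing about whether `"closed"` is the correct value, which still has to be covered by validation and review.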

Audit Log records what happened, when, and by whom or what. It supports review and accountability, but it must be connected to investigation and correction rather than treated as a passive paper trail.
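One way to make an audit log tamper-evident is hash chaining, where each entry stores the hash of the previous entry. This is a swapped-in technique rather than the only possible design; the sketch assumes a hypothetical in-memory `AuditLog` class and JSON-serializable before/after values.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64

def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

class AuditLog:
    """Append-only log in which each entry commits to the hash of the one before it."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, actor: str, action: str, record_id: str, before, after) -> None:
        body = {
            "at": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "record_id": record_id,
            "before": before,
            "after": after,
            "prev_hash": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """False if an entry was edited, or an earlier entry removed or reordered.
        Truncating the newest entries is not detected without an external anchor."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True

log = AuditLog()
log.append("alice", "update", "acct-7", {"balance": 100}, {"balance": 80})
log.append("bob", "update", "acct-7", {"balance": 80}, {"balance": 95})
intact = log.verify()

log.entries[0]["after"] = {"balance": 10}  # simulate a silent edit to history
tamper_detected = not log.verify()
```

A chain like this only covers changes written through `append`; a mutation path that bypasses the log leaves no entry at all, which is the audit-log failure mode listed under Failure Modes.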

Data Lineage Capture preserves transformation paths so downstream errors can be traced back to upstream sources, joins, derivations, or cleaning steps.
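A minimal sketch of lineage capture, assuming each derived value keeps a reference to the step that produced it and to the inputs that step consumed. The `Traced` class, the step names, and the temperature example are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Traced:
    """A value plus the step that produced it and the inputs that step consumed."""
    step: str
    value: object
    parents: tuple = ()

def upstream_steps(node: Traced) -> list[str]:
    """Walk back from a value to every step and source that contributed to it."""
    steps = [node.step]
    for parent in node.parents:
        steps.extend(upstream_steps(parent))
    return steps

raw = Traced("source:sensor_export.csv", [21.0, 38.5])
fahrenheit = Traced("step:celsius_to_fahrenheit", [c * 9 / 5 + 32 for c in raw.value], (raw,))
limits = Traced("source:alert_limits.json", (60.0, 100.0))
alerts = Traced(
    "step:flag_out_of_range",
    [f for f in fahrenheit.value if not limits.value[0] <= f <= limits.value[1]],
    (fahrenheit, limits),
)

# If an alert looks wrong, this is the list of places to inspect.
path = upstream_steps(alerts)
```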

Source-of-Truth Registry implements an authority relation by documenting which source is authoritative for each claim, field, or decision context. It supports this archetype, but it is also closely associated with the distinct Source-of-Truth Assignment archetype.
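A minimal sketch of a registry lookup, assuming a hypothetical field-to-system mapping. It returns the authoritative value and keeps the dissenting values, so disagreements can be handed to reconciliation rather than silently discarded.

```python
# Hypothetical registry: for each field, the system authoritative for that claim.
REGISTRY = {
    "legal_name": "civil_registry",
    "mailing_address": "crm",
    "account_balance": "core_ledger",
}

def authoritative_value(field_name: str, candidates: dict):
    """Pick the authoritative value from {system: value} and report dissenting systems."""
    authority = REGISTRY.get(field_name)
    if authority is None:
        raise KeyError(f"no authority registered for {field_name!r}")
    if authority not in candidates:
        raise LookupError(f"{authority} supplied no value for {field_name!r}")
    value = candidates[authority]
    # Keep disagreements so they can go to reconciliation instead of being dropped.
    dissent = {system: v for system, v in candidates.items() if v != value}
    return value, dissent

value, dissent = authoritative_value(
    "mailing_address",
    {"crm": "12 High St", "billing": "12 High Street", "support": "12 High St"},
)
```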

Reconciliation Workflow compares records or systems, classifies discrepancies, resolves conflicts, and records correction decisions. It implements repair within the parent archetype.
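A minimal sketch of a two-way reconciliation between a ledger and a bank feed, assuming both are keyed by transaction ID with amounts in integer cents. The discrepancy classes and the `DECISION` mapping are hypothetical policy choices; the returned list stands in for the recorded correction decisions.

```python
# Hypothetical policy mapping each discrepancy class to a decision.
DECISION = {
    "missing_in_ledger": "escalate",
    "missing_in_bank": "quarantine",
    "within_tolerance": "accept_divergence",
    "amount_mismatch": "repair",
}

def reconcile(ledger: dict, bank: dict, tolerance_cents: int = 0) -> list[tuple]:
    """Compare two {txn_id: amount_cents} maps that should agree; return decisions."""
    decisions = []
    for txn_id in sorted(ledger.keys() | bank.keys()):
        a, b = ledger.get(txn_id), bank.get(txn_id)
        if a == b:
            continue  # records agree; nothing to decide
        if a is None:
            kind = "missing_in_ledger"
        elif b is None:
            kind = "missing_in_bank"
        elif abs(a - b) <= tolerance_cents:
            kind = "within_tolerance"
        else:
            kind = "amount_mismatch"
        decisions.append((txn_id, kind, DECISION[kind]))
    return decisions

ledger = {"t1": 1000, "t2": 2500, "t3": 400, "t5": 300}
bank = {"t1": 1000, "t2": 2499, "t4": 75, "t5": 350}
decisions = reconcile(ledger, bank, tolerance_cents=1)
```

If the same discrepancy class keeps recurring, the output is a signal about an upstream defect rather than a queue to be cleared forever.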

Backup and Restore Verification implements recovery by proving that restoration is possible and that restored data still satisfies integrity invariants.
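A minimal sketch using Python's `sqlite3` module, whose `Connection.backup` method copies a database into another connection. The schema and the balance invariant are hypothetical. The check runs against the restored copy, and the second run shows that a backup faithfully preserves a corrupted value until an invariant check flags it.

```python
import sqlite3

def invariant_violations(conn: sqlite3.Connection) -> list[str]:
    """Check a database against named integrity invariants; [] means it passes."""
    problems = []
    # Structural invariant: SQLite's built-in consistency check reports "ok".
    if conn.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
        problems.append("integrity_check failed")
    # Hypothetical business invariant: each balance equals the sum of its postings.
    rows = conn.execute("""
        SELECT a.id FROM account AS a
        LEFT JOIN posting AS p ON p.account_id = a.id
        GROUP BY a.id, a.balance
        HAVING a.balance != COALESCE(SUM(p.amount), 0)
        ORDER BY a.id
    """).fetchall()
    problems += [f"balance mismatch for account {account_id}" for (account_id,) in rows]
    return problems

def restore_and_verify(source: sqlite3.Connection):
    """Restore a copy into a fresh database and test the copy, not the original."""
    restored = sqlite3.connect(":memory:")
    source.backup(restored)
    return restored, invariant_violations(restored)

src = sqlite3.connect(":memory:")
src.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE posting (id INTEGER PRIMARY KEY, account_id INTEGER NOT NULL, amount INTEGER NOT NULL);
INSERT INTO account (id, balance) VALUES (1, 150), (2, 0);
INSERT INTO posting (account_id, amount) VALUES (1, 100), (1, 50);
""")
_, clean_report = restore_and_verify(src)

# Corrupt the source; the backup copies the bad value, and only the invariant catches it.
with src:
    src.execute("UPDATE account SET balance = 999 WHERE id = 2")
_, corrupted_report = restore_and_verify(src)
```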

Transactional Write Control protects integrity during state changes by reducing partial writes, duplicate effects, and inconsistent intermediate states.
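A minimal sketch with `sqlite3`, combining a transaction (both balance updates commit together or not at all) with an idempotency key so that a retried request has no second effect. The table names, the `CHECK (balance >= 0)` rule, and the `transfer` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0));
CREATE TABLE applied_request (request_key TEXT PRIMARY KEY);
INSERT INTO account (id, balance) VALUES ('A', 100), ('B', 0);
""")

def transfer(request_key: str, source: str, target: str, amount: int) -> str:
    """Move funds as one unit of work; a retried request_key has no second effect."""
    if conn.execute("SELECT 1 FROM applied_request WHERE request_key = ?",
                    (request_key,)).fetchone():
        return "duplicate"
    try:
        with conn:  # commits if the block succeeds, rolls back everything if it raises
            conn.execute("INSERT INTO applied_request (request_key) VALUES (?)", (request_key,))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source))
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target))
    except sqlite3.IntegrityError:
        return "rolled_back"
    return "applied"

first = transfer("req-1", "A", "B", 30)
retry = transfer("req-1", "A", "B", 30)
overdraw = transfer("req-2", "A", "B", 500)  # would violate the CHECK, so nothing commits
balances = dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

The pre-check only gives retries a clearer status; the `PRIMARY KEY` on `request_key` is what actually blocks a concurrent duplicate, which would surface as a rollback.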

Integrity Anomaly Monitoring detects suspicious values, unexpected changes, missing records, impossible totals, duplication spikes, or stale data. It becomes meaningful only when alerts feed an exception or repair channel.
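A minimal sketch of three anomaly checks over a batch of records: impossible values, likely duplicates, and staleness. The record fields, the duplicate heuristic (same order reference and quantity), and the seven-day freshness window are hypothetical tuning choices.

```python
from datetime import datetime, timedelta, timezone

def scan_for_anomalies(records: list[dict], now: datetime, max_age: timedelta) -> list[tuple]:
    """Return (record_id, reason) pairs; an empty list means nothing looked wrong."""
    flags = []
    first_seen = {}  # (order_ref, quantity) -> id of the first record with that pair
    for rec in records:
        if rec["quantity"] < 0:
            flags.append((rec["id"], "impossible_value"))
        key = (rec["order_ref"], rec["quantity"])
        if key in first_seen:
            flags.append((rec["id"], f"possible_duplicate_of:{first_seen[key]}"))
        else:
            first_seen[key] = rec["id"]
        if now - rec["updated_at"] > max_age:
            flags.append((rec["id"], "stale"))
    return flags

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"id": "r1", "order_ref": "o-1", "quantity": 5, "updated_at": now - timedelta(hours=1)},
    {"id": "r2", "order_ref": "o-1", "quantity": 5, "updated_at": now - timedelta(hours=2)},
    {"id": "r3", "order_ref": "o-2", "quantity": -2, "updated_at": now - timedelta(days=10)},
]
flags = scan_for_anomalies(records, now, max_age=timedelta(days=7))
```

The returned flags are only useful if they are routed into an exception or repair channel; showing them on a dashboard alone would be the integrity theater described under Failure Modes.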

Access Control Enforcement — Restricts who or what may read, write, approve, delete, or restore protected data, so records change only through authorized paths and never through hidden side doors.
Audit Log — Keeps an append-only, attributable record of every action on protected data — who, when, and what changed — so integrity events can be investigated and reconstructed after the fact.
Backup and Restore Verification — Proves that protected data can actually be restored and that the restored records still satisfy their integrity invariants — not merely that a backup file exists.
Checksum or Hash Validation — Detects unintended alteration, transmission error, or corruption by comparing a freshly computed hash against a trusted reference value.
Data Lineage Capture — Records how each value moved through sources, transformations, joins, and derivations, so a suspect output can be traced back to the upstream step that produced it.
Data Validation Schema — Encodes the structure, types, allowed values, and cross-field rules a record must satisfy, rejecting malformed data at the boundary before it is trusted.
Integrity Anomaly Monitoring — Watches trusted data for impossible values, unexpected drift, duplication spikes, missing records, or staleness, and raises a visible exception when something looks wrong.
Reconciliation Workflow — Compares two records or states that should agree, classifies each discrepancy, and drives it to a repair, quarantine, or accepted-divergence decision that is recorded.
Referential Integrity Constraint — Prevents a record from pointing to a nonexistent or invalid related record, so links between data never dangle.
Source-of-Truth Registry — Documents, per field or claim, which system or role is authoritative — the reference that integrity checks and reconciliation consult to know which value should win.
Transactional Write Control — Groups related updates so they all commit or none do, keeping partial, duplicate, or inconsistent intermediate states out of trusted records.

Parameter / Tuning Dimensions

Key tuning dimensions include the strictness of validation, acceptable lag between related systems, level of provenance detail, retention duration, review depth for exceptions, recovery-point tolerance, recovery-time tolerance, audit granularity, and the scope tier assigned to different records.

The main design challenge is proportionality. Over-control makes systems slow, brittle, expensive, and privacy-invasive. Under-control lets corruption and disagreement propagate until downstream decisions fail. The correct setting depends on decision stakes, reversibility, detectability, repair cost, and the rights or safety interests affected.

Invariants to Preserve

The primary invariants are accuracy, consistency, authorized change, provenance continuity, freshness for time-sensitive data, recoverability to a known-good state, and explicit exception status for unresolved records.

These invariants should be phrased in terms of use. A record may be accurate enough for trend analysis but not for eligibility, payment, diagnosis, or legal evidence. Integrity preservation works best when it states what kind of trust a record must support.

Target Outcomes

A well-designed integrity-preservation system makes trusted state visible and maintainable. Users can tell whether a record is final, pending, corrected, stale, suspect, disputed, or restored. Errors are detected earlier. Discrepancies are repaired through explicit authority and reconciliation rules. Recovery restores trustworthy state rather than merely restoring stored data.

The deeper outcome is institutional confidence: people can rely on records without pretending that storage, logging, or validation alone creates truth.

Tradeoffs

Integrity controls consume time, storage, attention, and review capacity. Strong validation can reject unusual but legitimate cases. Detailed logging and provenance improve accountability but can create privacy and retention risks. Centralized authority simplifies conflict resolution but can become brittle when reality is distributed. Reconciliation improves consistency but can become a permanent patch for upstream defects.

The archetype therefore requires governance judgment, not only technical controls. More integrity machinery is not always better; the controls should be matched to the consequences of data failure.

Failure Modes

A common failure mode is integrity theater, where dashboards, logs, schemas, or certifications exist but do not change how records are trusted, disputed, repaired, or recovered. Another is the undefined invariant, where everyone wants accuracy but no one specifies accuracy relative to what source, time, or use.

Other failures include silent authority conflict, provenance stripped during transformation, stale truth, backups that preserve corrupted baselines, audit logs that miss real mutation paths, and reconciliation used as a permanent workaround instead of a signal that upstream design needs repair.

Neighbor Distinctions

This archetype is close to Source-of-Truth Assignment, but that neighbor chooses authority while Data Integrity Preservation protects the trustworthy state of records across their lifecycle. It is close to Traceability Linking, but traceability is only one part of integrity preservation. It is close to Conservation Accounting, but conservation accounting tracks a conserved quantity across transformations. It is close to Transactional Atomicity, but atomic commits are only one mechanism for preserving data integrity during mutation.

It also borders Invariant Guarding, Correspondence Validation, Versioned Evolution, and Reconciliation After Drift. The practical boundary is the center of gravity: if the work is preserving trustworthy data state, use this archetype; if the work is authority selection, old-new overlap validation, version strategy, or post-drift repair, use the more specific neighbor.

Cross-Domain Examples

In banking, data integrity preservation prevents duplicate charges, partial writes, unreconciled balances, and unauditable corrections. In healthcare, it distinguishes final results from pending or corrected results while preserving attribution. In research, it keeps datasets interpretable by preserving source metadata and transformations. In government services, it protects eligibility and case records from unauthorized or unexplained changes. In supply chains, it keeps digital custody and quantity records aligned with physical goods.

The same structure transfers because each domain has records that must remain trustworthy through movement, change, and recovery.

Non-Examples

A checksum alone is not Data Integrity Preservation. It can prove that bytes did not change, but not that the bytes were right. An audit log alone is not the archetype because a log can record bad or incomplete events. A dashboard alone is not the archetype because display is not preservation. A one-time cleanup is not the archetype unless it creates durable rules for preserving integrity afterward.

A debate over which record should be official is usually Source-of-Truth Assignment. A test showing that a new system behaves like an old one is usually Correspondence Validation. A conservation-of-total accounting exercise is Conservation Accounting.

Abstractions this archetype builds on, either directly (as a source ingredient) or as a related pattern.

Built directly on (2)

Data Integrity: Accuracy and consistency preserved.
Invariance: Properties unchanged under transformation.

Also references 11 related abstractions, including:

Access Control: Restrict system access.
Accountability: Responsibility for actions.
Boundary: Defines system limits.
Closure: Ensures operations remain within a set.
Constraint: Limits possibilities to guide outcomes.
Fault Tolerance: Continue operating under failure.
Feedback: Outputs influence inputs.
Observability: Infer internal state externally.
Relation: Describes associations or dependencies.
Transaction: All-or-nothing operations.


Variants

Narrower or domain-specific specializations that share this archetype's core structure. Recognized variants are established; candidate variants are provisional.

Record Integrity Preservation · domain variant · recognized

Preserve the trustworthiness of official, legal, clinical, financial, administrative, or operational records across their lifecycle.

Distinct from parent: The parent covers any data or state; this variant emphasizes official-record trust, retention, custody, and evidentiary continuity.
Use when: a record is used as evidence, authority, eligibility proof, accountability support, or operational memory; the problem is not only whether a value is technically valid but whether the record remains admissible, attributable, and complete; record changes require review, retention, auditability, or chain-of-custody-like evidence.
Typical domains: medical records, legal evidence, financial ledgers, public administration, inventory custody
Common mechanisms: Audit Log, Chain-of-Custody Record, Retention and Disposition Rule

Provenance Integrity Preservation · implementation variant · recognized

Preserve trustworthy origin, authorship, transformation, and custody information for data or records.

Distinct from parent: The parent protects data integrity broadly; this variant makes provenance the main integrity invariant.
Use when: a value cannot be trusted unless its source, transformation path, or handler is known; downstream errors require tracing back to upstream source or transformation defects; a decision must distinguish original evidence, copied evidence, derived evidence, and unverified claims.
Typical domains: research data, supply-chain records, journalism and evidence review, machine-learning datasets
Common mechanisms: Data Lineage Capture, Signed Record Metadata, Custody Transfer Log

Cross-System Consistency Preservation · scale variant · recognized

Preserve agreement among related records, states, ledgers, schemas, or reports distributed across systems or organizations.

Distinct from parent: The parent may protect a single store or lifecycle; this variant focuses on multi-system agreement and tolerated discrepancy windows.
Use when: multiple systems hold overlapping representations of the same underlying entity, event, account, or obligation; timing lag, replication, integration, manual entry, or migration can create divergent states; no single validation rule is enough because integrity depends on relations among representations.
Typical domains: distributed databases, financial ledgers, inventory systems, case-management platforms, multi-agency public services
Common mechanisms: Reconciliation Workflow, Dual-Write Monitor, Consistency Check Report

Transactional Integrity Preservation · mechanism family variant · recognized

Preserve data integrity while related changes are written, committed, reversed, retried, or exposed to readers.

Distinct from parent: The parent governs integrity across the whole lifecycle; this variant focuses on writes, commits, retries, and partial-failure windows.
Use when: a state change involves multiple related writes, approvals, systems, or side effects; partial completion, duplicate retry, ordering error, or intermediate visibility could corrupt trusted state; integrity depends on when the system treats a change as committed.
Typical domains: banking transactions, order management, database updates, workflow approvals, records migration
Common mechanisms: Transactional Write Control, Idempotency Key, Commit Log

Recovery Integrity Preservation · risk or failure variant · candidate

Preserve or re-establish trusted data state after corruption, loss, unauthorized change, failed migration, or incident response.

Distinct from parent: The parent includes recovery as one component; this variant elevates post-failure trust restoration.
Use when: the system can fail in ways that leave records corrupted, missing, stale, or inconsistent; restoration is only safe if the restored state can be verified against integrity invariants; operations must continue while integrity status is uncertain or under repair.
Typical domains: incident response, database administration, digital preservation, clinical records, public services
Common mechanisms: Backup and Restore Verification, Point-in-Time Recovery, Post-Restore Integrity Check

Near names: Data Correctness Preservation, Record Integrity, Corruption Prevention, Consistency Guarding, Provenance Preservation, Data Quality Control, Integrity Controls.