Data Integrity Preservation
Essence
Data Integrity Preservation is the pattern of keeping data trustworthy while it is created, changed, copied, interpreted, migrated, archived, and restored. It does not assume that a value remains reliable merely because it is stored somewhere. Instead, it asks what must stay true for the data to support action: accuracy, consistency, attribution, freshness, provenance, and recoverability.
The archetype is useful whenever data becomes a basis for decisions, obligations, rights, safety, money, memory, or evidence. In those cases, integrity is not a decorative quality-control layer. It is the structure that lets a system tell the difference between a trusted record, a suspect record, a stale record, a disputed record, and a repaired record.
Compression statement
When decisions depend on data that can be corrupted, duplicated, desynchronized, altered, stale, or detached from its source, Data Integrity Preservation defines integrity invariants, constrains authorized mutation paths, validates lifecycle transitions, records provenance, reconciles mismatches, and maintains recovery paths so trusted data state remains usable.
Canonical formula: trusted_data_state = integrity_invariants + authorized_mutation_paths + validation + provenance + reconciliation + recovery
When to Use This Archetype
Use this archetype when decisions depend on records that can be corrupted, duplicated, desynchronized, altered without authority, detached from provenance, or restored from an uncertain state. It fits databases, ledgers, case files, research datasets, safety logs, supply-chain records, documentation systems, and any workflow where downstream users need to know whether the data can be relied on.
It is especially relevant at lifecycle boundaries: intake, import, transformation, write, approval, migration, synchronization, export, archival, restoration, and deletion. These are the moments when a record can silently lose meaning even if it still looks formally valid.
Structural Problem
The structural problem is that data must remain stable enough to guide action while also moving through imperfect people, systems, formats, permissions, transformations, and time. A record that was once accurate can become stale. A field that is valid in isolation can contradict another system. A file can be unchanged but wrong at source. A log can exist while the real change path went around it.
This produces a trust gap: the system has data, but it cannot confidently say what the data means, where it came from, whether it is current, who changed it, or how to repair it when it disagrees with another representation.
Intervention Logic
The intervention begins by naming the data or record whose trust matters and defining the integrity invariants that must be preserved. The designer then maps the record lifecycle to locate vulnerable transitions, constrains who or what may mutate protected state, places validation and evidence mechanisms at the right boundaries, and defines how discrepancies are reconciled.
The most important move is to connect controls to decisions. A checksum, audit log, schema, permission table, or backup only becomes part of Data Integrity Preservation when it protects a named integrity invariant and feeds a correction, review, or recovery path. Otherwise it is a mechanism without an archetype.
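The rule that a control only counts when it protects a named invariant and feeds a decision can be sketched in Python. This is a minimal illustration under stated assumptions: `Route`, `Control`, and `evaluate` are hypothetical names, not part of any library, and the two example invariants are invented for the demonstration. A control without a named invariant is rejected at construction, and every violation is returned together with the path it should feed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Mapping


class Route(Enum):
    """Decision paths a failed check can feed (hypothetical labels)."""
    REPAIR = "repair"
    REVIEW = "review"
    RECOVER = "recover"


@dataclass(frozen=True)
class Control:
    invariant: str                                  # named integrity invariant this control protects
    check: Callable[[Mapping[str, object]], bool]   # True when the invariant holds for a record
    on_failure: Route                               # where a violation is routed

    def __post_init__(self) -> None:
        # A check that protects no named invariant is "a mechanism without an archetype".
        if not self.invariant:
            raise ValueError("a control must name the invariant it protects")


def evaluate(record: Mapping[str, object], controls: list[Control]) -> dict[str, Route]:
    """Run every control and return each violated invariant with its decision path."""
    return {c.invariant: c.on_failure for c in controls if not c.check(record)}


controls = [
    Control("balance_non_negative", lambda r: r["balance"] >= 0, Route.REVIEW),
    Control("has_source", lambda r: bool(r.get("source")), Route.REPAIR),
]
violations = evaluate({"balance": -5, "source": ""}, controls)
```

The design choice is that the routing target lives on the control itself, so a check cannot be added without also deciding what a failure should trigger.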
Key Components
Data Integrity Preservation keeps records trustworthy across their full lifecycle by stating what trust means, locating where it is vulnerable, and connecting controls to decisions rather than to ceremony. The work begins with an Integrity Invariant: the condition that must remain true, such as a correct balance, a valid identity relation, an attributable change history, or a nonduplicated transaction. Without it, teams cannot tell actual corruption from mere difference. The Record Lifecycle Boundary then locates where data is born, transformed, copied, approved, migrated, archived, restored, or retired, because integrity failures cluster at handoffs and lifecycle transitions more than at rest. The Validation Rule translates the invariant into checks at those boundaries, such as schemas, range checks, referential constraints, plausibility tests, or human review criteria.
Four components govern authority, change, history, and agreement. The Authoritative State Relation names which representation has authority for which claim, since many integrity failures are not isolated errors but conflicts between systems that each look plausible. The Authorized Mutation Path restricts legitimate creation, update, deletion, migration, and restoration to known routes, protecting against hidden side paths that alter records without evidence. The Provenance Record preserves origin and transformation history so users can understand where a value came from, how it changed, and whether its path supports the use being proposed. The Consistency Relation defines how separate records, fields, ledgers, or versions should agree, whether through equality, derivation, ordering, referential linkage, conservation of totals, or a tolerated lag window.
Three final components handle discrepancy, failure, and ambiguity. The Reconciliation Rule converts mismatches into decisions by specifying when to repair, escalate, accept, quarantine, or document divergence, rather than letting each conflict be handled ad hoc. The Recovery Path defines how trusted state is restored after corruption, loss, failed migration, or unauthorized change. Crucially, recovery must be verified against the integrity invariants; the existence of a backup is not enough. The Integrity Exception Channel keeps suspicious, stale, disputed, or unreconciled records visibly flagged until review is complete, preventing unresolved anomalies from being silently treated as settled truth.
| Component | Description |
|---|---|
| Integrity Invariant | Defines the condition that must remain true. It might be a correct balance, a valid identity relation, an up-to-date status, a nonduplicated transaction, or an attributable change history. Without this component, teams cannot distinguish actual corruption from mere difference. |
| Record Lifecycle Boundary | Shows where the data is born, transformed, copied, approved, read, archived, restored, or retired. Integrity failures often happen at handoffs, imports, migrations, exports, and restores, so the lifecycle boundary tells the system where protection must be placed. |
| Validation Rule | Translates the invariant into checks. A validation rule can be a schema, range check, semantic rule, referential constraint, plausibility test, human review criterion, or cross-record comparison. |
| Authoritative State Relation | Identifies which representation has authority for which claim. This component is needed because many integrity failures are not isolated errors; they are conflicts between systems that each appear plausible. |
| Provenance Record | Preserves origin and transformation history. It helps users understand where a value came from, how it changed, and whether its path supports the use being proposed. |
| Authorized Mutation Path | Restricts legitimate creation, update, deletion, migration, and restoration. It protects the system from hidden side paths that alter records without evidence or review. |
| Consistency Relation | Defines how separate records, fields, ledgers, systems, or versions should agree. It can express equality, derivation, ordering, referential linkage, conservation of totals, or a tolerated lag window. |
| Reconciliation Rule | Explains how mismatches become decisions. It prevents discrepancies from being handled ad hoc by defining when to repair, escalate, accept, quarantine, or document divergence. |
| Recovery Path | Defines how trusted state can be restored after corruption, loss, failed migration, unauthorized change, or incident response. A backup is not enough; recovery must be verified against the integrity invariants. |
| Integrity Exception Channel | Keeps unresolved anomalies from being silently treated as settled truth. It gives suspicious, stale, disputed, or unreconciled records a visible status until review is complete. |
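The Integrity Exception Channel can be pictured as an explicit status model plus a queue of open flags. The sketch below is hypothetical throughout: the `Status` values follow the record states named elsewhere in this entry, while `ExceptionChannel`, `usable_for_decisions`, and the grouping of statuses into "exception" and "usable" sets are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    FINAL = "final"
    PENDING = "pending"
    CORRECTED = "corrected"
    RESTORED = "restored"       # assumes restore verification has already passed
    SUSPECT = "suspect"
    STALE = "stale"
    DISPUTED = "disputed"
    UNRECONCILED = "unreconciled"


# Statuses that mean "not settled truth yet"; this grouping is an assumption of the sketch.
EXCEPTION_STATUSES = {Status.SUSPECT, Status.STALE, Status.DISPUTED, Status.UNRECONCILED}
USABLE_STATUSES = {Status.FINAL, Status.CORRECTED, Status.RESTORED}


@dataclass
class Record:
    record_id: str
    value: object
    status: Status = Status.PENDING


@dataclass
class ExceptionChannel:
    """Hypothetical exception queue: flagged records stay visible until review resolves them."""
    open_items: dict[str, str] = field(default_factory=dict)   # record_id -> reason for the flag

    def flag(self, record: Record, status: Status, reason: str) -> None:
        if status not in EXCEPTION_STATUSES:
            raise ValueError(f"{status} is not an exception status")
        record.status = status
        self.open_items[record.record_id] = reason

    def resolve(self, record: Record, outcome: Status) -> None:
        # Review must move the record to an explicit non-exception status.
        if outcome in EXCEPTION_STATUSES:
            raise ValueError("resolution must leave the exception state")
        del self.open_items[record.record_id]   # KeyError if the record was never flagged
        record.status = outcome


def usable_for_decisions(record: Record) -> bool:
    return record.status in USABLE_STATUSES


channel = ExceptionChannel()
rec = Record("acct-17", 120, Status.FINAL)
channel.flag(rec, Status.DISPUTED, "customer disputes the charge")
```

While the dispute is open, `usable_for_decisions(rec)` is false and the reason stays in `channel.open_items`; only `resolve` clears it.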
Common Mechanisms
Checksum or Hash Validation implements integrity preservation by detecting some forms of unintended alteration or file corruption. It is powerful for byte-level change detection, but it cannot prove that the original data was semantically correct.
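A minimal Python sketch using the standard `hashlib` module shows both halves of that claim. The record contents are made-up examples. The stored digest catches a one-character corruption, but a value that was already wrong when it was hashed verifies perfectly.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_digest: str) -> bool:
    """True if the bytes are identical to what was hashed; says nothing about correctness."""
    return sha256_hex(data) == expected_digest


# Digest recorded at ingestion and stored separately from the data it protects.
original = b"account=17,balance=120.00\n"
recorded_digest = sha256_hex(original)

corrupted = b"account=17,balance=12O.00\n"        # letter O swapped in for a zero
wrong_at_source = b"account=17,balance=999.00\n"   # bad value that was hashed as-is

print(verify(original, recorded_digest))                        # True
print(verify(corrupted, recorded_digest))                       # False
print(verify(wrong_at_source, sha256_hex(wrong_at_source)))     # True: the error is invisible to the hash
```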
Data Validation Schema implements the archetype by making expected structure and allowed values explicit. Schemas are useful when format and basic relationships can be formalized, but they do not replace authority, provenance, or recovery design.
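To keep the idea independent of any particular schema library, here is a hand-rolled sketch in plain Python. `PAYMENT_SCHEMA` and `validate` are hypothetical, and the allowed currencies and statuses are illustrative values. Each field declares a type and an allowed-value predicate, and validation returns a list of violations instead of a bare pass or fail.

```python
from typing import Any, Callable

# Hypothetical schema for a payment record: field -> (expected type, allowed-value predicate).
PAYMENT_SCHEMA: dict[str, tuple[type, Callable[[Any], bool]]] = {
    "payment_id": (str, lambda v: len(v) > 0),
    "amount_cents": (int, lambda v: v > 0),
    "currency": (str, lambda v: v in {"EUR", "USD", "GBP"}),
    "status": (str, lambda v: v in {"pending", "final", "corrected"}),
}


def validate(record: dict[str, Any], schema=PAYMENT_SCHEMA) -> list[str]:
    """Return every violation found; an empty list means structure and values are acceptable."""
    errors = []
    for name, (expected_type, allowed) in schema.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly for integer fields.
        if not isinstance(value, expected_type) or (expected_type is int and isinstance(value, bool)):
            errors.append(f"{name}: expected {expected_type.__name__}")
        elif not allowed(value):
            errors.append(f"{name}: value {value!r} not allowed")
    for name in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected field: {name}")
    return errors
```

A record that passes says only that its shape is plausible; whether the amount is the authoritative one, and where it came from, still needs the authority and provenance components.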
Referential Integrity Constraint protects consistency among linked records. It is common in databases and registries where one record should not point to a nonexistent or invalid related record.
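As a concrete database instance, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `patient` and `lab_result` tables are hypothetical. Note that SQLite only enforces foreign keys when the `foreign_keys` pragma is turned on for the connection, a detail that is easy to miss.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE lab_result ("
    " id INTEGER PRIMARY KEY,"
    " patient_id INTEGER NOT NULL REFERENCES patient(id),"
    " result TEXT NOT NULL)"
)
conn.execute("INSERT INTO patient (id, name) VALUES (1, 'Example Patient')")
conn.execute("INSERT INTO lab_result (patient_id, result) VALUES (1, 'normal')")

rejected = False
try:
    # Patient 99 does not exist, so this row would be an orphan.
    conn.execute("INSERT INTO lab_result (patient_id, result) VALUES (99, 'high')")
except sqlite3.IntegrityError:
    rejected = True
```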
Access Control Enforcement protects authorized mutation paths. It reduces unauthorized change but does not by itself prove that authorized changes are correct.
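The sketch below illustrates an authorized mutation path in Python, assuming a simple role-to-operation table. `PERMISSIONS`, `CaseRecord`, `authorize`, and `update_status` are hypothetical names. The comment inside `update_status` marks the limit stated above: the check decides who may change the record, not whether the new value is right.

```python
from dataclasses import dataclass

# Hypothetical role -> permitted operations table.
PERMISSIONS = {
    "clerk": {"create"},
    "supervisor": {"create", "update"},
    "migration_job": {"create", "update", "migrate"},
}


class UnauthorizedMutation(Exception):
    pass


@dataclass
class CaseRecord:
    case_id: str
    status: str


def authorize(actor_role: str, operation: str) -> None:
    if operation not in PERMISSIONS.get(actor_role, set()):
        raise UnauthorizedMutation(f"{actor_role!r} may not {operation}")


def update_status(record: CaseRecord, actor_role: str, new_status: str) -> None:
    """The sanctioned route for changing status: authorize first, then apply."""
    authorize(actor_role, "update")
    # Authorization says who may change the record, not whether new_status is correct.
    record.status = new_status
```

In a real system the check has to sit where writes actually happen, for example in database grants or a service boundary. Otherwise a direct write around `update_status` is exactly the hidden side path this component is meant to close.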
Audit Log records what happened, when, and by whom or what. It supports review and accountability, but it must be connected to investigation and correction rather than treated as a passive paper trail.
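One common way to make an audit log useful for investigation is hash chaining, where each entry commits to the hash of the one before it. The Python sketch below uses that technique as an illustration; `AuditLog`, `first_broken_entry`, and the entry fields are hypothetical. The verification method returns the index where the chain first breaks, which gives a reviewer a concrete place to start.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional

GENESIS = "0" * 64


def _digest(body: dict) -> str:
    # sort_keys gives a canonical serialization, so the same entry always hashes the same.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class AuditLog:
    """Append-only log in which each entry commits to the hash of the previous one."""
    entries: list = field(default_factory=list)

    def append(self, actor: str, action: str, record_id: str, timestamp: str) -> None:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"actor": actor, "action": action, "record_id": record_id,
                "timestamp": timestamp, "prev_hash": prev_hash}
        body["hash"] = _digest(body)
        self.entries.append(body)

    def first_broken_entry(self) -> Optional[int]:
        """Index of the first entry whose content or link no longer matches, else None."""
        prev_hash = GENESIS
        for i, entry in enumerate(self.entries):
            unhashed = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or _digest(unhashed) != entry["hash"]:
                return i
            prev_hash = entry["hash"]
        return None
```

Two limits are worth keeping in view. Dropping the newest entries goes unnoticed unless the latest hash is also anchored somewhere else. And no log, chained or not, can see a mutation that never passed through it.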
Data Lineage Capture preserves transformation paths so downstream errors can be traced back to upstream sources, joins, derivations, or cleaning steps.
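Lineage capture can be sketched as a value that carries its own transformation history. In this Python illustration, `Traced`, `source`, and `transform` are hypothetical helpers, and the file name `lab_export_2024_03.csv` is made up. Because each derivation goes through `transform`, the step list grows with the value and cannot be silently dropped along this path.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Traced:
    """A value together with the ordered steps that produced it (hypothetical structure)."""
    value: object
    lineage: tuple[str, ...]


def source(value: object, origin: str) -> Traced:
    return Traced(value, (f"source:{origin}",))


def transform(t: Traced, step_name: str, fn: Callable[[object], object]) -> Traced:
    # Each derivation returns a new Traced with its step appended; the input is left unchanged.
    return Traced(fn(t.value), t.lineage + (step_name,))


raw = source([" 12", "7 ", "x"], "lab_export_2024_03.csv")
cleaned = transform(raw, "strip_whitespace", lambda xs: [s.strip() for s in xs])
numeric = transform(cleaned, "drop_non_numeric", lambda xs: [int(s) for s in xs if s.isdigit()])
total = transform(numeric, "sum", sum)
```

If `total` looks wrong downstream, its lineage shows that a cleaning step dropped a non-numeric entry, which is exactly the kind of upstream cause the mechanism is meant to expose.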
Source-of-Truth Registry implements the Authoritative State Relation by documenting which source is authoritative for each claim, field, or decision context. It supports this archetype, but choosing that authority in the first place is the work of the distinct Source-of-Truth Assignment archetype.
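In its simplest form, a registry is a lookup from claim to authoritative system that refuses to guess when the answer is missing. The Python sketch below is hypothetical: the claim names and the system names `crm`, `ledger`, and `case_system` are invented for illustration.

```python
# Hypothetical registry: which system is authoritative for which claim.
AUTHORITY = {
    "customer.address": "crm",
    "customer.balance": "ledger",
    "customer.eligibility": "case_system",
}


def authoritative_value(claim: str, observations: dict[str, object]) -> object:
    """Return the value reported by the registered authority; refuse to guess otherwise."""
    owner = AUTHORITY.get(claim)
    if owner is None:
        raise LookupError(f"no registered authority for {claim!r}")
    if owner not in observations:
        raise LookupError(f"authority {owner!r} has not reported {claim!r}")
    return observations[owner]


# Two systems disagree about a balance; the registry says which one settles the claim.
observations = {"crm": 120, "ledger": 95}
balance = authoritative_value("customer.balance", observations)
```

Raising instead of falling back to another system keeps an unregistered or unreported claim visible as a gap, rather than quietly promoting whichever value happens to be available.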
Reconciliation Workflow compares records or systems, classifies discrepancies, resolves conflicts, and records correction decisions. It implements the repair side of this archetype.
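A reconciliation pass can be sketched as a comparison that turns every discrepancy into a recorded decision. In the Python sketch below, the decision rule is a hypothetical example and not a recommendation. A record missing from the replica is repaired, a record the authority does not know is quarantined, and a value mismatch is accepted within a tolerance or escalated beyond it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    record_id: str
    kind: str      # classification of the discrepancy
    action: str    # repair | escalate | accept | quarantine


def reconcile(authority: dict[str, int], replica: dict[str, int], tolerance: int = 0) -> list[Decision]:
    """Compare replica against authority and turn every mismatch into an explicit decision."""
    decisions = []
    for rid in sorted(authority.keys() | replica.keys()):
        if rid not in replica:
            decisions.append(Decision(rid, "missing_in_replica", "repair"))
        elif rid not in authority:
            decisions.append(Decision(rid, "unknown_to_authority", "quarantine"))
        elif authority[rid] != replica[rid]:
            gap = abs(authority[rid] - replica[rid])
            action = "accept" if gap <= tolerance else "escalate"
            decisions.append(Decision(rid, "value_mismatch", action))
    return decisions
```

The returned list doubles as the documentation of divergence. If the same kind of decision keeps recurring, that pattern is a signal to fix the upstream source rather than keep reconciling it.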
Backup and Restore Verification implements recovery by proving that restoration is possible and that restored data still satisfies integrity invariants.
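The Python sketch below uses the `sqlite3` module's `Connection.backup` method as a stand-in for restoring from a backup. It then checks the restored copy against example invariants instead of trusting it because the copy succeeded. The ledger schema, the control-total invariant, and the helper names `build_ledger` and `check_invariants` are assumptions made for illustration.

```python
import sqlite3


def build_ledger() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE control_total (total INTEGER NOT NULL);
        INSERT INTO account VALUES ('a', 70), ('b', 30);
        INSERT INTO control_total VALUES (100);
    """)
    return conn


def check_invariants(conn: sqlite3.Connection) -> list[str]:
    """Example invariants a restored copy must satisfy before it is trusted."""
    problems = []
    (total,) = conn.execute("SELECT COALESCE(SUM(balance), 0) FROM account").fetchone()
    (expected,) = conn.execute("SELECT total FROM control_total").fetchone()
    if total != expected:
        problems.append(f"balances sum to {total}, control total is {expected}")
    (negatives,) = conn.execute("SELECT COUNT(*) FROM account WHERE balance < 0").fetchone()
    if negatives:
        problems.append(f"{negatives} negative balance(s)")
    (status,) = conn.execute("PRAGMA integrity_check").fetchone()
    if status != "ok":
        problems.append(f"sqlite integrity_check: {status}")
    return problems


primary = build_ledger()
restored = sqlite3.connect(":memory:")
primary.backup(restored)            # stands in for "restore from backup"
print(check_invariants(restored))   # an empty list means the restore passed these checks
```

If the primary is corrupted before the backup is taken, the restore succeeds mechanically but the invariant check fails. That is the "backup preserves a corrupted baseline" failure mode made visible.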
Transactional Write Control protects integrity during state changes by reducing partial writes, duplicate effects, and inconsistent intermediate states.
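The Python sketch below demonstrates an atomic write using `sqlite3`, where the connection's context manager commits on success and rolls back on an exception. The `account` table and its `CHECK` constraint are hypothetical. The credit is applied before the debit on purpose, so a failing debit shows that the earlier partial write is undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 50), ("b", 0)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move funds atomically: both updates commit together or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes it
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the earlier credit was rolled back too.
        return False
```

This covers partial writes only. Guarding against duplicate effects, such as the same transfer being submitted twice, would need something extra, for example a uniqueness constraint on a request identifier.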
Integrity Anomaly Monitoring detects suspicious values, unexpected changes, missing records, impossible totals, duplication spikes, or stale data. It becomes meaningful only when alerts feed an exception or repair channel.
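The Python sketch below checks a batch of records for three of the signals listed above: duplicates, stale data, and an impossible total. `Txn` and `scan` are hypothetical. The scan only produces alerts, and routing them into an exception or repair channel is left to the caller, in line with the point that monitoring matters only when it feeds one.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Txn:
    txn_id: str
    amount: int
    updated_at: datetime


def scan(txns: list[Txn], now: datetime, max_age: timedelta, expected_total: int) -> list[tuple[str, str]]:
    """Return (subject, reason) alerts; each is meant to be routed to an exception queue."""
    alerts = []
    for txn_id, count in Counter(t.txn_id for t in txns).items():
        if count > 1:
            alerts.append((txn_id, f"duplicated {count} times"))
    for t in txns:
        if now - t.updated_at > max_age:
            alerts.append((t.txn_id, "stale"))
    total = sum(t.amount for t in txns)
    if total != expected_total:
        alerts.append(("batch", f"total {total} != expected {expected_total}"))
    return alerts


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
batch = [
    Txn("t1", 10, datetime(2024, 1, 9, tzinfo=timezone.utc)),
    Txn("t1", 10, datetime(2024, 1, 9, tzinfo=timezone.utc)),   # duplicate delivery
    Txn("t2", 5, datetime(2024, 1, 1, tzinfo=timezone.utc)),    # not updated for nine days
]
alerts = scan(batch, now, max_age=timedelta(days=2), expected_total=15)
```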
Parameter / Tuning Dimensions
Key tuning dimensions include the strictness of validation, acceptable lag between related systems, level of provenance detail, retention duration, review depth for exceptions, recovery-point tolerance, recovery-time tolerance, audit granularity, and the scope tier assigned to different records.
The main design challenge is proportionality. Over-control makes systems slow, brittle, expensive, and privacy-invasive. Under-control lets corruption and disagreement propagate until downstream decisions fail. The correct setting depends on decision stakes, reversibility, detectability, repair cost, and the rights or safety interests affected.
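Proportionality can be made explicit by giving each record tier its own settings instead of one global policy. The Python sketch below is entirely hypothetical. `IntegrityTier`, `TIERS`, and `within_lag` are invented names, and the example values stand in for the stakes, reversibility, and repair-cost analysis the text calls for. `within_lag` shows one tuning dimension in use: a consistency relation with a tolerated lag window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IntegrityTier:
    """Hypothetical per-tier settings; real values depend on stakes, reversibility, and repair cost."""
    name: str
    max_replication_lag: timedelta   # tolerated lag window between related systems
    recovery_point: timedelta        # how much recent change may be lost in a restore
    recovery_time: timedelta         # how long restoration may take
    exception_review: str            # review depth for flagged records


# Illustrative values only.
TIERS = {
    "payments": IntegrityTier("payments", timedelta(seconds=5), timedelta(0), timedelta(hours=1), "two-person"),
    "analytics": IntegrityTier("analytics", timedelta(hours=24), timedelta(days=1), timedelta(days=3), "sampled"),
}


def within_lag(tier: IntegrityTier, primary_updated: datetime, replica_updated: datetime) -> bool:
    """The replica may trail the primary by up to the tier's window, but never lead it."""
    lag = primary_updated - replica_updated
    # A replica newer than its primary violates the ordering relation, so it fails too.
    return timedelta(0) <= lag <= tier.max_replication_lag
```

The same 30-second lag fails the payments tier and passes the analytics tier, which is the intended effect of matching controls to the consequences of failure.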
Invariants to Preserve
The primary invariants are accuracy, consistency, authorized change, provenance continuity, freshness for time-sensitive data, recoverability to a known-good state, and explicit exception status for unresolved records.
These invariants should be phrased in terms of use. A record may be accurate enough for trend analysis but not for eligibility, payment, diagnosis, or legal evidence. Integrity preservation works best when it states what kind of trust a record must support.
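Phrasing invariants in terms of use can be sketched as per-use requirements checked against the same record. In this Python illustration, `UseRequirement`, `REQUIREMENTS`, `Observation`, and `fit_for` are hypothetical, and the thresholds are invented. One pending, unsourced observation turns out to be acceptable for trend analysis but not for payment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class UseRequirement:
    max_age: timedelta
    require_final: bool
    require_provenance: bool


# Hypothetical requirements: the same record can be good enough for one use and not another.
REQUIREMENTS = {
    "trend_analysis": UseRequirement(timedelta(days=90), require_final=False, require_provenance=False),
    "payment": UseRequirement(timedelta(days=1), require_final=True, require_provenance=True),
}


@dataclass(frozen=True)
class Observation:
    value: float
    observed_at: datetime
    status: str               # e.g. "pending", "final", "corrected"
    source: Optional[str]     # provenance; None when unknown


def fit_for(use: str, obs: Observation, now: datetime) -> list[str]:
    """Return the reasons this observation cannot support the given use (empty means fit)."""
    req = REQUIREMENTS[use]
    gaps = []
    if now - obs.observed_at > req.max_age:
        gaps.append("too old")
    if req.require_final and obs.status not in {"final", "corrected"}:
        gaps.append("not final")
    if req.require_provenance and not obs.source:
        gaps.append("no provenance")
    return gaps
```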
Target Outcomes
A well-designed integrity-preservation system makes trusted state visible and maintainable. Users can tell whether a record is final, pending, corrected, stale, suspect, disputed, or restored. Errors are detected earlier. Discrepancies are repaired through explicit authority and reconciliation rules. Recovery restores trustworthy state rather than merely restoring stored data.
The deeper outcome is institutional confidence: people can rely on records without pretending that storage, logging, or validation alone creates truth.
Tradeoffs
Integrity controls consume time, storage, attention, and review capacity. Strong validation can reject unusual but legitimate cases. Detailed logging and provenance improve accountability but can create privacy and retention risks. Centralized authority simplifies conflict resolution but can become brittle when reality is distributed. Reconciliation improves consistency but can become a permanent patch for upstream defects.
The archetype therefore requires governance judgment, not only technical controls. More integrity machinery is not always better; the controls should be matched to the consequences of data failure.
Failure Modes
A common failure mode is integrity theater, where dashboards, logs, schemas, or certifications exist but do not change how records are trusted, disputed, repaired, or recovered. Another is the undefined invariant, where everyone wants accuracy but no one specifies accuracy relative to what source, time, or use.
Other failures include silent authority conflict, provenance stripped during transformation, stale truth, backups that preserve corrupted baselines, audit logs that miss real mutation paths, and reconciliation used as a permanent workaround instead of a signal that upstream design needs repair.
Neighbor Distinctions
This archetype is close to Source-of-Truth Assignment, but that neighbor chooses authority while Data Integrity Preservation protects the trustworthy state of records across their lifecycle. It is close to Traceability Linking, but traceability is only one part of integrity preservation. It is close to Conservation Accounting, but conservation accounting tracks a single conserved quantity across transformations, whereas this archetype protects the trustworthiness of records as a whole. It is close to Transactional Atomicity, but atomic commits are only one mechanism for preserving data integrity during mutation.
It also borders Invariant Guarding, Correspondence Validation, Versioned Evolution, and Reconciliation After Drift. The practical boundary is the center of gravity: if the work is preserving trustworthy data state, use this archetype; if the work is authority selection, old-new overlap validation, version strategy, or post-drift repair, use the more specific neighbor.
Variants and Near Names
Important variants include Record Integrity Preservation, where the protected object is an official or evidentiary record; Provenance Integrity Preservation, where origin and transformation history are the main integrity property; Cross-System Consistency Preservation, where the primary risk is divergence among representations; Transactional Integrity Preservation, where the vulnerable moment is a write or commit; and Recovery Integrity Preservation, where the concern is restoring trusted state after failure.
Near names include data correctness preservation, corruption prevention, integrity controls, data quality control, record integrity, and provenance preservation. These should usually point back to this archetype or one of its variants unless the case is really a mechanism such as a checksum, audit log, schema, backup, or access-control list.
Cross-Domain Examples
In banking, data integrity preservation prevents duplicate charges, partial writes, unreconciled balances, and unauditable corrections. In healthcare, it distinguishes final results from pending or corrected results while preserving attribution. In research, it keeps datasets interpretable by preserving source metadata and transformations. In government services, it protects eligibility and case records from unauthorized or unexplained changes. In supply chains, it keeps digital custody and quantity records aligned with physical goods.
The same structure transfers because each domain has records that must remain trustworthy through movement, change, and recovery.
Non-Examples
A checksum alone is not Data Integrity Preservation. It can prove that bytes did not change, but not that the bytes were right. An audit log alone is not the archetype because a log can record bad or incomplete events. A dashboard alone is not the archetype because display is not preservation. A one-time cleanup is not the archetype unless it creates durable rules for preserving integrity afterward.
A debate over which record should be official is usually Source-of-Truth Assignment. A test showing that a new system behaves like an old one is usually Correspondence Validation. A conservation-of-total accounting exercise is Conservation Accounting.