Stationarity Validation¶

Check whether the assumptions that made past data or behavior predictive still hold before extrapolating.

Essence¶

Stationarity Validation protects decisions from stale assumptions. It asks whether the pattern, baseline, model behavior, measurement process, or policy context that made past evidence useful is still stable enough to support the current use. The archetype is not the statistical idea of stationarity itself; it is the intervention that makes assumed stability explicit, testable, auditable, and actionable.

In practice, this means naming the specific property assumed stable, comparing present conditions against a declared baseline, deciding whether observed differences are noise or material drift, and changing the decision rule when the old assumption no longer holds.

Compression statement¶

When decisions depend on historical patterns, Stationarity Validation names the property assumed stable, compares current conditions against a declared baseline, detects drift or regime change, and updates the model, forecast, rule, or policy before stale assumptions cause bad decisions.

Canonical formula: stationarity_assumption + stable_property_definition + baseline_reference_window + drift_indicator + regime_change_criterion + recalibration_rule + monitoring_cadence + assumption_status_record -> valid_or_limited_extrapolation

When to Use This Archetype¶

Use this archetype when a decision depends on history: a forecast trained on past data, a dashboard using old thresholds, a staffing model based on old demand, a policy built around old behavioral assumptions, or a research comparison across time. It is especially useful when the cost of false stability is high: models misclassify people, forecasts misallocate resources, policies become harmful, alerts go quiet, or dashboards continue to project confidence after the world has changed.

It is weaker when no extrapolation is being made, when a new baseline is already being built from scratch, or when the only issue is corrupted data rather than comparability across time or context.

Structural Problem¶

The structural problem is unvalidated extrapolation. A system treats past behavior as predictive because the past was once useful, but the underlying process may no longer be the same. The old baseline can persist through dashboards, operating procedures, contracts, machine-learning models, policy thresholds, or institutional habit. Because the artifact still looks authoritative, users may not notice that the conditions that made it valid have changed.

The key risk is not merely error. It is confidence without comparability: decisions appear evidence-based while quietly relying on a historical regime that no longer describes the present.

Intervention Logic¶

Stationarity Validation first identifies the extrapolation being made. It then defines the property assumed stable, sets the baseline reference window, chooses drift indicators, and establishes a criterion for when change becomes material. When the assumption fails or becomes uncertain, the system does not simply keep using the old baseline. It recalibrates, retrains, segments, relabels confidence, pauses use, or limits scope.

The intervention should leave a record: what was assumed, what was checked, what changed, how the decision was revised, and when the assumption should be checked again.

Key Components¶

Stationarity Validation protects decisions from stale assumptions by making assumed stability explicit, testable, auditable, and actionable. The pattern starts with the Stationarity Assumption, which states exactly what is being treated as stable enough for past observations to remain predictive — the property, the population or process it applies to, the time span, and the decision that depends on it. The Stable Property Definition sharpens this further by naming the specific rate, distribution, variance, relationship, baseline behavior, input mix, or measurement process whose stability matters, replacing loose claims that "things are normal" with something testable. The Baseline Reference Window declares the historical period, cases, cohorts, regions, or operating conditions used for comparison, since a stale or contaminated window can make a changed regime look stationary or make ordinary variation look like drift. Drift Indicator provides the signals — quantitative, qualitative, operational, or contextual — that the generating process or measurement conditions may have shifted, tied directly to the stable property rather than firing as generic alerts.

The remaining components convert detection into governed response. The Regime Change Criterion defines when observed drift becomes material enough to treat the prior baseline as no longer valid, combining thresholds, expert review, statistical tests, and known contextual breaks to prevent both overreaction to noise and underreaction to structural change. The Recalibration Rule specifies what to update, pause, constrain, retrain, re-estimate, or revalidate when stationarity fails — because detection without consequence is theater. Monitoring Cadence sets when checks happen, whether continuously, periodically, at known change points, after incidents, or before high-stakes extrapolation. The Assumption Status Record makes the assumption's validity auditable by marking it current, questionable, failed, revised, or out of scope, especially important when models, reports, dashboards, or policies are reused by people who did not perform the validation. The Segment-Specific Baseline separates baselines for subpopulations, regions, products, or workflows that may drift differently, catching subgroup change that aggregate stability would hide. Finally, the Scope Limit or Pause Rule restricts use of a model, forecast, rule, or policy while stationarity is uncertain or under review, preventing an invalid baseline from creating harm before recalibration is complete.

Component	Description
Stationarity Assumption ↗	States exactly what is being assumed stable enough for past observations to remain predictive or governable. The assumption should name the property, the population or process it applies to, the time span, and the decision that depends on it. Without this component, stationarity validation collapses into vague caution about change.
Stable Property Definition ↗	Defines the property whose stability matters, such as a rate, distribution, variance, relationship, baseline behavior, input mix, measurement process, or decision consequence. The property must be observable or testable enough to support action. It should not be a loose claim that “things are normal”; it should specify what normal means for this decision context.
Baseline Reference Window ↗	Specifies the historical period, cases, cohorts, regions, or operating conditions used as the comparison baseline. A baseline is only useful when its scope is declared. A stale or contaminated window can make a changed regime look stationary or make ordinary seasonal variation look like drift.
Drift Indicator ↗	Provides signals that the generating process, input mix, measurement conditions, or outcome relationship may have shifted. Drift indicators can be quantitative, qualitative, operational, or contextual. They should be tied to the stable property definition and not treated as generic alerts.
Regime Change Criterion ↗	Defines when observed drift becomes material enough to treat the prior baseline, model, forecast, or policy as no longer valid without revision. This criterion prevents both overreaction to noise and underreaction to structural change. It may combine thresholds, expert review, statistical tests, and known contextual breaks.
Recalibration Rule ↗	Specifies what to update, pause, constrain, retrain, re-estimate, or revalidate when stationarity no longer holds. Stationarity validation is incomplete if it only detects drift. The system needs a rule that connects failed validation to decision change, model revision, policy adjustment, or scope limitation.
Monitoring Cadence ↗	Sets when stationarity will be checked: continuously, periodically, at known change points, after incidents, or before high-stakes extrapolation. Cadence should reflect decision velocity, cost of false stability, availability of data, and how quickly the underlying process can change.
Assumption Status Record ↗	Records whether the stationarity assumption is current, questionable, failed, revised, or out of scope for the intended decision. The record makes assumption validity auditable. It is especially important where models, reports, dashboards, or policies are reused by people who did not perform the validation.
Segment-Specific Baseline ↗	Separates baselines for subpopulations, regions, products, channels, instruments, or workflows that may drift differently. Useful when aggregate stability hides subgroup change or when local regimes differ materially from the pooled baseline.
Scope Limit or Pause Rule ↗	Limits the use of a model, forecast, rule, or policy while stationarity is uncertain or under review. Useful in high-stakes contexts where applying an invalid baseline may create harm before recalibration is complete.

Common Mechanisms¶

The mechanisms below are ways to implement the archetype. They should not be confused with the archetype itself. A test, dashboard, or chart can reveal possible nonstationarity, but Stationarity Validation also requires declared assumptions, decision consequences, ownership, and recalibration logic.

Mechanism	Description
Stationarity Test ↗	Tests whether the process that generated past lifetimes is still the same process, the precondition for treating survival so far as evidence about survival ahead.
Model Drift Monitoring ↗	Watches a live predictor for the slow slide where yesterday's model quietly stops fitting today's world — before the residuals it suppresses start hiding real change.
Process Control Chart ↗	This is a metric_or_dashboard that implements Stationarity Validation by helping determine whether past evidence still applies. Visualizes whether process measurements remain within expected bounds or show patterns that suggest special-cause variation. A control chart helps distinguish ordinary noise from meaningful process change; it is not the full archetype unless connected to baseline ownership and revision rules.
Forecast Backtesting ↗	Replays a predictor against withheld history — across time, segments, and regimes — to earn or deny the right to suppress its residuals.
Change-Point Detection ↗	Flags the moment the target jumps to a new regime — an abrupt discontinuity the current tracking mode can no longer follow — so the loop switches modes instead of chasing a break as if it were noise.
Rolling Window Comparison ↗	Quantifies how much the target, state, and error distributions have drifted by comparing a recent window against earlier ones — turning gradual staleness into a measured magnitude rather than a yes/no event.
Baseline Validation Review ↗	This is a procedure that implements Stationarity Validation by helping determine whether past evidence still applies. Periodically reviews whether a baseline remains relevant before it is reused for targets, alerts, quotas, forecasts, audits, or policy decisions. This mechanism is common in operations and governance settings where the risk is stale baseline reuse rather than formal statistical modeling error.
Policy Assumption Audit ↗	This is a procedure that implements Stationarity Validation by helping determine whether past evidence still applies. Checks whether the environmental and behavioral assumptions behind a rule, policy, or governance process still hold. This mechanism brings stationarity validation into institutional and policy work, where the “data-generating process” may be social, legal, operational, or organizational.

Parameter / Tuning Dimensions¶

Important tuning dimensions include the baseline window length, drift sensitivity, monitoring frequency, segment granularity, recalibration aggressiveness, and uncertainty tolerance. A short baseline may adapt quickly but mistake noise for change. A long baseline may support continuity but import stale conditions. Aggregate validation may be efficient, but segment-level validation may be necessary when subgroup drift creates harm or material error.

The right tuning depends on decision stakes, change velocity, data quality, and how costly it is to pause or revise the decision rule.

Invariants to Preserve¶

The archetype preserves several invariants. The status of the stationarity assumption should be visible. Baselines should be used only where past and present conditions remain comparable enough. Models and forecasts should remain valid within their declared scope. Measurement continuity should be checked before observed changes are interpreted as real-world changes. Recalibration should be traceable so a new baseline does not erase the history of why the old one stopped applying.

Target Outcomes¶

A successful Stationarity Validation intervention produces fewer stale-model decisions, earlier detection of process change, better-calibrated forecasts and policies, clearer uncertainty labels, and safer transitions between old and new operating regimes. It also improves institutional memory because later users can see which assumptions were current, uncertain, failed, or revised.

Tradeoffs¶

The main tradeoff is validation overhead versus stale-assumption risk. More checking can slow decisions, but less checking can create confident errors. Another tradeoff is sensitivity versus noise: early drift detection is valuable, but over-sensitive validation can constantly reset baselines. There is also a continuity-versus-adaptation tension. Keeping a stable baseline supports long-term comparison, while revising it supports current fit.

Failure Modes¶

Common failure modes include false stability, where old baselines are trusted because they are familiar; false regime shift, where ordinary noise is mistaken for structural change; test without consequence, where stationarity tests are run but no decision changes; aggregate masking, where overall stability hides subgroup drift; measurement drift misread as system drift; perpetual baseline reset; and validation theater, where review rituals do not alter practice.

The usual mitigation is to tie every validation mechanism to a named stable property, drift criterion, owner, recalibration rule, and assumption-status record.

Neighbor Distinctions¶

Stationarity Validation is distinct from Adaptive Fit Monitoring, which asks whether the whole system design still fits its environment. It is distinct from Adaptive Threshold Recalibration, which changes thresholds after conditions change; stationarity validation may justify threshold changes, but it is not merely threshold tuning. It is distinct from State Estimation, which infers the current hidden state; stationarity validation asks whether the inference rule remains valid. It is distinct from Correspondence Validation, which checks whether a new model or system preserves old valid behavior in an overlap domain. It is also distinct from Data Integrity Preservation, which protects data accuracy and provenance; stationarity validation assumes integrity may be necessary but asks a different question about comparability and extrapolation.

The second-wave candidate Uniformity Assumption Audit should remain under review. It may broaden the pattern from time-based stability to cross-context transfer across places, cases, and historical periods.

Cross-Domain Examples¶

In machine learning, a fraud model is revalidated after attacker behavior and transaction channels shift. In operations, a warehouse revalidates productivity baselines after automation changes the task mix. In public policy, eligibility thresholds are reviewed when economic conditions change the assumptions behind them. In analytics, conversion-rate trends are qualified after logging definitions change. In healthcare operations, triage and wait-time baselines are rechecked when patient mix and protocols shift.

Across all these domains, the same structure appears: do not reuse historical patterns until the stability assumption behind the reuse has been checked.

Non-Examples¶

A one-off statistical stationarity test with no decision consequence is not this archetype. A dashboard that merely displays data drift is not the archetype unless it is tied to assumption status and recalibration. Routine model retraining is not enough if it never asks what changed. A pure data-corruption incident is primarily a data-integrity problem. Creating a first baseline for a new process is baseline construction, not validation of an existing stationarity assumption.

Abstractions this archetype builds on — directly (a source ingredient) or as a related pattern. Links follow the typed catalog namespace.

Built directly on (2)

Foreseeing (Prediction): Predict future states.
Stationarity: Stable statistical properties.

Also references 14 related abstractions

Adaptation: Systems adjust to conditions.
Bayesian Updating: Update beliefs with evidence.
Confidence Intervals: Range of plausible values.
Continuity: Smooth change without jumps.
Continuity vs. Rupture: Gradual vs abrupt change.
Data Integrity: Accuracy and consistency preserved.
Environmental Scanning: Analyze external factors.
Feedback: Outputs influence inputs.
Hypothesis Testing (Null vs. Alternative): Null vs alternative evaluation.
Observability: Infer internal state externally.

▸ Show 4 more

Variants¶

Narrower or domain-specific specializations that share this archetype's core structure. Recognized variants are established; candidate variants are provisional.

Model Stationarity Validation · domain variant · recognized

Validates that the assumptions behind a predictive, statistical, or machine-learning model still hold for current use.

Distinct from parent: The parent can validate any historical-to-present extrapolation; this variant focuses on models and forecasting systems.
Use when: A model is trained or calibrated on past data and used for current or future decisions; Inputs, populations, measurement conditions, or target relationships may have shifted; The cost of applying a stale model is high enough to require explicit validation.
Typical domains: machine learning, risk scoring, demand forecasting, clinical prediction, fraud detection
Common mechanisms: model drift monitoring, forecast backtesting, rolling window comparison

Operational Baseline Validation · domain variant · recognized

Checks whether operational baselines, alert thresholds, capacity assumptions, or performance norms remain valid after process or context changes.

Distinct from parent: The parent is general; this variant specializes in operational metrics and process control contexts.
Use when: Targets, alerts, service levels, staffing plans, or performance comparisons rely on an older operating baseline; Changes in tools, demand, staff, supply, seasonality, or workflow may have changed what “normal” means; Teams need to avoid both stale-baseline complacency and false alarms from healthy process change.
Typical domains: manufacturing, customer support, healthcare operations, site reliability, logistics
Common mechanisms: process control chart, baseline validation review, rolling window comparison

Policy Assumption Revalidation · governance variant · recognized

Rechecks whether a policy or rule still rests on valid assumptions about behavior, context, incentives, risks, and institutional capacity.

Distinct from parent: The parent includes technical and statistical uses; this variant emphasizes institutional review and policy maintenance.
Use when: A rule or policy was designed under conditions that may no longer hold; The policy is repeatedly extended, reused, or automated without assumption review; The domain has changed through regulation, technology, social behavior, threat patterns, or resource availability.
Typical domains: public policy, compliance, risk governance, workplace policy, education administration
Common mechanisms: policy assumption audit, baseline validation review

Measurement Stationarity Check · implementation variant · recognized

Checks whether the measurement process itself remains comparable before interpreting observed changes as real system change.

Distinct from parent: The parent validates stationarity of decision assumptions broadly; this variant focuses on whether observations remain comparable.
Use when: Instruments, sensors, survey methods, logging pipelines, definitions, or coding practices may have changed; Observed trends could be artifacts of measurement drift rather than changes in the underlying system; Decisions require comparing data collected under different measurement conditions.
Typical domains: sensor networks, survey research, analytics logging, medical testing, audit sampling
Common mechanisms: rolling window comparison, baseline validation review, change point detection

Regime Shift Gate · temporal variant · candidate

Adds a decision gate when evidence suggests the system has crossed into a materially different operating regime.

Distinct from parent: The parent validates stationarity generally; this variant focuses on decisive boundaries where baseline use must be paused or segmented.
Use when: A known event, threshold, shock, intervention, or environmental change may have changed the generating process; Historical baselines may still be informative but cannot be reused without a regime-specific decision; The main risk is applying old rules after a structural break.
Typical domains: crisis management, market forecasting, policy evaluation, security operations, epidemiology
Common mechanisms: change point detection, baseline validation review, policy assumption audit

Near names: Stationarity Check, Stationarity Test, Baseline Validation, Model Drift Monitoring, Model Drift Dashboard, Process Control Chart, Regime Change Test, Forecast Backtesting, Assumption Stability Audit.