Circuit Breaker
Intent
Preserve the viability of a coupled system under active cascading failure by severing an overloaded upstream component from the callers that load it, and by reducing the aggregate flow into that component until it can stabilize. The archetype does not optimize for throughput or fairness; it trades both for survival.
Primes
Composed of: Boundary, Feedback, Sampling (Representativeness)
Structural Signature
This archetype is applicable when all of the following are present:
- A flow (requests, current, capital, traffic) is moving from one part of the system to another.
- An upstream or shared component is reaching a capacity constraint: utilization is climbing, latency is spiking, the error rate is rising, or some equivalent saturation signal is present.
- There is coupling such that the failing component's degradation is being transmitted to other components rather than absorbed locally, turning a local problem into a cascade.
When those three signals co-occur, this archetype is often the right composition to reach for.
Composition
This archetype is assembled from three prime abstractions:
- Boundary — A controllable seam is introduced between the failing component and its callers. When the seam is "open," flow is blocked; when "closed," flow resumes. The boundary is the mechanism by which local failure is prevented from propagating.
- Sampling — Instead of admitting all inbound flow, the system lets only a representative fraction through. This is load shedding at the population level: some requests are sacrificed so that the remainder can be served.
- Feedback — The boundary and sampling rate are not static. They are continuously adjusted in response to an observed signal (latency, error rate, queue depth) coming back from the protected component. Without feedback, the archetype is a kill switch, not a breaker.
The feedback loop is what distinguishes a circuit breaker from a kill switch, which is in effect a self-imposed outage. The breaker opens when saturation is detected, re-probes cautiously once the protected component shows signs of recovery, and closes only after those probes succeed, rather than restoring full load instantly.
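The open / re-probe / close loop can be sketched as a small state machine. This is a minimal, single-threaded sketch under stated assumptions. The sensor is consecutive call failures, the threshold is a hypothetical `failure_threshold`, and recovery is probed after a fixed, hypothetical `reset_timeout`. All names are illustrative. Libraries in the Hystrix / resilience4j lineage typically use rolling-window failure rates and also limit how many probe calls run concurrently. The sketch covers the Boundary and Feedback primes only, not Sampling.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # boundary closed: flow passes through
    OPEN = "open"            # boundary open: calls are rejected immediately
    HALF_OPEN = "half_open"  # cautious re-probe: the next call is let through as a test


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical single-threaded breaker; names and defaults are illustrative."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open before probing
        self.clock = clock                          # injectable so the sketch can be tested
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = State.HALF_OPEN
            else:
                # Clean, predictable rejection instead of queueing the call.
                raise CircuitOpenError("circuit open; call rejected")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure()  # sensor: a failed call
            raise
        self._record_success()
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed probe, or too many consecutive failures, (re)opens the breaker.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()

    def _record_success(self):
        # Any success, including a successful probe, closes the breaker.
        self.failures = 0
        self.state = State.CLOSED
```

In use, a caller wraps the protected dependency. For example, it calls `breaker.call(fetch_profile, user_id)`, where `fetch_profile` is a hypothetical client function. It then maps `CircuitOpenError` to a fast fallback or error response. Injecting the clock keeps the state machine deterministic under test.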
Worked Example
A Site Reliability Engineer facing a cascading failure in a distributed database cannot deploy new code (the pipeline is too slow and the risk of new bugs is too high). They assemble an intervention using the materials at hand:
- Boundary: activate a circuit breaker to sever the connection between the web tier and the database.
- Sampling: enable load shedding to drop 50% of incoming traffic.
- Feedback: watch the latency metrics to see whether the system stabilizes, and relax the shedding as it does.
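The sampling and feedback steps of the worked example can be sketched as a probabilistic load shedder. The 50% starting fraction comes from the example. Everything else is an assumption for illustration: the `LoadShedder` name, every parameter, and the AIMD-style update rule. That rule halves the admit fraction when observed latency exceeds a target and raises it by a fixed step otherwise.

```python
import random


class LoadShedder:
    """Hypothetical sketch: admit a random fraction of requests and adjust
    that fraction from observed latency (AIMD-style rule, an assumption)."""

    def __init__(self, target_latency_ms, admit_fraction=0.5,
                 min_fraction=0.05, step_up=0.05, backoff=0.5, rng=None):
        self.target_latency_ms = target_latency_ms
        self.admit_fraction = admit_fraction  # start by shedding 50%, as in the example
        self.min_fraction = min_fraction      # never shed everything; keep a trickle to observe
        self.step_up = step_up                # additive increase while healthy
        self.backoff = backoff                # multiplicative decrease while saturated
        self.rng = rng if rng is not None else random.Random()

    def admit(self):
        # Sampling: each request is admitted independently with probability
        # admit_fraction, so admitted traffic is a representative slice.
        return self.rng.random() < self.admit_fraction

    def observe(self, latency_ms):
        # Feedback: tighten quickly under saturation, relax slowly on recovery.
        if latency_ms > self.target_latency_ms:
            self.admit_fraction = max(self.min_fraction, self.admit_fraction * self.backoff)
        else:
            self.admit_fraction = min(1.0, self.admit_fraction + self.step_up)
```

A request handler would call `admit()` once per request and reject non-admitted requests with a fast, predictable error. It would call `observe()` once per metrics interval, passing, for example, that interval's p99 latency. The asymmetry between fast backoff and slow recovery is a design choice that mirrors the cautious re-probing described above.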
The system recovers. The solution was not optimal — some users were dropped — but it was viable. It was constructed in minutes using universal structural concepts applied to a specific technological emergency.
The SRE did not rely on a memorized incident runbook; they recognized the structural signature (flow / capacity / coupling) and assembled a response from it.
Invariants to Preserve
When deploying this archetype, name what must remain true even as service is degraded. Common invariants:
- Data integrity cannot be compromised (no partial writes, no silent drops of already-acknowledged work).
- The system must be able to re-probe and recover without human intervention once the protected component stabilizes.
- Rejected flow must be rejected cleanly (with a predictable error) rather than queued indefinitely.
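One way to honor the clean-rejection invariant in-process is to bound the backlog and fail fast when it is full. The following is a minimal sketch using the standard library's `queue.Queue`. The `BoundedIntake` and `Overloaded` names and the backlog size are hypothetical. Mapping `Overloaded` to a specific wire-level error, such as HTTP 503, is left to the caller.

```python
import queue


class Overloaded(Exception):
    """Predictable, typed rejection that callers can map to a concrete error."""


class BoundedIntake:
    """Hypothetical intake that refuses work instead of queueing it indefinitely."""

    def __init__(self, max_pending):
        # maxsize bounds the backlog, so the queue can never grow without limit.
        self._q = queue.Queue(maxsize=max_pending)

    def submit(self, item):
        try:
            self._q.put_nowait(item)  # never blocks, never exceeds the bound
        except queue.Full:
            raise Overloaded(f"backlog full ({self._q.maxsize} pending)") from None

    def take(self):
        return self._q.get_nowait()
```

The bound is the design choice that matters. An unbounded queue converts overload into ever-growing latency, which is exactly the indefinite queueing the invariant rules out. This sketch does not address the data-integrity invariant: work accepted into the queue still has to be processed or explicitly failed.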
If any of these invariants cannot be held, this archetype is the wrong composition and a different one is needed.
Failure Modes
- Cargo Culting — Applying "Circuit Breaker" as a vibe rather than a mechanism. If the sensor, the threshold, and the action taken when the threshold is crossed are not concretely named, there is no feedback loop, only a hope.
- Overfitting — Forcing this archetype onto a situation where the real problem is not flow-plus-capacity-plus-coupling but something else (e.g., a data corruption issue, a correctness bug, or an authorization failure). The breaker will trigger on the wrong signal and hide the real fault.
- Static Thresholds — Treating the breaker as stateless. Without hysteresis, the breaker will oscillate between open and closed as the system hovers at saturation, which can be worse than having no breaker at all.
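The difference hysteresis makes can be shown with two thresholds instead of one. The gate opens when the error rate rises above a high mark. It closes only when the rate falls below a lower mark, so readings in between leave the state unchanged. This is a minimal sketch. The `HysteresisGate` name, the 0.5 / 0.2 thresholds, and the sample readings are illustrative assumptions. Practical breakers often combine such a band with a minimum open duration before probing.

```python
class HysteresisGate:
    """Opens above open_threshold; closes only once the signal drops below close_threshold.

    Readings between the two thresholds leave the state unchanged, so a signal
    hovering near one value cannot flip the gate on every sample.
    """

    def __init__(self, open_threshold, close_threshold):
        if close_threshold >= open_threshold:
            raise ValueError("close_threshold must be below open_threshold")
        self.open_threshold = open_threshold
        self.close_threshold = close_threshold
        self.is_open = False

    def update(self, error_rate):
        if not self.is_open and error_rate > self.open_threshold:
            self.is_open = True
        elif self.is_open and error_rate < self.close_threshold:
            self.is_open = False
        return self.is_open


def count_flips(states):
    """Number of open/closed transitions in a sequence of states."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# Error rate hovering around 0.5, then recovering.
readings = [0.48, 0.52, 0.49, 0.51, 0.47, 0.15]
single_threshold = [r > 0.5 for r in readings]  # no hysteresis
gate = HysteresisGate(open_threshold=0.5, close_threshold=0.2)
with_hysteresis = [gate.update(r) for r in readings]
print(count_flips(single_threshold), count_flips(with_hysteresis))  # 4 2
```

With a single threshold, the state flips on almost every sample while the signal hovers near 0.5. With the band, the gate opens once and closes only after genuine recovery.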
Cross-Domain Instances
- Software (SRE) — The canonical example above. See the Netflix Hystrix / resilience4j lineage.
- Finance — Exchange-level trading halts during extreme volatility. The flow is order submission; the capacity constraint is price-discovery bandwidth; the coupling is that a disorderly market in one instrument contaminates others.
- Power Grids — Protective relays on transmission lines and transformers. The flow is current; the capacity constraint is thermal limits; the coupling is that an overloaded line can cascade to adjacent infrastructure.
- Human Systems — An on-call escalation policy that takes a person off rotation after repeated pages. The flow is incidents; the capacity constraint is cognitive capacity; the coupling is that a saturated responder degrades the team's overall response quality.
Notes
(placeholder for later refinement — in particular, the cross-domain list could be expanded, and the relationship to the Fault Tolerance and Fail-Safe prime abstractions deserves its own section.)