Failure Mode and Effects Analysis (FMEA)¶

Prime #: 293
Origin domain: Engineering & Design
Also from: Organizational & Management Science, Systems Thinking & Cybernetics
Aliases: FMEA, Failure analysis, Effects analysis, Risk prioritization
Related primes: Fault Tree Analysis, Error Proofing (Poka-Yoke), Redundancy, Margin of Safety, Risk Management

Core Idea¶

FMEA is a systematic, step-by-step method for identifying potential failure modes in a product, process, or system and evaluating their causes and effects, so that designers can prioritize and mitigate the most severe or most likely failures before they occur.

How would you explain it like I'm…

Think About What Breaks

Before a big trip, a careful grown-up imagines all the things that could go wrong: flat tire, no gas, lost map, dead phone. For each one they ask, "how bad would that be? How likely is it? Would we even notice?" Then they pack a spare tire and charger for the worst, most likely, sneakiest problems. That's what engineers do for rockets and cars, but with a checklist.

Listing What Could Go Wrong

FMEA is a careful checklist engineers run *before* they build something, to list every way a part could break, why it would break, and what would happen if it did. For each failure they give three scores: how bad it is, how often it might happen, and how hard it is to catch before it hurts anyone. Multiply those scores and you get a "risk number" that says which problems to fix first. The whole point is to find scary problems on paper, when they're cheap to fix, instead of discovering them in a real crash.

Systematic Failure Audit

Failure Mode and Effects Analysis is a structured way of asking, *before deployment*, "what could go wrong, what would happen, and which problems deserve attention first?" A team walks through every component and subsystem, lists each way it could fail (a *failure mode*), traces the *cause* and the *effect* on the wider system, and scores each failure on three dimensions: severity (how bad if it happens), occurrence (how likely), and detection (how hard it would be to catch before harm, with harder-to-catch failures scoring higher). Multiply the three to get a Risk Priority Number (RPN), and you have a ranked list telling you where to spend your mitigation budget. Rooted in US military reliability procedures of the late 1940s and adopted by NASA for its crewed spaceflight programs in the 1960s, FMEA is now standard in aerospace, automotive, and medical devices.

Failure Mode and Effects Analysis (FMEA) is a structured methodology for identifying and evaluating potential failure modes in a product, process, or system before deployment. It comprises (1) exhaustive enumeration of the ways a component or subsystem can fail, (2) tracing each failure mode back to its root causes and forward to its effects on system operation and safety, (3) scoring each failure on severity (consequence to the user or mission), occurrence (likelihood), and detection (how likely the failure is to escape notice before reaching the user, so that harder-to-detect failures score higher), (4) computing a Risk Priority Number (RPN) as the product of those three scores to prioritize mitigation, and (5) designing and implementing countermeasures for high-RPN failures, then re-scoring to verify effectiveness.

The deeper commitment is *systematic exhaustiveness*: rather than design-and-hope (reactive discovery via test or field failure), FMEA requires the team to map explicitly what can go wrong, evaluate consequences upfront, and design controls before deployment. The practice converts the unbounded question "what could go wrong?" into a bounded, enumerable problem: walk through components, apply patterns from prior failures and design standards, rate each mode, and focus resources on high-impact mitigations.

The method traces back to US military reliability procedures of the late 1940s, was adopted by NASA for its crewed spaceflight programs in the 1960s, and was formalized in MIL-STD-1629A. It is now standard in automotive (AIAG-VDA Handbook), medical devices (FDA guidance), and other safety-critical domains. FMEA does not itself prevent failures; it makes failure analysis systematic and repeatable so that common modes are not overlooked and high-consequence ones receive proportional attention.
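
The scoring, ranking, and re-scoring steps can be sketched in a few lines of Python. This is a minimal illustration rather than a standard implementation: the `FailureMode` class, the `rank` helper, the 1-10 scales, and every score below are assumptions made up for the example. The detection score follows the common convention that a higher number means the failure is harder to catch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FailureMode:
    """One row of a hypothetical FMEA worksheet (scores on an assumed 1-10 scale)."""
    name: str
    severity: int    # 10 = worst consequence
    occurrence: int  # 10 = most likely
    detection: int   # 10 = hardest to detect before harm

    def __post_init__(self):
        # Reject scores outside the assumed scale.
        for field in ("severity", "occurrence", "detection"):
            value = getattr(self, field)
            if not 1 <= value <= 10:
                raise ValueError(f"{field} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: the product of the three scores."""
        return self.severity * self.occurrence * self.detection


def rank(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Illustrative scores only; a real team would assign these in review.
worksheet = [
    FailureMode("sensor reads stale value", severity=7, occurrence=4, detection=6),
    FailureMode("connector corrodes", severity=5, occurrence=3, detection=2),
    FailureMode("valve sticks closed", severity=9, occurrence=2, detection=8),
]

ranked = rank(worksheet)
for mode in ranked:
    print(f"{mode.rpn:4d}  {mode.name}")

# Step 5: after adding a countermeasure (say, a staleness check that makes the
# fault easier to detect), re-score and confirm the RPN actually dropped.
before = ranked[0]
after = replace(before, detection=2)
print(f"{before.name}: RPN {before.rpn} -> {after.rpn}")
```

With these made-up scores the stale-sensor failure ranks first (RPN 168), and lowering its detection score from 6 to 2 brings it down to 56. The numbers matter less than the habit of re-scoring after each countermeasure.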

Broad Use¶

Automotive & Aerospace: Engineers apply FMEA to each subsystem (brakes, engine, avionics) to detect critical points of failure and reduce safety risks.
Software Development: Teams analyze possible ways a feature or module could fail (e.g., input errors, load spikes), assessing impact and likelihood to guide protective measures.
Healthcare: Hospital staff might evaluate how a medication administration process could fail at each step, so that dangerous errors can be prevented.

Clarity¶

Emphasizes the proactive mindset: find and assess weaknesses up front rather than reacting post-disaster. Helps teams systematically move through each element, cause, and effect.

Manages Complexity¶

Breaks a large system down into manageable units (individual failure modes) and quantifies each one's severity, probability of occurrence, and difficulty of detection. This structure reduces the chance of confusion and oversight.
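
One way to picture this decomposition is a worksheet keyed by component, with a coverage check that flags any component nobody has analyzed yet. The sketch below is illustrative only: the component names, the failure modes, and the `unanalyzed` and `worksheet_rows` helpers are all hypothetical.

```python
# Hypothetical system breakdown: component -> failure modes identified so far.
system = {
    "pump": ["seal leaks", "motor overheats"],
    "controller": ["firmware hangs"],
    "power supply": [],  # not yet analyzed
}


def unanalyzed(breakdown):
    """Return components that have no failure modes recorded yet."""
    return sorted(name for name, modes in breakdown.items() if not modes)


def worksheet_rows(breakdown):
    """Flatten the breakdown into (component, failure_mode) rows for scoring."""
    return [(name, mode) for name, modes in breakdown.items() for mode in modes]


print("Rows to score:", worksheet_rows(system))
print("Still to analyze:", unanalyzed(system))
```

An empty list here means "not looked at yet", not "cannot fail". Making that gap visible is the point of the check.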

Abstract Reasoning¶

Demonstrates how mapping possible failure modes creates a conceptual model of risk, clarifying relationships among components, environment, and user interactions.

Knowledge Transfer¶

Public Policy: Identifying policy "failures" and their ripple effects (e.g., an unforeseen loophole) before a law is enacted.
Event Planning: Checking logistical points where an event could derail (ticketing, crowd flow, electrical supply).
Educational Assessment: Pinpointing failure modes in a curriculum design (e.g., insufficient practice tasks) that might undermine student learning.

Example¶

In car seat design, an FMEA might list failure modes like latch not securing, foam degrading, or harness tension issues, then rank each by potential injury risk to ensure the highest risks are addressed first.
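
As a rough worked version of this example, the sketch below assigns scores to the three car seat failure modes and ranks them by RPN. The scores are invented purely for illustration and do not come from any real car seat analysis. The `rpn` helper and the 1-10 scale are likewise assumptions.

```python
# Failure modes from the car seat example, with made-up scores on a 1-10 scale:
# (severity, occurrence, detection), where a higher detection score means the
# failure is harder to notice before it causes harm.
car_seat = {
    "latch not securing": (10, 3, 6),
    "foam degrading": (6, 4, 5),
    "harness tension issues": (8, 4, 5),
}


def rpn(scores):
    """Multiply severity, occurrence, and detection into a Risk Priority Number."""
    severity, occurrence, detection = scores
    return severity * occurrence * detection


# Highest RPN first: these are the failure modes to address first.
ranking = sorted(car_seat, key=lambda name: rpn(car_seat[name]), reverse=True)
for name in ranking:
    print(f"{rpn(car_seat[name]):4d}  {name}")
```

Under these assumed scores the latch failure comes out on top (RPN 180), followed by harness tension (160) and foam degradation (120).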

Relationships to Other Abstractions¶

Current abstraction Failure Mode and Effects Analysis (FMEA) Prime

Parents (2) — more general patterns this builds on

Failure Mode and Effects Analysis (FMEA) presupposes Risk Prime

FMEA ranks prospective failure modes by the severity of their consequences, their likelihood, and their detectability, all of which are dimensions of risk.
Failure Mode and Effects Analysis (FMEA) is a decomposition of Decomposition Prime

FMEA decomposes a system or process into distinct failure modes, causes, local effects, and downstream effects for separate evaluation.

Hierarchy paths (4) — routes to 4 parentless roots

Failure Mode and Effects Analysis (FMEA) → Risk → Uncertainty


Not to Be Confused With¶

Failure Mode and Effects Analysis (FMEA) is not Stakeholder Analysis because FMEA maps failure modes and their effects on system function, whereas Stakeholder Analysis identifies who is affected by, and who influences, decisions or outcomes.
Failure Mode and Effects Analysis (FMEA) is not Pareto Effect (80/20 Rule) because FMEA catalogs all failure modes and their consequences and prioritizes them by severity, occurrence, and detectability, whereas the Pareto Effect observes that a small subset of causes (roughly 20%) drives most (roughly 80%) of the outcomes.
Failure Mode and Effects Analysis (FMEA) is not Cross-Impact Analysis because FMEA systematically enumerates failure modes and their direct effects on specific functions, whereas Cross-Impact Analysis explores indirect consequences and second-order interactions between events.