Failure Mode and Effects Analysis (FMEA)¶

Prime #: 293
Origin domain: Engineering & Design
Also from: Organizational & Management Science, Systems Thinking & Cybernetics
Aliases: FMEA, Failure analysis, Effects analysis, Risk prioritization
Related primes: fault tree analysis, Error Proofing (Poka-Yoke), Redundancy, Margin of Safety, risk management

Core Idea¶

Failure Mode and Effects Analysis (FMEA) is a systematic, structured methodology for identifying and evaluating potential failure modes in a product, process, or system. It is characterized by (1) exhaustive enumeration of ways a component or subsystem can fail (failure modes), (2) tracing each failure mode to its root causes and to its effects on system operation and safety, (3) ranking failures by severity (consequence to user or mission), occurrence probability (likelihood of the failure happening), and detectability (likelihood that the failure will be caught before it reaches the user), (4) computing a Risk Priority Number (RPN) as the product of these three factors to guide mitigation prioritization, and (5) designing and implementing countermeasures for high-RPN failures, with post-mitigation re-scoring to verify effectiveness. The deeper commitment is to systematic exhaustiveness: rather than design-and-hope (reactive discovery of failures through test or field failure), FMEA mandates that the design team explicitly map out what can go wrong, evaluate consequences upfront, and design controls before deployment. The practice originated in aerospace in the 1960s (NASA requirement for manned spaceflight) and was formalized in MIL-STD-1629A; it is now foundational in automotive (AIAG-VDA FMEA Handbook), medical devices (FDA guidance), and other safety-critical domains. The mechanism works because it converts the unbounded problem ("what could go wrong?") into a bounded, enumerable problem: systematically walk through components and subsystems, apply patterns from prior failures and design standards, rate each identified mode, and focus resources on high-impact mitigations. FMEA does not prevent failures but makes failure analysis systematic and repeatable, ensuring that common failure modes are not overlooked and that high-consequence failures receive proportional design attention^[1].

How would you explain it like I'm…

Think About What Breaks

Before a big trip, a careful grown-up imagines all the things that could go wrong: flat tire, no gas, lost map, dead phone. For each one they ask, "how bad would that be? How likely is it? Would we even notice?" Then they pack a spare tire and charger for the worst, most likely, sneakiest problems. That's what engineers do for rockets and cars, but with a checklist.

Listing What Could Go Wrong

FMEA is a careful checklist engineers run *before* they build something, to list every way a part could break, why it would break, and what would happen if it did. For each failure they give three scores: how bad it is, how often it might happen, and how easy it is to catch before it hurts anyone. Multiply those scores and you get a "risk number" that says which problems to fix first. The whole point is to find scary problems on paper, when they're cheap to fix, instead of discovering them in a real crash.

Systematic Failure Audit

Failure Mode and Effects Analysis is a structured way of asking, *before deployment*, "what could go wrong, what would happen, and which problems deserve attention first?" A team walks through every component and subsystem, lists each way it could fail (a *failure mode*), traces the *cause* and the *effect* on the wider system, and scores each failure on three dimensions: severity (how bad if it happens), occurrence (how likely), and detectability (how easily it would be caught before harm). Multiply the three to get a Risk Priority Number (RPN), and you have a ranked list telling you where to spend your mitigation budget. Built originally for NASA's manned spaceflight program, FMEA is now standard in aerospace, automotive, and medical devices.

FMEA — Failure Mode and Effects Analysis — is a systematic, structured methodology for identifying and evaluating potential failure modes in a product, process, or system before deployment. It comprises (1) exhaustive enumeration of the ways a component or subsystem can fail, (2) tracing each failure mode back to its root causes and forward to its effects on system operation and safety, (3) scoring each failure on severity (consequence to user or mission), occurrence (likelihood), and detectability (likelihood the failure is caught before reaching the user), (4) computing a Risk Priority Number (RPN) as the product of those three scores to prioritize mitigation, and (5) designing and implementing countermeasures for high-RPN failures, then re-scoring to verify effectiveness. The deeper commitment is *systematic exhaustiveness*: rather than design-and-hope (reactive discovery via test or field failure), FMEA mandates that the team explicitly map what can go wrong, evaluate consequences upfront, and design controls before deployment. The practice converts the unbounded question "what could go wrong?" into a bounded, enumerable problem: walk through components, apply patterns from prior failures and design standards, rate each mode, and focus resources on high-impact mitigations. Originating in 1960s NASA manned-spaceflight requirements, formalized in MIL-STD-1629A, it is now standard in automotive (AIAG-VDA Handbook), medical devices (FDA guidance), and other safety-critical domains. FMEA does not prevent failures — it makes failure analysis systematic and repeatable so common modes are not overlooked and high-consequence ones receive proportional attention.
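The scoring and ranking steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `FailureMode` class, the 1 to 10 scales, and the scores for the road-trip analogy are all assumptions made for the example. It also assumes the common convention that a higher detection score means the failure is harder to catch, so that a larger RPN always means more risk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (remote) .. 10 (almost certain)
    detection: int   # 1 (almost always caught) .. 10 (almost never caught)

    @property
    def rpn(self) -> int:
        """Risk Priority Number: the product of the three scores."""
        return self.severity * self.occurrence * self.detection


# Hypothetical scores for the road-trip analogy; not taken from any standard.
modes = [
    FailureMode("flat tire", severity=6, occurrence=3, detection=2),
    FailureMode("dead phone", severity=4, occurrence=5, detection=3),
    FailureMode("out of fuel", severity=7, occurrence=2, detection=1),
]

# Highest RPN first: this is the order in which mitigations are considered.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(f"{m.name:12s} RPN={m.rpn}")
```

With these made-up scores the dead phone ranks first, even though running out of fuel has the highest severity, which shows how occurrence and detection shift the ordering.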

Structural Signature¶

The systematic enumeration of failure modes at the component and subsystem level ^[1]
The causal chain mapping from root cause through failure mode to system-level effect ^[2]
The multi-factor ranking: severity (consequence), occurrence (probability), and detection (likelihood of pre-field discovery) ^[3]
The Risk Priority Number computation guiding resource allocation and countermeasure prioritization ^[4]
The distinction between design FMEA (product architecture, component selection), process FMEA (manufacturing steps, assembly sequence), and system FMEA (interaction of subsystems under operational loads) ^[3]
The iterative refinement cycle: FMEA → countermeasures → re-assessment → updated RPN ^[1]
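The refinement cycle in the last item can be illustrated with a small re-scoring sketch. The `Scores` class, the `rescore` helper, and all numbers are hypothetical. The sketch assumes that a countermeasure changes occurrence or detection while severity stays fixed, which is a simplification: a design change that removes the effect entirely would also change severity.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Scores:
    severity: int
    occurrence: int
    detection: int  # higher = harder to detect

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


def rescore(before: Scores, *, occurrence: Optional[int] = None,
            detection: Optional[int] = None) -> Scores:
    """Return post-mitigation scores, leaving severity unchanged (assumption)."""
    return replace(
        before,
        occurrence=before.occurrence if occurrence is None else occurrence,
        detection=before.detection if detection is None else detection,
    )


before = Scores(severity=8, occurrence=5, detection=6)  # hypothetical baseline
after = rescore(before, occurrence=2, detection=3)      # after countermeasures
reduction = 1 - after.rpn / before.rpn

print(before.rpn, after.rpn, f"{reduction:.0%}")
```

Keeping the before and after scores as separate immutable records preserves the history of the assessment, which is what makes the updated RPN auditable.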

What It Is Not¶

Not a guarantee of perfect safety. FMEA systematizes analysis of known failure modes; it does not foresee failure modes not yet discovered (unknown unknowns). A system with excellent FMEA that encounters a failure mode outside the scope of the analysis can still fail catastrophically. FMEA is a tool for managing known uncertainties, not for achieving omniscience.
Not a substitute for testing and field feedback. FMEA is conducted at design time on paper (or in simulation); it relies on engineering judgment about failure probability and consequence. Real-world testing, field data, and operational experience reveal failures that FMEA analysis missed, enabling continuous refinement of the methodology and the product design.
Not the same as Fault Tree Analysis (FTA). FTA works backward from a top-level undesired event (e.g., system loss of function) to identify all combinations of component failures that could cause it. FMEA works forward from component failures to their effects. The two are complementary: FTA is more comprehensive for understanding system-level failure propagation, while FMEA is more practical for component-level design iteration.
Not applicable when there is no system yet. FMEA requires a defined design, product architecture, or process flow to analyze. In early concept stages with no detailed design, FMEA is premature; Pugh matrices, Design of Experiments (DOE), and qualitative risk assessment are more appropriate. FMEA scales in detail and value as the design matures.
Not a substitute for specification and standards compliance. FMEA identifies failure modes that specifications do not cover; it does not replace the need for clear functional requirements, design standards (e.g., SAE, ASME, IEEE), and regulatory compliance. A product meeting all standards can still have high-RPN failure modes that FMEA reveals and that the design should address.
Not a one-time activity. FMEA becomes obsolete if the product changes (new materials, suppliers, manufacturing processes, operational environments) without FMEA update. Living FMEA — maintained and revised as the design evolves — provides ongoing value; static FMEA conducted once and filed away becomes misleading as the actual product diverges from the analyzed design.
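The "living FMEA" point in the last item can be made concrete with a simple staleness check. This is a sketch under assumptions: the two logs, the item names, and the dates are invented for illustration, and a real program would pull them from a change-management system.

```python
from datetime import date

# Hypothetical review log: item -> date its FMEA entry was last reviewed.
last_reviewed = {
    "brake pad": date(2023, 3, 1),
    "wear sensor": date(2024, 6, 15),
}

# Hypothetical change log: item -> date of the latest design, supplier,
# or process change.
last_changed = {
    "brake pad": date(2024, 1, 10),  # e.g. a supplier substitution
    "wear sensor": date(2024, 2, 1),
}


def stale_entries(reviewed, changed):
    """Items whose latest change is newer than their last FMEA review."""
    return sorted(
        item
        for item, changed_on in changed.items()
        if item not in reviewed or changed_on > reviewed[item]
    )


print(stale_entries(last_reviewed, last_changed))
```

Items that changed but were never analyzed are treated as stale as well, since an absent entry is the extreme case of an outdated one.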

Broad Use¶

Automotive engineering: FMEA on brake systems, powertrains, and electrical architecture; required by IATF 16949 (the successor to ISO/TS 16949) and AIAG-VDA standards for first-tier suppliers.
Aerospace: FMEA on flight-critical systems; NASA and FAA require FMEA or equivalent for crewed vehicles and safety-critical components.
Medical devices: FDA guidance encourages FMEA for design robustness and risk management; required for implantable devices and life-support systems.
Pharmaceutical manufacturing: process FMEA on drug synthesis, purification, and packaging to identify contamination, degradation, or mislabeling risks.
Food processing: FMEA combined with HACCP (Hazard Analysis and Critical Control Points) to identify pathogen transmission, allergen cross-contamination, and foreign-body risks.
Software engineering: FMEA of algorithms, error-handling paths, and edge-case inputs to identify security vulnerabilities, data corruption, and denial-of-service vectors.
Telecommunications and network infrastructure: FMEA on switching equipment, fiber routes, and control systems to identify single points of failure and design redundancy.
Nuclear engineering: FMEA on reactor control systems, emergency cooling, and containment to identify failure combinations requiring mitigation before licensing.
Financial systems: FMEA on trading algorithms, settlement processes, and fraud detection to identify market-abuse vectors and operational risks.
Construction and civil engineering: FMEA on structural members, foundation design, and temporary support systems during construction.
Consumer product design: appliances, power tools, and children's toys, where FMEA demonstrates reasonable hazard control and reduces product-liability exposure.

Clarity¶

Naming FMEA explicitly signals a shift from intuitive design review ("does this look right?") to systematic failure enumeration and prioritization. The formality of FMEA — defining severity scales, occurrence rates, detection likelihood, and computing an RPN — makes the risk assessment visible and comparable across design choices. Teams can ask: "Which failure mode has the highest RPN and should receive design focus first?" without ambiguity. The structured format also creates a communication artifact: FMEA documentation allows a new engineer joining the team to understand what failures were analyzed, what severity each was assigned, and what design choices were made to mitigate them. This institutional memory prevents the common failure where a design is changed (supplier substitution, cost reduction, manufacturing process change) without understanding why the prior design was chosen, leading to reintroduction of a previously mitigated failure mode.

Manages Complexity¶

Complex systems (aircraft, cars, power plants) have thousands of components and millions of possible failure combinations. Exhaustively analyzing all combinations is intractable. FMEA instead applies structured decomposition: divide the system into subsystems and components, apply FMEA to each level, and manage the complexity by focusing detailed analysis on high-RPN items. For interconnected systems, FMEA can be extended to interface failures (e.g., what if the brake controller sends the wrong signal to the brake actuator?), revealing failure modes that component-level analysis alone would miss. The RPN ranking automatically prioritizes the design team's effort: high-RPN failures receive detailed countermeasure design and verification; low-RPN failures are accepted or addressed with low-cost controls. This enables resource-constrained teams to allocate engineering time where impact is highest.
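The decomposition idea can be sketched as a roll-up over a two-level hierarchy: score the failure modes inside each subsystem, take the worst RPN per subsystem, and spend detailed analysis only where that value crosses a threshold. The system tree, the scores, and the threshold of 100 are all hypothetical, and using the maximum RPN as the roll-up rule is one possible choice rather than a prescribed one.

```python
# Hypothetical system -> subsystem -> {failure mode: (S, O, D)}.
# Detection is scored so that higher means harder to catch.
system = {
    "braking": {
        "pad wear beyond limit": (8, 4, 6),
        # Interface failure between two components:
        "controller sends wrong signal to actuator": (9, 2, 5),
    },
    "infotainment": {
        "screen freezes": (3, 4, 2),
    },
}


def rpn(scores):
    severity, occurrence, detection = scores
    return severity * occurrence * detection


def worst_rpn_by_subsystem(tree):
    """Roll each subsystem up to the highest RPN among its failure modes."""
    return {name: max(rpn(s) for s in modes.values()) for name, modes in tree.items()}


def subsystems_needing_detail(tree, threshold):
    """Subsystems whose worst RPN meets the threshold, worst first."""
    worst = worst_rpn_by_subsystem(tree)
    ordered = sorted(worst.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, value in ordered if value >= threshold]


print(worst_rpn_by_subsystem(system))
print(subsystems_needing_detail(system, threshold=100))
```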

Abstract Reasoning¶

The analyst asks: What are all the ways this component can fail? For each failure mode, what are the root causes (design weakness, manufacturing defect, material property variation, aging, environmental stress)? What are the immediate effects (loss of function, erratic behavior, delayed response, false activation)? What are the downstream effects on the system and user (nuisance fault, service interruption, injury, death)? How severe is the downstream consequence on a defined scale? How likely is the failure to occur in the population of products and operational lifetimes? How likely is the failure to be detected before reaching the user (through manufacturing inspection, supplier test, or early-life field feedback)? Given severity, occurrence, and detectability ratings, which failures warrant design changes, which merit enhanced inspection, and which are acceptable risks? What countermeasures (design changes, material upgrades, process controls, inspection checkpoints, warning systems) most effectively reduce RPN? After countermeasures are implemented, has the RPN been reduced sufficiently, or are additional mitigations required? The most mature practice recognizes that FMEA is an iterative conversation with the product and process, not a checklist to complete once and archive.
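The question of which failures warrant design changes, which merit enhanced inspection, and which are acceptable can be written down as an explicit rule so that the team applies it consistently. The function below is purely illustrative: the name `recommend` and every cut-off value are assumptions, and a real team would set its own thresholds from its severity scale and risk appetite.

```python
def recommend(severity: int, occurrence: int, detection: int) -> str:
    """Hypothetical triage rule on 1..10 scores (higher detection = harder to catch)."""
    # Catastrophic effects, or serious effects that are not rare, go back to design.
    if severity >= 9 or (severity >= 7 and occurrence >= 4):
        return "design change"
    # Otherwise, if the main weakness is that the failure escapes detection,
    # strengthen inspection or test instead of redesigning.
    if detection >= 7 and severity >= 4:
        return "enhanced inspection"
    return "accept and monitor"


print(recommend(9, 1, 1))
print(recommend(5, 3, 8))
print(recommend(3, 3, 1))
```

Writing the rule as code does not make it objective, but it does make the thresholds visible and open to challenge.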

Knowledge Transfer¶

Context	Failure mode example	Severity	Occurrence	Detection	Mitigation
Automotive brake	Brake pad wear beyond limit	High (loss of braking)	Medium (after 50k miles)	Low (not detected until applied)	Wear sensor + warning light
Medical infusion pump	Motor stalls due to line occlusion	Medium (pump stops, alarm)	Medium (during use)	High (alarm detects in <1 min)	Pressure sensor + auto-stop
Software authentication	Race condition in session validation	High (unauthorized access)	Low (specific timing window)	Low (not caught in unit test)	Add mutex; increase QA test coverage
Aircraft landing gear	Actuator failure, gear stuck up	Critical (crash landing)	Very low (excellent fatigue testing)	Medium (pre-landing test required)	Redundant actuators + manual alternate
Manufacturing process	Contamination in sterile environment	High (batch loss)	Medium (depends on controls)	High (particle count monitored)	HEPA filtration + gowning procedures

Transfer principle: the analytical structure (enumerate modes, trace causes, assess consequence and likelihood, design mitigations) applies across domains. An automotive engineer analyzing brake failure, a software engineer analyzing race conditions, and a pharmacist analyzing contamination risk all perform the same diagnostic reasoning under different variable names.
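To show that the same structure applies to every row of the table, the qualitative ratings can be mapped to ordinal scores and ranked with one function. The numeric mappings are assumptions chosen only for illustration. Note the inversion for detection: a "Low" entry in the table means the failure is hard to catch, so it receives the highest difficulty score.

```python
# Illustrative ordinal mappings (assumptions, not from any standard).
SEVERITY = {"Medium": 2, "High": 3, "Critical": 4}
OCCURRENCE = {"Very low": 1, "Low": 2, "Medium": 3}
# Inverted: "Low" detection in the table means the failure is hard to catch.
DETECTION_DIFFICULTY = {"High": 1, "Medium": 2, "Low": 3}

# (context, severity, occurrence, detection) as rated in the table above.
rows = [
    ("Automotive brake", "High", "Medium", "Low"),
    ("Medical infusion pump", "Medium", "Medium", "High"),
    ("Software authentication", "High", "Low", "Low"),
    ("Aircraft landing gear", "Critical", "Very low", "Medium"),
    ("Manufacturing process", "High", "Medium", "High"),
]


def score(row):
    """Ordinal risk score: the same product structure as an RPN."""
    _, sev, occ, det = row
    return SEVERITY[sev] * OCCURRENCE[occ] * DETECTION_DIFFICULTY[det]


ranked = sorted(rows, key=score, reverse=True)
for row in ranked:
    print(f"{row[0]:26s} score={score(row)}")
```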

Examples¶

Formal/abstract¶

Stamatis (2003) in Failure Mode and Effect Analysis documents the foundational FMEA methodology: define the scope (component, subsystem, or system boundary), assemble a cross-functional team (design, manufacturing, test, service), enumerate failure modes through brainstorming and historical failure review, assess severity (S), occurrence (O), and detectability (D) on scales typically 1–10, compute the Risk Priority Number as S × O × D, sort by RPN, and design countermeasures for high-RPN items. Stamatis traces FMEA evolution from its aerospace origins (Grumman Aircraft, NASA) in the 1960s through automotive adoption (Ford's "Green Sheets" for supplier quality, 1970s), medical device formalization (FDA guidance integration, 1990s), and contemporary integration with other tools (Fault Tree Analysis for system-level failure propagation, and the Failure Reporting, Analysis, and Corrective Action System, or FRACAS, for field-failure feedback loops). The deeper insight: FMEA is not a calculation (severity × occurrence × detectability) but a discipline of forcing the design team to ask hard questions about what can go wrong and why. The RPN computation is merely a tie-breaker when resources are constrained; the primary value is the systematic enumeration and the difficult conversations about cause and consequence that FMEA discussion surfaces. Modern FMEA practice emphasizes team diversity: manufacturing engineers catch failure modes that design engineers miss, service engineers bring field experience, and procurement engineers understand supplier process risks^[1].

Mapped back: This instantiates the signature directly — enumeration of failure modes (D34-047), causal tracing (D34-048), severity-occurrence-detection multi-factor ranking (D34-049), RPN prioritization (D34-050), and iterative refinement as countermeasures are designed and implemented (D34-052). Stamatis's historical review shows how FMEA has evolved across design FMEA, process FMEA, and system FMEA variants (D34-051), and how FMEA success depends on team structure and cross-functional dialogue rather than mechanistic RPN computation.

Applied/industry¶

An automotive supplier designs an engine control module (ECM) for a mid-size sedan, responsible for fuel injection timing, ignition control, and emission management. The design is mature (third generation), but the supplier has adopted a new semiconductor process (28nm vs. prior 65nm) to reduce cost and power consumption. The engineering team conducts design FMEA on the new ECM. Failure modes identified include: (1) Semiconductor leakage current causes power loss during cold cranking (high severity: engine fails to start; medium occurrence: cold-start risk; low detection: revealed only in winter testing), RPN = 60. (2) Analog-to-digital converter drift with temperature causes incorrect sensor reading, leading to rich fuel mixture and catalyst damage (medium severity: emission fault, repair required; low occurrence: only at temperature extremes; high detection: onboard diagnostics catches it), RPN = 10. (3) Single-Event Upset (SEU): a cosmic ray flips a bit in the fuel-injection timing table, causing cylinder misfire (high severity: customer experience; very low occurrence: estimated 1 per million operating hours; low detection: misfire not obvious to driver), RPN = 8. (4) Solder crack in ball-grid array (BGA) interconnect due to thermal cycling of the PCB causes intermittent open circuit, leading to erratic fuel control or no-start (high severity; medium occurrence: over vehicle lifetime; high detection: manufacturing test catches most, but field failures occur due to vibration and corrosion in customer use), RPN = 36. The team ranks failures by RPN. Failure (1), cold-cranking leakage, is highest. Countermeasures: (a) add low-leakage bias circuitry to maintain power during cranking, (b) perform extended cold-temperature characterization to verify leakage budget, (c) add pre-start power-supply check. Failure (4), BGA solder cracking, is second highest. Countermeasures: (a) switch to lead-free solder with improved fatigue properties, (b) increase thermal-test duration from 100 to 500 thermal cycles, (c) add conformal coating to reduce moisture-induced corrosion. After countermeasures are designed and simulated, RPN for leakage drops from 60 to 15 (improved detection via cold-test, reduced occurrence via circuit redesign). RPN for solder cracking drops from 36 to 12. The team reassesses: remaining RPN items are now acceptable (all at or below 15), and the design is ready for prototype. Historical context: prior ECM designs from competing suppliers have experienced cold-cranking failures in northern markets and solder-cracking warranty returns; this FMEA captures learned lessons^[3].
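The arithmetic of the ECM example can be replayed directly from the RPN values stated in the text. The dictionary keys are shorthand labels chosen for this sketch, and treating the acceptance limit of 15 as inclusive is an assumption, since the leakage item lands exactly on 15 after mitigation.

```python
# RPN values as stated in the ECM example.
initial_rpn = {
    "cold-cranking leakage": 60,
    "ADC temperature drift": 10,
    "single-event upset": 8,
    "BGA solder crack": 36,
}

# Post-mitigation values for the two items that received countermeasures.
post_mitigation = {
    "cold-cranking leakage": 15,
    "BGA solder crack": 12,
}

# Rank the initial failure modes, highest RPN first.
ranking = sorted(initial_rpn, key=initial_rpn.get, reverse=True)

# Items without countermeasures keep their initial RPN.
final_rpn = {**initial_rpn, **post_mitigation}

ACCEPT_LIMIT = 15  # assumption: the limit is inclusive
all_acceptable = all(value <= ACCEPT_LIMIT for value in final_rpn.values())

print(ranking)
print(final_rpn, all_acceptable)
```

The ranking reproduces the order the team used: leakage first, solder cracking second, with the two low-RPN items left as they were.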

Mapped back: Shows FMEA as iterative discipline — failure modes enumerated (D34-047), causes traced (process node, temperature, vibration, D34-048), severity-occurrence-detectability rated (D34-049), RPN computed to prioritize (D34-050), countermeasures designed for high-RPN items, and design verified against countermeasures with post-mitigation RPN reduction (D34-052). The example also shows interaction between design FMEA (new ECM architecture) and process FMEA (new semiconductor process, solder selection), demonstrating the distinction (D34-051).

Structural Tensions¶

T1: Comprehensiveness versus analysis paralysis. A complete FMEA must enumerate all credible failure modes; an incomplete FMEA misses high-impact failures. However, the number of possible failure modes grows combinatorially with system complexity. A team can spend months enumerating modes and never finish. The tension is resolved by scoping FMEA to the system boundary (component, subsystem, vehicle, etc.), applying checklists from prior FMEA examples and industry standards to accelerate enumeration, and recognizing that FMEA is iterative: initial FMEA captures major modes; field experience and design evolution inform updates. A common failure is confusing "comprehensive" with "enumeration without end," leading to FMEA documents that are never finished or whose RPN decisions are never acted upon^[1].
T2: Severity rating objectivity versus contextual judgment. Severity is subjective — is a comfort issue (seat adjustment) low (level 1) or moderate (level 4)? Is a temporary loss of power steering high (level 8) or critical (level 10)? Standards (AIAG-VDA) provide severity definitions, but application to a specific failure mode requires judgment. A common failure is inconsistent severity rating: identical failures rated differently across the FMEA because different team members assess consequence differently. The resolution requires explicit consequence mapping (loss of life, injury, mission failure, customer inconvenience, cost) and calibration of severity scale to organizational risk appetite^[3].
T3: Occurrence rate estimation from limited data. Occurrence rating (probability a failure will happen during the product lifetime) is difficult to estimate accurately without field data. New designs, new suppliers, or new operating environments have no historical failure rate. Teams resort to engineering judgment ("I think this failure is unlikely") or conservative assumptions ("assume worst case"). A mature approach combines benchmarking to similar systems, accelerated-life testing to estimate failure rate, and design conservatism for high-consequence failures. A common failure is underestimating occurrence because the team is overconfident in the design or unfamiliar with the operational environment^[5].
T4: Detection capability and cost trade-off. Some failures are easy to detect (in-factory test immediately reveals it), others are hard (failure mode occurs only after months of field operation in a specific climate). Improving detectability often costs money (additional sensors, more stringent acceptance tests, longer burn-in time). The tension is between ideal detectability (catch every failure before customer sees it) and practical detectability (catch high-severity failures; accept some low-consequence failures reaching the field). A common failure is designing detection systems whose cost exceeds the failure consequence^[6].
T5: RPN as a decision-making oracle versus team consensus. RPN is a convenient metric for prioritization, but it is computed from subjective inputs (S, O, D ratings). A failure with S=9, O=1, D=1 has RPN=9, while another with S=3, O=3, D=1 also has RPN=9. Should the team invest equally in both? The first is catastrophic but rare; the second is more frequent but minor. RPN treats them identically, which may not reflect organizational risk appetite. A mature approach uses RPN as a communication tool ("RPN=9 failures are here") and allows team judgment to override RPN ranking when the underlying context justifies it. A common failure is treating RPN as gospel: investing resources inefficiently in low-consequence, rare failures that happen to have a high RPN (which can only come from a poor detection score), while under-investing in frequent, moderate-consequence failures whose RPN is low because their detection score is good^[5].
T6: FMEA scope and system boundaries. A system is composed of subsystems and components; each level can have its own FMEA. Component FMEA might identify all ways a bearing can fail; subsystem FMEA might identify failure modes of the transmission that contains the bearing; system FMEA might identify failures of the vehicle drivetrain. The tension is in determining scope: FMEA too granular (every component analyzed separately) is labor-intensive and loses system-level failure interactions. FMEA too broad (entire system in one pass) is superficial and misses component-specific failure physics. A mature approach uses hierarchical FMEA: high-level system FMEA to identify critical functions and failure modes, then detailed component FMEA for high-risk items. A common failure is FMEA without clear scope definition, leading to discussions that jump between component failures and system effects without discipline^[7].
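The tie described in T5 is easy to reproduce, and one possible way to break it is to sort on severity before RPN. The sketch below uses the two score triples from T5; the names and the severity-first rule are assumptions, shown as one option a team might adopt rather than a standard procedure.

```python
# The two failures from T5, listed with the minor one first on purpose.
failures = [
    {"name": "minor but more frequent", "S": 3, "O": 3, "D": 1},
    {"name": "catastrophic but rare", "S": 9, "O": 1, "D": 1},
]


def rpn(f):
    return f["S"] * f["O"] * f["D"]


# RPN alone cannot separate them: the sort is stable, so input order decides.
by_rpn_only = sorted(failures, key=rpn, reverse=True)

# Severity-first ordering puts the catastrophic failure on top; RPN breaks
# any remaining ties within the same severity.
by_severity_then_rpn = sorted(failures, key=lambda f: (f["S"], rpn(f)), reverse=True)

print([rpn(f) for f in failures])
print(by_rpn_only[0]["name"])
print(by_severity_then_rpn[0]["name"])
```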

Structural–Framed Character¶

Failure Mode and Effects Analysis is a hybrid on the structural–framed spectrum, and the frame side is substantial. Part of it is a bare pattern — enumerate the ways something can fail, trace each to its causes and effects, and rank them — that you could apply to a jet engine, a hospital procedure, or a software release. Part of it is a vocabulary and a methodology inherited from engineering design and reliability practice.

The diagnostics tip it toward framed. The skeletal logic of map-failures-then-prioritize does transfer from one field to another, and at that level it is a relational procedure. But the home apparatus travels with it and largely constitutes it: the severity-occurrence-detection ranking, the risk priority number, the structured causal-chain mapping, the disciplined enumeration — these are a worked-out methodology with built-in norms about how thoroughly risk ought to be analyzed and acted on. Its origin is an institutionalized engineering practice rather than a formal relation, and carrying it out means importing that whole assessment framework, not merely noticing a structure already present. It therefore reads mixed-framed.

Substrate Independence¶

Failure Mode and Effects Analysis (FMEA) is a moderately substrate-independent prime — composite 3 / 5 on the substrate-independence scale. Its methodology — enumerate failure modes, trace root causes and effects, rank by severity, occurrence, and detection, then prioritize countermeasures — is clear and portable across engineering design, operations management, systems thinking, and quality management. But the examples cluster in automotive and manufacturing, and transfer into biological, computational, or social-system diagnostics is not prominent. This remains primarily an engineering and operations discipline rather than a fully substrate-independent pattern.

Composite substrate independence — 3 / 5
Domain breadth — 3 / 5
Structural abstraction — 3 / 5
Transfer evidence — 3 / 5

Relationships to Other Abstractions¶

Current abstraction Failure Mode and Effects Analysis (FMEA) Prime

Parents (2) — more general patterns this builds on

Failure Mode and Effects Analysis (FMEA) presupposes Risk Prime

FMEA ranks prospective failure modes along risk dimensions: consequence, likelihood, and detectability.
Failure Mode and Effects Analysis (FMEA) is a decomposition of Decomposition Prime

FMEA decomposes a system or process into distinct failure modes, causes, local effects, and downstream effects for separate evaluation.

Hierarchy paths (4) — routes to 4 parentless roots

Failure Mode and Effects Analysis (FMEA) → Risk → Uncertainty


Neighborhood in Abstraction Space¶

Failure Mode and Effects Analysis (FMEA) sits in a sparse region of abstraction space (79th percentile for distinctiveness): few abstractions share its structure, so a faithful description tends to retrieve it precisely rather than landing on a neighbor.

Family — Risk, Fragility & Layered Defense (16 primes)

Nearest neighbors

Computed from structural-signature embeddings · 2026-07-26

Not to Be Confused With¶

FMEA must be distinguished from Stakeholder Analysis, which operates in a different analytical dimension. Stakeholder Analysis maps who is affected by decisions or outcomes—identifying actors, their interests, their power to influence, and their dependencies—in order to understand coalitions, resistance, and buy-in requirements. FMEA, by contrast, maps what can fail in a system—identifying failure modes, their causes, and their consequences—in order to design mitigations. While both are used in risk assessment, they answer different questions. A product design might conduct FMEA to identify brake-system failures and Stakeholder Analysis to understand which customer segments, regulators, and suppliers have interests in brake safety. FMEA focuses inward on component and system failures; Stakeholder Analysis focuses outward on social and organizational actors. A system FMEA might reveal that a cooling-system failure could cause an engine fire (high-severity failure); stakeholder analysis reveals that consumers, liability attorneys, and regulators all have strong interests in preventing engine fires. Both insights matter, but FMEA is engineering-centric (what breaks?), while stakeholder analysis is actor-centric (who cares?).

FMEA is also distinct from the Pareto Effect (80/20 Rule), though the two are often confused in quality improvement contexts. The Pareto Effect states that a small subset of causes (roughly 20%) are responsible for the majority (roughly 80%) of observed outcomes. In quality improvement, this often manifests as: 80% of defects come from 20% of failure modes; or 80% of customer complaints come from 20% of product features. A team applying Pareto reasoning identifies the vital few failure modes responsible for most observed problems. FMEA, by contrast, catalogs all credible failure modes and ranks them by severity, occurrence, and detectability—independent of how frequently they have occurred in the past. The difference is historical versus prospective. Pareto analysis looks backward at field data: "What failures have actually happened most often?" FMEA looks forward at design: "What failures could happen, and how severe would they be?" A software product might have Pareto data showing that 80% of reported bugs come from 20% of modules; FMEA on the remaining 80% of modules might reveal a catastrophic failure mode (low occurrence but extreme consequence) that has not yet been triggered in the field. Pareto optimization alone would ignore this high-severity, low-frequency mode; FMEA ensures it is addressed. Pareto is pragmatic (allocate resources where the observed pain is); FMEA is defensive (allocate resources where concealed risk is highest).
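The historical versus prospective contrast can be shown side by side. In this sketch the field counts, the failure-mode names, and the (S, O, D) scores are all invented for illustration. The Pareto pass selects the modes that account for 80% of reported incidents, while the FMEA pass ranks by RPN and surfaces a mode that has never occurred in the field.

```python
# Hypothetical field data: failure mode -> number of reported incidents.
field_counts = {
    "UI crash": 50,
    "slow sync": 30,
    "typo in report": 15,
    "login timeout": 5,
    "data corruption": 0,  # never observed in the field
}


def vital_few(counts, share_percent=80):
    """Smallest set of top modes covering at least share_percent of incidents."""
    total = sum(counts.values())
    covered, chosen = 0, []
    for mode, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered * 100 >= share_percent * total:
            break
        chosen.append(mode)
        covered += n
    return chosen


# Hypothetical prospective scores (S, O, D), higher D = harder to detect.
scores = {
    "UI crash": (4, 6, 2),
    "slow sync": (3, 5, 2),
    "typo in report": (2, 4, 3),
    "login timeout": (3, 2, 2),
    "data corruption": (10, 1, 8),
}
rpn = {mode: s * o * d for mode, (s, o, d) in scores.items()}
top_by_rpn = max(rpn, key=rpn.get)

print(vital_few(field_counts))
print(top_by_rpn, rpn[top_by_rpn])
```

With these numbers the Pareto pass never looks at data corruption, while the prospective ranking puts it first.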

Nor is FMEA synonymous with Cross-Impact Analysis, which explores second- and higher-order interactions among events. Cross-Impact Analysis asks: "If Event A occurs, how does it change the probability of Event B, C, and D? If both A and B occur, what is the combined consequence?" It surfaces indirect effects, cascades, and non-linear feedback loops. FMEA, by contrast, is primarily concerned with first-order effects: "Component X fails → what is the immediate effect on system function and the end user?" FMEA can be extended to capture failure propagation (System FMEA with interface failure modes), but its core structure is direct causal mapping (failure → consequence). A medical-device FMEA might enumerate: pump motor fails → no fluid delivery → patient harm (direct). Cross-Impact Analysis would extend this: pump motor fails → alarms sound → clinician manually intervenes → introduces new error mode → additional patient harm (second-order). Both are valuable in risk assessment; FMEA is faster and more structured, capturing the first-order landscape quickly; Cross-Impact is more thorough but also more resource-intensive, surfacing hidden cascade risks.

FMEA is also complementary to, but distinct from, Fault Tree Analysis (FTA). Both are structured methods for understanding failure causation, but they approach the problem from opposite directions. FTA works backward from a top-level undesired event (e.g., "aircraft loses power during flight") and decomposes it into all combinations of component failures that could cause it, using Boolean logic (AND gates for combined failure, OR gates for alternative paths). FTA answers the question: "What component failure combinations lead to system disaster?" FMEA works forward from component failures and asks: "What are the effects of each component failure on system operation?" FTA is more comprehensive for understanding system-level failure propagation and hidden single points of failure; FMEA is more practical for guiding iterative design and manufacturing process improvements. A mature design program uses both: FMEA identifies component failure modes and their individual effects; FTA identifies the critical combinations of failures that cascade to catastrophic outcomes. For example, an aviation FMEA might identify that a single hydraulic pump failure degrades braking authority by 50%; FTA reveals that a pump failure combined with loss of electrical power causes complete braking loss, a hidden interdependency that FMEA alone would not surface.
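The Boolean structure of a fault tree can be sketched with small gate functions. The example below is illustrative only: it encodes the braking scenario from the paragraph above as an AND gate, adds a hypothetical `brake_line_rupture` basic event purely to show an OR gate, and finds minimal cut sets by brute-force enumeration, which is practical only for very small trees.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

Gate = Callable[[FrozenSet[str]], bool]


def basic(name: str) -> Gate:
    """Basic event: true when the named component is in the failed set."""
    return lambda failed: name in failed


def and_gate(*children: Gate) -> Gate:
    """Output fails only if all inputs fail (combined failure)."""
    return lambda failed: all(child(failed) for child in children)


def or_gate(*children: Gate) -> Gate:
    """Output fails if any input fails (alternative paths)."""
    return lambda failed: any(child(failed) for child in children)


# Top event: complete braking loss.
# "brake_line_rupture" is a hypothetical event added for illustration.
top = or_gate(
    and_gate(basic("hydraulic_pump"), basic("electrical_power")),
    basic("brake_line_rupture"),
)

components = ["hydraulic_pump", "electrical_power", "brake_line_rupture"]


def minimal_cut_sets(top: Gate, components: List[str]) -> List[FrozenSet[str]]:
    """Smallest sets of failures that trigger the top event (brute force)."""
    cuts: List[FrozenSet[str]] = []
    for size in range(1, len(components) + 1):
        for combo in combinations(components, size):
            failed = frozenset(combo)
            # Skip supersets of a cut set that was already found.
            if top(failed) and not any(cut <= failed for cut in cuts):
                cuts.append(failed)
    return cuts


for cut in minimal_cut_sets(top, components):
    print(sorted(cut))
```

Evaluating each component alone corresponds to the forward FMEA view: a lone pump failure does not trigger the top event here. The pump-plus-power pair appears only when combinations are enumerated, which is the backward FTA view.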

FMEA is also distinct from Risk Management (the broader discipline), which encompasses risk identification, assessment, mitigation, monitoring, and acceptance. FMEA is one risk-assessment technique within risk management, but it is not the whole discipline. Risk Management might use FMEA to identify failure modes, combine this with financial impact analysis to assess cost-of-failure, weigh this against mitigation costs, and make business decisions about which risks to accept, insure, or mitigate. FMEA itself remains silent on risk acceptance; it produces ranked failure modes, and management decides what to do with them. Similarly, FMEA interfaces with Error Proofing (Poka-Yoke), which uses design to prevent failure modes or make them impossible. FMEA analysis might identify that a critical part can be installed backward, creating a failure mode; error proofing redesigns the part so it can only be installed correctly. FMEA identifies the vulnerability; error proofing eliminates it structurally.
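The hand-off from FMEA to risk management described above can be sketched as a simple expected-cost comparison among accepting, mitigating, or insuring a risk. This is a deliberately reduced illustration: every figure is invented, the function name `compare_responses` is hypothetical, and real decisions about safety-critical failure modes are not made on expected cost alone.

```python
def compare_responses(p_fail, failure_cost, mitigation_cost,
                      p_fail_mitigated, insurance_premium):
    """Return the lowest expected-cost response and the cost of each option.

    All inputs are annualized estimates supplied by the analyst (assumed).
    """
    options = {
        # Accept: carry the full expected loss.
        "accept": p_fail * failure_cost,
        # Mitigate: pay for the control, carry the reduced expected loss.
        "mitigate": mitigation_cost + p_fail_mitigated * failure_cost,
        # Insure: transfer the loss for a fixed premium.
        "insure": insurance_premium,
    }
    best = min(options, key=options.get)
    return best, options


# Hypothetical figures for one high-ranked failure mode from an FMEA.
best, options = compare_responses(
    p_fail=0.02,
    failure_cost=1_000_000,
    mitigation_cost=5_000,
    p_fail_mitigated=0.002,
    insurance_premium=12_000,
)
print(best, options)
```

The FMEA supplies the list of failure modes and their rankings; the probabilities and costs fed into a comparison like this come from the wider risk-management process.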

Solution Archetypes¶

Solution archetypes in the catalog that build on this prime — directly (this prime is a source ingredient) or as a related prime.

Built directly on this prime (1)

Layered Defense Gap Decorrelation: Treat every defense layer as imperfect, then prevent catastrophe by finding and breaking the cross-layer alignment of its holes.
Mechanisms (8)
- Aligned Gap Heatmap
- Barrier Gap Walkthrough
- Bowtie Analysis with Layer Gaps
- Common-Cause Layer Audit
- Independent Barrier Test Drill
- Latent Condition Rounds
- Near-Miss Trajectory Review
- Swiss-Cheese Barrier Review

Also a related prime in 3 archetypes

Conjunctive Path Assurance: Map the condition on every edge of a hazardous path, test the joint states that make the whole route conduct, and preserve an independent break before the target becomes reachable.
Dependency Concentration Control: Prevent dependency fragility by measuring where reliance is concentrated and capping, diversifying, or isolating overweight dependency providers before their failure can dominate the system.
Hidden Support Depletion Guarding: Protect an apparently stable structure by monitoring and replenishing the hidden support substrate before ordinary load becomes unsupported.

Notes¶

Failure Mode and Effects Analysis originated in aerospace and defense (NASA, Grumman Aircraft) in the 1960s as a structured method for ensuring crewed spaceflight safety. The formalization as MIL-STD-1629A (1980, superseding the 1974 MIL-STD-1629) provided a standardized approach adopted across defense contractors and eventually the automotive industry. The automotive supply chain (Ford, General Motors, Chrysler and their suppliers) formalized FMEA as a requirement in Quality System Assessment (QSA) and later ISO/TS 16949. Modern FMEA practice integrates with Fault Tree Analysis (backward analysis of failure combinations), Design of Experiments (forward-looking variation analysis), and FRACAS (field-failure feedback loops). The concept interfaces with Error Proofing (D34-061, preventing failures vs. analyzing them), Redundancy (using duplication to mask failures vs. analyzing single-point failures), Risk Management (FMEA as a risk-assessment technique within broader risk frameworks), and Margin of Safety (designing to absorb variation vs. analyzing what variations might exceed design limits). The technique has evolved beyond manufacturing: software teams use FMEA for security vulnerability assessment (threat modeling is analogous to FMEA), and operations teams use it for incident-prevention planning (what operational failures could cascade into outages?).

References¶

[1] Stamatis, D. H. Failure Mode and Effect Analysis: FMEA from Theory to Execution (2nd ed.). ASQ Quality Press, 2003 (ISBN 9780873895989). Comprehensive treatment of system, design, process, service, and machine FMEA: scope definition, cross-functional teams, mode enumeration, severity/occurrence/detection rating, RPN, and linkages to FTA and FRACAS. Supports FACT-046 (FMEA as systematic, repeatable failure analysis), FACT-047 (systematic enumeration of failure modes), FACT-052 (the iterative FMEA → countermeasures → re-assessment cycle), FACT-053 (the foundational methodology and its aerospace→automotive→medical-device evolution), and FACT-055 (T1: comprehensiveness vs analysis-paralysis, FMEA as scoped and iterative). ↩

[2] U.S. Department of Defense. Procedures for Performing a Failure Mode, Effects and Criticality Analysis (MIL-STD-1629A). DoD, 24 November 1980 (superseding MIL-STD-1629, 1974). Standardizes FMECA: item-level failure-mode analysis tracing each functional/hardware failure mode through its causes to mission, safety, and performance effects, with criticality ranking. Supports FACT-048 (the causal-chain mapping from root cause through failure mode to system-level effect). CITATION-FIX: the prime dates MIL-STD-1629A to 1969 (in the reference def and the Notes); MIL-STD-1629A is dated 1980. (FMECA procedures trace to the 1960s and MIL-STD-1629 original is 1974/1976, but the '1629A' revision specifically is 1980.) ↩

[3] AIAG & VDA. FMEA Handbook (1st ed.). Automotive Industry Action Group & Verband der Automobilindustrie, June 2019. The harmonized automotive FMEA standard: 7-step approach for Design FMEA, Process FMEA, and Supplemental FMEA-MSR, with severity/occurrence/detection rating tables and the Action Priority (AP) method. Supports FACT-049 (the severity/occurrence/detection multi-factor ranking), FACT-051 (the design/process/system FMEA distinction), FACT-054 (the applied ECM design-FMEA example), and FACT-056 (T2: severity-rating objectivity vs contextual judgment, standardized severity definitions). ↩

[4] SAE International. Potential Failure Mode and Effects Analysis in Design (Design FMEA) ... and in Manufacturing and Assembly Processes (Process FMEA) (SAE J1739). SAE International, 2009 (J1739_200901). Provides terms, ranking charts, and worksheets for Design and Process FMEA, including the Risk Priority Number (RPN = S × O × D) for prioritization. Supports FACT-050 (the RPN computation guiding resource allocation and countermeasure prioritization). ↩

[5] Bowles, J. B., & Peláez, C. E. "Fuzzy logic prioritization of failures in a system failure mode, effects and criticality analysis." Reliability Engineering & System Safety, vol. 50, no. 2 (1995): 203–213. Replaces crisp RPN with a fuzzy-rule-based combination of severity, occurrence, and detectability, motivated by the well-known weaknesses of multiplicative RPN (identical RPNs from very different S/O/D profiles; difficulty estimating occurrence). Supports FACT-057 (T3: occurrence-rate estimation from limited data) and FACT-059 (T5: RPN as oracle vs team consensus — distinct S/O/D profiles yielding equal RPN). NOTE: the prime's title omits 'in a system' — the published title is '...failures in a system failure mode, effects and criticality analysis.' ↩

[6] Dezfuli, H., et al. NASA System Safety Handbook, Volume 1: System Safety Framework and Concepts for Implementation (NASA/SP-2010-580). NASA, November 2011. Advocates a risk-informed, analytic-deliberative approach integrating system-safety analysis (including detection/diagnostic coverage) with systems engineering and cost-effective risk management. Supports FACT-058 (T4: detection capability and cost trade-off — improving detectability costs money and must be weighed against failure consequence). RE-SOURCED / CITATION-FIX: the prime cites 'NASA (2007), System Safety Handbook: A Practical Approach to Engineering a Safer World, NASA Technical Reports Server.' No 2007 NASA System Safety Handbook with that subtitle exists; the actual handbook is NASA/SP-2010-580 (2011), and the cited subtitle echoes Leveson's book title. Year and subtitle corrected; key renamed nasa-safety-2011. The detection-cost trade-off is also generic FMEA content (detectability ratings, RPN). ↩

[7] International Organization for Standardization / IATF. Quality management systems — Particular requirements for the application of ISO 9001:2008 for automotive production and relevant service part organizations (ISO/TS 16949:2009). ISO, 2009. The automotive QMS technical specification that, with the AIAG core tools, made FMEA a supplier requirement and frames hierarchical (system/subsystem/component) quality planning. Supports FACT-060 (T6: FMEA scope and system boundaries — hierarchical system→component analysis). CITATION-FIX: the prime cites 'ISO/TS 16949 (2016)'; the last ISO/TS 16949 edition is 2009. In October 2016 it was superseded by IATF 16949:2016, which is an IATF (not ISO/TS) document. Either ISO/TS 16949:2009 or IATF 16949:2016 is correct; 'ISO/TS 16949 (2016)' conflates the two. ↩

[8] NASA. (2007). System Safety Handbook: A Practical Approach to Engineering a Safer World. NASA Technical Reports Server.

[9] U.S. Food and Drug Administration. Design Control Guidance for Medical Device Manufacturers. FDA / CDRH, March 11, 1997. Defines design controls and integrates risk management (with FMEA as the primary risk-analysis tool, supplemented by FTA) into the device design process. Tier C: listed in References, not cited at any FACT marker or inline in the body. Existence-verified and linked. CITATION-FIX: the prime dates it 2016; the canonical guidance is 1997.