What is Failure Mode and Effect Analysis?

A Failure Mode and Effect Analysis (FMEA) is a risk estimation technique: a comprehensive analysis of potential failure points across a system's components, used to determine how resistant the system as a whole is to breakdown.

Along with identifying the weak spots - failure modes - teams also look at their potential causes and, especially, at their potential effects on other system components - effect analysis. Possible failures, their probability of occurring, and the magnitude of their impact are expressed quantitatively in a chart.

Who uses an FMEA?

FMEA has its roots in military systems management of the 1940s and applies to any process, product, or design analysis. Since the 1970s it has been widely adopted by sectors requiring the highest levels of reliability - aerospace, automotive, nuclear, and oil & gas engineering, as well as healthcare. FMEA is also popular among Lean Six Sigma practitioners, as it aligns well with that methodology's zero-defects goal.

Why does it pay to do Failure Mode and Effect Analysis?

It is not hard to appreciate the possible scale of damage when a military system or a gas plant fails. But the same logic applies to your business - defects, production halts, service downtime, and product recalls are costly and hurt your reputation and your customers’ satisfaction, which in turn affects your revenue.
The goal of a sustainable, effective production process should be to have quality checking measures in place from the get-go - to build quality assurance in, rather than inspect for it later.

It’s useful to do an FMEA:

  • When redesigning a process or changing the way an existing process is executed
  • Before a control plan is issued, or before improvements are considered
  • Whenever you want to analyze a process’s weaknesses and need to prioritize which of them are critical and must be addressed.
    Sometimes, the technique is extended to Failure Mode, Effects & Criticality Analysis (FMECA), to name the critical concerns specifically.

What types of FMEA are there?

A Failure Mode and Effect Analysis typically assesses one of these 3 areas: a function or system, a design, or a process.

  • A system FMEA
    addresses breakdowns that impact the entire system you operate within. It looks at the relationships between its subsystems, their integration with each other, and their integration with other, external systems. It’s a high-level analysis, at times too general to be accurate (depending on the system size).
  • A design FMEA
    focuses on possible product failures, stemming from engineering, component faults, the nature of the design, its longevity, integration with other products, and more.
  • A process FMEA
    deals with faults that impact product quality, reduce process reliability, or cause customer dissatisfaction. Such faults typically stem from unreliable process metrics, variable human and equipment factors, and unstandardized practices.

How to do an FMEA?

Step 1: Team selection

Gather representatives of all functions involved in the analyzed process, e.g. design, testing, quality, reliability, maintenance, production, and marketing, as well as suppliers and customers. Make sure the choice of people isn’t random but based on their range of experience. This matters because you need people who can tell what can go wrong in their portion of the process and what solutions would be possible, based either on their experience or on their familiarity with a specific environment or aspect of the product.

Step 2: Determine the scope

FMEAs are, in general, fairly detailed. That said, you set the scope of the analysis for your specific scenario. Make sure all participants will be addressing problems of the same scale, to prevent seeing “Website downtime” next to “There is a typo in the website footer” on your FMEA sheet.

Step 3: Estimate potential failure occurrence

It’s up to you how much detail to list here, but the more of each failure’s effects on the system you record, the more accurate your risk estimation should be. Take a look at the proposed criteria in the image below. The crucial parts of the analysis are:

  • how severely the failure affects the customers (S)
  • the likelihood of a breakdown occurring (O)
  • how difficult the issue is to detect (D)

To rank the impact of probable failures, rate each of these parameters on a scale of 1 to 10, with 1 standing for the lowest value. For detection, this means 1 is a failure that is almost certain to be caught, and 10 is one that is very unlikely to be detected. Thanks to this, you will be able to calculate a risk priority number (RPN) at the end of the analysis:

Calculating risk priority number
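
The risk priority number is conventionally computed as the product of the three ratings, RPN = S × O × D, which gives a value between 1 and 1000. Below is a minimal Python sketch of that calculation. The function name `risk_priority_number` and the example ratings are assumptions made for illustration; they do not come from any particular FMEA tool.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Return RPN = S * O * D, with each rating on the 1-10 scale.

    Assumed convention: a higher detection rating means the failure
    is harder to detect.
    """
    ratings = (("severity", severity),
               ("occurrence", occurrence),
               ("detection", detection))
    for name, rating in ratings:
        if not isinstance(rating, int) or not 1 <= rating <= 10:
            raise ValueError(f"{name} must be an integer from 1 to 10, got {rating!r}")
    return severity * occurrence * detection


# Hypothetical ratings for a "website downtime" failure mode.
rpn = risk_priority_number(severity=8, occurrence=3, detection=4)
print(rpn)  # 96
```

Rejecting out-of-range ratings early keeps a mistyped score (for example 0 or 11) from silently distorting the ranking later on.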

Note: the calculation has its limitations, and some alternatives have been suggested. For more information, please see the Further Reading section below.

Step 4: Choose points of immediate action

Sort the RPNs for all possible failures from highest to lowest. Items with the largest values are the ones that should get addressed first.

If the Pareto rule - 80% of issues come from 20% of causes - is to be trusted, then there is a chance that solving the top few risk points will go a long way. But whether this materializes for you or not depends largely on the nature of your process and its unique difficulties.
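
As a rough sketch of this step, the snippet below sorts failure modes by RPN and then picks the top-ranked ones that together account for a chosen share of the total. Using the share of summed RPN as a stand-in for the Pareto rule is an assumption of this example, as are the names `FailureMode`, `rank_by_rpn`, and `top_contributors` and all of the ratings.

```python
from typing import NamedTuple


class FailureMode(NamedTuple):
    name: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        # Risk priority number: product of the three 1-10 ratings.
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


def top_contributors(modes, share=0.8):
    """Return the highest-ranked modes that together reach `share` of total RPN."""
    ranked = rank_by_rpn(modes)
    target = share * sum(m.rpn for m in ranked)
    selected, running = [], 0
    for mode in ranked:
        if running >= target:
            break
        selected.append(mode)
        running += mode.rpn
    return selected


# Hypothetical failure modes and ratings, for illustration only.
modes = [
    FailureMode("Slow page load", 4, 5, 3),
    FailureMode("Typo in footer", 1, 5, 2),
    FailureMode("Payment service outage", 9, 6, 6),
    FailureMode("Broken password reset", 5, 2, 4),
]

for mode in rank_by_rpn(modes):
    print(f"{mode.rpn:4d}  {mode.name}")

print([m.name for m in top_contributors(modes)])
```

With these made-up numbers, two of the four failure modes cover more than 80% of the total RPN. Real data may be far less concentrated, which is the caveat the paragraph above raises.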

Step 5: Design & apply correction, then reassess

Now that you know what needs to be done immediately, create an action plan and execute it. Once the new process is in place, run the FMEA again to adjust the scores for the concerns you have addressed, and to verify whether the change has affected other failure modes. In many cases, it will not be possible to reduce the severity of a possible breakdown, so your focus should be on lowering its occurrence and on making it easier and faster to detect.
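
The reassessment can be sketched as a before-and-after comparison of RPNs. The example below is only an illustration: the helper `compare_assessments`, the dictionary layout (failure name mapped to S, O, D ratings), and every rating are hypothetical.

```python
def rpn(severity, occurrence, detection):
    """Risk priority number: product of the three 1-10 ratings."""
    return severity * occurrence * detection


def compare_assessments(before, after):
    """Return {name: (old_rpn, new_rpn, change)} for modes present in both runs."""
    report = {}
    for name, old_ratings in before.items():
        if name not in after:
            continue  # mode was not re-rated in the second run
        old, new = rpn(*old_ratings), rpn(*after[name])
        report[name] = (old, new, new - old)
    return report


# Hypothetical (S, O, D) ratings from the first and second FMEA runs.
before = {
    "Payment service outage": (9, 6, 6),
    "Slow page load": (4, 5, 3),
}
after = {
    "Payment service outage": (9, 3, 2),  # severity unchanged; O and D improved
    "Slow page load": (4, 6, 3),          # side effect: occurrence went up
}

report = compare_assessments(before, after)
for name, (old, new, change) in report.items():
    print(f"{name}: {old} -> {new} ({change:+d})")

# Modes whose RPN rose after the change deserve a second look.
worsened = [name for name, (_, _, change) in report.items() if change > 0]
print(worsened)
```

In this made-up case the addressed failure drops sharply even though its severity stays at 9, while a second failure mode gets slightly worse, which is the kind of side effect the rerun is meant to surface.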

By applying Failure Mode and Effects Analysis to your process, product, or system, you’ll get a few steps closer to being sure that you’ve done all you can to prevent problems from occurring and causing costly delays, unhappy customers, and unhappy teams.