What is Failure Mode and Effect Analysis?

A Failure Mode and Effect Analysis (FMEA) is a risk estimation technique: a comprehensive analysis of potential failure points across a system’s components that determines how resistant to breakdown the system is.

Along with identifying the weak spots - the failure modes - teams also look at their potential causes and, especially, at their potential effects on other system components - the effect analysis. Possible failures, their probability of occurring, and the magnitude of their impact are rated numerically and recorded in a chart.

Who uses an FMEA?

FMEA has its roots in military systems management of the 1940s and can be applied to any process, product, or design analysis. Since the 1970s, it has been widely adopted by sectors that require the highest levels of reliability - aerospace, automotive, nuclear, oil & gas engineering, as well as healthcare. FMEA is also popular among Lean Six Sigma practitioners, as it fits well with that methodology’s zero-defects program.

Why does it pay to do Failure Mode and Effect Analysis?

It is not hard to appreciate the scale of damage that a failing military system or gas plant can cause. But the same applies to your business - defects, production halts, service downtime, and product recalls are costly and hurt your reputation and your customers’ satisfaction (or safety!), which, in turn, affects your revenue.

FMEA provides a systematic, interdisciplinary approach to analyzing potential issues and their impact on a given system. The key is to clearly distinguish between the cause and the effect: the likelihood of a cause occurring might be tiny, yet its effect might be disastrous. Imagine a core meltdown in a nuclear power plant as a result of being hit by an aircraft - the probability of the cause and the size of the effect can be wildly disproportionate to one another! Not every subject of an FMEA is that serious, of course.

The goal of a sustainable, effective production process should be to have quality checking measures in place from the get-go - to design quality into the product, rather than inspect for it later.

It’s useful to do an FMEA:

  • When redesigning a process or product, or changing the way an existing process is executed
  • Before a control plan is issued, or before improvements are considered
  • Whenever you’re looking to analyze the weaknesses of a process or product and need to prioritize which of them are critical and must be addressed.
    Sometimes, the technique is extended to Failure Mode, Effects & Criticality Analysis (FMECA), to name the critical concerns specifically.

What types of FMEA are there?

An FMEA typically assesses one of these 3 areas: a function or system, a design, or a process.

  • A system FMEA
    addresses breakdowns that impact the entire system you operate within. It looks at the relationships between the subsystems, their integration with each other, and their integration with external systems. It’s a high-level analysis, at times too general to be accurate (depending on the system size).
  • A design FMEA
    focuses on possible product failures, stemming from engineering, component faults, the nature of the design, its longevity, integration with other products, and more.
  • A process FMEA
    deals with faults that impact product quality during the process, or which could not have been prevented in design, but drive down the process reliability and cause customer dissatisfaction. They typically stem from unreliable process metrics, variable human and equipment factors, and unstandardized practices.

A word of caution

Try not to commit to performing an FMEA lightly - unless, of course, it is a regulatory or customer requirement or your system is safety-relevant. Kaizen and Six Sigma provide plenty of tools to support your cause-and-effect analysis, e.g. 5 Whys, Ishikawa, Quality Function Deployment (QFD), Pareto, etc. It is wise to treat FMEA as the very last weapon in your arsenal and to thoroughly define the subject of your investigation beforehand.

FMEAs are usually conducted by sizable teams of subject experts, sometimes including supplier and customer representatives, who may even need training on using the tool. Therefore, unless the analysis is set up and carried out in just the right way, you are likely to burn plenty of hours and money on the exercise. By no means is this a call for declaring FMEA a waste of time - when done well, it’s a power tool - but make sure that the outcome of your hard work is worth the investment.

How to do an FMEA?

Step 1: Team selection

Gather representatives of all functions involved in the analyzed process, e.g. design, testing, quality, reliability, maintenance, production, marketing, as well as suppliers and customers. Make sure the choice of people isn’t random but based on their range of experience. This matters, as you need people who can tell what can go wrong in their portion of the process, and what solutions would be possible - based either on their experience or on their familiarity with the specific environment or aspect of the product.

Step 2: Determine the scope

FMEAs are, in general, fairly detailed. That said, you set the scope of the analysis for your specific scenario. Make sure all participants will be addressing problems of the same scale, to prevent seeing “Website downtime” next to “There is a typo in the website footer” on your FMEA sheet. At the same time, make sure not to exclude problems that seem insignificant but can ultimately harbor considerable potential for error.

Step 3: Estimate potential failure occurrence

It’s up to you how many details get listed here, but the more of the ways in which each failure impacts the system you record, the more accurate your risk estimation should be. Take a look at the proposed criteria in the image below. The crucial parts of the analysis are:

  • how severely the failure affects the customers (S)
  • the likelihood of a breakdown occurring (O)
  • how hard the issue is to detect (D)

To rank the impact of probable failures, rate each criterion on a scale of 1 to 10, with 1 standing for the lowest value. This lets you calculate a risk priority number (RPN) at the end of the analysis:

Risk priority number calculation: RPN = S × O × D
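
To make the calculation concrete, here is a minimal Python sketch. It assumes the common convention that the RPN is the product of the three ratings and that a higher detection rating means the issue is harder to detect. The FailureMode class, its field names, and the example ratings are made up for illustration and are not part of any standard tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a hypothetical FMEA sheet; all ratings use the 1-10 scale."""
    name: str
    severity: int    # S: how badly the failure affects the customer
    occurrence: int  # O: how likely the failure is to occur
    detection: int   # D: assumed 1 = almost certainly caught, 10 = very hard to detect

    def __post_init__(self) -> None:
        # Reject ratings outside the 1-10 scale early.
        for label in ("severity", "occurrence", "detection"):
            value = getattr(self, label)
            if not 1 <= value <= 10:
                raise ValueError(f"{label} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        # Assumed convention: RPN = S x O x D, so it ranges from 1 to 1000.
        return self.severity * self.occurrence * self.detection


# Illustrative ratings only, not taken from a real analysis.
downtime = FailureMode("Website downtime", severity=8, occurrence=3, detection=2)
print(downtime.name, downtime.rpn)  # Website downtime 48
```

Validating the ratings when a row is created keeps a mistyped 0 or 11 from silently distorting the ranking later on.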

Note: the RPN calculation has its limitations, and some alternatives have been suggested. Practically speaking, there is the risk of spending hours of discussion working out whether “the likelihood is a 6, or a 7, or perhaps just a 5”. It is not uncommon to see the scale reduced to just 1-3-7-10, or to see a traffic-light-style rating (red-yellow-green) instead. For more information, please see the Further Reading section below.
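
The simplified scales mentioned in the note can be sketched as follows. This is only one possible interpretation: the tie-breaking rule (round up to the higher step) and the traffic-light thresholds of 100 and 300 are arbitrary placeholders chosen for this example, and both function names are hypothetical. Your team would need to agree on its own values.

```python
REDUCED_SCALE = (1, 3, 7, 10)


def snap_to_reduced_scale(rating: int) -> int:
    """Map a 1-10 rating to the nearest step of the reduced 1-3-7-10 scale.

    Ties go to the higher step, a conservative choice assumed for this sketch.
    """
    if not 1 <= rating <= 10:
        raise ValueError(f"rating must be between 1 and 10, got {rating}")
    # Sort key: smallest distance first, then the larger step on a tie.
    return min(REDUCED_SCALE, key=lambda step: (abs(step - rating), -step))


def traffic_light(rpn: int, yellow_from: int = 100, red_from: int = 300) -> str:
    """Turn an RPN into a red-yellow-green rating (thresholds are placeholders)."""
    if rpn >= red_from:
        return "red"
    if rpn >= yellow_from:
        return "yellow"
    return "green"


# "Is the likelihood a 6, or a 7, or perhaps just a 5?" - all three land on 7.
print([snap_to_reduced_scale(r) for r in (5, 6, 7)])
print(traffic_light(7 * 7 * 10))
```

With fewer steps to choose from, the debate over neighbouring ratings largely disappears, at the cost of a coarser ranking.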

Step 4: Choose points of immediate action

Sort the RPNs of all possible failures from highest to lowest. Items with the largest values are the ones that should be addressed first.

According to the Pareto rule, roughly 80% of problems result from 20% of causes. Consequently, there is a good chance that solving the top few risk points will bring about significant improvement. Whether this materializes for you or not largely depends on the nature of your process and its unique difficulties. Please note, however, that critical errors should be assigned corrective measures even if they don’t ultimately belong to the top 20%.
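
The selection described above can be sketched in a few lines of Python. The example assumes the RPN is the product of the three ratings, takes the top 20% of failure modes by RPN, and then adds any remaining item whose severity is at or above a cut-off. The cut-off of 9, the Row type, the action_list function, and all ratings are assumptions made for this illustration.

```python
import math
from typing import NamedTuple


class Row(NamedTuple):
    """A hypothetical FMEA sheet row with ratings on the 1-10 scale."""
    name: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        # Assumed convention: RPN is the product of the three ratings.
        return self.severity * self.occurrence * self.detection


def action_list(rows, top_share=0.2, critical_severity=9):
    """Return the top share of rows by RPN plus any critical-severity rows."""
    ranked = sorted(rows, key=lambda row: row.rpn, reverse=True)
    top_count = max(1, math.ceil(len(ranked) * top_share))
    selected = ranked[:top_count]
    # Critical failures get corrective measures even outside the top share.
    selected += [row for row in ranked[top_count:] if row.severity >= critical_severity]
    return selected


# Illustrative ratings only.
sheet = [
    Row("Payment service outage", 8, 4, 3),
    Row("Slow page load", 4, 7, 5),
    Row("Data centre fire", 10, 1, 2),
    Row("Typo in footer", 1, 6, 2),
    Row("Broken signup email", 6, 3, 4),
]

for row in action_list(sheet):
    print(f"{row.rpn:>4}  {row.name}")
```

In this made-up sheet, the data centre fire has a low RPN because it is rare and easy to notice, yet it still lands on the action list because of its severity. This is the cause-versus-effect imbalance that a plain RPN ranking can hide.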

Step 5: Design & apply correction, then reassess

Now that you know what needs to be done immediately, create an action plan and execute it. Once the new process is in place, run the FMEA again to adjust the scores for the concerns you have addressed and to verify whether the change has affected other failure modes. In many cases, it will not be possible to reduce the severity of a possible breakdown, so your focus should be on how often the problem occurs and on how easily and quickly it is detected; you may also want to consider error-proofing the process.

Yes, you read correctly: do the FMEA again. This is a critical step in establishing an improvement cycle instead of a dead end (unfortunately, many FMEAs fail at this point in practice). Maintained this way, your FMEA can become an important product knowledge base.
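
A reassessment can be as simple as comparing the old and new scores side by side. The sketch below assumes the RPN is the product of severity, occurrence, and detection ratings; the failure modes and numbers are invented for illustration, and the compare function is a hypothetical helper rather than part of any FMEA tool.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    # Assumed convention: RPN is the product of the three 1-10 ratings.
    return severity * occurrence * detection


def compare(before: dict, after: dict) -> dict:
    """Map each failure mode in the new sheet to its (old RPN, new RPN).

    The old RPN is None for failure modes that only appear after the change.
    """
    report = {}
    for name, ratings in after.items():
        old = rpn(*before[name]) if name in before else None
        report[name] = (old, rpn(*ratings))
    return report


# Ratings are (S, O, D); illustrative values only.
before = {
    "Website downtime": (8, 5, 6),
    "Broken signup email": (6, 3, 4),
}
after = {
    "Website downtime": (8, 2, 3),         # severity unchanged, O and D improved
    "Broken signup email": (6, 3, 4),      # not addressed yet
    "False monitoring alerts": (3, 4, 2),  # new failure mode introduced by the fix
}

for name, (old, new) in compare(before, after).items():
    print(f"{name}: {old} -> {new}")
```

Keeping both versions of the sheet shows whether the corrective action worked and makes side effects of the change visible, such as the new failure mode in this example.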

By applying Failure Mode and Effects Analysis to your process, product, or system, you’ll get a few steps closer to making sure that you’ve done all you can to prevent problems from occurring and causing costly delays, unhappy customers, and unhappy teams.

Furthermore, your FMEA serves as an important piece of documentation, showing that your organization has thoroughly analyzed and evaluated risks and has taken appropriate countermeasures to mitigate them. Having this on record can save your business, should you ever need to prove your duty of care or defend against claims for damages to life and limb.