What is Failure Mode and Effect Analysis?

A Failure Mode and Effect Analysis (FMEA) is a risk estimation technique: a comprehensive analysis of potential failure points across a system's components that determines how resistant to breakdown the system is.

Along with identifying the weak spots (the failure modes), teams also look at their potential causes and, especially, their potential effects on other system components (the effect analysis). Possible failures, their probability of occurring, and the magnitude of their impact are expressed quantitatively in a chart.

Who uses an FMEA?

FMEA has roots in military systems management of the 1940s and applies to any process, product, or design analysis. Since the 1970s, it’s been popular with business sectors requiring the highest levels of reliability: the aerospace, automotive, nuclear, and oil & gas industries, as well as healthcare. FMEA is also widely used by Lean Six Sigma practitioners, as it aligns well with the methodology's zero-defects goal.

Why does it pay to do Failure Mode and Effect Analysis?

It is not hard to appreciate the possible scale of damage when a military system or a gas plant fails. But the same applies to your business: defects, production halts, service downtime, and product recalls are costly and hurt your reputation and your customers’ satisfaction (or safety!), which, in turn, affects your revenue.

FMEA provides a systematic, interdisciplinary approach to analyzing potential issues and their impact on a given system. The key is to clearly distinguish between the cause and the effect: the likelihood of a cause occurring might be tiny, yet its effect might be disastrous. Imagine a core meltdown in a nuclear power plant hit by an aircraft - the probability of the cause and the magnitude of the effect can be wildly disproportionate to one another! Not every subject of an FMEA is that serious, of course.

The goal of a sustainable, effective production process should be to have quality checking measures in place from the get-go - to design quality into the product, rather than inspect for it later.

It’s beneficial to do an FMEA when:

  • You are redesigning a process or product, or changing the way an existing process flows
  • You are about to issue a control plan or consider improvements
  • You want to analyze the weaknesses of a process or product and need to prioritize which of them are critical.
    Sometimes, the technique is extended to Failure Mode, Effects & Criticality Analysis (FMECA) to name the critical concerns specifically.

What types of FMEA are there?

An FMEA typically assesses one of these three areas: a function or system, a design, or a process.

  • A system FMEA
    addresses breakdowns that impact the entire system you operate within. It looks at the relationship between its subsystems and their integration with each other or with external systems. It’s a high-level analysis, at times too general to be accurate (depending on the system size).
  • A design FMEA
    focuses on possible product failures stemming from engineering, component faults, the nature of the design, its longevity, integration with other products, and more.
  • A process FMEA
    deals with faults that impact product quality during the process, or that could not be prevented in design and now drive down process reliability and cause customer dissatisfaction. They typically stem from unreliable process metrics, variable human and equipment factors, and unstandardized practices.

A word of caution

Try not to commit to performing an FMEA lightly - unless, of course, it is a regulatory or customer requirement or your system is safety-relevant. Kaizen and Six Sigma provide plenty of tools to support your cause and effect analysis, e.g., 5 Whys, Ishikawa diagrams, Quality Function Deployment (QFD), Pareto analysis, etc. It is wise to treat FMEA as the very last weapon in your arsenal and to thoroughly define the subject of your investigation beforehand.

FMEAs are usually conducted by sizable teams of subject experts, sometimes including supplier and customer representatives, who may even need training on using the tool. Therefore, unless the FMEA is set up and carried out in just the right way, you are likely to burn plenty of hours and money on the exercise. By no means is this a call for declaring FMEA a waste of time - when done well, it’s a power tool - but make sure that the outcome of your hard work is worth the investment.

How to do an FMEA?

Step 1: Team selection

Gather representatives of all functions of the analyzed process, e.g., design, testing, quality, reliability, maintenance, production, and marketing, together with suppliers and customers. Make sure the choice of people isn’t random but based on their range of experience. This matters, as you need people who can tell what can go wrong in their portion of the process and what solutions would be possible - either based on their experience or on their familiarity with the specific environment or aspect of the product.

Step 2: Determine the scope

FMEAs are, in general, fairly detailed. That said, you set the scope of the analysis for your specific scenario. Make sure all participants will be addressing problems of the same scale. This should prevent you from seeing, e.g., “Website downtime” next to “There is a typo in the website footer” on your FMEA sheet.
At the same time, do not dismiss problems that seem insignificant but ultimately harbor considerable potential for error.

Step 3: Estimate potential failure occurrence

It’s up to you how many details get listed here. But the more ways you list in which each failure impacts the system, the more accurate your risk estimation will be. Take a look at the proposed criteria in the image below. The crucial parts of the analysis are:

  • how severely the failure affects the customers (S)
  • the likelihood of a breakdown occurring (O)
  • how difficult the issue is to detect (D)

To rank the impact of probable failures, score each criterion on a scale of 1 to 10, with 1 standing for the lowest risk: the mildest effect, the rarest occurrence, or the easiest detection. Thanks to this, you will be able to calculate a risk priority number at the end of the analysis:

Risk priority number calculation
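
In its conventional form, the risk priority number (RPN) is simply the product of the three scores: RPN = S × O × D, which ranges from 1 to 1,000 on 1-10 scales. The short Python sketch below shows the calculation. The class name, field names, and example scores are illustrative assumptions, not part of any standard; it also assumes the usual convention that a higher detection score means the issue is harder to detect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of an FMEA sheet (hypothetical structure for illustration)."""
    name: str
    severity: int    # S: 1 (negligible effect) .. 10 (most severe effect)
    occurrence: int  # O: 1 (very unlikely) .. 10 (almost certain)
    detection: int   # D: 1 (almost certainly detected) .. 10 (practically undetectable)

    def __post_init__(self) -> None:
        # Reject scores outside the 1-10 scale used in this article.
        for label in ("severity", "occurrence", "detection"):
            score = getattr(self, label)
            if not 1 <= score <= 10:
                raise ValueError(f"{label} must be between 1 and 10, got {score}")

    @property
    def rpn(self) -> int:
        """Risk priority number: S x O x D."""
        return self.severity * self.occurrence * self.detection


# Made-up scores for one failure mode.
downtime = FailureMode("Website downtime", severity=8, occurrence=3, detection=2)
print(downtime.rpn)  # 8 * 3 * 2 = 48
```

Because the scores are multiplied, a single high score can be masked by two low ones, which is one reason to look at severity separately when prioritizing.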

Note: the RPN calculation has its limitations, and some alternatives have been suggested. Practically speaking, there is the risk of spending hours of discussion working out whether “the likelihood is a 6, or a 7, or perhaps just a 5”. It is not uncommon to see the scale reduced to just 1-3-7-10 or to see a traffic-light-style rating (red-yellow-green) instead. For more information, please see the Further Reading section below.
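
If your team settles on the reduced 1-3-7-10 scale, existing 1-10 scores can be mapped onto it. The sketch below snaps each score to the nearest allowed level. The function name is hypothetical, and the tie-breaking rule (a score equally close to two levels goes to the higher, more cautious one) is an assumption you may want to change.

```python
REDUCED_SCALE = (1, 3, 7, 10)


def snap_to_reduced_scale(score: int) -> int:
    """Map a 1-10 score to the nearest level of the reduced scale.

    Assumption: ties (e.g. 5, which is equally close to 3 and 7)
    are resolved upward, toward the more cautious rating.
    """
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    # Sort key: smallest distance first, then the higher level on a tie.
    return min(REDUCED_SCALE, key=lambda level: (abs(level - score), -level))


print([snap_to_reduced_scale(s) for s in range(1, 11)])
# → [1, 3, 3, 3, 7, 7, 7, 7, 10, 10]
```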

Step 4: Choose points of immediate action

Sort the RPNs for all possible failures from highest to lowest. Items with the highest values are the ones that should get addressed first.

According to the Pareto rule, roughly 80% of problems result from 20% of causes. Consequently, there is a good chance that solving the top few risk points will bring about significant improvement. Whether this materializes for you or not largely depends on the nature of your process and its unique difficulties. Please note, however, that critical errors should be assigned corrective measures even if they don’t ultimately belong to the top 20%.
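
The selection described in this step can be sketched in a few lines of Python: rank by RPN, take the top 20%, then add anything critical that the ranking missed. The failure modes and their scores are made up, and treating a severity of 9 or more as "critical" is an assumption for illustration; your team should set its own threshold.

```python
import math

# Hypothetical failure modes: (name, severity, occurrence, detection)
failure_modes = [
    ("Payment service outage", 9, 2, 3),
    ("Slow page load", 5, 6, 4),
    ("Broken signup email", 7, 4, 5),
    ("Typo in footer", 1, 3, 2),
    ("Data loss on backup restore", 10, 1, 2),
]

CRITICAL_SEVERITY = 9  # assumed threshold, not a standard value


def rpn(mode):
    """Risk priority number: severity x occurrence x detection."""
    _, severity, occurrence, detection = mode
    return severity * occurrence * detection


# Highest RPN first.
ranked = sorted(failure_modes, key=rpn, reverse=True)

# Top 20% of the ranking, rounded up so at least one item is selected.
top_count = math.ceil(len(ranked) / 5)

# Keep the top items plus any critical item, preserving the RPN order.
action_list = [
    mode
    for position, mode in enumerate(ranked)
    if position < top_count or mode[1] >= CRITICAL_SEVERITY
]

for mode in action_list:
    print(f"{mode[0]}: RPN {rpn(mode)}, severity {mode[1]}")
```

With these example scores, "Broken signup email" is selected for its RPN, while the two high-severity items are selected despite their comparatively low RPNs.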

Step 5: Design & apply correction, then reassess

Now that you know what needs doing first, create an action plan. Execute it and - once the new process is in place - perform the FMEA again. It will let you adjust the scores for the already addressed concerns and verify whether the change has affected other failure modes. In many cases, it will not be possible to reduce the severity parameter of a potential breakdown. So, the ease and speed of detection, along with problem occurrence, should be your focus; you may also want to consider error-proofing the process.
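
A reassessment can be as simple as comparing the scores from both rounds side by side. The sketch below uses made-up before and after scores: one failure mode improves through lower occurrence and better detection while its severity stays the same, and another gets slightly worse as a side effect of the change. All names and numbers are illustrative assumptions.

```python
def rpn(scores):
    """Risk priority number from a (severity, occurrence, detection) tuple."""
    severity, occurrence, detection = scores
    return severity * occurrence * detection


# Hypothetical (severity, occurrence, detection) scores from two FMEA rounds.
before = {
    "Broken signup email": (7, 4, 5),
    "Slow page load": (5, 6, 4),
}
after = {
    "Broken signup email": (7, 2, 2),  # addressed: rarer and easier to detect
    "Slow page load": (5, 7, 4),       # not addressed, but affected by the change
}


def reassess(before, after):
    """Compare RPNs between two rounds; a negative delta means lower risk."""
    report = {}
    for name, old in before.items():
        new = after[name]
        report[name] = {
            "rpn_before": rpn(old),
            "rpn_after": rpn(new),
            "delta": rpn(new) - rpn(old),
            "severity_unchanged": old[0] == new[0],
        }
    return report


report = reassess(before, after)
for name, row in report.items():
    if row["delta"] < 0:
        status = "improved"
    elif row["delta"] > 0:
        status = "worse"
    else:
        status = "unchanged"
    print(f"{name}: {row['rpn_before']} -> {row['rpn_after']} ({status})")
```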

Yes, you read that correctly: do the FMEA again. It’s a critical step in establishing an improvement cycle instead of a dead end (unfortunately, many FMEAs fail at this point in practice). Maintained this way, your FMEA can become an essential product knowledge base.

By applying Failure Mode and Effect Analysis to your process, product, or system, you’ll get a few steps closer to making sure you’ve done all you can to prevent problems that cause costly delays and leave customers and teams unhappy.

Furthermore, your FMEA serves as an important piece of documentation showing that your organization has thoroughly analyzed and evaluated risks and has taken appropriate countermeasures to mitigate them. Having this on record can save your business, should you ever need to prove your duty of care or defend against claims for damages involving harm to life and limb.