Human Error and Systematic Failures

Today I want to highlight an excellent paper “Improving Barrier Effectiveness Using Human Factors Methods” recently published by Mr. Dave Grattan of AE Solutions.  In the past, Dave has been kind enough to comment on my blog posts, so it is a pleasure to highlight his work.  But more than that, Dave highlights an important gap in current process safety methods and offers valuable insights on how to close this human error gap using proven Human Factors Engineering (HFE) and Human Reliability Analysis (HRA) tools.

It is well worth the read.  Go ahead.  I will wait right here.  My observations and comments are below.

Human Error and Systematic Error

It may not be immediately obvious, but Dave’s observations about Human Factors and Human Error Probability (HEP) nicely complement my own observations regarding systematic failures.

The original IEC 61508 definition of systematic failures is pretty narrow:

  “…failure related to a pre-existing fault, which consistently occurs under particular conditions…”

As I have discussed before, IEC 61511 has adopted a much broader definition of systematic failures, relegating human error to qualitative management only.  Dave rightly points out that the assumptions of LOPA and most SIL calculation methods end up neglecting the entire area of human error.

Under the IEC 61508 definition, human error would clearly be considered a random error.  In the section on quantifying failure metrics, the standard explicitly states:

  “The estimate of the achieved failure measure for each safety function, as required by 7.4.5.1, shall take into account: […] i) the effect of random human error if a person is required to take action to achieve the safety function.”

If human error is quantifiable in that specific instance, why would it not be considered elsewhere in the operation and maintenance of the function? (e.g. testing, maintenance, bypassing, MOC, etc.)
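To make the question concrete, here is a minimal sketch of how one such human-error term could be folded into a simple 1oo1 average-PFD model. All rates and probabilities below are assumed for illustration only; they are not taken from Dave's paper or from any standard.

```python
# Hypothetical numbers only: a sketch of adding one human-error term
# (failure to properly restore the function after a proof test) to a
# simple 1oo1 average-PFD approximation.

lambda_du = 2e-6        # dangerous undetected failure rate, /hr (assumed)
test_interval = 8760.0  # proof-test interval, hr (annual testing, assumed)

# Hardware-only approximation for a 1oo1 function:
pfd_hw = lambda_du * test_interval / 2

# Assumed generic HEP: function left bypassed or mis-restored after a
# test.  If the error persists until the next proof test, it contributes
# its full probability to the average unavailability.
hep_restore = 1e-3
pfd_total = pfd_hw + hep_restore

print(f"hardware-only PFDavg: {pfd_hw:.2e}")
print(f"with restoration HEP: {pfd_total:.2e}")
```

Even with a modest assumed HEP of 1e-3, the human-error term in this sketch contributes over ten percent of the total; that is hard to justify neglecting.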

Dave lays out a compelling case that the existing tools of Human Reliability Analysis (HRA) are well-proven and perfectly adequate to close this gap.

Bayesian Human Error

Dave provides a good explanation of the Bayesian vs. Frequentist views of probability, and he advocates that process safety practitioners should be Bayesians.  Readers of this blog should not be surprised that I heartily agree!

One statement in the paper seems to contradict approaches that I have advocated in my own Bayesian work:

It is tempting to lump systematic error with random hardware failure to produce a reliability number and believe all is well. Human error probability data which is based on averages can create the same over-confidence.

While I generally agree with the sentiment and the warning, I think the Bayesian toolkit allows us to effectively deal with this problem.

When starting with risk assessments and/or reliability calculations that take no account of human error probability (HEP), it is a useful first step to incorporate human error factors based on industry averages.  These estimates need to carry appropriate uncertainty bands, both to avoid over-confidence and to allow future updating.  These “average” assumptions will at least position practitioners to begin collecting data and eventually replacing the generic averages with application-specific data.

The main advantage of this approach is that human error can be readily incorporated into current models before any more complex HRA study is completed.  As the practitioner grows in HRA sophistication and completes more rigorous models (e.g. THERP), the existing reliability models can be updated step-wise with those results.
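As a sketch of this step-wise updating under a Bayesian view, one could encode the generic industry average as a deliberately weak Beta prior and then update it with site-specific observations.  The prior parameters and the observation counts below are hypothetical, chosen only to illustrate the mechanics:

```python
# Sketch: start from a generic "industry average" HEP with wide
# uncertainty, then update with (hypothetical) site-specific data
# using a conjugate Beta-Binomial model.

# Prior mean = 0.5 / (0.5 + 49.5) = 1e-2, with only ~50 "effective
# observations" -- i.e. a deliberately weak, wide prior (assumed).
alpha0, beta0 = 0.5, 49.5

# Hypothetical site data: 1 observed error in 500 task opportunities.
errors, opportunities = 1, 500

# Conjugate update:
alpha = alpha0 + errors
beta = beta0 + (opportunities - errors)

posterior_mean = alpha / (alpha + beta)
print(f"prior mean HEP:     {alpha0 / (alpha0 + beta0):.4f}")
print(f"posterior mean HEP: {posterior_mean:.4f}")
```

As data accumulates, the posterior mean shifts from the generic average toward the observed site-specific rate, which is exactly the "start generic, update later" workflow described above.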

To THERP Or Not To THERP

Dave spends some time discussing the Technique for Human Error Rate Prediction (THERP), and rightly highlights it as a very influential and foundational method in the HRA field.  However, let’s be honest.  THERP is hard.  If you don’t believe me, you can check it out yourself in NUREG/CR-1278.
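For readers curious what a small piece of THERP looks like in practice, here is a sketch of its dependence model: the conditional probability that a second error follows a first, at each of THERP’s five dependence levels, per the equations in NUREG/CR-1278.  The basic HEP value used below is assumed for illustration only:

```python
# Sketch of the THERP dependence model (NUREG/CR-1278): conditional
# probability of an error on a second task, given an error on the
# first, for each of the five dependence levels.

def conditional_hep(p, level):
    """Conditional HEP for the second of two dependent tasks."""
    return {
        "zero":     p,                   # fully independent tasks
        "low":      (1 + 19 * p) / 20,
        "moderate": (1 + 6 * p) / 7,
        "high":     (1 + p) / 2,
        "complete": 1.0,                 # second error is certain
    }[level]

bhep = 0.003  # assumed basic HEP for a routine step (illustrative)
for level in ("zero", "low", "moderate", "high", "complete"):
    print(f"{level:9s} dependence: {conditional_hep(bhep, level):.4f}")
```

Note how an assumed basic HEP of 0.003 becomes roughly 0.5 under high dependence; this is why dependence between steps tends to dominate multi-step task analyses, and also a taste of why full THERP models get complicated quickly.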

A couple of other methods are discussed in the paper, but I think it is worth pointing out that there are many HRA methods out there, each with different strengths & weaknesses and aiming at different types of problems.  In addition to the HSE report referenced in the paper, I also recommend a few additional resources for general information on HRA methods and best practices.

For those looking for a less complex method to “dip their toe” into HRA, I suggest checking out the NARA method.

Dave’s paper covers much more than just HRA methods, so it is certainly not meant as a criticism that he didn’t go into the 20+ HRA methods out there.  My concern is just that the complexity of THERP may scare off some potential process industry users.  Never fear, there are easier options available to get started!

Conclusion

Besides highlighting a critical problem in current process safety practices, this paper also does an impressive job of summarizing 50 years of Human Factors and HRA progress into about 10 very readable pages.

Although the new jargon and multiple methods of HRA may be a little intimidating to new users, it is really no more difficult than modeling complex physical systems.  For me, the biggest challenge was overcoming my resistance to the inherent “squishiness” of dealing with human cognitive processes, emotions, etc.  A Bayesian mindset toward uncertainty is a must!

If you are interested in learning more about Human Reliability Analysis, I can also recommend the course Human Reliability Analysis (ENRE 645) from my alma mater, the University of Maryland.

Thanks for reading!  Thanks to Dave for this important contribution!


Stephen Thomas, PE, CFSE

Stephen is the founder and editor of functionalsafetyengineer.com. He is a functional safety expert with over 26 years of experience.  He is currently a system safety engineer with a leading developer of autonomous vehicle technology. He is a member of the IEC 61508 and IEC 61511 functional safety committees. He is a member of the non-profit CFSE Advisory Board advising the exida CFSE program. He is the Director of Education & Professional Development for the International System Safety Society and an associate editor for the Journal of System Safety.

One thought on “Human Error and Systematic Failures”

  1. Michael Thompson says:

    Great reading list Stephen, and a very timely topic that often gets overlooked.  Engineering a Safer World is a really good read.  I took a Human Factors class for my Masters at Johns Hopkins and this was heavily referenced.  Dr. Leveson really challenges some existing paradigms.
