ISO 26262 Fault Metrics Intro

In today’s post, we will take a deep dive into Part 5 of ISO 26262, which covers product development at the hardware level. In particular, we will take a close look at the Fault Metrics defined by the standard. This part is where ISO 26262 differs the most from IEC 61508. As we will see below, some of these differences are just terminology, while others are more fundamental.

Hardware Design

Part 5 of the standard is dedicated to the development of the hardware required to achieve safety goals (software is covered in the next part). In this part, the technical safety requirements developed in Part 4 are allocated to specific hardware and software designs. This could be thought of as equivalent to detailed engineering in a typical IEC 61511 project.

Without going too deep into the details, the ISO standard requires that the design consider several factors, including:

Response to failures, including transient faults
Diagnostic capabilities
Consideration of fault detection times
Expected failure rates of components
Design verification requirements
Consistency with the higher-level safety specifications

The detailed hardware design is captured in three main deliverables:

Hardware Safety Requirements Specification
Hardware-software Interface Specification
Hardware Safety Requirements Verification Report

The ISO standard does not go into great detail on the hardware design process (neither does IEC 61508), so I will not either.

The later sections of Part 5 discuss the quantitative verification of the hardware via various metrics, which is where the rest of this article will focus.

ISO 26262 Fault Metrics

Clause 8 covers the evaluation of the hardware architectural metrics. Specifically, these metrics are intended to evaluate the effectiveness of the hardware architecture in dealing with random failures. This is distinct from the evaluation of random hardware failures (i.e. ASIL) covered in Clause 9.

Diagnostic coverage is defined much the same way as it is in the IEC standards. It must be estimated based on failure rates taken from recognized industry sources, statistics based on field returns or tests, or expert judgement.

The two metrics defined in this part are:

Single-point Fault Metric – measures the robustness of the design to single-point and residual faults. Higher is better.
Latent-fault Metric – measures the robustness of the design to latent faults. Higher is better.

As with many of the ISO requirements, these metrics only apply to functions with a higher ASIL (i.e. B, C, or D).

Before discussing the metrics, it is useful to remember the taxonomy of faults/failures (from Part 1) used by ISO 26262, which is different from IEC. The total failure rate λ can be broken down into:

λ = λ_SPF + λ_RF + λ_MPF + λ_S

where:

λ_SPF: Single Point Faults (i.e. a dangerous undetected (DU) fault, in IEC terms, where there are no diagnostics)

λ_RF: Residual Faults (i.e. a DU fault not covered by diagnostics)

λ_MPF: Multiple Point Faults (i.e. a combination of independent SPFs)

λ_S: Safe Faults
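
To make the breakdown concrete, here is a minimal sketch that splits one component's failure rate into the four categories. The names `FaultRates` and `split_failure_rate` and the input numbers are hypothetical, not taken from the standard. The split also rests on a simplifying assumption: with a diagnostic in place, the uncovered share of the non-safe rate is counted as residual and the covered share as multiple-point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultRates:
    """Failure rates in FIT (failures per 1e9 hours) for one hardware element."""
    spf: float   # single-point faults (no diagnostics at all)
    rf: float    # residual faults (the share the diagnostics miss)
    mpf: float   # multiple-point faults
    safe: float  # safe faults

    @property
    def total(self) -> float:
        # lambda = lambda_SPF + lambda_RF + lambda_MPF + lambda_S
        return self.spf + self.rf + self.mpf + self.safe


def split_failure_rate(total_fit, safe_fraction, has_diagnostics, coverage=0.0):
    """Hypothetical helper: split a total rate into the four categories.

    Assumption: the non-safe share is either entirely single-point (no
    diagnostics) or divided into residual and multiple-point by the
    diagnostic coverage.
    """
    safe = total_fit * safe_fraction
    non_safe = total_fit - safe
    if not has_diagnostics:
        return FaultRates(spf=non_safe, rf=0.0, mpf=0.0, safe=safe)
    residual = non_safe * (1.0 - coverage)
    return FaultRates(spf=0.0, rf=residual, mpf=non_safe - residual, safe=safe)


# Illustrative numbers only: 100 FIT, 40% safe, 90% diagnostic coverage.
rates = split_failure_rate(100.0, safe_fraction=0.4, has_diagnostics=True, coverage=0.9)
print(rates)
print("total:", rates.total)
```

With these made-up inputs, roughly 6 FIT ends up as residual faults and 54 FIT as multiple-point faults, and the four categories still add up to the original 100 FIT.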

Single-point Fault Metric

The single-point fault metric is defined as the sum of the multiple-point faults and the safe faults divided by the total failure rate, i.e. the following ratio: Σ(λ_MPF + λ_S) / Σ(λ)

Note: The name “single-point fault metric” may initially be confusing, since the single-point fault rate (λ_SPF) does not appear in the formula! However, because λ = λ_SPF + λ_RF + λ_MPF + λ_S, the formula can equivalently be written as: 1 - Σ(λ_SPF + λ_RF) / Σ(λ)

This ratio looks suspiciously like the IEC 61508 concept of Safe Failure Fraction (SFF), with the notable exception that multiple-point faults are also considered “safe”. The inclusion of multiple-point faults is a somewhat unusual approach, but it is probably why the latent-fault metric is also calculated.

A quantitative target for the single-point fault metric is set by the standard based on the ASIL target: at least 90% for ASIL B, 97% for ASIL C, and 99% for ASIL D.

By combining safe faults and multiple-point faults into the same metric, this metric has a similar impact to the SFF-based hardware fault tolerance requirements in IEC 61508. In other words, if the single-point fault metric is too low, additional fault tolerance will convert those faults to multiple-point faults and improve the metric. Sound familiar?
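
The single-point fault metric can be sketched in a few lines of Python. This is only an illustration: the function names, the dictionary layout, and the failure rates are made up, and the target values are the ones commonly quoted from Part 5, so check them against your edition of the standard. The sketch computes the metric in both of the equivalent forms given above.

```python
def single_point_fault_metric(elements):
    """SPFM = sum(mpf + safe) / sum(total), over all safety-related elements.

    `elements` is a list of dicts with keys "spf", "rf", "mpf", "safe" (FIT).
    """
    total = sum(e["spf"] + e["rf"] + e["mpf"] + e["safe"] for e in elements)
    covered = sum(e["mpf"] + e["safe"] for e in elements)
    return covered / total


def single_point_fault_metric_alt(elements):
    """Equivalent form: 1 - sum(spf + rf) / sum(total)."""
    total = sum(e["spf"] + e["rf"] + e["mpf"] + e["safe"] for e in elements)
    uncovered = sum(e["spf"] + e["rf"] for e in elements)
    return 1.0 - uncovered / total


# Assumed targets (commonly quoted for ISO 26262-5); verify before relying on them.
SPFM_TARGETS = {"B": 0.90, "C": 0.97, "D": 0.99}

# Hypothetical failure rates in FIT.
elements = [
    {"spf": 0.0, "rf": 2.0, "mpf": 48.0, "safe": 50.0},  # part with diagnostics
    {"spf": 1.0, "rf": 0.0, "mpf": 0.0, "safe": 9.0},    # part without diagnostics
]

spfm = single_point_fault_metric(elements)
spfm_alt = single_point_fault_metric_alt(elements)
met = [asil for asil, target in SPFM_TARGETS.items() if spfm >= target]

print(f"SPFM = {spfm:.1%} (alternate form: {spfm_alt:.1%})")
print("Targets met:", met)
```

With these numbers the metric is 107/110, or about 97.3%, which would satisfy the assumed ASIL B and C targets but not ASIL D. Adding diagnostics to the second part would move its 1 FIT of single-point faults into the residual and multiple-point buckets and raise the metric.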

Latent Fault Metric

The latent fault metric is defined as the sum of the multiple-point faults that are perceived by the driver or detected by diagnostics plus the safe faults, divided by the total multiple-point and safe faults, i.e. the following ratio: Σ(λ_MPF(Per/Det) + λ_S) / Σ(λ_MPF + λ_S)

Again, this concept looks very similar to the concept of IEC 61508 diagnostic coverage, except that it also includes the possibility of driver “perception” of and response to faults. This is generally not considered in IEC 61508 (or IEC 61511) since most systems are dormant and activate on demand.
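
A similar sketch works for the latent fault metric. As before, the function name, the `mpf_detected` key (standing in for the perceived-or-detected share of the multiple-point faults), and the numbers are hypothetical, and the target values are the commonly quoted ones rather than something this article confirms.

```python
def latent_fault_metric(elements):
    """LFM = sum(mpf_detected + safe) / sum(mpf + safe).

    `elements` is a list of dicts with keys "mpf", "mpf_detected", "safe" (FIT),
    where "mpf_detected" is the share of "mpf" that is perceived or detected.
    """
    numerator = sum(e["mpf_detected"] + e["safe"] for e in elements)
    denominator = sum(e["mpf"] + e["safe"] for e in elements)
    return numerator / denominator


# Assumed targets (commonly quoted for ISO 26262-5); verify before relying on them.
LFM_TARGETS = {"B": 0.60, "C": 0.80, "D": 0.90}

# Hypothetical failure rates in FIT.
elements = [
    {"mpf": 48.0, "mpf_detected": 40.0, "safe": 50.0},  # 8 FIT remain latent
    {"mpf": 0.0, "mpf_detected": 0.0, "safe": 9.0},
]

lfm = latent_fault_metric(elements)
met = [asil for asil, target in LFM_TARGETS.items() if lfm >= target]

print(f"LFM = {lfm:.1%}")
print("Targets met:", met)
```

Here 8 FIT of multiple-point faults stay latent, so the metric is 99/107, or about 92.5%. Note that single-point and residual faults do not appear anywhere in this calculation; they are handled entirely by the single-point fault metric.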

A quantitative target value for the latent-fault metric is set by the standard based on the ASIL target: at least 60% for ASIL B, 80% for ASIL C, and 90% for ASIL D.

My interpretation of the two required fault metrics is that they are a sort of roll-up of several IEC concepts, including diagnostic coverage, safe failure fraction, and hardware fault tolerance. The end result should be that reasonable amounts of diagnostics and fault tolerance are built into the system before the ASIL is even calculated. Of course, the SFF concept and the HFT requirements in the IEC standards are somewhat notorious for the “numbers games” they have inspired. It will be interesting to see if the automotive industry is more successful in this regard.

Wrap Up

Key takeaways from this part of the article include:

The hardware design process is not specified in detail in the ISO 26262 standard, but it includes familiar concepts such as diagnostic coverage, failure rates, and verification.
The standard includes two fault metrics: (i) the single-point fault metric and (ii) the latent fault metric, which are similar in concept to the safe failure fraction (SFF) and diagnostic coverage (DC) metrics from IEC 61508.

Part 5 of the standard is also where the Automotive Safety Integrity Level (ASIL) is introduced. We will cover ASIL in detail in a future post. Thanks for reading!

Stephen Thomas, PE, CFSE

Stephen is the founder and editor of functionalsafetyengineer.com. He is a functional safety expert with over 26 years of experience. He is currently a system safety engineer with a leading developer of autonomous vehicle technology. He is a member of the IEC 61508 and IEC 61511 functional safety committees. He is a member of the non-profit CFSE Advisory Board advising the exida CFSE program. He is the Director of Education & Professional Development for the International System Safety Society and an associate editor for the Journal of System Safety.
