---
title: "High reliability organizations"
date: 2022-06-03
tags: management, social science
toc: false
---

[cite/t:@dietterich2018_robus_artif_intel_robus_human_organ] is an
interesting article about how to make /robust/ AI. High risk
situations require the combined AI and human system to operate as a
high reliability organization (HRO). Only such an organization can
have sufficiently strong safety and reliability properties to ensure
that powerful AI systems will not amplify human mistakes.

* Reliability and high-reliability organizations

The concept of the high reliability organization (HRO) comes from
[cite/t:@weick1999_organ]. Examples of HROs include nuclear power
plants, aircraft carriers, air traffic control systems, and space
shuttles. They share several characteristics: an unforgiving
environment, vast potential for error, and dramatic consequences when
a failure occurs.

The paper identifies five processes common to HROs, which the authors
group under the concept of /mindfulness/ (a kind of "enriched
awareness"). Mindfulness is about allocating and conserving the
group's attention. It includes both being consciously aware of the
situation and /acting/ on this understanding.

This mindfulness leads to the capacity to discover and manage
unexpected events, which in turn leads to reliability.

* Characteristics of a high reliability organization

An HRO is an organization with the following five attributes.

** Preoccupation with failure

Failures in HROs are extremely rare. To make it easier to learn from
them, the organization has to broaden the data set by expanding the
definition of failure and studying all types of anomalies and near
misses. Additionally, the analysis is much richer, and always
considers the reliability of the entire system, even for localized
failures.

HROs also study the /absence/ of failure: why the system didn't fail,
and the possibility that no flaws were identified because there wasn't
enough attention to potential flaws.

To further increase the number of data points to study, HROs encourage
everyone to report all mistakes and anomalies. Contrary to most
organizations, members are rewarded for reporting potential failures,
even if their analysis is wrong or if they are responsible for
them. This creates an atmosphere of "psychological safety" essential
for transparency and honesty in anomaly reporting.

** Reluctance to simplify interpretations

HROs avoid having a single interpretation for a given event. They
encourage generating multiple, complex, contradicting interpretations
for every phenomenon. These varied interpretations enlarge the number
of concurrent precautions. Redundancy is implemented not only via
duplication, but via skepticism of existing systems.

HROs value people with different views and different backgrounds, and
retrain them often. To resolve the contradictions and the
oppositions of views, interpersonal and human skills are highly
valued, possibly more than technical skills.

** Sensitivity to operations

HROs rely heavily on "situational awareness". They ensure that no
[[https://en.wikipedia.org/wiki/Emergence][emergent phenomena]] arise in the system: all outputs should always be
explained by the known inputs. Otherwise, there might be other forces
at work that need to be identified and dealt with. A small group of
people may be dedicated to this awareness at all times.

** Commitment to resilience

HROs train people to be experts at combining all processes and events
to improve their reactions and their improvisation skills. Everyone
should be an expert at anticipating potential adverse events, and
managing surprise. When events fall outside normal operational
boundaries, organization members self-organize into small dedicated
teams to improvise solutions to novel problems.

** Underspecification of structures

There is no fixed reporting path: anyone can raise an alarm and halt
operations. Everyone can make decisions related to their technical
expertise. Information spreads directly through the organization, so
that people with the right expertise are warned first. Power is
delegated to operational personnel, but management is completely
available at all times.

* HROs vs non-HROs

Non-HROs increasingly exhibit some properties of HROs. This may be due
to the fact that highly competitive environments with short cycles
create unforgiving conditions (high performance standards, low
tolerance for errors). However, most everyday organizations do not put
failure at the heart of their thinking.

Failures in non-HROs come from the same sources: cultural assumptions
about the effectiveness or accuracy of previous precautionary measures.

Preoccupation with failure also reveals the couplings and the complex
interactions in the systems being operated. This in turn leads to
uncoupling and less emergent behaviour over time. People come to
better understand long-term, complex interactions.

* Reliability vs performance, and the importance of learning

An interesting discussion concerns the (alleged) trade-off between
reliability and performance. It is often assumed that HROs focus on
reliability at the cost of throughput. As a consequence, it may not
make sense for ordinary organizations to put as much emphasis on
safety and reliability, as the cost to the business may be
prohibitive.

However, investments in safety can also be viewed as investments in
/learning/. HROs view safety and reliability as a process of search
and learning (constant search for anomalies, learning the interactions
between the parts of a complex system, ensuring we can link outputs to
known inputs). As such, investments in safety encourage collective
knowledge production and dissemination.

Mindfulness also stimulates intrinsic motivation and perceptions of
efficacy and control, which increase individual performance. (People
who strongly believe they are in control of their own output are more
motivated and more efficient.)

HROs may encourage mindfulness out of operational necessity, given the
catastrophic consequences of any failure, but non-HROs can adopt the
same practices to boost efficiency and learning and gain a
competitive advantage.

Additional lessons that can be learned from HROs (implicit in the
previous discussion):
1. The expectation of surprise is an organizational resource because
   it promotes real-time attentiveness and discovery.
2. Anomalous events should be treated as outcomes rather than
   accidents, to encourage search for sources and causes.
3. Errors should be made as conspicuous as possible to undermine
   self-deception and concealment.
4. Reliability requires diversity, duplication, overlap, and a varied
   response repertoire, whereas efficiency requires homogeneity,
   specialization, non-redundancy, and standardization.
5. Interpersonal skills are just as important in HROs as are technical
   skills.

* References