

Embracing the Unexpected: Resilience Engineering In Australian Healthcare
The Quarterly 2014

This article was written by Dr Susan Keam, based on material presented by Dr Robyn Clay-Williams and Professor Jeffrey Braithwaite on 14 August 2013 for the RACMA Interact Webinar series. A recording of the webinar is available on the RACMA website.

Safety in patient care  

Although we have made significant inroads into designing ways to improve the health care system, patients still receive highly variable, frequently inappropriate and, too often, unsafe care.¹

A typical patient safety approach to health systems is “when we have problems, we’d better go and fix them”. Safety in health is generally described in negative terms, by how risky and unsafe a system is, rather than in positive terms, by how safe it is. As a consequence, the usual solution is to design processes and methods that make the system safer. This is not the only approach to patient safety in healthcare, but it is the predominant one.

How does your organisation work?  

The most prevalent model of thinking about how an organisation works is a linear, hierarchical model, like that shown below.

Based on the linear, hierarchical approach, the methods we use for dealing with error are also linear. Examples include cause and effect models, root cause analysis and Reason’s Swiss Cheese model.² Reason’s model is based on the premise that if any one of the events contributing to the error had not happened, the poor patient outcome would not have occurred; see the example below.

The reality

In reality, however, healthcare systems are not linear and hierarchical. They are complex adaptive systems with multiple feedback loops, like this causal loop model of an emergency department:

A study in which people working in an emergency department were asked a series of questions to map their problem-solving and socialising networks confirms the complex adaptive nature of healthcare systems. For instance, when staff were asked “who would you go to if you had a problem?”, the network diagram (see below) showed that people generally go to others within their own group (i.e. doctors go to doctors, nurses go to nurses, etc.) rather than to another group for the solution.³

Staff socialising patterns were also heavily rooted in the individual professional groups. So healthcare isn’t linear; it is a complex adaptive system with professionalised clusters whose behaviours are not team-oriented and are not easy to predict, especially through a lens of linearity.

Another factor to consider in healthcare is variation, and this is not limited to geographic or outcome variation (e.g. length of waiting lists, who has caesarean sections). In all aspects of behaviour, variation in performance can lead to unexpected outcomes, not only between individuals but also within an individual’s day-to-day performance (everyone has “off” days).

A new way of thinking is needed  

Current thinking (linear reductionism; cause and effect logic) has got us to where we are today, but it has not always delivered the gains we would like to see, so we need to consider a new approach if we want to improve performance. A group of safety experts who meet in Denmark each year has looked at safety in a different way and, in doing so, has critiqued the linear cause and effect model (which focuses on errors rather than on things that go right), arguing that we need to change the way safety management is done. They have coined the terms “Safety I” and “Safety II” to describe the current and the new ways of thinking. Searching for error and rooting it out is Safety I thinking, whereas Safety II focuses on identifying and replicating success.

Safety I is reactive, focusing on the (relative) absence of adverse events, and assumes that safety can be achieved by finding and eliminating the causes of adverse events. In contrast, Safety II is proactive (working to strengthen the healthcare system), seeks to identify the capacity to succeed under varying conditions and focuses on what goes right, so that the number of intended and acceptable outcomes is as high as possible every day.⁴

For instance, 10 failures in 100 events would be seen as a 10% failure rate in Safety I thinking, but as a 90% success rate in Safety II thinking, reflecting the many potential errors that were avoided. Safety II thinking takes the approach that we need to look at where the health system does well, so that those practices can be replicated and spread.
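
Put as arithmetic, the two framings read the same count from opposite ends. Using the figures in the example above:

$$
\text{failure rate} = \frac{10}{100} = 10\%, \qquad \text{success rate} = \frac{100 - 10}{100} = 90\%
$$

Safety I directs attention to the 10 events that failed. Safety II directs attention to the 90 that went right.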

Stereotypical understanding of safety

In Safety I thinking, a typical understanding of safety relies on the ‘find and fix’ principle. Investments in safety, and the core activity of treating patients, are seen as costs and can therefore be difficult to justify or sustain. A focus on what goes right receives little encouragement. There is little demand from authorities and regulators to look at what works well and how it can be disseminated, and anyone who does try will find little help. Safety I thinking follows a reactive safety management cycle, such as the one illustrated by the World Health Organization (WHO).

In their critical analysis of Safety I thinking, Hollnagel and colleagues concluded that Safety I is a highly technocratic and largely retrospective model of learning. It uses reactive rather than proactive forms of analysis and problem-solving. It also focuses on the 10-20% of instances that are breaches rather than on the 80-90% that maintain day-to-day safety. There is a poor understanding of everyday work, including organisational culture and politics. Notably, no comprehensive study across an entire health system has shown that implementing Safety I processes reduces the rate of harm.

Safety II, however, is a different way of looking at safety and of applying many familiar methods and techniques. It asks us to identify things that go right and analyse why they work well. It also requires proactive management of performance variability, not just constraint and avoidance. It asks us to consider the question “What if we changed the definition of safety from ‘avoiding something that goes wrong’ to ‘ensuring that everything goes right’?” More precisely, Safety II is about ‘ensuring that the number of intended and acceptable outcomes is as high as possible’. This requires a deep understanding of everyday activities.

Safety I thinking assumes that the healthcare system is predictable and has little variability, with problems that are repeatable and recurring errors that are identical. On this basis, Safety I also assumes that you can identify problems and put barriers in place to prevent harm. This works well in a linear, highly technical system (e.g. one that is highly mechanical). It works less well in a highly variable system such as healthcare, where variability in patients leads to variability in errors. This is where Safety II thinking comes to the fore, and where we need to develop organisational resilience to manage the highly variable nature of the system.

What about resilience…

According to Martin-Breen and Anderies,⁵ resilience in an individual or object can be visualised as bouncing back after stress, enduring greater stresses or being disturbed less by a given amount of stress. For a system, resilience can be thought of as maintaining system function in the event of a disturbance. For an adaptive system such as healthcare, resilience is the ability to withstand, recover from and reorganise in response to crises.

A look at history shows that there have been different ways of dealing with problems and error. Between the 1930s and the 1980s, the focus on what went wrong and on correcting problematic processes achieved a certain level of performance and sustainability. However, as we moved beyond the 1980s and systems became more complex, a higher level of performance was needed. This led to the development of the theory of error, based on the analysis of system failures and accidents and the identification of repeatable events.

Despite the rigour of data analysis and the development of rules and compliance codes to avoid unacceptable risk, errors still occur. This has led to the recognition that other approaches are required, in particular to account for systems that are not linear and predictable. One such theory, the ‘theory of action’, is based on the premise that the more things go right, the fewer things there are to go wrong, and that systems can adjust to sustain performance in a range of conditions.

Safety II thinking encourages people and systems to adjust. It treats errors in normal performance as part of normal variability and looks for ways of reducing that variability, since doing so also reduces errors. It also helps to identify the boundaries of variability beyond which an error or crisis will occur.
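
One way to make "boundaries of variability" concrete is borrowed from statistical process control rather than taken from the webinar. Limits are computed from a baseline period of a routine measure, and any day that falls outside them is flagged. The sketch below is minimal and rests on two assumptions that do not come from the source: a hypothetical daily measure (median emergency department waiting time in minutes) and the conventional mean ± 3 standard deviation limits. All the data are invented for illustration.

```python
from statistics import mean, stdev
from typing import List, Tuple


def control_limits(baseline: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) limits as the baseline mean -/+ k standard deviations.

    k = 3 follows the usual Shewhart control-chart convention; it is an
    illustrative assumption, not a clinically validated boundary.
    """
    centre = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation; needs >= 2 values
    return centre - k * spread, centre + k * spread


def outside_limits(values: List[float], lower: float, upper: float) -> List[int]:
    """Return the positions of observations that fall outside [lower, upper]."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical daily median waiting times (minutes) for a two-week baseline.
baseline = [42, 45, 39, 44, 41, 43, 40, 46, 42, 44, 38, 43, 41, 45]
lower, upper = control_limits(baseline)

# Hypothetical new week: only the fifth day (index 4) is outside the limits.
this_week = [44, 40, 47, 43, 71, 42, 45]
print(outside_limits(this_week, lower, upper))  # prints [4]
```

The limits describe ordinary variation, so a flag prompts a closer look rather than signalling an error in itself.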

…and system resilience?

When an unexpected event occurs, there is a degree of loss, followed either by recovery or by a shift to an alternative state. Resilience aims to minimise both the losses and the recovery time.
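
Both quantities named here can be read off a series of performance readings. The sketch below is minimal and makes three assumptions, none of which are defined in the source. Performance is sampled at equal intervals as a fraction of normal capacity, with 1.0 meaning normal. "Loss" is the summed shortfall below normal. "Recovered" means back within a hypothetical 5% of normal.

```python
from typing import List, Optional, Tuple


def loss_and_recovery(
    performance: List[float], normal: float = 1.0, tolerance: float = 0.05
) -> Tuple[float, Optional[int]]:
    """Summarise a disruption in a series of equally spaced performance readings.

    Returns (loss, recovery_time):
      loss          total shortfall below the normal level, summed over intervals
      recovery_time intervals from the first reading below the tolerance band
                    until performance is back inside it (0 if it never left the
                    band, None if it had not recovered by the end of the series)
    """
    threshold = normal * (1 - tolerance)  # hypothetical "back to normal" band
    loss = sum(max(normal - p, 0.0) for p in performance)

    # Index of the first reading that falls below the band, if any.
    start = next((i for i, p in enumerate(performance) if p < threshold), None)
    if start is None:
        return loss, 0
    for i in range(start, len(performance)):
        if performance[i] >= threshold:
            return loss, i - start
    return loss, None


# Hypothetical shifts, performance as a fraction of normal capacity.
brittle = [1.0, 1.0, 0.6, 0.7, 0.85, 0.97, 1.0]
resilient = [1.0, 1.0, 0.8, 0.96, 1.0]

for name, series in (("brittle", brittle), ("resilient", resilient)):
    loss, recovery = loss_and_recovery(series)
    print(f"{name}: loss={loss:.2f}, recovery={recovery} intervals")
```

The "resilient" series loses less and recovers sooner, which is the combination resilience aims for.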

Another way of thinking about system resilience that can be applied to healthcare systems is: “resilience is the intrinsic ability of a system to adjust its functioning prior to, during or following changes/disturbances in order to sustain required operations under expected or unexpected conditions”.⁶

To achieve a maximally sustainable system, there needs to be an optimal balance between resilience (diversity and interconnectivity) and efficiency. Too much efficiency creates a brittle system; conversely, too much diversity and flexibility means that nothing gets done. The regulation and standardised procedures required for efficiency create an environment in which errors are few, so long as events are predictable rather than unexpected. However, an unexpected event can cause major problems, because the system is not designed to handle the variance. A highly flexible system, on the other hand, has no repeatability, targets or direction. It is not effective and moves towards stagnation.

Reason’s ‘3 Bucket’ model⁷ is a good self-assessment tool for frontline health workers to use when analysing performance to see whether a crisis or problem is likely. The model considers good and bad factors relating to self, context and task. The ‘self’ bucket asks ‘how am I going at the moment?’: am I poorly trained, fatigued or overloaded, or is everything going well? The ‘context’ bucket considers the current environment: is everyone busy, have colleagues seen this before, do we have enough support, and are resources sufficient? The ‘task’ bucket considers the activity to be performed: is it straightforward or complex? The clinician should consider how much is in each bucket. If there are more bad than good factors overall, the tipping point at which an error or crisis can occur has been reached.
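
The decision rule described above can be written down directly. The sketch below is minimal and assumes that every factor counts equally and that the factor names are free-text notes. Both assumptions are simplifications for illustration: a clinician would weigh factors rather than merely count them. The example factors are taken from the questions in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bucket:
    """One of Reason's three buckets (self, context or task).

    Factors are free-text notes; each counts equally, which is a simplification.
    """
    name: str
    good: List[str] = field(default_factory=list)
    bad: List[str] = field(default_factory=list)


def past_tipping_point(buckets: List[Bucket]) -> bool:
    """True when bad factors outnumber good ones across all buckets combined."""
    total_good = sum(len(b.good) for b in buckets)
    total_bad = sum(len(b.bad) for b in buckets)
    return total_bad > total_good


# Hypothetical self-check before a procedure, using factors named in the text.
self_check = Bucket("self", good=["well trained"], bad=["fatigued", "overloaded"])
context = Bucket("context", good=["colleague has seen this before"],
                 bad=["everyone is busy"])
task = Bucket("task", bad=["complex procedure"])

if past_tipping_point([self_check, context, task]):
    print("More bad than good factors: tipping point reached.")
```

Counting across all three buckets, rather than bucket by bucket, follows the text's "more bad than good factors overall".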

First story, second story

The first story (Safety I) gives a linear account of the problem. It looks back over time and the history of the problem to identify what can be changed to fix it. It has a storyline of “Things have gone wrong » Find out what happened » Attribute actions to people » Uncover the root causes » Fix the systems so this doesn’t happen again”.

The second story (Safety II) is more complex than the first. It is not linear at all and has multiple interacting variables. It has a storyline of “Uncover how come we did this many times previously and things went right (the procedure has been done correctly many times before, and the error is a variation from the norm) » Strengthen the systems so we do more things well and thereby reduce variance”.

Productive insights are generated from the ‘second story’ that lies behind the ‘first story’ of incidents and accidents. First stories are accounts of ‘celebrated’ accidents that categorise them as both catastrophes and blunders. According to Cook, Woods and Miller,⁸ second stories tell how ‘multiple interacting factors in complex systems can combine to produce systemic vulnerabilities to failure … the system usually … manages risk but sometimes fails.’ We need to recognise that we can never root out all errors, so we should focus on strengthening systems to make errors less likely.

Resilience is an important part of the second story. It is a property of systems, conferring the ability to remain intact and functional despite the presence of threats to their integrity and function. It is the opposite of brittleness and aspires to be a theory of systemic function.

Implications of new ways of thinking: a practical idea…

This model, developed by Erik Hollnagel,⁹ can be used to evaluate the resilience of a system and to identify areas in which that resilience can be improved. It illustrates the continuum towards a learning and adapting organisation and identifies the four characteristics associated with achieving resilience.

  1. Anticipate. Is the system reactive or can it see what is coming up? Over what time horizon does the system operate? How are the cost/benefits calculated? Is the time horizon such that it allows people to anticipate what lies ahead? At what point do people decide that something is critical enough that action has to occur?
  2. Monitor. Are there tools and systems in place to monitor, in real time, whether things are going well or not? If you can identify when things are becoming overloaded and likely to fail, and can intervene early, there is less variance and fewer resources are required to fix the problem.
  3. Respond. When there is a problem, are there response systems in place to allow an early intervention?
  4. Learn. Are feedback mechanisms in place so that lessons learnt can be applied going forward? Is the organisation able to change if required?
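
A common way to read a grid like this is as a profile across the four abilities, with the weakest one marking where to start. The sketch below is minimal and assumes a hypothetical 0 to 5 rating for each question (higher is better), averaged per ability. The scale, the ratings and the averaging are illustrative assumptions, not Hollnagel's published scoring scheme.

```python
from statistics import mean
from typing import Dict, List

ABILITIES = ("anticipate", "monitor", "respond", "learn")


def resilience_profile(ratings: Dict[str, List[int]]) -> Dict[str, float]:
    """Average the ratings given to the questions under each of the four abilities.

    Assumes a hypothetical 0-5 scale per question, where higher is better.
    """
    missing = [a for a in ABILITIES if not ratings.get(a)]
    if missing:
        raise ValueError(f"no ratings for: {', '.join(missing)}")
    return {a: float(mean(ratings[a])) for a in ABILITIES}


def weakest_ability(profile: Dict[str, float]) -> str:
    """Return the ability with the lowest average: a candidate area to strengthen."""
    return min(profile, key=profile.get)


# Hypothetical ratings for one department, one number per question above.
ratings = {
    "anticipate": [2, 1, 2, 3, 2],
    "monitor": [4, 3],
    "respond": [4],
    "learn": [3, 2],
}
profile = resilience_profile(ratings)
print(profile)
print("Weakest ability:", weakest_ability(profile))
```

Repeating the same ratings over time would show whether changes are strengthening the intended ability.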

The model also recognises that an organisation's context, culture and structure are important in the development of resilience (i.e. the ability to anticipate, monitor, respond and learn). The system therefore needs a culture that encourages such thinking, along with the resources and structure to support a “can we deal with that ahead of time?” approach. To be resilient, an organisation needs a receptive and proactive culture that is willing to accept a greater level of risk while a new system is put in place. It also needs to recognise that there is a place for both Safety I and Safety II thinking: Safety I works well for technical, unvarying activities, while Safety II is better for variable activities.

Hollnagel is the leading thinker in explicating Safety I and Safety II, and refers to this model in his work.

Proactive Safety Management  

Hollnagel and colleagues⁴ offer some practical suggestions for beginning the thinking process:
  • Look at what goes right as well as what goes wrong
  • When something has gone wrong, look for everyday performance variability rather than specific causes
  • Look at what happens regularly and focus on the frequency and severity of events
  • Allow time to reflect, learn and communicate
  • Remain mindful, i.e. sensitive to the possibility of failure

Learning from everyday activities and from things that go right is key to resilience. If we do this well, we may begin to build systems in which fewer problems occur. The bottom line is that health organisations and systems should understand their own goals and the distinction between Safety I and Safety II. If resilient health care can do this, it may become a de facto leader rather than a hapless follower.

Professor Jeffrey Braithwaite

1 Runciman WB, Hunt TD, Hannaford NA, Hibbert PD, Westbrook JI, Coiera EW, Day RO, Hindmarsh DM, McGlynn EA, Braithwaite J. CareTrack: assessing the appropriateness of health care delivery in Australia. Medical Journal of Australia 2012;197(10):549.
2 Reason J. Human error: models and management. BMJ 2000;320(7237):768-70.
3 Creswick N, Westbrook J. Examining the socialising and problem-solving networks of clinicians on a hospital ward. Conference Proceedings of Social Science Methodology Conference of the Australian Consortium for Social and Political Research (ACSPR) 2006.
4 Hollnagel E, Braithwaite J, Wears R. Resilient health care. Surrey, UK: Ashgate Publishing Limited, 2013.
5 Martin-Breen P, Anderies JM. Resilience: A literature review. Institute of Development Studies, Rockefeller Foundation 2011.
6 Hollnagel E, Pariès J, Woods DD, Wreathall J. Resilience engineering in practice: a guidebook. Surrey, UK: Ashgate Publishing Limited, 2011.
7 Reason JT. Beyond the organisational accident: the need for “error wisdom” on the frontline. Quality & Safety in Health Care 2004;13(Suppl II):ii28–ii33.
8 Cook R, Woods DD, Miller C. A tale of two stories: contrasting views of patient safety. Report from a workshop on assembling the scientific basis for progress on patient safety. Chicago: National Patient Safety Foundation, 1998.
9 Hollnagel E. Epilogue: RAG - the Resilience Analysis Grid. In: Hollnagel E, Pariès J, Woods DD, Wreathall J, editors. Resilience engineering in practice: a guidebook. Surrey, UK: Ashgate Publishing Limited, 2011.

The Royal Australasian College of Medical Administrators
Prof Jeffrey Braithwaite and Dr Robyn Clay-Williams
www.racma.edu.au/index.php?option=com_content&view=article&id=676&Itemid=398