The dragons of the unknown; part 6 – Safety II, a new way of looking at safety

Introduction

This is the sixth post in a series about problems that fascinate me, that I think are important and interesting. The series draws on important work from the fields of safety critical systems and from the study of complexity, specifically complex socio-technical systems. This will be the theme of my keynote at EuroSTAR in The Hague (November 12th-15th 2018).

The first post was a reflection, based on personal experience, on the corporate preference for building bureaucracy rather than dealing with complex reality, “Facing the dragons part 1 – corporate bureaucracies”. The second post was about the nature of complex systems, “part 2 – crucial features of complex systems”. The third followed on from part 2, and talked about the impossibility of knowing exactly how complex socio-technical systems will behave with the result that it is impossible to specify them precisely, “part 3 – I don’t know what’s going on”.

The fourth post, “part 4 – a brief history of accident models”, looks at accident models, i.e. the way that safety experts mentally frame accidents when they try to work out what caused them.

The fifth post, “part 5 – accident investigations and treating people fairly”, looks at the weaknesses of the way we have traditionally investigated accidents and failures, assuming neat linearity with clear cause and effect. In particular, our use of root cause analysis, and our willingness to blame people for accidents, are hard to justify.

This post looks at the response of the safety critical community to such problems and the necessary trade-offs that a practical response requires. The result, Safety II, is intriguing and has important lessons for software testers.

More safety means less feedback

In 2017, the safest year in aviation history, nobody was killed on a scheduled passenger flight (sadly that won’t be the case in 2018). This prompted the South China Morning Post to produce a striking graphic, which I’ve reproduced here in butchered form. Please, please look at the original. My version is just a crude taster.

Increasing safety is obviously good news, but it poses a problem for safety professionals. If you rely on accidents for feedback then reducing accidents will choke off the feedback you need to keep improving, to keep safe. The safer that systems become the less data is available. Remember what William Langewiesche said (see part 4).

“What can go wrong usually goes right – and then people draw the wrong conclusions.”

If accidents have become rare, but are extremely serious when they do occur, then it will be highly counter-productive if investigators pick out people’s actions that deviated from, or adapted, the procedures that management or designers assumed were being followed.

These deviations are always present in complex socio-technical systems that are running successfully and it is misleading to focus on them as if they were a necessary and sufficient cause when there is an accident. The deviations may have been a necessary cause of that particular accident, but in a complex system they were almost certainly not sufficient. These very deviations may have previously ensured the overall system would work. Removing the deviation will not necessarily make the system safer.

There might be fewer opportunities to learn from things going wrong, but there’s a huge amount to learn from all the cases that go right, provided we look. We need to try and understand the patterns, the constraints and the factors that are likely to amplify desired emergent behaviour and those that will dampen the undesirable or dangerous. In order to create a better understanding of how complex socio-technical systems can work safely we have to look at how people are using them when everything works, not just when there are accidents.

Safety II – learning from what goes right

Complex systems and accidents might be beyond our comprehension but that doesn’t mean we should just accept that “shit happens”. That is too flippant and fatalistic, two words that could never be applied to the safety critical community.

Safety I is shorthand for the old safety world view, which focused on failure. Its utility has been hindered by the relative lack of feedback from things going wrong, and the danger that paying insufficient attention to how and why things normally go right will lead to the wrong lessons being learned from the failures that do occur.

Safety I assumed linear cause and effect with root causes (see part 5). It was therefore prone to reaching a dangerously simplistic verdict of human error.

This diagram illustrates the focus of Safety I on the unusual, on the bad outcomes. I have copied, and slightly adapted, the Safety I and Safety II diagrams from a document produced by Eurocontrol (the European Organisation for the Safety of Air Navigation), “From Safety-I to Safety-II: A White Paper” (PDF, opens in new tab).

Incidentally, I don’t know why Safety I and Safety II are routinely illustrated using a normal distribution with the Safety I focus kicking in at two standard deviations. I haven’t been able to find a satisfactory explanation for that. I assume that this is simply for illustrative purposes.

If Safety I wants to prevent bad outcomes, Safety II, in contrast, looks at how good outcomes are reached. Safety II is rooted in a more realistic understanding of complex systems than Safety I and extends the focus to what goes right in systems. That entails a detailed examination of what people are doing with the system in the real world to keep it running. Instead of people being regarded as a weakness and a source of danger, Safety II assumes that people, and the adaptations they introduce to systems and processes, are the very reasons we usually get good, safe outcomes.

If we’ve been involved in the development of the system we might think that we have a good understanding of how the system should be working, but users will always, and rightly, be introducing variations that designers and testers had never envisaged. The old, Safety I, way of thinking regarded these variations as mistakes, but they are needed to keep the systems safe and efficient. We expect systems to be both, which leads on to the next point.

There’s a principle in safety critical systems called ETTO, the Efficiency Thoroughness Trade Off. It was devised by Erik Hollnagel, though it might be more accurate to say he made it explicit and popularised the idea. The idea should be very familiar to people who have worked with complex systems. Hollnagel argues that it is impossible to maximise both efficiency and thoroughness. I’m usually reluctant to cite Wikipedia as a source, but its article on ETTO explains it more succinctly than Hollnagel himself did.

“There is a trade-off between efficiency or effectiveness on one hand, and thoroughness (such as safety assurance and human reliability) on the other. In accordance with this principle, demands for productivity tend to reduce thoroughness while demands for safety reduce efficiency.”

Making the system more efficient makes it less thorough, and therefore less likely to achieve its important goals; chasing those goals comes at the expense of efficiency. That has huge implications for safety critical systems. Safety requires some redundancy, duplication and fallbacks. These are inefficient. Efficiencies eliminate margins of error, with potentially dangerous results.
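The shape of the trade-off can be sketched with a toy model. This is my own illustration, not anything from Hollnagel, and every number in it is invented: each extra check makes the work slower but catches a share of the defects that survived the previous checks, so you can buy thoroughness only by paying in throughput.

```python
# Toy model of the Efficiency-Thoroughness Trade-Off (ETTO).
# All numbers are invented for illustration; nothing here comes from Hollnagel.

def etto(checks_per_task, base_rate=100, catch_per_check=0.30):
    """Return (tasks_per_day, chance_a_defect_slips_through).

    Each extra check slows the work down but catches a fixed share of
    the defects that survived the previous checks.
    """
    tasks_per_day = base_rate / (1 + checks_per_task)        # efficiency falls
    slip_through = (1 - catch_per_check) ** checks_per_task  # thoroughness rises
    return tasks_per_day, slip_through

for checks in (0, 2, 5):
    rate, slip = etto(checks)
    print(f"{checks} checks: {rate:5.1f} tasks/day, {slip:.0%} of defects slip through")
```

The point of the sketch is only that there is no setting of the dial that maximises both columns at once; where the dial ends up is decided, as the next paragraph says, at the sharp end.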

ETTO recognises the tension between organisations’ need to deliver a safe, reliable product or service, and the pressure to do so at the lowest cost possible. In practice, the conflict in goals is usually fully resolved only at the sharp end, where people do the real work and run the systems.

As an example, an airline might offer a punctuality bonus to staff. For an airline safety obviously has the highest priority, but if it were an absolute priority, the only consideration, then it could not contemplate any incentive that would encourage crews to speed up turnarounds on the ground, or to persevere with a landing when prudence would dictate a “go around”. In truth, if safety were an absolute priority, with no element of risk being tolerated, would planes ever take off?

People are under pressure to make the systems efficient, but they are expected to keep the system safe, which inevitably introduces inefficiencies. This tension results in a constant, shifting, pattern of trade-offs and compromises. The danger, as “drift into failure” predicts (see part 4), is that this can lead to a gradual erosion of safety margins.

The old view of safety was to constrain people, reducing variability in the way they use systems. Variability was a human weakness. In Safety II variability in the way that people use the system is seen as a way to ensure the system adapts to stay effective. Humans aren’t seen as a potential weakness, but as a source of flexibility and resilience. Instead of saying “they didn’t follow the set process therefore that caused the accident”, the Safety II approach means asking “why would that have seemed like the right thing to do at the time? Was that normally a safe action?”. Investigations need to learn through asking questions, not making judgments – a lesson it was vital I learned as an inexperienced auditor.

Emergence means that the behaviour of a complex system can’t be predicted from the behaviour of its components. Testers therefore have to think very carefully about when we should apply simple pass or fail criteria. The safety critical community explicitly reject the idea of pass/fail, or the bimodal principle as they call it (see part 4). A flawed component can still be useful. A component working exactly as the designers, and even the users, intended can still contribute to disaster. It all depends on the context, what is happening elsewhere in the system, and testers need to explore the relationships between components and try to learn how people will respond.
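The rejection of the bimodal, pass/fail view can be made concrete with a deliberately crude, hypothetical sketch of my own (the scenario and all the names and numbers are invented, not taken from any real incident): two components that each behave exactly as their own specifications demand, yet whose combination misbehaves because of an unconverted unit.

```python
# Hypothetical sketch: two components that each "pass" their own checks,
# yet together produce a dangerous system-level outcome. Everything here
# is invented for illustration.

def sensor_reading():
    """Reports altitude in feet - correct per its own spec."""
    return 30000.0  # feet

def descent_controller(altitude_metres):
    """Commands descent below 10,000 m - correct per its own spec."""
    return "descend" if altitude_metres < 10_000 else "cruise"

# Each component passes its own component-level check...
assert sensor_reading() == 30000.0
assert descent_controller(9_000) == "descend"

# ...but wiring them together without converting units fails at system
# level: 30,000 ft is about 9,144 m, so the correct command is "descend",
# yet the controller is fed raw feet and cruises.
print(descent_controller(sensor_reading()))  # "cruise" - wrong for the system
```

No component here "failed", so a bimodal verdict on either one tells us nothing; the problem lives in the relationship between them, which is exactly where the paragraph above suggests testers should be looking.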

Safety is an emergent property of the system. It’s not possible to design it into a system, to build it, or implement it. The system’s rules, controls and constraints might prevent safety from emerging, but they can do no more than enable it. They can create the potential for people to keep the system safe, but they cannot guarantee it. Safety depends on user responses and adaptations.

Adaptation means the system is constantly changing as the problem changes, as the environment changes, and as the operators respond to change with their own variations. People manage safety with countless small adjustments.

There is a popular internet meme, “we don’t make mistakes – we do variations”. It is particularly relevant to the safety critical community, who have picked up on it because it neatly encapsulates their thinking, e.g. this article by Steven Shorrock, “Safety-II and Just Culture: Where Now?”. Shorrock, in line with others in the safety critical community, argues that if the corporate culture is to be just and treat people fairly then it is important that the variations that users introduce are understood, rather than being used as evidence to punish them when there is an accident. Pinning the blame on people is not only an abdication of responsibility, it is unjust. As I’ve already argued (see part 5), it’s an ethical issue.

Operator adjustments are vital to keep systems working and safe, which brings us to the idea of trust. A well-designed system has to trust the users to adapt appropriately as the problem changes. The designers and testers can’t know the problems the users will face in the wild. They have to confront the fact that dangerous dragons are lurking in the unknown, and the system has to trust the users with the freedom to stay in the safe zone, clear of the dragons, and out of the disastrous tail of the bell curve that illustrates Safety II.

Safety II and Cynefin

If you’re familiar with Cynefin then you might wonder about Safety II moving away from a focus on the tail of the distribution. Cynefin helps us understand that the tail is where we can find opportunities as well as threats. It’s worth stressing that Safety II does encompass Safety I and the dangerous tail of the distribution. It must not be a binary choice of focusing on either the tail or the body. We have to try to understand not only what happens in the tail, how people and systems can inadvertently end up there, but also what operators do to keep out of the tail.

The Cynefin framework and Safety II share a similar perspective on complexity and the need to allow for, and encourage, variation. I have written about Cynefin elsewhere, e.g. in two articles I wrote for the Association for Software Testing, and there isn’t room to repeat that here. However, I do strongly recommend that testers familiarise themselves with the framework.

To sum it up very briefly, Cynefin helps us to make sense of problems by assigning them to one of four different categories, the obvious, the complicated (the obvious and complicated being related in that problems have predictable causes and resolutions), the complex and the chaotic. Depending on the category different approaches are required. In the case of software development the challenge is to learn more about the problem in order to turn it from a complex activity into a complicated one that we can manage more easily.

Applying Cynefin would result in more emphasis on what’s happening in the tails of the distribution, because that’s where we will find the threats to be avoided and the opportunities to be exploited. Nevertheless, Cynefin isn’t like the old Safety I just because they both focus on the tails. They embody totally different worldviews.

Safety II is an alternative way of looking at accidents, failure and safety. It is not THE definitive way, that renders all others dated, false and heretical. The Safety I approach still has its place, but it’s important to remember its limitations.

Everything flows and nothing abides

Thinking about linear cause and effect, and decomposing components are still vital in helping us understand how different parts of the system work, but they offer only a very limited and incomplete view of what we should be trying to learn. They provide a way of starting to build our understanding, but we mustn’t stop there.

We also have to venture out into the realms of the unknown and often unknowable, to try to understand more about what might happen when the components combine with each other and with humans in complex socio-technical systems. This is when objects become processes, when static elements become part of a flow that is apparent only when we zoom out to take in a bigger picture in time and space.

The idea of understanding objects by stepping back and looking at how they flow and mutate over time has a long philosophical and scientific history. 2,500 years ago Heraclitus wrote:

“Everything flows and nothing abides. Everything gives way and nothing stays fixed.”

Michael McIntyre, Professor of Atmospheric Dynamics at Cambridge University, put it well in a fascinating BBC documentary, “The secret life of waves”:

“If we want to understand things in depth we usually need to think of them both as objects and as dynamic processes and see how it all fits together. Understanding means being able to see something from more than one viewpoint.”

In my next post I will try to discuss some of the implications for software testing of the issues I have raised here, the need to look from more than one viewpoint, to think about how users can keep systems going, and dealing with the inevitability of failure. That will lead us to resilience engineering.


The dragons of the unknown; part 5 – accident investigations and treating people fairly

Introduction

This is the fifth post in a series about problems that fascinate me, that I think are important and interesting. The series draws on important work from the fields of safety critical systems and from the study of complexity, specifically complex socio-technical systems. This will be the theme of my keynote at EuroSTAR in The Hague (November 12th-15th 2018).

The first post was a reflection, based on personal experience, on the corporate preference for building bureaucracy rather than dealing with complex reality, “Facing the dragons part 1 – corporate bureaucracies”. The second post was about the nature of complex systems, “part 2 – crucial features of complex systems”. The third followed on from part 2, and talked about the impossibility of knowing exactly how complex socio-technical systems will behave with the result that it is impossible to specify them precisely, “part 3 – I don’t know what’s going on”.

The fourth post, “part 4 – a brief history of accident models”, looks at accident models, i.e. the way that safety experts mentally frame accidents when they try to work out what caused them. This post looks at the weaknesses of the way we have traditionally investigated accidents and failures, assuming neat linearity with clear cause and effect. In particular, our use of root cause analysis, and our willingness to blame people for accidents, are hard to justify.

The limitations of root cause analysis

Once you accept that complex systems can’t have clear and neat links between causes and effects then the idea of root cause analysis becomes impossible to sustain. “Fishbone” cause and effect diagrams (like those used in Six Sigma) illustrate traditional thinking: that it is possible to track back from an adverse event to find a root cause that was both necessary and sufficient to bring it about.

The assumption of linearity with tidy causes and effects is no more than wishful thinking. Like the Domino Model (see “part 4 – a brief history of accident models”) it encourages people to think there is a single cause, and to stop looking when they’ve found it. It doesn’t even offer the insight of the Swiss Cheese Model (also see part 4) that there can be multiple contributory causes, all of them necessary but none of them sufficient to produce an accident. That is a key idea. When complex systems go wrong there is rarely a single cause; causes are necessary, but not sufficient.

Here is a more realistic depiction of a complex socio-technical system. It is a representation of the operations control system for an airline. The specifics don’t matter. It is simply a good illustration of how messy a real, complex system looks when we try to depict it.

This is actually very similar to the insurance finance applications diagram I drew up for Y2K (see “part 1 – corporate bureaucracies”). There was no neat linearity. My diagram looked just like this, with a similar number of nodes, or systems, most of which had multiple two-way interfaces with others. And that was just at the level of applications. There was some intimidating complexity within these systems.

As there is no single cause of failure the search for a root cause can be counter-productive. There are always flaws, bugs, problems, deviances from process, variations. So you can always fix on something that has gone wrong. But it’s not really a meaningful single cause. It’s arbitrary.

The root cause is just where you decide to stop looking. The cause is not something you discover. It’s something you choose and construct. The search for a root cause can mean attention will focus on something that is not inherently dangerous, something that had previously “failed” repeatedly but without any accident. The response might prevent that particular failure and therefore ensure there’s no recurrence of an identical accident. However, introducing a change, even if it’s a fix, to one part of a complex system affects the system in unpredictable ways. The change therefore creates new possibilities for failure that are unknown, even unknowable.

It’s always been hard, even counter-intuitive, to accept that we can have accidents & disasters without any new failure of a component, or even without any technical failure that investigators can identify and without external factors interfering with the system and its operators. We can still have air crashes for which no cause is ever found. The pressure to find an answer, any plausible answer, means there has always been an overwhelming temptation to fix the blame on people, on human error.

Human error – it’s the result of a problem, not the cause

If there’s an accident you can always find someone who screwed up, or who didn’t follow the rules, the standard, or the official process. One problem with that is the same applies when everything goes well. Something that troubled me in audit was realising that every project had problems, every application had bugs when it went live, and there were always deviations from the standards. But the reason smart people were deviating wasn’t that they were irresponsible. They were doing what they had to do to deliver the project. Variation was a sign of success as much as failure. Beating people up didn’t tell us anything useful, and it was appallingly unfair.

One of the rewarding aspects of working as an IT auditor was conducting post-implementation reviews and being able to defend developers who were being blamed unfairly for problem projects. The business would give them impossible jobs, complacently assuming the developers would pick up all the flak for the inevitable problems. When auditors, like me, called them out for being cynical and irresponsible they hated it. They used to say it was because I had a developer background and was angling for my next job. I didn’t care because I was right. Working in a good audit department requires you to build up a thick skin, and some healthy arrogance.

There always was some deviation from standards, and the tougher the challenge the more obvious they would be, but these allegedly deviant developers were the only reason anything was delivered at all, albeit by cutting a few corners.

It’s an ethical issue. Saying the cause of an accident is that people screwed up is opting for an easy answer that doesn’t offer any useful insights for the future and just pushes problems down the line.

Sidney Dekker used a colourful analogy. Dumping the blame on an individual after an accident is “peeing in your pants management” (PDF, opens in new tab).

“You feel relieved, but only for a short while… you start to feel cold and clammy and nasty. And you start stinking. And, oh by the way, you look like a fool.”

Putting the blame on human error doesn’t just stink. It obscures the deeper reasons for failure. It is the result of a problem, not the cause. It also encourages organisations to push for greater automation, in the vain hope that will produce greater safety and predictability, and fewer accidents.

The ironies of automation

An important part of the motivation to automate systems is that humans are seen as unreliable & inefficient. So they are replaced by automation, but that leaves the humans with jobs that are even more complex and even more vulnerable to errors. The attempt to remove errors creates fresh possibilities for even worse errors. As Lisanne Bainbridge wrote in a 1983 paper, “The ironies of automation”:

“The more advanced a control system is… the more crucial may be the contribution of the human operator.”

There are all sorts of twists to this. Automation can mean the technology does all the work and operators have to watch a machine that’s in a steady-state, with nothing to respond to. That means they can lose attention & not intervene when they need to. If intervention is required the danger is that vital alerts will be lost if the system is throwing too much information at operators. There is a difficult balance to be struck between denying operators feedback, and thus lulling them into a sense that everything is fine, and swamping them with information. Further, if the technology is doing deeply complicated processing, are the operators really equipped to intervene? Will the system allow operators to override? Bainbridge makes the further point:

“The designer who tries to eliminate the operator still leaves the operator to do the tasks which the designer cannot think how to automate.”

This is a vital point. Systems are becoming more complex and the tasks left to the humans become ever more demanding. System designers have only a very limited understanding of what people will do with their systems. They don’t know. The only certainty is that people will respond and do things that are hard, or impossible, to predict. That is bound to deviate from formal processes, which have been defined in advance, but these deviations, or variations, will be necessary to make the systems work.

Acting on the assumption that these deviations are necessarily errors and “the cause” when a complex socio-technical system fails is ethically wrong. However, there is a further twist to the problem, summed up by the Law of Stretched Systems.

Stretched systems

Lawrence Hirschhorn’s Law of Stretched Systems is similar to the Fundamental Law of Traffic Congestion. New roads create more demand to use them, so new roads generate more traffic. Likewise, improvements to systems result in demands that the system, and the people, must do more. Hirschhorn seems to have come up with the law informally, but it has been popularised by the safety critical community, especially by David Woods and Richard Cook.

“Every system operates always at its capacity. As soon as there is some improvement, some new technology, we stretch it.”

And the corollary, furnished by Woods and Cook:

“Under resource pressure, the benefits of change are taken in increased productivity, pushing the system back to the edge of the performance envelope.”

Every change and improvement merely adds to the stress that operators are coping with. The obvious response is to place more emphasis on ergonomics and human factors, to try and ensure that the systems are tailored to the users’ needs and as easy to use as possible. That might be important, but it hardly resolves the problem. These improvements are themselves subject to the Law of Stretched Systems.

This was all first noticed in the 1990s after the First Gulf War. The US Army hadn’t been in serious combat for 18 years. Technology had advanced massively. Throughout the 1980s the army reorganised, putting more emphasis on technology and training. The intention was that the technology should ease the strain on users, reduce fatigue and be as simple to operate as possible. It didn’t pan out that way when the new army went to war. Anthony H. Cordesman and Abraham Wagner analysed in depth the lessons of the conflict. They were particularly interested in how the technology had been used.

“Virtually every advance in ergonomics was exploited to ask military personnel to do more, do it faster, and do it in more complex ways… New tactics and technology simply result in altering the pattern of human stress to achieve a new intensity and tempo of combat.”

Improvements in technology create greater demands on the technology – and the people who operate it. Competitive pressures push companies towards the limits of the system. If you introduce an enhancement to ease the strain on users then managers, or senior officers, will insist on exploiting the change. Complex socio-technical systems always operate at the limits.

This applies not only to soldiers operating high tech equipment. It applies also to the ordinary infantry soldier. In 1860 the British army was worried that troops had to carry 27kg into combat (PDF, opens in new tab). The load has now risen to 58kg. US soldiers have to carry almost 9kg of batteries alone. The Taliban called NATO troops “donkeys”.

These issues don’t apply only to the military. They’ve prompted a huge amount of new thinking in safety critical industries, in particular healthcare and air transport.

The overdose – system behaviour is not explained by the behaviour of its component technology

Remember the traditional argument that any system that was not deterministic was inherently buggy and badly designed? See “part 2 – crucial features of complex systems”.

In reality that applies only to individual components, and even then complexity & thus bugginess can be inescapable. When you’re looking at the whole socio-technical system it just doesn’t stand up.

Introducing new controls, alerts and warnings doesn’t just increase the complexity of the technology, as I mentioned earlier with the MIG jet designers (see part 4). These new features add to the burden on the people. Alerts and error messages can swamp users of complex systems, so they miss the information they really need.

I can’t recommend strongly enough the story told by Bob Wachter in “The overdose: harm in a wired hospital”.

A patient at a hospital in California received an overdose of 38½ times the correct amount. Investigation showed that the technology worked fine. All the individual systems and components performed as designed. They flagged up potential errors before they happened. So someone obviously screwed up. That would have been the traditional verdict. However, the hospital allowed Wachter to interview everyone involved in each of the steps. He observed how the systems were used in real conditions, not in a demonstration or test environment. Over five articles he told a compelling story that will force any fair reader to admit “yes, I’d have probably made the same error in those circumstances”.

Happily the patient survived the overdose. The hospital staff involved were not disciplined and were allowed to return to work. The hospital had to think long and hard about how it would try to prevent such mistakes recurring. The uncomfortable truth they had to confront was that there were no simple answers. Blaming human error was a cop out. Adding more alerts would compound the problems staff were already facing; one of the causes of the mistake was the volume of alerts swamping staff, making it hard, or impossible, to sift out the vital warnings from the important and the merely useful.
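Why more alerts make things worse can be put in toy numeric terms. This is my own crude sketch, not anything from Wachter’s account, and the attention budget and alert volumes are invented: if an operator can realistically act on only a fixed number of alerts per shift, every extra routine alert lowers the chance that the one vital warning is among them.

```python
# Toy illustration of alert swamping. The attention budget and alert
# volumes are invented numbers, not drawn from Wachter's investigation.

def chance_vital_alert_seen(total_alerts, attention_budget=20):
    """Probability that the single vital alert is among the
    `attention_budget` alerts the operator actually acts on,
    assuming the vital alert is not distinguishable from the noise."""
    if total_alerts <= attention_budget:
        return 1.0
    return attention_budget / total_alerts

for volume in (10, 100, 1000):
    seen = chance_vital_alert_seen(volume)
    print(f"{volume:4d} alerts/shift -> vital alert acted on {seen:.0%} of the time")
```

The sketch also shows why “add another alert” is such a tempting fix for designers: each new alert looks like extra safety in isolation, while its real effect is to dilute every alert already there.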

One of the hard lessons was that focussing on making individual components more reliable had harmed the overall system. The story is an important illustration of the maxim in the safety critical community that trying to make systems safer can make them less safe.

Some system changes were required and made, but the hospital realised that the deeper problem was organisational and cultural. They made the brave decision to allow Wachter to publicise his investigation and his series of articles is well worth reading.

The response of the safety critical community to such problems, and the necessary trade-offs that a practical response requires, is intriguing, with important lessons for software testers. I shall turn to this in my next post, “part 6 – Safety II, a new way of looking at safety”.

Games & scripts – stopping at red lights

I’ve been reading about game development lately. It’s fascinating. I’m sure there is much that conventional developers and testers could learn from the games industry. Designers need to know their users, how they think and what they expect. That’s obvious, but we’ve often only paid lip service to these matters. The best game designers have thought much more seriously and deeply about the users than most software developers.

Game designers have to know what their customers want from games. They need to understand why some approaches work and others fail. Developing the sort of games that players love is expensive. If the designers get it wrong then the financial consequences are ruinous. Inevitably the more thoughtful designers stray over into anthropology.

That is a subject I want to write about in more depth, but in the meantime here is a short essay, in slightly amended form, that I wrote for EuroSTAR’s Test Huddle, prompted by a blog post by the game designer Chris Bateman. Bateman was talking about the nature of play and games, and an example he used made me think about how testing has traditionally been conducted.

“Ramon Romero of Microsoft’s Games User Research showed footage of various random people playing games for the first time. I was particularly touched by the middle aged man who drove around in Midtown Madness as if he was playing a driving simulator. ‘This is a great game’, he said, as he stopped at a red light and waited for it to change.”

That’s ridiculous isn’t it? Treating a maniacal race game as a driving simulator? Maybe, but I’m not sure. That user was enjoying himself, playing the game entirely in line with his expectations of what the game should be doing.

The story reminded me of testers who embark on their testing armed with beliefs that are just as wildly misplaced about what the game, sorry, the application should be doing. They might have exhaustive scripts generated from requirements documents that tell them exactly what the expected behaviour should be, and they could be dangerously deluded.

Most of my experience has been with financial applications. They have been highly complicated, and the great concern was always about the unexpected behaviour, the hidden functionality that could allow users to steal money or screw up the data.

Focusing on the documented requirements, happy paths and the expected errors is tackling an application the way that Midtown Madness player did; driving carefully around the city, stopping at red lights and scrupulously obeying the law. Then, when the application is released, the real users rampage around discovering what it can really do.

Was that cheerfully naïve Midtown Madness player really ridiculous? He was just having fun his way. He wasn’t paid a good salary to play the game. The ones who are truly ridiculous are the testers who are paid to find out what applications can do and naively think they can do so by sticking to their scripts. Perhaps test scripts are rather like traffic regulations. Both say what someone thinks should be going on, but how much does that actually tell you about what is really happening out there, on the streets, in the wild where the real users play?

Service Virtualization interview about usability

This interview with Service Virtualization appeared in January 2015. Initially when George Lawton approached me I wasn’t enthusiastic. I didn’t think I would have much to say. However, the questions set me thinking, and I felt they were relevant to my experience so I was happy to take part. It gave me something to do while I was waiting to fly back from EuroSTAR in Dublin!

How does usability relate to the notion of the purpose of a software project?

When I started in IT over 30 years ago I never heard the word usability. It was “user friendliness”, but that was just a nice thing to have. It was nice if your manager was friendly, but that was incidental to whether he was actually good at the job. Likewise, user friendliness was incidental. If everything else was ok then you could worry about that, but no-one was going to spend time or money, or sacrifice any functionality, just to make the application user friendly. And what did “user friendly” mean anyway? “Who knows? Who cares? We’ve got serious work to do. Forget about that touchy feely stuff.”

The purpose of software development was to save money by automating clerical routines. Any online part of the system was a mildly anomalous relic of the past. It was just a way of getting the data into the system so the real work could be done. Ok, that’s an over-simplification, but I think there’s enough truth in it to illustrate why developers just didn’t much care about the users and their experience. Development moved on from that to changing the business, rather than merely changing the business’s bureaucracy, but it took a long time for these attitudes to shift.

The internet revolution turned everything upside down. Users are no longer employees who have to put up with whatever they’re given. They are more likely to be customers. They are ruthless and rightly so. Is your website confusing? Too slow to load? Your customers have gone to your rivals before you’ve even got anywhere near their credit card number.

The lesson that’s been getting hammered into the heads of software engineers over the last decade or so is that usability isn’t an extra. I hate the way that we traditionally called it a “non-functional requirement”, or one of the “quality criteria”. Usability is so important and integral to every product that telling developers that they’ve got to remember it is like telling drivers they’ve got to remember to use the steering wheel and the brakes. If they’re not doing these things as a matter of course they shouldn’t be allowed out in public. Usability has to be designed in from the very start. It can’t be considered separately.

What are the main problems in specifying for and designing for software usability?

Well, who’s using the application? Where are they? What is the platform? What else are they doing? Why are they using the application? Do they have an alternative to using your application, and if so, how do you keep them with yours? All these things can affect decisions you take that are going to have a massive impact on usability.

It’s payback time for software engineering. In the olden days it would have been easy to answer these questions, but we didn’t care. Now we have to care, and it’s all got horribly difficult.

These questions require serious research plus the experience and nous to make sound judgements with imperfect evidence.

In what ways do organisations lose track of the usability across the software development lifecycle?

I’ve already hinted at a major reason. Treating usability as a non-functional requirement or quality criterion is the wrong approach. That segregates the issue. It’s treated as being like the other quality criteria, the “…ities” like security, maintainability, portability, reliability. It creates the delusion that the core function is of primary importance and the other criteria can be tackled separately, even bolted on afterwards.

Lewis & Rieman came out with a great phrase fully 20 years ago to describe that mindset. They called it the peanut butter theory of usability. You built the application, and then at the end you smeared a nice interface over the top, like a layer of peanut butter (PDF, opens in new tab).

“Usability is seen as a spread that can be smeared over any design, however dreadful, with good results if the spread is thick enough. If the underlying functionality is confusing, then spread a graphical user interface on it. … If the user interface still has some problems, smear some manuals over it. If the manuals are still deficient, smear on some training which you force users to take.”

Of course they were talking specifically about the idea that usability was a matter of getting the interface right, and that it could be developed separately from the main application. However, this was an incredibly damaging fallacy amongst usability specialists in the 80s and 90s. There was a huge effort to try to justify this idea by experts like Hartson & Hix, Edmonds, and Green. Perhaps the arrival of Object Oriented technology contributed towards the confusion. A low level of coupling so that different parts of the system are independent of each other is a good thing. I wonder if that lured usability professionals into believing what they wanted to believe, that they could be independent from the grubby developers.

Usability professionals tried to persuade themselves that they could operate a separate development lifecycle that would liberate them from the constraints and compromises that would be inevitable if they were fully integrated into development projects. The fallacy was flawed conceptually and architecturally. However, it was also a politically disastrous approach. The usability people made themselves even less visible, and were ignored at a time when they really needed to be getting more involved at the heart of the development process.

As I’ve explained, the developers were only too happy to ignore the usability people. They were following methods and lifecycles that couldn’t easily accommodate usability.

How can organisations incorporate the idea of usability engineering into the software development and testing process?

There aren’t any right answers, certainly none that will guarantee success. However, there are plenty of wrong answers. Historically in software development we’ve kidded ourselves into thinking that the next fad, whether Structured Methods, Agile, CMMI or whatever, will transform us into rigorous, respected professionals who can craft high quality applications. Some (like Structured Methods) suck, while others (like Agile) are far more positive, but the uncomfortable truth is that it’s all hard and the most important thing is our attitude. We have to acknowledge that development is inherently very difficult. Providing good UX is even harder and it’s not going to happen organically as a by-product of some over-arching transformation of the way we develop. We have to consciously work at it.

Whatever the answer is for any particular organisation it has to incorporate UX at the very heart of the process, from the start. Iteration and prototyping are both crucial. One of the few fundamental truths of development is that users can’t know what they want and like till they’ve seen what is possible and what might be provided.

Even before the first build there should have been some attempt to understand the users and how they might be using the proposed product. There should be walkthroughs of the proposed design. It’s important to get UX professionals involved, if at all possible. I think developers have advanced to the point that they are less likely to get it horribly wrong, but actually getting it right, and delivering good UX is asking too much. For that I think you need the professionals.

I do think that Agile is much better suited to producing good UX than traditional methods, but there are still dangers. A big one is that many Agile developers are understandably sceptical about anything that smells of Big Up-Front Analysis and Design. It’s possible to strike a balance and learn about your users and their needs without committing to detailed functional requirements and design.

How can usability relate to the notion of testable hypothesis that can lead to better software?

Usability and testability go together naturally. They’re also consistent with good development practice. I’ve worked on, or closely observed, many applications where the design had been fixed and the build had been completed before anyone realised that there were serious usability problems, or that it would be extremely difficult to detect and isolate defects, or that there would be serious performance issues arising from the architectural choices that had been made.

We need to learn from work that’s been done with complexity theory and organisation theory. Developing software is mostly a complex activity, in the sense that there are rarely predictable causes and effects. Good outcomes emerge from trialling possible solutions. These possibilities aren’t just guesswork. They’re based on experience, skill, knowledge of the users. But that initial knowledge can’t tell you the solution, because trying different options changes your understanding of the problem. Indeed it changes the problem. The trials give you more knowledge about what will work. So you have to create further opportunities that will allow you to exploit that knowledge. It’s a delusion that you can get it right first time just by running through a sequential process. It would help if people thought of good software as being grown rather than built.

Why do UX evangelists give up?

People often dismiss Twitter for being a pointless waste of time, but the other day I had a chat that illustrated how useful and interesting Twitter can be if you’re prepared to use it thoughtfully.

Charlotte Lewis tweeted a link to a comment I’d made on Craig Tomlin’s blog. “UX evangelist as step 1 of ux-ing an org, huh…See @James_Christie comment http://www.usefulusability.com/the-5-models-of-corporate-user-experience-culture/”.

Charlotte Lewis's tweet

Craig had argued in his blog that there are five levels of UX maturity.

Level 1 – IT has responsibility for UX.

Level 2 – Operations (i.e. the department that administers the organisation) has responsibility for UX.

Level 3 – Marketing has responsibility for UX.

Level 4 – UX is independent and equal to IT, Operations and Marketing.

Level 5 – UX is independent and has authority over IT, Operations and Marketing.

I had suggested in my comment that there is an important lower, earlier stage in the evolution of UX within an organisation. That is when an enthusiastic and well informed individual has identified that the organisation’s neglect of UX is a problem and starts lobbying for changes.

I was curious about Charlotte’s tweet and responded, asking why she’d dug out my comment. She said that a feed she follows had highlighted it, and she felt that it was still valid. We then had a chat about factors allowing an organisation to move from the initial level 0 that I’d identified to level 1.

I thought that the question was framed the wrong way round. If there is an enthusiastic and well-informed advocate of UX within the organisation then progress should follow. I thought it would be more interesting, and useful, to look at the factors that block UX evangelists, hence this blog.

What makes a UX evangelist give up?

I’d never considered the problem in such explicit terms before. I had thought about the reasons that organisations find themselves in what I called the usability death zone in my comment in Craig’s blog. The death zone is where organisations aren’t even aware that there is a problem. Usability and UX simply aren’t on their radar.

There are three groups of reasons for organisations staying in the death zone; the way that software engineering matured, the isolation of usability professionals, and the culture of organisations.

I wrote my Masters dissertation on the reasons why usability hadn’t penetrated software engineering. That really looked at the first two sets of reasons.

My conclusions were that traditional development methods, structured techniques and procurement practices had made it extremely difficult for software engineers to incorporate usability. On the other hand usability professionals had isolated themselves because of their lack of understanding of how commercial software was developed, and their willingness to stand aside from the development mainstream and carry out usability testing when it was too late to shape the application.

The whole dissertation is available online.

Cultures that suffocate

What I didn’t consider in my dissertation were the cultural factors that block people who’ve identified the problem.

The first group of problems I mentioned are historical features of software engineering. Likewise, the second set of problems arise from the way the UX profession developed. However, the cultural factors are much more widespread, and stop many types of organisations from improving.

The common theme with these cultural factors is that the organisation has become rigid and has stopped learning, learning from its customers, its employees and its rivals.

Large organisations can grow to the point that they become so complex that many employees are totally cut off from the real purpose of the organisation, i.e. selling products or providing a service to customers. For these employees the purpose of the organisation has become the smooth running of the organisation.

Large numbers of experienced, capable employees work hard, follow processes, calculate metrics and swap vast numbers of emails, “fyi”. They are subject to annual performance reviews in which targets are set, achievement is measured and rewards flow to those whose performance conforms to those idealised targets.

Compliance with processes and standards becomes the professional and responsible way to approach work. Indeed, professionalism and conformity can become indistinguishable from each other. Improvements are possible, but they are strictly marginal increments. The organisation learns how to carry on working the same way, but slightly more efficiently and effectively.

Sure, in software development CMMI can lead to greater rigour and to steady improvements, but these are unlikely to be radical. As Ben Simo put it when he contributed to an article I wrote on testing standards; “if an organisation is at a level less than the intent of level 5, CMM seems to often lock in ignorance that existed when the process was created”.

Seemingly radical change certainly does occur, but it is usually a case of managers being replaced, organisation structures being put through the blender, or whole departments being outsourced. All this chaos and disruption is predicated on the need to produce a more polished and refined version of the past, not to do anything radically different or better.

Such maturity models effectively discourage people from saying, “hang on, we’re on the wrong track here”. People who do try to speak out and to highlight fundamental flaws in working practices are liable to be seen as rebels and troublemakers, not team players.

Improvement is tied to metrics. The underlying assumption on which this paradigm rests is that measurement and metrics provide an objective insight into reality. The only acceptable insights are objective, and there is a hugely damaging and unchallenged false assumption that objectivity requires numbers. If there is a problem it will be detected in the numbers. Unfortunately these numbers are the product of the existing worldview and simply affirm that worldview.

Potential usability evangelists can see the problem, and understand its cause, but working within such a constrained and blinkered environment they might as well be trying to make their colleagues understand an alternative universe in which the laws of physics don’t apply.

The metrics were designed only to illuminate the problems that were imagined. Those problems that were never conceived lack the numerical evidence that justifies reform. Without such evidence the concerns of individuals are dismissed as subjective, as a matter of opinion rather than objective fact.

As I said, such a culture inhibits all sorts of reform, but I believe that it would be particularly dispiriting for budding usability evangelists. Faced with rejection and crushing disapproval they would either learn to conform or they would leave for a more enlightened employer.

Is the picture I’ve painted accurate? I’ve no figures or hard evidence, but my argument ties in with my own experience of how a company’s culture can suffocate innovation. Certainly, too much focus on measurement can be damaging. That is well established. See Robert Austin’s book, “Measuring and Managing Performance in Organizations”. Can anyone shed light on this? What is it that stops organisations moving out of the usability death zone? What makes the UX evangelists give up? I’d love to hear what people think about this problem.

Prototyping – making informed decisions at the right time

The day after I wrote my last blog about taking decisions I went to a talk arranged by the Scottish Usability Professionals Association, “An Introduction to Prototyping”.

The speaker was Neil Allison of the University of Edinburgh’s Website Development Project.

There’s no need for me to discuss Neil’s talk at length because he’s posted it here.

Neil used a phrase that leapt out at me. Prototyping “helps us take informed decisions at the right time”.

That was exactly what I was thinking about when I wrote in my last blog about taking decisions at the right time, the last responsible moment in Lean terms.

Neil’s phrase summed up the appeal of prototyping perfectly for me. We can take informed decisions at the right time; not take poor, arbitrary or ill-informed decisions when we’ve simply run out of options and have to take whatever default option we’re left with.

In a later email Neil made the point that I keep on trying to make; that usability testing should shape the design, rather than evaluate it.

“When I do usability testing, it’s to check the requirements and the ease of use of a preferred solution. Usually before development begins in earnest as it’s too time consuming and/or costly to backpedal later. (Almost) pointless doing usability testing if you don’t have the resources to take action on the findings”.

Prototyping may have clear benefits but Neil is still trying to raise awareness within the university, spreading the word and trying to widen the use of the technique.

Testers should never see their role as being defined by a rigid process. They should always be looking for better ways to do their job, and be prepared to lobby and educate others.

Please have a look at Neil’s presentation, which contains advice on useful books and tools.

Usability and external suppliers – part 2

In my last post I talked about how the use of external suppliers can cause problems with the usability of applications. I promised to return to the topic and explain what the solutions are. I’m not sure I’d have bothered if I hadn’t stated that I would do it. The solutions are pretty basic and should be well known.

However, I’ve got to admit that there’s considerable reluctance to acknowledge that the problems are real, and that the solutions are relevant and practical. They won’t guarantee success, but since the traditional alternative loads the odds heavily against success they’ve got to be an improvement.

The predictability paradox

There are two uncomfortable truths you have to acknowledge; the general problem and the specific one. In general, you can’t specify requirements accurately and completely up front, and any attempt to do so is doomed. The specific problem is that usability requirements are particularly difficult to specify.

Managers crave certainty and predictability. That’s understandable. What’s dangerous is the delusion that they exist when the truth is that we cannot be certain at the start of a development and we cannot predict the future with the level of confidence that we’d like.

It’s hard to stand up and say “we don’t know – and at this stage we couldn’t responsibly say that we do know”, even though that might be the truthful, honest and responsible thing to say.

It’s always been hugely tempting to pretend that you can set aside a certain amount of time during which you will get the requirements right in a single pass. It makes procurement, planning and management simpler.

Sadly any attempt to do so is like trying to build a house on shifting sand. The paradox is that striving for predictability pushes back the point when you truly can find out what is required and possible.

Mary Poppendieck made the point very well a few years back in an excellent article “Lean Development and the Predictability Paradox” (PDF, opens in new window).

I tried to argue much the same point in an article in 2008, “The Dangerous and Seductive V Model”, an article that has proved scarily popular on my website for the last couple of years. I guess many people still need to learn these lessons.

Unrighteous contracts?

Trying to create predictability by tying suppliers into tight contracts is a result of the delusion that predictability is available at an early stage. This can have perverse results.

Tight, tough contracts committing suppliers right at the start of a project are an understandable attempt to contain risk, or to transfer it to the supplier.

Sadly, clients can never truly transfer risk to the supplier if the risk is better handled internally. If the requirements are complex and poorly understood at the start then attempting to transfer the risk to the supplier is misguided.

Awarding a contract for the full development creates the danger that it will be awarded to the most optimistic, inexperienced or irresponsible supplier. Those suppliers with a shrewd understanding of what is really likely to be involved will tender cautiously and be undercut by the clueless, reckless or unprincipled.

When the wheels come off the project the client may be able to hold the supplier to the contract, but the cost will be high. Changes will be fought over, and the supplier will try to minimise the work – regardless of the impact on quality.

There seem to be three options about how to handle the contractual relationship with an external supplier if you want a suitable focus on quality, and especially usability.

The orthodox approach is to split the contract in two. The first contract would be to establish, at a high level, what is required, and ideally to deliver an early prototype. The second would be to build the full application.

This approach was recommended way back in 1987 by the US Defense Science Board Task Force On Military Software (PDF – opens in new window), and its motives were exactly the same as I’ve outlined.

The Agile approach would probably be to write contracts that have a budget for time and cost, and allow the scope to vary, possibly giving the client the right to cancel at set points, possibly after every iteration.

Martin Fowler summarises that approach well in this article, “The New Methodology”.

The third, more radical, approach was mooted by Mary Poppendieck in her article “Righteous Contracts”.

These “righteous contracts” would be written after the development, not at the start. I’m not convinced. It doesn’t seem all that different from simply dispensing with a contract and relying on trust, but I urge you to read her piece. She analyses the problems well and has some challenging ideas. I’d like to see them work, but they do seem rather idealistic.

Righteous contracts would work only if the client and supplier trust each other and are committed to a long term relationship. However, I’ve got to concede that without that trust and commitment the client is going to have problems with their suppliers regardless of the contractual style. You can’t create a productive partnership with a contract, but you can destroy it if the contract is poorly written and conceived.

Enough of the contracts! What about UX?

This piece was really prompted by my concerns about the effects of the contractual relationship between clients and suppliers, so not surprisingly I’ve been focussing on contracts.

However, I must admit I’ve found that rather dispiriting. So in order to feel a bit more positive I’m going to provide some pointers about how to try and handle usability better.

I discussed this in an article in 2009. It’s all about Agile and UX, but that’s where the interesting work is being done, and honestly, I’ve nothing positive to say about how traditional, linear techniques have handled usability. It might be possible to smuggle in techniques like Nielsen’s Discount Usability Engineering, or more ambitiously, Constantine & Lockwood’s Collaborative Usability Inspections. However, remember we’re talking about external suppliers and the contractual relationship. These techniques require iteration, and if the contract doesn’t make explicit allowance for iterative development then the odds are stacked hopelessly against you.

So Agile really seems to be the best hope for taking account of usability when dealing with an external supplier. Rather embarrassingly I came across this excellent article by Jeff Patton only a few days ago. It dates from 2008 and I really wish I’d found it earlier. It lists 12 “best practice” patterns that Patton has seen people using to incorporate UX into agile development. There’s some great stuff there.

A somewhat downbeat conclusion

Perhaps I shouldn’t have talked about “solutions” to the usability problems caused by using external suppliers. Maybe I should have used a phrase like “a better way”. I believe that flexible, staged contracts are vital. They are necessary, but they’re not sufficient. A commitment to usability is required regardless of the approach that is taken. It’s just that the traditional fixed price contract for a linear development torpedoes any hope of getting usability right. The damage is far more widespread than that, but usability is the issue that gets me particularly passionate.

I doubt if I’ve set off a light bulb in anyone’s head and they’re now thinking, “that’s what we’ve been doing wrong and I know what to do now”. However, I hope I’ve pointed some people in a useful direction to help them read, think and understand a bit more about what we do wrong.

I don’t know all the answers. Testers never do, and the big decisions that will affect usability and quality are a long way out of our remit. But we can, and should, be aware of the pitfalls of following particular approaches. We’ve a duty to speak out and tell management what can go wrong, and that just might tilt the odds back in our favour.