This is the second post in a series about problems that fascinate me, that I think are important and interesting. The series draws on important work from the fields of safety critical systems and from the study of complexity, specifically complex socio-technical systems. This was the theme of my keynote at EuroSTAR in The Hague (November 12th-15th 2018).
The first post was a reflection, based on personal experience, on the corporate preference for building bureaucracy rather than dealing with complex reality, “The dragons of the unknown; part 1 – corporate bureaucracies”. This post is about the nature of complex systems and discusses some features that have significant implications for testing. We have been slow to recognise the implications of these features.
Complex systems are probabilistic (stochastic) not deterministic
A deterministic system will always produce the same output, starting from a given initial state and receiving the same input. Probabilistic, or stochastic, systems are inherently unpredictable and therefore non-deterministic. Stochastic is defined by the Oxford English Dictionary as “having a random probability distribution or pattern that may be analysed statistically but may not be predicted precisely.”
Traditionally, non-determinism meant a system was badly designed, inherently buggy, and untestable. Testers needed deterministic systems to do their job. It was therefore the job of designers to produce systems that were deterministic, and testers would demonstrate whether or not the systems met that benchmark. Any non-determinism meant a bug had to be removed.
Is that right or nonsense? Well, neither, or rather it depends on the context you choose. It depends what you choose to look at. You can restrict yourself to a context where determinism holds true, or you can expand your horizons. The traditional approach to determinism is correct, but only within carefully defined limits.
You can argue, quite correctly, that a computer program cannot have the properties of a true complex system. A program does what it’s coded to do: outputs can always be predicted from the inputs, provided you’re clever enough and you have enough time. For a single, simple program that is certainly true. A fearsomely complicated program might not be meaningfully deterministic, but we can respond constructively to that with careful design, and sensitivity to the needs of testing and maintenance. However, if we draw the context wider than individual programs the weaker becomes our confidence that we can know what should happen.
Once you’re looking at complex socio-technical systems, i.e. systems where people interact with complex technology, then any reasonable confidence that we can predict outcomes accurately has evaporated. These are the reasons.
Even if the system is theoretically still deterministic we don’t have brains the size of a planet, so for practical purposes the system becomes non-deterministic.
The safety critical systems community likes to talk about tractable and intractable systems. They know that the complex socio-technical systems they work with are intractable, which means that they can’t even describe with confidence how they are supposed to work (a problem I will return to). Does that rule out the possibility of offering a meaningful opinion about whether they are working as intended?
That has huge implications for testing artificial intelligence, autonomous vehicles and other complex technologies. Of course testers will have to offer the best information they can, but they shouldn’t pretend they can say these systems are working “as intended” because the danger is that we are assuming some artificial and unrealistic definition of “as intended” that will fit the designers’ limited understanding of what the system will do. I will be returning to that. We don’t know what complex systems will do.
In a deeply complicated system things will change that we are unaware of. There will always be factors we don’t know about, or whose impact we can’t know about. Y2K changed the way I thought about systems. Experience had made us extremely humble and modest about what we knew, but there was a huge amount of stuff we didn’t even know we didn’t know. At the end of the lengthy, meticulous job of fixing and testing we thought we’d allowed for everything, in the high risk, date sensitive areas at least. We were amazed how many fresh problems we found when we got hold of a dedicated mainframe LPAR, effectively our own mainframe, and booted it up with future dates.
We discovered that there were vital elements (operating system utilities, old vendor tools etc) lurking in the underlying infrastructure that didn’t look like they could cause a problem but which interacted with application code in ways we could not have predicted when run with Y2K dates. The systems had run satisfactorily in test enviroments that were built to mirror production, but they crashed when they ran on a mainframe with the future dates. We were experts, but we hadn’t known what we didn’t know.
The behaviour of these vastly complicated systems was indistinguishable from complex, unpredictable systems. When a test passes with such a system there are strict limits to what we should say with confidence about the system.
So, even if you look at the system from a narrow technical perspective, the computerised system only, the argument that a good system has to be deterministic is weak. We’ve traditionally tested systems as if they were calculators, which should always produce the same answers from the same sequence of button presses. That is a limited perspective. When you factor in humans then the ideal of determinism disintegrates.
In any case there are severe limits to what we can say about the whole system from our testing of the components. A complex system behaves differently from the aggregation of its components. It is more than the sum. That brings us to an important feature of complex systems. They are emergent. I’ll discuss this in the next section.
My point here is that the system that matters is the wider system. In the case of safety critical systems, the whole, wider system decides whether people live or die.
Instead of thinking of systems as being deterministic, we have to accept that complex socio-technical systems are stochastic. Any conclusions we reach should reflect probability rather than certainty. We cannot know what will happen, just what is likely. We have to learn about the factors that are likely to tip the balance towards good outcomes, and those that are more likely to give us bad outcomes.
I can’t stress strongly enough that lack of determinism in socio-technical systems is not a flaw, it’s an intrinsic part of the systems. We must accept that and work with it. I must also stress that I am not dismissing the idea of determinism or of trying to learn as much as possible about the behaviour of individual programs and components. If we lose sight of what is happening within these it becomes even more confusing when we try to look at a bigger picture. Likewise, I am certainly not arguing against Test Driven Development, which is a valuable approach for coding. Cling to determinism whenever you can, but accept its limits – and abandon all hope that it will be available when you have to learn about the behaviour of complex socio-technical systems.
We have to deal with whole systems as well as components, and that brings me to the next point. It’s no good thinking about breaking the system down into its components and assuming we can learn all we need to by looking at them individually. Complex systems have emergent behaviour.
Complex systems are emergent; the whole is greater than the sum of the parts
It doesn’t make sense to talk of an H2O molecule being wet. Wetness is what you get from a whole load of them. The behaviour or the nature of the components in isolation doesn’t tell you about the behaviour or nature of the whole. However, the whole is entirely consistent with the elements. The H2O molecules are governed by the laws of the periodic table and that remains so regardless of whether they are combined. But once they are combined they become water, which is unquestionably wet and is governed by the laws of fluid dynamics. If you look at the behaviour of free surface water in the oceans under the influence of wind then you are dealing with a stochastic process. Individual waves are unpredictable, but reasonable predictions can be made about the behaviour of a long series of waves.
As you draw back and look at the wider picture, rather than the low level components you see that the components are combining in ways that couldn’t possibly have been predicted simply by looking at the components and trying to extrapolate.
Starlings offer another good illustration of emergence. These birds combine in huge flocks to form murmurations, amazing, constantly evolving aerial patterns that look as if a single brain is in control. The individual birds are aware of only seven others, rather than the whole murmuration. They concentrate on those neighbours and respond to their movements. Their behaviour isn’t any different from what they can do on their own. However well you understood the invidual starling and its behaviour you could not possibly predict what these birds do together.
Likewise with computer systems, even if all of the components are well understood and working as intended the behaviour of the whole is different from what you’d expect from simply looking at these components. This applies especially when humans are in the loop. Not only is the whole different from the sum of the parts, the whole system will evolve and adapt unpredictably as people find out what they have to do to make the system work, as they patch it and cover for problems and as they try to make it work better. This is more than a matter of changing code to enhance the system. It is about how people work with the system.
Safety is an emergent property of complex systems. The safety critical experts know that they cannot offer a meaningful opinion just by looking at the individual components. They have to look at how the whole system works.
In complex systems success & failure are not absolutes
Success & failure are not absolutes. A system might be flawed, even broken, but still valuable to someone. There is no right, simple answer to the question “Is it working? Are the figures correct?”
Appropriate answers might be “I don’t know. It depends. What do you mean by ‘working’? What is ‘correct’? Who is it supposed to be working for?”
The insurance finance systems I used to work on were notoriously difficult to understand and manipulate. 100% accuracy was never a serious, practicable goal. As I wrote in “Fix on failure – a failure to understand failure”;
“With complex financial applications an honest and constructive answer to the question ‘is the application correct?’ would be some variant on ‘what do you mean by correct?’, or ‘I don’t know. It depends’. It might be possible to say the application is definitely not correct if it is producing obvious garbage. But the real difficulty is distinguishing between the seriously inaccurate, but plausible, and the acceptably inaccurate that is good enough to be useful. Discussion of accuracy requires understanding of critical assumptions, acceptable margins of error, confidence levels, the nature and availability of oracles, and the business context of the application.”
I once had to lead a project to deliver a new sub-system that would be integrated into the main financial decision support system. There were two parallel projects, each tackling a different line of insurance. I would then be responsible for integrating the new sub-systems to the overall system, a big job in itself.
The other project manager wanted to do his job perfectly. I wanted to do whatever was necessary to build an acceptable system in time. I succeeded. The other guy delivered late and missed the implementation window. I had to carry on with the integration without his beautiful baby.
By the time the next window came around there were no developers available to make the changes needed to bring it all up to date. The same happened next time, and then the next time, and then… and eventually it was scrapped without ever going live.
If you compared the two sub-systems in isolation there was no question that the other man’s was far better than the one I lashed together. Mine was flawed but gave the business what they needed, when they needed it. The other was slightly more accurate but far more elegant, logical, efficient and lovingly crafted. And it was utterly useless. The whole decision support system was composed of sub-systems like mine, flawed, full of minor errors, needing constant nursing, but extremely valuable to the business. If we had chased perfection we would never have been able to deliver anything useful. Even if we had ever achieved perfection it would have been fleeting as the shifting sands of the operational systems that fed it introduced new problems.
The difficult lesson we had to learn was that flaws might have been uncomfortable but they were an inescapable feature of these systems. If they were to be available when the business needed them they had to run with all these minor flaws.
Richard Cook expanded on this point in his classic, and highly influential article from 1998 “How complex systems fail”. He put it succinctly.
“Complex systems run in degraded mode.”
Cook’s arguments ring true to those who have worked with complex systems, but it hasn’t been widely appreciated in the circles of senior management where budgets, plans and priorities are set.
Complex systems are impossible to specify precisely
Cook’s 1998 paper is important, and I strongly recommend it, but it wasn’t quite ground breaking. John Gall wrote a slightly whimsical and comical book that elaborated on the same themes back in 1975. “Systemantics; how systems work and especially how they fail”. Despite the jokey tone he made serious arguments about the nature of complex systems and the way that organisations deal, and fail to deal, with them. Here is a selection of his observations.
“Large systems usually operate in failure mode”.
“The behaviour of complex systems… living or non-living, is unpredictable”.
“People in systems do not do what the system says they are doing”.
“Failure to function as expected is an intrinsic feature of systems”
John Gall wrote that fascinating and hugely entertaining back more than forty years ago. He nailed it when he discussed the problems we’d face with complex socio-technical systems. How can we say the system is working properly if we neither know how it is working, or even how it is supposed to work? Or what the people are doing within the system?
The complex systems we have to deal with are usually socio-technical systems. They operate in a social setting, with humans. People make the systems work and they have to make decisions under pressure in order to keep the system running. Different people will do different things. Even the same person might act differently at different times. That makes the outcomes from such a system inherently unpredictable. How can we specify such a system? What does it even mean to talk of specifying an unpredictable system?
That’s something that the safety critical experts focus on. People die because software can trip up humans even when it is working smoothly as designed. This has received a lot of attention in medical circles. I’ll come back to that in a later post.
That is the reality of complex socio-technical systems. These systems are impossible to specify with complete accuracy or confidence, and certainly not at the start of any development. Again, this is not a bug, but an inescapable feature of complex socio-technical systems. Any failure may well be in our expectations, a flaw in our assumptions and knowledge, and not necessarily the system.
This reflected my experience with the insurance finance systems, especially for Y2K, and it was also something I had to think seriously about when I was an IT auditor. I will turn to that in my next post, “part 3 – I don’t know what’s going on”.