My EuroSTAR 2013 tutorial in Gothenburg was titled “Questioning auditors questioning testing”. Not surprisingly a recurring theme of the tutorial was risk. Do we really understand risk? How do we deal with it? These are important questions for both testers and auditors.
I argued that both auditors and testers, in their different ways, have struggled to deal with risk. The failure of auditors contributed to the financial crash of 2007/8. The problems within the testing profession may have been less conspicuous, but they have had a significant impact on our ability to do an effective job.
One of the issues I discussed was our tendency to perform naïve and mechanical risk assessments. I’m sure you’ve seen risk matrices like this one from the Health and Safety Executive, the UK Government inspectorate responsible for regulating workplace safety.
There are two fundamental problems with such a matrix that should make testers wary of using it.
Firstly, it implies that cells in the matrix with equal scores reflect equally acceptable positions. Is that really the case? Is a trivial chance of a catastrophic outcome genuinely as acceptable as the near certain chance of a trivially damaging outcome? The HSE deals with the sort of risks that lead national news bulletins when they come to pass; they regulate the nuclear industry, chemical plants and North Sea oil rigs.
I suspect the HSE takes a rather more nuanced approach to risks than is implied by the scoring in the matrix.
The second basic problem with these risk matrices is that we often lack the evidence to assign credible estimates of probability and impact to the risks.
This problem applies in particular to probabilities. Is there a reasonable basis for the figures we’ve assigned to the probabilities? Are they guesses? Are we performing precisely engineered and sophisticated calculations that are ultimately based on arbitrary or unsound assumptions? It makes a huge difference to the outcomes, but we can be vague to the point of cluelessness about the basis for these calculations.
Such a matrix may be relevant for risks where both the probability and the likely impact of something going wrong are well understood. That is often not the case during the early stages of a software development when the testing is being planned.
What’s the point of putting a number on the probability?
Whilst I was preparing my tutorial I came across an interesting case that illustrated the limitations of assigning probabilities when we’ve no experience or historic data.
I was reading about the development of the atomic bomb during the Second World War. Before the first bomb was tested the scientists were concerned about the possibility that a nuclear explosion might set the atmosphere on fire and wipe out life on earth. Enrico Fermi, the brilliant Italian nuclear physicist who worked on the development of the atomic bomb, estimated the probability of such a catastrophe at 10%.
I was astonished. How could anyone have taken the decision to explode an atomic bomb after receiving such scientific advice? My curiousity was aroused and I did some background reading on the episode. I learned that Fermi had also been asked in 1939 for his estimate of the probability that nuclear fission could be controlled for power or weapons. His estimate was 10%.
Then, in a separate article, I discovered that in 1950 he had estimated the probability that humans would have developed the technology to travel faster than light by 1960. You’ve guessed it. The answer was 10%.
Apparently Fermi had the reputation for being a sound estimator, when (and this is a crucial qualification) he had the information to support a reasonable estimate. Without such information he was clearly liable to take a guess. If something might happen, but he thought it unlikely, then he assigned a probability of 10%.
I think most of us do no better than Fermi. Indeed, the great majority are probably far worse. Are we really any more capable than Enrico Fermi of assigning probabilities to a naïve risk matrix that would allow simple, mechanical calculations of relative outcomes?
I strongly suspect that if Enrico Fermi had thought anyone would take his estimates and slot them into a simplistic risk formula to guide decision making then he’d have objected. Yet many of us see nothing wrong with such a simplistic approach to risk. I wonder if that’s simply because our risk assessments are little more than a tickbox exercise, a task that has to be knocked off to show we are following “the process”.
The incertitude matrix – uncertainty, ambiguity and ignorance
The risk matrix clearly assumes greater knowledge of probabilities and outcomes than we usually have. A more useful depiction of the true situation is provided by O’Riordan and Cox’s incertitude matrix.
In this representation the conventional risk matrix occupies only the top left hand corner. We are in a position to talk about risk only when we have well defined outcomes and a realistic basis for assessing the probabilities.
If we understand the outcomes, but not the probabilities then we are in a state of uncertainty. If we understand the probabilities of events, but not the outcomes then we are dealing with ambiguity.
Ambiguity is an interesting situation. It seems more relevant to scientific problems than software development. My wife works in the field of climate change adaptation for a Scottish Government agency. She recognises ambiguity in her line of work, where the probability of initial events might be reasonably well understood, but it isn’t possible to define the outcomes. Feedback mechanisms, or an unknown tipping point, might turn a benign outcome into a catastrophic one in ways we can’t predict with confidence.
One area where ambiguity could exist in software development is in the way that social media can create entirely unpredictable outcomes. An error that might have had little impact 10 years ago could now spiral into something far more serious if it catches people’s attention and goes viral.
Nevertheless, uncertainty, rather than ambiguity, is probably the quadrant where testers and developers are more likely to find themselves. Here, we can identify outcomes with confidence, but not assign meaningful probabilities to them.
However, uncertainty is unlikely to be a starting point. To get there we have to know what part of the product or application could fail, how it might fail and what the effect would be. We might sometimes know that at the start, if this is a variant on a well understood product, but often we have to learn it all.
The usual starting point, our default position, should be one of ignorance. We don’t know what can go wrong and what the impact might be, and we almost certainly don’t know anything with confidence about the probabilities.
Ignorant and proud!
Sadly, in business as well as software development, an honest admission of ignorance is seen as weakness. The pretence that we know more than we do is welcomed and admired rather than being derided as dangerous bluster. Such misplaced confidence leads to disappointment, stress, frustration, misdirected effort, and actually makes it harder to learn about products and applications. We deceive ourselves that we know things that we don’t, and stop digging to find out the true situation.
So please, speak up for ignorance! Only if we admit what we truly don’t know can we hope to learn the lessons that will give our stakeholders the insights that they need. Surely we need to look for and understand the most damaging failures before we start thinking of assigning probabilities that might guide the rest of our testing. Don’t assume that knowledge we’ve not gained can ever be a valid starting point for calculations of risk. Knowledge has to be earned, not assumed.