The dragons of the unknown; part 9 – learning to live with the unknowable

Introduction

This is the ninth and final post in a series about problems that fascinate me, that I think are important and interesting. The series draws on important work from the fields of safety critical systems and from the study of complexity, specifically complex socio-technical systems. This was the theme of my keynote at EuroSTAR in The Hague (November 12th-15th 2018).

The first post was a reflection, based on personal experience, on the corporate preference for building bureaucracy rather than dealing with complex reality, “facing the dragons part 1 – corporate bureaucracies”. Part 2 was about the nature of complex systems. The third post, “I don’t know what’s going on”, followed on from part 2 and talked about the impossibility of knowing exactly how complex socio-technical systems will behave, which means it is impossible to specify them precisely.

Part 4 “a brief history of accident models”, looked at accident models, i.e. the way that safety experts mentally frame accidents when they try to work out what caused them.

The fifth post, “accident investigations and treating people fairly”, looked at weaknesses in the way that we have traditionally investigated accidents and failures, assuming neat linearity with clear cause and effect. In particular, our use of root cause analysis and our willingness to blame people for accidents are hard to justify.

Part six, “Safety II, a new way of looking at safety”, looked at the response of the safety critical community to such problems and the necessary trade-offs that a practical response requires. The result, Safety II, is intriguing and has important lessons for software testers.

The seventh post “Resilience requires people” is about the importance of system resilience and the vital role that people play in keeping systems going.

The eighth post, “How we look at complex systems”, is about the way we choose to look at complex systems, the mental models that we build to try and understand them, and the relevance of DevOps.

This final post will try to draw all these strands together and present some thoughts about the future of testing as we are increasingly confronted with complex systems that are beyond our ability to comprehend.

Computing will become more complex

Even if we choose to focus on the simpler problems, rather than help users understand complexity, the reality is that computing is only going to get more complex. The problems that users of complex socio-technical systems have to grapple with will inevitably get more difficult and more intractable. The choice is whether we want to remain relevant, but uncomfortable, or go for comfortable bullshit that we feel we can handle. Remember Zadeh’s Law of Incompatibility (see part 7 – resilience requires people). “As complexity increases, precise statements lose their meaning, and meaningful statements lose precision”. Quantum computing, artificial intelligence and adaptive algorithms are just three of the areas of increasing importance whose inherent complexity will make it impossible for testers to offer opinions that are both precise and meaningful.

Quantum computing, in particular, is fascinating. By its very nature it is probabilistic, not deterministic. The idea that well designed and written programs should always produce the same output from the same data is relevant only to digital computers (and even then the maxim has to be heavily qualified in practice); it never holds true at any level for quantum computers. I wrote about this in “Quantum computing; a whole new field of bewilderment”.
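
As a minimal illustration of that point, consider the following Python sketch of repeatedly measuring a qubit prepared in an equal superposition. This is a toy classical simulation, not real quantum hardware, and it is my own example rather than something from that article; but it shows the essential feature: the program and its input never change, yet the output legitimately varies from run to run.

```python
# Toy illustration: a classical simulation of measuring the |+> state.
# Real quantum hardware behaves this way inherently; here we merely mimic it.
import random

def measure_plus_state() -> int:
    """Measure a qubit in equal superposition: returns 0 or 1, each with probability 0.5."""
    # The amplitudes of |0> and |1> are both 1/sqrt(2), so each outcome
    # has probability |1/sqrt(2)|^2 = 0.5.
    return 0 if random.random() < 0.5 else 1

if __name__ == "__main__":
    for run in range(3):
        outcomes = [measure_plus_state() for _ in range(1000)]
        print(f"run {run}: proportion of 1s = {sum(outcomes) / len(outcomes):.3f}")
```

A deterministic oracle of “same input, same output” simply does not apply here; the best a tester can check is whether the distribution of outcomes looks right.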

The final quote from that article, “perplexity is the beginning of knowledge”, applies not only to quantum computing but also to artificial intelligence and the fiendish complexity of algorithms processing big data. One of the features of quantum computing is the way that changing a single qubit, the quantum equivalent of the digital bit, will trigger changes in other qubits. This is entanglement, but the same word is now being used to describe the incomprehensible complexity of modern digital systems. Various writers have talked about this being the Age of Entanglement, e.g. Samuel Arbesman in his book “Overcomplicated: Technology at the Limits of Comprehension”, Emmet Connolly in an article “Design in the Age of Entanglement”, and Danny Hillis in an article “The Enlightenment is Dead, Long Live the Entanglement”.

The purist in me disapproves of recycling a precise term from quantum science to describe loosely a phenomenon in digital computing. However, it does serve a useful role. It is a harsh and necessary reminder and warning that modern systems have developed beyond our ability to understand them. They are no more comprehensible than quantum systems, and as Richard Feynman is popularly, though possibly apocryphally, supposed to have said: “If you think you understand quantum physics, you don’t understand quantum physics.”

So the choice for testers will increasingly be to decide how we respond to Zadeh’s Law. Do we offer answers that are clear, accurate, precise and largely useless to the people who lose sleep at night worrying about risks? Or do we say “I don’t know for sure, and I can’t know, but this is what I’ve learned about the dangers lurking in the unknown, and what I’ve learned about how people will try to stay clear of these dangers, and how we can help them”?

If we go for the easy options and restrict our role to problems which allow definite answers then we will become irrelevant. We might survive as process drones, holders of a “bullshit job” that fits neatly into the corporate bureaucracy but offers little of value. That will be tempting in the short to medium term. Large organisations often value protocol and compliance more highly than they value technical expertise. That’s a tough problem to deal with. We have to confront that and communicate why that worldview isn’t just dated, it’s wrong. It’s not merely a matter of fashion.

If we’re not offering anything of real value then there are two possible dangers. We will be replaced by people prepared to do poor work cheaper; if you’re doing nothing useful then there is always someone who can undercut you. Or we will be increasingly replaced by automation because we have opted to stay rooted in the territory where machines can be more effective, or at least efficient.

If we fail to deal with complexity the danger is that mainstream testing will be restricted to “easy” jobs – the dull, boring jobs. When I moved into internal audit I learned to appreciate the significance of all the important systems being inter-related. It was where systems interfaced, and where people were involved, that they got interesting. The finance systems I worked with may have been almost entirely batch based, but they performed a valuable role for the people with whom we were constantly discussing their behaviour. Anything standalone was neither important nor particularly interesting. Anything that didn’t leave smart people scratching their heads and worrying was likely to be boring. Interconnectedness and complexity will only increase, and however difficult testing becomes it won’t be boring – so long as we are doing a useful job.

If we want to work with the important, interesting systems then we have to deal with complexity and the sort of problems the safety people have been wrestling with. There will always be a need for people to learn and inform others about complex systems. The American economist Tyler Cowen in his book “Average is Over” states the challenge clearly. We will need interpreters of complex systems.

“They will become clearing houses for and evaluators of the work of others… They will hone their skills of seeking out, absorbing and evaluating information… They will be translators of the truths coming out of our network of machines… At least for a while they will be the only people who will have a clear notion of what is going on.”

I’m not comfortable with the idea of truths coming out of machines, and we should resist the idea that we can ever be entirely clear about what is going on. But the need for experts who can interpret complex systems is clear. Society will look for them. Testers should aspire to be among those valuable specialists. The complexity of these systems will be beyond the ability of any one person to comprehend, but perhaps these interpreters, in addition to deploying their own skills, will be able to act like the conductor of an orchestra, to return to the analogy I used in part seven (Resilience requires people). Conductors are talented musicians in their own right, but they call on the expertise of different specialists, blending their contributions to produce something of value to the audience. Instead of a piece of music, the tester as interpreter would produce a story that sheds light on the system, guiding the people who need to know.

Testers in the future will have to be confident and assertive when they try to educate others about complexity, the inexplicable and the unknowable. Too often in corporate life a lack of certainty has been seen as a weakness. We have to stand our ground and insist on our right to be heard and taken seriously when we argue that certainty cannot be available if we want to talk about the risks and problems that matter. My training and background meant I couldn’t keep quiet when I saw problems that were being ignored because no-one knew how to deal with them. As Better Software magazine said about me, I’m never afraid to voice my opinion.

Never be afraid to speak out, to explain why your experience and expertise make your opinions valuable, however uncomfortable these may be for others. That’s what you’re paid for, not to provide comforting answers. The metaphor of facing the dragons of the unknown is extremely important. People will have to face these dragons. Testers have a responsibility to try and shed as much light as possible on those dragons lurking in the darkness beyond what we can see and understand easily. If we concentrate only on what we can know and say with certainty then we walk away from offering valuable, heavily qualified advice about the risks, threats and opportunities that matter to people. Our job should entail trying to help and protect people. As Jerry Weinberg said in “Secrets of Consulting”:

“No matter what they tell you, it’s always a people problem.”

The dragons of the unknown; part 1 – corporate bureaucracies

Introduction

This is the first post in a series about problems that fascinate me, that I think are important and interesting. The series will draw on important work from the fields of safety critical systems and from the study of complexity, specifically complex socio-technical systems. I’m afraid I will probably dwell longer on problems than answers. One of the historical problems with software development and testing has been an eagerness to look for and accept easy, but wrong answers. We have been reluctant to face up to reality when we are dealing with complexity, which doesn’t offer simple or clear answers.

This was the theme of my keynote at EuroSTAR in The Hague (November 12th-15th 2018).

Complexity is intimidating and it’s tempting to pretend the world is simpler than it is. We’ve been too keen to try and reshape reality so that it will look like something we can manage neatly. That mindset often dovetails with the pressures of corporate life and it is possible to go far in large organisations while denying and evading reality. It is, however, bullshit.

A bit about my background

When I left university I went to work for one of the big, international accountancy firms as a trainee chartered accountant. It was a bewildering experience. I felt clueless. I didn’t understand what was going on. I never did feel comfortable that I understood what we were doing. It wasn’t that I was dimmer than my colleagues. I was the only one who seemed to question what was going on and I felt confused. Everyone else took it all at face value but the work we were doing seemed to provide no value to anyone.

At best we were running through a set of rituals to earn a fee that paid our salaries. The client got a clean, signed off set of accounts, but I struggled to see what value the information we produced might have for anyone. None of the methods we used seemed designed to tell us anything useful about what our clients were doing. It all felt like a charade. I was being told to do things and I just couldn’t see how anything made sense. I may as well have been trying to manage that flamingo, from Alice in Wonderland. That picture may seem a strange one to use, but it appeals to me; it sums up my confusion well. What on earth was the point of all these processes? They might as well have been flamingos for all the use they seemed. I hadn’t a clue.

I moved to a life assurance company, managing the foreign currency bank accounts. That entailed shuffling tens of millions of dollars around the world every day to maximise the overnight interest we earned. The job had a highly unattractive combination of stress and boredom. A simple, single mistake in my projections of the cash flowing through the accounts on one day would cost far more than my annual salary. The projections weren’t an arithmetical exercise. They required judgment and trade-offs of various factors. Getting it right produced a sense of relief rather than satisfaction.

The most interesting part of the job was using the computer systems to remotely manage the New York bank accounts (which was seriously cutting edge for the early 1980s) and discussing with the IT people how they worked. So I decided on a career switch into IT, a decision I never regretted, and arrived in the IT division of a different, larger, insurance company. I loved it. I think my business background got me more interesting roles in development, involving lots of analysis and design as well as coding.

After a few years I had the chance to move into computer audit, part of the group audit department. It was a marvellous learning experience, seeing how IT fitted into the wider business, and seeing all the problems of IT from a business perspective. That transformed my outlook and helped me navigate my way round corporate bureaucracies, but once I learned to see problems and irresponsible bullshit I couldn’t keep my mouth shut. I didn’t want to, and that’s because of my background, my upbringing, and training. I acquired the reputation for being an iconoclast, an awkward bastard. I couldn’t stand bullshit.

The rise of bullshit jobs

The anthropologist David Graeber developed the idea of bullshit jobs. He argued that many modern jobs in big organisations are ultimately pointless. Here I offer a partial explanation, based on my experience.

My ancestors had real jobs, tough, physical jobs as farmhands, stonemasons and domestic servants till the 20th century when they managed to work their way up into better occupations, like shopkeeping, teaching, sales, interspersed with spells in the military during the two world wars. These were still real jobs, where it was obvious if you didn’t do anything worthwhile, if you weren’t achieving anything.

I had a very orthodox Scottish Presbyterian upbringing. We were taught to revere books and education. We should work hard, learn, stand our ground when we know we are right and always argue our case. We should always respect those who have earned respect, regardless of where they are in society.

In the original Star Trek Scotty’s accent may have been dodgy, but that character was authentic. It was my father’s generation. As a Star Trek profile puts it: “rank means nothing to Scotty if you’re telling him how to do his job”.

A few years ago Better Software magazine introduced an article I wrote by saying I was never afraid to voice my opinion. I was rather embarrassed when I saw that. Am I really opinionated and argumentative? Well, probably (definitely, says my wife). When I think that I’m right I find it hard to shut up. Nobody does righteous certainty better than Scottish Presbyterians! In that, at least, we are world class, but I have to admit, it’s not always an attractive quality (and the addictive yearning for certainty becomes very dangerous when you are dealing with complex systems). However, that ingrained attitude, along with my experience in audit, did prepare me well to analyse and challenge dysfunctional corporate practices, to challenge bullshit and there has never been any shortage of that.

Why did corporations embrace harmful practices? A major factor was that they had become too big, complex and confusing for anyone to understand what was going on, never mind exercise effective control. The complexity of the corporation itself is difficult enough to cope with, but the problems it faces and the environment it operates in have also become more complex.

I’m not advocating some radical Year Zero destruction of corporate bureaucracy. Large organisations are so baffling and difficult to manage that without some form of bureaucracy nothing would happen. All would be confusion and chaos. But it is difficult to keep the bureaucracy under control and in proportion to the organisation’s real needs and purpose. There is an almost irresistible tendency for the bureaucracy to become the master rather than the servant.

Long and painful, if educational, experience has allowed me to distill the lessons I’ve learned into seven simple statements.

  • Modern corporations, the environment they’re operating in and the problems they face are too complex for anyone to control or understand.
  • Corporations have been taken over by managers and run for their own benefit, rather than for the benefit of customers, shareholders, the workforce or wider society.
  • Managers need an elaborate bureaucracy to maintain even a semblance of control, though it’s only the bureaucracy they control, not the underlying reality.
  • These managers struggle to understand the jobs of the people who do the productive work.
  • So the managers value protocol and compliance with the bureaucracy over technical expertise.
  • The purpose of the corporate bureaucracy therefore becomes the smooth running of the bureaucracy.
  • Hence the proliferation of jobs that provide no real value and exist only so that the corporate bureaucracy can create the illusion of working effectively.

I have written about this phenomenon in a blog series “Corporate bureaucracy and testing” and also reflected in “Testing: valuable or bullshit?” on the specific threat to testing if it becomes a low skilled, low value corporate bullshit job.

The aspect of this problem that I want to focus on in this series is our desire to simplify complexity. We furnish simple explanations for complex problems. I did this as a child when I decided the wind was caused by trees waving their branches. My theory fitted what I observed, and it was certainly much easier for a five year old to understand than variations in atmospheric pressure. We also make convenient, but flawed, assumptions that turn a messy, confusing, complex problem into one that we are confident we can deal with. The danger is that in doing so we completely lose sight of the real problem while we focus on a construct of our own imagination. This is hardly a recent phenomenon.

The German military planners of World War One provide a striking example of this escape from reality. They fully appreciated what a modern, industrial war would be like, with huge armies and massively destructive armaments. The politicians didn’t get it, but according to Barbara Tuchman, in “The Guns of August”, the German military staff did understand. They just didn’t know how to respond. So they planned to win the sort of war they were already familiar with, a 19th century war.

(General Moltke, the German Chief of Staff,) said to the Kaiser in 1906, ‘It will be a long war that will not be settled by a decisive battle but by a long wearisome struggle with a country that will not be overcome until its whole national force is broken, and a war that will utterly exhaust our own people, even if we are victorious.’ It went against human nature, however – and the nature of General Staffs – to follow the logic of his own prophecy. Amorphous and without limits, the concept of a long war could not be scientifically planned for as could the orthodox, predictable and simple solution of decisive battle and short war. The younger Moltke was already Chief of Staff when he made his prophecy, but neither he nor his Staff, nor the Staff of any other country made any effort to plan for a long war.

The military planners yearned for a problem that allowed an “orthodox, predictable and simple solution”, so they redefined the problem to fit that longing. The results were predictably horrific.

There is a phrase for the mental construct the military planners chose to work with: an “envisioned world” (PDF – opens in new tab). That paper, by David Woods, Paul Feltovich, Robert Hoffman, and Axel Roesler, is a fairly short and clear introduction to the dangers of approaching complex systems with a set of naively simplistic assumptions. Our natural, human bias towards over-simplification has various features. In each case the danger is that we opt for a simplified perspective, rather than a more realistic one.

  • We like to think of activities as a series of discrete steps that can be analysed individually, rather than as continuous processes that cannot meaningfully be broken down.
  • We prefer to see processes as separable and independent, rather than envisaging them all interacting with the wider world.
  • We are inclined to consider activities as if they were sequential when they actually happen simultaneously.
  • We instinctively want to assume homogeneity rather than heterogeneity, mentally classing similar things as if they were exactly the same and losing sight of nuance and important distinctions; we assume regularity when the reality is irregular.
  • We look at elements as if there were only one perspective when there might be multiple viewpoints.
  • We like to assume any rules or principles are universal when they might really be local and conditional, relevant only to the current context.
  • We inspect the surface and shy away from the deep analysis that might reveal awkward complications and subtleties.

These are all relevant considerations for testers, but there are three more that are all related and are particularly important when trying to learn how complex socio-technical systems work.

  • We look on problems as if they are static objects, when we should be thinking of them as dynamic, flowing processes. If we focus on the static then we lose sight of the problems or opportunities that might arise as the problem, or the application, changes over time or space.
  • We treat problems as if they are simple and mechanical, rather than organic with unpredictable, emergent properties. The implicit assumption is that we can know how whole systems will behave simply by looking at the behaviour of the components.
  • We pretend that the systems are subject to linear causes and effects, with the same cause always producing the same effect. The possibility of tipping points and cascading effects is ignored (the toy sketch after this list shows how a single failure can cascade).
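
To illustrate that third point, here is a deliberately crude sketch of a cascading failure. It is an invented toy model, not drawn from any real system: each service carries a load and has a capacity, and when one fails its load is shared among the survivors, which can tip them over in turn. The same kind of cause, the loss of a single service, produces wildly different effects depending on where it strikes.

```python
# Toy model of a cascading failure (illustrative only; all values are invented).
def cascade(loads, capacities, first_failure):
    """Return the set of services that end up failed after load redistribution."""
    failed = {first_failure}
    changed = True
    while changed:
        changed = False
        survivors = [i for i in range(len(loads)) if i not in failed]
        if not survivors:
            break
        # Load shed by failed services is shared evenly among the survivors.
        extra = sum(loads[i] for i in failed) / len(survivors)
        for i in survivors:
            if loads[i] + extra > capacities[i]:
                failed.add(i)
                changed = True
    return failed

loads = [40, 90, 80, 30]
capacities = [100, 100, 100, 100]
print(cascade(loads, capacities, first_failure=3))  # {3} - the failure is absorbed
print(cascade(loads, capacities, first_failure=1))  # {0, 1, 2, 3} - everything goes down
```

A linear, component-by-component view of this system would never reveal that losing one service can be harmless while losing another takes everything down.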

Complex socio-technical systems are not static, simple or linear. Testers have to recognise that and frame their testing to take account of the reality that these systems are dynamic, organic and non-linear. If they don’t, and if they try to restrict themselves to the parts of the system that can be treated as mechanical rather than truly complex, the great danger is that testing will become just another pointless, bureaucratic job producing nothing of any real value. I have worked both as an external auditor and an internal auditor. Internal audit has a focus and a mindset that allows it to deliver great value, when it is done well. External audit has been plagued by a flawed business model that is struggling with the complexity of modern corporations and their accounts. The external audit model requires masses of inexperienced, relatively lowly paid staff, carrying out unskilled checking of the accounts and producing output of dubious value. The result can fairly be described as a crisis of relevance for external audit.

I don’t want to see testing suffer the same fate, but that is likely if we try to define the job as one that can be carried out by large squads of poorly skilled testers. We can’t afford to act as if the job is easy. That is the road to irrelevance. In order to remain relevant we must try to meet the real needs of those who employ us. That requires us to deal with the world as it is, not as we would like it to be.

My spell in IT audit forced me to think seriously about all these issues for the first time. The audit department in which I worked was very professional and enlightened, with some very good, very bright people. We carried out valuable, risk-based auditing when that was at the leading edge of internal audit practice. Many organisations have still not caught up and are mired in low-value, low-skilled compliance checking. That style of auditing falls squarely into the category of pointless, bullshit jobs. It is performing a ritual for the sake of appearances.

My spell as an auditor transformed my outlook. I had to look at, and understand the bigger picture, how the various business critical applications fitted together, and what the implications were of changing them. We had to confront bullshitters and “challenge the intellectual inadequates”, as the Group Chief Auditor put it. We weren’t just allowed to challenge bullshit; it was our duty. Our organisational independence meant that nobody could pull rank on us, or go over our heads.

I never had a good understanding of what the company was doing with IT till I moved into audit. The company paid me enough money to enjoy a good lifestyle while I played with fun technology. As an auditor I had to think seriously about how IT kept the company competitive and profitable. I had to understand how everything fitted together, understand the risks we faced and the controls we needed.

I could no longer just say “well, shit happens”. I had to think “what sort of shit?”, “how bad is it?”, “what shit can we live with?”, “what shit have we really, really got to avoid?”, “what are the knock-on implications?”, “can we recover from it?”, “how do we recover?”, “what does ‘happen’ mean anyway?”, “who does it happen to?”, “where does it happen?”.

Everything that mattered fitted together. If it was standalone, then it almost certainly didn’t matter and we had more important stuff to worry about. The more I learned the more humble I became about the limits of my knowledge. It gradually dawned on me how few people had a good overall understanding of how the company worked, and this lesson was hammered home when we reached Y2K.

When I was drafted onto the Y2K programme as a test manager I looked at the plans drawn up by the Y2K architects for my area, which included the complex finance systems on which I had been working. The plans were a hopelessly misleading over-simplification. There were only three broad systems defined, covering 1,175 modules. I explained that it was nonsense, but I couldn’t say for sure what the right answer was, just that it was a lot more.

I wrote SAS programs to crawl through the production libraries, schedules, datasets and access control records to establish all the links and outputs. I drew up an overview that identified 20 separate interfacing applications with 3,000 modules. That was a shock to management because it had already been accepted that there would not be enough time to test the lower number thoroughly.
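
The idea behind those programs is simple enough to sketch. Something like the following Python fragment captures the core of it; this is a hypothetical reconstruction with invented module and dataset names, not the original SAS. Record which modules read or write which datasets, link modules that share a dataset, and treat the connected groups as interfacing applications.

```python
# Hypothetical reconstruction of the dependency-mapping idea (the real work
# was done in SAS against production libraries, schedules and access records;
# all names below are invented for illustration).
from collections import defaultdict

# (module, dataset it reads or writes)
usage = [
    ("LEDGER01", "GL.MASTER"),
    ("LEDGER02", "GL.MASTER"),
    ("LEDGER02", "CLAIMS.FEED"),
    ("CLAIMS07", "CLAIMS.FEED"),
    ("PAYROLL3", "PAY.MASTER"),
]

# Link every pair of modules that touch the same dataset.
dataset_to_modules = defaultdict(set)
for module, dataset in usage:
    dataset_to_modules[dataset].add(module)

adjacency = defaultdict(set)
for modules in dataset_to_modules.values():
    for m in modules:
        adjacency[m] |= modules - {m}

# Connected components approximate the "interfacing applications".
def components(adjacency):
    seen, groups = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups

for group in components(adjacency):
    print(sorted(group))
```

Even a crude map like this made it obvious how far the official three-system view fell short of the 20 interfacing applications that actually existed.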

My employers realised I was the only available person who had any idea of the complexity of both the technical and business issues. They put me in charge of the development team as well as the testers. That was an unusual outcome for a test manager identifying a fundamental problem. I might not have considered myself an expert, but I had proved my value by demonstrating how much we didn’t know. That awareness was crucial.

That Y2K programme might have been 20 years ago, but it was painfully clear at the time that we had totally lost sight of the complexity of these finance applications. I was able to provide realistic advice only because of my deep expertise, and thereafter I was always uncomfortably aware that I never again had the time to acquire such deep knowledge.

These applications, for all their complexity, were at least rigidly bounded. We might not have known what was going on within them, but we knew where the limits lay. They were all internal to the corporation, with a tightly secured perimeter. That is a different world from today. The level of complexity has increased vastly. Web applications are built on layers of abstraction that render the infrastructure largely opaque. These applications aren’t even notionally under the control of organisations in the way that our complex insurance applications were. That makes their behaviour impossible to control precisely, and even to predict, as I will discuss in my next post, “part 2 – crucial features of complex systems”.