Why ISO 29119 is a flawed quality standard

This article originally appeared in the Fall 2015 edition of Better Software magazine.

In August 2014, I gave a talk attacking ISO 29119 at the Association for Software Testing's conference in New York. That gave me a reputation for being opposed to standards in general, and to testing standards in particular. In fact I do approve of standards, and I believe it's possible that we might have a worthwhile standard for testing. However, it won't be the fundamentally flawed ISO 29119.

Technical standards that make life easier for companies and consumers are a great idea. The benefit of standards is that they offer protection to vulnerable consumers or help practitioners behave well and achieve better outcomes. The trouble is that even if ISO 29119 aspires to do these things, it doesn’t.

Principles, standards, and rules

The International Organization for Standardization (ISO) defines a standard as “a document that provides requirements, specifications, guidelines or characteristics that can be used consistently to ensure that materials, products, processes and services are fit for their purpose.”

It might be possible to derive a useful software standard that fits this definition, but only if it focuses on guidelines, rather than requirements, specifications, or characteristics. According to ISO's definition, a standard doesn't have to be all of those things. A testing standard that is instead framed as high-level guidelines would be consistent with the widespread view among regulatory theorists that standards are conceptually like high-level principles. Rules, in contrast, are detailed and specific (see Frederick Schauer's "The Convergence of Rules and Standards": PDF opens in new tab). One of ISO 29119's fundamental problems is that it is pitched at a level of detail consistent with rules, which will undoubtedly tempt people to treat its contents as fixed rules.

Principles focus on outcomes rather than detailed processes or specific rules. This is how many professional bodies have defined their standards; they often use the words principles and standards interchangeably. Others favor a more rules-based approach. If you adopt a detailed, rules-based approach, there is a danger of painting yourself into a corner: you have to try to specify exactly what is compliant and what is noncompliant. This creates huge opportunities for people to game the system, demonstrating creative compliance as they observe the letter of the law while trashing the underlying quality principles (see John Braithwaite's "Rules and Principles: A Theory of Legal Certainty"). Whether one follows a principles-based or a rules-based approach, regulators, lawyers, auditors, and investigators are likely to assume that standards define what is acceptable.

As a result, there is a real danger that ISO 29119 could be viewed as the default set of rules for responsible software testing. People without direct experience in development or testing look for some form of reassurance about what constitutes responsible practice. They are likely to take ISO 29119 at face value as a definitive testing standard. The investigation into the HealthCare.gov website problems showed what can happen.

In its March 2015 report (PDF, opens in new tab) on the website's problems, the US Government Accountability Office checked the HealthCare.gov project for compliance with the IEEE 829 test documentation standard. The agency didn't know anything about testing; it just wanted a benchmark. IEEE 829 was last revised in 2008, and IEEE itself says that the contents of standards more than five years old "do not wholly reflect the present state of the art". Few testers would disagree that IEEE 829 is now hopelessly out of date.

IEEE 829's obsolescence threshold: when a document is more than five years old.

The obsolescence threshold for ISO 29119 has increased from five to ten years, presumably reflecting the lengthy process of creating and updating such cumbersome documents rather than the realities of testing. We surely don’t want regulators checking testing for compliance against a detailed, outdated standard they don’t understand.

Scary lessons from the social sciences

If we step away from ISO 29119, and from software development, we can learn some thought-provoking lessons from the social sciences.

Prescriptive standards don’t recognize how people apply knowledge in demanding jobs like testing. Scientist Michael Polanyi and sociologist Harry Collins have offered valuable insights into tacit knowledge, which is knowledge we possess and use but cannot articulate. Polanyi first introduced the concept, and Collins developed the idea, arguing that much valuable knowledge is cultural and will vary between different contexts and countries. Defining a detailed process as a standard for all testing excludes vital knowledge; people will respond by concentrating on the means, not the ends.

Donald Schön, a noted expert on how professionals learn and work, offered a related argument with “reflection in action” (see Willemien Visser’s article: PDF opens in new tab). Schön argued that creative professionals, such as software designers or architects, have an iterative approach to developing ideas—much of their knowledge is understood without being expressed. In other words, they can’t turn all their knowledge into an explicit, written process. Instead, to gain access to what they know, they have to perform the creative act so that they can learn, reflect on what they’ve learned, and then apply this new knowledge. Following a detailed, prescriptive process stifles learning and innovation. This applies to all software development—both agile and traditional methods.

In 1914, Thorstein Veblen identified the problem of trained incapacity. People who are trained in specific skills can lack the ability to adapt: a response that worked in the past is applied thereafter, regardless of whether it still fits the situation.

Young woman or old woman? Means or ends? We can focus on only one at a time.

Kenneth Burke built upon Veblen’s work, arguing that trained incapacity means one’s abilities become blindnesses. People can focus on the means or the ends, not both; their specific training makes them focus on the means. They don’t even see what they’re missing. As Burke put it, “a way of seeing is also a way of not seeing; a focus upon object A involves a neglect of object B”. This leads to goal displacement, and the dangers for software testing are obvious.

The problem of goal displacement was recognized before software development was even in its infancy. When humans specialize in organizations, they have a predictable tendency to see their particular skill as a hammer and every problem as a nail. Worse, they see their role as hitting the nail rather than building a product. Give test managers a detailed standard, and they’ll start to see the job as following the standard, not testing.

In the 1990s, British academic David Wastell studied software development shops that used structured methods, the dominant development technique at the time. Wastell found that developers used these highly detailed and prescriptive methods in exactly the same way that infants use teddy bears and security blankets: to give them a sense of comfort and help them deal with stress. In other words, a developer’s mindset betrayed that the method wasn’t a way to build better software but rather a defense mechanism to alleviate stress and anxiety.

Wastell could find no empirical evidence, either from his own research at these companies or from a survey of the findings of other experts, that structured methods worked. In fact, the resulting systems were no better than the old ones, and they took much more time and money to develop. Managers became hooked on the technique (the standard) while losing sight of the true goal. Wastell concluded the following:

Methodology becomes a fetish, a procedure used with pathological rigidity for its own sake, not as a means to an end. Used in this way, methodology provides a relief against anxiety; it insulates the practitioner from the risks and uncertainties of real engagement with people and problems.

Developers were delivering poorer results but defining that as the professional standard. Techniques that help managers cope with stress and anxiety but give an illusory, reassuring sense of control harm the end product. Developers and testers cope by focusing on technique, mastery of tools, or compliance with standards. In doing so they can feel that they are doing a good job, so long as they don’t think about whether they are really working toward the true ends of the organization or the needs of the customer.

Standards must be fit for their purpose

Is all this relevant to ISO 29119? We’re still trying to do a difficult, stressful job, and in my experience, people will cling to prescriptive processes and standards that give the illusion of being in control. Standards have credibility and huge influence simply from their status as standards. If we must have standards, they should be relevant, credible, and framed in a way that is helpful to practitioners. Crucially, they must not mislead stakeholders and regulators who don’t understand testing but who wield great influence and power.

The level of detail in ISO 29119 is a real concern. Any testing standard should be in the style favored by organizations like the Institute of Internal Auditors (IIA), whose principles-based professional standards cover the entire range of internal auditing but are only one-tenth as long as the three completed parts of ISO 29119. The IIA's standards are light on detail but far more demanding in the outcomes required.

Standards must be clear about the purpose they serve if we are to ensure testing is fit for its purpose, to hark back to ISO’s definition of a standard. In my opinion, this is where ISO 29119 falls down. The standard does not clarify the purpose of testing, only the mechanism—and that mechanism focuses on documentation, not true testing. It is this lack of purpose, the why, that leads to teams concentrating on standards compliance rather than delivering valuable information to stakeholders. This is a costly mistake. Standards should be clear about the outcomes and leave the means to the judgment of practitioners.

A good example of this problem is ISO 29119’s test completion report, which is defined simply as a summary of the testing that was performed. The standard offers examples for traditional and agile projects. Both focus on the format, not the substance of the report. The examples give some metrics without context or explanation and provide no information or insight that would help stakeholders understand the product and the risk and make better decisions. Testers could comply with the standard without doing anything useful. In contrast, the IIA’s standards say audit reports must be “accurate, objective, clear, concise, constructive, complete, and timely.” Each of these criteria is defined briefly in a way that makes the standard far more demanding and useful than ISO 29119, in far less space.

It’s no good saying that ISO 29119 can be used sensibly and doesn’t have to be abused. People are fallible and will misuse the standard. If we deny that fallibility, we deny the experience of software development, testing, and, indeed, human nature. As Jerry Weinberg said (in “The Secrets of Consulting”), “no matter how it looks at first, it’s always a people problem”. Any prescriptive standard that focuses on compliance with highly detailed processes is doomed. Maybe you can buck the system, but you can’t buck human nature.

David Graeber’s “The Utopia of Rules: On Technology, Stupidity and the Secret Joys of Bureaucracy”

When I gave my talk at CAST 2014 in New York, “Standards – promoting quality or restricting competition?” I was concentrating on the economic aspects of standards. They are often valuable, but they can be damaging and restrict competition if they are misused. A few months later I bought “The Utopia of Rules: On Technology, Stupidity, and the Secret Joys of Bureaucracy” by David Graeber, Professor of Anthropology at the London School of Economics. I was familiar with Graeber as a challenging and insightful writer. I drew on his work when I wrote “Testing: valuable or bullshit?“. The Utopia of Rules also inspired the blog article I wrote recently, “Frozen in time – grammar and testing standards” in which I discussed the similarity between grammar textbooks and standards, which both codify old usages and practices that no longer match the modern world.

What I hadn’t expected from The Utopia of Rules was how strongly it would support the arguments I made at CAST.

Certification and credentialism

Graeber makes the same argument I deployed against certification: it is increasingly being used to enrich special interests without benefiting society. On page 23 Graeber writes:

Almost every endeavor that used to be considered an art (best learned through doing) now requires formal professional training and a certificate of completion… In some cases, these new training requirements can only be described as outright scams, as when lenders, and those prepared to set up the training programs, jointly lobby the government to insist that, say, all pharmacists be henceforth required to pass some additional qualifying examination, forcing thousands already practicing the profession into night school, which these pharmacists know many will only be able to afford with the help of high-interest student loans. By doing this, lenders are in effect legislating themselves a cut of most pharmacists’ subsequent incomes.

To be clear, my stance on ISTQB training is that it educates testers in a legitimate, though very limited, vision of testing. My objection is to any marketing of the qualification as a certification of testing ability, rather than confirmation that the tester has passed an exam associated with a particular training course. I object even more strongly to any argument that possession of the certificate should be a requirement for employment, or for contracting out testing services. It is reasonable to talk of scams when the ability of good testers to earn a living is damaged.

What is the point of it all?

Graeber has interesting insights into how bureaucrats can be vague about the values of the bureaucracy: why does the organisation exist? Bureaucrats focus on efficient execution of rational processes, but what is the point of it all? Often the means become the ends: efficiency is an end in itself.

I didn’t argue that point at CAST, but I have done so many times in other talks and articles (e.g. “Teddy bear methods“). If people are doing a difficult, stressful job and you give them prescriptive methods, processes or standards then they will focus on ticking their way down the list. The end towards which they are working becomes compliance with the process, rather than helping the organisation reach its goal. They see their job as producing the outputs from the process, rather than the outcomes the stakeholders want. I gave a talk in London in June 2015 to the British Computer Society’s Special Interest Group in Software Testing in which I argued that testing lacks guiding principles (PDF, opens in a new tab) and ISO 29119 in particular does not offer clear guidance about the purpose of testing.

In a related argument Graeber makes a point that will be familiar to those who have criticised the misuse of testing metrics.

…from inside the system, the algorithms and mathematical formulae by which the world comes to be assessed become, ultimately, not just measures of value, but the source of value itself.

Rent extraction

The most controversial part of my CAST talk was my argument that the pressure to adopt testing standards was entirely consistent with rent seeking in economic theory. Rent seeking, or rent extraction, is what people do when they exploit failings in the market, or rig the market for their own benefit by lobbying for regulation that happens to benefit them. Instead of creating wealth, they take it from other people in a way that is legal, but which is detrimental to the economy, and society, as a whole.

This argument riled some people who took it as a personal attack on their integrity. I’m not going to dwell on that point. I meant no personal slur. Rent seeking is just a feature of modern economies. Saying so is merely being realistic. David Graeber argued the point even more strongly.

The process of financialization has meant that an ever-increasing proportion of corporate profits come in the form of rent extraction of one sort or another. Since this is ultimately little more than legalized extortion, it is accompanied by ever-increasing accumulation of rules and regulations… At the same time, some of the profits from rent extraction are recycled to select portions of the professional classes, or to create new cadres of paper-pushing corporate bureaucrats. This helps a phenomenon I have written about elsewhere: the continual growth, in recent decades, of apparently meaningless, make-work, “bullshit jobs” — strategic vision coordinators, human resources consultants, legal analysts, and the like — despite the fact that even those who hold such positions are half the time secretly convinced they contribute nothing to the enterprise.

In 2014 I wrote about “bullshit jobs“, prompted partly by one of Graeber’s articles. It’s an important point. It is vital that testers define their job so that it offers real value, and they are not merely bullshit functionaries of the corporate bureaucracy.

Utopian bureaucracies

I have believed for a long time that adopting highly prescriptive methods or standards for software development and testing places unfair pressure on people, who are set up to fail. Graeber makes exactly the same point.

Bureaucracies public and private appear — for whatever historical reasons — to be organized in such a way as to guarantee that a significant proportion of actors will not be able to perform their tasks as expected. It’s in this sense that I’ve said one can fairly say that bureaucracies are utopian forms of organization. After all, is this not what we always say of utopians: that they have a naïve faith in the perfectibility of human nature and refuse to deal with humans as they actually are? Which is, are we not also told, what leads them to set impossible standards and then blame the individuals for not living up to them? But in fact all bureaucracies do this, insofar as they set demands they insist are reasonable, and then, on discovering that they are not reasonable (since a significant number of people will always be unable to perform as expected), conclude that the problem is not with the demands themselves but with the individual inadequacy of each particular human being who fails to live up to them.

Testing standards such as ISO 29119, and its predecessor IEEE 829, don’t reflect what developers and testers do, or rather should be doing. They are at odds with the way people think and work in organisations. These standards attempt to represent a highly complex, sometimes chaotic, process in a defined, repeatable model. The end product is usually of dubious quality, late and over budget. Any review of the development will find constant deviations from the standard. The suppliers, and defenders, of the standard can then breathe a sigh of relief. The sacred standard was not followed. It was the team’s fault. If only they’d done it by the book! The possibility that the developers’ and testers’ apparent sins were the only reason anything was produced at all is never considered. This is a dreadful way to treat people, but in many organisations it has been normal for several decades.

Loss of communication

All of the previous arguments by Graeber were entirely consistent with my own thoughts about how corporate bureaucracies operate. It was fascinating to see an anthropologist's perspective, but it didn't teach me anything that was really new about how testers work in corporations. However, later in the book Graeber developed two arguments that gave me new insights.

Understanding what is happening in a complex, social situation needs effective two way communication. This requires effort, “interpretive labor”. The greater the degree of compulsion, and the greater the bureaucratic regime of rules and forms, the less need there is for such two way communication. Those who can simply issue orders that must be obeyed don’t have to take the trouble to understand the complexities of the situation they’re managing.

…within relations of domination, it is generally the subordinates who are effectively relegated the work of understanding how the social relations in question really work. … It’s those who do not have the power to hire and fire who are left with the work of figuring out what actually did go wrong so as to make sure it doesn’t happen again.

This ties in with the previous argument about utopian bureaucracies. If you impose an inappropriate standard then poor results will be attributed to the inevitable failure to comply. There is no need for senior managers to understand more, and no need to listen to the complaints, the "excuses", of the people who do understand what is happening. Interestingly, Graeber's argument about interpretive labor is consistent with regulatory theory. Good regulation of complex situations requires ongoing communication between the regulator and the regulated. I explained this in the talk on testing principles I mentioned above (slides 38 and 39).

Fear of play

My second new insight from Graeber arrived when he discussed the nature of play and how it relates to bureaucracies. Anthropologists try to maintain a distinction between games and play, a distinction that is easier to maintain in English than in languages like French and German, which use the same word for both. A game has boundaries, set rules and a predetermined conclusion. Play is more free-form and creative. Novelties and surprising results emerge from the act of playing. It is a random, unpredictable and potentially destructive activity. Graeber finishes his discussion of play and games with a striking observation:

What ultimately lies behind the appeal of bureaucracy is fear of play.

Put simply, and rather simplistically, Graeber means that we use bureaucracy to escape the terror of chaotic reality, to bring a semblance (an illusion?) of control to the uncontrollable.

This gave me a tantalising new insight into the reasons people build bureaucratic regimes in organisations. It sent me off into a whole new field of reading on the anthropology of games and play. This has fascinating implications for the debate about standards and testing. We shy away from play, but it is through play that we learn. I don't have time now to do the topic justice, and it's much too big and important a subject to be tacked on to the end of this article, but I will return to it. It is yet another example of the way anthropology can help us understand what we are doing as testers. As a starting point I can heartily recommend David Graeber's book, "The Utopia of Rules".

Frozen in time – grammar and testing standards

This recent tweet by Tyler Hayes caught my eye. “If you build software you’re an anthropologist whether you like it or not.”

It’s an interesting point, and it’s relevant on more than one level. By and large software is developed by people and for people. That is a statement of the obvious, but developers and testers have generally been reluctant to take on board the full implications. This isn’t a simple point about usability. The software we build is shaped by many assumptions about the users, and how they live and work. In turn, the software can reinforce existing structures and practices. Testers should think about these issues if they’re to provide useful findings to the people who matter. You can’t learn everything you need to know from a requirements specification. This takes us deep into anthropological territory.

What is anthropology?

Social anthropology is defined by University College London as follows.

Social Anthropology is the comparative study of the ways in which people live in different social and cultural settings across the globe. Societies vary enormously in how they organise themselves, the cultural practices in which they engage, as well as their religious, political and economic arrangements.

We build software in a social, economic and cultural context that is shaped by myriad factors, which aren’t necessarily conducive to good software, or a happy experience for the developers and testers, never mind the users. I’ve touched on this before in “Teddy Bear Methods“.

There is much that we can learn from anthropology, and not just to help us understand what we see when we look out at the users and the wider world. I’ve long thought that the software development and testing community would make a fascinating subject for anthropologists.

Bureaucracy, grammar and deference to authority

I recently read "The Utopia of Rules – On Technology, Stupidity, and the Secret Joys of Bureaucracy" by the anthropologist David Graeber.

Graeber has many fascinating insights and arguments about how organisations work, and why people are drawn to bureaucracy. One of his arguments is that regulation is imposed and formalised to try and remove arbitrary, random behaviour in organisations. That's a huge simplification, but there's not room here to do Graeber's argument justice. One passage in particular caught my eye.

People do not invent languages by writing grammars, they write grammars — at least, the first grammars to be written for any given language — by observing the tacit, largely unconscious, rules that people seem to be applying when they speak. Yet once a book exists, and especially once it is employed in schoolrooms, people feel that the rules are not just descriptions of how people do talk, but prescriptions for how they should talk.

It’s easy to observe this phenomenon in places where grammars were only written recently. In many places in the world, the first grammars and dictionaries were created by Christian missionaries in the nineteenth or even twentieth century, intent on translating the Bible and other sacred texts into what had been unwritten languages. For instance, the first grammar for Malagasy, the language spoken in Madagascar, was written in the 1810s and ’20s. Of course, language is changing all the time, so the Malagasy spoken language — even its grammar — is in many ways quite different than it was two hundred years ago. However, since everyone learns the grammar in school, if you point this out, people will automatically say that speakers nowadays are simply making mistakes, not following the rules correctly. It never seems to occur to anyone — until you point it out — that had the missionaries come and written their books two hundred years later, current usages would be considered the only correct ones, and anyone speaking as they had two hundred years ago would themselves be assumed to be in error.

In fact, I found this attitude made it extremely difficult to learn how to speak colloquial Malagasy. Even when I hired native speakers, say, students at the university, to give me lessons, they would teach me how to speak nineteenth-century Malagasy as it was taught in school. As my proficiency improved, I began noticing that the way they talked to each other was nothing like the way they were teaching me to speak. But when I asked them about grammatical forms they used that weren’t in the books, they’d just shrug them off, and say, “Oh, that’s just slang, don’t say that.”

…The Malagasy attitudes towards rules of grammar clearly have… everything to do with a distaste for arbitrariness itself — a distaste which leads to an unthinking acceptance of authority in its most formal, institutional form.

Searching for the “correct” way to develop software

Graeber’s phrase "distaste for arbitrariness itself" reminded me of the history of software development. In the 1960s and 70s, academics and theorists agonised over the nature of development, trying to discover and articulate what it should be. Their approach was fundamentally mistaken. There are dreadful ways, and there are better ways, to develop software, but there is no natural, correct way that results in perfect software. The researchers assumed that there was and went hunting for it. Instead of seeking understanding they carried their assumptions about what the answer might be into their studies and went looking for confirmation.

They were trying to understand how the organisational machine worked and looked for mechanical processes. I use the word “machine” carefully, not as a casual metaphor. There really was an assumption that organisations were, in effect, machines. They were regarded as first order cybernetic entities whose behaviour would not vary depending on whether they were being observed. To a former auditor like myself this is a ludicrous assumption. The act of auditing an organisation changes the way that people behave. Even the knowledge that an audit may occur will shape behaviour, and not necessarily for the better (see my article “Cynefin, testing and auditing“). You cannot do the job well without understanding that. Second order cybernetics does recognise this crucial problem and treats observers as participants in the system.

So linear, sequential development made sense. The different phases passing outputs along the production line fitted their conception of the organisation as a machine. Iterative, incremental development looked messy and immature; it was just wrong as far as the researchers were concerned. Feeling one’s way to a solution seemed random, unsystematic – arbitrary.

Development is a difficult and complex job; people will tend to follow methods that make the job feel easier. If managers are struggling with the complexities of managing large projects they are more likely to choose linear, sequential methods that make the job of management easier, or at least less stressful. So when researchers saw development being carried out that way they were observing human behaviour, not a machine operating.

Doubts about this approach were quashed by the argument that, even if organisations weren't yet the neat machines they were supposed to be, the rapid advance in the use of computers would soon make them so. This argument is suspiciously circular: the conclusion that organisations would in future be fully machine-like rests on the unproven premise that software development, when performed properly, is a mechanical process that is not subject to human variability.

Eliminating “arbitrariness” and ignoring the human element

This might all have been no more than an interesting academic sideline, but it fed back into software development. By the 1970s, when these studies into the nature of development were being carried out, organisations were moving towards increasingly formalised development methods, and there was growing pressure to adopt them. Not only were they attractive to managers; the use of more formal methods also provided a competitive advantage. ISO certification and CMMI accreditation were increasingly seen as a way to demonstrate that organisations produced high quality software. The evidence may have been weak, but it seemed a plausible claim. These initiatives required formal processes. The sellers of formal methods were happy to look for and cite any intellectual justification for their products. So formal linear methods were underpinned by academic work that assumed that formal linear methods were correct. This was the way that responsible, professional software development was performed. ISO standards were built on this assumption.

If you are trying to define the nature of development you must acknowledge that it is a human activity, carried out by and for humans. These studies about the nature of development were essentially anthropological exercises, but the researchers assumed they were observing and taking apart a machine.

As with the missionaries who were codifying grammar, the point in time when these researchers were working shaped the result. If they had carried out their studies earlier in the history of software development they might have struggled to find credible examples of formalised, linear development. In the 1950s software development was an esoteric activity in which the developers could call the shots. Twenty years later it was part of the corporate bureaucracy, and iterative, incremental development was sidelined. If the studies had been carried out a few decades further on, it would have been impossible to ignore Agile.

As it transpired, formal methods, CMM/CMMI and the first ISO standards concerning development and testing were all creatures of that era when organisations and their activities were seriously regarded as mechanical. Like the early Malagasy grammar books they codified and fossilised a particular, flawed approach at a particular time for an activity that was changing rapidly. ISO 29119 is merely an updated version of that dated approach to testing. It is rooted in a yearning for bureaucratic certainty, a reluctance to accept that ultimately good testing is dependent not on documentation, but on that most irrational, variable and unpredictable of creatures – the human who is working in a culture shaped by humans. Anthropology has much to teach us.

Further reading

That is the end of the essay, but there is a vast amount of material you could read about attempts to understand and define the nature of software development and of organisations. Here is a small selection.

Brian Fitzgerald has written some very interesting articles about the history of development. I recommend in particular “The systems development dilemma: whether to adopt formalised systems development methodologies or not?” (PDF, opens in new tab).

Agneta Olerup wrote this rather heavyweight study of what she calls the Langeforsian approach to information systems design. Börje Langefors was a highly influential advocate of the mechanical, scientific approach to software development. Langefors' Wikipedia entry describes him as "one of those who made systems development a science".

This paper gives a good, readable introduction to first and second order cybernetics (PDF, opens in new tab), including a useful warning about the distinction between models and the entities that they attempt to represent.

All our knowledge of systems is mediated by our simplified representations—or models—of them, which necessarily ignore those aspects of the system which are irrelevant to the purposes for which the model is constructed. Thus the properties of the systems themselves must be distinguished from those of their models, which depend on us as their creators. An engineer working with a mechanical system, on the other hand, almost always knows its internal structure and behavior to a high degree of accuracy, and therefore tends to de-emphasize the system/model distinction, acting as if the model is the system.

Moreover, such an engineer, scientist, or “first-order” cyberneticist, will study a system as if it were a passive, objectively given “thing”, that can be freely observed, manipulated, and taken apart. A second-order cyberneticist working with an organism or social system, on the other hand, recognizes that system as an agent in its own right, interacting with another agent, the observer.

Finally, I recommend a fascinating article in the IEEE’s Computer magazine by Craig Larman and Victor Basili, “Iterative and incremental development: a brief history” (PDF, opens in new tab). Larman and Basili argue that iterative and incremental development is not a modern practice, but has been carried out since the 1950s, though they do acknowledge that it was subordinate to the linear Waterfall in the 1970s and 80s. There is a particularly interesting contribution from Gerald Weinberg, a personal communication to the authors, in which he describes how he and his colleagues developed software in the 1950s. The techniques they followed were “indistinguishable from XP”.


This page will carry slide decks for presentations I’ve given at conferences – as I get round to adding them. All the presentations are in PDF format and open in new tabs.

Farewell to “pass or fail”?

I gave this presentation, “Farewell to ‘pass or fail’” at Expo:QA 2014 in Madrid in May 2014. It shows how auditing and software testing have faced similar challenges and how each profession can learn from the other.

Standards – promoting quality or restricting competition?

This presentation is the one that attracted so much attention and controversy at CAST 2014 in New York. It is a critique of software testing standards from the perspective of economics.

A modest proposal for improving the efficiency of testing services

I would like to offer for your perusal a modest proposal for improving the efficiency of testing services whilst producing great benefits for clients, suppliers and testers (with a nod to Dr Jonathan Swift).

Lately I have been reading some fascinating material about the creative process, the ways that we direct our attention, and how these are linked. Whilst cooking dinner one evening I had a sudden insight into how I could launch an exciting and innovative testing service.

It was no accident that I had my eureka moment when I was doing something entirely unrelated to testing. Psychologists recognise that the creative process starts with two stages. Firstly comes the preparation stage. We familiarise ourselves with a cognitively demanding challenge. We then have to step away from the problem and perform some activity that doesn’t require much mental effort. This is the incubation stage, which gives our brain the opportunity to churn away, making connections between the problem, our stored knowledge and past experience. Crucially, it gives us the chance to envisage future possibilities. Suddenly, and without conscious effort, the answer can come, as it did to Archimedes whose original eureka moment arrived in the bath when he realised that the volume of irregular objects could be calculated by the volume of water that they displaced.

My modest proposal is to exploit this eureka principle in an entirely new way for testing. Traditionally, testers have followed the two stage approach to creativity. We have familiarised ourselves with the client, the business problem and the proposed application. We have then moved on to the vital incubation stage of mindless activity. This has traditionally been known as “writing the detailed test plans” and “churning out the test scripts”.

Now the trouble with these documents hasn’t been their negligible value for the actual testing. That’s the whole point of the incubation stage. We have to do something unrelated and mindless so that our brains can come up with creative ideas for testing. No, the real problem with the traditional approach is that there is no direct valuable output at all. The documents merely gather dust. They haven’t even been used to feed the heating.

I therefore intend to launch a start-up testing services company called CleanTest. CleanTest’s testers will familiarise themselves with the client and the application in the preparation stage. Then, for the incubation stage, they will move on to cleaning the data centre, the development shop and the toilets, whilst the creative ideas formulate. Once their creative ideas for testing have formed they will execute the testing.

Everyone will be a winner. The client will have testing performed to at least the same standard as before. They will also have clean offices and be able to save money by getting rid of their existing cleaning contractor. The testers will have increased job satisfaction from seeing shiny clean premises, instead of mouldering shelfware that no-one will ever read. And I will make a pile of money.

Of course it is vital for the credibility of CleanTest that the company is ISO compliant. We will therefore comply with the ISO 14644 cleanrooms standard, and ISO 12625 toilet paper standard. Compliance with two ISO standards will make us twice as responsible as those fly-by-night competitors who are compliant only with ISO 29119.

Anyone who wishes to join with me and invest in this exciting venture is welcome to get in touch. I also have some exciting opportunities that trusted contacts in Nigeria have emailed to me.

A more optimistic conclusion?

This is the final post in a series about how and why so many corporations became embroiled in a bureaucratic mess in which social and political skills are more important than competence.

In my first post "Sick bureaucracies and mere technicians" I talked about Edward Giblin's analysis back in the early 1980s of the way senior managers had become detached from the real work of many corporations. Not only did this problem persist, but it became far worse.

In my second post, “Digital Taylorism & the drive for standardisation“, I explained how globalisation and technical advances gave impetus to digital Taylorism and the mass standardisation of knowledge work. It was widely recognised that Taylorism damaged creativity, a particularly serious concern with knowledge work. However, that concern was largely ignored, swamped by the trends I discussed in my third post, “Permission to think“.

In this post I will try to offer a more constructive conclusion after three essays of unremitting bleakness!

Deskilling – a chilling future for testing?

When it comes to standardisation of testing the “talented managers” (see “Permission to think“) will tell themselves that they are looking at a bigger picture than the awkward squad (ok, I mean context driven testers here) who complain that this is disastrous for software testing.

Many large corporations are hooked into a paradigm that requires them to simultaneously improve quality and reduce costs, and to do so by de-skilling jobs below the elite level. Of course other tactics are deployed, but deskilling is what concerns me here. The underlying assumption is that standardisation and detailed processes will not only improve quality, but also reduce costs, either directly by outsourcing, or indirectly by permitting benchmarking against outsourcing suppliers.

In the case of testing that doesn’t work. You can do it, but at the expense of the quality of testing. Testing is either a thinking, reflective activity, or it is done badly. However, testing is a mere pawn; it’s very difficult for corporate bureaucrats to make an exception for testing. If they were to do that it would undermine the whole paradigm. If testing is exempt then how could decision makers hold the line when faced with special pleading on behalf of other roles they don’t understand? No, if the quality of testing has to be sacrificed then so be it.

The drive for higher quality at reduced cost is so powerful that its underlying assumption is unchallengeable. Standardisation produces simplicity which allows higher quality and lower costs. That is corporate dogma, and anyone who wants to take a more nuanced approach is in danger of being branded a heretic and denied a hearing. It is easier to fudge the issue and ignore evidence that applying this strategy to testing increases costs and reduces quality.

Small is beautiful

Perhaps my whole story has been unnecessarily bleak. I have been talking about corporations and organisations. I really mean large bodies. The gloomy, even chilling, picture that I’ve been painting doesn’t apply to smaller, more nimble firms. Start-ups, technology focused firms, and specialist testing services providers (or the good ones at least) have a clearer idea of what the company is trying to do. They’re far less likely to sink into a bureaucratic swamp. For one thing it would kill them quickly. Also, to hark back to my first post in this series, “Sick bureaucracies and mere technicians“, such firms are more likely to be task dependent, i.e. the more senior people will probably have a deep understanding of the core business. It is their job to apply that knowledge in the interests of the company, rather than merely to run the corporate bureaucracy.

My advice to testers who want to do good work would be to head for the smaller outfits, the task dependent companies. As a heuristic I’d want to work for a company that was small enough for me to speak to anyone, at any time, who had the power to change things. Then, I’d know that if I saw possible improvements I’d have the chance to sell my ideas to someone who could make a difference. One of the most dispiriting things I ever heard was a senior manager in the global corporation where I worked saying “you’re quite right – but you’d be appalled at how high you could go and find people who’d agree with you, but say that they couldn’t change anything”.

What’s to be done?

Nevertheless, many good testers are working for big corporations, and struggling to make things better. They’re not all going to walk out the door, and they shouldn’t just give up in despair. What can they do? Well, plain speaking will have a limited impact – except on their careers. Senior managers don’t like being told “we’re doing rubbish work and you’re acting like an idiot if you refuse to admit that”.

Corporate managers are under pressure to make the bureaucracy more efficient by standardising working practices and processes. In order to do so they have to redefine what constitutes simple, routine work. Testers have to understand that pressure and respond by lobbying to be allowed to carry out that redefinition themselves. Testing has to be defined by those who understand and respect it so that the thoughtful, reflective, non-routine elements are recognised. Testing must be defined in such a way that it can handle complex problems, and not just simple, ordered problems (see Cynefin).

That takes us back to the segmentation of knowledge workers described by Brown, Lauder and Ashton in The Global Auction (see my previous post “Permission to think“). The workforce is increasingly segmented into developers (those responsible for corporate development, not software developers!), who are given “permission to think”, demonstrators who apply processes, and drones who basically follow a script without being required to engage their brains. If testers have to follow a prescriptive, documentation driven standard like ISO 29119 they are implicitly assigned to the status of drones.

Testers must argue their case so they are represented in the class of developers who are allowed to shape the way the corporation works. The arguments are essentially the same as those that have been deployed against ISO 29119, and can be summed up in the phrase I used at the top: testing is either a thinking, reflective activity, or it is done badly. Testing is an activity that provides crucial information to the corporate elite, the "developers". As such, testers must be given the responsibility to think, or else senior management will be choking off the flow of vital information about applications and products.

That is a tough task, and I’m sceptical about the chances of testers persuading their corporations to buck a powerful trend. I doubt if many will be successful, but perhaps some brave, persuasive and persistent souls will succeed. They deserve respect and support from the rest of the testing profession.

If large corporations won’t admit their approach is damaging to testing then ultimately I fear that their in-house test teams are doomed. They will be sucked into a vicious circle of commoditised testing that will lead to the work being outsourced to cheaper suppliers. If you’re not doing anything worthwhile there is always someone who can do it cheaper. Iain McCowatt wrote a great blog about this.

Where might hope lie?

Perhaps outsourcing might offer some hope for testing after all. A major motive for adopting standards is to facilitate outsourcing. The service that is being outsourced is standard, neatly defined, and open to benchmarking. Suppliers who can demonstrate they comply with standards have a competitive advantage. That is one of the reasons ISO 29119 is so pernicious. Good testing suppliers will have to ignore that market and make it clear that they are not competing to provide cheaper drones, but highly capable, thinking consultants who can provide valuable insights about products and applications.

The more imaginative, context-driven (and smaller?) suppliers can certainly compete effectively in this fashion. After all, they are following an approach that is both more efficient and more effective. Their focus is on testing rather than documentation and compliance with an irrelevant standard. However, I suspect that is exactly why many large corporations are suspicious of such an approach. The corporate bureaucrat is reassured by visible documents and compliance with an ISO standard.

A new framework?

Perhaps there is room for an alternative approach. I don’t mean an alternative standard, but a framework that shows how good context driven testing is responsible testing that can keep regulators happy. It could tie together the requirements of regulators, auditors and governance professionals with context driven techniques, perhaps a particular context driven approach. The framework could demonstrate links between governance needs and specific context driven techniques. This has been lurking at the back of my mind for a couple of years, but I haven’t yet committed serious effort to the idea. My reading and thinking around the subject of corporate bureaucracy for this series of blog posts has helped shape my understanding of why such an alternative framework might be needed, and why it might work.

An alternative framework in the form of a set of specific, practical, actionable guidelines would ironically be more consistent with ISO’s definition of a standard than ISO 29119 itself is.

A standard is a document that provides requirements, specifications, guidelines or characteristics that can be used consistently to ensure that materials, products, processes and services are fit for their purpose.

Taking the relevant parts of the definition, the framework would provide guidelines that can be used consistently to ensure that testing services are fit for their purpose.

Could this give corporations the quality of testing they require without having to abandon their worldview? Internal testers might still be defined as drones (with a few, senior testers allowed to be demonstrators). External testers can be treated as consultants and allowed to think.

When discussing ISO 29119, and the approach to testing that it embodies, we should always bear in mind that the standard does not exist to provide better testing. It was developed because it fits a corporate mindset that wants to see as many activities as possible defined as simple and routine. Testers who have a vision of better testing, and a better future for testing, have to understand that mindset and deal with it, rather than just kicking ISO 29119 for being a useless piece of verbiage. The standard really is useless, but perhaps we need a more sophisticated approach than just calling it like it is.

Permission to think

This is the third post, or essay, in a series about how and why so many corporations became embroiled in a bureaucratic mess in which social and political skills are more important than competence.

In my first post "Sick bureaucracies and mere technicians" I talked about Edward Giblin's analysis back in the early 1980s of the way senior managers had become detached from the real work of many corporations. Not only did this problem persist, but it became far worse.

In my second post, “Digital Taylorism & the drive for standardisation“, I explained how globalisation and technical advances gave impetus to digital Taylorism and the mass standardisation of knowledge work. It was widely recognised that Taylorism damaged creativity, a particularly serious concern with knowledge work. However, that concern was largely ignored, swamped by the trends I will discuss here.

The problems of digital Taylorism and excessive standardisation were exacerbated by an unhealthy veneration of the most talented employees (without any coherent explanation of what "talent" means in the corporate context), and a heightened regard for social skills at the expense of technical experience and competence. That brings us back to the concerns of Giblin in the early 1980s.

10,000 times more valuable

Corporations started to believe in the mantra that talent is rare and it is the key to success. There is an alleged quote from Bill Gates that takes the point to the extreme.

A great lathe operator commands several times the wage of an average lathe operator, but a great writer of software code is worth 10,000 times the price of an average software writer.

Also, this is from “The Global Auction” (the book I credited in my last post for inspiring much of this series).

… being good is no longer good enough because much of the value within organizations is believed to be contributed by a minority of employees. John Chambers, CEO of Cisco, is reported as saying, “a world-class engineer with five peers can out-produce 200 regular engineers“. A Corporate Executive Board (CEB) study also found that the best computer programmers are at least 12 times as productive as the average.

These claims raise many questions about the meaning of terms such as "value", "great", "worth", "world-class", "out-produce" and "productive". That's before you get to questions about the evidence on which such claims are based. 10,000 times more valuable? 33 times (Chambers' claim that one world-class engineer plus five peers can out-produce 200 regular engineers works out at roughly 33 each)? 12 times?

I was unable to track down any studies supporting these claims. However, Laurent Bossavit has persevered with the pursuit of similar claims about programmer productivity in his book "The Leprechauns of Software Engineering". Laurent's conclusion is that such claims are usually either anecdotal or unsubstantiated claims that were made in secondary sources. In the few genuine studies the evidence offered invariably failed to support the claim about huge variations in programmer productivity.

The War for Talent

The CEB study, referred to above, claiming that the best programmers are at least 12 times more productive than the average was reminiscent of one study that did define “productive”.

The top 3% of programmers produce 1,200% more lines of code than the average; the top 20% produce 320% more lines of code than the average.

I'm not sure there's a more worthless and easily gamed measure of productivity than "lines of code". No-one who knows anything about writing code would take it seriously as a measure of productivity. Any study that uses it deserves merciless ridicule. If you search for the quote you will find it appearing in many places, usually alongside the same collection of claims.
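To make the point concrete, here is a minimal sketch of my own (a hypothetical illustration, not taken from any of the studies cited): two Python functions that produce exactly the same result, yet the padded version would score several times more "productive" on a lines-of-code measure.

# Hypothetical illustration: how a lines-of-code metric rewards padding.
# Both functions compute the total price of an order, including tax.

def total_price_concise(prices, tax_rate):
    # One line of logic.
    return sum(prices) * (1 + tax_rate)

def total_price_padded(prices, tax_rate):
    # The same behaviour, inflated to many times the line count.
    running_total = 0
    for price in prices:
        item_value = price
        running_total = running_total + item_value
    tax_multiplier = 1
    tax_multiplier = tax_multiplier + tax_rate
    grand_total = running_total
    grand_total = grand_total * tax_multiplier
    return grand_total

if __name__ == "__main__":
    order = [10.0, 20.0, 5.0]
    # Identical results; only the "productivity" differs if you count lines.
    assert total_price_concise(order, 0.2) == total_price_padded(order, 0.2)
    print(total_price_concise(order, 0.2))

Any programmer can inflate a line count this way without adding a jot of value, which is precisely why the measure invites ridicule.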

I have lifted it from a book about hiring "top talent", "Topgrading: How Leading Companies Win by Hiring, Coaching and Keeping the Best People". The original source for all these claims is the work that the famous consulting firm of McKinsey & Co carried out in the late 1990s. McKinsey's efforts were turned into an influential, and much cited, book, "The War for Talent".

The War for Talent argued that there are five "imperatives of talent management": believing that corporate success is a product of the individual brilliance of "A players", creating a culture in which superstars will flourish, doing whatever it takes to hire the superstars of the future, accelerating their development, and ruthlessly differentiating between the superstars and the other employees. The stars should be rewarded lavishly. Of the rest, the also-rans, the "B players", should receive modest rewards, and the poorer performers, the "C players", should be dismissed.

Not surprisingly McKinsey’s imperatives have been controversial. The widespread influence of the “War for Talent” attracted unfavourable critical interest. Brown, Lauder and Ashton were politely sceptical about its merits in “The Global Auction”. However, they were more interested in its influence than its reliability. Malcolm Gladwell savaged The War for Talent in a long, persuasive article in the New Yorker (which incidentally is well worth reading just for its discussion of anti-submarine tactics in the Second World War). Gladwell made much of the fact that a prize exemplar of the McKinsey approach was one of its most prominent clients, Enron. The freedom and reckless over-promotion that Enron awarded “the smartest guys in the room” were significant factors in Enron’s collapse. The thrust of Gladwell’s argument resonated with my experience of big corporations; when they thrive it is not because of untrammelled individual brilliance, but because they create the right environment for all their employees to work together effectively.

Andrew Munro of AM Azure Consulting (a British HR consultancy) went much further than Gladwell. In “What happened to The War For Talent exemplars?” (PDF, opens in new tab) he analysed the companies cited in The War for Talent and their subsequent fortunes. Not only did they seem to have been selected initially for no better reason than being McKinsey clients; the more praise they received in the book for their “talent management”, the less likely they were to succeed over the following decade.

Munro went into considerable detail. To summarise, he argued that the McKinsey authors started with flawed premises, adopted a dodgy method, went looking for confirmation of their preconceptions, and then presented their findings as best practice of general applicability when in reality they were neither.

The five imperatives don’t even seem applicable to the original sample of U.S. firms. Not only has this approach failed to work as a generic strategy; it looks like it may have had counter-productive consequences.

Again, as with the idea that tacit knowledge can be codified, the credibility of the War for Talent is perhaps of secondary importance. What really matters is the influence that it has had. Not even its association with the Enron disaster has tainted it. Nevertheless, it is worth stressing that the most poorly evidenced and indeed damaging strategies can be adopted enthusiastically if they suit powerful people who will personally benefit.

This takes us back to Giblin in the early 1980s. He argued that senior managers were increasingly detached from the real work of the organisation, which they disparaged, because they were more reliant on their social skills than on knowledge of what that real work entailed. As I shall show, the dubious War for Talent, in conjunction with digital Taylorism, made a serious problem worse.

Permission to think

A natural consequence of digital Taylorism and a lauding of the most “talented” employees is that corporations are likely to segment their workforce. In The Global Auction, Brown, Lauder and Ashton saw three types of knowledge worker emerging: developers, demonstrators, and drones.

Developers include the high potentials and top performers… They represent no more than 10–15 percent of an organisation’s workforce given “permission to think” and include senior researchers, managers, and professionals.

Demonstrators are assigned to implement or execute existing knowledge, procedures, or management techniques, often through the aid of software. Much of the knowledge used by consultants, managers, teachers, nurses, technicians, and so forth is standardised or prepackaged. Indeed, although demonstrator roles may include well-qualified people, much of the focus is on effective communication with colleagues and customers.

Drones are involved in monotonous work, and they are not expected to engage their brains. Many call center or data entry jobs are classic examples, where virtually everything that one utters to customers is pre-scripted in software packages. Many of these jobs are also highly mobile as they can be standardized and digitalized.

“Permission to think”? That is an incendiary phrase to bring into a discussion of knowledge work, especially when the authors claim that only 10-15 percent of employees would be allowed such a privilege. Nevertheless, Brown, Lauder and Ashton do argue their case convincingly. This nightmarish scenario follows naturally if corporations are increasingly run by managers who see themselves as a talented elite and are under pressure to cut costs by outsourcing and offshoring. That requires the work to be simplified (or at least made to appear simpler) and standardised, and that will apply to every activity that can be packaged up, regardless of whether its skilled practitioners really do need to think. Where would testers fit into this? Test managers might qualify as demonstrators, at best. The rest would be drones.

Empathy over competence

I could have finished this essay on the depressing possibility of testers being denied permission to think. However, digital Taylorism has another feature, or result, that reinforces the trend, with worrying implications for good testing.

As corporations attempted to digitise more knowledge and package work up into standardised processes, the value of such knowledge and work diminished. Or rather, the value that corporations placed on the people who held that knowledge and experience diminished. Corporations have been attaching more value to people with strong social skills than to those with traditional technical skills. In The Global Auction, Brown, Lauder and Ashton quoted at length the head of global HR at a major bank.

If you are really going to allow people to work compressed hours, work from home, then work needs to be unitised and standardised; otherwise, it can’t be. And as we keep pace with careers, we want to change; we don’t want to stay in the same job for more than 2 years max. They want to move around, have different experiences, grow their skills base so they’re more marketable. So if you’re moving people regularly, they have to be easily able to move into another role. If it’s going to take 6 months to bring them up to speed, then the business is going to suffer. So you need to be able to step into a new role and function. And our approach to that is to deeply understand the profile that you need for the role – the person profile, not the skills profile. What does this person need to have in their profile? If we look at our branch network and the individuals working at the front line with our customers, what do we need there?

We need high-end empathy; we need people who can actually step into the customers’ shoes and understand what that feels like. We need people who enjoy solving problems… so now when we recruit, we look for that high-end empathy and look for that desire to solve problems, that desire to complete things in our profiles…. we can’t teach people to be more flexible, to be more empathetic… but we can teach them the basics of banking. We’ve got core products, core processes; we can teach that quite easily. So we are recruiting against more of the behavioural stuff and teaching the skills stuff, the hard knowledge that you need for the role.

Good social skills are always helpful; indeed, they are often vital. I don’t want to denigrate such skills or the people who are good at working with others. However, there has to be a balance. Corporations need people with both, and it worries me to see them focussing on one and dismissing the other. There is a paradox here. Staff must be more empathetic, yet they have to use standardised processes that do their thinking for them; they cannot act in a way that recognises that clients have special, non-standard needs. Perhaps the unspoken idea is that good soft skills are needed to handle the clients who are getting a poor service?

I recognise this phenomenon. When I worked for a large services company I was sometimes in a position with a client where I lacked the specific skills and experience I should really have had. A manager once reassured me, “don’t worry – just use our intellectual capital, stay one day ahead of the client, and don’t let them see the material you’re relying on”. I had a reputation for giving clients a reassuring impression. We seemed to get away with it, but I wonder how good a job we were really doing.

If empathy is valued more highly than competence, that affects people throughout the corporation, regardless of how highly they are rated. Even those who are allowed to think are more likely to be hired for their social skills. They need never acquire deep technical skills or experience; what matters more is how well they fit in.

In their search for the most talented graduates, corporations focus on the elite universities, saying that this is the most cost-effective way of finding the best people. In effect, they are outsourcing recruitment to universities, which tend to select from a relatively narrow social class. Lauren Rivera, in her 2015 book “Pedigree: How Elite Students Get Elite Jobs”, quotes a banker explaining the priorities in recruiting.

A lot of this job is attitude, not aptitude… Fit is really important. You know, you will see more of your co-workers than your wife, your kids, your friends, and even your family. So you can be the smartest guy ever, but I don’t care. I need to be comfortable working every day with you, then getting stuck in an airport with you, and then going for a beer after. You need chemistry. Not only that the person is smart, but that you like him.

Rivera’s book is reviewed in informative detail by the Economist, in an article worth reading in its own right.

Unsurprisingly, recruiters tend to go for people like themselves, hiring people with what Rivera calls “looking-glass merit”. The result, in my opinion, is a self-perpetuating cadre of managerial clones. Managers who have benefited from the dubious hero-worship of the most talented, and who have built careers in an atmosphere of digital Taylorism, are unlikely to question a culture they were hired to perpetuate and which has subsequently enriched them.

Giblin revisited

In my first post in this series, “Sick bureaucracies and mere technicians“, I discussed Edward Giblin’s concern, expressed back in 1981, that managers had become preoccupied with running the corporate bureaucracy while losing both technical competence and respect for competence. I summarised his recommendations as follows.

Giblin had several suggestions about how organisations could improve matters. These focused on simplifying the management hierarchy and communication, re-establishing the link between managers and the real work and pushing more decision making further down the hierarchy. In particular, he advocated career development for managers that ensured they acquired a solid grounding in the corporation’s business before they moved into management.

Thirty-four years on, we have made no progress. The trend that concerned Giblin has been reinforced by wider trends, and there seems no prospect of these being reversed in the foreseeable future. In my last post in this series I will discuss the implications for testing and try to offer a more optimistic conclusion.