Teachers, children, testers and leaders (2013)

This article appeared in the March 2013 edition of Testing Planet, which is published by the wonderful Ministry of Testing, one of the most exciting developments in software testing over the last 20 years.

I’m moving this article onto my blog from my website, which will shortly be decommissioned. The article was written in January 2013. Looking at it again I see that I was starting to develop arguments I fleshed out over the next couple of years as part of the Stop 29119 campaign against the testing standard, ISO 29119.

The article

“A tester is someone who knows things can be different” – Gerald Weinberg.

Leaders aren’t necessarily people who do things, or order other people about. To me the important thing about leaders is that they enable other people to do better, whether by inspiration, by example or just by telling them how things can be different – and better. The difference between a leader and a manager is like the difference between a great teacher and, well, the driver of the school bus. Both take children places, but a teacher can take children on a journey that will transform their whole life.

My first year or so in working life after I left university was spent in a fog of confusion. I struggled to make sense of the way companies worked; I must be more stupid than I’d always thought. All these people were charging around, briskly getting stuff done, making money and keeping the world turning; they understood what they were doing and what was going on. They must be smarter than me.

Gradually it dawned on me that very many of them hadn’t a clue. They were no wiser than me. They didn’t really know what was going on either. They thought they did. They had their heads down, working hard, convinced they were contributing to company profits, or at least keeping the losses down.

The trouble was their efforts often didn’t have much to do with the objectives of the organisation or, in the case of IT, the true goals of the users and the project. Being busy was confused with being useful. Few people were capable of sitting back, looking at what was going on and seeing what was valuable as opposed to mere work creation.

I saw endless cases of poor work, sloppy service and misplaced focus. I became convinced that we were all working hard doing unnecessary, and even harmful, things for users who quite rightly were distinctly ungrateful. It wasn’t a case of the end justifying the means; it was almost the reverse. The means were only loosely connected to the ends, and we were focussing obsessively on the means without realising that our efforts were doing little to help us achieve our ends.

Formal processes didn’t provide a clear route to our goal. Following the process had become the goal itself. I’m not arguing against processes; just the attitude we often bring to them, confusing the process with the destination, the map with the territory. The quote from Gerald Weinberg absolutely nails the right attitude for testers to bring to their work. There are twin meanings. Testers should know there is a difference between what people expect, or assume, and what really is. They should also know that there is a difference between what is, and what could be.

Testers usually focus on the first sort of difference; seeing the product for what it really is and comparing that to what the users and developers expected. However, the second sort of difference should follow on naturally. What could the product be? What could we be doing better?

Testers have to tell a story, to communicate not just the reality to the stakeholders, but also a glimpse of what could be. Organisations need people who can bring clear headed thinking to confusion, standing up and pointing out that something is wrong, that people are charging around doing the wrong things, that things could be better. Good testers are well suited by instinct to seeing what positive changes are possible. Communicating these possibilities, dispelling the fog, shining a light on things that others would prefer to remain in darkness; these are all things that testers can and should do. And that too is a form of leadership, every bit as much as standing up in front of the troops and giving a rousing speech.

In Hans Christian Andersen’s story, “The Emperor’s New Clothes”, who showed a glimpse of leadership? Not the emperor, not his courtiers; it was the young boy who called out the truth, that the Emperor was wearing no clothes at all. If testers are not prepared to tell it like it is, to explain why things are different from what others are pretending, to explain how they could be better, then we diminish and demean our profession. Leaders do not have to be all-powerful figures. They can be anyone who makes a difference; teachers, children. Or even testers.

Quality isn’t something, it provides something (2012)


This article appeared in the July 2012 edition of Testing Planet, which is published by the wonderful Ministry of Testing, one of the most exciting developments in software testing over the last 20 years.

The article was written in June 2012, but I don’t think it has dated. It’s about the way we think and work with other people. These are timeless problems. The idea behind E-prime is particularly interesting. Dispensing with the verb “to be” isn’t something to get obsessive or ideological about, but testers should be aware of the important distinction between the way something is and the way it behaves. The original article had only four references so I have checked them, converted them to hyperlinks, and changed the link to Lera Boroditsky’s paper to a link to her TED talk on the same subject.

The article

A few weeks ago two colleagues, who were having difficulty working together, asked me to act as peacekeeper in a tricky looking meeting in which they were going to try and sort out their working relationship. I’ll call them Tony and Paul. For various reasons they were sparking off each other and creating antagonism that was damaging the whole team.

An hour’s discussion seemed to go reasonably well; Tony talking loudly and passionately, while Paul spoke calmly and softly. Just as I thought we’d reached an accommodation that would allow us all to work together Tony blurted out, “you are cold and calculating, Paul, that’s the problem”.

Paul reacted as if he’d been slapped in the face, made his excuses and left the meeting. I then spent another 20 minutes talking Tony through what had happened, before separately speaking to Paul about how we should respond.

I told Tony that if he’d wanted to make the point I’d inferred from his comments, and from the whole meeting, then he should have said “your behaviour and attitude towards me throughout this meeting, and when we work together, strike me as cold and calculating, and that makes me very uncomfortable”.

“But I meant that!”, Tony replied. Sadly, he hadn’t said that. Paul had heard the actual words and reacted to them, rather than applying the more dispassionate analysis I had used as an observer. Paul meanwhile found Tony’s exuberant volatility disconcerting, and responded to him in a very studied and measured style that unsettled Tony.

Tony committed two sins. Firstly, he didn’t acknowledge the two-way nature of the problem. It should have been about how he reacted to Paul, rather than trying to dump all the responsibility onto Paul.

Secondly, he said that Paul is cold and calculating, rather than that Paul had acted in a way Tony found cold and calculating, at a certain time, in certain circumstances.

I think we’d all see a huge difference between being “something”, and behaving in a “something” way at a certain time, in a certain situation. The verb “to be” gives us this problem. It can mean, and suggest, many different things and can create fog where we need clarity.

Some languages, such as Spanish, maintain a useful distinction between different forms of “to be” (ser and estar), depending on whether one is talking about something’s identity or just a temporary attribute or state.

The way we think obviously shapes the language we speak, but increasingly scientists are becoming aware of how the language we use shapes the way that we think. [See this 2017 TED talk, “How Language Shapes Thought”, by Lera Boroditsky]

The problem we have with “to be” has great relevance to testers. I don’t just mean treating people properly, however much importance we rightly attach to working successfully with others. More than that, shying away from “to be” helps us think more carefully and constructively as testers.

This topic has stretched bigger brains than mine, in the fields of philosophy, psychology and linguistics. Just google “general semantics” if you want to give your brain a brisk workout. You might find it tough stuff, but I don’t think you have to master the underlying concept to benefit from its lessons.

Don’t think of it as intellectual navel gazing. All this deep thought has produced some fascinating results, in particular something called E-prime, a form of English that totally dispenses with “to be” in all its forms; no “I am”, “it is”, or “you are”. Users of E-prime don’t simply replace the verb with an alternative; that doesn’t work. E-prime forces you to think and articulate more clearly what you want to say. [See this classic paper by Kellogg, “Speaking in E-prime” PDF, opens in new tab].

“The banana is yellow” becomes “the banana looks yellow”, which starts to change the meaning. “Banana” and “yellow” are not synonyms. The banana’s yellowness becomes apparent only because I am looking at it, and once we introduce the observer we can acknowledge that the banana appears yellow to us now. Tomorrow the banana might appear brown to me as it ripens. Last week it would have looked green.

You probably wouldn’t disagree with any of that, but you might regard it as a bit abstract and pointless. However, shunning “to be” helps us to think more clearly about the products we test, and the information that we report. E-prime therefore has great practical benefits.

The classic definition of software quality came from Gerald Weinberg in his book “Quality Software Management: Systems Thinking”.

“Quality is value to some person”.

Weinberg’s definition reflects some of the clarity of thought that E-prime requires, though he has watered it down somewhat to produce a snappy aphorism. The definition needs to go further, and “is” has to go!

Weinberg makes the crucial point that we must not regard quality as some intrinsic, absolute attribute. It arises from the value it provides to some person. Once you start thinking along those lines you naturally move on to realising that quality provides value to some person, at some moment in time, in a certain context.

Thinking and communicating in E-prime stops us making sweeping, absolute statements. We can’t say “this feature is confusing”. We have to use a more valuable construction such as “this feature confused me”. But we’re just starting. Once we drop the final, total condemnation of saying the feature is confusing, and admit our own involvement, it becomes more natural to think about and explain the reasons. “This feature confused me … when I did … because of …”.

Making the observer, the time and the context explicit helps us by limiting or exposing hidden assumptions. We might or might not find these assumptions valid, but we need to test them, and we need to know about them so we understand what we are really learning as we test the product.
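To make the contrast concrete, here is a sketch of an observation-style finding as a tiny bit of Python. The field names and the example finding are my own invention, purely for illustration of the E-prime habit of making the observer, action and context explicit:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """A finding reported in E-prime style: the observer, the action
    and the context are explicit, instead of an absolute claim
    about the product ("this feature is confusing")."""
    observer: str
    behaviour: str   # what the product did, as experienced
    action: str      # what the observer was doing at the time
    context: str     # environment, data, timing

    def report(self) -> str:
        # "behaviour when observer action, context."
        return (f"{self.behaviour} when {self.observer} {self.action}, "
                f"{self.context}.")

finding = Observation(
    observer="I",
    behaviour="The search feature confused me",
    action="filtered by date",
    context="with an empty result set on the staging environment",
)
print(finding.report())
# The search feature confused me when I filtered by date,
# with an empty result set on the staging environment.
```

The absolute statement closes down thought; the structured observation admits the observer’s involvement and invites investigation of the “when” and “because”.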

E-prime fits neatly with the scientific method and with the provisional and experimental nature of good testing. Results aren’t true or false. The evidence we gather matches our hypothesis, and therefore gives us greater confidence in our knowledge of the product, or it fails to match up and makes us reconsider what we thought we knew. [See this classic paper by Kellogg & Bourland, “Working with E-prime – some practical notes” PDF, opens in new tab].

Scientific method cannot be accommodated in traditional script-driven testing, which reflects a linear, binary, illusory worldview, pretending to be absolute. It tries to deal in right and wrong, pass and fail, true and false. Such an approach fits in neatly with traditional development techniques which fetishise the rigours of project management, rather than the rigours of the scientific method.

This takes us back to general semantics, which coined the well known maxim that the map is not the territory. Reality and our attempts to model and describe it differ fundamentally from each other. We must not confuse them. Traditional techniques fail largely because they confuse the map with the territory. [See this “Less Wrong” blog post].

In attempting to navigate their way through a complex landscape, exponents of traditional techniques seek the comfort of a map that turns messy, confusing reality into something they can understand and that offers the illusion of being manageable. However, they are managing the process, not the underlying real work. The plan is not the work. The requirements specification is not the requirements. The map is not the territory.

Adopting E-prime in our thinking and communication will probably just make us look like the pedantic awkward squad on a traditional project. But on agile or lean developments E-prime comes into its own. Testers must contribute constructively, constantly, and above all, early. E-prime helps us in all of this. It makes us clarify our thoughts and helps us understand that we gain knowledge provisionally, incrementally and never with absolute certainty.

I was not consciously deploying E-prime during and after the fractious meeting I described earlier. But I had absorbed the precepts sufficiently to instinctively realise that I had two problems; Tony’s response to Paul’s behaviour, and Paul’s response to Tony’s outburst. I really didn’t see it as a matter of “uh oh – Tony is stupid”.

E-prime purists will look askance at my failure to eliminate all forms of “to be” in this article. I checked my writing to ensure that I’ve written what I meant to, and said only what I can justify. Question your use of the verb, and weed out those hidden assumptions and sweeping, absolute statements that close down thought, rather than opening it up. Don’t think you have to be obsessive about it. As far as I am concerned, that would be silly!

An abdication of managerial responsibility?


The two recent Boeing 737 MAX crashes have been grimly absorbing for software developers and testers. It seems that the crashes were caused by the MCAS system, which is intended to prevent a stall, responding to false data from a sensor by forcing the planes into steep dives despite the attempts of the pilots to make the planes climb. The MCAS problem may have been a necessary condition for disaster, but it clearly was not sufficient. There were many other factors involved. Most strikingly, it seems that MCAS itself may have been working as specified but there were problems in the original design and the way it interfaces with the sensor and crew.

I have no wish to go into all this in serious detail (yet), but I read an article on the Bloomberg website, “Boeing’s 737 Max software outsourced to $9-an-hour engineers” which contained many sentences and phrases that jumped off the screen at me. These snippets all point towards issues that concern me, that I’ve been talking and writing about recently, or that I’ve been long aware of. I’d like to run through them. I’ll use a brief quote from the Bloomberg article in each section before discussing the implications. All software designers and testers should reflect on these issues.

The commoditization of software development and testing

“Boeing has also expanded a design center in Moscow. At a meeting with a chief 787 engineer in 2008, one staffer complained about sending drawings back to a team in Russia 18 times before they understood that the smoke detectors needed to be connected to the electrical system, said Cynthia Cole, a former Boeing engineer who headed the engineers’ union from 2006 to 2010.

‘Engineering started becoming a commodity’, said Vance Hilderman, who co-founded a company called TekSci that supplied aerospace contract engineers and began losing work to overseas competitors in the early 2000s.”

The threat of testing becoming a commodity has been a long-standing concern amongst testers. To a large extent we’re already there. However, I’d assumed, naively perhaps, that this was a route chosen by organisations that could get away with poor testing, in the short term at least. I was deeply concerned to see it happening in a safety critical industry.

To summarise the problem, if software development and testing are seen as commodities, bought and sold on the basis of price, then commercial pressures will push quality downwards. The inevitable pressure sends cost and prices spiralling down to the level set by the lowest cost supplier, regardless of value. Testing is particularly vulnerable. When the value of the testing is low then whatever cost does remain becomes more visible and harder to justify.

There is pressure to keep reducing costs, and if you’re getting little value from testing just about any cost-cutting measure is going to look attractive. If you head down the route of outsourcing, offshoring and increasing commoditization, losing sight of value, you will lock yourself into a vicious circle of poor quality.

Iain McCowatt’s EuroSTAR webinar on “The commoditization of testing” is worth watching.

ETTO – the efficiency-thoroughness trade-off

“…the planemakers say global design teams add efficiency as they work around the clock.”

Ah! There we have it! Efficiency. Isn’t that a good thing? Of course it is. But there is an inescapable trade-off, and organisations must understand what they are doing. There is a tension between the need to deliver a safe, reliable product or service, and the pressure to do so at the lowest cost possible. The idea of ETTO, the efficiency-thoroughness trade-off, was popularised by Erik Hollnagel.

Making the organisation more efficient means it is less likely to achieve all of its important goals; pursuing vital goals, such as safety, comes at the expense of efficiency. Pushing efficiency too far eliminates margins of error and engineering redundancy, with potentially dangerous results. This is well recognised in safety critical industries, obviously including air transport. I’ve discussed this further in my blog, “The dragons of the unknown; part 6 – Safety II, a new way of looking at safety”.

Drift into failure

“’Boeing was doing all kinds of things, everything you can imagine, to reduce cost, including moving work from Puget Sound, because we’d become very expensive here,’ said Rick Ludtke, a former Boeing flight controls engineer laid off in 2017. ‘All that’s very understandable if you think of it from a business perspective. Slowly over time it appears that’s eroded the ability for Puget Sound designers to design.’”

“Slowly over time”. That’s the crucial phrase. Organisations drift gradually into failure. People are working under pressure, constantly making the trade off between efficiency and thoroughness. They keep the show on the road, but the pressure never eases. So margins are increasingly shaved. The organisation finds new and apparently smarter ways of working. Redundancy is eliminated. The workers adapt the official processes. The organisation seems efficient, profitable and safe. Then BANG! Suddenly it isn’t. The factors that had made it successful turn out to be responsible for disaster.

“Drifting into failure” is an important concept to understand for anyone working with complex systems that people will have to use, and for anyone trying to make sense of how big organisations should work, and really do work. See my blog “The dragons of the unknown; part 4 – a brief history of accident models” for a quick introduction to the drift into failure. The idea was developed by Sidney Dekker. Check out his work.

Conway’s Law

“But outsourcing has long been a sore point for some Boeing engineers, who, in addition to fearing job losses, say it has led to communications issues and mistakes.”

This takes me to one of my favourites, Conway’s Law. In essence it states that the design of systems corresponds to the design of the organisation. It’s not a normative rule, saying that this should (or shouldn’t) happen. It merely says that it generally does happen. Traditionally the organisation’s design shaped the technology. Nowadays the causation might be reversed, with the technology shaping the organisation. Conway’s Law was intended as a sound heuristic, never a hard and fast rule.

Conway’s Law – a slide from one of my courses

Perhaps it is less generally applicable today, but for large, long established corporations I think it still generally holds true.

I’m going to let you in on a little trade secret of IT auditors. Conway’s Law was a huge influence on the way we audited systems and development projects.

a corollary to Conway’s Law – another slide from one of my courses

Audits were always strictly time boxed. We had to be selective in how we used our time and what we looked at. Modern internal auditing is risk based, meaning we would focus on the risks that posed the greatest threat to the organisation, concentrating on the areas most exposed to risk and looking for assurance that the risks were being managed effectively.

Conway’s Law guided the auditors towards low hanging fruit. We knew that we were most likely to find problems at the interfaces, and these were likely to be particularly serious. This was also my experience as a test manager. In both jobs I saw the same pattern unfold when different development teams, or different companies worked on different parts of a project.

Development teams would be locked into their delivery schedule before the high level requirements were clear and complete, or even mutually consistent. The different teams, especially if they were in different companies, based their estimates on assumptions that were flawed, or inconsistent with other teams’ assumptions. Under pressure to reduce estimates and deliver quickly, each team might assume they’d be able to do the minimum necessary, especially at the interfaces; other teams would pick up the trickier stuff.

This would create gaps at the interfaces, and cries of “but I thought you were going to do that – we can’t possibly cope in time”. Or the data that was passed from one suite couldn’t be processed by the next one. Both might have been built correctly to their separate specs, but they weren’t consistent. The result would be last minute solutions, hastily lashed together, with inevitable bugs and unforeseen problems down the line – ready to be exposed by the auditors.
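That sort of interface failure can be boiled down to a few lines. In this invented illustration (the date formats and function names are hypothetical, not from any real project), each team builds correctly to its own spec, yet the handover still fails because the specs were never consistent with each other:

```python
from datetime import datetime

# Team A's spec: dates are exported as day/month/year strings.
def team_a_export(day: int, month: int, year: int) -> str:
    return f"{day:02d}/{month:02d}/{year}"

# Team B's spec: dates arrive in ISO 8601 (YYYY-MM-DD).
def team_b_import(value: str) -> datetime:
    return datetime.strptime(value, "%Y-%m-%d")

record = team_a_export(3, 7, 2019)   # "03/07/2019" - valid per Team A's spec

try:
    team_b_import(record)            # valid per Team B's spec? No.
except ValueError:
    # Both suites pass their own tests; only integration exposes the gap.
    print("integration failure: formats do not match")
```

Each team’s unit tests would pass. Only testing across the interface, against a shared, mutually agreed specification, reveals the problem – which is exactly why segmented, bolted-together testing finds it so late.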

Splitting the work across continents and suppliers always creates big management problems. You have to be prepared for these. The additional co-ordination, chasing, reporting and monitoring takes a lot of effort. This all poses big problems for test managers, who have to be strong, perceptive and persuasive to ensure that the testing is planned consistently across the whole solution, rather than being segmented and bolted together at a late stage for cursory integration testing.

Outsourcing and global teams don’t provide a quick fix. Without strong management and a keen awareness of the risks it’s a sure way to let serious problems slip through into production. Surely safety critical industries would be smarter, more responsible? I learned all this back in the 1990s. It’s not new, and when I read Bloomberg’s account of Boeing’s engineering practices I swore, quietly and angrily.


“During the crashes of Lion Air and Ethiopian Airlines planes that killed 346 people, investigators suspect, the MCAS system pushed the planes into uncontrollable dives because of bad data from a single sensor.

That design violated basic principles of redundancy for generations of Boeing engineers, and the company apparently never tested to see how the software would respond, Lemme said. ‘It was a stunning fail,’ he said. ‘A lot of people should have thought of this problem – not one person – and asked about it.’”

So the consequences of commoditization, ETTO, the drift into failure and complacency about developing and testing complex, safety critical systems with global teams all came together disastrously in the Lion Air and Ethiopian Airlines crashes.

A lot of people should certainly have thought of this problem. As a former IT auditor I thought of this passage by Norman Marks, a distinguished commentator on auditing. Writing about risk-based auditing he said:

A jaw-dropping moment happened when I explained my risk assessment and audit plan to the audit committee of the oil company where I was CAE (Tosco Corp.). The CEO asked whether I had considered risks relating to the blending of gasoline, diesel, and jet fuel.

As it happened, I had — but it was not considered high risk; it was more a compliance issue than anything else. But, when I talked to the company’s executives I heard that when Exxon performed an enterprise-wide risk assessment, this area had been identified as their #1 risk!

Poorly-blended jet fuel could lead to Boeing 747s dropping out of the sky into densely-packed urban areas — with the potential to bankrupt the largest (at that time) company in the world. A few years later, I saw the effect of poor blending of diesel fuel when Southern California drivers had major problems and fingers were pointed at us as well as a few other oil companies.

In training courses, when I’ve been talking about the big risks that keep the top management awake at night I’ve used this very example; planes crashing. In big corporations it’s easy for busy people to obsess about the smaller risks, those that delay projects, waste money, or disrupt day to day work. These problems hit us all the time. Disasters happen rarely and we can lose sight of the way the organisation is drifting into catastrophic failure.

That’s where auditors, and I believe testers too, come in. They should be thinking about these big risks. In the case of Boeing the engineers, developers and testers should have spoken out about the problems. The internal auditors should certainly have been looking out for it, and these are the people who have the organisational independence and power to object. They have to be listened to.

An abdication of management responsibility?

“Boeing also has disclosed that it learned soon after Max deliveries began in 2017 that a warning light that might have alerted crews to the issue with the sensor wasn’t installed correctly in the flight-display software. A Boeing statement in May, explaining why the company didn’t inform regulators at the time, said engineers had determined it wasn’t a safety issue.

‘Senior company leadership,’ the statement added, ‘was not involved in the review.’”

Senior management was not involved in the review. Doubtless there are a host of reasons why they were not involved. The bottom line, however, is that it was their responsibility. I spent six years as an IT auditor. In that time only one of my audits led to the group’s chief auditor using that nuclear phrase, which incidentally was not directed at IT management. A very senior executive was accused of “abdicating managerial responsibility”. The result was a spectacular display of bad temper and attempted intimidation of the auditors. We didn’t back down. That controversy related to shady behaviour at a subsidiary where the IT systems were being abused and frauds had become routine. It hardly compared to a management culture that led to hundreds of avoidable deaths.

One of the core tenets of Safety II, the new way of looking at safety, is that there is never a single, root cause for failure in complex systems. There are always multiple causes, all of them necessary, but none of them sufficient, on their own, for disaster. The Boeing 737-MAX case bears that out. No one person was responsible. No single act led to disaster. The fault lies with the corporate culture as a whole, with a culture of leadership that abdicated responsibility, that “wasn’t involved”.

David Graeber’s “The Utopia of Rules: On Technology, Stupidity and the Secret Joys of Bureaucracy”

When I gave my talk at CAST 2014 in New York, “Standards – promoting quality or restricting competition?” I was concentrating on the economic aspects of standards. They are often valuable, but they can be damaging and restrict competition if they are misused. A few months later I bought “The Utopia of Rules: On Technology, Stupidity, and the Secret Joys of Bureaucracy” by David Graeber, Professor of Anthropology at the London School of Economics. I was familiar with Graeber as a challenging and insightful writer. I drew on his work when I wrote “Testing: valuable or bullshit?“. The Utopia of Rules also inspired the blog article I wrote recently, “Frozen in time – grammar and testing standards” in which I discussed the similarity between grammar textbooks and standards, which both codify old usages and practices that no longer match the modern world.

What I hadn’t expected from The Utopia of Rules was how strongly it would support the arguments I made at CAST.

Certification and credentialism

Graeber makes the same argument I deployed against certification. It is being used increasingly to enrich special interests without benefiting society. On page 23 Graeber writes:

Almost every endeavor that used to be considered an art (best learned through doing) now requires formal professional training and a certificate of completion… In some cases, these new training requirements can only be described as outright scams, as when lenders, and those prepared to set up the training programs, jointly lobby the government to insist that, say, all pharmacists be henceforth required to pass some additional qualifying examination, forcing thousands already practicing the profession into night school, which these pharmacists know many will only be able to afford with the help of high-interest student loans. By doing this, lenders are in effect legislating themselves a cut of most pharmacists’ subsequent incomes.

To be clear, my stance on ISTQB training is that it educates testers in a legitimate, though very limited, vision of testing. My objection is to any marketing of the qualification as a certification of testing ability, rather than confirmation that the tester has passed an exam associated with a particular training course. I object even more strongly to any argument that possession of the certificate should be a requirement for employment, or for contracting out testing services. It is reasonable to talk of scams when the ability of good testers to earn a living is damaged.

What is the point of it all?

Graeber has interesting insights into how bureaucrats can be vague about the values of the bureaucracy: why does the organisation exist? Bureaucrats focus on efficient execution of rational processes, but what is the point of it all? Often the means become the ends: efficiency is an end in itself.

I didn’t argue that point at CAST, but I have done so many times in other talks and articles (e.g. “Teddy bear methods“). If people are doing a difficult, stressful job and you give them prescriptive methods, processes or standards then they will focus on ticking their way down the list. The end towards which they are working becomes compliance with the process, rather than helping the organisation reach its goal. They see their job as producing the outputs from the process, rather than the outcomes the stakeholders want. I gave a talk in London in June 2015 to the British Computer Society’s Special Interest Group in Software Testing in which I argued that testing lacks guiding principles (PDF, opens in a new tab) and ISO 29119 in particular does not offer clear guidance about the purpose of testing.

In a related argument Graeber makes a point that will be familiar to those who have criticised the misuse of testing metrics.

…from inside the system, the algorithms and mathematical formulae by which the world comes to be assessed become, ultimately, not just measures of value, but the source of value itself.

Rent extraction

The most controversial part of my CAST talk was my argument that the pressure to adopt testing standards was entirely consistent with rent seeking in economic theory. Rent seeking, or rent extraction, is what people do when they exploit failings in the market, or rig the market for their own benefit by lobbying for regulation that happens to benefit them. Instead of creating wealth, they take it from other people in a way that is legal, but which is detrimental to the economy, and society, as a whole.

This argument riled some people who took it as a personal attack on their integrity. I’m not going to dwell on that point. I meant no personal slur. Rent seeking is just a feature of modern economies. Saying so is merely being realistic. David Graeber argued the point even more strongly.

The process of financialization has meant that an ever-increasing proportion of corporate profits come in the form of rent extraction of one sort or another. Since this is ultimately little more than legalized extortion, it is accompanied by ever-increasing accumulation of rules and regulations… At the same time, some of the profits from rent extraction are recycled to select portions of the professional classes, or to create new cadres of paper-pushing corporate bureaucrats. This helps a phenomenon I have written about elsewhere: the continual growth, in recent decades, of apparently meaningless, make-work, “bullshit jobs” — strategic vision coordinators, human resources consultants, legal analysts, and the like — despite the fact that even those who hold such positions are half the time secretly convinced they contribute nothing to the enterprise.

In 2014 I wrote about “bullshit jobs“, prompted partly by one of Graeber’s articles. It’s an important point. It is vital that testers define their job so that it offers real value, and they are not merely bullshit functionaries of the corporate bureaucracy.

Utopian bureaucracies

I have believed for a long time that adopting highly prescriptive methods or standards for software development and testing places unfair pressure on people, who are set up to fail. Graeber makes exactly the same point.

Bureaucracies public and private appear — for whatever historical reasons — to be organized in such a way as to guarantee that a significant proportion of actors will not be able to perform their tasks as expected. It’s in this sense that I’ve said one can fairly say that bureaucracies are utopian forms of organization. After all, is this not what we always say of utopians: that they have a naïve faith in the perfectibility of human nature and refuse to deal with humans as they actually are? Which is, are we not also told, what leads them to set impossible standards and then blame the individuals for not living up to them? But in fact all bureaucracies do this, insofar as they set demands they insist are reasonable, and then, on discovering that they are not reasonable (since a significant number of people will always be unable to perform as expected), conclude that the problem is not with the demands themselves but with the individual inadequacy of each particular human being who fails to live up to them.

Testing standards such as ISO 29119, and its predecessor IEEE 829, don’t reflect what developers and testers do, or rather should be doing. They are at odds with the way people think and work in organisations. These standards attempt to represent a highly complex, sometimes chaotic, process in a defined, repeatable model. The end product is usually of dubious quality, late and over budget. Any review of the development will find constant deviations from the standard. The suppliers, and defenders, of the standard can then breathe a sigh of relief. The sacred standard was not followed. It was the team’s fault. If only they’d done it by the book! The possibility that the developers’ and testers’ apparent sins were the only reason anything was produced at all is never considered. This is a dreadful way to treat people, but in many organisations it has been normal for several decades.

Loss of communication

All of the previous arguments by Graeber were entirely consistent with my own thoughts about how corporate bureaucracies operate. It was fascinating to see an anthropologist’s perspective, but it didn’t teach me anything that was really new about how testers work in corporations. However, later in the book Graeber developed two arguments that gave me new insights.

Understanding what is happening in a complex, social situation needs effective two-way communication. This requires effort, “interpretive labor”. The greater the degree of compulsion, and the greater the bureaucratic regime of rules and forms, the less need there is for such two-way communication. Those who can simply issue orders that must be obeyed don’t have to take the trouble to understand the complexities of the situation they’re managing.

…within relations of domination, it is generally the subordinates who are effectively relegated the work of understanding how the social relations in question really work. … It’s those who do not have the power to hire and fire who are left with the work of figuring out what actually did go wrong so as to make sure it doesn’t happen again.

This ties in with the previous argument about utopian bureaucracies. If you impose an inappropriate standard then poor results will be attributed to the inevitable failure to comply. There is no need for senior managers to understand more, and no need to listen to the complaints, the “excuses”, of the people who do understand what is happening. Interestingly, Graeber’s argument about interpretive labor is consistent with regulatory theory. Good regulation of complex situations requires ongoing communication between the regulator and the regulated. I explained this in the talk on testing principles I mentioned above (slides 38 and 39).

Fear of play

My second new insight from Graeber arrived when he discussed the nature of play and how it relates to bureaucracies. Anthropologists try to maintain a distinction between games and play, a distinction that is easier to maintain in English than in languages like French and German, which use the same word for both. A game has boundaries, set rules and a predetermined conclusion. Play is more free-form and creative. Novelties and surprising results emerge from the act of playing. It is a random, unpredictable and potentially destructive activity. Graeber finishes his discussion of play and games with a striking observation.

What ultimately lies behind the appeal of bureaucracy is fear of play.

Put simply, and rather simplistically, Graeber means that we use bureaucracy to escape the terror of chaotic reality, to bring a semblance (an illusion?) of control to the uncontrollable.

This gave me a tantalising new insight into the reasons people build bureaucratic regimes in organisations. It sent me off into a whole new field of reading on the anthropology of games and play. This has fascinating implications for the debate about standards and testing. We shy away from play, but it is through play that we learn. I don’t have time now to do the topic justice, and it’s much too big and important a subject to be tacked on to the end of this article, but I will return to it. It is yet another example of the way anthropology can help us understand what we are doing as testers. As a starting point I can heartily recommend David Graeber’s book, “The Utopia of Rules”.

A single source of truth?

Lately in a chatroom for the International Society for Software Testing there has been some discussion about the idea of a “single source of truth”. I’m familiar with this in the sense of database design. Every piece of data is stored once and the design precludes the possibility of inconsistency, of alternative versions of the same data. That makes sense in this narrow context, but the discussion revealed that the phrase is now being used in a different sense. A single source of truth has been used to describe an oracle of oracles, an ultimate specification on which total reliance can be placed. The implications worry me, especially for financial systems, which is my background.

I’m not comfortable with a single source of truth, especially when it applies to things like bank balances, profit and loss figures, or indeed any non-trivial result of calculations. What might make more sense is to talk of a single statement of truth, and that statement could, and should, have multiple sources so the statement is transparent and can be validated. However, I still wouldn’t want to talk about truth in financial statements. For an insurance premium there are various different measures, which have different uses to different people at different times. When people start talking about a single, true, premium figure they are closing off their minds to reality and trying to redefine it to suit their limited vision.

All of these competing measures could be regarded as true in the right context, but there are other measures which are less defensible and which an expert would consider wrong, or misleading, in any context (e.g. lumping Insurance Premium Tax into the premium figure). That’s all quite aside from the question of whether these measures are accurate on their own terms.

A “single source of truth” reminds me of arguments I’d have with application designers. Sometimes the problem would be that they wanted to eliminate any redundancy in the design. That could make reconciliation and error detection much harder because the opportunities to spot errors would be reduced. If a calculation was wrong it might stay wrong because no-one would know. A different source of friction was the age old problem of analysts and designers determined to stick rigidly to the requirements without questioning them, or even really thinking about the implications. I suspect I was regarded as a pedantic nuisance, creating problems in places the designers were determined no problems could ever exist – or ever be visible.

Accounting for truth

Conventional financial accounting is based on double entry book-keeping, which requires every transaction to be entered twice, in different places so that the accounts as a whole remain in balance. There may be a single, definitive statement of profit, but that is distilled from multiple sources, with an intricate web of balances and documented, supporting assumptions. The whole thing is therefore verifiable, or auditable. But it’s not truth. It’s more a matter of saying “given these assumptions this set of transactions produces the following profit figure”. Vary the assumptions and you have a different and perhaps equally valid figure – so it’s not truth.
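
The principle can be sketched in a few lines of Python. This is only an illustration of the idea, not real accounting software, and the account names and amounts are invented:

```python
# A minimal sketch of double-entry book-keeping: every transaction is
# posted twice, as a debit to one account and a credit to another, so
# the books as a whole must always balance. Account names and amounts
# are invented for illustration.
from collections import defaultdict

ledger = defaultdict(float)  # account name -> balance

def post(debit_account, credit_account, amount):
    """Record one transaction in two places at once."""
    ledger[debit_account] += amount
    ledger[credit_account] -= amount

post("cash", "sales", 600)      # sold an item for £600
post("purchases", "cash", 500)  # its inputs cost £500

# The invariant that makes the accounts verifiable: debits and
# credits always cancel out, whatever the individual transactions.
assert abs(sum(ledger.values())) < 1e-9

# Profit is distilled from several balances, not read off from a
# single "source of truth".
profit = -ledger["sales"] - ledger["purchases"]
print(profit)  # 100.0
```

The point of the redundancy is exactly the point made above: because the same information is held in more than one place, an error in any one entry shows up as an imbalance instead of passing silently.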

For many years academic accountants, e.g. Christopher Napier, have been doing fascinating work that strays over into philosophy. What is this reality that we are trying to understand? That’s ontology. What can we know about it, and what reliance can we put on that knowledge when we try to report it? That’s epistemology. Why are we doing it? That’s teleology.

The most interesting subject I ever studied in accountancy at university was the problem of inflation accounting. £6 − £5 = £1 might be a crude profit calculation for an item whose inputs cost you £5 and which you sold for £6. But what if the £5 was a cost incurred 11 months ago? You then buy replacement inputs, which now cost £7, but you’d still only be able to sell the finished product for £6. What does it mean to say you made a profit of £1? Who does that help? Couldn’t you also argue that you made a loss of £1?

What does it mean to add money together when the different elements were captured at dates when the purchasing power equivalent of that money was different? You’re adding apples and oranges. The value of money is dependent on what it can buy. Setting aside short-term speculation, that is what dictates currency exchange rates. £1 is more valuable than €1 because it buys more. It is meaningless to add £1 + €1 and get 2. An individual currency has different values over time, so is it any more meaningful to add different monetary figures without considering what their value was at the time the data was captured?

The academics pointed out all the problems inflation caused and came up with possible, complicated solutions. However, the profession eventually decided it was all just too difficult and pretty much gave up, except for an international standard for accounting in countries experiencing hyper-inflation (defined as greater than 100% over three years, i.e. a persisting annual rate of at least 26%). As at the end of 2014 the qualifying countries are Belarus, Venezuela, Sudan, Iran and Syria (which has rather more to worry about than financial accounting). For the rest of the world, if you want to add 5 apples and 6 oranges, that’s fine. You’ve now got 11 pieces of fruit. Stop worrying and just do the job.
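
The £5/£6/£7 example above can be made concrete in a few lines. The figures come from the text; everything else is a bare sketch, ignoring the complications the academics wrestled with:

```python
# The inflation accounting problem from the text: inputs cost £5
# eleven months ago, the product sells for £6 today, but replacing
# the inputs now costs £7. Two defensible profit figures emerge from
# the same transactions.
historic_cost = 5.0     # paid 11 months ago
sale_price = 6.0        # received today
replacement_cost = 7.0  # what the same inputs cost today

# Historic cost accounting: a £1 "profit".
historic_profit = sale_price - historic_cost        # 1.0

# Replacement cost accounting: a £1 loss, because carrying on
# trading means buying the inputs again at today's price.
replacement_profit = sale_price - replacement_cost  # -1.0

print(historic_profit, replacement_profit)  # 1.0 -1.0
```

Neither figure is "the truth"; each answers a different question for a different user of the accounts.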

I’m the treasurer for a church, and I’m often asked how much money we’ve got. I never bother going to the online bank statement, because I know that what people really want to know is how much money is available. So I use the church accounts, which factor in the income and payments that haven’t been cleared, and the money we’re due imminently, and the outgoings to which we’re already committed. These different figures all mesh together and provide a figure that we find useful, but which is different from the bank’s view of our balance. Our own accounts never rely on a single source of truth. There are multiple reconciliation checks to try and flag up errors. The hope is that inputting an incorrect amount will generate a visible error. We’re not reporting truth. All we can say is, so far as we know this is as useful and honest a statement of our finances as we can produce for our purposes, for the Church of Scotland, the Office of the Scottish Charity Regulator and the other stakeholders.
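
The "available money" calculation described above looks something like this sketch. All the figures and category names are invented for illustration; the point is only that the useful number is assembled from several sources, not read from the bank statement:

```python
# How much money is *available*? Not the same question as "what does
# the bank statement say?". All figures are invented for illustration.
bank_balance = 4200.00       # the bank's view of our balance
uncleared_income = 350.00    # received, but not yet showing at the bank
uncleared_payments = 120.00  # cheques written, not yet cashed
due_imminently = 500.00      # income we are confident of very soon
committed = 1500.00          # outgoings we are already committed to

available = (bank_balance + uncleared_income + due_imminently
             - uncleared_payments - committed)
print(available)  # 3430.0
```

The bank's figure and this figure will legitimately differ; reconciling the two is one of the cross-checks that flags up input errors.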

It’s messy and complex – deal with it

What’s it all got to do with testing? If your vision of testing is checking whether the apparent functionality is consistent with the specification as represented in the test script then this sort of messy complexity is a tedious distraction. It’s so much easier to pretend you can confirm the truth using a test script.

However, testing is (or should be) a difficult and intellectually demanding process of teasing out the implications of the application for the stakeholders. If you accept that, then you are far more likely to do something valuable if you stop thinking about any single source of truth. You should be thinking instead about possible sources of insight to help you shed light on the various “truths” that the various stakeholders are seeking. Understanding these different needs, and all the nuances that arise from them is essential for testers.

Assuming that there is a single truth that we can attest to with a simple, binary yes/no answer reduces testing to the level of the accountants who have tried to treat accountancy as a simple arithmetical exercise. Five oranges and six apples add up to eleven pieces of fruit; and so do eleven grapes, and eleven melons. So what? That is a useless and misleading piece of information, like the unqualified statement that the product is sound because we found what the script told us to look for. Testers, accountants and auditors all pick up good money because they are required to provide valuable information to people who need it. They should be expected to deal with messy, complex reality. They should not be allowed to get away with trying to redefine reality so it’s easier to handle.

They can’t handle the truth

Have you ever had to deal with managers or users who were sceptical about the time and effort a piece of work would take? Have you ever complained in vain about a project that was clearly doomed to fail right from the start? Have you ever felt that a project was being planned on the basis of totally unjustified optimism?

If you’ve been in IT for a while there’s a good chance you’ve answered “yes” to at least one of these questions. Over the years I grew wearily familiar with the pattern of wilful refusal to consider anything but the happy path to a smooth, speedy delivery of everything on the wish list, within a comical budget that is “challenging, I admit, but realistic if we all pull together”.

Over time I gradually came to realise that many senior managers and stakeholders didn’t want the truth. They wanted the fiction, to be lied to, because knowing the truth would make them responsible for dealing with it. In their world it is better to be deceived and then firefight a failing project than to deal honestly with likely problems and uncertainty. Above all, they can’t bring themselves to deal with the truth of uncertainty. It is far more comfortable to pretend that uncertainty is evidence of lack of competence, that problems can be anticipated, that risks can be ignored or managed out of existence, that complexity can be eliminated by planning and documentation (and by standards).

Telling the truth – a brave act in an unfair world

Perhaps the toughest roles in IT are those that are senior enough to be accountable for the results, but too junior to beat uncomfortable truths into the brains of those who really don’t want to know.

These budding fall guys have the nous and experience to see what is going to happen. One of the rarely acknowledged skills of these battle-scarred veterans is the ability to judge the right moment and right way to start shouting the truth loudly. Reveal all too early and they can be written off as negative, defeatist, “not a team player”. Reveal it too late and they will be castigated for covering up imminent failure, and failing to comply with some standard or process. Everyone fails to comply. Not everyone is going to be kicked for it, but late deliverers of bad news are dead meat.

Of course that’s not fair, but that’s hardly the point. Fairness isn’t relevant if the culture is one where rationality, prudence and pragmatism all lead to crazy behaviour because that is what is rewarded. People rationally adapt to the requirement to stop thinking when they see others being punished for honesty and insight.

What is an estimate?

So what’s the answer? The easy one is, “run, and run fast”. Get out and find a healthier culture. However, if you’re staying then you have to deal with the problem of handling senior people who can’t handle the truth.

It is important to be clear in your own mind about what you are being asked for when you have to estimate. Is it a quote? Is there an implied instruction that something must be delivered by a certain date? Are there certain deliverables that are needed by that date, and others that can wait? Could it be a starting point for negotiation? See this article I wrote a few years ago.

Honesty is non-negotiable

It’s a personal stance, but honesty about uncertainty and the likelihood of serious but unforeseeable problems is non-negotiable. I know others have thought I have a rather casual attitude towards job security and contract renewal! However, I can’t stomach the idea of lingering for years in an unhealthy culture. And it’s not as if honesty means telling the senior guys who don’t want the truth that they are morons (even if they are).

Honesty requires clear thinking, and careful explanation of doubt and uncertainty. It means being a good communicator, so that the guys who take the big decisions have a better understanding that your problems will quickly become their problems. It requires careful gathering of relevant information if you are ordered into a project death march so that you can present a compelling case for a rethink when there might still be time for the senior managers and stakeholders to save face. Having the savvy to help the deliberately ignorant to handle the truth really is a valuable skill. Perhaps Jack Nicholson’s character from “A Few Good Men” isn’t such a great role model, however. His honesty in that memorable scene resulted in him being arrested!

Why do you need the report?

Have you ever wondered what the purpose of a report was, whether it was a status report that you had to complete, or a report generated by an application? You may have wondered if there was any real need for the report, and whether anyone would miss it if no-one bothered to produce it.

I have come across countless examples of reports that seemed pointless. What was worse, their existence shaped the job we had to do. The reports did not help people to do the job. They dictated how we worked; production, checking and filing of the reports for future inspection were a fundamental part of the job. In any review of the project, or of our performance, they were key evidence.

My concern, and cynicism, were sharpened by an experience as an auditor when I saw at first hand how a set of reports was defined for a large insurance company. To misquote Otto von Bismarck’s comment on the creation of laws: reports are like sausages; it is best not to see them being made.

The company was developing a new access controls system, to allow managers to assign access rights and privileges to staff who were using the various underwriting, claims and accounts applications. As an auditor I was a stakeholder, helping to shape the requirements and advising on the controls that might be needed and on possible weaknesses that should be avoided.

One day I was approached by the project manager and a user from the department that defined the working practices at the hundred or so branch offices around the UK and Republic of Ireland. “What control reports should the access control system provide?” was their question.

I said that was not my decision. The reports could not be treated as a bolt on addition to the system. They should not be specified by auditors. The application should provide managers with the information they needed to do their jobs, and if it wasn’t feasible to do that in real time, then reports should be run off to help them. It all depended on what managers needed, and that depended on their responsibilities for managing access. The others were unconvinced by my answer.

A few weeks later the request for me to specify a suite of reports was repeated. Again I declined. This time the matter was escalated. The manager of the branch operations department sat in on the meeting. He made it clear that a suite of reports must be defined and coded by the end of the month, ready for the application to go live.

He was incredulous that I, as an auditor, would not specify the reports. His reasoning was that when auditors visited branches they would presumably check to see whether the reports had been signed and filed. I explained that it was the job of his department to define the jobs and responsibilities of the branch managers, and to decide what reports these managers would need in order to fulfil their responsibilities and do their job.

The manager said that was easy; it was the responsibility of the branch managers to look at the reports, take action if necessary, then sign the reports and file them. That was absurd. I tried to explain that this was all back to front. At the risk of stating the obvious, I pointed out that reports were required only if there was a need for them. That need had to be identified so that the right reports could be produced.

I was dismissed as a troublesome timewaster. The project manager was ordered to produce a suite of reports, “whatever you think would be useful”. The resulting reports were simply clones of the reports that came out from an older access control system, designed for a different technical and office environment, with quite different working practices.

The branch managers were then ordered to check them and file them. The branch operations manager had taken decisive action. The deadline was met. Everyone was happy, except the poor branch managers who had to wade through useless reports, and of course the auditors. We were dismayed at the inefficiency and sheer pointlessness of producing reports without any thought about what their purpose was.

That highlighted one of the weaknesses of auditors. People invariably listened to us if we pointed out that something important wasn’t being done. When we said that something pointless was being done there was usually reluctance to stop it.

Anything that people have got used to doing, even if it is wasteful, ineffective and inefficient, acquires its own justification over time. The corporate mindset can be “this is what we do, this is how we do it”. The purpose of the corporate bureaucracy becomes the smooth running of the bureaucracy. Checking reports was a part of a branch manager’s job. It required a mental leap to shift to a position where you have to think whether reports are required, and what useful reporting might comprise. It’s so much easier to snap, “just give us something useful” and move on. That’s decisive management. That’s what’s rewarded. Thinking? Sadly, that can be regarded as a self-indulgent waste of time.

However, few things are more genuinely wasteful of the valuable time of well paid employees than reporting that has no intrinsic value. Reporting that forces us to adapt our work to fit the preconceptions of the report designer gobbles up huge amounts of time and stops us doing work that could be genuinely valuable. The preconceptions that underpin many reports and metrics may once have been justified, and have fitted in with contemporary working practices. However, these preconceptions need to be constantly challenged and re-assessed. Reports and metrics do shape the way we work, and the way we are assessed. So we need to keep asking, “just why do you need the report?”

DRE: changing reality so we can count it

It’s usually true that our attitudes and beliefs are shaped by our early experiences. That applies to my views on software development and testing. My first experience of real responsibility in development and testing was with insurance financial systems. What I learned and experienced will always remain with me. I have always struggled with some of the tenets of traditional testing, and in particular the metrics that are often used.

There has been some recent discussion on Twitter about Defect Removal Efficiency. It was John Stephenson’s blog that set me thinking once again about DRE, a metric I’d long since consigned to my mental dustbin.

If you’re unfamiliar with the metric it is the number of defects found before implementation expressed as a percentage of all the defects discovered within a certain period of going live (i.e. live defects plus development defects). The cut off is usually 90 days from implementation. So the more defects reported in testing and the fewer in live running the higher the percentage, and the higher the quality (supposedly). A perfect application would have no live defects and therefore a DRE score of 100%; all defects were found in testing.
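
The calculation just described can be written down in a couple of lines, which also makes its weakness visible. The defect counts below are invented for illustration:

```python
# DRE as defined above: defects found before implementation as a
# percentage of all defects found within the cut-off period
# (typically 90 days after go-live). Counts are invented.
def defect_removal_efficiency(found_in_testing, found_live_in_cutoff):
    total = found_in_testing + found_live_in_cutoff
    return 100.0 * found_in_testing / total

print(defect_removal_efficiency(95, 5))    # 95.0
print(defect_removal_efficiency(100, 0))   # 100.0 - a "perfect" application

# The gaming problem: log twice as many trivial defects in testing
# and the score rises, though nothing about the product has changed.
print(round(defect_removal_efficiency(190, 5), 1))  # 97.4
```
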

John’s point was essentially that DRE can be gamed so easily that it is worthless. I agree. However, even if testers and developers tried not to manipulate DRE, even if it couldn’t be gamed at all it would still be an unhelpful and misleading metric. It’s important to understand why so we can exercise due scepticism about other dodgy metrics, and flawed approaches to software development and testing.

DRE is based on a view of software development, testing and quality that I don’t accept. I don’t see a world in which such a metric might be useful, and it contradicts everything I learned in my early days as a team leader, project manager and test manager.

Here are the four reasons I can’t accept DRE as a valid metric. There are other reasons, but these are the ones that matter most to me.

Software development is not a predictable, sequential manufacturing activity

DRE implicitly assumes that development is like manufacturing, that it’s a predictable exercise in building a well understood and defined artefact. At each stage of the process defects should be progressively eliminated, till the object is completed and DRE should have reached 95% (or whatever).

You can see this sequential mindset clearly in this article by Capers Jones, “Measuring Defect Potentials and Defect Removal Efficiency” (PDF, opens in new tab) from QA Journal in 2008.

“In order to achieve a cumulative defect removal efficiency of 95%, it will be necessary to use approximately the following sequence of at least eight defect removal activities:

• Design inspections
• Code inspections
• Unit test
• New function test
• Regression test
• Performance test
• System test
• External Beta test

To go above 95%, additional removal stages will be needed. For example requirements inspections, test case inspections, and specialized forms of testing such as human factors testing, performance testing, and security testing add to defect removal efficiency levels.”

Working through sequential “removal stages” is not software development or testing as I recognise them. When I was working on these insurance finance systems there was no neat sequence through development with defects being progressively removed. Much of the early development work could have been called proof of concept. It wasn’t a matter of coding to a specification and then unit testing against that spec. We were discovering more about the problem and experimenting to see what would work for our users.

Much of what we tried did not work. Each of these “failures” was a precious nugget of extra information about the problem we were trying to solve. The idea that we would have improved quality by recording everything that didn’t work and calling it a defect would have been laughable. Yet this is the implication of another statement by Capers Jones in a paper on the International Function Point Users Group website (December 2012), “Software Defect Origins and Removal Methods” (PDF, opens in new tab).

“Omitting bugs found in requirements, design, and by unit testing are common quality omissions.”

So experimenting to learn more about the problem without treating the results as formal defects is a quality omission? Tying up developers and testers in bureaucracy by extending formal defect management into unit testing is the way to better quality? I don’t think so.

Once we start to change the way people work simply so that we can gather data for metrics we are not simply encouraging them to game the system. It is worse than that. We are trying to change reality to fit our ability to describe it. We are pretending we can change the territory to fit the map.

Quality is not an absence of something

My second objection to DRE in principle is quite simple. It misrepresents quality. “Quality is value to some person” as Jerry Weinberg famously said in his book “Quality Software Management: Systems Thinking”.

The insurance applications we were developing were intended to help our users understand the business and products better so that they could take better decisions. The quality of the applications was a matter of how well they helped our users to do that. These users were very smart and had a very clear idea of what they were doing and what they needed. They would have bluntly and correctly told us we were stupid and trying to confuse matters by treating quality as an absence of defects. That takes me on to my next objection to DRE.

Defects are not interchangeable objects

A defect is not an object. It possesses no qualities except those we choose to grant it in specific circumstances. In the case of my insurance applications a defect was simply something we didn’t understand that required investigation. It might be a problem with the application, or it might be some feature of the real world that we hadn’t known about and which would require us to change the application to handle it.

We never counted defects. What is the point of adding up things I don’t understand or don’t know about? I don’t understand quantum physics and I don’t know off hand what colour socks my wife is wearing today. Adding the two pieces of ignorance together to get two is not helpful.

Our acceptance criteria never mentioned defect numbers. The criteria were expressed in accuracy targets against specific oracles, e.g. we would have to reconcile our figures to within 5% of the general ledger. What was the basis for the 5% figure? Our users knew from experience that 95% accuracy was good enough to let them take significantly better decisions than they could without the application. 100% was an ideal, but the users knew that the increase in development time to try and reach that level of accuracy would impose a significant business cost because crucial decisions would have had to be taken blindfolded while we tried to polish up a perfect application.

If there was time we would investigate discrepancies even within the 5% tolerance. If we went above 5% in testing or live running then that was a big deal and we would have to respond accordingly.
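The sort of oracle check described above can be sketched in a few lines. This is a hypothetical reconstruction, with invented function names and figures; only the 5% tolerance comes from the article.

```python
# Hypothetical sketch of a reconciliation oracle: compare the
# application's total against the general ledger figure and apply
# the tolerance agreed with the users. All numbers are invented.

def reconcile(app_total, ledger_total, tolerance=0.05):
    """Return the discrepancy ratio and whether it is within tolerance."""
    discrepancy = abs(app_total - ledger_total) / ledger_total
    return discrepancy, discrepancy <= tolerance

disc, ok = reconcile(app_total=9_700_000, ledger_total=10_000_000)
print(f"{disc:.1%} discrepancy, acceptable: {ok}")  # 3.0% discrepancy, acceptable: True
```

Note that passing the check did not end the matter: a discrepancy inside the tolerance still merited investigation when time allowed, but only a breach of the 5% line was treated as a blocking problem.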

You may think that this was a special case. Well yes, but every project has its own business context and user needs. DRE assumes a standard world in which 95% DRE is necessarily better than 90%. The additional cost and delay of chasing that extra 5% could mean the value of the application to the business is greatly reduced. It all depends. Using DRE to compare the quality of different developments assumes that a universal, absolute standard is more relevant than the needs of our users.

Put simply, when we developed these insurance applications, counting defects added nothing to our understanding of what we were doing or our knowledge about the quality of the software. We didn’t count test cases either!

DRE has a simplistic, standardised notion of time

This problem is perhaps related to my earlier objection that DRE assumes developers are manufacturing a product, like a car. Once it rolls off the production line it should be largely defect free. The car then enters its active life and most defects should be revealed fairly quickly.

That analogy made no sense for insurance applications, which are highly date sensitive. Insurance contracts might be paid for up front, or in instalments, but they earn money on a daily basis. At the end of the contract period, typically a year, they have to be renewed. The applications consist of different elements performing distinct roles according to different timetables.

DRE requires an arbitrary cut-off beyond which you stop counting the live defects and declare a result. It’s usually 90 days. Applying a 90 day cut-off for calculating DRE and using that as a measure of quality would have been ridiculous for us. Worse, if that had been a measure for which we were held accountable it would have distorted important decisions about implementation. With new insurance applications you might convert all the data from the old application when you implement the new one. Or you might convert policies as they come up for renewal.

Choosing the right tactics for conversion and implementation was a tricky exercise balancing different factors. If DRE with a 90 day threshold were applied then different tactics would give different DRE scores. The team would have a strong incentive to choose the approach that would produce the highest DRE score, and not necessarily the one that was best for the company.
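The distortion is easy to demonstrate with a toy calculation. DRE is conventionally the percentage of all defects that were removed before release, counting live defects only up to the cut-off. The numbers below are invented, but they show how two implementation tactics for the same software would earn very different scores under a 90 day window.

```python
# Toy illustration (invented numbers) of how a 90 day cut-off
# rewards one implementation tactic over another.

def dre(pre_release_defects, live_defect_days, window_days=90):
    """DRE %: pre-release defects over all defects found within the window."""
    live_in_window = sum(1 for day in live_defect_days if day <= window_days)
    return 100.0 * pre_release_defects / (pre_release_defects + live_in_window)

# Same application, same 40 pre-release defects. A big-bang data
# conversion surfaces date-sensitive problems quickly; converting
# policies as they renew spreads the same problems over a year.
big_bang   = [5, 12, 30, 45, 60, 70, 85, 88]          # days after go-live
on_renewal = [20, 95, 140, 200, 260, 300, 330, 360]   # same defects, later

print(dre(40, big_bang))    # ≈ 83.3 — looks worse
print(dre(40, on_renewal))  # ≈ 97.6 — looks better, for identical software
```

Nothing about the software has changed between the two runs; only the timetable on which its problems become visible.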

Now of course you could tailor the way DRE is calculated to take account of individual projects, but the whole point of DRE is that people who should know better want to make comparisons across different projects, organisations and industries and decide which produces greater quality. Once you start allowing for all these pesky differences you undermine that whole mindset that wants to see development as a manufacturing process that can be standardised.

DRE matters – for the wrong reasons

DRE might be flawed beyond redemption but metrics like that matter to important people, for all the wrong reasons. The logic is circular. Development is like manufacturing, therefore a measure that is appropriate for manufacturing should be adopted. Once it is being used to beat up development shops that score poorly, they have an incentive to distort their processes to fit the measure. You have to buy in the consultancy support to adapt the way you work. The flawed metric then justifies the flawed assumptions that underpin it. It might be logical nonsense, but there is money to be made from it.

So DRE is meaningless because it can be gamed? Yes, indeed, but any serious analysis of the way DRE works reveals that it would be a lousy measure, even if everyone tries to apply it responsibly. Even if it were impossible to game it would still suck. It’s trying to redefine reality so we can count it.

Binary opinions? Yes or no?

I am giving a half day tutorial at EuroSTAR this year (2013), so not surprisingly that has forced me to think around the subject, “questioning auditors questioning testing”.

Over the last few weeks I have been struck by the number of times that I have come across one very interesting word – binary.

It’s an important concept, and one that matters hugely in both professions. However, I have become increasingly aware that testing and auditing are taking very different approaches to it.

Testing and checking

Discussion of binary results in testing is usually tied in with the debate about the distinction between testing and checking. James Bach and Michael Bolton set out the argument clearly here.

I think this is an important distinction. There are still regiments of testers beavering away with detailed test scripts, checking test results against their preconceptions, rather than exploring the application and performing deep, thoughtful testing. The testing establishment from which these traditional testers take their lead, directly or indirectly, have not engaged with the debate. They have given the unfortunate, and probably accurate, impression that they regard checking and testing as being effectively synonymous in practice.

Reality isn’t binary

Rikard Edgren gave a very good talk on the specific idea of binary opinions at Øredev 2011 and Let’s Test 2012. Here is the Øredev talk.

The slides for Let’s Test are here (opens in new tab). Rikard also wrote a blog on the subject. The key phrase I picked up from Rikard was:

Reality isn’t binary, we can communicate noteworthy information – we don’t know everything in advance.

I’m not going to get further into that debate here. I just want to illustrate the contrast with auditing where Rikard’s comment resonates strongly.

Two types of binary opinion (naturally!)

Firstly, I’d better explain that the type of binary opinions vary depending on whether one is talking about internal or external auditing. In internal auditing they would take the form of pass/fail checking of controls. Are they present? Are they complied with?

Binary opinions in external auditing have historically been largely about the truth and fairness of the company accounts, or about whether the company is a going concern. That has been the core of the external audit report. In recent years there has been the added requirement imposed by the Sarbanes-Oxley Act for US companies to express an opinion on whether the framework of internal controls is effective.

Binary opinions in internal audit – a relic of yesteryear

There isn’t a great deal of debate in internal auditing circles about binary opinions. Traditional internal auditing focused on internal controls. The debate has been held, and the overwhelming consensus, at least in informed circles, is that any audit offering only binary opinions is limited, blinkered and hopelessly outdated.

I like the definition of internal controls from Anthony Catenach.

Internal controls are how management makes sure the company’s business model is operating correctly.

If you view internal controls in that broader perspective then you should be able to see how simple binary opinions are unhelpful. Auditors need to set their findings in context and explain why they are significant and what danger they pose. Simply saying that certain controls are missing, or have not been applied, is unhelpful. That’s not to say that such simplistic audits have vanished.

A couple of weeks ago I was speaking to a friend who works as a developer for a multinational company. He told me that the internal auditors work from a checklist, using questions that require yes/no answers. People are very wary of the auditors and answer only direct questions without offering anything more. It horrifies me that auditors should ever accept the answer “yes” or “no” without following up with “why?”.

That sort of auditing is ineffective and unprofessional. I can’t stress strongly enough that audit checklists have a place, but they are not the audit! They are merely the starting point for a conversation.

Previously, to illustrate how an audit interview is conducted, I have used the analogy of an advocate (barrister or attorney) questioning a witness in court. The advocate cannot know what answer the witness will give and has to vary the follow-up questions accordingly, rather than ploughing on with a prepared script. Conducting an audit by checklist is very much like sticking to the script regardless of the answers.

This is now orthodox modern opinion. The opinion formers, the leading lights of the profession, know that binary opinions are dated and the debate has moved on to risk: how can auditors inform stakeholders about the risks that matter, the risks that keep them awake at night? How can auditors help management to understand the risks they are facing and to take better informed decisions about them?

Regulators and binary opinions in external audit

The debate about binary opinions in internal audit may be largely over but it is still very much alive in external audit. The regulators in the UK and the USA are pushing hard for auditors to provide more useful opinions in their reports rather than relying on simple, and frequently misleading, binary opinions.

The response from the Big 4 audit firms has been cool, but telling the regulators to take a hike is politically tricky! They have to engage with the debate. It’s not good enough for them to defend current practices. The problems with these are glaringly obvious, so they have to respond constructively.

The position is slightly confused by Sarbanes-Oxley’s requirement that external auditors state whether they believe the framework of internal controls is effective. That takes them into internal audit territory, and raises concerns about whether such a judgement can be accurate or helpful. Certainly the experience of recent years isn’t encouraging.

There are countless examples of companies whose accounts have been passed by their external auditors, only to collapse from problems that existed before the audit was conducted. Remember Enron? That debacle led to the demise of one of the world’s biggest firms of accountants, Arthur Andersen. Remember the banks who collapsed? All sailed through their audits, with the auditors picking up multi-million pound fees for offering opinions that proved groundless.

I’m not suggesting that these fees were too high. Perhaps they were too low and worthwhile audits and opinions would be more expensive. However, I am saying that the current reporting regime, with too much emphasis on binary opinions, provides lousy value for money. That is not a minority view. It is the view of the regulators in the UK and USA. It will be interesting to see where the EU moves in this regard.

Testers are not alone

This is far too big and complex an area for me to cover in any detail either now or in my tutorial at EuroSTAR, even if it is of interest to any testers apart from me! However, I think it’s important to understand that there is a big and influential profession wrestling with some of the issues facing testers.

Auditors have to think about how they work, what value they provide, what they should look for, what knowledge they can reasonably provide. Indeed, the more thoughtful auditors are thinking about what knowledge means in their context, how they can “know” things, what constitutes evidence and opinion.

This is epistemology, and it is fascinating. Thinking about this is not some esoteric academic exercise. If we are not clear about what we can know and how we should investigate and report on the knowledge that is available then the danger is that we will end up just faking the whole exercise. We will continue to dress up subjective opinions as “objective” binary verdicts; “yes” this is ok, “no” it isn’t.

Reality doesn’t become clearer simply by pretending that it can be reduced to binary opinions. Quite the reverse, messy reality is obscured by a binary approach. Auditors know that, or at least the clever ones do. There are plenty of smart and capable auditors out there, trying to make sense of what is going on.

The good ones are natural allies of good testers. Seek them out and make them your allies. As for the bad ones, well they are still around as my friend can testify. Their approach is inept and unprofessional. It might not be wise to use these words! It might be interesting to ask them some difficult questions about how they can square their approach with the views of the auditing establishment, the professional bodies and the regulators.

It’s a pity that the self-appointed testing establishment, ISTQB and ISO, can’t take a similarly clear line. Sadly their silence effectively endorses binary opinions. Self-appointed shouldn’t mean self-interested.

How am I wasting your time?

At the weekend I was reading this fascinating column by Oliver Burkeman on cutting out time-wasting activities. He talks about the importance of having a “stop doing” list, as well as a “to do” list. It’s an interesting, well-written piece, mainly about the work of Peter Drucker.

I was particularly interested in this quote from Drucker.

“if you’re a boss, develop the habit of asking your underlings, ‘without coyness’, the one question that will trigger more improvement than any other: ‘What do I do that wastes your time without contributing to your effectiveness?’”

The team culture quadrant

One of the early lessons I learned for myself when I started managing teams was how the culture and strength of the team largely dictated the manager’s job.
[Figure: team culture quadrant]

I formulated a rough quadrant illustrating what I felt my priorities were, shaped by experiences with two employers, with differing cultures.

If the team is weak, consisting of inexperienced or poorly performing members then the priority is to help the team shape up; assisting the willing but inexperienced as they develop, removing the time wasters if possible, and at least preventing them from disrupting the productive team members.

If the culture is healthy then this is a challenging, but relatively straightforward and certainly rewarding, role, provided that the manager really understands what the team are supposed to be doing.

If the team is strong but the culture is unhealthy then the job of the manager is to protect the team from distraction and problems that would waste their time. The manager deals with the crap so the team doesn’t have to. The manager should also be trying to change the culture, pointing out the problems, arguing for improvements and generally trying to shape the environment so that good teams can do good work as efficiently, effectively and happily as possible.

That’s obviously a tough task, and it might not be possible for an individual to bring about serious improvement, but it’s better to fight constructively than to suffer passively.

I was managing a team in this position once, and a programmer asked me what on earth I did with my time. He couldn’t see what work I was doing. He was a friend, so he knew I wouldn’t take offence. I actually regarded it as confirmation that I was doing a good job.

The team were all high calibre and hard working. I had to spend a lot of time handling the users, turning mushy requirements into stuff we could work with, negotiating with other departments. In the meantime the team were whizzing along in fine style, oblivious to the problems that they weren’t hitting. If the team is in the groove it’s just fine by me if they’re taking that state of affairs for granted.

The job gets really tough and stressful when the manager is faced with a weak team in an unhealthy culture; endless unproductive meetings, pointless reports, meaningless metrics for layers of management who can’t understand them, and overly detailed and prescriptive plans that pretend we can know what individuals will be doing in a few months. Yup, I’ve been there and got paid a good salary for doing little that was genuinely valuable. Meanwhile the team is floundering and needing positive, patient, time-consuming support. This is where you earn your ulcers.

The time-wasting rubbish is inescapable, but the priority has to be to strengthen the team. Here it is particularly important to weed out those who are slowing the team down; the idle, the awkward, the cheerfully incompetent who have no interest in improving. It’s never easy, unless they’re contractors. The poor performers are generally well known, and no-one is going to willingly take them off your hands.

So – exactly how am I wasting your time?

If you get to the point where you have a strong team in a healthy culture then that’s great, for the short term at least. The trouble is that you are almost redundant, and you need to be careful that you are not getting in the way of the team. If things are humming along smoothly it’s not a great idea to coast. Sooner or later someone is going to twig that you aren’t contributing much. It’s far better to speak up and point out that your skills are being under-utilised, and maybe it’s time to move on to a new role.

I managed to work all that out for myself, and I still stand by it all. However, I’d never explicitly thought of that point from Peter Drucker; why not ask team members what you do as a manager that wastes their time?

It’s exactly when you might feel you’ve arrived at the happy combination of strong team and healthy culture that the sole, significant remaining problem could be you. I can’t believe I missed it. I’m sure Drucker was right and that it could be the most important question you could ask your team. How am I wasting your time?