David Graeber’s “The Utopia of Rules: On Technology, Stupidity and the Secret Joys of Bureaucracy”

When I gave my talk at CAST 2014 in New York, “Standards – promoting quality or restricting competition?”, I was concentrating on the economic aspects of standards. They are often valuable, but they can be damaging and restrict competition if they are misused. A few months later I bought “The Utopia of Rules: On Technology, Stupidity, and the Secret Joys of Bureaucracy” by David Graeber, Professor of Anthropology at the London School of Economics. I was familiar with Graeber as a challenging and insightful writer, and I drew on his work when I wrote “Testing: valuable or bullshit?”. The Utopia of Rules also inspired the blog article I wrote recently, “Frozen in time – grammar and testing standards”, in which I discussed the similarity between grammar textbooks and standards, both of which codify old usages and practices that no longer match the modern world.

What I hadn’t expected from The Utopia of Rules was how strongly it would support the arguments I made at CAST.

Certification and credentialism

Graeber makes the same argument I deployed against certification: it is increasingly being used to enrich special interests without benefiting society. On page 23 Graeber writes:

Almost every endeavor that used to be considered an art (best learned through doing) now requires formal professional training and a certificate of completion… In some cases, these new training requirements can only be described as outright scams, as when lenders, and those prepared to set up the training programs, jointly lobby the government to insist that, say, all pharmacists be henceforth required to pass some additional qualifying examination, forcing thousands already practicing the profession into night school, which these pharmacists know many will only be able to afford with the help of high-interest student loans. By doing this, lenders are in effect legislating themselves a cut of most pharmacists’ subsequent incomes.

To be clear, my stance on ISTQB training is that it educates testers in a legitimate, though very limited, vision of testing. My objection is to any marketing of the qualification as a certification of testing ability, rather than confirmation that the tester has passed an exam associated with a particular training course. I object even more strongly to any argument that possession of the certificate should be a requirement for employment, or for contracting out testing services. It is reasonable to talk of scams when the ability of good testers to earn a living is damaged.

What is the point of it all?

Graeber has interesting insights into how bureaucrats can be vague about the values of the bureaucracy: why does the organisation exist? Bureaucrats focus on efficient execution of rational processes, but what is the point of it all? Often the means become the ends: efficiency is an end in itself.

I didn’t argue that point at CAST, but I have done so many times in other talks and articles (e.g. “Teddy bear methods”). If people are doing a difficult, stressful job and you give them prescriptive methods, processes or standards then they will focus on ticking their way down the list. The end towards which they are working becomes compliance with the process, rather than helping the organisation reach its goal. They see their job as producing the outputs from the process, rather than the outcomes the stakeholders want. I gave a talk in London in June 2015 to the British Computer Society’s Special Interest Group in Software Testing in which I argued that testing lacks guiding principles (PDF, opens in a new tab) and that ISO 29119 in particular does not offer clear guidance about the purpose of testing.

In a related argument Graeber makes a point that will be familiar to those who have criticised the misuse of testing metrics.

…from inside the system, the algorithms and mathematical formulae by which the world comes to be assessed become, ultimately, not just measures of value, but the source of value itself.

Rent extraction

The most controversial part of my CAST talk was my argument that the pressure to adopt testing standards was entirely consistent with rent seeking in economic theory. Rent seeking, or rent extraction, is what people do when they exploit failings in the market, or rig the market for their own benefit by lobbying for regulation that happens to benefit them. Instead of creating wealth, they take it from other people in a way that is legal, but which is detrimental to the economy, and society, as a whole.

This argument riled some people who took it as a personal attack on their integrity. I’m not going to dwell on that point. I meant no personal slur. Rent seeking is just a feature of modern economies. Saying so is merely being realistic. David Graeber argued the point even more strongly.

The process of financialization has meant that an ever-increasing proportion of corporate profits come in the form of rent extraction of one sort or another. Since this is ultimately little more than legalized extortion, it is accompanied by ever-increasing accumulation of rules and regulations… At the same time, some of the profits from rent extraction are recycled to select portions of the professional classes, or to create new cadres of paper-pushing corporate bureaucrats. This helps a phenomenon I have written about elsewhere: the continual growth, in recent decades, of apparently meaningless, make-work, “bullshit jobs” — strategic vision coordinators, human resources consultants, legal analysts, and the like — despite the fact that even those who hold such positions are half the time secretly convinced they contribute nothing to the enterprise.

In 2014 I wrote about “bullshit jobs”, prompted partly by one of Graeber’s articles. It’s an important point. It is vital that testers define their job so that it offers real value, and they are not merely bullshit functionaries of the corporate bureaucracy.

Utopian bureaucracies

I have believed for a long time that adopting highly prescriptive methods or standards for software development and testing places unfair pressure on people, who are set up to fail. Graeber makes exactly the same point.

Bureaucracies public and private appear — for whatever historical reasons — to be organized in such a way as to guarantee that a significant proportion of actors will not be able to perform their tasks as expected. It’s in this sense that I’ve said one can fairly say that bureaucracies are utopian forms of organization. After all, is this not what we always say of utopians: that they have a naïve faith in the perfectibility of human nature and refuse to deal with humans as they actually are? Which is, are we not also told, what leads them to set impossible standards and then blame the individuals for not living up to them? But in fact all bureaucracies do this, insofar as they set demands they insist are reasonable, and then, on discovering that they are not reasonable (since a significant number of people will always be unable to perform as expected), conclude that the problem is not with the demands themselves but with the individual inadequacy of each particular human being who fails to live up to them.

Testing standards such as ISO 29119, and its predecessor IEEE 829, don’t reflect what developers and testers do, or rather should be doing. They are at odds with the way people think and work in organisations. These standards attempt to represent a highly complex, sometimes chaotic, process in a defined, repeatable model. The end product is usually of dubious quality, late and over budget. Any review of the development will find constant deviations from the standard. The suppliers, and defenders, of the standard can then breathe a sigh of relief. The sacred standard was not followed. It was the team’s fault. If only they’d done it by the book! The possibility that the developers’ and testers’ apparent sins were the only reason anything was produced at all is never considered. This is a dreadful way to treat people, but in many organisations it has been normal for several decades.

Loss of communication

All of the previous arguments by Graeber were entirely consistent with my own thoughts about how corporate bureaucracies operate. It was fascinating to see an anthropologist’s perspective, but it didn’t teach me anything that was really new about how testers work in corporations. However, later in the book Graeber developed two arguments that gave me new insights.

Understanding what is happening in a complex, social situation needs effective two-way communication. This requires effort, “interpretive labor”. The greater the degree of compulsion, and the greater the bureaucratic regime of rules and forms, the less need there is for such two-way communication. Those who can simply issue orders that must be obeyed don’t have to take the trouble to understand the complexities of the situation they’re managing.

…within relations of domination, it is generally the subordinates who are effectively relegated the work of understanding how the social relations in question really work. … It’s those who do not have the power to hire and fire who are left with the work of figuring out what actually did go wrong so as to make sure it doesn’t happen again.

This ties in with the previous argument about utopian bureaucracies. If you impose an inappropriate standard then poor results will be attributed to the inevitable failure to comply. There is no need for senior managers to understand more, and no need to listen to the complaints, the “excuses”, of the people who do understand what is happening. Interestingly, Graeber’s argument about interpretive labor is consistent with regulatory theory. Good regulation of complex situations requires ongoing communication between the regulator and the regulated. I explained this in the talk on testing principles I mentioned above (slides 38 and 39).

Fear of play

My second new insight from Graeber arrived when he discussed the nature of play and how it relates to bureaucracies. Anthropologists try to maintain a distinction between games and play, a distinction that is easier to maintain in English than in languages like French and German, which use the same word for both. A game has boundaries, set rules and a predetermined conclusion. Play is more free-form and creative. Novelties and surprising results emerge from the act of playing. It is a random, unpredictable and potentially destructive activity. Graeber finishes his discussion of play and games with a striking observation:

What ultimately lies behind the appeal of bureaucracy is fear of play.

Put simply, and rather simplistically, Graeber means that we use bureaucracy to escape the terror of chaotic reality, to bring a semblance (an illusion?) of control to the uncontrollable.

This gave me a tantalising new insight into the reasons people build bureaucratic regimes in organisations. It sent me off into a whole new field of reading on the anthropology of games and play. This has fascinating implications for the debate about standards and testing. We shy away from play, but it is through play that we learn. I don’t have time now to do the topic justice, and it’s much too big and important a subject to be tacked on to the end of this article, but I will return to it. It is yet another example of the way anthropology can help us understand what we are doing as testers. As a starting point I can heartily recommend David Graeber’s book, “The Utopia of Rules”.

A single source of truth?

Lately in a chatroom for the International Society for Software Testing there has been some discussion about the idea of a “single source of truth”. I’m familiar with this in the sense of database design. Every piece of data is stored once and the design precludes the possibility of inconsistency, of alternative versions of the same data. That makes sense in this narrow context, but the discussion revealed that the phrase is now being used in a different sense. A single source of truth has been used to describe an oracle of oracles, an ultimate specification on which total reliance can be placed. The implications worry me, especially for financial systems, which is my background.

I’m not comfortable with a single source of truth, especially when it applies to things like bank balances, profit and loss figures, or indeed any non-trivial result of calculations. What might make more sense is to talk of a single statement of truth, and that statement could, and should, have multiple sources so that the statement is transparent and can be validated. However, I still wouldn’t want to talk about truth in financial statements. For an insurance premium there are various measures, which have different uses for different people at different times. When people start talking about a single, true premium figure they are closing off their minds to reality and trying to redefine it to suit their limited vision.

All of these competing measures could be regarded as true in the right context, but there are other measures which are less defensible and which an expert would consider wrong, or misleading, in any context (e.g. lumping Insurance Premium Tax into the premium figure). That’s all quite aside from the question of whether these measures are accurate on their own terms.

A “single source of truth” reminds me of arguments I’d have with application designers. Sometimes the problem would be that they wanted to eliminate any redundancy in the design. That could make reconciliation and error detection much harder because the opportunities to spot errors would be reduced. If a calculation was wrong it might stay wrong because no-one would know. A different source of friction was the age-old problem of analysts and designers determined to stick rigidly to the requirements without questioning them, or even really thinking about the implications. I suspect I was regarded as a pedantic nuisance, creating problems in places the designers were determined no problems could ever exist – or ever be visible.
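
To make the point concrete, here is a minimal sketch in Python, with invented figures: a premium total derived from the individual policy records is reconciled against a control total keyed in separately. In a design that keeps only one copy of the figure there is nothing to reconcile against, and a mis-keyed amount simply becomes “the truth”.

```python
# Hypothetical figures: the same premium total derived from two independent sources.
policy_premiums = {"P001": 1200.00, "P002": 850.00, "P003": 430.00}

# Source 1: sum of the individual policy records.
total_from_policies = sum(policy_premiums.values())

# Source 2: a control total keyed in separately from the source documents.
# Here something has been mis-keyed somewhere, so the two figures disagree.
batch_control_total = 2530.00

tolerance = 0.01
difference = total_from_policies - batch_control_total
if abs(difference) > tolerance:
    print(f"Reconciliation failure: policies sum to {total_from_policies:.2f}, "
          f"control total is {batch_control_total:.2f} (difference {difference:.2f})")
else:
    print("Totals reconcile")
```

The redundancy is the point: it is only because the figure exists twice, derived independently, that the error is visible at all.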

Accounting for truth

Conventional financial accounting is based on double-entry book-keeping, which requires every transaction to be entered twice, in different places, so that the accounts as a whole remain in balance. There may be a single, definitive statement of profit, but that is distilled from multiple sources, with an intricate web of balances and documented, supporting assumptions. The whole thing is therefore verifiable, or auditable. But it’s not truth. It’s more a matter of saying “given these assumptions this set of transactions produces the following profit figure”. Vary the assumptions and you have a different and perhaps equally valid figure – so it’s not truth.
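
As a rough illustration of the mechanism (the accounts and amounts here are invented), every transaction is posted to two accounts, the ledger as a whole must sum to zero, and the profit figure is derived from those postings and the stated assumptions rather than stored anywhere as a single “truth”:

```python
from collections import defaultdict

# account name -> balance; debits are positive, credits are negative
ledger = defaultdict(float)

def post(debit_account, credit_account, amount):
    """Post one transaction twice: once as a debit, once as a matching credit."""
    ledger[debit_account] += amount
    ledger[credit_account] -= amount

post("bank", "sales income", 600.00)   # sell the finished product
post("cost of sales", "bank", 500.00)  # pay for the inputs

# Trial balance: because every entry has an equal and opposite entry,
# the whole ledger must sum to zero. If it doesn't, something is wrong.
assert abs(sum(ledger.values())) < 0.005, "ledger out of balance"

# The profit figure is distilled from the postings, given these assumptions.
profit = -ledger["sales income"] - ledger["cost of sales"]
print(f"Profit on these assumptions: {profit:.2f}")  # 100.00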

For many years academic accountants, e.g. Christopher Napier, have been doing fascinating work that strays over into philosophy. What is this reality that we are trying to understand? That’s ontology. What can we know about it, and what reliance can we put on that knowledge when we try to report it? That’s epistemology. Why are we doing it? That’s teleology.

The most interesting subject I ever studied in accountancy at university was the problem of inflation accounting. £6-£5=£1 might be a crude profit calculation for an item whose inputs cost you £5 and which you sold for £6. But what if the £5 was a cost incurred 11 months ago? You then buy replacement inputs, which now cost £7, but you’d still only be able to sell the finished product for £6. What does it mean to say you made a profit of £1? Who does that help? Couldn’t you also argue that you made a loss of £1?
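
Written out, the two competing calculations in that example are:

\[
\text{historical-cost profit} = \pounds 6 - \pounds 5 = \pounds 1,
\qquad
\text{replacement-cost profit} = \pounds 6 - \pounds 7 = -\pounds 1
\]

Both figures come from the same transactions; they differ only in the assumption about which cost is relevant.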

What does it mean to add money together when the different elements were captured at dates when the purchasing power equivalent of that money was different? You’re adding apples and oranges. The value of money is dependent on what it can buy. Setting aside short-term speculation, that is what dictates currency exchange rates. £1 is more valuable than €1 because it buys more. It is meaningless to add £1 + €1 and get 2. An individual currency has different values over time, so is it any more meaningful to add different monetary figures without considering what their value was at the time the data was captured?

The academics pointed out all the problems inflation caused and came up with possible, complicated solutions. However, the profession eventually decided it was all just too difficult and pretty much gave up, except for an international standard for accounting in countries experiencing hyper-inflation (defined as greater than 100% over three years, i.e. a persisting annual rate of at least 26%). As at the end of 2014 the qualifying countries are Belarus, Venezuela, Sudan, Iran and Syria (which has rather more to worry about than financial accounting). For the rest of the world, if you want to add 5 apples and 6 oranges, that’s fine. You’ve now got 11 pieces of fruit. Stop worrying and just do the job.
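
The 26% figure follows from simple compounding (my arithmetic, not a quotation from the standard):

\[
(1 + 0.26)^3 \approx 2.00,
\quad \text{i.e. roughly } 100\% \text{ cumulative inflation over three years.}
\]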

I’m the treasurer for a church, and I’m often asked how much money we’ve got. I never bother going to the online bank statement, because I know that what people really want to know is how much money is available. So I use the church accounts, which factor in the income and payments that haven’t been cleared, and the money we’re due imminently, and the outgoings to which we’re already committed. These different figures all mesh together and provide a figure that we find useful, but which is different from the bank’s view of our balance. Our own accounts never rely on a single source of truth. There are multiple reconciliation checks to try and flag up errors. The hope is that inputting an incorrect amount will generate a visible error. We’re not reporting truth. All we can say is, so far as we know this is as useful and honest a statement of our finances as we can produce for our purposes, for the Church of Scotland, the Office of the Scottish Charity Regulator and the other stakeholders.
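
A rough sketch of that calculation, with invented figures, just to show that the useful answer is assembled from several sources rather than read off a single statement:

```python
# Invented figures: "how much money have we got?" for the church accounts.
bank_statement_balance = 5200.00   # the bank's view of our balance
uncleared_income       = 300.00    # income received but not yet on the statement
money_due_imminently   = 250.00    # income we know is about to arrive
uncleared_payments     = 450.00    # cheques written but not yet cashed
committed_outgoings    = 1200.00   # spending we are already committed to

available_funds = (bank_statement_balance
                   + uncleared_income
                   + money_due_imminently
                   - uncleared_payments
                   - committed_outgoings)

print(f"Bank's view of the balance: {bank_statement_balance:.2f}")
print(f"Available for our purposes: {available_funds:.2f}")
```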

It’s messy and complex – deal with it

What’s it all got to do with testing? If your vision of testing is checking whether the apparent functionality is consistent with the specification as represented in the test script then this sort of messy complexity is a tedious distraction. It’s so much easier to pretend you can confirm the truth using a test script.

However, testing is (or should be) a difficult and intellectually demanding process of teasing out the implications of the application for the stakeholders. If you accept that, then you are far more likely to do something valuable if you stop thinking about any single source of truth. You should be thinking instead about possible sources of insight to help you shed light on the various “truths” that the various stakeholders are seeking. Understanding these different needs, and all the nuances that arise from them is essential for testers.

Assuming that there is a single truth that we can attest to with a simple, binary yes/no answer reduces testing to the level of the accountants who have tried to treat accountancy as a simple arithmetical exercise. Five oranges and six apples add up to eleven pieces of fruit; and so do eleven grapes, and eleven melons. So what? That is a useless and misleading piece of information, like the unqualified statement that the product is sound because we found what the script told us to look for. Testers, accountants and auditors all pick up good money because they are required to provide valuable information to people who need it. They should be expected to deal with messy, complex reality. They should not be allowed to get away with trying to redefine reality so it’s easier to handle.

They can’t handle the truth

Have you ever had to deal with managers or users who were sceptical about the time and effort a piece of work would take? Have you ever complained in vain about a project that was clearly doomed to fail right from the start? Have you ever felt that a project was being planned on the basis of totally unjustified optimism?

If you’ve been in IT for a while there’s a good chance you’ve answered “yes” to at least one of these questions. Over the years I grew wearily familiar with the pattern of wilful refusal to consider anything but the happy path to a smooth, speedy delivery of everything on the wish list, within a comical budget that is “challenging, I admit, but realistic if we all pull together”.

Over time I gradually came to realise that many senior managers and stakeholders didn’t want the truth. They want the fiction; they want to be lied to, because knowing the truth would make them responsible for dealing with it. In their world it is better to be deceived and then firefight a failing project than to deal honestly with likely problems and uncertainty. Above all, they can’t bring themselves to deal with the truth of uncertainty. It is far more comfortable to pretend that uncertainty is evidence of a lack of competence, that problems can be anticipated, that risks can be ignored or managed out of existence, that complexity can be eliminated by planning and documentation (and by standards).

Telling the truth – a brave act in an unfair world

Perhaps the toughest roles in IT are those that are senior enough to be accountable for the results, but too junior to beat uncomfortable truths into the brains of those who really don’t want to know.

These budding fall guys have the nous and experience to see what is going to happen. One of the rarely acknowledged skills of these battle-scarred veterans is the ability to judge the right moment, and the right way, to start shouting the truth loudly. Reveal all too early and they can be written off as negative, defeatist, “not a team player”. Reveal it too late and they will be castigated for covering up imminent failure, and for failing to comply with some standard or process. Everyone fails to comply. Not everyone is going to be kicked for it, but late deliverers of bad news are dead meat.

Of course that’s not fair, but that’s hardly the point. Fairness isn’t relevant if the culture is one where rationality, prudence and pragmatism all lead to crazy behaviour because that is what is rewarded. People rationally adapt to the requirement to stop thinking when they see others being punished for honesty and insight.

What is an estimate?

So what’s the answer? The easy one is, “run, and run fast”. Get out and find a healthier culture. However, if you’re staying then you have to deal with the problem of handling senior people who can’t handle the truth.

It is important to be clear in your own mind about what you are being asked for when you have to estimate. Is it a quote? Is there an implied instruction that something must be delivered by a certain date? Are there certain deliverables that are needed by that date, and others that can wait? Could it be a starting point for negotiation? See this article I wrote a few years ago.

Honesty is non-negotiable

It’s a personal stance, but honesty about uncertainty and the likelihood of serious but unforeseeable problems is non-negotiable. I know others have thought I have a rather casual attitude towards job security and contract renewal! However, I can’t stomach the idea of lingering for years in an unhealthy culture. And it’s not as if honesty means telling the senior guys who don’t want the truth that they are morons (even if they are).

Honesty requires clear thinking, and careful explanation of doubt and uncertainty. It means being a good communicator, so that the guys who take the big decisions have a better understanding that your problems will quickly become their problems. If you are ordered into a project death march, it requires careful gathering of relevant information so that you can present a compelling case for a rethink while there might still be time for the senior managers and stakeholders to save face. Having the savvy to help the deliberately ignorant to handle the truth really is a valuable skill. Perhaps Jack Nicholson’s character in “A Few Good Men” isn’t such a great role model, however. His honesty in that memorable scene resulted in him being arrested!

Why do you need the report?

Have you ever wondered what the purpose of a report was, whether it was a status report that you had to complete, or a report generated by an application? You may have wondered if there was any real need for the report, and whether anyone would miss it if no-one bothered to produce it.

I have come across countless examples of reports that seemed pointless. What was worse, their existence shaped the job we had to do. The reports did not help people to do the job. They dictated how we worked; production, checking and filing of the reports for future inspection were a fundamental part of the job. In any review of the project, or of our performance, they were key evidence.

My concern, and cynicism, were sharpened by an experience as an auditor when I saw at first hand how a set of reports was defined for a large insurance company. To misquote Otto von Bismarck’s comment on the creation of laws: reports are like sausages; it is best not to see them being made.

The company was developing a new access control system, to allow managers to assign access rights and privileges to staff who were using the various underwriting, claims and accounts applications. As an auditor I was a stakeholder, helping to shape the requirements and advising on the controls that might be needed and on possible weaknesses that should be avoided.

One day I was approached by the project manager and a user from the department that defined the working practices at the hundred or so branch offices around the UK and Republic of Ireland. “What control reports should the access control system provide?” was their question.

I said that was not my decision. The reports could not be treated as a bolt on addition to the system. They should not be specified by auditors. The application should provide managers with the information they needed to do their jobs, and if it wasn’t feasible to do that in real time, then reports should be run off to help them. It all depended on what managers needed, and that depended on their responsibilities for managing access. The others were unconvinced by my answer.

A few weeks later the request for me to specify a suite of reports was repeated. Again I declined. This time the matter was escalated. The manager of the branch operations department sat in on the meeting. He made it clear that a suite of reports must be defined and coded by the end of the month, ready for the application to go live.

He was incredulous that I, as an auditor, would not specify the reports. His reasoning was that when auditors visited branches they would presumably check to see whether the reports had been signed and filed. I explained that it was the job of his department to define the jobs and responsibilities of the branch managers, and to decide what reports these managers would need in order to fulfill their responsibilities and do their job.

The manager said that was easy; it was the responsibility of the branch managers to look at the reports, take action if necessary, then sign the reports and file them. That was absurd. I tried to explain that this was all back to front. At the risk of stating the obvious, I pointed out that reports were required only if there was a need for them. That need had to be identified so that the right reports could be produced.

I was dismissed as a troublesome timewaster. The project manager was ordered to produce a suite of reports, “whatever you think would be useful”. The resulting reports were simply clones of the reports that came out from an older access control system, designed for a different technical and office environment, with quite different working practices.

The branch managers were then ordered to check them and file them. The branch operations manager had taken decisive action. The deadline was met. Everyone was happy, except the poor branch managers who had to wade through useless reports, and of course the auditors. We were dismayed at the inefficiency and sheer pointlessness of producing reports without any thought about what their purpose was.

That highlighted one of the weaknesses of auditors. People invariably listened to us if we pointed out that something important wasn’t being done. When we said that something pointless was being done there was usually reluctance to stop it.

Anything that people have got used to doing, even if it is wasteful, ineffective and inefficient, acquires its own justification over time. The corporate mindset can be “this is what we do, this is how we do it”. The purpose of the corporate bureaucracy becomes the smooth running of the bureaucracy. Checking reports was a part of a branch manager’s job. It required a mental leap to shift to a position where you have to think whether reports are required, and what useful reporting might comprise. It’s so much easier to snap, “just give us something useful” and move on. That’s decisive management. That’s what’s rewarded. Thinking? Sadly, that can be regarded as a self-indulgent waste of time.

However, few things are more genuinely wasteful of the valuable time of well-paid employees than reporting that has no intrinsic value. Reporting that forces us to adapt our work to fit the preconceptions of the report designer gobbles up huge amounts of time and stops us doing work that could be genuinely valuable. The preconceptions that underpin many reports and metrics may once have been justified, and have fitted in with contemporary working practices. However, these preconceptions need to be constantly challenged and re-assessed. Reports and metrics do shape the way we work, and the way we are assessed. So we need to keep asking, “just why do you need the report?”

DRE: changing reality so we can count it

It’s usually true that our attitudes and beliefs are shaped by our early experiences. That applies to my views on software development and testing. My first experience of real responsibility in development and testing was with insurance financial systems. What I learned and experienced will always remain with me. I have always struggled with some of the tenets of traditional testing, and in particular the metrics that are often used.

There has been some recent discussion on Twitter about Defect Removal Efficiency. It was John Stephenson’s blog that set me thinking once again about DRE, a metric I’d long since consigned to my mental dustbin.

If you’re unfamiliar with the metric, it is the number of defects found before implementation expressed as a percentage of all the defects discovered within a certain period of going live (i.e. live defects plus development defects). The cut-off is usually 90 days from implementation. So the more defects reported in testing and the fewer in live running, the higher the percentage, and the higher the quality (supposedly). A perfect application would have no live defects and therefore a DRE score of 100%; all defects were found in testing.
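
Expressed as a formula (the 90-day window is the usual convention, and the example figures below are simply mine for illustration):

\[
\mathrm{DRE} = \frac{D_{\text{pre-release}}}{D_{\text{pre-release}} + D_{\text{live}}} \times 100\%
\]

where \(D_{\text{live}}\) counts only the defects reported within the cut-off period. So 90 defects found in testing and 10 found in the first 90 days of live running would give a DRE of 90%.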

John’s point was essentially that DRE can be gamed so easily that it is worthless. I agree. However, even if testers and developers tried not to manipulate DRE, even if it couldn’t be gamed at all it would still be an unhelpful and misleading metric. It’s important to understand why so we can exercise due scepticism about other dodgy metrics, and flawed approaches to software development and testing.

DRE is based on a view of software development, testing and quality that I don’t accept. I don’t see a world in which such a metric might be useful, and it contradicts everything I learned in my early days as a team leader, project manager and test manager.

Here are the four reasons I can’t accept DRE as a valid metric. There are other reasons, but these are the ones that matter most to me.

Software development is not a predictable, sequential manufacturing activity

DRE implicitly assumes that development is like manufacturing, that it’s a predictable exercise in building a well understood and defined artefact. At each stage of the process defects should be progressively eliminated, till the object is completed and DRE should have reached 95% (or whatever).

You can see this sequential mindset clearly in this article by Capers Jones, “Measuring Defect Potentials and Defect Removal Efficiency” (PDF, opens in new tab), from QA Journal in 2008.

“In order to achieve a cumulative defect removal efficiency of 95%, it will be necessary to use approximately the following sequence of at least eight defect removal activities:

• Design inspections
• Code inspections
• Unit test
• New function test
• Regression test
• Performance test
• System test
• External Beta test

To go above 95%, additional removal stages will be needed. For example requirements inspections, test case inspections, and specialized forms of testing such as human factors testing, performance testing, and security testing add to defect removal efficiency levels.”

Working through sequential “removal stages” is not software development or testing as I recognise them. When I was working on these insurance finance systems there was no neat sequence through development with defects being progressively removed. Much of the early development work could have been called proof of concept. It wasn’t a matter of coding to a specification and then unit testing against that spec. We were discovering more about the problem and experimenting to see what would work for our users.

Each of the experiments that “failed” was a precious nugget of extra information about the problem we were trying to solve. The idea that we would have improved quality by recording everything that didn’t work and calling it a defect would have been laughable. Yet this is the implication of another statement by Capers Jones in a paper on the International Function Point Users Group website (December 2012), “Software Defect Origins and Removal Methods” (PDF, opens in new tab).

“Omitting bugs found in requirements, design, and by unit testing are common quality omissions.”

So experimenting to learn more about the problem without treating the results as formal defects is a quality omission? Tying up developers and testers in bureaucracy by extending formal defect management into unit testing is the way to better quality? I don’t think so.

Once we start to change the way people work simply so that we can gather data for metrics, we are not merely encouraging them to game the system. It is worse than that. We are trying to change reality to fit our ability to describe it. We are pretending we can change the territory to fit the map.

Quality is not an absence of something

My second objection to DRE in principle is quite simple. It misrepresents quality. “Quality is value to some person”, as Jerry Weinberg famously said in his book “Quality Software Management: Systems Thinking”.

The insurance applications we were developing were intended to help our users understand the business and products better so that they could take better decisions. The quality of the applications was a matter of how well they helped our users to do that. These users were very smart and had a very clear idea of what they were doing and what they needed. They would have bluntly and correctly told us we were stupid and trying to confuse matters by treating quality as an absence of defects. That takes me on to my next objection to DRE.

Defects are not interchangeable objects

A defect is not an object. It possesses no qualities except those we choose to grant it in specific circumstances. In the case of my insurance applications a defect was simply something we didn’t understand that required investigation. It might be a problem with the application, or it might be some feature of the real world that we hadn’t known about and which would require us to change the application to handle it.

We never counted defects. What is the point of adding up things I don’t understand or don’t know about? I don’t understand quantum physics and I don’t know offhand what colour socks my wife is wearing today. Adding the two pieces of ignorance together to get two is not helpful.

Our acceptance criteria never mentioned defect numbers. The criteria were expressed in accuracy targets against specific oracles, e.g. we would have to reconcile our figures to within 5% of the general ledger. What was the basis for the 5% figure? Our users knew from experience that 95% accuracy was good enough to let them take significantly better decisions than they could without the application. 100% was an ideal, but the users knew that the increase in development time to try and reach that level of accuracy would impose a significant business cost because crucial decisions would have had to be taken blindfolded while we tried to polish up a perfect application.

If there was time we would investigate discrepancies even within the 5% tolerance. If we went above 5% in testing or live running then that was a big deal and we would have to respond accordingly.

You may think that this was a special case. Well yes, but every project has its own business context and user needs. DRE assumes a standard world in which 95% DRE is necessarily better than 90%. The additional cost and delay of chasing that extra 5% could mean the value of the application to the business is greatly reduced. It all depends. Using DRE to compare the quality of different developments assumes that a universal, absolute standard is more relevant than the needs of our users.

Put simply, when we developed these insurance applications, counting defects added nothing to our understanding of what we were doing or our knowledge about the quality of the software. We didn’t count test cases either!

DRE has a simplistic, standardised notion of time

This problem is perhaps related to my earlier objection that DRE assumes developers are manufacturing a product, like a car. Once it rolls off the production line it should be largely defect free. The car then enters its active life and most defects should be revealed fairly quickly.

That analogy made no sense for insurance applications, which are highly date sensitive. Insurance contracts might be paid for up front, or in instalments, but they earn money on a daily basis. At the end of the contract period, typically a year, they have to be renewed. The applications consist of different elements performing distinct roles according to different timetables.

DRE requires an arbitrary cut off beyond which you stop counting the live defects and declare a result. It’s usually 90 days. Applying a 90 day cut-off for calculating DRE and using that as a measure of quality would have been ridiculous for us. Worse, if that had been a measure for which we were held accountable it would have distorted important decisions about implementation. With new insurance applications you might convert all the data from the old application when you implement the new one. Or you might convert policies as they come up for renewal.

Choosing the right tactics for conversion and implementation was a tricky exercise balancing different factors. If DRE with a 90 day threshold were applied then different tactics would give different DRE scores. The team would have a strong incentive to choose the approach that would produce the highest DRE score, and not necessarily the one that was best for the company.
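
A hedged sketch with invented numbers shows how this plays out: the same application, the same latent defects, but two different conversion tactics scored with a 90-day cut-off.

```python
# Invented figures: the same latent defects surface at different times
# depending on the conversion tactic, and the 90-day DRE cut-off rewards
# the tactic that pushes them beyond the window.

def dre(pre_release_defects, live_defects_by_day, cutoff_days=90):
    """DRE with a cut-off: live defects only count if reported within the window."""
    live_in_window = sum(n for day, n in live_defects_by_day if day <= cutoff_days)
    return 100.0 * pre_release_defects / (pre_release_defects + live_in_window)

pre_release = 90  # defects found in testing

# Big-bang conversion: everything goes live at once, so the date-sensitive
# defects (renewals, instalments) mostly surface inside the 90-day window.
big_bang_live = [(20, 6), (45, 3), (70, 1)]   # (day after go-live, defects)

# Phased conversion at renewal: the same defects surface gradually over a year,
# so most of them fall outside the window and never count against the score.
phased_live = [(60, 2), (150, 4), (250, 4)]

print(f"Big bang DRE: {dre(pre_release, big_bang_live):.1f}%")  # 90.0%
print(f"Phased   DRE: {dre(pre_release, phased_live):.1f}%")    # ~97.8%
```

The phased conversion looks “better” purely because the date-sensitive defects surface after the window closes, not because the software is any better.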

Now of course you could tailor the way DRE is calculated to take account of individual projects, but the whole point of DRE is that people who should know better want to make comparisons across different projects, organisations and industries and decide which produces greater quality. Once you start allowing for all these pesky differences you undermine that whole mindset that wants to see development as a manufacturing process that can be standardised.

DRE matters – for the wrong reasons

DRE might be flawed beyond redemption but metrics like that matter to important people for all the wrong reasons. The logic is circular. Development is like manufacturing, therefore a measure that is appropriate for manufacturing should be adopted. Once it is being used to beat up development shops that score poorly, they have an incentive to distort their processes to fit the measure. You have to buy in the consultancy support to adapt the way you work. The flawed metric then justifies the flawed assumptions that underpin the metric. It might be logical nonsense, but there is money to be made there.

So DRE is meaningless because it can be gamed? Yes, indeed, but any serious analysis of the way DRE works reveals that it would be a lousy measure, even if everyone tries to apply it responsibly. Even if it were impossible to game it would still suck. It’s trying to redefine reality so we can count it.

Binary opinions? Yes or no?

I am giving a half-day tutorial at EuroSTAR this year, so not surprisingly that has forced me to think around the subject, “questioning auditors questioning testing”.

Over the last few weeks I have been struck by the number of times that I have come across one very interesting word – binary.

It’s an important concept, and it is hugely important in both professions. However, I have become increasingly aware that testing and auditing are taking very different approaches to the concept.

Testing and checking

Discussion of binary results in testing is usually tied in with the debate about the distinction between testing and checking. James Bach and Michael Bolton set out the argument clearly here.

The distinction is fundamentally important, but frustratingly the debate hasn’t really got through to the whole of the testing profession.

There are still regiments of testers oblivious to the distinction, beavering away with detailed test scripts, checking the results. The testing establishment from which these traditional testers take their lead, directly or indirectly, has not engaged with the debate. It has given the unfortunate, and probably accurate, impression that it regards checking and testing as being effectively synonymous in practice.

Reality isn’t binary

Rikard Edgren gave a very good talk on the specific idea of binary opinions at Øredev 2011 and Let’s Test 2012. Here is the Øredev talk.


The slides for Let’s Test are here (opens in new tab). Rikard also wrote a blog on the subject. The key phrase I picked up from Rikard was:

Reality isn’t binary, we can communicate noteworthy information – we don’t know everything in advance.

I’m not going to get further into that debate here. I just want to illustrate the contrast with auditing where Rikard’s comment resonates strongly.

Two types of binary opinion (naturally!)

Firstly, I’d better explain that the type of binary opinion varies depending on whether one is talking about internal or external auditing. In internal auditing they would take the form of pass/fail checking of controls. Are they present? Are they complied with?

Binary opinions in external auditing have historically been largely about the truth and fairness of the company accounts, or about whether the company is a going concern. That has been the core of the external audit report. In recent years there has been the added requirement imposed by the Sarbanes-Oxley Act for US companies to express an opinion on whether the framework of internal controls is effective.

Binary opinions in internal audit – a relic of yesteryear

There isn’t a great deal of debate in internal auditing circles about binary opinions. Traditional internal auditing focused on internal controls. The debate has been held, and the overwhelming consensus, at least in informed circles, is that any audit that offers only binary opinions is hopelessly limited, blinkered and outdated.

I like the definition of internal controls from Anthony Catenach.

Internal controls are how management makes sure the company’s business model is operating correctly.

If you view internal controls in that broader perspective then you should be able to see how simple binary opinions are unhelpful. Auditors need to set their findings in context and explain why they are significant and what danger they pose. Simply saying that certain controls are missing, or have not been applied, is unhelpful. That’s not to say that such simplistic audits have vanished.

A couple of weeks ago I was speaking to a friend who works as a developer for a multinational company. He told me that the internal auditors work from a checklist, using questions that require yes/no answers. People are very wary of the auditors and answer only direct questions without offering anything more. It horrifies me that auditors should ever accept the answer “yes” or “no” without following up with “why?”.

That sort of auditing is ineffective and unprofessional. I can’t stress strongly enough that audit checklists have a place, but they are not the audit! They are merely the starting point for a conversation.

Previously, to illustrate how an audit interview is conducted, I have used the analogy of an advocate (barrister or attorney) questioning a witness in court. The advocate cannot know what answer the witness will give and has to vary the follow-up questions accordingly, rather than ploughing on with a prepared script. Conducting an audit by checklist is very much like sticking to the script regardless of the answers.

This is now orthodox modern opinion. The opinion formers, the leading lights of the profession know that binary opinions are dated and the debate has moved on to risk; how can auditors inform stakeholders about the risks that matter, the risks that keep them awake at night? How can auditors help management to understand the risks that they are facing and to take decisions that are better informed about the risks?

Regulators and binary opinions in external audit

The debate about binary opinions in internal audit may be largely over but it is still very much alive in external audit. The regulators in the UK and the USA are pushing hard for auditors to provide more useful opinions in their reports rather than relying on simple, and frequently misleading, binary opinions.

The response from the Big 4 audit firms has been cool, but telling the regulators to take a hike is politically tricky! They have to engage with the debate. It’s not good enough for them to defend current practices. The problems with these are glaringly obvious, so they have to respond constructively.

The position is slightly confused by Sarbanes-Oxley’s requirement that external auditors state whether they believe the framework of internal controls is effective. That takes them into internal audit territory, and raises concerns about whether such a judgement can be accurate or helpful. Certainly the experience of recent years isn’t encouraging.

There are countless examples of companies whose accounts have been passed by their external auditors, only to collapse from problems that existed before the audit was conducted. Remember Enron? That debacle led to the demise of one of the world’s biggest firms of accountants, Arthur Andersen. Remember the banks who collapsed? All sailed through their audits, with the auditors picking up multi-million pound fees for offering opinions that proved groundless.

I’m not suggesting that these fees were too high. Perhaps they were too low and worthwhile audits and opinions would be more expensive. However, I am saying that the current reporting regime, with too much emphasis on binary opinions, provides lousy value for money. That is not a minority view. It is the view of the regulators in the UK and USA. It will be interesting to see where the EU moves in this regard.

Testers are not alone

This is far too big and complex an area for me to cover in any detail either now or in my tutorial at EuroSTAR, even if it is of any interest to any testers except me! However, I think it’s important to understand that there is a big and influential profession wrestling with some of the issues facing testers.

Auditors have to think about how they work, what value they provide, what they should look for, what knowledge they can reasonably provide. Indeed, the more thoughtful auditors are thinking about what knowledge means in their context, how they can “know” things, what constitutes evidence and opinion.

This is epistemology, and it is fascinating. Thinking about this is not some esoteric academic exercise. If we are not clear about what we can know and how we should investigate and report on the knowledge that is available then the danger is that we will end up just faking the whole exercise. We will continue to dress up subjective opinions as “objective” binary verdicts; “yes” this is ok, “no” it isn’t.

Reality doesn’t become clearer simply by pretending that it can be reduced to binary opinions. Quite the reverse, messy reality is obscured by a binary approach. Auditors know that, or at least the clever ones do. There are plenty of smart and capable auditors out there, trying to make sense of what is going on.

The good ones are natural allies of good testers. Seek them out and make them your allies. As for the bad ones, well, they are still around, as my friend can testify. Their approach is inept and unprofessional. It might not be wise to use those words to their faces! It might be more interesting to ask them some difficult questions about how they can square their approach with the views of the auditing establishment, the professional bodies and the regulators.

It’s a pity that the self appointed testing establishment, ISTQB and ISO, can’t take a similarly clear line. Sadly their silence effectively endorses binary opinions. Self appointed shouldn’t mean self interested.

How am I wasting your time?

At the weekend I was reading this fascinating column by Oliver Burkeman on cutting out time-wasting activities. He talks about the importance of having a “stop doing” list, as well as a “to do” list. It’s an interesting, well-written piece, mainly about the work of Peter Drucker.

I was particularly interested in this quote from Drucker.

“if you’re a boss, develop the habit of asking your underlings, “without coyness”, the one question that will trigger more improvement than any other: “What do I do that wastes your time without contributing to your effectiveness?”

The team culture quadrant

One of the early lessons I learned for myself when I started managing teams was how the culture and strength of the team largely dictated the manager’s job.
[Figure: the team culture quadrant – team strength plotted against the health of the culture]

I formulated a rough quadrant illustrating what I felt my priorities were, shaped by experiences with two employers, with differing cultures.

If the team is weak, consisting of inexperienced or poorly performing members, then the priority is to help the team shape up: assisting the willing but inexperienced as they develop, removing the time wasters if possible, and at least preventing them from disrupting the productive team members.

If the culture is healthy then this is a challenging, but relatively straightforward and certainly rewarding, role, provided that the manager really understands what the team are supposed to be doing.

If the team is strong but the culture is unhealthy then the job of the manager is to protect the team from distraction and problems that would waste their time. The manager deals with the crap so the team doesn’t have to. The manager should also be trying to change the culture, pointing out the problems, arguing for improvements and generally trying to shape the environment so that good teams can do good work as efficiently, effectively and happily as possible.

That’s obviously a tough task, and it might not be possible for an individual to bring about serious improvement, but it’s better to fight constructively than to suffer passively.

I was managing a team in this position once, and a programmer asked me what on earth I did with my time. He couldn’t see what work I was doing. He was a friend, so he knew I wouldn’t take offence. I actually regarded it as confirmation that I was doing a good job.

The team were all high calibre and hard working. I had to spend a lot of time handling the users, turning mushy requirements into stuff we could work with, negotiating with other departments. In the meantime the team were whizzing along in fine style, oblivious to the problems that they weren’t hitting. If the team is in the groove it’s just fine by me if they’re taking that state of affairs for granted.

The job gets really tough and stressful when the manager is faced with a weak team in an unhealthy culture: endless unproductive meetings, pointless reports, meaningless metrics for layers of management who can’t understand them, and overly detailed and prescriptive plans that pretend we can know what individuals will be doing in a few months? Yup, I’ve been there and got paid a good salary for doing little that was genuinely valuable. Meanwhile the team is floundering and needing positive, patient, time-consuming support. This is where you earn your ulcers.

The time-wasting rubbish is inescapable, but the priority has to be to strengthen the team. Here it is particularly important to weed out those who are slowing the team down; the idle, the awkward, the cheerfully incompetent who have no interest in improving. It’s never easy, unless they’re contractors. The poor performers are generally well known, and no-one is going to willingly take them off your hands.

So – exactly how am I wasting your time?

If you get to the point where you have a strong team in a healthy culture then that’s great, for the short term at least. The trouble is that you are almost redundant, and you need to be careful that you are not getting in the way of the team. If things are humming along smoothly it’s not a great idea to coast. Sooner or later someone is going to twig that you aren’t contributing much. It’s far better to speak up and point out that your skills are being under-utilised, and maybe it’s time to move on to a new role.

I managed to work all that out for myself, and I still stand by it all. However, I’d never explicitly thought of that point from Peter Drucker; why not ask team members what you do as a manager that wastes their time?

It’s exactly when you might feel you’ve arrived at the happy combination of strong team and healthy culture that the sole, significant remaining problem could be you. I can’t believe I missed it. I’m sure Drucker was right and that it could be the most important question you could ask your team. How am I wasting your time?