The Volkswagen emissions scandal; responsible software testing?

The scandal blows up in Volkswagen’s face

The Volkswagen emissions scandal has been all over the media worldwide since the US Environmental Protection Agency hit VW with a notice of violation on 18th September 2015.

This is a sensational story with many important and fascinating aspects, but there is one angle I haven’t seen explored. Many of the early reports focused on the so-called “defeat device” that the EPA referred to. That gave the impression the problem was a secret, discrete piece of kit hidden away in the engine. A defeat device, however, is just EPA shorthand for any illegal means of subverting its regulations, i.e. anything that alters the emissions controls in normal running, outside a test. In the VW case the device is code within the engine control software that could detect the special conditions under which emissions testing is performed. This is how the EPA reported the violation in its formal notice.

“VW manufactured and installed software in the electronic control module (ECM) of these vehicles that sensed when the vehicle was being tested for compliance with EPA emission standards. For ease of reference, the EPA is calling this the ‘switch’. The ‘switch’ senses whether the vehicle is being tested or not based on various inputs including the position of the steering wheel, vehicle speed, the duration of the engine’s operation, and barometric pressure. These inputs precisely track the parameters of the federal test procedure used for emission testing for EPA certification purposes.

During EPA emission testing, the vehicles’ ECM ran software which produced compliant emission results under an ECM calibration that VW referred to as the ‘dyno calibration’ (referring to the equipment used in emissions testing, called a dynamometer). At all other times during normal vehicle operation, the ‘switch’ was activated and the vehicle ECM software ran a separate ‘road calibration’ which reduced the effectiveness of the emission control system.”

What did Volkswagen’s testers know?

What interests me about this is that the defeat device is integral to the control system (ECM); the switch has to operate as part of the normal running of the car. The software is constantly checking the car’s behaviour to establish whether it is taking part in a federal emissions test or just running about normally. The testing of this switch would therefore have been part of the testing of the ECM. There’s no question of some separate piece of kit or software over-riding the ECM.
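The EPA’s description suggests logic along the following lines. This is purely an illustrative sketch: the real ECM code is proprietary (and certainly not Python), and every name, threshold and input format here is invented to mirror the EPA’s account of the “switch”, not taken from VW’s actual software.

```python
# Illustrative sketch only. All names and thresholds are invented; they mirror
# the inputs the EPA listed (steering position, speed, engine runtime,
# barometric pressure), not VW's real calibration logic.

def looks_like_dyno_test(steering_angle_deg, speed_kmh,
                         engine_runtime_s, barometric_kpa):
    """Guess whether the current inputs track the federal test procedure."""
    steering_fixed = abs(steering_angle_deg) < 1.0   # the wheel never turns on a dynamometer
    speed_in_cycle = 0 <= speed_kmh <= 130           # test-cycle speeds stay in a known band
    runtime_in_cycle = engine_runtime_s < 1900       # test cycles have fixed durations
    pressure_stable = 95 <= barometric_kpa <= 105    # lab-like conditions near sea level
    return steering_fixed and speed_in_cycle and runtime_in_cycle and pressure_stable

def select_calibration(inputs):
    # The "switch": dyno calibration under test conditions, road calibration otherwise.
    return "dyno" if looks_like_dyno_test(*inputs) else "road"
```

The point of the sketch is that this decision runs constantly, inside the normal control flow of the ECM, which is why it could not have been invisible to anyone testing that software seriously.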

This means the software testers were presumably complicit in the conspiracy. If they were not complicit then that would mean that they were unaware of the existence of the different dyno and road calibrations of the ECM. They would have been so isolated from the development and the functionality of the ECM that they couldn’t have been performing any responsible, professional testing at all.

Passing on bad news – even to the very top

That brings me to my real interest. What does responsible and professional testing mean? That is something that the broadly defined testing community hasn’t resolved. The ISTQB/ISO community and the Context Driven School have different ideas about that, and neither has got much beyond high level aspirational statements. These say what testers believe, but don’t provide guiding principles that might help them translate their beliefs into action.

Other professions, or rather serious, established professions, have such guiding principles. After working as an IT auditor I am familiar with the demands that the Institute of Internal Auditors makes on the profession. If internal auditors were to discover the existence of the defeat device then their responsibility would be clear.

Breaking the law by cheating on environmental regulation introduces huge risk to the corporation. The auditors would have to report that and escalate their concern to the Audit Committee, on which non-executive directors should sit. In the case of VW the Audit Committee is responsible for risk management and compliance. Of its four members one is a senior trade union official and another is a Swedish banker. Such external, independent scrutiny is essential for responsible corporate governance. The internal auditors are accountable to them, not the usual management hierarchy.

Of course escalation to the Audit Committee would require some serious deliberation and would be no trivial matter. It would be the nuclear option for internal auditors, but in principle their responsibility is simple and clear; they must pursue and escalate the issue or they are guilty of professional misconduct or negligence. “In principle”; that familiar phrase that is meaningless in software testing.

If internal auditors had detected the ECM defeat device they might have done so when conducting audit tests on the software as part of a risk-based audit, having decided that the regulatory implications meant the ECM was extremely high-risk software. However, it is far more likely that they would have discovered it after a tip-off from a whistleblower (as is often the case with serious incidents).

What is the responsibility of testers?

This takes us back to the testers. Just what was their responsibility? I know what I would have considered my moral duty as a tester, but I know that I would have left myself in a very vulnerable position if I had been a whistleblower who exposed the existence of the defeat device. As an auditor I would have felt bullet proof. That is what auditor independence means.

So what should testers do when they’re expected to be complicit in activities that are unethical or illegal or which have the whiff of negligence? Until that question is resolved and testers can point to some accepted set of guiding principles then any attempts to create testing standards or treat testing as a profession are just window dressing.

Addendum – 30th September 2015

I thought I’d add this afterthought. I want to be clear that I don’t think the answer to the problem would be to beef up the ISTQB code of ethics and enforce certification on testers. That would be a depressingly retrograde step. ISTQB lacks any clear and accepted vision of what software testing is and should be. The code of ethics is vague and inconsistent with ISTQB’s own practices. It would therefore not be in a credible position to enforce compliance, which would inevitably be selective and arbitrary.

On a more general note, I don’t think any mandatory set of principles is viable or desirable under current and foreseeable circumstances. By “mandatory” I mean principles to which testers would have to sign up and adhere to if they wanted to work as testers.

As for ISO 29119, I don’t think that it is relevant one way or another to the VW case. The testers could have complied with the standard whilst conspiring in criminal acts. That would not take a particularly imaginative form of creative compliance.

Sarbanes-Oxley & scripted testing

This post was prompted by an article from 2013 by Mukesh Sharma that Sticky Minds recycled this week. I disagree with much of the article, about exploratory and scripted testing and about the nature of checklists. However, I’m going to restrict myself to Mukesh Sharma’s comments about regulatory compliance, specifically Sarbanes-Oxley.

“In such (regulatory) scenarios the reliance on scripted testing is heavy, with almost no room for exploratory testing. Other examples include testing for Sarbanes-Oxley… and other such laws and acts, which are highly regulated and require strict adherence to defined guidelines.”

Let’s be clear. The Sarbanes-Oxley legislation does not mention software testing, never mind prescribe how it should be performed. It does mention testing, but this is the testing that auditors perform. Standards and quality control also feature, but these relate to the work of accountants and auditors.

Nevertheless, compliance with Sarbanes-Oxley does require “strict adherence to defined guidelines” but this is a requirement that is inferred from the legislation and not the law itself. The guidelines with which software testers must comply are locally defined testing policies and processes. Each compliant organisation must be able to point to a document that says “this is how we test here”. The legislation does have plenty to say about guidelines, but these are guidelines for sentencing miscreants. I suppose the serious consequences of non-compliance go a long way to explaining the over-reaction to Sarbanes-Oxley.

I suspect the pattern was that companies and consultants looked at how they could comply by following their existing approach to development and testing, then reinforced that. Having demonstrated that this would produce compliance they claimed that this was the way to comply. Big consultancies have always been happy to sell document-heavy, process-driven solutions because this gives them plenty of opportunity to wheel out inexperienced, young graduates to do the grunt work tailoring the boilerplate templates and documents.

I used to detest Sarbanes-Oxley, but that was because I saw it as reinforcing damaging practices. I’m still hardly a fan, but I eventually came to take a more considered approach because it doesn’t have to be that way. If you look at what the auditors have to say about Sarbanes-Oxley you get a very different perspective. ISACA (the professional body for IT auditors) provides a guide to SOX compliance (free to members only) and it doesn’t mention scripts at all. Appropriate test environments are a far bigger concern.

ISACA’s COBIT 5 model for IT governance (the full model is free to members only) doesn’t refer to manual test scripts. It does require testers to “consider the appropriate balance between automated scripted tests and interactive user testing”. For manual testing COBIT 5 prefers the phrase “clearly defined test instructions” rather than “scripts”. The requirement is for testers to be clear about what will be done, not to document traditional test scripts in great detail in advance. COBIT 5 is far more insistent on the need to plan your testing carefully, have proper test environments and retain the evidence. You have to do all that properly; it’s non-negotiable.

COBIT 5 matters because if you comply with that then you will comply with Sarbanes-Oxley. Consultancies who claim that you have to follow their heavyweight, document driven processes in order to comply are being misleading. You can do it that way, just like you could drive from New York to Miami via Chicago. You get there in the end, but there are better ways!

Exploratory testing, Context Driven Testing and Bach & Bolton’s Rapid Software Testing are all consistent with the demands of Sarbanes-Oxley compliance provided you know what you’re doing and take the problem seriously, caveats that apply to any testing approach. If anyone tells you that Sarbanes-Oxley requires you to test in a particular way, challenge them to quote the relevant piece of legislation or an appropriate auditor’s interpretation. You can be sure that it’s a veiled sales pitch – or they don’t know what they are talking about. Or both perhaps!

Games & scripts – stopping at red lights

I’ve been reading about game development lately. It’s fascinating. I’m sure there is much that conventional developers and testers could learn from the games industry. Designers need to know their users, how they think and what they expect. That’s obvious, but we’ve often only paid lip service to these matters. The best game designers have thought much more seriously and deeply about the users than most software developers.

Game designers have to know what their customers want from games. They need to understand why some approaches work and others fail. Developing the sort of games that players love is expensive. If the designers get it wrong then the financial consequences are ruinous. Inevitably the more thoughtful designers stray over into anthropology.

That is a subject I want to write about in more depth, but in the meantime this is a short essay I wrote in slightly amended form for Eurostar’s Test Huddle, prompted by a blog by the game designer Chris Bateman. Bateman was talking about the nature of play and games, and an example he used made me think about how testing has traditionally been conducted.

“Ramon Romero of Microsoft’s Games User Research showed footage of various random people playing games for the first time. I was particularly touched by the middle aged man who drove around in Midtown Madness as if he was playing a driving simulator. ‘This is a great game’, he said, as he stopped at a red light and waited for it to change.”

That’s ridiculous, isn’t it? Treating a maniacal race game as a driving simulator? Maybe, but I’m not sure. That user was enjoying himself, playing the game entirely in line with his expectations of what the game should be doing.

The story reminded me of testers who embark on their testing armed with beliefs that are just as wildly misplaced about what the game, sorry, the application should be doing. They might have exhaustive scripts generated from requirements documents that tell them exactly what the expected behaviour should be, and they could be dangerously deluded.

Most of my experience has been with financial applications. They have been highly complicated, and the great concern was always about the unexpected behaviour, the hidden functionality that could allow users to steal money or screw up the data.

Focusing on the documented requirements, happy paths and the expected errors is tackling an application like that Midtown Madness player; driving carefully around the city, stopping at red lights and scrupulously obeying the law. Then, when the application is released the real users rampage around discovering what it can really do.

Was that cheerfully naïve Midtown Madness player really ridiculous? He was just having fun his way. He wasn’t paid a good salary to play the game. The ones who are truly ridiculous are the testers who are paid to find out what applications can do and naively think they can do so by sticking to their scripts. Perhaps test scripts are rather like traffic regulations. Both say what someone thinks should be going on, but how much does that actually tell you about what is really happening out there, on the streets, in the wild where the real users play?

A single source of truth?

Lately in a chatroom for the International Society for Software Testing there has been some discussion about the idea of a “single source of truth”. I’m familiar with this in the sense of database design. Every piece of data is stored once and the design precludes the possibility of inconsistency, of alternative versions of the same data. That makes sense in this narrow context, but the discussion revealed that the phrase is now being used in a different sense. A single source of truth has been used to describe an oracle of oracles, an ultimate specification on which total reliance can be placed. The implications worry me, especially for financial systems, which is my background.

I’m not comfortable with a single source of truth, especially when it applies to things like bank balances, profit and loss figures, or indeed any non-trivial result of calculations. What might make more sense is to talk of a single statement of truth, and that statement could, and should, have multiple sources so that it is transparent and can be validated. However, I still wouldn’t want to talk about truth in financial statements. For an insurance premium there are various different measures, which have different uses for different people at different times. When people start talking about a single, true premium figure they are closing their minds to reality and trying to redefine it to suit their limited vision.

All of these competing measures could be regarded as true in the right context, but there are other measures which are less defensible and which an expert would consider wrong, or misleading, in any context (eg lumping Insurance Premium Tax into the premium figure). That’s all quite aside from the question of whether these measures are accurate on their own terms.
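A concrete, invented illustration of those competing measures (all the figures and the IPT rate here are assumptions for the sake of the example, not real rates):

```python
# Invented figures; the IPT rate is assumed for illustration, not a quoted rate.
gross_premium = 500.00
commission = 75.00
ipt_rate = 0.10   # Insurance Premium Tax (assumed rate)

net_of_commission = gross_premium - commission    # the figure the insurer nets
amount_charged = gross_premium * (1 + ipt_rate)   # what the customer actually pays

# Each figure is "true" in the right context, for the right audience. But
# calling the tax-inclusive amount "the premium" is the kind of figure an
# expert would consider wrong in any context: the tax is not premium.
misleading_premium = amount_charged
```

The point is not the arithmetic but that each line answers a different question, for a different stakeholder, and none of them is “the” premium.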

A “single source of truth” reminds me of arguments I’d have with application designers. Sometimes the problem would be that they wanted to eliminate any redundancy in the design. That could make reconciliation and error detection much harder because the opportunities to spot errors would be reduced. If a calculation was wrong it might stay wrong because no-one would know. A different source of friction was the age old problem of analysts and designers determined to stick rigidly to the requirements without questioning them, or even really thinking about the implications. I suspect I was regarded as a pedantic nuisance, creating problems in places the designers were determined no problems could ever exist – or ever be visible.
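The argument I was making to those designers can be sketched in a few lines (the figures are invented): a stored control total is “redundant” data in the designer’s sense, but without it a corrupted line item passes silently.

```python
# Sketch of the reconciliation argument (invented figures): a control total
# captured independently at source is "redundant", yet it is exactly what
# lets you notice when a line item has gone wrong.

line_items = [125.00, 310.50, 89.25]   # amounts posted separately
stored_control_total = 524.75          # captured independently at source

def reconcile(items, control_total, tolerance=0.005):
    """Return True if the derived total agrees with the stored control total."""
    derived = round(sum(items), 2)
    return abs(derived - control_total) <= tolerance

assert reconcile(line_items, stored_control_total)                    # books balance
assert not reconcile([125.00, 310.50, 98.25], stored_control_total)   # transposed digits caught
```

Eliminate the “redundant” total and both cases look identical: the error is still there, but nothing can ever flag it.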

Accounting for truth

Conventional financial accounting is based on double entry book-keeping, which requires every transaction to be entered twice, in different places so that the accounts as a whole remain in balance. There may be a single, definitive statement of profit, but that is distilled from multiple sources, with an intricate web of balances and documented, supporting assumptions. The whole thing is therefore verifiable, or auditable. But it’s not truth. It’s more a matter of saying “given these assumptions this set of transactions produces the following profit figure”. Vary the assumptions and you have a different and perhaps equally valid figure – so it’s not truth.
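The mechanism can be shown in miniature (the transactions are invented to match the £6 sale / £5 cost example used below; real book-keeping systems are vastly more elaborate):

```python
# Minimal double-entry sketch: every transaction is posted to two accounts,
# so the ledger as a whole must net to zero and one-sided errors become visible.

from collections import defaultdict

ledger = defaultdict(float)

def post(debit_account, credit_account, amount):
    """Post one transaction twice: a debit in one account, a credit in another."""
    ledger[debit_account] += amount
    ledger[credit_account] -= amount

post("cash", "sales", 6.0)           # sold the item for £6
post("cost_of_sales", "cash", 5.0)   # its inputs cost £5

# The books must balance; if they don't, something was posted one-sidedly.
assert abs(sum(ledger.values())) < 1e-9

# "Given these assumptions this set of transactions produces the following profit"
profit = -ledger["sales"] - ledger["cost_of_sales"]   # £1
```

The profit figure is distilled from multiple entries that can be checked against each other, which is what makes it auditable rather than “true”.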

For many years academic accountants, e.g. Christopher Napier, have been doing fascinating work that strays over into philosophy. What is this reality that we are trying to understand? That’s ontology. What can we know about it, and what reliance can we put on that knowledge when we try to report it? That’s epistemology. Why are we doing it? That’s teleology.

The most interesting subject I ever studied in accountancy at university was the problem of inflation accounting. £6 - £5 = £1 might be a crude profit calculation for an item whose inputs cost you £5 and which you sold for £6. But what if the £5 was a cost incurred 11 months ago? You then buy replacement inputs, which now cost £7, but you’d still only be able to sell the finished product for £6. What does it mean to say you made a profit of £1? Who does that help? Couldn’t you also argue that you made a loss of £1?

What does it mean to add money together when the different elements were captured at dates when the purchasing power of that money was different? You’re adding apples and oranges. The value of money depends on what it can buy. Setting aside short-term speculation, that is what dictates currency exchange rates. £1 is more valuable than €1 because it buys more. It is meaningless to add £1 + €1 and get 2. An individual currency has different values over time, so is it any more meaningful to add different monetary figures without considering what their value was at the time the data was captured?
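The same deal yields at least three defensible answers. The sale price, historic cost and replacement cost come from the example above; the 10% inflation rate is invented purely for illustration:

```python
# Three readings of the same transactions. Only the inflation rate is invented.
sale_price = 6.0
historic_cost = 5.0        # inputs paid for 11 months ago
replacement_cost = 7.0     # what the same inputs cost today

nominal_profit = sale_price - historic_cost              # the crude £1 "profit"
replacement_cost_result = sale_price - replacement_cost  # -£1: arguably a loss

# Current-purchasing-power view: restate the historic £5 in today's money.
assumed_inflation = 0.10   # assumed for illustration
restated_cost = historic_cost * (1 + assumed_inflation)
cpp_result = sale_price - restated_cost                  # a third, different answer
```

Vary the assumption and the “profit” moves; none of the three figures is more true than the others, they just answer different questions.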

The academics pointed out all the problems inflation caused and came up with possible, complicated solutions. However, the profession eventually decided it was all just too difficult and pretty much gave up, except for an international standard for accounting in countries experiencing hyper-inflation (defined as greater than 100% over three years, i.e. a persisting annual rate of at least 26%). As at the end of 2014 the qualifying countries are Belarus, Venezuela, Sudan, Iran and Syria (which has rather more to worry about than financial accounting). For the rest of the world, if you want to add 5 apples and 6 oranges, that’s fine. You’ve now got 11 pieces of fruit. Stop worrying and just do the job.

I’m the treasurer for a church, and I’m often asked how much money we’ve got. I never bother going to the online bank statement, because I know that what people really want to know is how much money is available. So I use the church accounts, which factor in the income and payments that haven’t been cleared, and the money we’re due imminently, and the outgoings to which we’re already committed. These different figures all mesh together and provide a figure that we find useful, but which is different from the bank’s view of our balance. Our own accounts never rely on a single source of truth. There are multiple reconciliation checks to try and flag up errors. The hope is that inputting an incorrect amount will generate a visible error. We’re not reporting truth. All we can say is, so far as we know this is as useful and honest a statement of our finances as we can produce for our purposes, for the Church of Scotland, the Office of the Scottish Charity Regulator and the other stakeholders.
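The treasurer’s calculation above can be sketched as follows (all the figures are invented; the categories are the ones described in the text):

```python
# "How much money have we got?" — the useful answer is not the bank balance.
# All figures invented for illustration.
bank_statement_balance = 4200.00
uncleared_income = 350.00        # income received but not yet cleared
uncleared_payments = 180.00      # payments made but not yet presented
due_imminently = 500.00          # income we are due shortly
committed_outgoings = 1200.00    # spending already agreed

available_funds = (bank_statement_balance
                   + uncleared_income
                   - uncleared_payments
                   + due_imminently
                   - committed_outgoings)
# A different, and more useful, figure than the bank's view of the balance.
```

Here again there is no single source: the bank statement, the cash book and the commitments each contribute, and each can be reconciled against the others to flag errors.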

It’s messy and complex – deal with it

What’s it all got to do with testing? If your vision of testing is checking whether the apparent functionality is consistent with the specification as represented in the test script then this sort of messy complexity is a tedious distraction. It’s so much easier to pretend you can confirm the truth using a test script.

However, testing is (or should be) a difficult and intellectually demanding process of teasing out the implications of the application for the stakeholders. If you accept that, then you are far more likely to do something valuable if you stop thinking about any single source of truth. You should be thinking instead about possible sources of insight to help you shed light on the various “truths” that the various stakeholders are seeking. Understanding these different needs, and all the nuances that arise from them is essential for testers.

Assuming that there is a single truth that we can attest to with a simple, binary yes/no answer reduces testing to the level of the accountants who have tried to treat accountancy as a simple arithmetical exercise. Five oranges and six apples add up to eleven pieces of fruit; and so do eleven grapes, and eleven melons. So what? That is a useless and misleading piece of information, like the unqualified statement that the product is sound because we found what the script told us to look for. Testers, accountants and auditors all pick up good money because they are required to provide valuable information to people who need it. They should be expected to deal with messy, complex reality. They should not be allowed to get away with trying to redefine reality so it’s easier to handle.

They can’t handle the truth

Have you ever had to deal with managers or users who were sceptical about the time and effort a piece of work would take? Have you ever complained in vain about a project that was clearly doomed to fail right from the start? Have you ever felt that a project was being planned on the basis of totally unjustified optimism?

If you’ve been in IT for a while there’s a good chance you’ve answered “yes” to at least one of these questions. Over the years I grew wearily familiar with the pattern: a wilful refusal to consider anything but the happy path to a smooth, speedy delivery of everything on the wish list, within a comical budget that is “challenging, I admit, but realistic if we all pull together”.

Over time I gradually came to realise that many senior managers and stakeholders didn’t want the truth. They want the fiction, to be lied to because knowing the truth would make them responsible for dealing with it. In their world it is better to be deceived and then firefight a failing project than to deal honestly with likely problems and uncertainty. Above all, they can’t bring themselves to deal with the truth of uncertainty. It is far more comfortable to pretend that uncertainty is evidence of lack of competence, that problems can be anticipated, that risks can be ignored or managed out of existence, that complexity can be eliminated by planning and documentation (and by standards).

Telling the truth – a brave act in an unfair world

Perhaps the toughest roles in IT are those that are senior enough to be accountable for the results, but too junior to beat uncomfortable truths into the brains of those who really don’t want to know.

These budding fall guys have the nous and experience to see what is going to happen. One of the rarely acknowledged skills of these battle scarred veterans is the ability to judge the right moment and right way to start shouting the truth loudly. Reveal all too early and they can be written off as negative, defeatist, “not a team player”. Reveal it too late and they will be castigated for covering up imminent failure, and failing to comply with some standard or process. Everyone fails to comply. Not everyone is going to be kicked for it, but late deliverers of bad news are dead meat.

Of course that’s not fair, but that’s hardly the point. Fairness isn’t relevant if the culture is one where rationality, prudence and pragmatism all lead to crazy behaviour because that is what is rewarded. People rationally adapt to the requirement to stop thinking when they see others being punished for honesty and insight.

What is an estimate?

So what’s the answer? The easy one is, “run, and run fast”. Get out and find a healthier culture. However, if you’re staying then you have to deal with the problem of handling senior people who can’t handle the truth.

It is important to be clear in your own mind about what you are being asked for when you have to estimate. Is it a quote? Is there an implied instruction that something must be delivered by a certain date? Are there certain deliverables that are needed by that date, and others that can wait? Could it be a starting point for negotiation? See this article I wrote a few years ago.

Honesty is non-negotiable

It’s a personal stance, but honesty about uncertainty and the likelihood of serious but unforeseeable problems is non-negotiable. I know others have thought I have a rather casual attitude towards job security and contract renewal! However, I can’t stomach the idea of lingering for years in an unhealthy culture. And it’s not as if honesty means telling the senior guys who don’t want the truth that they are morons (even if they are).

Honesty requires clear thinking, and careful explanation of doubt and uncertainty. It means being a good communicator, so that the guys who take the big decisions have a better understanding that your problems will quickly become their problems. It requires careful gathering of relevant information if you are ordered into a project death march so that you can present a compelling case for a rethink when there might still be time for the senior managers and stakeholders to save face. Having the savvy to help the deliberately ignorant to handle the truth really is a valuable skill. Perhaps Jack Nicholson’s character from “A Few Good Men” isn’t such a great role model, however. His honesty in that memorable scene resulted in him being arrested!

Personal statement of opposition to ISO 29119 based on principle


I believe that prescriptive standards are inappropriate for software testing. This paper states my objections in principle, i.e. it explains why I believe that the decision to develop ISO 29119 was fundamentally misguided. These objections are valid without any consideration of the detailed content of the standard, which testers cannot view for themselves unless they or their employers buy a set. Members of the Context Driven School of testing (CDT) were making these arguments before ISO 29119 was issued, or even conceived.

This is not a conventional article in essay style. It is more in the nature of a work of reference, providing sources, links and a basis for further work. I expect to update it periodically. The original version arose from work that I did for the Association for Software Testing’s (AST) Committee on Standards and Professional Practices.

My objections in principle fall into four categories, based on regulatory theory and practice, the nature of software development, the social sciences, and the need for fair competition. These objections are based on academic and practical evidence. A full explanation of my objections could run to book length. I will therefore restrict myself to a brief statement of each objection, with supporting links.

The AST’s mission and the principles of CDT have informed and guided my analysis so I shall start by quoting them.

AST mission

The AST’s mission, as stated on its website is as follows.

“The Association for Software Testing is dedicated to advancing the understanding of the science and practice of software testing according to Context-Driven principles.

The Association for Software Testing (AST) is an international non-profit professional association with members in over 50 countries. AST is dedicated and strives to build a testing community that views the role of testing as skilled, relevant, and essential to the production of faster, better, and less expensive software products. We value a scientific approach to developing and evaluating techniques, processes, and tools. We believe that a self-aware, self-critical attitude is essential to understanding and assessing the impact of new ideas on the practice of testing.”

CDT principles

The seven basic principles of Context-Driven Testing (CDT) are as follows.

  1. “The value of any practice depends on its context.
  2. There are good practices in context, but there are no best practices.
  3. People, working together, are the most important part of any project’s context.
  4. Projects unfold over time in ways that are often not predictable.
  5. The product is a solution. If the problem isn’t solved, the product doesn’t work.
  6. Good software testing is a challenging intellectual process.
  7. Only through judgment and skill, exercised cooperatively throughout the entire project, are we able to do the right things at the right times to effectively test our products.”

1 – Objections based on regulatory theory and practice

1a – Principles and rules

There has been much discussion in recent years about the relative merits of principles and rules in regulating and influencing behavior. For the purposes of this paper I will define principles as non-specific statements of what is expected, and rules as detailed and specific statements of what must be done.

I believe that for a complex, context-dependent and cognitively demanding activity such as software testing it is unhelpful to present a binary choice between either principles or rules. It is far more effective to combine a set of general principles applying to all testers with specific rules that are relevant to particular organizations and settings. See Braithwaite’s “Rules and Principles: A Theory of Legal Certainty”. For these rules and principles to work effectively there should be constant discussion, or regulatory conversations, between regulators and those being regulated about the meaning and application of the principles.

The situation is confused because, in legal discussions, standards are usually assumed to be synonymous with principles. A standard is usually conceived as a clear statement of what must be achieved by an activity, rather than how it should be performed in detail. Standards for software testing, however, have traditionally been pitched at the detailed level of rules, e.g. IEEE 829.

A standard that would be appropriate for software testing would therefore be brief, clear and non-specific.

A suitable example of such a standard would be the International Standards for the Professional Practice of Internal Auditing.

1b – Regulators

Following on from the last point I believe that the specific requirements of industry regulators are of paramount importance for testers in those industries. Any testing standards should be framed as principles in such a way that they are consistent with those requirements and not attempt to provide competing, detailed rules.

As an example, the US Food and Drug Administration offers clear advice about what is required, focusing on the need for testing to provide high-quality evidence that is capable of standing up in court. The FDA advice does not specify exactly how this should be done, but does allow companies to adopt the "least burdensome approach". See "General Principles of Software Validation; Final Guidance for Industry and FDA Staff", 2002.

1c – Accountability

A frequent justification of the need for software testing standards is that testers should be accountable to stakeholders: testers must demonstrate that they are providing valuable information, effectively and efficiently.

I agree that accountability is important, but I do not believe that prescriptive standards meet that need. In evidence I point to the requirements of the audit profession.

1c1 – IIA standards

The standards of the global Institute of Internal Auditors offer no support for generic, prescriptive testing standards. The Global Technology Audit Guide, "Management of IT Audit", 1st edition, 2006 describes the Snowflake Theory of IT Audit:

“Every IT environment is unique and represents a unique set of risks. The differences make it increasingly difficult to take a generic or checklist approach to auditing.”

In the section “Frameworks and Standards” the Audit Guide says:

“One challenge auditors face when executing IT audit work is knowing what to audit against. Most organizations have not fully developed IT control baselines for all applications and technologies. The rapid evolution of technology could likely render any baselines useless after a short period of time.”

Although the IIA Global Technology Audit Guides refer to ISO standards "for consideration" as baselines against which to audit, they make no mention of software testing standards. They emphasize the need to understand the different mix of standards, methods and tools that each organization uses, and caution that auditors should not expect to see any set of "best practices" implemented wholesale without customization.

1c2 – Information Systems Audit and Control Association

COBIT 5 is ISACA's model for IT governance. It stresses the need for clear organization-specific testing standards, and for careful documentation of test plans and results. However, it does not expect organizations to incorporate external testing standards. "Best practices" are to be used as a "reference when improving and tailoring". Organizations should "employ code inspection, test-driven development practices, automated testing, continuous integration, walk-throughs and testing of applications as appropriate." COBIT 5 makes countless references to external standards of various kinds, but none to testing standards.

COBIT 5 is important because it is widely accepted as an effective means of complying with the requirements of Sarbanes-Oxley.

1c3 – End of binary opinions

The audit profession has traditionally offered binary opinions on whether accounts were true and fair, or internal controls were present or lacking. Increasingly the profession requires more nuanced reporting about the risks that matter, the risks that keep stakeholders up at night. This requires testers to offer more valuable information about the quality of products than merely saying how many test cases were run and passed. Testers have to tell a valuable story rather than rely on filling in generic, standards-based templates.

1c4 – Lifebelt for inexperienced investigators

Prescriptive standards act as a lifebelt for investigators and auditors who lack the experience and confidence to make their own judgments. They search for something that can give them reassuring answers about how a job should be done. The US Government Accountability Office, in its March 2015 report on the problems with the website, checked the project for compliance with the IEEE 829 test documentation standard. IEEE 829 was last revised in 2008, and the IEEE itself acknowledges that the content of standards more than five years old "do not wholly reflect the present state of the art". In fact IEEE 829 is hopelessly out of date and now quite irrelevant to modern testing.

Standards have credibility and huge influence simply from their status as standards. If testers are to have standards it is essential that they are relevant, credible and framed in a way that does not mislead investigators.

2 – Objections based on the nature of software development

2a – Experience with IEEE 829

2a1 – Documentation obsession

IEEE 829 was for many years the pre-eminent testing standard. It had a huge influence on testing. Many organizations based their testing processes and documents on this standard and its templates.

The result was a widespread fixation with excessively large, uninformative documents. These documents became the focus of testing teams' activities, displacing the real goal: acquiring knowledge about the product under test.

2a2 – Traditional methods

IEEE 829 was aligned conceptually with traditional methods, especially Structured Methods, which assumed that linear, documentation-driven processes are necessary for a quality product and that more flexible, lightweight approaches to documentation are irresponsible. However, Structured Methods were based on intuition rather than evidence, and studies showed that a document-driven approach did not match the way people think when creating software. There is no evidence that the detailed, document-driven approach associated with IEEE 829 was ever an effective, generic approach to testing.

2b – Complexity

Prescriptive processes are ill-equipped to handle complex activities like software development, in which it is impossible to state with certainty at an early stage what the end product should look like. This undermines the rationale for a heavy investment in advance testing documentation.

2c – Cynefin

The Cynefin Framework is a valuable way to help us make sense of the differing responses that different situations require. Software development and testing are inherently complex, so "best practice" and prescriptive standards are inappropriate; a flexible, iterative approach is required.

3 – Objections based on the social sciences

3a – The nature of expertise

Prescriptive standards do not take account of how people acquire skills and apply their knowledge and experience in cognitively demanding jobs such as testing.

3a1 – Tacit knowledge

Michael Polanyi and Harry Collins have offered valuable arguments about how we apply knowledge. Polanyi introduced the term "tacit knowledge" to describe the knowledge we have and use but cannot articulate; Collins built on his work. Foisting prescriptive standards onto skilled people is counterproductive and encourages them to concentrate on the means rather than the ends.

3a2 – Schön’s reflection-in-action

Donald Schön argues that creative professionals, such as software designers or architects, take an iterative approach to developing ideas. Much of their knowledge is tacit; they cannot write it all down as a neatly defined process. To gain access to what they know they have to perform the creative act, reflect on what they have learned and then apply the new knowledge. This is inconsistent with following a prescriptive process.

3b – Goal displacement

Losing sight of the true goals and focusing instead on methods, processes and documentation is a constant danger with prescriptive standards. This reflects not only my own experience; there is a wealth of academic research backing it up.

3b1 – Trained incapacity

A full century ago, in 1914, Thorstein Veblen identified the problem of trained incapacity. People trained in specific skills can lack the ability to adapt. A response that worked in the past is applied regardless thereafter; they respond in the way they have been trained, and cannot see that the circumstances require a different response. Their training has rendered them incapable of doing the job effectively unless it fits their mental framework.

3b2 – Ways of seeing

In 1935 Kenneth Burke built on Veblen’s work, arguing that trained incapacity meant that one’s abilities become blindnesses. People can focus on the means or the ends, not both, and their specific training in prescriptive methods or processes leads them to focus on the means. They do not even see what they are missing.

3b3 – Conformity to the rules

Robert Merton made the point more explicitly in 1957.

“Adherence to the rules… becomes an end in itself… Formalism, even ritualism, ensues with an unchallenged insistence upon punctilious adherence to formalized procedures. This may be exaggerated to the point where primary concern with conformity to the rules interferes with the achievement of the purposes of the organization”.

So the problem had been recognized long before software development was even in its infancy.

3c – Defense against anxiety

3c1 – Social defenses

Isabel Menzies Lyth provided a different slant in the 1950s using her experience in psychoanalysis. Her specialism was analyzing the social dynamics of organizations.

She argued that the main factor shaping an organization’s structure and processes was the need for people to cope with stress and anxiety. “Social defenses” were therefore built to help people cope. The defenses identified by Menzies Lyth included rigid processes that removed discretion and the need for decision making, hierarchical staffing structures, increased specialization, and people being managed as fungible (i.e. readily interchangeable) units, rather than skilled professionals.

3c2 – Transitional objects

Donald Winnicott’s contribution was the idea of the transitional object. This is something that helps infants to cope with loosening the bonds with their mother. Babies don’t distinguish between themselves and their mother. Objects like security blankets and teddy bears give them something comforting to cling onto while they come to terms with the beginnings of independence in a big, scary world.

David Wastell linked the work of Menzies Lyth and Winnicott. He found that developers used Structured Methods as transitional objects, i.e. as a defense mechanism to alleviate the stress and anxiety of a difficult job.

Wastell could see no evidence that Structured Methods worked. The evidence was that the resulting systems were no better than the old ones, took much longer to develop and were more expensive. Managers became hooked on technique and lost sight of the true goal.

“Methodology becomes a fetish, a procedure used with pathological rigidity for its own sake, not as a means to an end. Used in this way, methodology provides a relief against anxiety; it insulates the practitioner from the risks and uncertainties of real engagement with people and problems.”

3d – Loss of communication

Effective two-way communication requires effort, "interpretive labor". The anthropologist David Graeber argues that the greater the degree of force or compulsion, and the heavier the bureaucratic regime of rules and forms, the less communication there is. Those who issue the orders don't need to understand the complexities of the situation they're managing, and therefore don't take the trouble to do so. This problem works against the regulatory conversations described in 1a.

4 – Objections based on the need for competition in testing services

4a – ISO brand advantage

ISO has the reputation for being the global leader in standardization. Any standard that it issues has a huge advantage over alternative methods, simply because of the ISO brand. It is therefore vital that any testing standard is both credible and widely accepted on its own merits. My view, based on the evidence I have set out, is that a detailed, prescriptive standard could not meet those tests.

4b – Market for lemons

Buyers of testing services are often ill-informed about the quality of the service they are buying. Economists recognize that where there is asymmetric information, purchasers are vulnerable and the market is distorted. Naive buyers at used car auctions cannot distinguish between good cars and lemons, which puts the sellers of lemons at an advantage: they can get a higher price than their product is worth, while sellers of better products find it difficult to reach the price they want and are likely to leave the market, which becomes dominated by poor products.
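The adverse-selection dynamic behind the "market for lemons" is easy to see in a toy simulation. This is a minimal sketch with entirely hypothetical numbers, not a model of any real market: buyers, unable to observe quality, offer only the average value of whatever is for sale, and sellers who know their car is worth more than the offer withdraw it.

```python
import random

random.seed(1)

# Hypothetical market: 1,000 used cars with true values spread
# uniformly between $1,000 (lemons) and $10,000 (good cars).
values = [random.uniform(1000, 10000) for _ in range(1000)]

# Buyers cannot observe quality, so they offer the average value
# of the cars currently on sale. Sellers whose cars are worth more
# than the offer withdraw them. Repeat and watch the market shrink.
on_sale = values
for _ in range(20):
    offer = sum(on_sale) / len(on_sale)             # buyers' price
    remaining = [v for v in on_sale if v <= offer]  # good cars exit
    if len(remaining) == len(on_sale):
        break                                       # stable: no one left to exit
    on_sale = remaining

print(f"equilibrium price: ${offer:,.0f}")
print(f"cars left on market: {len(on_sale)} of {len(values)}")
```

Each round of withdrawals drags the average, and hence the offer, downwards, so the price spirals towards the value of the worst cars and most of the good stock leaves the market, which is the outcome the argument above predicts for commoditized testing services.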

For the reasons I have outlined, any prescriptive testing standard will help sellers of poor testing services to sell plausible “lemon” services. That will make it harder for sellers of high quality testing services to gain a worthwhile price; prices will drift downwards as testing is commoditized, sold on price rather than quality.

4c – Compulsion through contracts

Any ISO standard is likely to be referenced in contracts by lawyers and managers, and this will introduce compulsion. That was the profession's experience with IEEE 829: even when it was not directly mandated, it had a huge influence on many organizations' test strategies.

If standards are detailed, prescriptive and mandatory they will reduce the flexibility that testers need for context-driven testing. This would not fit the principles-driven style of regulation I described as desirable in 1a, and it would also lead to poorer communication, as described in 3d.

Standards – a charming illusion of action

The other day I posted an article I’d written that appeared on the uTest blog a few weeks ago. It was a follow up to an article I wrote last year about ISO 29119. Pmhut (the Project Management Hut website) provided an interesting comment.

”…are you sure that the ISO standards will be really enforced on testing – notably if they don’t really work? After all, lawyers want to get paid and clients want their projects done (regardless of how big the clients are).”

Well, as I answered, whether or not ISO 29119 works is, in a sense, irrelevant. Whether or not it is adopted and enforced will not depend on its value or efficacy. ISO 29119 might go against the grain of good software development and testing, but it is very much aligned with a hugely pervasive trend in bureaucratic, corporate life.

I pointed the commenter to an article I wrote on “Teddy Bear Methods”. People cling to methods not because they work, but because they gain comfort from doing so. That is the only way they can deal with difficult, stressful jobs in messy and complex environments. I could also have pointed to this article “Why do we think we’re different?”, in which I talk about goal displacement, our tendency to focus on what we can manage while losing sight of what we’re supposed to be managing.

A lesson from Afghanistan

I was mulling over this when I started to read a fascinating-looking book I was given at Christmas: “Heirs to Forgotten Kingdoms” by Gerard Russell, a deep specialist in the Middle East and a fluent Arabic and Farsi speaker.

The book is about minority religions in the Middle East. Russell is a former diplomat in the British Foreign Office. The foreword was by Rory Stewart, the British Conservative MP. Stewart wrote of his lack of surprise that Russell, a man deeply immersed in the culture of the region, had left the diplomatic service, then added:

”Foreign services and policy makers now want ‘management competency’ – slick and articulate plans, not nuance, deep knowledge, and complexity.”

That sentence resonated with me, and reminded me of a blistering passage from Stewart’s great book “The Places in Between”, his account of walking through the mountains of Afghanistan in early 2002 in the immediate aftermath of the expulsion of the Taliban and the NATO intervention.

Rory Stewart is a fascinating character, far removed from the modern identikit politician. The book is almost entirely a dispassionate account of his adventures and the people whom he met and who provided him with hospitality. Towards the end he lets rip, giving his brutally honest and well-informed perspective of the inadequacies of the western, bureaucratic, managerial approach to building a democratic state where none had previously existed.

It’s worth quoting at some length.

“I now had half a dozen friends working in embassies, thinktanks, international development agencies, the UN and the Afghan government, controlling projects worth millions of dollars. A year before they had been in Kosovo or East Timor and in a year’s time they would have been moved to Iraq or Washington or New York.

Their objective was (to quote the United Nations Assistance Mission for Afghanistan) ‘The creation of a centralised, broad-based, multi-ethnic government committed to democracy, human rights and the rule of law’. They worked twelve- or fourteen-hour days, drafting documents for heavily-funded initiatives on ‘democratisation’, ‘enhancing capacity’, ‘gender’, ‘sustainable development,’ ‘skills training’ or ‘protection issues’. They were mostly in their late twenties or early thirties, with at least two degrees – often in international law, economics or development. They came from middle class backgrounds in Western countries and in the evenings they dined with each other and swapped anecdotes about corruption in the Government and the incompetence of the United Nations. They rarely drove their 4WDs outside Kabul because they were forbidden to do so by their security advisers. There were people who were experienced and well informed about conditions in rural areas of Afghanistan. But such people were barely fifty individuals out of many thousands. Most of the policy makers knew next to nothing about the villages where 90% of the population of Afghanistan lived…

Their policy makers did not have the time, structures or resources for a serious study of an alien culture. They justified their lack of knowledge and experience by focusing on poverty and implying that dramatic cultural differences did not exist. They acted as though villagers were interested in all the priorities of international organisations, even when they were mutually contradictory…

Critics have accused this new breed of administrators of neo-colonialism. But in fact their approach is not that of a nineteenth-century colonial officer. Colonial administrations may have been racist and exploitative but they did at least work seriously at the business of understanding the people they were governing. They recruited people prepared to spend their entire careers in dangerous provinces of a single alien nation. They invested in teaching administrators and military officers the local language…

Post-conflict experts have got the prestige without the effort or stigma of imperialism. Their implicit denial of the difference between cultures is the new mass brand of international intervention. Their policy fails but no one notices. There are no credible monitoring bodies and there is no one to take formal responsibility. Individual officers are never in any one place and rarely in any one organisation long enough to be adequately assessed. The colonial enterprise could be judged by the security or revenue it delivered, but neo-colonialists have no such performance criteria. In fact their very uselessness benefits them. By avoiding any serious action or judgement they, unlike their colonial predecessors, are able to escape accusations of racism, exploitation and oppression.

Perhaps it is because no one requires more than a charming illusion of action in the developing world. If the policy makers know little about the Afghans, the public knows even less, and few care about policy failure when the effects are felt only in Afghanistan.”

Stewart’s experience and insight, backed up by the recent history of Afghanistan, allow him to present an irrefutable case. Yet, in the eyes of pretty much everyone who matters he is wrong. Governments and the military are prepared to ignore the evidence and place their trust in irrelevant and failed techniques rather than confront the awful truth; they don’t know what they’re doing and they can’t know the answers.

Vast sums of money, and millions of lives are at stake. Yet very smart and experienced people will cling on to things that don’t work, and will repeat their mistakes in the future. Stewart, meanwhile, is very unlikely to be allowed anywhere near the levers of power in the United Kingdom. Being right isn’t necessarily a great career move.

Deep knowledge, nuance and complexity

I’m conscious that I’m mixing up quite different subjects here. Software development and testing are very different activities from state building, but both are complex and difficult. Governments fail repeatedly at something as important and high-profile as constructing new, democratic states, without feeling any need to reconsider their approach. If that can happen in the glare of publicity, is it likely that corporations will refrain from adopting and enforcing standards just because they don’t work? Whether or not they work barely matters. Such approaches fit the mindset and culture of many organisations, especially large bureaucracies, and once adopted they are very difficult to dislodge.

Any approach to testing based on standardisation is doomed to fail unless you define success in a way that is consistent with the flawed assumptions underpinning the standard. What’s the answer? Not adopting standards that don’t work is an obvious start, but it doesn’t take you very far. You have to acknowledge the things Stewart referred to in his foreword to Gerard Russell’s book: answers aren’t easy; they require deep knowledge, an understanding of nuance and an acceptance of complexity.

A video worth watching

Finally, I’d strongly recommend this video of Rory Stewart being interviewed by Harry Kreisler of the University of California about his experiences and the problems I’ve been discussing. I’ve marked the parts I found most interesting.

34 minutes; Stewart is asked about applying abstract ideas in practice.

40:20; Stewart talks about a modernist approach of applying measurement, metrics and standardisation in contexts where they are irrelevant.

47:05; Harry Kreisler and then Stewart talk about participants failing to spot the obvious, that their efforts are futile.

49:33; Stewart describes how his Harvard students regarded him as a colourful contrarian who refused to accept that all Afghanistan needed was a new plan and new resources.