The Volkswagen emissions scandal: responsible software testing?

The scandal blows up in Volkswagen’s face

The Volkswagen emissions scandal has been all over the media worldwide since the US Environmental Protection Agency hit VW with a notice of violation on 18th September.

This is a sensational story with many important and fascinating aspects, but there is one angle I haven’t seen explored that intrigues me. Many of the early reports focused on the so-called “defeat device” that the EPA referred to. That gave the impression the problem was a secret, discrete piece of kit hidden away in the engine. A defeat device, however, is just EPA shorthand for any illegal means of subverting its regulations. Such an illegal device is one that alters the emissions controls in normal running, outside a test. In the VW case the device is code within the cars’ engine control software that could detect the special conditions under which emissions testing is performed. This is how the EPA reported the violation in its formal notice.

“VW manufactured and installed software in the electronic control module (ECM) of these vehicles that sensed when the vehicle was being tested for compliance with EPA emission standards. For ease of reference, the EPA is calling this the ‘switch’. The ‘switch’ senses whether the vehicle is being tested or not based on various inputs including the position of the steering wheel, vehicle speed, the duration of the engine’s operation, and barometric pressure. These inputs precisely track the parameters of the federal test procedure used for emission testing for EPA certification purposes.

“During EPA emission testing, the vehicles’ ECM ran software which produced compliant emission results under an ECM calibration that VW referred to as the ‘dyno calibration’ (referring to the equipment used in emissions testing, called a dynamometer). At all other times during normal vehicle operation, the ‘switch’ was activated and the vehicle ECM software ran a separate ‘road calibration’ which reduced the effectiveness of the emission control system.”

What did Volkswagen’s testers know?

What interests me about this is that the defeat device is integral to the electronic control module (ECM); the switch has to operate as part of the normal running of the car. The software is constantly checking the car’s behaviour to establish whether it is taking part in a federal emissions test or just running about normally. The testing of this switch would therefore have been part of the testing of the ECM. There’s no question of some separate piece of kit or software overriding the ECM.
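
To make that concrete, here is a minimal, purely illustrative sketch of the kind of branching the EPA notice describes. Nothing in it comes from VW’s actual ECM code; the signal names, thresholds and calibration labels are all hypothetical, and a real engine controller would of course be written in C rather than Python.

```python
# Purely illustrative sketch; no names, signals or thresholds here come from
# VW's actual ECM software. It only shows the shape of a "switch" that picks
# a calibration based on whether the observed driving looks like a test cycle.

from dataclasses import dataclass


@dataclass
class SensorInputs:
    steering_angle_deg: float       # on a dynamometer the wheels barely turn
    speed_kmh: float
    engine_runtime_s: float
    barometric_pressure_hpa: float


def looks_like_federal_test(inputs: SensorInputs) -> bool:
    """Guess whether the driving pattern matches the fixed parameters of a
    laboratory emissions test rather than normal road use (hypothetical checks)."""
    steering_static = abs(inputs.steering_angle_deg) < 1.0
    within_test_speed = inputs.speed_kmh <= 120.0
    within_test_duration = inputs.engine_runtime_s <= 1900.0
    lab_like_pressure = 950.0 <= inputs.barometric_pressure_hpa <= 1050.0
    return steering_static and within_test_speed and within_test_duration and lab_like_pressure


def select_calibration(inputs: SensorInputs) -> str:
    # The "switch": run the emissions-compliant map only when a test is suspected.
    return "dyno_calibration" if looks_like_federal_test(inputs) else "road_calibration"
```

The point for testers is that this branch sits in the ordinary control flow of the module. Any serious functional testing of the ECM would have had to exercise both calibrations and the conditions that select between them.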

This means the software testers were presumably complicit in the conspiracy. If they were not complicit, that would mean they were unaware of the existence of the different dyno and road calibrations of the ECM. They would have been so isolated from the development and the functionality of the ECM that they couldn’t have been performing any responsible, professional testing at all.

Passing on bad news – even to the very top

That brings me to my real interest. What does responsible and professional testing mean? That is something the broadly defined testing community hasn’t resolved. The ISTQB/ISO community and the Context Driven School have different ideas about it, and neither has got much beyond high-level aspirational statements. These say what testers believe, but don’t provide guiding principles that might help them translate their beliefs into action.

Other professions, or rather serious, established professions, have such guiding principles. Having worked as an IT auditor, I am familiar with the demands that the Institute of Internal Auditors makes on the profession. If internal auditors were to discover the existence of the defeat device, their responsibility would be clear.

Breaking the law by cheating on environmental regulation introduces huge risk to the corporation. The auditors would have to report that and escalate their concern to the Audit Committee, on which non-executive directors should sit. In the case of VW the Audit Committee is responsible for risk management and compliance. Of its four members one is a senior trade union official and another is a Swedish banker. Such external, independent scrutiny is essential for responsible corporate governance. The internal auditors are accountable to them, not the usual management hierarchy.

Of course escalation to the Audit Committee would require some serious deliberation and would be no trivial matter. It would be the nuclear option for internal auditors, but in principle their responsibility is simple and clear; they must pursue and escalate the issue or they are guilty of professional misconduct or negligence. “In principle” – that familiar phrase that is meaningless in software testing.

If internal auditors had detected the ECM defeat device they might have done so when conducting audit tests on the software as part of a risk-based audit, having decided that the regulatory implications made the ECM extremely high-risk software. However, it is far more likely that they would have discovered it after a tip-off from a whistleblower (as is often the case with serious incidents).

What is the responsibility of testers?

This takes us back to the testers. Just what was their responsibility? I know what I would have considered my moral duty as a tester, but I also know that blowing the whistle on the defeat device would have left me in a very vulnerable position. As an auditor I would have felt bulletproof. That is what auditor independence means.

So what should testers do when they’re expected to be complicit in activities that are unethical or illegal, or which have the whiff of negligence? Until that question is resolved and testers can point to some accepted set of guiding principles, any attempts to create testing standards or treat testing as a profession are just window dressing.

Addendum – 30th September 2015

I thought I’d add this afterthought. I want to be clear that I don’t think the answer to the problem would be to beef up the ISTQB code of ethics and enforce certification on testers. That would be a depressingly retrograde step. ISTQB lacks any clear and accepted vision of what software testing is and should be. The code of ethics is vague and inconsistent with ISTQB’s own practices. It would therefore not be in a credible position to enforce compliance, which would inevitably be selective and arbitrary.

On a more general note, I don’t think any mandatory set of principles is viable or desirable under current and foreseeable circumstances. By “mandatory” I mean principles to which testers would have to sign up and adhere if they wanted to work as testers.

As for ISO 29119, I don’t think that it is relevant one way or another to the VW case. The testers could have complied with the standard whilst conspiring in criminal acts. That would not take a particularly imaginative form of creative compliance.

I have followed up this article with a second post, written on 7th October.

Sarbanes-Oxley & scripted testing

This post was prompted by an article from 2013 by Mukesh Sharma that Sticky Minds recycled this week. I disagree with much of the article, about exploratory and scripted testing and about the nature of checklists. However, I’m going to restrict myself to Mukesh Sharma’s comments about regulatory compliance, specifically Sarbanes Oxley.

“In such (regulatory) scenarios the reliance on scripted testing is heavy, with almost no room for exploratory testing. Other examples include testing for Sarbanes-Oxley… and other such laws and acts, which are highly regulated and require strict adherence to defined guidelines.”

Let’s be clear. The Sarbanes-Oxley legislation does not mention software testing, never mind prescribe how it should be performed. It does mention testing, but this is the testing that auditors perform. Standards and quality control also feature, but these relate to the work of accountants and auditors.

Nevertheless, compliance with Sarbanes-Oxley does require “strict adherence to defined guidelines”, but this is a requirement inferred from the legislation rather than one stated in the law itself. The guidelines with which software testers must comply are locally defined testing policies and processes. Each compliant organisation must be able to point to a document that says “this is how we test here”. The legislation does have plenty to say about guidelines, but these are guidelines for sentencing miscreants. I suppose the serious consequences of non-compliance go a long way to explaining the over-reaction to Sarbanes-Oxley.

I suspect the pattern was that companies and consultants looked at how they could comply by following their existing approach to development and testing, then reinforced that. Having demonstrated that this would produce compliance, they claimed that this was the way to comply. Big consultancies have always been happy to sell document-heavy, process-driven solutions because this gives them plenty of opportunity to wheel out inexperienced young graduates to do the grunt work of tailoring the boilerplate templates and documents.

I used to detest Sarbanes-Oxley, but that was because I saw it as reinforcing damaging practices. I’m still hardly a fan, but I eventually came to take a more considered approach because it doesn’t have to be that way. If you look at what the auditors have to say about Sarbanes-Oxley you get a very different perspective. ISACA (the professional body for IT auditors) provides a guide to SOX compliance (free to members only) and it doesn’t mention scripts at all. Appropriate test environments are a far bigger concern.

ISACA’s COBIT 5 model for IT governance (the full model is free to members only) doesn’t refer to manual test scripts. It does require testers to “consider the appropriate balance between automated scripted tests and interactive user testing”. For manual testing COBIT 5 prefers the phrase “clearly defined test instructions” to “scripts”. The requirement is for testers to be clear about what will be done, not to document traditional test scripts in great detail in advance. COBIT 5 is far more insistent on the need to plan your testing carefully, have proper test environments and retain the evidence. You have to do all that properly; it’s non-negotiable.

COBIT 5 matters because if you comply with that then you will comply with Sarbanes-Oxley. Consultancies who claim that you have to follow their heavyweight, document driven processes in order to comply are being misleading. You can do it that way, just like you could drive from New York to Miami via Chicago. You get there in the end, but there are better ways!

Exploratory testing, Context Driven Testing and Bach & Bolton’s Rapid Test Management are all consistent with the demands of Sarbanes-Oxley compliance provided you know what you’re doing and take the problem seriously, caveats that apply to any testing approach. If anyone tells you that Sarbanes-Oxley requires you to test in a particular way, challenge them to quote the relevant piece of legislation or an appropriate auditor’s interpretation. You can be sure that it’s a veiled sales pitch – or they don’t know what they are talking about. Or both perhaps!

Games & scripts – stopping at red lights

I’ve been reading about game development lately. It’s fascinating. I’m sure there is much that conventional developers and testers could learn from the games industry. Designers need to know their users, how they think and what they expect. That’s obvious, but we’ve often only paid lip service to these matters. The best game designers have thought much more seriously and deeply about the users than most software developers.

Game designers have to know what their customers want from games. They need to understand why some approaches work and others fail. Developing the sort of games that players love is expensive. If the designers get it wrong then the financial consequences are ruinous. Inevitably the more thoughtful designers stray over into anthropology.

That is a subject I want to write about in more depth, but in the meantime here, in slightly amended form, is a short essay I wrote for Eurostar’s Test Huddle, prompted by a blog post by the game designer Chris Bateman. Bateman was talking about the nature of play and games, and an example he used made me think about how testing has traditionally been conducted.

“Ramon Romero of Microsoft’s Games User Research showed footage of various random people playing games for the first time. I was particularly touched by the middle-aged man who drove around in Midtown Madness as if he was playing a driving simulator. ‘This is a great game’, he said, as he stopped at a red light and waited for it to change.”

That’s ridiculous, isn’t it? Treating a maniacal racing game as a driving simulator? Maybe, but I’m not sure. That user was enjoying himself, playing the game entirely in line with his expectations of what the game should be doing.

The story reminded me of testers who embark on their testing armed with just as wildly misplaced beliefs about what the game – sorry, the application – should be doing. They might have exhaustive scripts generated from requirements documents that tell them exactly what the expected behaviour should be, and they could still be dangerously deluded.

Most of my experience has been with financial applications. They have been highly complicated, and the great concern was always about the unexpected behaviour, the hidden functionality that could allow users to steal money or screw up the data.

Focusing on the documented requirements, happy paths and the expected errors is tackling an application like that Midtown Madness player: driving carefully around the city, stopping at red lights and scrupulously obeying the law. Then, when the application is released, the real users rampage around discovering what it can really do.

Was that cheerfully naïve Midtown Madness player really ridiculous? He was just having fun his way. He wasn’t paid a good salary to play the game. The ones who are truly ridiculous are the testers who are paid to find out what applications can do and naively think they can do so by sticking to their scripts. Perhaps test scripts are rather like traffic regulations. Both say what someone thinks should be going on, but how much does that actually tell you about what is really happening out there, on the streets, in the wild where the real users play?

A single source of truth?

Lately in a chatroom for the International Society for Software Testing there has been some discussion about the idea of a “single source of truth”. I’m familiar with this in the sense of database design. Every piece of data is stored once and the design precludes the possibility of inconsistency, of alternative versions of the same data. That makes sense in this narrow context, but the discussion revealed that the phrase is now being used in a different sense. A single source of truth has been used to describe an oracle of oracles, an ultimate specification on which total reliance can be placed. The implications worry me, especially for financial systems, which is my background.

I’m not comfortable with a single source of truth, especially when it applies to things like bank balances, profit and loss figures, or indeed any non-trivial result of calculations. What might make more sense is to talk of a single statement of truth, and that statement could, and should, have multiple sources so that the statement is transparent and can be validated. However, I still wouldn’t want to talk about truth in financial statements. For an insurance premium there are various different measures, which have different uses for different people at different times. When people start talking about a single, true premium figure they are closing off their minds to reality and trying to redefine it to suit their limited vision.

All of these competing measures could be regarded as true in the right context, but there are other measures which are less defensible and which an expert would consider wrong, or misleading, in any context (e.g. lumping Insurance Premium Tax into the premium figure). That’s all quite aside from the question of whether these measures are accurate on their own terms.
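
A tiny sketch makes the point. All the figures and the tax rate below are hypothetical, and real premium accounting is far more involved, but even this toy example yields three different, defensible answers to the question “what is the premium?”.

```python
# Hypothetical policy figures; the point is only that "the premium" is several
# different numbers depending on who is asking and why.

premium_net_of_tax = 500.00                        # the underwriter's pricing view
insurance_premium_tax = premium_net_of_tax * 0.12  # illustrative IPT rate
gross_premium_charged = premium_net_of_tax + insurance_premium_tax

months_on_risk = 3
earned_premium = premium_net_of_tax * months_on_risk / 12  # recognised to date

print(f"Underwriting view (net of tax):   {premium_net_of_tax:.2f}")
print(f"Customer view (gross charged):    {gross_premium_charged:.2f}")
print(f"Accounting view (earned to date): {earned_premium:.2f}")

# Quoting the gross figure in an underwriting context quietly lumps the tax
# into the premium - the kind of figure an expert would call wrong.
```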

A “single source of truth” reminds me of arguments I’d have with application designers. Sometimes the problem would be that they wanted to eliminate any redundancy in the design. That could make reconciliation and error detection much harder because the opportunities to spot errors would be reduced. If a calculation was wrong it might stay wrong because no-one would know. A different source of friction was the age-old problem of analysts and designers determined to stick rigidly to the requirements without questioning them, or even really thinking about the implications. I suspect I was regarded as a pedantic nuisance, creating problems in places the designers were determined no problems could ever exist – or ever be visible.

Accounting for truth

Conventional financial accounting is based on double-entry book-keeping, which requires every transaction to be entered twice, in different places, so that the accounts as a whole remain in balance. There may be a single, definitive statement of profit, but that is distilled from multiple sources, with an intricate web of balances and documented, supporting assumptions. The whole thing is therefore verifiable, or auditable. But it’s not truth. It’s more a matter of saying “given these assumptions this set of transactions produces the following profit figure”. Vary the assumptions and you have a different and perhaps equally valid figure – so it’s not truth.
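
A minimal sketch of the mechanism, with made-up account names and amounts: every posting hits two accounts with equal and opposite effect, so the books as a whole must always sum to zero, and anything that breaks that balance becomes visible.

```python
# Minimal, illustrative double-entry ledger: each transaction is posted twice,
# as a debit to one account and a credit to another, so the signed balances
# always sum to zero and an imbalance flags an error.

from collections import defaultdict

balances = defaultdict(float)

def post(debit_account: str, credit_account: str, amount: float) -> None:
    balances[debit_account] += amount
    balances[credit_account] -= amount

post("Stock", "Cash", 5.0)           # buy inputs for £5
post("Cash", "Sales", 6.0)           # sell the finished item for £6
post("Cost of sales", "Stock", 5.0)  # release the stock at its assumed cost

# Trial balance: whatever the individual figures, the total must be zero.
assert abs(sum(balances.values())) < 1e-9

# The reported profit is derived from these balances under stated assumptions
# (here, that the stock is released at its £5 historical cost).
profit = -balances["Sales"] - balances["Cost of sales"]
print(f"Profit: {profit:.2f}")   # 1.00
```

The profit figure is derived from the balances under stated assumptions; change the assumptions and the derived figure changes, which is exactly why it isn’t truth.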

For many years academic accountants, e.g. Christopher Napier, have been doing fascinating work that strays over into philosophy. What is this reality that we are trying to understand? That’s ontology. What can we know about it, and what reliance can we put on that knowledge when we try to report it? That’s epistemology. Why are we doing it? That’s teleology.

The most interesting subject I ever studied in accountancy at university was the problem of inflation accounting. £6 - £5 = £1 might be a crude profit calculation for an item whose inputs cost you £5 and which you sold for £6. But what if the £5 was a cost incurred 11 months ago? You then buy replacement inputs, which now cost £7, but you’d still only be able to sell the finished product for £6. What does it mean to say you made a profit of £1? Who does that help? Couldn’t you also argue that you made a loss of £1?
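
The same arithmetic in code, using the figures from the example above, makes the dependence on the assumed cost basis explicit.

```python
# The worked example above: the same sale is a profit or a loss depending on
# which cost basis you assume for the inputs.

sale_price = 6.0
historical_cost = 5.0     # what the inputs actually cost, 11 months ago
replacement_cost = 7.0    # what the same inputs would cost today

historical_cost_profit = sale_price - historical_cost     # +1.0
replacement_cost_profit = sale_price - replacement_cost   # -1.0

print(f"Profit on a historical cost basis:  {historical_cost_profit:+.2f}")
print(f"Profit on a replacement cost basis: {replacement_cost_profit:+.2f}")
```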

What does it mean to add money together when the different elements were captured at dates when the purchasing power equivalent of that money was different? You’re adding apples and oranges. The value of money depends on what it can buy. Setting aside short-term speculation, that is what dictates currency exchange rates. £1 is more valuable than €1 because it buys more. It is meaningless to add £1 + €1 and get 2. An individual currency has different values over time, so is it any more meaningful to add different monetary figures without considering what their value was at the time the data was captured?

The academics pointed out all the problems inflation caused and came up with possible, complicated solutions. However, the profession eventually decided it was all just too difficult and pretty much gave up, except for an international standard for accounting in countries experiencing hyper-inflation (defined as greater than 100% over three years, i.e. a persisting annual rate of at least 26%, since 1.26³ ≈ 2). As at the end of 2014 the qualifying countries are Belarus, Venezuela, Sudan, Iran and Syria (which has rather more to worry about than financial accounting). For the rest of the world, if you want to add 5 apples and 6 oranges, that’s fine. You’ve now got 11 pieces of fruit. Stop worrying and just do the job.

I’m the treasurer for a church, and I’m often asked how much money we’ve got. I never bother going to the online bank statement, because I know that what people really want to know is how much money is available. So I use the church accounts, which factor in the income and payments that haven’t been cleared, and the money we’re due imminently, and the outgoings to which we’re already committed. These different figures all mesh together and provide a figure that we find useful, but which is different from the bank’s view of our balance. Our own accounts never rely on a single source of truth. There are multiple reconciliation checks to try and flag up errors. The hope is that inputting an incorrect amount will generate a visible error. We’re not reporting truth. All we can say is, so far as we know this is as useful and honest a statement of our finances as we can produce for our purposes, for the Church of Scotland, the Office of the Scottish Charity Regulator and the other stakeholders.
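
Something like the following, with purely invented figures, is what that looks like in practice: the bank’s balance, our own cash book and the figure people actually want (“what can we spend?”) are three related but different numbers, and the reconciliation between independently kept sources is what makes errors visible.

```python
# Invented figures for a small charity's books; the point is that the bank's
# balance, our own records and "what can we spend?" are three different numbers,
# and that keeping more than one source lets us reconcile and spot errors.

cash_book_balance = 4030.00       # running total from our own records of income and payments

bank_statement_balance = 4200.00  # the bank's view
uncleared_income = 150.00         # banked but not yet showing on the statement
uncleared_payments = 320.00       # cheques written but not yet cashed

adjusted_bank_balance = bank_statement_balance + uncleared_income - uncleared_payments

# Reconciliation: two independently maintained figures should agree.
# A mismatch means something was keyed wrongly, and the error becomes visible.
if abs(cash_book_balance - adjusted_bank_balance) > 0.01:
    print("Reconciliation difference - investigate before reporting anything")

committed_outgoings = 1000.00     # spending already agreed but not yet made
available_to_spend = cash_book_balance - committed_outgoings

print(f"Bank's view of the balance: {bank_statement_balance:.2f}")
print(f"Our cash book balance:      {cash_book_balance:.2f}")
print(f"Available to spend:         {available_to_spend:.2f}")
```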

It’s messy and complex – deal with it

What’s it all got to do with testing? If your vision of testing is checking whether the apparent functionality is consistent with the specification as represented in the test script, then this sort of messy complexity is a tedious distraction. It’s so much easier to pretend you can confirm the truth using a test script.

However, testing is (or should be) a difficult and intellectually demanding process of teasing out the implications of the application for the stakeholders. If you accept that, then you are far more likely to do something valuable if you stop thinking about any single source of truth. You should be thinking instead about possible sources of insight to help you shed light on the various “truths” that different stakeholders are seeking. Understanding these different needs, and all the nuances that arise from them, is essential for testers.

Assuming that there is a single truth that we can attest to with a simple, binary yes/no answer reduces testing to the level of the accountants who have tried to treat accountancy as a simple arithmetical exercise. Five oranges and six apples add up to eleven pieces of fruit; and so do eleven grapes, and eleven melons. So what? That is a useless and misleading piece of information, like the unqualified statement that the product is sound because we found what the script told us to look for. Testers, accountants and auditors all pick up good money because they are required to provide valuable information to people who need it. They should be expected to deal with messy, complex reality. They should not be allowed to get away with trying to redefine reality so it’s easier to handle.