Posted by: James Christie | February 7, 2014

Testing: valuable or bullshit?

I’ve recently been thinking about automation; not specifically about test automation, but rather about the wider issue of machines replacing humans and how that might affect testers. (Frey, C. and Osborne, M., “The Future of Employment”, Oxford University, 2013.)

It started when I discussed this chart with a friend, who is a church pastor. He had spotted that there was only a probability of 0.08% that computerisation would result in job losses for the clergy in the next two decades.

I was intrigued by the list, in particular the 94% probability that robots would take over from auditors. That’s nonsense. Auditors are now being asked to understand people and risks, and to assess corporate culture. They are moving away from the style of auditing that would have lent itself to computerisation.

Tick-and-bash compliance auditing is increasingly seen as old fashioned and discredited. Of course, much of the auditing done prior to the financial crash was a waste of time, but the answer is to do it properly, not to replace inadequate human practices with cheap machines.

I periodically see similar claims that testing can be fully automated. The usual process is to misunderstand what a job entails, define it in a way that makes it amenable to automation, then say that automation is inevitable and desirable.

If the job of a pastor were to stand at the front of the church and read out a prepared sermon, then that could be done by a robot. However, the authors of this study correctly assumed that the job entails rather more than that.

Drill into the assumptions behind claims about automation and you will often find that they’re all hooey. Or at least that was my presumption. Confirmation bias isn’t just something that affects other people!

The future of employment

So I went off to have a look at the sources for that study. Here is the paper itself, “The future of employment” (PDF, opens in a new tab) by Frey and Osborne.

The first thing that struck me was that the table above showed only a very small selection of the jobs covered in the study. Here are the other jobs that are concerned with IT.

Job Probability
Software developers (applications) 0.04
Software developers (systems software) 0.13
Information security analysts, web developers, network architects 0.21
Computer programmers 0.48
Accountants & auditors 0.94
Inspectors, testers, sorters, samplers, & weighers 0.98

So testers are in even greater danger of being automated out of a job than auditors. You can see what’s going on. Testers have been assigned to a group that defines testing as checking. (See James Bach’s and Michael Bolton’s discussion of testing and checking). Not surprisingly the authors have then reasoned that testing can be readily automated.
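To pin down what “checking” means here, a minimal sketch follows. The function under test and the expected value are invented purely for illustration; the point is that a check is a binary comparison against something decided in advance.

```python
# A minimal, invented example of a "check": compare an actual output against
# a predefined expected value and report pass/fail. Nothing here comes from a
# real system; it only illustrates why checks are so amenable to automation.

def premium_for(age: int, cover: int) -> float:
    """Toy stand-in for the system under test (hypothetical pricing rule)."""
    return cover * (0.01 if age < 30 else 0.02)

def check_premium() -> bool:
    """The check: pass or fail against a value someone decided in advance."""
    expected = 2000.0
    actual = premium_for(age=45, cover=100_000)
    return actual == expected  # a binary verdict, no judgement involved

if __name__ == "__main__":
    print("check passed" if check_premium() else "check failed")
    # The check can only confirm what it was told to look for. Whether the
    # pricing rule makes sense for the business is a question it cannot ask -
    # and that is where testing, as opposed to checking, begins.
```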

There are some interesting questions raised by this list. The chances are almost 50:50 that computer programming jobs will be lost, yet 24 to 1 against applications software developers losing their jobs. Are these necessarily different jobs? Or is it just cooler to be a software developer than a computer programmer?

In fairness to the authors they are using job categories defined by the US Department of Labor. It’s also worth explaining that the authors don’t actually refer to the probability figure as being the probability that jobs would be lost. That would have made the conclusions meaningless. How many jobs would be lost? A probability figure could apply only to a certain level of job loss, e.g. 90% probability that 50% of the jobs would go, or 10% probability that all jobs would be lost.

The authors are calculating the “probability of computerisation”. I think they are really using susceptibility to computerisation as a proxy for probability. That susceptibility can be inferred from the characteristics that the US Department of Labor has defined for each of these jobs.

The process of calculating the probability is summarised as follows.

“…while sophisticated algorithms and developments in MR (mobile robotics), building upon big data, now allow many non-routine tasks to be automated, occupations that involve complex perception and manipulation tasks, creative intelligence tasks, and social intelligence tasks are unlikely to be substituted by computer capital over the next decade or two. The probability of an occupation being automated can thus be described as a function of these task characteristics.”

So testing is clearly lined up with the jobs that can be automated by sophisticated algorithms that might build upon big data. It doesn’t fall in with the jobs that require complex perception, creative intelligence and social intelligence.
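Frey and Osborne fit a statistical model to job characteristics; I am not going to reproduce that here. But a toy weighted score, with entirely invented numbers, is enough to show how the logic plays out once testing has been filed under checking.

```python
# Illustrative only - NOT Frey and Osborne's model. Invented weights and
# ratings, just to show how low scores for the "bottleneck" tasks translate
# into a high susceptibility figure.

BOTTLENECKS = ("perception_and_manipulation",
               "creative_intelligence",
               "social_intelligence")

def susceptibility(ratings: dict) -> float:
    """Ratings run from 0.0 (task barely present) to 1.0 (central to the job).
    The more the job depends on the bottleneck tasks, the lower the score."""
    resistance = sum(ratings[b] for b in BOTTLENECKS) / len(BOTTLENECKS)
    return round(1.0 - resistance, 2)

# Two hypothetical framings of the same job.
testing_as_checking = {"perception_and_manipulation": 0.1,
                       "creative_intelligence": 0.0,
                       "social_intelligence": 0.0}
testing_as_investigation = {"perception_and_manipulation": 0.5,
                            "creative_intelligence": 0.8,
                            "social_intelligence": 0.7}

print(susceptibility(testing_as_checking))       # 0.97 - "automate it"
print(susceptibility(testing_as_investigation))  # 0.33 - rather harder
```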

Defining testing by defining the bugs that matter

Delving into the study, and its supporting sources, confirms this. The authors excitedly cite studies that have found that technology can spot bugs. Needless to say the “bugs” have been defined as those that technology can find. NB – all links are to PDFs, which open in a new tab.

Algorithms can further automatically detect bugs in software (Hangal and Lam, 2002; Livshits and Zimmermann, 2005; Kim et al., 2008), with a reliability that humans are unlikely to match. Big databases of code also offer the eventual prospect of algorithms that learn how to write programs to satisfy specifications provided by a human. Such an approach is likely to eventually improve upon human programmers, in the same way that human-written compilers eventually proved inferior to automatically optimised compilers… Such algorithmic improvements over human judgement are likely to become increasingly common.

There we have it. Testing is just checking. Algorithms are better than human judgement at performing the tasks that we’ve already framed as being more suited to algorithms. Now we can act on that and start replacing testers. Okaaay.

The supporting sources are a variety of papers outlining what are essentially tools, or possible tools, that can improve the quality of coding. A harsh verdict would be that their vision is only to improve unit testing. Note the range of dates, going back to 2002, which I think weakens the argument that they can be used to predict trends over the next two decades. If these developments are so influential why haven’t they already started to change the world?
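To give a flavour of what these tools do, here is a heavily simplified sketch of invariant-style anomaly detection; it is loosely in the spirit of that research, not an implementation of any of the cited papers. Note how the “bugs” it can find are, by construction, statistical deviations from past behaviour.

```python
# A heavily simplified sketch of invariant-style anomaly detection. Loosely in
# the spirit of the cited research, but not an implementation of any of it.
# "Bugs" here are simply values falling outside ranges seen in training runs.

class RangeInvariant:
    def __init__(self) -> None:
        self.low = float("inf")
        self.high = float("-inf")

    def train(self, value: float) -> None:
        """Widen the accepted range during runs assumed to be correct."""
        self.low = min(self.low, value)
        self.high = max(self.high, value)

    def check(self, value: float) -> bool:
        """Report anything outside the learned range as a potential bug."""
        return self.low <= value <= self.high

invariant = RangeInvariant()
for observed in (12.0, 15.5, 14.2, 13.8):  # values from runs assumed "good"
    invariant.train(observed)

print(invariant.check(14.0))   # True: inside the learned range, so no report
print(invariant.check(250.0))  # False: an anomaly, so reported as a "bug"
# A value that is wrong but within range, or a learned range that is itself
# wrong, goes unreported - which is the limitation this post is pointing at.
```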

However, I don’t want to go into the detail of these papers, or whether they are valid. I’m quite happy to accept that they are correct and make a useful contribution within the limits of their relevance. I do think these limits are tighter than the authors have assumed, but that’s not what concerns me.

The point is that confusing testing with checking places testers at the front of the line to be automated. If you define the bugs that matter as those that can be caught by automation then you define testing in a damaging way. That would be bad enough, but too many people in IT and the testing profession have followed policies and practices that keep testers firmly in the firing line.

There are four broad reasons for this:

  • a widely presented false choice between automated and manual testing,
  • a blindness to the value of testing that leads to value being sacrificed in attempts to cut costs,
  • testing standards, which encourage a mechanical and linear approach,
  • a general corporate willingness to create and tolerate meaningless jobs.

Automated or manual testing – a false choice

The subtext of the manual versus automated false dichotomy seems to be that manual is the poor, unprofessional relation of high quality, sophisticated automation. I wonder if part of the problem is a misplaced belief in the value of repeatability, for which CMMI has to take its full share of the blame.

The thinking goes, if something can be automated it is repeatable; it can be tailored to be precise, continually generating accurate, high quality results. Automated testing and “best practice” go hand in glove.

In contrast, manual testing seems frustratingly unpredictable. The actions of the testers are contingent. I think that is an interesting word. I like to use it in the way it is used in philosophy and logic. Certain things are not absolutely true or necessarily so; they are true or false, necessary or redundant, depending on other factors or on observation. Dictionaries offer subtly different alternative meanings. Contingent means accidental, casual or fortuitous according to dictionary.com. These are incredibly loaded words that are anathema to people trying to lever an organisation up through the CMMI levels.

I understand “contingent”, as a word and concept, as being neutral, useful and not particularly related to “repeatable”, certainly not its opposite. It is sensible and pragmatic to regard actions in testing as being contingent – it all depends. Others do regard “contingent” and “repeatable” as opposites; “contingent” then becomes evidence of chaos and unpredictability that can be cleaned up with repeatable automation.

Some people regard “it all depends” as an honest statement of uncertainty. Others regard it as a weak and evasive admission of ignorance. There has always been a destructive yearning for unattainable certainty in software development.

To be clear, I am not decrying repeatability. I mean only that putting too much emphasis on it is unhelpful. Placing too much emphasis on automation because it is repeatable, and decrying manual testing, sells true testing short. It demeans testing, and makes it more vulnerable to the prophets of full automation.

Costs chasing value downwards in a vicious cycle

There are a couple of related vicious circles. Testing that is rigid, script-driven and squeezed at the end of a project doesn’t provide much value. (Iain McCowatt wrote a great blog about this.)

So unless project managers are prepared to step back and question their world view, their response, entirely rational when viewed from a narrow perspective, is to squeeze testing further.

When the standard of testing is as poor as it often is on traditional projects there is little value to be lost by squeezing testing harder because the reduced costs more than compensate for the reduced benefits. So the cycle continues.

Meanwhile, if value is low then the costs become more visible and harder to justify. There is pressure to reduce these costs, and if you’re not getting any value then just about any cost-cutting measure is going to look attractive. So we’re heading down the route of outsourcing, offshoring and the commoditization of testing. Testing is seen as an undifferentiated commodity, bought and sold on the basis of price. The inevitable pressure is to send cost and prices spiralling down to the level set by the lowest cost supplier, regardless of value.

If project managers, and their corporate masters, were prepared to liberate the testers, and ensure that they were high quality people, with highly developed skills, they could do vastly more effective and valuable work at a lower cost. But that comes back to questioning their world view. It’s always tough to make people do that when they’ve built a career on sticking to a false world view.

Standards

And now I come to the classic false world view pervading testing: the idea that it should be standardised. I have written about this before, and I’d like to quote what I wrote in my blog last November.

Standards encourage a dangerous illusion. They feed the hunger to believe, against all the evidence, that testing, and software development in general, are neat, essentially linear activities that can be rendered orderly and controllable with sufficient advance documentation. Standards feed the illusion that testing can be easier than it really is, and performed by people less skilled than are really needed.

Standards are designed to raise the status of testing. The danger is that they will have the opposite result. By focussing on aspirations towards order, repeatability and predictability, by envisaging testing as a checking exercise, the proponents of testing will inadvertently encourage others to place testing at the front of the queue for automation.

Bullshit jobs

This is probably the most controversial point. Technological innovation has created the opportunity to do things that were previously either impossible or not feasible. Modern corporations have grown so complex that merely operating the mechanics of the corporate bureaucracy has become a job in itself. Never mind what the corporation is supposed to achieve, for huge numbers of people the end towards which they are working is merely the continued running of the machine.

Put these two trends together and you get a proliferation of jobs that have no genuine value, but which are possible only because there are tools to support them. There’s no point to them. The organisation wouldn’t suffer if they were dispensed with. The possibility of rapid communication becomes the justification for rapid communication. In previous times people could have been assigned responsibility to complete a job, and left to get on with it because the technology wasn’t available to micro-manage them.

These worthless jobs have been beautifully described as “bullshit jobs” by David Graeber. His perspective is that of an anthropologist. He argues that technological progression has freed up people to do these jobs, and it has suited the power of financial capital to create jobs to keep otherwise troublesome people employed. Well, you can decide that for yourself, but I do think that these jobs are a real feature of modern life, and I firmly believe that such jobs will be early candidates for automation. If there is a nagging doubt about their value, and if they’re only possible because of technology, why not go the whole hog and automate them?

What’s that got to do with testing, you may ask? Testing frequently falls into that category. Or at least testing as it is often performed: commoditized, script-driven, process-constrained testing. I’ve worked as a test manager knowing full well that my role was pointless. I wasn’t there to drive good testing. The team wasn’t being given the chance to do real testing. We were just going through the motions, and I was managing the testing process, not managing testing.

Most of my time was spent organising and writing plans that would ultimately bear little relation to the needs of the users or the testers. Then, during test execution, I would be spending all my time collating daily progress reports for more senior managers who in turn would spend all their time reading these reports, micro-managing their subordinates, and providing summaries for the next manager up the hierarchy; all pointless. No-one was prepared to admit that the “testing” was an expensive way to do nothing worthwhile. It was unthinkable to scrap testing altogether, but no-one was allowed to think their way to a model that allowed real testing.

As Graeber would put it, it was all bullshit. Heck, why not just automate the lot and get rid of these expensive wasters?

Testing isn’t meant to be easy – it’s meant to be valuable

This has been a long article, but I think the message is so important it’s worth giving some space to my arguments.

Too many people outside the testing profession think of testing as being low status checking that doesn’t provide much value. Sadly, too many people inside the profession unwittingly ally themselves with that mindset. They’d be horrified at the suggestion, but I think it’s a valid charge.

Testers should constantly fight back against attempts to define them in ways that make them susceptible to replacement by automation. They should always stress the importance of sapient, intelligent testing. It’s not easy, it’s not mechanical, it’s not something that beginners can do to an acceptable standard simply by following a standardised process.

If testers aren’t going to follow typists, telephonists and filing clerks onto the scrapheap we have to ensure that we define the profession. We must do so in such a way that no-one could seriously argue that there is a 98% chance of it being automated out of existence.

98%? We know it’s nonsense. We should be shouting that out.

Posted by: James Christie | January 7, 2014

DRE: changing reality so we can count it

It’s usually true that our attitudes and beliefs are shaped by our early experiences. That applies to my views on software development and testing. My first experience of real responsibility in development and testing was with insurance financial systems. What I learned and experienced will always remain with me. I have always struggled with some of the tenets of traditional testing, and in particular the metrics that are often used.

There has been some recent discussion on Twitter about Defect Removal Efficiency. It was John Stephenson’s blog that set me thinking once again about DRE, a metric I’d long since consigned to my mental dustbin.

If you’re unfamiliar with the metric, it is the number of defects found before implementation expressed as a percentage of all the defects discovered within a certain period of going live (i.e. live defects plus development defects). The cut-off is usually 90 days from implementation. So the more defects reported in testing and the fewer in live running, the higher the percentage, and the higher the quality (supposedly). A perfect application would have no live defects and therefore a DRE score of 100%; all defects were found in testing.
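Written out as arithmetic, with invented figures, the calculation looks like this:

```python
# Defect Removal Efficiency with invented figures, purely to show the arithmetic.

def dre(pre_release_defects: int, live_defects_within_cutoff: int) -> float:
    """Defects found before release as a percentage of all defects found.
    Live defects count only if reported within the cut-off (typically 90 days)."""
    total = pre_release_defects + live_defects_within_cutoff
    return 100.0 * pre_release_defects / total

# 190 defects logged in testing, 10 reported in the first 90 days of live running.
print(round(dre(190, 10), 1))   # 95.0 - "95% DRE", supposedly high quality
# The same product, but with every triviality logged as a defect during testing.
print(round(dre(380, 10), 1))   # 97.4 - a "better" score for identical live behaviour
```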

John’s point was essentially that DRE can be gamed so easily that it is worthless. I agree. However, even if testers and developers tried not to manipulate DRE, even if it couldn’t be gamed at all it would still be an unhelpful and misleading metric. It’s important to understand why so we can exercise due scepticism about other dodgy metrics, and flawed approaches to software development and testing.

DRE is based on a view of software development, testing and quality that I don’t accept. I don’t see a world in which such a metric might be useful, and it contradicts everything I learned in my early days as a team leader, project manager and test manager.

Here are the four reasons I can’t accept DRE as a valid metric. There are other reasons, but these are the ones that matter most to me.

Software development is not a predictable, sequential manufacturing activity

DRE implicitly assumes that development is like manufacturing, that it’s a predictable exercise in building a well understood and defined artefact. At each stage of the process defects should be progressively eliminated, till the object is completed and DRE should have reached 95% (or whatever).

You can see this sequential mindset clearly in this article by Capers Jones, “Measuring Defect Potentials and Defect Removal Efficiency” (PDF, opens in new tab) from QA Journal in 2008.

“In order to achieve a cumulative defect removal efficiency of 95%, it will be necessary to use approximately the following sequence of at least eight defect removal activities:

• Design inspections
• Code inspections
• Unit test
• New function test
• Regression test
• Performance test
• System test
• External Beta test

To go above 95%, additional removal stages will be needed. For example requirements inspections, test case inspections, and specialized forms of testing such as human factors testing, performance testing, and security testing add to defect removal efficiency levels.”

Working through sequential “removal stages” is not software development or testing as I recognise them. When I was working on these insurance finance systems there was no neat sequence through development with defects being progressively removed. Much of the early development work could have been called proof of concept. It wasn’t a matter of coding to a specification and then unit testing against that spec. We were discovering more about the problem and experimenting to see what would work for our users.

Many of those experiments didn’t work out, but each of these “failures” was a precious nugget of extra information about the problem we were trying to solve. The idea that we would have improved quality by recording everything that didn’t work and calling it a defect would have been laughable. Yet this is the implication of another statement by Capers Jones in a paper on the International Function Point Users Group website (December 2012), “Software Defect Origins and Removal Methods” (PDF, opens in new tab).

“Omitting bugs found in requirements, design, and by unit testing are common quality omissions.”

So experimenting to learn more about the problem without treating the results as formal defects is a quality omission? Tying up developers and testers in bureaucracy by extending formal defect management into unit testing is the way to better quality? I don’t think so.

Once we start to change the way people work just so that we can gather data for metrics, we are not simply encouraging them to game the system. It is worse than that. We are trying to change reality to fit our ability to describe it. We are pretending we can change the territory to fit the map.

Quality is not an absence of something

My second objection to DRE in principle is quite simple. It misrepresents quality. ”Quality is value to some person” as Jerry Weinberg famously said in his book “Quality Software Management: Systems Thinking”.

The insurance applications we were developing were intended to help our users understand the business and products better so that they could take better decisions. The quality of the applications was a matter of how well they helped our users to do that. These users were very smart and had a very clear idea of what they were doing and what they needed. They would have bluntly and correctly told us we were stupid and trying to confuse matters by treating quality as an absence of defects. That takes me on to my next objection to DRE.

Defects are not interchangeable objects

A defect is not an object. It possesses no qualities except those we choose to grant it in specific circumstances. In the case of my insurance applications a defect was simply something we didn’t understand that required investigation. It might be a problem with the application, or it might be some feature of the real world that we hadn’t known about and which would require us to change the application to handle it.

We never counted defects. What is the point of adding up things I don’t understand or don’t know about? I don’t understand quantum physics and I don’t know off hand what colour socks my wife is wearing today. Adding the two pieces of ignorance together to get two is not helpful.

Our acceptance criteria never mentioned defect numbers. The criteria were expressed in accuracy targets against specific oracles, e.g. we would have to reconcile our figures to within 5% of the general ledger. What was the basis for the 5% figure? Our users knew from experience that 95% accuracy was good enough to let them take significantly better decisions than they could without the application. 100% was an ideal, but the users knew that the increase in development time to try and reach that level of accuracy would impose a significant business cost because crucial decisions would have had to be taken blindfolded while we tried to polish up a perfect application.

If there was time we would investigate discrepancies even within the 5% tolerance. If we went above 5% in testing or live running then that was a big deal and we would have to respond accordingly.
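As a sketch, with invented figures, that sort of acceptance criterion amounts to something like this:

```python
# A sketch of an accuracy check against an oracle, with invented figures.
# The oracle here is the general ledger; the tolerance is the agreed 5%.

def within_tolerance(application_total: float, ledger_total: float,
                     tolerance: float = 0.05) -> bool:
    """Reconcile the application's figure against the general ledger."""
    discrepancy = abs(application_total - ledger_total) / ledger_total
    return discrepancy <= tolerance

print(within_tolerance(9_700_000.0, 10_000_000.0))  # True: 3% out, acceptable
print(within_tolerance(9_300_000.0, 10_000_000.0))  # False: 7% out, a big deal
```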

You may think that this was a special case. Well yes, but every project has its own business context and user needs. DRE assumes a standard world in which 95% DRE is necessarily better than 90%. The additional cost and delay of chasing that extra 5% could mean the value of the application to the business is greatly reduced. It all depends. Using DRE to compare the quality of different developments assumes that a universal, absolute standard is more relevant than the needs of our users.

Put simply, when we developed these insurance applications, counting defects added nothing to our understanding of what we were doing or our knowledge about the quality of the software. We didn’t count test cases either!

DRE has a simplistic, standardised notion of time

This problem is perhaps related to my earlier objection that DRE assumes developers are manufacturing a product, like a car. Once it rolls off the production line it should be largely defect free. The car then enters its active life and most defects should be revealed fairly quickly.

That analogy made no sense for insurance applications, which are highly date sensitive. Insurance contracts might be paid for up front, or in instalments, but they earn money on a daily basis. At the end of the contract period, typically a year, they have to be renewed. The applications consist of different elements performing distinct roles according to different timetables.

DRE requires an arbitrary cut off beyond which you stop counting the live defects and declare a result. It’s usually 90 days. Applying a 90 day cut-off for calculating DRE and using that as a measure of quality would have been ridiculous for us. Worse, if that had been a measure for which we were held accountable it would have distorted important decisions about implementation. With new insurance applications you might convert all the data from the old application when you implement the new one. Or you might convert policies as they come up for renewal.

Choosing the right tactics for conversion and implementation was a tricky exercise balancing different factors. If DRE with a 90 day threshold were applied then different tactics would give different DRE scores. The team would have a strong incentive to choose the approach that would produce the highest DRE score, and not necessarily the one that was best for the company.
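A sketch with invented dates and defect counts shows how sharply the tactics can diverge under a 90-day cut-off:

```python
# Invented figures showing how the 90-day cut-off interacts with conversion
# tactics. The underlying problems are identical; only the implementation
# approach, and therefore when defects surface, differs.

from datetime import date, timedelta

CUTOFF_DAYS = 90
PRE_RELEASE_DEFECTS = 190

def dre(go_live: date, live_defect_dates: list) -> float:
    """Count only live defects reported within the cut-off window."""
    counted = sum(1 for d in live_defect_dates
                  if d <= go_live + timedelta(days=CUTOFF_DAYS))
    return 100.0 * PRE_RELEASE_DEFECTS / (PRE_RELEASE_DEFECTS + counted)

go_live = date(2014, 1, 1)
# Big-bang conversion: everything migrated at once, so date-sensitive problems
# surface early, inside the 90-day window.
big_bang = [go_live + timedelta(days=d) for d in (10, 25, 40, 60, 80)]
# Phased conversion at renewal: the same problems trickle out over the year,
# mostly after the cut-off has passed.
phased = [go_live + timedelta(days=d) for d in (30, 150, 200, 280, 340)]

print(round(dre(go_live, big_bang), 1))  # 97.4 - five live defects counted
print(round(dre(go_live, phased), 1))    # 99.5 - only one counted
```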

Now of course you could tailor the way DRE is calculated to take account of individual projects, but the whole point of DRE is that people who should know better want to make comparisons across different projects, organisations and industries and decide which produces greater quality. Once you start allowing for all these pesky differences you undermine that whole mindset that wants to see development as a manufacturing process that can be standardised.

DRE matters – for the wrong reasons

DRE might be flawed beyond redemption but metrics like that matter to important people for all the wrong reasons. The logic is circular. Development is like manufacturing, therefore a measure that is appropriate for manufacturing should be adopted. Once it is being used to beat up development shops who score poorly they have an incentive to distort their processes to fit the measure. You have to buy in the consultancy support to adapt the way you work. The flawed metric then justifies the flawed assumptions that underpin the metric. It might be logical nonsense, but there is money to be made there.

So DRE is meaningless because it can be gamed? Yes, indeed, but any serious analysis of the way DRE works reveals that it would be a lousy measure, even if everyone tries to apply it responsibly. Even if it were impossible to game it would still suck. It’s trying to redefine reality so we can count it.

Posted by: James Christie | December 17, 2013

“This could easily be the testing industry”

My article “Testing standards? Can we do better?” attracted a lot of attention. A couple of weeks after I wrote it I wondered if I’d overstated the case against standards, and the danger that they pose to good testing. I went back to read the article again and decided that it was all entirely reasonable. The case against standards is actually far stronger. I merely touched on a few angles. If you want more meat go to this article I wrote in 2012, and check out the links to Michael Bolton’s work.

I’m returning to the subject today because of an exchange on Twitter (PDF, opens in new tab). I expressed my concern at a possible future in which testing is governed by certification and standards, both of which are mandated by contracts that refer to them. This would be “best practice”. It would be what responsible professionals do, and those who dissent would be wilfully insisting on working in an unprofessional, irresponsible manner. They would be consciously taking money from clients with the intention of doing a sub-standard job.

That’s the conclusion I have to draw from a “white paper” by Testing Solutions Group promoting the ISO 29119 testing standard.

Imagine an industry where qualifications are based on accepted standards, required services are specified in contracts that reference these same standards, and best industry practices are based on the foundation of an agreed body of knowledge – this could easily be the testing industry of the near future.

That is a prospect that alarms and depresses me. I don’t think it will happen so long as good, responsible testers continue to speak out. However, it might happen in the way that Pete Walen suggested in the Twitter exchange; if the standards lobby get the ear of legislators who could mandate that public sector projects must be compliant with standards, or if they decide that non-compliance could be prima facie evidence of negligence.

Well, that won’t happen while I’m in testing. If that future ever comes to pass my career in testing will be over. I have worked in the painful, inflexible and dysfunctional way that invariably follows mandatory, standards-driven contracts. I’ve no interest in trying to do the wrong things more efficiently, or in a slightly more up-to-date fashion. I will walk away without looking back.

The future of testing? Please don’t let that happen. At the very least, don’t let it happen “easily”!

Posted by: James Christie | November 25, 2013

In praise of ignorance

My EuroSTAR 2013 tutorial in Gothenburg was titled “Questioning auditors questioning testing”. Not surprisingly a recurring theme of the tutorial was risk. Do we really understand risk? How do we deal with it? These are important questions for both testers and auditors.

I argued that both auditors and testers, in their different ways, have struggled to deal with risk. The failure of auditors contributed to the financial crash of 2007/8. The problems within the testing profession may have been less conspicuous, but they have had a significant impact on our ability to do an effective job.

One of the issues I discussed was our tendency to perform naïve and mechanical risk assessments. I’m sure you’ve seen risk matrices like this one from the Health and Safety Executive, the UK Government inspectorate responsible for regulating workplace safety.

UK HSE risk matrix

There are two fundamental problems with such a matrix that should make testers wary of using it.

Firstly, it implies that cells in the matrix with equal scores reflect equally acceptable positions. Is that really the case? Is a trivial chance of a catastrophic outcome genuinely as acceptable as the near certain chance of a trivially damaging outcome? The HSE deals with the sort of risks that lead national news bulletins when they come to pass; they regulate the nuclear industry, chemical plants and North Sea oil rigs.

I suspect the HSE takes a rather more nuanced approach to risks than is implied by the scoring in the matrix.
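The arithmetic behind such a matrix is simple enough to write down (invented ratings, and not the HSE’s actual method):

```python
# The generic probability-times-impact scoring behind matrices like this one,
# with invented ratings. This is not the HSE's actual guidance.

def risk_score(probability_rating: int, impact_rating: int) -> int:
    """Both ratings on a 1 (lowest) to 5 (highest) scale."""
    return probability_rating * impact_rating

trivial_chance_of_catastrophe = risk_score(probability_rating=1, impact_rating=5)
near_certain_trivial_damage = risk_score(probability_rating=5, impact_rating=1)

print(trivial_chance_of_catastrophe)  # 5
print(near_certain_trivial_damage)    # 5
# Identical scores, so the matrix treats the two positions as equally
# acceptable - which is precisely the assumption being questioned here.
```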

The second basic problem with these risk matrices is that we often lack the evidence to assign credible estimates of probability and impact to the risks.

This problem applies in particular to probabilities. Is there a reasonable basis for the figures we’ve assigned to the probabilities? Are they guesses? Are we performing precisely engineered and sophisticated calculations that are ultimately based on arbitrary or unsound assumptions? It makes a huge difference to the outcomes, but we can be vague to the point of cluelessness about the basis for these calculations.

Such a matrix may be relevant for risks where both the probability and the likely impact of something going wrong are well understood. That is often not the case during the early stages of a software development when the testing is being planned.

What’s the point of putting a number on the probability?

Whilst I was preparing my tutorial I came across an interesting case that illustrated the limitations of assigning probabilities when we’ve no experience or historic data.

Enrico Fermi

I was reading about the development of the atomic bomb during the Second World War. Before the first bomb was tested the scientists were concerned about the possibility that a nuclear explosion might set the atmosphere on fire and wipe out life on earth. Enrico Fermi, the brilliant Italian nuclear physicist who worked on the development of the atomic bomb, estimated the probability of such a catastrophe at 10%.

I was astonished. How could anyone have taken the decision to explode an atomic bomb after receiving such scientific advice? My curiosity was aroused and I did some background reading on the episode. I learned that Fermi had also been asked in 1939 for his estimate of the probability that nuclear fission could be controlled for power or weapons. His estimate was 10%.

Then, in a separate article, I discovered that in 1950 he had estimated the probability that humans would have developed the technology to travel faster than light by 1960. You’ve guessed it. The answer was 10%.

Apparently Fermi had the reputation for being a sound estimator, when (and this is a crucial qualification) he had the information to support a reasonable estimate. Without such information he was clearly liable to take a guess. If something might happen, but he thought it unlikely, then he assigned a probability of 10%.

I think most of us do no better than Fermi. Indeed, the great majority are probably far worse. Are we really any more capable than Enrico Fermi of assigning probabilities to a naïve risk matrix that would allow simple, mechanical calculations of relative outcomes?

I strongly suspect that if Enrico Fermi had thought anyone would take his estimates and slot them into a simplistic risk formula to guide decision making then he’d have objected. Yet many of us see nothing wrong with such a simplistic approach to risk. I wonder if that’s simply because our risk assessments are little more than a tickbox exercise, a task that has to be knocked off to show we are following “the process”.

The incertitude matrix – uncertainty, ambiguity and ignorance

The risk matrix clearly assumes greater knowledge of probabilities and outcomes than we usually have. A more useful depiction of the true situation is provided by O’Riordan and Cox’s incertitude matrix.

In this representation the conventional risk matrix occupies only the top left hand corner. We are in a position to talk about risk only when we have well defined outcomes and a realistic basis for assessing the probabilities.

If we understand the outcomes, but not the probabilities then we are in a state of uncertainty. If we understand the probabilities of events, but not the outcomes then we are dealing with ambiguity.

Ambiguity is an interesting situation. It seems more relevant to scientific problems than software development. My wife works in the field of climate change adaptation for a Scottish Government agency. She recognises ambiguity in her line of work, where the probability of initial events might be reasonably well understood, but it isn’t possible to define the outcomes. Feedback mechanisms, or an unknown tipping point, might turn a benign outcome into a catastrophic one in ways we can’t predict with confidence.

One area where ambiguity could exist in software development is in the way that social media can create entirely unpredictable outcomes. An error that might have had little impact 10 years ago could now spiral into something far more serious if it catches people’s attention and goes viral.

Nevertheless, uncertainty, rather than ambiguity, is probably the quadrant where testers and developers are more likely to find themselves. Here, we can identify outcomes with confidence, but not assign meaningful probabilities to them.

However, uncertainty is unlikely to be a starting point. To get there we have to know what part of the product or application could fail, how it might fail and what the effect would be. We might sometimes know that at the start, if this is a variant on a well understood product, but often we have to learn it all.

The usual starting point, our default position, should be one of ignorance. We don’t know what can go wrong and what the impact might be, and we almost certainly don’t know anything with confidence about the probabilities.
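The structure of the matrix can be restated very compactly; this is just a summary of the four quadrants described above, not anything taken from O’Riordan and Cox directly.

```python
# A restatement of the incertitude matrix as a simple lookup: the quadrant
# depends on whether we can define the outcomes and whether we can credibly
# assess the probabilities.

def quadrant(outcomes_defined: bool, probabilities_known: bool) -> str:
    if outcomes_defined and probabilities_known:
        return "risk"         # the only quadrant where a risk matrix applies
    if outcomes_defined:
        return "uncertainty"  # we know what can happen, not how likely it is
    if probabilities_known:
        return "ambiguity"    # we know the likelihoods, not where they lead
    return "ignorance"        # the default starting point for a new development

print(quadrant(outcomes_defined=False, probabilities_known=False))  # ignorance
```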

Ignorant and proud!

Sadly, in business as well as software development, an honest admission of ignorance is seen as weakness. The pretence that we know more than we do is welcomed and admired rather than being derided as dangerous bluster. Such misplaced confidence leads to disappointment, stress, frustration, misdirected effort, and actually makes it harder to learn about products and applications. We deceive ourselves that we know things that we don’t, and stop digging to find out the true situation.

So please, speak up for ignorance! Only if we admit what we truly don’t know can we hope to learn the lessons that will give our stakeholders the insights that they need. Surely we need to look for and understand the most damaging failures before we start thinking of assigning probabilities that might guide the rest of our testing. Don’t assume that knowledge we’ve not gained can ever be a valid starting point for calculations of risk. Knowledge has to be earned, not assumed.

Posted by: James Christie | November 18, 2013

Thinking the impossible? Or wishing for the impossible?

At EuroSTAR 2013 in Gothenburg there was a striking contrast between messages coming out of tutorials that were taking place at the same time.

Ian Rowland was talking about how we can do amazing things by “thinking the impossible”. Meanwhile, along the corridor I was giving out a much more mundane and downbeat message in my tutorial about how testers can work constructively with auditors.

I was talking about how auditors are allergic to statements of brainless optimism. Plans should be based on evidence that they are achievable, not on wishful thinking that defies the evidence.

You might think Ian was contradicting me, but I was entirely happy with his message when he repeated it in a later keynote.

In my tutorial I referred to a tweet from James Bach that made the telling point that “people who say failure is not an option are in fact selecting the failure option: by driving truth away”.

I backed that up with a slide showing a tiresome and glib illustration of a little man boldly turning “impossible” into “possible”, with two strokes of a pen. That sort of unthinking positivity really riles me.

The unthinking “can do” spirit

As an auditor I regularly reviewed project plans and frequently they were implausibly optimistic. We were rarely in a position to block such plans. It wasn’t our place to do so. That was a decision for operational and IT management. We would comment, but ultimately management was responsible for the decision to proceed and for the consequences.

Only once was I a direct stakeholder. I insisted that a project should be replanned. The intention was to carry out user testing, user training and then start a phased roll-out during the six week English school summer holidays. That’s when every office was running with reduced staff levels. Initially I was rather unpopular, but the project team were quietly relieved once the responsibility was taken out of their hands. They’d been under unreasonable pressure to go live in September. It was never going to happen.

In that case I was able to defend the project team, but more often I saw the damaging effect on staff who were committed to unrealistic, ludicrously optimistic timescales.

I once interviewed a Chief Technology Officer, who candidly admitted that the culture of the company was that it was far better to say “Yes we can” and then emerge from the project sweaty, exhausted and hopelessly late than it was to say at the start how long it would actually take. He said the users were never prepared to accept the truth up front.

I remember another project whose schedule required a vital user expert to be available for 10 days in November. She already had two weeks holiday booked, and was committed to 20 days working for another project, all in November, a total of 40 working days. Of course both projects were business critical, with fixed target dates that had been dumped on them by senior management. Both projects were late – of course.

If auditors are involved early on in the planning of projects they can sniff out such problems, or force them to be confronted. Sometimes projects are well aware of problems that will stop them hitting their targets but they are scared to flag them up. The problem is kicked into the long grass in the hope that dealing with an urgent problem further down the line will be less damaging than getting a reputation for being negative by complaining early on.

That fear might seem irrational, even bizarre, but it is entirely justifiable. I reviewed a development that had had serious problems and was very late. The development team lead had said the schedule was unrealistic given the budget and available staff. She was removed from her role. Her replacement said she could do it. Eventually the development work was completed in about the time the original team lead had predicted. The successor was praised for her commitment, whereas the original team lead was judged to lack the right attitude. Her career was blighted, and she was deeply disillusioned when I interviewed her.

Be lucky! That’s an order!

Usually when senior management overdoses on the gung ho spirit and signs up to the John Wayne school of inspirational leadership the result isn’t “thinking the impossible”. The result is an order to the troops to be lucky – freakishly lucky. This isn’t thinking the impossible. It’s thinking the impossible will happen if we switch off our brains and treat the troops like disposable cannon fodder.

If the results of unthinking, high volume managerial optimism have been appalling in the past then the evidence is that they will be appalling in the future. There has to be a reason to assume things will get better, and brainless optimism provides remarkably poor evidence.

If your experience of following standards, inappropriate “best practices” and excessively prescriptive processes has been dismal then you won’t get better results in future from sheer optimism, force of will and a refusal to acknowledge the truth.

Insistence on trying to do the wrong things better and faster is not only irrational, it is a deep insult to the people whose working lives are being made miserable. If you have experienced persistent failure and insist on continuing to work in the same way then the clear implication is that failure is the fault of the people, the very people who ensured anything worthwhile was ever delivered.

Ian Rowland had a marvellous and inspirational message. Sadly it’s a message that can prove disastrous if it’s picked up by the sort of managers who regard thinking and reflection as a frivolous waste of time. Be inspired by Ian’s message. Just be very careful who you share it with! Yes, thinking the impossible is wonderful. But that really does require thinking, not wishing!

Posted by: James Christie | November 12, 2013

Testing standards? Can we do better?

At EuroSTAR 2013 I had a brief disagreement about software testing standards with Stuart Reid. To be more accurate, I was one of a group of sceptics pressing Stuart, who was putting up a battling defence of standards. He has been working on the new standard ISO 29119 and made a very interesting and revealing point. He insisted that the critics of standards don’t understand their true nature; they are not compulsory.

The introduction to standards makes it clear that their use is optional. They become mandatory only if someone insists that they must be applied in a certain context, often by writing them into a contract or a set of in-house development standards. Then, and only then, is it appropriate to talk about compulsion. That compulsion comes not from the standard itself, but from the contract or the managerial directive.

I found that argument unconvincing. Indeed I thought it effectively conceded the argument and amounted to no more than a plea in mitigation rather than a plausible defence.

Even a cursory analysis of this defence reveals that it is entirely specious, merely a statement of the obvious. Of course it is a choice made by people to make standards mandatory, but that choice is heavily influenced by the quite inappropriate status of IEEE 829 and, in all likelihood, ISO 29119, as standards. Calling them standards gives them a prestige and authority that would be missing if they were called guidelines. The defenders of standards usually want it both ways. They refer to standards when they are making an implicit appeal to authority. They refer to the standards as guidelines when they are on the defensive. That doesn’t wash. Standards and guidelines are not synonymous.

Stuart’s defence struck me as very interesting because it was entirely consistent with what I have long believed; the rationale behind standards, and their implicit attraction, is that they can be given mandatory status by organisations and lawyers with a poor grasp of software testing.

The standards become justified by the mandatory status assigned to them by non-testers. The justification does not come from any true intrinsic value or any wisdom that they might impart to practitioners. It comes from the aura of the word “standard” and the creators of standards know that this gives them a competitive advantage.

Creating standards is a commercial activity

Standards are not produced on a disinterested “take it or leave it” basis. They do not merely offer another option to the profession. Standards are created by people from the companies who will benefit from their existence, the companies who will sell the services to implement the new standard. In my experience heavyweight, document-driven processes require large numbers of expensive consultants (though not necessarily highly skilled consultants). Creating standards is a commercial activity. The producers of standards are quite consciously creating a market for the standards.

If the creators of standards were merely expanding the market to create a profitable niche for themselves that might not be a big deal. However, the benefit that accrues to them comes at the expense of everyone else.

It comes at the expense of the testers who are frequently committed to following inappropriate and demoralising practices.

It comes at the expense of their employers who are incurring greater and unnecessary costs for results that are poorer than they need be.

It comes at the expense of the whole testing profession. The standards encourage a dangerous illusion. They feed the hunger to believe, against all the evidence, that testing, and software development in general, are neat, essentially linear activities that can be rendered orderly and controllable with sufficient advance documentation. Standards feed the illusion that testing can be easier than it really is, and performed by people less skilled than are really needed.

As I said in my EuroSTAR tutorial last week, testing is not meant to be easy, it’s meant to be valuable.

Good contracts or bad contracts?

It is understandable that the contract lawyers find standards attractive. Not only do standards offer the lawyers the illusion that they promote high quality and define the correct way for professionals to work, they also offer the lawyers something they can get their teeth into. A standard makes it easier to structure a contract if you don’t know about the subject area. The standard doesn’t actually have to be useful. The point is that it helps generate deliverables along the way, and it requires the testers to work in a way that is easy to monitor.

Contracts are most useful when they specify the end, or the required value; not when they dictate how teams should reach the destination. Prescriptive contracts can turn unwarranted assumptions about the means into contractually mandatory ends.

I once faced what looked like a horrendously difficult challenge. I had to set up a security management process for a large client, who wanted assurance that the process would work effectively from the very start. This had been interpreted by my employer as meaning that the client required a full-scale, realistic test, with simulated security breaches to establish whether they would be detected and how we would respond. This would have been very difficult to arrange, and extremely expensive to carry out. Failure to deliver on the due date would have resulted in heavy weekly penalties until we could comply. However, the requirement was written into the contract so I was told we would have to do it.

I was sceptical, and went back to the client to discuss their needs in detail. It turned out that they simply needed to be reassured that the process would work, smoothly and quickly. Bringing together the right people from the client and supplier for a morning to walk through the process in detail would do just as well, at a tiny fraction of the cost. Once I had secured the client’s agreement it was straightforward to have the contract changed so that it reflected where they really wanted to end up, rather than stipulating a poorly understood route to that destination.

On many other occasions I have been stuck with a contract that could not be changed and where it was mandatory for testers to comply with milestones and deliverables that had minimal relevance to the real problem, but which required such obsessive attention that they detracted from the real work.

Software testing standards encourage that sort of goal displacement; management attention is directed not at the work, but at a dubious abstract representation of the work. Their attention is directed to the map, and they lose sight of the territory.

We can do better

Sure, no-one has to be a sucker. No-one has to buy the snake oil of standards, but caveat emptor (let the buyer beware) is the legal fallback of the huckster. It is hardly a motto to inspire. Testers can do better than that.

What is the answer? Unfortunately blogs like this preach largely to the converted. The argument against standards is accepted within the Context Driven School. The challenge is to take that argument out into the corporations who are instinctively more comfortable, or complacent, with standards than with a more flexible and thoughtful approach.

I tried to challenge that complacency in my EuroSTAR tutorial, “Questioning auditors questioning testing”. I demonstrated exactly why and how software testing standards are largely irrelevant to the needs of the worldwide Institute of Internal Auditors and also the Information Systems Audit and Control Association. I also explained how more thoughtful and effective testing, as promoted by the Context Driven School, can be consistent with the level of professionalism, accountability and evidence that auditors require.

If we can spread the message that testing can be better and cheaper then corporations might start to discourage the lawyers from writing damaging contracts. They might shy away from the consultancies offering standards driven processes.

Perhaps that will require more than blogs, articles and impassioned conference speeches. Do we need a counterpart to testing standards, an anti-standard perhaps? That would entail a clearly documented explanation of the links between good testing practices and governance models.

An alternative would have to demonstrate how good testing can be accountable from the perspective of auditors, rather than merely asserting it. It would also be directed not just at testers, but also at auditors to persuade them that testing is an area where they should be proactively involved, trying to force improvements. The testers who work for consultancies that profit from standards will never come on board. The auditors might.

But whatever form such an initiative might take it must not be called a standard, anything but that!

Posted by: James Christie | September 24, 2013

Not “right”, but as good as I can do

This is probably the last in the series of articles running up to my upcoming EuroSTAR tutorial ”Questioning auditors questioning testing”.

In my last blog post I explained my concerns about the way that testers have traditionally adopted an excessively positivist attitude, i.e. they conducted testing as if it were a controlled scientific experiment that would allow them to announce definite, confident answers.

I restricted that article to my rejection of the positivist approach. However, I need to follow it up with my explanation of why I can’t accept the opposite position, that of the anti-positivist or interpretivist. The interpretivist would argue that there is no single, fixed reality. Everything we know is socially constructed. Researchers into social activities have to work with the subjects of the research, learning together and building a joint account of what reality might be. Everything is relative. There is no single truth, just truths.

I don’t think that is any more helpful than rigid positivism. This article is about how I groped for a reasonable position that avoided the extremes.

Understanding, not condemning

My experience as an auditor forced me to think about these issues. That shaped my attitudes to learning and uncertainty when I switched to testing.

I’ve seen internal auditors taking what I considered a disastrously unprofessional approach largely because they never bothered to consider such points. Their lack of intellectual curiosity meant that they unwittingly adopted a rigid scientific positivist approach, totally unaware that a true scientist would never start off by relying on unproven assumptions. That rigid approach was the only option they could consider. That was their worldview. Anything that didn’t match their simplistic, binary view was wrong, an audit finding. They’d plough through audits relying to a ludicrous extent on checklists, without acquiring an adequate understanding of the business context they were trampling around in. They would ask questions that required yes/no answers, and that was all they would accept.

The tragedy was that the organisation where I saw this was in fear of corporate audit, and so the perspective of the auditors shaped commercial decisions. It was better to lose money than to receive a poor audit report. This fear reinforced the dysfunctional approach to audit. Sadly that approach deskilled the job. Audits could be performed by anyone who was literate, so it became a low status job. Auditors were feared but not respected, a dreadful outcome for both audit and the whole company.

Prior to this I had worked as an auditor in a company that had adopted a risk based approach before it became the norm. Compliance, and simple checks against defined standards and controls were seen as being interesting, but of limited value. We had to consider the bigger picture, understand the reasons for apparent non-compliance and try to explain the significance of our findings.

The temptations that can seduce a novice

At first I found all this intimidating. The temptation for a novice is to adopt one of two extremes. I could either get too close to the subjects of my audits and accept the excuses of IT management, which would have been very easy given that I’d just moved over from being a developer and expected to resume my career in IT one day. Or I could veer to the other extreme and rely on formal standards, processes and predefined checklists all of which offer the seductive illusion of a Right Answer.

This is essentially the dichotomy between interpretivism and positivism. Please don’t misunderstand me. I am not rejecting either outright. I am not qualified to do that. I just don’t believe that either is a helpful worldview for testers or auditors.

I knew I had to avoid the extremes and position myself in the middle. This was simply being pragmatic. I had to provide an explanation of what was going on, and reach conclusions. This required me to say that certain things could and should be done better. Senior management were not helped by simplistic answers that didn’t allow for the context, but neither were they interested in waffle that failed to offer helpful advice. They were paying me to make informed judgements, not make excuses.

Maybe not “right”, but as good as I can do

It took me a couple of years to feel comfortable and confident in my judgement. Where should I position myself on any particular complicated problem? I never felt convinced I was right, but I learned to be confident that I was taking a stance that was justifiable in the circumstances and reflected the context and the risks. I learned to be comfortable with “this is as good as I can do”, and take comfort that this was far more valuable to my employers than being either a hardline positivist or interpretivist.

As I conducted each audit I would fairly quickly build up a rough picture, a model, of what I thought should be happening. This understanding was always provisional, capable of being tested and revised all the way through the audit in a constant cycle of revision. Occasionally the audit would have to be aborted because it quickly revealed some fundamental problem that we had not envisaged. In such cases there was no point pursuing the detail in the audit plan. It would have been like reviewing the decoration of a house when the roof had been blown off.

On other occasions our findings were radically different from those that we had expected. We had been sent in because of particular concerns, but the apparent problems were merely symptoms of a quite separate problem with roots elsewhere. It was essential that we had the mental flexibility, and the openness to realise that what we thought we knew was wrong.

The dangers of the extremes

I’m not comfortable positioning myself, and the learning approach that we took, in any particular school between positivism and relativism. It would be interesting to pin it down but working out the right label isn’t a priority. The important point, for my attitude towards auditing and testing, is that I have to be wary of the dangers inherent in the extremes.

In testing, the greater and more obvious danger is that of positivism, or rather an excessive regard for its validity and relevance in this context. In auditing, relativism is also a huge danger, and it’s fatal to the independence and credibility of auditors if they start identifying too closely with the auditees. There can be great commercial and organisational pressure to go with the flow. There have been countless examples of internal and external auditors who blew it in this way and failed to say “this is wrong”.

The dangerous temptation of going with the flow also exists in testing. Indeed, it could be argued that the greatest danger of relativism in testing is to accept the naïve positivism embodied in traditional approaches, and to pretend that detailed scripts, and meticulous documentation can convey the Truth against which the testers have to measure the product. That’s an interesting paradox, but not one I intend to pursue now.

Frustration with “faking it”

Anyway, after learning how to handle myself in a professional and enlightened audit department I couldn’t contain my frustration when I later had to deal with blinkered auditors who couldn’t tell the difference between an audit and a checklist. I don’t regard that as real auditing. It’s faking it, in very much the same style that James Bach has described in testing: running through the process, producing immaculate documents, but taking great care not to learn anything that might really matter – or that might disrupt plans, budgets or precious assumptions.

Inevitably, when I switched to testing and had to endure the problems of the traditional document driven approach, I experienced very much the same frustrations that I had had with poor auditing. It was with great relief that I came across the work of the Context Driven School, and realised that testing wasn’t a messed up profession after all; it was just a profession with some messed up practices that we can go along with or resist. It’s our choice, and the practical and learning experiences that I enjoyed as an auditor meant I could go only one way.

Posted by: James Christie | September 18, 2013

Testing inside the box?

The EuroSTAR online software testing summit took place yesterday. I missed the morning, but there were a few things that leapt out at me in the afternoon.

When Fiona Charles was talking about test strategies she said, “we always have a model, whether conscious or unconscious”.

Keith Klain later talked about the dangers of testers trying to provide confidence rather than information. Confidence should never be the goal of testing. If we try to instil confidence then we are likely to be reinforcing biases and invalid assumptions.

In between Fiona’s and Keith’s talks I came across this quote from the Economist about the reasons for the 2007/8 financial crash.

With half a decade’s hindsight, it is clear the crisis had multiple causes. The most obvious is the financiers themselves – especially the irrationally exuberant Anglo-Saxon sort, who claimed to have found a way to banish risk when in fact they had simply lost track of it.

These three points resonated with me because over the last day or so I have been thinking about the relationship between risk, our view of the world and our inbuilt biases as part of my upcoming EuroSTAR tutorial.

During the tutorial I want to discuss a topic that is fundamental to the work that testers and auditors do. We might not think much about it but our attitudes towards ontology and epistemology shape everything that we do.

Don’t switch off! I’m only going to skim the surface. I won’t be getting heavy.

Lessons from social science

Ontology means looking at whether things exist, and in what form. Epistemology is about how and whether we can know about anything. So “is this real?” is an ontological question, and “how do I know that it is real, and what can I know about it?” are epistemological questions.

Testing and auditing are both forms of exploration and research. We have to understand our mindset before we go off exploring. Our attitudes and our biases will shape what we go looking for. If we don’t think about them beforehand we will simply go and find what we were looking for, or interpret whatever we do find to suit our preconceptions. We won’t bother to look for anything else. We will not even be aware that there is anything else to find.

Both testing and auditing have been plagued by an unspoken, unrecognised devotion to a positivist outlook. This makes the ontological assumption that we are working in a realm of unambiguous, objective facts and the epistemological assumption that we can observe, measure and conduct objective experiments (audits or tests) on these facts. Crucially this mindset implicitly assumes that we can know all that is relevant.

If you have this mindset then reality looks like the subject of a scientific experiment. We can plan, conduct and report on our test or audit as if it were an experiment; the subject is under our control, we manipulate the inputs, check the output against our theory and reach a clear, objective conclusion.

If we buy into the positivist attitude then we can easily accept the validity of detailed test scripts and audit by checklist. We’ve defined our reality and we run through binary checks to confirm that all is as it should be. Testing standards, CMMI maturity levels, certification all seem natural progressions towards ever greater control and order.
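To make that concrete, here is a minimal sketch of what a binary check looks like in code. It is purely my own illustration; the calculate_premium function, its inputs and its expected value are invented for the example rather than taken from any real system. The point is that the check can only confirm or refute the single expectation that was written down in advance.

    # A minimal sketch of a scripted binary check. The function under test and
    # the expected value are invented purely for illustration.

    def calculate_premium(base_rate: float, risk_factor: float) -> float:
        """Toy function standing in for the system under test."""
        return round(base_rate * risk_factor, 2)

    def check_premium() -> bool:
        """A classic scripted check: one predefined input, one predefined answer."""
        expected = 125.0
        actual = calculate_premium(base_rate=100.0, risk_factor=1.25)
        return actual == expected  # binary verdict: pass or fail

    if __name__ == "__main__":
        print("Scripted check:", "pass" if check_premium() else "fail")
        # Whatever the verdict, the check is silent about questions nobody
        # thought to ask: odd inputs, behaviour at scale, usability, or risks
        # that sit outside the plan.

Run it and you get a tidy pass or fail, and nothing at all about anything the script never thought to ask.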

We don’t need to worry about the unknown, because our careful planning and meticulous documentation have ruled that out. All we need to do is run through our test plan and we will have eliminated uncertainty. We can confidently state our conclusions and give our stakeholders a nice warm glow of confidence.

Reality in business and software development is nothing like that. If we adopt an uncritical positivist approach we have just climbed into a mental box to ensure that we cannot see what is really out there. We have defined our testing universe as being that which is within our comprehension and reach. We have redefined risk as being that which is within our control and which we can manage and remove.

Getting out of the box

Perhaps the testers, and certainly the auditors, at the banks that crashed failed to realise they were in a world in which risk had not been banished. It was still out there, bigger and scarier than they could ever have imagined, but they had chosen to adopt a mindset that kept them in a neat and tidy box.

When you forget about the philosophical and social science jargon it all comes down to whether you want to work in the box, pretending the world suits your worldview, or get out of the box and help your stakeholders understand scary, messy reality.

Social science researchers know the pitfalls of unconsciously assuming a worldview without realising how it shapes our investigation. Auditors are starting to realise this and discuss big questions about what they can know and how they can know it. Testers? Well, not all of them, not yet, but we all need to start thinking along those lines. That’s part of the message I hope to get across at Gothenburg and elsewhere over the coming months.

Posted by: James Christie | September 7, 2013

Questioning auditors questioning testing

Since I chose the title for my tutorial, “questioning auditors questioning testing” at EuroSTAR this year I have become increasingly aware of how relevant both the title and the topic are.

Testers are used to being questioned, whether it’s by project managers, senior management, users – and auditors too of course. It’s easy to get wrapped up in the problems and pressures of our own profession and forget that other people are working under scrutiny too.

When I worked as an internal auditor I always knew we had to demonstrate that we were “adding value” to the company. I dislike that phrase, but the underlying point is crucial. Auditors can go through the motions and produce detailed, unhelpful reports that are of little real value to the organisation. Alternatively they can get to the heart of their role and provide advice that makes a difference. Only by doing so can they justify their salaries.

Auditors are under scrutiny too

There are certainly some audit departments that just go through the motions. A couple of decades ago such auditors would have accounted for the vast majority of the profession. Happily my audit experience was in a company that was in the vanguard of the new approach to audit. Our work was risk based, always aware of the wider context and absolutely never driven by checklists. We were in the minority then but the tide has now turned emphatically. I don’t know whether the time-serving, low value, low quality auditors are now in the minority but they are certainly under serious pressure from their own profession and the regulators to up their game and adjust to the modern world.

The UK branch of the Institute of Internal Auditors has just issued guidance to its members that internal audit departments in financial services should include within their scope…

…the risk and control culture of the organisation. This should include assessing whether the processes (e.g. appraisal and remuneration), actions (e.g. decision making) and “tone at the top” are in line with the values, ethics, risk appetite and policies of the organisation.

Audit departments should have an internal QA team with highly experienced auditors. Their role would be to “ensure” (the IIA’s bold choice of word) that audit plans and reports are risk based, with opinions that are “adequately evidenced”. These uber-auditors would be expected to challenge their colleagues, even their own management, and report directly to the board if there are problems. It sounds an interesting job!

The world has changed since the turn of the millennium. We have seen huge corporations crash, and banks that were too big to fail crash spectacularly. There has been widespread and legitimate disappointment, anger even, at the performance of both external and internal auditors.

Auditors are going to have to be more accountable. They in turn are going to be audited. That’s the way that the world is going.

Will auditors who are increasingly expected to query the culture of a company and the “tone at the top”, who are themselves subject to intense scrutiny, really be content with working their way down a checklist? Will they really be happy to tick boxes as you show them the shelfware, the endless documents meticulously completed to IEEE829 standard templates? Or will they want to sit down with you and understand the risks you identified and investigated so that they can relate them to the risks that keep the stakeholders awake at night?

Just today I was looking at the questions asked about testing practices by a company that sells liability insurance. The questions reflect a dated view of testing, and assume that effective, responsible testing follows the old document driven approach.

Risk management is vital for all companies, but those in financial services have to be particularly skilled. It’s not simply a matter of protecting the company. They are selling their expertise at handling risk through their products.

Any insurer who views testing in the same way as that liability insurer has lost sight of the true risks. That would be a legitimate concern for the auditors of that company. Now, in the UK at least, the auditors will have an explicit responsibility to challenge poor practice.

Why am I telling you this? Testers have to work constructively with auditors, and they have to understand where they are coming from.

Winning friends and influencing auditors

The subtitle of my tutorial is a conscious nod towards Dale Carnegie’s phenomenally successful self-help book “How to win friends and influence people”. You might dismiss Carnegie’s book as a trite collection of obvious pieces of advice, but the sorry truth is that many of the basic truths about human nature are still routinely neglected in business.

I chose the subtitle because I wanted to stress the importance of understanding auditors and where they are coming from, rather than assuming that they are the hard-nosed suits from head office. If we are defensive, expecting a battle with auditors, then that is what we are likely to get. The relationship between auditors and testers is a human relationship every bit as much as a business one. Whether that relationship is good or bad is largely a matter of how well the personal relationship is handled.

One of Carnegie’s key points is a quote from the Roman writer Publilius Syrus.

We are interested in others when they are interested in us.

If we show an interest in other people’s problems then they will be interested in ours. Show an interest in what the auditors are trying to do and they are more likely to take a positive interest in your work.

From personal experience I know how scary it is to embark on a new audit when you can’t use a checklist that pretends to provide all the questions and answers. Look at it from the auditors’ point of view. They have to learn quickly about a new project, a new business area, or some new technology. They have to be able to discuss the important risks and issues with people who are highly experienced and possibly hostile. The auditors know that within a few weeks (at most) they will have to issue an intelligent report that tells a persuasive story identifying problems and possible improvements. It was always a relief to meet people who understood our role, who wanted to work with us and who saw us as valuable allies in their attempt to do a better job.

Auditors can help testers

I have seen both testing and auditing change enormously over the past couple of decades. Enlightened testers and auditors now have far more in common with each other than they do with the old school, checklist/script practitioners in their own professions.

Both testers and auditors are, or should be, enquiring, inquisitive people trying to provide more information about new products and applications and crucially about the risks that their employers and clients are facing.

If auditors are viewed in that positive light then testers should want to get involved with them as early as possible, and to keep speaking to them. It shouldn’t be a one way conversation, with testers justifying themselves. Good auditors will have a different perspective from normal project members. They will help testers to see a bigger picture, to understand the business risks better, and their input should help testers to come up with important new ideas for testing.

My tutorial won’t be a simple matter of telling people the magic tricks, or correct phrases to get the auditors off your back. What I do hope it will provide is an insight into why good auditors will support good testing, rather than impressive documentation. I also hope it will show testers how they can fight back constructively against the poor auditors who are still out there. If auditors are giving you a hard time over your failure to write unnecessary documents then it is good to have the ammunition you need to defend yourself, to turn the tables on them and help them to do a better job!

Posted by: James Christie | August 28, 2013

Binary opinions? Yes or no?

I am giving a half day tutorial at EuroSTAR this year, so not surprisingly that has forced me to think around the subject, “questioning auditors questioning testing”.

Over the last few weeks I have been struck by the number of times that I have come across one very interesting word – binary.

It’s an important concept, and it is hugely important in both professions. However, I have become increasingly aware that testing and auditing are taking very different approaches to the concept.

Testing and checking

Discussion of binary results in testing is usually tied in with the debate about the distinction between testing and checking. James Bach and Michael Bolton set out the argument clearly here.

The distinction is fundamentally important, but frustratingly the debate hasn’t really got through to the whole of the testing profession.

There are still regiments of testers oblivious to the distinction, beavering away with detailed test scripts, checking the results. The testing establishment from which these traditional testers take their lead, directly or indirectly, have not engaged with the debate. They have given the unfortunate, and probably accurate, impression that they regard checking and testing as being effectively synonymous in practice.

Reality isn’t binary

Rikard Edgren gave a very good talk on the specific idea of binary opinions at Øredev 2011 and Let’s Test 2012. Here is the Øredev talk.


The slides for Let’s Test are here (opens in new tab). Rikard also wrote a blog on the subject. The key phrase I picked up from Rikard was:

Reality isn’t binary, we can communicate noteworthy information – we don’t know everything in advance.

I’m not going to get further into that debate here. I just want to illustrate the contrast with auditing where Rikard’s comment resonates strongly.

Two types of binary opinion (naturally!)

Firstly, I’d better explain that the type of binary opinion varies depending on whether one is talking about internal or external auditing. In internal auditing binary opinions would take the form of pass/fail checking of controls. Are they present? Are they complied with?

Binary opinions in external auditing have historically been largely about the truth and fairness of the company accounts, or about whether the company is a going concern. That has been the core of the external audit report. In recent years there has been the added requirement imposed by the Sarbanes-Oxley Act for US companies to express an opinion on whether the framework of internal controls is effective.

Binary opinions in internal audit – a relic of yesteryear

There isn’t a great deal of debate in internal auditing circles about binary opinions. Traditional internal auditing focused on internal controls. The debate has been held, and the overwhelming consensus, at least in informed circles, is that any audit that offers only binary opinions is hopelessly limited, blinkered and outdated.

I like the definition of internal controls from Anthony Catenach.

Internal controls are how management makes sure the company’s business model is operating correctly.

If you view internal controls in that broader perspective then you should be able to see how simple binary opinions are unhelpful. Auditors need to set their findings in context and explain why they are significant and what danger they pose. Simply saying that certain controls are missing, or have not been applied, is unhelpful. That’s not to say that such simplistic audits have vanished.

A couple of weeks ago I was speaking to a friend who works as a developer for a multinational company. He told me that the internal auditors work from a checklist, using questions that require yes/no answers. People are very wary of the auditors and answer only direct questions without offering anything more. It horrifies me that auditors should ever accept the answer “yes” or “no” without following up with “why?”.

That sort of auditing is ineffective and unprofessional. I can’t stress strongly enough that audit checklists have a place, but they are not the audit! They are merely the starting point for a conversation.

Previously, to illustrate how an audit interview is conducted, I have used the analogy of an advocate (barrister or attorney) questioning a witness in court. The advocate cannot know what answer the witness will give and has to vary the follow up questions accordingly, rather than ploughing on with a prepared script. Conducting an audit by checklist is very much like sticking to the script regardless of the answers.

This is now orthodox modern opinion. The opinion formers, the leading lights of the profession, know that binary opinions are dated and that the debate has moved on to risk: how can auditors inform stakeholders about the risks that matter, the risks that keep them awake at night? How can auditors help management to understand the risks that they are facing and to take decisions that are better informed about the risks?

Regulators and binary opinions in external audit

The debate about binary opinions in internal audit may be largely over but it is still very much alive in external audit. The regulators in the UK and the USA are pushing hard for auditors to provide more useful opinions in their reports rather than relying on simple, and frequently misleading, binary opinions.

The response from the Big 4 audit firms has been cool, but telling the regulators to take a hike is politically tricky! They have to engage with the debate. It’s not good enough for them to defend current practices. The problems with these are glaringly obvious, so they have to respond constructively.

The position is slightly confused by Sarbanes-Oxley’s requirement that external auditors state whether they believe the framework of internal controls is effective. That takes them into internal audit territory, and raises concerns about whether such a judgement can be accurate or helpful. Certainly the experience of recent years isn’t encouraging.

There are countless examples of companies whose accounts have been passed by their external auditors, only to collapse from problems that existed before the audit was conducted. Remember Enron? That debacle led to the demise of one of the world’s biggest firms of accountants, Arthur Andersen. Remember the banks who collapsed? All sailed through their audits, with the auditors picking up multi-million pound fees for offering opinions that proved groundless.

I’m not suggesting that these fees were too high. Perhaps they were too low and worthwhile audits and opinions would be more expensive. However, I am saying that the current reporting regime, with too much emphasis on binary opinions, provides lousy value for money. That is not a minority view. It is the view of the regulators in the UK and USA. It will be interesting to see where the EU moves in this regard.

Testers are not alone

This is far too big and complex an area for me to cover in any detail either now or in my tutorial at EuroSTAR, even supposing it is of interest to any testers other than me! However, I think it’s important to understand that there is a big and influential profession wrestling with some of the issues facing testers.

Auditors have to think about how they work, what value they provide, what they should look for, what knowledge they can reasonably provide. Indeed, the more thoughtful auditors are thinking about what knowledge means in their context, how they can “know” things, what constitutes evidence and opinion.

This is epistemology, and it is fascinating. Thinking about this is not some esoteric academic exercise. If we are not clear about what we can know and how we should investigate and report on the knowledge that is available then the danger is that we will end up just faking the whole exercise. We will continue to dress up subjective opinions as “objective” binary verdicts; “yes” this is ok, “no” it isn’t.

Reality doesn’t become clearer simply by pretending that it can be reduced to binary opinions. Quite the reverse, messy reality is obscured by a binary approach. Auditors know that, or at least the clever ones do. There are plenty of smart and capable auditors out there, trying to make sense of what is going on.

The good ones are natural allies of good testers. Seek them out and make them your allies. As for the bad ones, well they are still around as my friend can testify. Their approach is inept and unprofessional. It might not be wise to use these words! It might be interesting to ask them some difficult questions about how they can square their approach with the views of the auditing establishment, the professional bodies and the regulators.

It’s a pity that the self-appointed testing establishment, ISTQB and ISO, can’t take a similarly clear line. Sadly their silence effectively endorses binary opinions. Self-appointed shouldn’t mean self-interested.
