Posted by: James Christie | June 25, 2014

Why do you need the report?

Have you ever wondered what the purpose of a report was, whether it was a status report that you had to complete, or a report generated by an application? You may have wondered if there was any real need for the report, and whether anyone would miss it if no-one bothered to produce it.

I have come across countless examples of reports that seemed pointless. What was worse, their existence shaped the job we had to do. The reports did not help people to do the job. They dictated how we worked; production, checking and filing of the reports for future inspection were a fundamental part of the job. In any review of the project, or of our performance, they were key evidence.

My concern, and cynicism, were sharpened by an experience as an auditor when I saw at first hand how a set of reports were defined for a large insurance company. To misquote Otto von Bismarck’s comment on the creation of laws: reports are like sausages; it is best not to see them being made.

The company was developing a new access control system, to allow managers to assign access rights and privileges to staff who were using the various underwriting, claims and accounts applications. As an auditor I was a stakeholder, helping to shape the requirements and advising on the controls that might be needed and on possible weaknesses that should be avoided.

One day I was approached by the project manager and a user from the department that defined the working practices at the hundred or so branch offices around the UK and Republic of Ireland. “What control reports should the access control system provide?” was their question.

I said that was not my decision. The reports could not be treated as a bolt on addition to the system. They should not be specified by auditors. The application should provide managers with the information they needed to do their jobs, and if it wasn’t feasible to do that in real time, then reports should be run off to help them. It all depended on what managers needed, and that depended on their responsibilities for managing access. The others were unconvinced by my answer.

A few weeks later the request for me to specify a suite of reports was repeated. Again I declined. This time the matter was escalated. The manager of the branch operations department sat in on the meeting. He made it clear that a suite of reports must be defined and coded by the end of the month, ready for the application to go live.

He was incredulous that I, as an auditor, would not specify the reports. His reasoning was that when auditors visited branches they would presumably check to see whether the reports had been signed and filed. I explained that it was the job of his department to define the jobs and responsibilities of the branch managers, and to decide what reports these managers would need in order to fulfill their responsibilities and do their job.

The manager said that was easy; it was the responsibility of the branch managers to look at the reports, take action if necessary, then sign the reports and file them. That was absurd. I tried to explain that this was all back to front. At the risk of stating the obvious, I pointed out that reports were required only if there was a need for them. That need had to be identified so that the right reports could be produced.

I was dismissed as a troublesome timewaster. The project manager was ordered to produce a suite of reports, “whatever you think would be useful”. The resulting reports were simply clones of the reports that came out from an older access control system, designed for a different technical and office environment, with quite different working practices.

The branch managers were then ordered to check them and file them. The branch operations manager had taken decisive action. The deadline was met. Everyone was happy, except the poor branch managers who had to wade through useless reports, and of course the auditors. We were dismayed at the inefficiency and sheer pointlessness of producing reports without any thought about what their purpose was.

That highlighted one of the weaknesses of auditors. People invariably listened to us if we pointed out that something important wasn’t being done. When we said that something pointless was being done there was usually reluctance to stop it.

Anything that people have got used to doing, even if it is wasteful, ineffective and inefficient, acquires its own justification over time. The corporate mindset can be “this is what we do, this is how we do it”. The purpose of the corporate bureaucracy becomes the smooth running of the bureaucracy. Checking reports was a part of a branch manager’s job. It required a mental leap to shift to a position where you have to think whether reports are required, and what useful reporting might comprise. It’s so much easier to snap, “just give us something useful” and move on. That’s decisive management. That’s what’s rewarded. Thinking? Sadly, that can be regarded as a self-indulgent waste of time.

However, few things are more genuinely wasteful of the valuable time of well paid employees than reporting that has no intrinsic value. Reporting that forces us to adapt our work to fit the preconceptions of the report designer gobbles up huge amounts of time and stops us doing work that could be genuinely valuable. The preconceptions that underpin many reports and metrics may once have been justified, and have fitted in with contemporary working practices. But these preconceptions need to be constantly challenged and re-assessed. Reports and metrics do shape the way we work, and the way we are assessed. So we need to keep asking, “just why do you need the report?”

Posted by: James Christie | June 24, 2014

Teddy bear methods

Introduction

I wrote this article for Testing Planet a couple of months ago. Most of the feedback was positive, but I received some criticism on Twitter that I’ve been unfair on Structured Methods. I want to make it clear that I’m talking about the big, formal methodologies such as SSADM, and BIS Structured Analysis & Design, the particular variant we used where I worked. These are based on the work of Yourdon, Constantine and DeMarco.

My problem isn’t with individual techniques, many of which are valuable. I’ve always been obsessive about structured programming when I code. I was trained to regard that as simply good, responsible practice. I’ve also long been a fan of IDEF0, which I’ve personally found useful both as an auditor and a tester to clarify my understanding of what is going on, and what should be happening but maybe isn’t. IDEF0 is definitely a technique from structured methods.

So the problem isn’t with individual components. It’s with the way they are bolted together into a full blown, formal methodology. The development process is linear and rigid. Practitioners are encouraged, often mandated to follow the whole method, and nothing but the method. It is a brave project manager who deviates from the formal method knowing that if he/she delivers anything that is less than perfect there will be hard questions about failure to follow the prescribed process, which implicitly assumes that it is the only route to the Correct Solution. You then see the sort of scenario I’ve described in my article.

And here is the article

When I was a child my father often used to refer to me with an exasperated smile as “lawyer Jim”. He was amused by the way I would always argue and question, not in a disobedient or truculent way. I just wanted to know what was going on, and why things were happening the way they were. I demanded good explanations, and if the explanations didn’t make sense then the adult could expect a thorough cross-examination till we’d got to the heart of the matter.

I didn’t end up as a lawyer, but it’s not really surprising that I did become an auditor and a tester. I’ve always wanted to understand what’s happening around me. It has always struck me as a challenge if someone says “just because”, or “that’s the way we do things round here”.

When I left university I struggled for the first few years to make any sense at all of the places where I worked. I started off training to be a chartered accountant, but I hated that. It wasn’t the day to day auditing, it was the culture. I struggled to see how we were really doing anything useful, how our work added up to valuable information for the stakeholders. It all seemed a charade, and I wasn’t convinced anyone thought differently; it was just a good way to make money.

Structured Methods – something I just didn’t get

I ended up in IT, at a great place where I learned a huge amount; about business, people, organisations, IBM operating systems and utilities, how to have fun coding, all sorts. What I didn’t really learn much about was how to make the prevailing Structured Methods work.

Structured Methods were “The BIG Thing” then, in the mid 80s. They were intended to revolutionise the way that systems were developed. No longer would a bunch of geeky cowboys hack their way through the code till some ramshackle application could be dumped on the poor users. Structured Methods meant that rigorously trained, professional, software engineers would meticulously construct applications that perfectly reflected the requirements of the users.

There were a couple of problems with this, apart from the obvious one that the results were lousy. Firstly, you can’t specify requirements perfectly in advance in the way that Structured Methods assumed. The whole rickety shebang depended on some pretty ludicrous assumptions about requirements; that these could be specified precisely, accurately and in detail up front, and also that they could be largely inferred from the existing system and its flaws. It was a pretty limited view of what was possible.

The other huge problem was the attempt to make development a mechanical process. You churned out the artefacts from one stage, fed them into the next stage and so on along the production line. Fair enough, if there actually was a production line. The trouble was that the results of the analysis weren’t really connected to the resulting design, not in any coherent and consistent way. The design depended on old-school, unfashionable things like skill, experience and judgement.

Developers would plod agonisingly through the process, generating a skip-load of incomprehensible documentation that no-one would ever read again, then at the crucial point of turning requirements into a physical design, they’d have to wing it.

Anyway, I didn’t get Structured Methods, and managed to arrange my career so I could go my own sweet way and keep enjoying myself.

I knew that Structured Methods didn’t work. That was pretty obvious. What I didn’t realise was that that was irrelevant. People weren’t using them because they worked, or even truly believed deep down that they worked. So what was going on?

Social defences

Let’s wind back a few decades for some interesting insights. Isabel Menzies Lyth was a distinguished psychoanalyst whose particular specialism was analysing the dynamics of groups and organisations.

In 1959 she set off a bomb under the medical profession in the UK with a paper called “The functions of social systems as a defence against anxiety” (PDF, opens in a new tab). She was writing about nursing, and argued that the main factors shaping an organisation’s structure and processes were the primary task, the technology used and the social and psychological needs of the people. The dominant factor was not the task or even the technology; it was the need for managers and nurses to cope with the stress and anxiety of a tough job.

As a result “social defences” were built to help people cope. The defences identified by Menzies Lyth included rigid processes that removed discretion and the need for decision making by nurses, hierarchical staffing structures, increased specialisation, and nurses being managed as fungible (i.e. readily interchangeable) units, rather than skilled professionals.

These defences solidified the culture, structure and processes in hospitals. The rigidity damaged performance of the primary task, i.e. caring for patients, and racked up the stress for the nurses themselves. The defences were therefore strengthened. The response was to adapt the job so that nurses became emotionally distanced from the patients and their outcomes. Patients were increasingly regarded as subjects for treatment, rather than people. Sub-standard performance had to be defined as standard. Acceptance of sub-standard performance became part of the defence mechanism.

Practices in the health professions have certainly changed over the last half century, but Menzies Lyth’s insights into how people deal with stressful jobs in organisations have an important lesson for us.

David Wastell and transitional objects

In the 1990s a British academic, David Wastell, linked Menzies Lyth’s insights with work by Donald Winnicott, a paediatrician and psychoanalyst. Winnicott’s big contribution was the idea of the transitional object (PDF, opens in a new tab).

This is something that helps infants to cope with loosening the bonds with their mother. Babies don’t distinguish between themselves and their mother. Objects like security blankets and teddy bears give them something comforting to cling onto while they come to terms with the beginnings of independence in a big, scary world.

Wastell studied development shops that used Structured Methods. He interpreted his findings in the light of the work of Menzies Lyth and Winnicott. Wastell found that the way that developers used the method, and their mindset, meant that Structured Methods had become a transitional object, i.e. a defence mechanism to alleviate the stress and anxiety of a difficult job (see “The fetish of technique, methodology as a social defence”, not free I’m afraid).

Wastell could see no evidence from his own studies, or from the research literature, to suggest that Structured Methods worked. The evidence was that the resulting systems were no better than the old ones, took much longer to develop and were more expensive.

I could recognise the patterns Wastell was describing. Managers became hooked on the technique and lost sight of the true goal.

“Methodology becomes a fetish, a procedure used with pathological rigidity for its own sake, not as a means to an end. Used in this way, methodology provides a relief against anxiety; it insulates the practitioner from the risks and uncertainties of real engagement with people and problems.”

Teddy bears are certainly great for toddlers. They are cute and cuddly, and they help children as they learn about the world. However, Wastell was at pains to point out that it is deeply harmful when developers cling on to their own transitional objects. Systems development has to be a process of learning, he argued (see “Learning dysfunctions in information systems development…”). Seizing on Structured Methods as transitional objects, and obsessing about compliance with a rigid technique, prevented learning.

In chasing a warped vision of “maturity” the proponents and users of Structured Methods were perversely refusing to grow into true maturity. Practitioners became trapped, and never grew beyond the teddy bear stage.

Like the nurses and managers in Menzies Lyth’s study, developers were defining sub-standard levels of performance as the professional standard. The rigid techniques were not really helping the nurses who were treating patients. The managerial processes did help the managers to deal with stress and anxiety, but they made the job harder for the nurses to cope with, which in turn led to nurses concentrating on process rather than patients.

Likewise, in systems development, techniques that help the managers to cope with stress and anxiety, and which give them an illusory, reassuring sense of control, are harmful to the team members. They have to cope by focussing on the technique, mastery of the tool, or compliance with the standard. In doing that they can feel that they are doing a good job – so long as they don’t have to think about whether they are really working towards the true ends of the organisation.

Trusting the method, or asking awkward questions?

Sadly, the more mature and thoughtful nurses in Menzies Lyth’s study were less likely to complete their training and stay in the profession. It was the less thoughtful, less emotionally mature nurses who could adapt more easily to the regime.

Perhaps that was why I was uncomfortable with Structured Methods, and with the idea of standards and rigid processes. My parents always encouraged my awkward questioning. It was important to question, argue and think. If you want to grow up you have to face the reality that worthwhile jobs are going to bring some stress and anxiety. We have to be honest enough to face up to that, rather than pretend that a cute and cuddly new standard, method or technique will conjure up an easier life. Maybe I was right in the first few years after I left university and the people I was watching really didn’t understand why they were working the way they were, or how their activities helped achieve anything worthwhile.

Tough jobs don’t become easy just because you redefine them in a way that makes them feel easier. That way you just risk detaching your activities from the true objectives you’re being paid to achieve. Sooner or later someone is going to catch on and realise that you’re getting paid for something that’s of little value. If you’re not doing anything useful there will always be someone who can do it cheaper.

High quality systems depend on good people, with the right skills and experience. Their managers have to be strong enough, and honest enough, to confront problems, and recognise psychological strains. They shouldn’t evade them. If they want something to help them cope with the stress and anxiety, buy them some teddy bears!

Posted by: James Christie | February 7, 2014

Testing: valuable or bullshit?

I’ve recently been thinking about automation, not specifically about test automation, but about the wider issue of machines replacing humans and how that might affect testers.

Chart: Frey, C. and Osborne, M., “The future of employment”, Oxford University, 2013

It started when I discussed this chart with a friend, who is a church pastor. He had spotted that there was a probability of only 0.8% that computerisation would result in job losses for the clergy in the next two decades.

I was intrigued by the list, in particular the 94% probability that robots would take over from auditors. That’s nonsense. Auditors are now being asked to understand people, risks and assess corporate culture. They are moving away from the style of auditing that would have lent itself to computerisation.

Tick-and-bash, compliance-checking auditing is increasingly seen as old-fashioned and discredited. Of course, much of the auditing done prior to the financial crash was a waste of time, but the answer is to do it properly, not replace inadequate human practices with cheap machines.

I periodically see similar claims that testing can be fully automated. The usual process is to misunderstand what a job entails, define it in a way that makes it amenable to automation, then say that automation is inevitable and desirable.

If the job of a pastor were to stand at the front of the church and read out a prepared sermon, then that could be done by a robot. However, the authors of this study correctly assumed that the job entails rather more than that.

Drill into the assumptions behind claims about automation and you will often find that they’re all hooey. Or at least that was my presumption. Confirmation bias isn’t just something that affects other people!

The future of employment

So I went off to have a look at the sources for that study. Here is the paper itself, “The future of employment” (PDF, opens in a new tab) by Frey and Osborne.

The first thing that struck me was that the table above showed only a very small selection of the jobs covered in the study. Here are the other jobs that are concerned with IT.

Job Probability
Software developers (applications) 0.04
Software developers (systems software) 0.13
Information security analysts, web developers, network architects 0.21
Computer programmers 0.48
Accountants & auditors 0.94
Inspectors, testers, sorters, samplers, & weighers 0.98

So testers are in even greater danger of being automated out of a job than auditors. You can see what’s going on. Testers have been assigned to a group that defines testing as checking. (See James Bach’s and Michael Bolton’s discussion of testing and checking). Not surprisingly the authors have then reasoned that testing can be readily automated.

There are some interesting questions raised by this list. The chances are almost 50:50 that computer programming jobs will be lost, yet 24 to 1 against applications software developers losing their jobs. Are these necessarily different jobs? Or is it just cooler to be a software developer than a computer programmer?

In fairness to the authors they are using job categories defined by the US Department of Labor. It’s also worth explaining that the authors don’t actually refer to the probability figure as being the probability that jobs would be lost. That would have made the conclusions meaningless. How many jobs would be lost? A probability figure could apply only to a certain level of job loss, e.g. 90% probability that 50% of the jobs would go, or 10% probability that all jobs would be lost.

The authors are calculating the “probability of computerisation”. I think they are really using susceptibility to computerisation as a proxy for probability. That susceptibility can be inferred from the characteristics that the US Department of Labor has defined for each of these jobs.

The process of calculating the probability is summarised as follows.

“…while sophisticated algorithms and developments in MR (mobile robotics), building upon big data, now allow many non-routine tasks to be automated, occupations that involve complex perception and manipulation tasks, creative intelligence tasks, and social intelligence tasks are unlikely to be substituted by computer capital over the next decade or two. The probability of an occupation being automated can thus be described as a function of these task characteristics.”

So testing is clearly lined up with the jobs that can be automated by sophisticated algorithms that might build upon big data. It doesn’t fall in with the jobs that require complex perception, creative intelligence and social intelligence.

Defining testing by defining the bugs that matter

Delving into the study, and its supporting sources, confirms this. The authors excitedly cite studies that have found that technology can spot bugs. Needless to say the “bugs” have been defined as those that technology can find. NB – all links are to PDFs, which open in a new tab.

Algorithms can further automatically detect bugs in software (Hangal and Lam, 2002; Livshits and Zimmermann, 2005; Kim et al., 2008), with a reliability that humans are unlikely to match. Big databases of code also offer the eventual prospect of algorithms that learn how to write programs to satisfy specifications provided by a human. Such an approach is likely to eventually improve upon human programmers, in the same way that human-written compilers eventually proved inferior to automatically optimised compilers… Such algorithmic improvements over human judgement are likely to become increasingly common.

There we have it. Testing is just checking. Algorithms are better than human judgement at performing the tasks that we’ve already framed as being more suited to algorithms. Now we can act on that and start replacing testers. Okaaay.

The supporting sources are a variety of papers outlining what are essentially tools, or possible tools, that can improve the quality of coding. A harsh verdict would be that their vision is only to improve unit testing. Note the range of dates, going back to 2002, which I think weakens the argument that one can use them to predict trends over the next two decades. If these developments are so influential, why haven’t they already started to change the world?

However, I don’t want to go into the detail of these papers, or whether they are valid. I’m quite happy to accept that they are correct and make a useful contribution within the limits of their relevance. I do think these limits are tighter than the authors have assumed, but that’s not what concerns me.

The point is that confusing testing with checking places testers at the front of the line to be automated. If you define the bugs that matter as those that can be caught by automation then you define testing in a damaging way. That would be bad enough, but too many people in IT and the testing profession have followed policies and practices that keep testers firmly in the firing line.

There are four broad reasons for this:

  • a widely presented false choice between automated and manual testing,
  • a blindness to the value of testing that leads to value being sacrificed in attempts to cut costs,
  • testing standards, which encourage a mechanical and linear approach,
  • a general corporate willingness to create and tolerate meaningless jobs.

Automated or manual testing – a false choice

The subtext of the manual versus automated false dichotomy seems to be that manual is the poor, unprofessional relation of high quality, sophisticated automation. I wonder if part of the problem is a misplaced belief in the value of repeatability, for which CMMI has to take its full share of the blame.

The thinking goes, if something can be automated it is repeatable; it can be tailored to be precise, continually generating accurate, high quality results. Automated testing and “best practice” go hand in glove.

In contrast, manual testing seems frustratingly unpredictable. The actions of the testers are contingent. I think that is an interesting word. I like to use it in the way it is used in philosophy and logic. Certain things are not absolutely true or necessarily so; they are true or false, necessary or redundant, depending on other factors or on observation. Dictionaries offer subtly different alternative meanings. Contingent means accidental, casual or fortuitous according to dictionary.com. These are incredibly loaded words that are anathema to people trying to lever an organisation up through the CMMI levels.

I understand “contingent”, as a word and concept, as being neutral, useful and not particularly related to “repeatable”, certainly not its opposite. It is sensible and pragmatic to regard actions in testing as being contingent – it all depends. Others do regard “contingent” and “repeatable” as opposites; “contingent” then becomes evidence of chaos and unpredictability that can be cleaned up with repeatable automation.

Some people regard “it all depends” as an honest statement of uncertainty. Others regard it as a weak and evasive admission of ignorance. There has always been a destructive yearning for unattainable certainty in software development.

To be clear, I am not decrying repeatability. I mean only that putting too much emphasis on it is unhelpful. Placing too much emphasis on automation because it is repeatable, and decrying manual testing, sells true testing short. It demeans testing, and makes it more vulnerable to the prophets of full automation.

Costs chasing value downwards in a vicious cycle

There are a couple of related vicious circles. Testing that is rigid, script-driven and squeezed in at the end of a project doesn’t provide much value. (Iain McCowatt wrote a great blog about this.)

So unless project managers are prepared to step back and question their world view, their response, entirely rational when viewed from a narrow perspective, is to squeeze testing further.

When the standard of testing is as poor as it often is on traditional projects there is little value to be lost by squeezing testing harder because the reduced costs more than compensate for the reduced benefits. So the cycle continues.

Meanwhile, if value is low then the costs become more visible and harder to justify. There is pressure to reduce these costs, and if you’re not getting any value then just about any cost-cutting measure is going to look attractive. So we’re heading down the route of outsourcing, offshoring and the commoditization of testing. Testing is seen as an undifferentiated commodity, bought and sold on the basis of price. The inevitable pressure is to send cost and prices spiralling down to the level set by the lowest cost supplier, regardless of value.

If project managers, and their corporate masters, were prepared to liberate the testers, and ensure that they were high quality people, with highly developed skills, they could do vastly more effective and valuable work at a lower cost. But that comes back to questioning their world view. It’s always tough to make people do that when they’ve built a career on sticking to a false world view.

Standards

And now I come to the classic false world view pervading testing; the idea that it should be standardised. I have written about this before, and I’d like to quote what I wrote in my blog last November.

Standards encourage a dangerous illusion. They feed the hunger to believe, against all the evidence, that testing, and software development in general, are neat, essentially linear activities that can be rendered orderly and controllable with sufficient advance documentation. Standards feed the illusion that testing can be easier than it really is, and performed by people less skilled than are really needed.

Standards are designed to raise the status of testing. The danger is that they will have the opposite result. By focussing on aspirations towards order, repeatability and predictability, by envisaging testing as a checking exercise, the proponents of testing will inadvertently encourage others to place testing at the front of the queue for automation.

Bullshit jobs

This is probably the most controversial point. Technological innovation has created the opportunity to do things that were previously either impossible or not feasible. Modern corporations have grown so complex that merely operating the mechanics of the corporate bureaucracy has become a job in itself. Never mind what the corporation is supposed to achieve, for huge numbers of people the end towards which they are working is merely the continued running of the machine.

Put these two trends together and you get a proliferation of jobs that have no genuine value, but which are possible only because there are tools to support them. There’s no point to them. The organisation wouldn’t suffer if they were dispensed with. The possibility of rapid communication becomes the justification for rapid communication. In previous times people could have been assigned responsibility to complete a job, and left to get on with it because the technology wasn’t available to micro-manage them.

These worthless jobs have been beautifully described as “bullshit jobs” by David Graeber. His perspective is that of an anthropologist. He argues that technological progression has freed up people to do these jobs, and it has suited the power of financial capital to create jobs to keep otherwise troublesome people employed. Well, you can decide that for yourself, but I do think that these jobs are a real feature of modern life, and I firmly believe that such jobs will be early candidates for automation. If there is a nagging doubt about their value, and if they’re only possible because of technology, why not go the whole hog and automate them?

What’s that got to do with testing you may ask? Testing frequently falls into that category. Or at least testing as it is often performed; commoditized, script-driven, process-constrained testing. I’ve worked as a test manager knowing full well that my role was pointless. I wasn’t there to drive good testing. The team wasn’t being given the chance to do real testing. We were just going through the motions, and I was managing the testing process, not managing testing.

Most of my time was spent organising and writing plans that would ultimately bear little relation to the needs of the users or the testers. Then, during test execution, I would be spending all my time collating daily progress reports for more senior managers who in turn would spend all their time reading these reports, micro-managing their subordinates, and providing summaries for the next manager up the hierarchy; all pointless. No-one was prepared to admit that the “testing” was an expensive way to do nothing worthwhile. It was unthinkable to scrap testing altogether, but no-one was allowed to think their way to a model that allowed real testing.

As Graeber would put it, it was all bullshit. Heck, why not just automate the lot and get rid of these expensive wasters?

Testing isn’t meant to be easy – it’s meant to be valuable

This has been a long article, but I think the message is so important it’s worth giving some space to my arguments.

Too many people outside the testing profession think of testing as being low status checking that doesn’t provide much value. Sadly, too many people inside the profession unwittingly ally themselves with that mindset. They’d be horrified at the suggestion, but I think it’s a valid charge.

Testers should constantly fight back against attempts to define them in ways that make them susceptible to replacement by automation. They should always stress the importance of sapient, intelligent testing. It’s not easy, it’s not mechanical, it’s not something that beginners can do to an acceptable standard simply by following a standardised process.

If testers aren’t going to follow typists, telephonists and filing clerks onto the scrapheap we have to ensure that we define the profession. We must do so in such a way that no-one could seriously argue that there is a 98% chance of it being automated out of existence.

98%? We know it’s nonsense. We should be shouting that out.

Posted by: James Christie | January 7, 2014

DRE: changing reality so we can count it

It’s usually true that our attitudes and beliefs are shaped by our early experiences. That applies to my views on software development and testing. My first experience of real responsibility in development and testing was with insurance financial systems. What I learned and experienced will always remain with me. I have always struggled with some of the tenets of traditional testing, and in particular the metrics that are often used.

There has been some recent discussion on Twitter about Defect Removal Efficiency. It was John Stephenson’s blog that set me thinking once again about DRE, a metric I’d long since consigned to my mental dustbin.

If you’re unfamiliar with the metric, it is the number of defects found before implementation expressed as a percentage of all the defects discovered within a certain period of going live (i.e. live defects plus development defects). The cut off is usually 90 days from implementation. So the more defects reported in testing and the fewer in live running, the higher the percentage, and the higher the quality (supposedly). A perfect application would have no live defects and therefore a DRE score of 100%; all defects were found in testing.
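
To make the arithmetic concrete, here is a minimal sketch of the calculation in Python; the defect counts are invented purely for illustration.

    # Minimal sketch of the DRE calculation described above.
    # The defect counts are invented for illustration, not from any real project.
    def defect_removal_efficiency(pre_release_defects, live_defects):
        # DRE = defects found before implementation, as a percentage of all
        # defects found (development defects plus live defects reported
        # within the measurement window, typically 90 days).
        return 100.0 * pre_release_defects / (pre_release_defects + live_defects)

    print(defect_removal_efficiency(190, 10))  # 95.0, i.e. "95% DRE"
    print(defect_removal_efficiency(190, 0))   # 100.0, the "perfect" score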

John’s point was essentially that DRE can be gamed so easily that it is worthless. I agree. However, even if testers and developers tried not to manipulate DRE, even if it couldn’t be gamed at all it would still be an unhelpful and misleading metric. It’s important to understand why so we can exercise due scepticism about other dodgy metrics, and flawed approaches to software development and testing.

DRE is based on a view of software development, testing and quality that I don’t accept. I don’t see a world in which such a metric might be useful, and it contradicts everything I learned in my early days as a team leader, project manager and test manager.

Here are the four reasons I can’t accept DRE as a valid metric. There are other reasons, but these are the ones that matter most to me.

Software development is not a predictable, sequential manufacturing activity

DRE implicitly assumes that development is like manufacturing, that it’s a predictable exercise in building a well understood and defined artefact. At each stage of the process defects should be progressively eliminated, till the object is completed and DRE should have reached 95% (or whatever).

You can see this sequential mindset clearly in this article by Capers Jones, “Measuring Defect Potentials and Defect Removal Efficiency” (PDF, opens in new tab) from QA Journal in 2008.

“In order to achieve a cumulative defect removal efficiency of 95%, it will be necessary to use approximately the following sequence of at least eight defect removal activities:

• Design inspections
• Code inspections
• Unit test
• New function test
• Regression test
• Performance test
• System test
• External Beta test

To go above 95%, additional removal stages will be needed. For example requirements inspections, test case inspections, and specialized forms of testing such as human factors testing, performance testing, and security testing add to defect removal efficiency levels.”

Working through sequential “removal stages” is not software development or testing as I recognise them. When I was working on these insurance finance systems there was no neat sequence through development with defects being progressively removed. Much of the early development work could have been called proof of concept. It wasn’t a matter of coding to a specification and then unit testing against that spec. We were discovering more about the problem and experimenting to see what would work for our users.

Inevitably, much of what we tried did not work. Each of these “failures” was a precious nugget of extra information about the problem we were trying to solve. The idea that we would have improved quality by recording everything that didn’t work and calling it a defect would have been laughable. Yet this is the implication of another statement by Capers Jones in a paper on the International Function Point Users Group website (December 2012), “Software Defect Origins and Removal Methods” (PDF, opens in new tab).

“Omitting bugs found in requirements, design, and by unit testing are common quality omissions.”

So experimenting to learn more about the problem without treating the results as formal defects is a quality omission? Tying up developers and testers in bureaucracy by extending formal defect management into unit testing is the way to better quality? I don’t think so.

Once we start to change the way people work simply so that we can gather data for metrics, we are not just encouraging them to game the system. It is worse than that. We are trying to change reality to fit our ability to describe it. We are pretending we can change the territory to fit the map.

Quality is not an absence of something

My second objection to DRE in principle is quite simple. It misrepresents quality. “Quality is value to some person”, as Jerry Weinberg famously said in his book “Quality Software Management: Systems Thinking”.

The insurance applications we were developing were intended to help our users understand the business and products better so that they could take better decisions. The quality of the applications was a matter of how well they helped our users to do that. These users were very smart and had a very clear idea of what they were doing and what they needed. They would have bluntly and correctly told us we were stupid and trying to confuse matters by treating quality as an absence of defects. That takes me on to my next objection to DRE.

Defects are not interchangeable objects

A defect is not an object. It possesses no qualities except those we choose to grant it in specific circumstances. In the case of my insurance applications a defect was simply something we didn’t understand that required investigation. It might be a problem with the application, or it might be some feature of the real world that we hadn’t known about and which would require us to change the application to handle it.

We never counted defects. What is the point of adding up things I don’t understand or don’t know about? I don’t understand quantum physics and I don’t know off hand what colour socks my wife is wearing today. Adding the two pieces of ignorance together to get two is not helpful.

Our acceptance criteria never mentioned defect numbers. The criteria were expressed in accuracy targets against specific oracles, e.g. we would have to reconcile our figures to within 5% of the general ledger. What was the basis for the 5% figure? Our users knew from experience that 95% accuracy was good enough to let them take significantly better decisions than they could without the application. 100% was an ideal, but the users knew that the increase in development time to try and reach that level of accuracy would impose a significant business cost because crucial decisions would have had to be taken blindfolded while we tried to polish up a perfect application.

If there was time we would investigate discrepancies even within the 5% tolerance. If we went above 5% in testing or live running then that was a big deal and we would have to respond accordingly.
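
For illustration, that kind of acceptance check could be sketched like this; the totals and the 5% threshold here are made-up examples, not figures from the actual applications.

    # Minimal sketch of an accuracy check against an oracle (the general
    # ledger), as described above. All numbers are illustrative only.
    def within_tolerance(application_total, ledger_total, tolerance=0.05):
        discrepancy = abs(application_total - ledger_total) / ledger_total
        return discrepancy <= tolerance

    print(within_tolerance(9_620_000, 10_000_000))  # True: 3.8% out, within the 5% tolerance
    print(within_tolerance(9_200_000, 10_000_000))  # False: 8% out, a big deal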

You may think that this was a special case. Well yes, but every project has its own business context and user needs. DRE assumes a standard world in which 95% DRE is necessarily better than 90%. The additional cost and delay of chasing that extra 5% could mean the value of the application to the business is greatly reduced. It all depends. Using DRE to compare the quality of different developments assumes that a universal, absolute standard is more relevant than the needs of our users.

Put simply, when we developed these insurance applications, counting defects added nothing to our understanding of what we were doing or our knowledge about the quality of the software. We didn’t count test cases either!

DRE has a simplistic, standardised notion of time

This problem is perhaps related to my earlier objection that DRE assumes developers are manufacturing a product, like a car. Once it rolls off the production line it should be largely defect free. The car then enters its active life and most defects should be revealed fairly quickly.

That analogy made no sense for insurance applications, which are highly date sensitive. Insurance contracts might be paid for up front, or in instalments, but they earn money on a daily basis. At the end of the contract period, typically a year, they have to be renewed. The applications consist of different elements performing distinct roles according to different timetables.

DRE requires an arbitrary cut off beyond which you stop counting the live defects and declare a result. It’s usually 90 days. Applying a 90 day cut-off for calculating DRE and using that as a measure of quality would have been ridiculous for us. Worse, if that had been a measure for which we were held accountable it would have distorted important decisions about implementation. With new insurance applications you might convert all the data from the old application when you implement the new one. Or you might convert policies as they come up for renewal.

Choosing the right tactics for conversion and implementation was a tricky exercise balancing different factors. If DRE with a 90 day threshold were applied then different tactics would give different DRE scores. The team would have a strong incentive to choose the approach that would produce the highest DRE score, and not necessarily the one that was best for the company.
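
A small, invented example shows the distortion. With exactly the same faults in the software, the implementation tactic determines how many of them surface as live defects inside the 90 day window, and therefore what score gets reported.

    # Minimal sketch of how the 90 day cut-off can reward one conversion
    # tactic over another. All numbers are invented for illustration.
    def dre(pre_release_defects, live_defects_in_window):
        return 100.0 * pre_release_defects / (pre_release_defects + live_defects_in_window)

    pre_release = 190
    # Full conversion at go-live: most policies are processed early, so
    # suppose 10 of the latent faults surface within 90 days.
    print(dre(pre_release, 10))  # 95.0
    # Phased conversion at renewal: the same faults exist, but only 2
    # happen to surface within the 90 day window.
    print(dre(pre_release, 2))   # roughly 99.0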

Now of course you could tailor the way DRE is calculated to take account of individual projects, but the whole point of DRE is that people who should know better want to make comparisons across different projects, organisations and industries and decide which produces greater quality. Once you start allowing for all these pesky differences you undermine that whole mindset that wants to see development as a manufacturing process that can be standardised.

DRE matters – for the wrong reasons

DRE might be flawed beyond redemption but metrics like that matter to important people for all the wrong reasons. The logic is circular. Development is like manufacturing, therefore a measure that is appropriate for manufacturing should be adopted. Once it is being used to beat up development shops who score poorly they have an incentive to distort their processes to fit the measure. You have to buy in the consultancy support to adapt the way you work. The flawed metric then justifies the flawed assumptions that underpin the metric. It might be logical nonsense, but there is money to be made there.

So DRE is meaningless because it can be gamed? Yes, indeed, but any serious analysis of the way DRE works reveals that it would be a lousy measure, even if everyone tries to apply it responsibly. Even if it were impossible to game it would still suck. It’s trying to redefine reality so we can count it.

Posted by: James Christie | December 17, 2013

“This could easily be the testing industry”

My article “Testing standards? Can we do better?” attracted a lot of attention. A couple of weeks after I wrote it I wondered if I’d overstated the case against standards, and the danger that they pose to good testing. I went back to read the article again and decided that it was all entirely reasonable. The case against standards is actually far stronger. I merely touched on a few angles. If you want more meat go to this article I wrote in 2012, and check out the links to Michael Bolton’s work.

I’m returning to the subject today because of an exchange on Twitter (PDF, opens in new tab). I expressed my concern at a possible future in which testing is governed by certification and standards, both of which are mandated by contracts that refer to them. This would be “best practice”. It would be what responsible professionals do, and those who dissent would be wilfully insisting on working in an unprofessional, irresponsible manner. They would be consciously taking money from clients with the intention of doing a sub-standard job.

That’s the conclusion I have to draw from a “white paper” by Testing Solutions Group promoting the ISO 29119 testing standard.

Imagine an industry where qualifications are based on accepted standards, required services are specified in contracts that reference these same standards, and best industry practices are based on the foundation of an agreed body of knowledge – this could easily be the testing industry of the near future.

That is a prospect that alarms and depresses me. I don’t think it will happen so long as good, responsible testers continue to speak out. However, it might happen in the way that Pete Walen suggested in the Twitter exchange; if the standards lobby get the ear of legislators who could mandate that public sector projects must be compliant with standards, or if they decide that non-compliance could be prima facie evidence of negligence.

Well, that won’t happen while I’m in testing. If that future ever comes to pass my career in testing will be over. I have worked in the painful, inflexible and dysfunctional way that invariably follows mandatory, standards-driven contracts. I’ve no interest in trying to do the wrong things more efficiently, or in a slightly more up-to-date fashion. I will walk away without looking back.

The future of testing? Please don’t let that happen. At the very least, don’t let it happen “easily”!

Edit. A petition was set up in August 2014 calling for ISO to withdraw ISO 29119 on the grounds that it lacks the consensus that its own rules require. Consensus is defined in ISO/IEC Guide 2:2004 as follows.

“Consensus: General agreement, characterized by the absence of sustained opposition to substantial issues by any important part of the concerned interests and by a process that involves seeking to take into account the views of all parties concerned and to reconcile any conflicting arguments.”

The petition argues, correctly, that there is no consensus. Further, the process did not seek to take into account the views of all parties concerned. The standard reflects one particular view of how testing should be conducted and marginalises those who disagree. If governments and companies insist that ISO 29119 should be adopted, and that suppliers should comply, this will have a dramatic, and damaging, effect on testing and our careers.

I urge all testers to sign the petition.

Posted by: James Christie | November 25, 2013

In praise of ignorance

My EuroSTAR 2013 tutorial in Gothenburg was titled “Questioning auditors questioning testing”. Not surprisingly a recurring theme of the tutorial was risk. Do we really understand risk? How do we deal with it? These are important questions for both testers and auditors.

I argued that both auditors and testers, in their different ways, have struggled to deal with risk. The failure of auditors contributed to the financial crash of 2007/8. The problems within the testing profession may have been less conspicuous, but they have had a significant impact on our ability to do an effective job.

One of the issues I discussed was our tendency to perform naïve and mechanical risk assessments. I’m sure you’ve seen risk matrices like this one from the Health and Safety Executive, the UK Government inspectorate responsible for regulating workplace safety.

UK HSE risk matrix

There are two fundamental problems with such a matrix that should make testers wary of using it.

Firstly, it implies that cells in the matrix with equal scores reflect equally acceptable positions. Is that really the case? Is a trivial chance of a catastrophic outcome genuinely as acceptable as the near certain chance of a trivially damaging outcome? The HSE deals with the sort of risks that lead national news bulletins when they come to pass; they regulate the nuclear industry, chemical plants and North Sea oil rigs.

I suspect the HSE takes a rather more nuanced approach to risks than is implied by the scoring in the matrix.
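
A minimal sketch, using made-up scale values, shows the scoring problem: a simple likelihood times impact calculation treats very different risks as equivalent.

    # Minimal sketch of the first problem described above: multiplying
    # likelihood by impact gives identical scores for very different risks.
    # The 1 to 5 scales and the values are illustrative only.
    def risk_score(likelihood, impact):
        return likelihood * impact

    near_certain_but_trivial = risk_score(likelihood=5, impact=1)
    remote_but_catastrophic = risk_score(likelihood=1, impact=5)
    print(near_certain_but_trivial, remote_but_catastrophic)  # 5 5 - "equally acceptable"?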

The second basic problem with these risk matrices is that we often lack the evidence to assign credible estimates of probability and impact to the risks.

This problem applies in particular to probabilities. Is there a reasonable basis for the figures we’ve assigned to the probabilities? Are they guesses? Are we performing precisely engineered and sophisticated calculations that are ultimately based on arbitrary or unsound assumptions? It makes a huge difference to the outcomes, but we can be vague to the point of cluelessness about the basis for these calculations.

Such a matrix may be relevant for risks where both the probability and the likely impact of something going wrong are well understood. That is often not the case during the early stages of a software development when the testing is being planned.

What’s the point of putting a number on the probability?

Whilst I was preparing my tutorial I came across an interesting case that illustrated the limitations of assigning probabilities when we’ve no experience or historic data.

Enrico Fermi

I was reading about the development of the atomic bomb during the Second World War. Before the first bomb was tested the scientists were concerned about the possibility that a nuclear explosion might set the atmosphere on fire and wipe out life on earth. Enrico Fermi, the brilliant Italian nuclear physicist who worked on the development of the atomic bomb, estimated the probability of such a catastrophe at 10%.

I was astonished. How could anyone have taken the decision to explode an atomic bomb after receiving such scientific advice? My curiosity was aroused and I did some background reading on the episode. I learned that Fermi had also been asked in 1939 for his estimate of the probability that nuclear fission could be controlled for power or weapons. His estimate was 10%.

Then, in a separate article, I discovered that in 1950 he had estimated the probability that humans would have developed the technology to travel faster than light by 1960. You’ve guessed it. The answer was 10%.

Apparently Fermi had the reputation for being a sound estimator, when (and this is a crucial qualification) he had the information to support a reasonable estimate. Without such information he was clearly liable to take a guess. If something might happen, but he thought it unlikely, then he assigned a probability of 10%.

I think most of us do no better than Fermi. Indeed, the great majority are probably far worse. Are we really any more capable than Enrico Fermi of assigning probabilities to a naïve risk matrix that would allow simple, mechanical calculations of relative outcomes?

I strongly suspect that if Enrico Fermi had thought anyone would take his estimates and slot them into a simplistic risk formula to guide decision making then he’d have objected. Yet many of us see nothing wrong with such a simplistic approach to risk. I wonder if that’s simply because our risk assessments are little more than a tickbox exercise, a task that has to be knocked off to show we are following “the process”.

The incertitude matrix – uncertainty, ambiguity and ignorance

The risk matrix clearly assumes greater knowledge of probabilities and outcomes than we usually have. A more useful depiction of the true situation is provided by O’Riordan and Cox’s incertitude matrix.

In this representation the conventional risk matrix occupies only the top left hand corner. We are in a position to talk about risk only when we have well defined outcomes and a realistic basis for assessing the probabilities.

If we understand the outcomes, but not the probabilities then we are in a state of uncertainty. If we understand the probabilities of events, but not the outcomes then we are dealing with ambiguity.

Ambiguity is an interesting situation. It seems more relevant to scientific problems than software development. My wife works in the field of climate change adaptation for a Scottish Government agency. She recognises ambiguity in her line of work, where the probability of initial events might be reasonably well understood, but it isn’t possible to define the outcomes. Feedback mechanisms, or an unknown tipping point, might turn a benign outcome into a catastrophic one in ways we can’t predict with confidence.

One area where ambiguity could exist in software development is in the way that social media can create entirely unpredictable outcomes. An error that might have had little impact 10 years ago could now spiral into something far more serious if it catches people’s attention and goes viral.

Nevertheless, uncertainty, rather than ambiguity, is probably the quadrant where testers and developers are more likely to find themselves. Here, we can identify outcomes with confidence, but not assign meaningful probabilities to them.

However, uncertainty is unlikely to be a starting point. To get there we have to know what part of the product or application could fail, how it might fail and what the effect would be. We might sometimes know that at the start, if this is a variant on a well understood product, but often we have to learn it all.

The usual starting point, our default position, should be one of ignorance. We don’t know what can go wrong and what the impact might be, and we almost certainly don’t know anything with confidence about the probabilities.
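
One way to picture the matrix is as a simple classification driven by two questions. This is only a rough sketch of my reading of it, not anything taken from O’Riordan and Cox themselves.

    # Rough sketch of the incertitude matrix as described above: the
    # quadrant we are in depends on what we genuinely understand.
    def incertitude_quadrant(outcomes_understood, probabilities_understood):
        if outcomes_understood and probabilities_understood:
            return "risk"         # the only quadrant where a risk matrix makes sense
        if outcomes_understood:
            return "uncertainty"  # known outcomes, no sound basis for probabilities
        if probabilities_understood:
            return "ambiguity"    # understood probabilities, unpredictable outcomes
        return "ignorance"        # the usual starting point for a new development

    print(incertitude_quadrant(False, False))  # ignorance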

Ignorant and proud!

Sadly, in business as well as software development, an honest admission of ignorance is seen as weakness. The pretence that we know more than we do is welcomed and admired rather than being derided as dangerous bluster. Such misplaced confidence leads to disappointment, stress, frustration, misdirected effort, and actually makes it harder to learn about products and applications. We deceive ourselves that we know things that we don’t, and stop digging to find out the true situation.

So please, speak up for ignorance! Only if we admit what we truly don’t know can we hope to learn the lessons that will give our stakeholders the insights that they need. Surely we need to look for and understand the most damaging failures before we start thinking of assigning probabilities that might guide the rest of our testing. Don’t assume that knowledge we’ve not gained can ever be a valid starting point for calculations of risk. Knowledge has to be earned, not assumed.

Posted by: James Christie | November 18, 2013

Thinking the impossible? Or wishing for the impossible?

At EuroSTAR 2013 in Gothenburg there was a striking contrast between messages coming out of tutorials that were taking place at the same time.

Ian Rowland was talking about how we can do amazing things by “thinking the impossible”. Meanwhile, along the corridor I was giving out a much more mundane and downbeat message in my tutorial about how testers can work constructively with auditors.

I was talking about how auditors are allergic to statements of brainless optimism. Plans should be based on evidence that they are achievable, not on wishful thinking that defies the evidence.

You might think Ian was contradicting me, but I was entirely happy with his message when he repeated it in a later keynote.

In my tutorial I referred to a tweet from James Bach that made the telling point that “people who say failure is not an option are in fact selecting the failure option: by driving truth away”.

I backed that up with a slide showing a tiresome and glib illustration of a little man boldly turning “impossible” into “possible”, with two strokes of a pen. That sort of unthinking positivity really riles me.

The unthinking “can do” spirit

As an auditor I regularly reviewed project plans and frequently they were implausibly optimistic. We were rarely in a position to block such plans. It wasn’t our place to do so. That was a decision for operational and IT management. We would comment, but ultimately management was responsible for the decision to proceed and for the consequences.

Only once was I a direct stakeholder. I insisted that a project should be replanned. The intention was to carry out user testing, user training and then start a phased roll-out during the six week English school summer holidays. That’s when every office was running with reduced staff levels. Initially I was rather unpopular, but the project team were quietly relieved once the responsibility was taken out of their hands. They’d been under unreasonable pressure to go live in September. It was never going to happen.

In that case I was able to defend the project team, but more often I saw the damaging effect on staff who were committed to unrealistic, ludicrously optimistic timescales.

I once interviewed a Chief Technology Officer who candidly admitted that in the company’s culture it was far better to say “Yes we can” and then emerge from the project sweaty, exhausted and hopelessly late than it was to say at the start how long it would actually take. He said the users were never prepared to accept the truth up front.

I remember another project whose schedule required a vital user expert to be available for 10 days in November. She already had two weeks’ holiday booked, and was committed to 20 days’ work for another project, all in November: a total of 40 working days of commitments in a month that offered barely half that. Of course both projects were business critical, with fixed target dates that had been dumped on them by senior management. Both projects were late – of course.

If auditors are involved early on in the planning of projects they can sniff out such problems, or force them to be confronted. Sometimes projects are well aware of problems that will stop them hitting their targets but they are scared to flag them up. The problem is kicked into the long grass in the hope that dealing with an urgent problem further down the line will be less damaging than getting a reputation for being negative by complaining early on.

That fear might seem irrational, even bizarre, but it is entirely justifiable. I reviewed a development that had had serious problems and was very late. The development team lead had said the schedule was unrealistic given the budget and available staff. She was removed from her role. Her replacement said she could do it. Eventually the development work was completed in about the time the original team lead had predicted. The successor was praised for her commitment, whereas the original team lead was judged to lack the right attitude. Her career was blighted, and she was deeply disillusioned when I interviewed her.

Be lucky! That’s an order!

Usually when senior management overdoses on the gung ho spirit and signs up to the John Wayne school of inspirational leadership the result isn’t “thinking the impossible”. The result is an order to the troops to be lucky – freakishly lucky. This isn’t thinking the impossible. It’s thinking the impossible will happen if we switch off our brains and treat the troops like disposable cannon fodder.

If the results of unthinking, high volume managerial optimism have been appalling in the past then the evidence is that they will be appalling in the future. There has to be a reason to assume things will get better, and brainless optimism provides remarkably poor evidence.

If your experience of following standards, inappropriate “best practices” and excessively prescriptive processes has been dismal then you won’t get better results in future from sheer optimism, force of will and a refusal to acknowledge the truth.

Insistence on trying to do the wrong things better and faster is not only irrational, it is a deep insult to the people whose working lives are being made miserable. If you have experienced persistent failure and insist on continuing to work in the same way then the clear implication is that failure is the fault of the people, the very people who ensured anything worthwhile was ever delivered.

Ian Rowland had a marvellous and inspirational message. Sadly it’s a message that can prove disastrous if it’s picked up by the sort of managers who regard thinking and reflection as a frivolous waste of time. Be inspired by Ian’s message. Just be very careful who you share it with! Yes, thinking the impossible is wonderful. But that really does require thinking, not wishing!

Posted by: James Christie | November 12, 2013

Testing standards? Can we do better?

At EuroSTAR 2013 I had a brief disagreement about software testing standards with Stuart Reid. To be more accurate, I was one of a group of sceptics pressing Stuart, who was putting up a battling defence of standards. He has been working on the new standard ISO 29119 and made a very interesting and revealing point. He insisted that the critics of standards don’t understand their true nature; they are not compulsory.

The introduction to standards makes it clear that their use is optional. They become mandatory only if someone insists that they must be applied in a certain context, often by writing them into a contract or a set of in-house development standards. Then, and only then, is it appropriate to talk about compulsion. That compulsion comes not from the standard itself, but from the contract or the managerial directive.

I found that argument unconvincing. Indeed I thought it effectively conceded the argument and amounted to no more than a plea in mitigation rather than a plausible defence.

Even a cursory analysis of this defence reveals that it is entirely specious, merely a statement of the obvious. Of course it is a choice made by people to make standards mandatory, but that choice is heavily influenced by the quite inappropriate status of IEEE 829 and, in all likelihood, ISO 29119 as standards. Calling them standards gives them a prestige and authority that would be missing if they were called guidelines. The defenders of standards usually want it both ways. They refer to standards when they are making an implicit appeal to authority. They refer to the standards as guidelines when they are on the defensive. That doesn’t wash. Standards and guidelines are not synonymous.

Stuart’s defence struck me as very interesting because it was entirely consistent with what I have long believed; the rationale behind standards, and their implicit attraction, is that they can be given mandatory status by organisations and lawyers with a poor grasp of software testing.

The standards become justified by the mandatory status assigned to them by non-testers. The justification does not come from any true intrinsic value or any wisdom that they might impart to practitioners. It comes from the aura of the word “standard” and the creators of standards know that this gives them a competitive advantage.

Creating standards is a commercial activity

Standards are not produced on a disinterested “take it or leave it” basis. They do not merely offer another option to the profession. Standards are created by people from the companies who will benefit from their existence, the companies who will sell the services to implement the new standard. In my experience heavyweight, document-driven processes require large numbers of expensive consultants (though not necessarily highly skilled consultants). Creating standards is a commercial activity. The producers of standards are quite consciously creating a market for the standards.

If the creators of standards were merely expanding the market to create a profitable niche for themselves that might not be a big deal. However, the benefit that accrues to them comes at the expense of everyone else.

It comes at the expense of the testers who are frequently committed to following inappropriate and demoralising practices.

It comes at the expense of their employers who are incurring greater and unnecessary costs for results that are poorer than they need be.

It comes at the expense of the whole testing profession. The standards encourage a dangerous illusion. They feed the hunger to believe, against all the evidence, that testing, and software development in general, are neat, essentially linear activities that can be rendered orderly and controllable with sufficient advance documentation. Standards feed the illusion that testing can be easier than it really is, and performed by people less skilled than are really needed.

As I said in my EuroSTAR tutorial last week, testing is not meant to be easy, it’s meant to be valuable.

Good contracts or bad contracts?

It is understandable that the contract lawyers find standards attractive. Not only do standards offer the lawyers the illusion that they promote high quality and define the correct way for professionals to work, they also offer the lawyers something they can get their teeth into. A standard makes it easier to structure a contract if you don’t know about the subject area. The standard doesn’t actually have to be useful. The point is that it helps generate deliverables along the way, and it requires the testers to work in a way that is easy to monitor.

Contracts are most useful when they specify the end, or the required value; not when they dictate how teams should reach the destination. Prescriptive contracts can turn unwarranted assumptions about the means into contractually mandatory ends.

I once faced what looked like a horrendously difficult challenge. I had to set up a security management process for a large client, who wanted assurance that the process would work effectively from the very start. This had been interpreted by my employer as meaning that the client required a full-scale, realistic test, with simulated security breaches to establish whether they would be detected and how we would respond. This would have been very difficult to arrange, and extremely expensive to carry out. Failure to deliver on the due date would have resulted in heavy weekly penalties until we could comply. However, the requirement was written into the contract so I was told we would have to do it.

I was sceptical, and went back to the client to discuss their needs in detail. It turned out that they simply needed to be reassured that the process would work, smoothly and quickly. Bringing together the right people from the client and supplier for a morning to walk through the process in detail would do just as well, at a tiny fraction of the cost. Once I had secured the client’s agreement it was straightforward to have the contract changed so that it reflected where they really wanted to end up, rather than stipulating a poorly understood route to that destination.

On many other occasions I have been stuck with a contract that could not be changed and where it was mandatory for testers to comply with milestones and deliverables that had minimal relevance to the real problem, but which required such obsessive attention that they detracted from the real work.

Software testing standards encourage that sort of goal displacement; management attention is directed not at the work, but at a dubious abstract representation of the work. Their attention is directed to the map, and they lose sight of the territory.

We can do better

Sure, no-one has to be a sucker. No-one has to buy the snake oil of standards, but caveat emptor (let the buyer beware) is the legal fallback of the huckster. It is hardly a motto to inspire. Testers can do better than that.

What is the answer? Unfortunately blogs like this preach largely to the converted. The argument against standards is accepted within the Context Driven School. The challenge is to take that argument out into the corporations who are instinctively more comfortable, or complacent, with standards than with a more flexible and thoughtful approach.

I tried to challenge that complacency in my EuroSTAR tutorial, “Questioning auditors questioning testing”. I demonstrated exactly why and how software testing standards are largely irrelevant to the needs of the worldwide Institute of Internal Auditors and also the Information Systems Audit and Control Association. I also explained how more thoughtful and effective testing, as promoted by the Context Driven School, can be consistent with the level of professionalism, accountability and evidence that auditors require.

If we can spread the message that testing can be better and cheaper then corporations might start to discourage the lawyers from writing damaging contracts. They might shy away from the consultancies offering standards driven processes.

Perhaps that will require more than blogs, articles and impassioned conference speeches. Do we need a counterpart to testing standards, an anti-standard perhaps? That would entail a clearly documented explanation of the links between good testing practices and governance models.

An alternative would have to demonstrate how good testing can be accountable from the perspective of auditors, rather than merely asserting it. It would also be directed not just at testers, but also at auditors to persuade them that testing is an area where they should be proactively involved, trying to force improvements. The testers who work for consultancies that profit from standards will never come on board. The auditors might.

But whatever form such an initiative might take it must not be called a standard, anything but that!

Edit. A petition was set up in August 2014 calling for ISO to withdraw ISO 29119 on the grounds that it lacks the consensus that ISO’s own rules require. Consensus is defined in ISO/IEC Guide 2:2004 as follows.

“Consensus: General agreement, characterized by the absence of sustained opposition to substantial issues by any important part of the concerned interests and by a process that involves seeking to take into account the views of all parties concerned and to reconcile any conflicting arguments.”

The petition argues, correctly, that there is no consensus. Further, the process did not seek to take into account the views of all parties concerned. The standard reflects one particular view of how testing should be conducted and marginalises those who disagree. If governments and companies insist that ISO 29119 should be adopted, and that suppliers should comply, this will have a dramatic, and damaging, effect on testing and our careers.

I urge all testers to sign the petition.

Posted by: James Christie | September 24, 2013

Not “right”, but as good as I can do

This is probably the last in the series of articles running up to my upcoming EuroSTAR tutorial “Questioning auditors questioning testing”.

In my last blog post I explained my concerns about the way that testers have traditionally adopted an excessively positivist attitude, i.e. they conducted testing as if it were a controlled scientific experiment that would allow them to announce definite, confident answers.

I restricted that article to my rejection of the positivist approach. However, I need to follow it up with my explanation of why I can’t accept the opposite position, that of the anti-positivist or interpretivist. The interpretivist would argue that there is no single, fixed reality. Everything we know is socially constructed. Researchers into social activities have to work with the subjects of the research, learning together and building a joint account of what reality might be. Everything is relative. There is no single truth, just truths.

I don’t think that is any more helpful than rigid positivism. This article is about how I groped for a reasonable position that avoided the extremes.

Understanding, not condemning

My experience as an auditor forced me to think about these issues. That shaped my attitudes to learning and uncertainty when I switched to testing.

I’ve seen internal auditors taking what I considered a disastrously unprofessional approach largely because they never bothered to consider such points. Their lack of intellectual curiosity meant that they unwittingly adopted a rigid scientific positivist approach, totally unaware that a true scientist would never start off by relying on unproven assumptions. That rigid approach was the only option they could consider. That was their worldview. Anything that didn’t match their simplistic, binary view was wrong, an audit finding. They’d plough through audits relying to a ludicrous extent on checklists, without acquiring an adequate understanding of the business context they were trampling around in. They would ask questions that required yes/no answers, and that was all they would accept.

The tragedy was that the organisation where I saw this was in fear of corporate audit, and so the perspective of the auditors shaped commercial decisions. It was better to lose money than to receive a poor audit report. This fear reinforced the dysfunctional approach to audit. Sadly that approach deskilled the job. Audits could be performed by anyone who was literate, so it became a low status job. Auditors were feared but not respected, a dreadful outcome for both audit and the whole company.

Prior to this I had worked as an auditor in a company that had adopted a risk based approach before it became the norm. Compliance and simple checks against defined standards and controls were seen as interesting, but of limited value. We had to consider the bigger picture, understand the reasons for apparent non-compliance and try to explain the significance of our findings.

The temptations that can seduce a novice

At first I found all this intimidating. The temptation for a novice is to adopt one of two extremes. I could either get too close to the subjects of my audits and accept the excuses of IT management, which would have been very easy given that I’d just moved over from being a developer and expected to resume my career in IT one day. Or I could veer to the other extreme and rely on formal standards, processes and predefined checklists all of which offer the seductive illusion of a Right Answer.

This is essentially the dichotomy between interpretivism and positivism. Please don’t misunderstand me. I am not rejecting either outright. I am not qualified to do that. I just don’t believe that either is a helpful worldview for testers or auditors.

I knew I had to avoid the extremes and position myself in the middle. This was simply being pragmatic. I had to provide an explanation of what was going on, and reach conclusions. This required me to say that certain things could and should be done better. Senior management were not helped by simplistic answers that didn’t allow for the context, but neither were they interested in waffle that failed to offer helpful advice. They were paying me to make informed judgements, not make excuses.

Maybe not “right”, but as good as I can do

It took me a couple of years to feel comfortable and confident in my judgement. Where should I position myself on any particular complicated problem? I never felt convinced I was right, but I learned to be confident that I was taking a stance that was justifiable in the circumstances and reflected the context and the risks. I learned to be comfortable with “this is as good as I can do”, and take comfort that this was far more valuable to my employers than being either a hardline positivist or interpretivist.

As I conducted each audit I would fairly quickly build up a rough picture, a model, of what I thought should be happening. This understanding was always provisional, tested and revised in a constant cycle all the way through the audit. Occasionally the audit would have to be aborted because it quickly revealed some fundamental problem that we had not envisaged. In such cases there was no point pursuing the detail in the audit plan. It would have been like reviewing the decoration of a house when the roof had been blown off.

On other occasions our findings were radically different from those that we had expected. We had been sent in because of particular concerns, but the apparent problems were merely symptoms of a quite separate problem with roots elsewhere. It was essential that we had the mental flexibility, and the openness to realise that what we thought we knew was wrong.

The dangers of the extremes

I’m not comfortable positioning myself, and the learning approach that we took, in any particular school between positivism and relativism. It would be interesting to pin it down but working out the right label isn’t a priority. The important point, for my attitude towards auditing and testing, is that I have to be wary of the dangers inherent in the extremes.

In testing, the greater and more obvious danger is that of positivism, or rather an excessive regard for its validity and relevance in this context. In auditing, relativism is also a huge danger, and it’s fatal to the independence and credibility of auditors if they start identifying too closely with the auditees. There can be great commercial and organisational pressure to go with the flow. There have been countless examples of internal and external auditors who blew it in this way and failed to say “this is wrong”.

The dangerous temptation of going with the flow also exists in testing. Indeed, it could be argued that the greatest danger of relativism in testing is to accept the naïve positivism embodied in traditional approaches, and to pretend that detailed scripts, and meticulous documentation can convey the Truth against which the testers have to measure the product. That’s an interesting paradox, but not one I intend to pursue now.

Frustration with “faking it”

Anyway, after learning how to handle myself in a professional and enlightened audit department I couldn’t contain my frustration when I later had to deal with blinkered auditors who couldn’t tell the difference between an audit and a checklist. I don’t regard that as real auditing. It’s faking it, in very much the same style that James Bach has described in testing; running through the process, producing immaculate documents, but taking great care not to learn anything that might really matter – or that might disrupt plans, budgets or precious assumptions.

Inevitably, when I switched to testing and had to endure the problems of the traditional document driven approach, I experienced very much the same frustrations that I had had with poor auditing. It was with great relief that I came across the work of the Context Driven School, and realised that testing wasn’t a messed up profession after all; it was just a profession with some messed up practices, which we can go along with or resist. It’s our choice, and the practical and learning experiences that I enjoyed as an auditor meant I could go only one way.

Posted by: James Christie | September 18, 2013

Testing inside the box?

The EuroSTAR online software testing summit took place yesterday. I missed the morning, but there were a few things that leapt out at me in the afternoon.

When Fiona Charles was talking about test strategies she said, “we always have a model, whether conscious or unconscious”.

Keith Klain later talked about the dangers of testers trying to provide confidence rather than information. Confidence should never be the goal of testing. If we try to instil confidence then we are likely to be reinforcing biases and invalid assumptions.

In between Fiona’s and Keith’s talks I came across this quote from the Economist about the reasons for the 2007/8 financial crash.

“With half a decade’s hindsight, it is clear the crisis had multiple causes. The most obvious is the financiers themselves – especially the irrationally exuberant Anglo-Saxon sort, who claimed to have found a way to banish risk when in fact they had simply lost track of it.”

These three points resonated with me because over the last day or so I have been thinking about the relationship between risk, our view of the world and our inbuilt biases as part of my upcoming EuroSTAR tutorial.

During the tutorial I want to discuss a topic that is fundamental to the work that testers and auditors do. We might not think much about it but our attitudes towards ontology and epistemology shape everything that we do.

Don’t switch off! I’m only going to skim the surface. I won’t be getting heavy.

Lessons from social science

Ontology means looking at whether things exist, and in what form. Epistemology is about how and whether we can know about anything. So “is this real?” is an ontological question, and “how do I know that it is real, and what can I know about it?” are epistemological questions.

Testing and auditing are both forms of exploration and research. We have to understand our mindset before we go off exploring. Our attitudes and our biases will shape what we go looking for. If we don’t think about them beforehand we will simply go and find what we were looking for, or interpret whatever we do find to suit our preconceptions. We won’t bother to look for anything else. We will not even be aware that there is anything else to find.

Both testing and auditing have been plagued by an unspoken, unrecognised devotion to a positivist outlook. This makes the ontological assumption that we are working in a realm of unambiguous, objective facts and the epistemological assumption that we can observe, measure and conduct objective experiments (audits or tests) on these facts. Crucially this mindset implicitly assumes that we can know all that is relevant.

If you have this mindset then reality looks like the subject of a scientific experiment. We can plan, conduct and report on our test or audit as if it were an experiment; the subject is under our control, we manipulate the inputs, check the output against our theory and reach a clear, objective conclusion.

If we buy into the positivist attitude then we can easily accept the validity of detailed test scripts and audit by checklist. We’ve defined our reality and we run through binary checks to confirm that all is as it should be. Testing standards, CMMI maturity levels and certification all seem natural progressions towards ever greater control and order.

We don’t need to worry about the unknown, because our careful planning and meticulous documentation have ruled that out. All we need to do is run through our test plan and we will have eliminated uncertainty. We can confidently state our conclusions and give our stakeholders a nice warm glow of confidence.

Reality in business and software development is nothing like that. If we adopt an uncritical positivist approach we have just climbed into a mental box to ensure that we cannot see what is really out there. We have defined our testing universe as being that which is within our comprehension and reach. We have redefined risk as being that which is within our control and which we can manage and remove.

Getting out of the box

Perhaps the testers, and certainly the auditors, at the banks that crashed failed to realise they were in a world in which risk had not been banished. It was still out there, bigger and scarier than they could ever have imagined, but they had chosen to adopt a mindset that kept them in a neat and tidy box.

When you forget about the philosophical and social science jargon it all comes down to whether you want to work in the box, pretending the world suits your worldview, or get out of the box and help your stakeholders understand scary, messy reality.

Social science researchers know the pitfalls of unconsciously assuming a worldview without realising how it shapes our investigation. Auditors are starting to realise this and discuss big questions about what they can know and how they can know it. Testers? Well, not all of them, not yet, but we all need to start thinking along those lines. That’s part of the message I hope to get across at Gothenburg and elsewhere over the coming months.
