Teachers, children, testers and leaders (2013)

This article appeared in the March 2013 edition of Testing Planet, which is published by the wonderful Ministry of Testing, one of the most exciting developments in software testing over the last 20 years.

I’m moving this article onto my blog from my website, which will shortly be decommissioned. The article was written in January 2013. Looking at it again I see that I was starting to develop arguments I fleshed out over the next couple of years as part of the Stop 29119 campaign against the testing standard, ISO 29119.

The article

“A tester is someone who knows things can be different” – Gerald Weinberg.

Leaders aren’t necessarily people who do things, or order other people about. To me the important thing about leaders is that they enable other people to do better, whether by inspiration, by example or just by telling them how things can be different – and better. The difference between a leader and a manager is like the difference between a great teacher and, well, the driver of the school bus. Both take children places, but a teacher can take children on a journey that will transform their whole life.

My first year or so of working life after I left university was spent in a fog of confusion. I struggled to make sense of the way companies worked; I must be more stupid than I’d always thought. All these people were charging around, briskly getting stuff done, making money and keeping the world turning; they understood what they were doing and what was going on. They must be smarter than me.

Gradually it dawned on me that very many of them hadn’t a clue. They were no wiser than me. They didn’t really know what was going on either. They thought they did. They had their heads down, working hard, convinced they were contributing to company profits, or at least keeping the losses down.

The trouble was their efforts often didn’t have much to do with the objectives of the organisation, or the true goals of the users and the project in the case of IT. Being busy was confused with being useful. Few people were capable of sitting back, looking at what was going on and seeing what was valuable as opposed to mere work creation.

I saw endless cases of poor work, sloppy service and misplaced focus. I became convinced that we were all working hard doing unnecessary, and even harmful, things for users who quite rightly were distinctly ungrateful. It wasn’t a case of the end justifying the means; it was almost the reverse. The means were only loosely connected to the ends, and we were focussing obsessively on the means without realising that our efforts were doing little to help us achieve our ends.

Formal processes didn’t provide a clear route to our goal. Following the process had become the goal itself. I’m not arguing against processes; just the attitude we often bring to them, confusing the process with the destination, the map with the territory. The quote from Gerald Weinberg absolutely nails the right attitude for testers to bring to their work. There are twin meanings. Testers should know there is a difference between what people expect, or assume, and what really is. They should also know that there is a difference between what is, and what could be.

Testers usually focus on the first sort of difference; seeing the product for what it really is and comparing that to what the users and developers expected. However, the second sort of difference should follow on naturally. What could the product be? What could we be doing better?

Testers have to tell a story, to communicate not just the reality to the stakeholders, but also a glimpse of what could be. Organisations need people who can bring clear headed thinking to confusion, standing up and pointing out that something is wrong, that people are charging around doing the wrong things, that things could be better. Good testers are well suited by instinct to seeing what positive changes are possible. Communicating these possibilities, dispelling the fog, shining a light on things that others would prefer to remain in darkness; these are all things that testers can and should do. And that too is a form of leadership, every bit as much as standing up in front of the troops and giving a rousing speech.

In Hans Christian Andersen’s story, The Emperor’s New Clothes, who showed a glimpse of leadership? Not the emperor, not his courtiers; it was the young boy who called out the truth, that the Emperor was wearing no clothes at all. If testers are not prepared to tell it like it is, to explain why things are different from what others are pretending, to explain how they could be better, then we diminish and demean our profession. Leaders do not have to be all-powerful figures. They can be anyone who makes a difference; teachers, children. Or even testers.

Quality isn’t something, it provides something (2012)

This article appeared in the July 2012 edition of Testing Planet, which is published by the wonderful Ministry of Testing, one of the most exciting developments in software testing over the last 20 years.

The article was written in June 2012, but I don’t think it has dated. It’s about the way we think and work with other people. These are timeless problems. The idea behind E-prime is particularly interesting. Dispensing with the verb “to be” isn’t something to get obsessive or ideological about, but testers should be aware of the important distinction between the way something is and the way it behaves. The original article had only four references so I have checked them, converted them to hyperlinks, and changed the link to Lera Boroditsky’s paper to a link to her TED talk on the same subject.

The article

A few weeks ago two colleagues, who were having difficulty working together, asked me to act as peacekeeper in a tricky looking meeting in which they were going to try and sort out their working relationship. I’ll call them Tony and Paul. For various reasons they were sparking off each other and creating antagonism that was damaging the whole team.

An hour’s discussion seemed to go reasonably well; Tony talking loudly and passionately, while Paul spoke calmly and softly. Just as I thought we’d reached an accommodation that would allow us all to work together, Tony blurted out, “you are cold and calculating, Paul, that’s the problem”.

Paul reacted as if he’d been slapped in the face, made his excuses and left the meeting. I then spent another 20 minutes talking Tony through what had happened, before separately speaking to Paul about how we should respond.

I told Tony that if he’d wanted to make the point I’d inferred from his comments, and from the whole meeting, then he should have said “your behaviour and attitude towards me throughout this meeting, and when we work together, strike me as cold and calculating, and that makes me very uncomfortable”.

“But I meant that!”, Tony replied. Sadly, he hadn’t said that. Paul had heard the actual words and reacted to them, rather than applying the more dispassionate analysis I had used as an observer. Paul meanwhile found Tony’s exuberant volatility disconcerting, and responded to him in a very studied and measured style that unsettled Tony.

Tony committed two sins. Firstly, he didn’t acknowledge the two-way nature of the problem. It should have been about how he reacted to Paul, rather than trying to dump all the responsibility onto Paul.

Secondly, he said that Paul is cold and calculating, rather than saying that Paul had acted in a way Tony found cold and calculating, at a certain time, in certain circumstances.

I think we’d all see a huge difference between being “something”, and behaving in a “something” way at a certain time, in a certain situation. The verb “to be” gives us this problem. It can mean, and suggest, many different things and can create fog where we need clarity.

Some languages, such as Spanish, maintain a useful distinction between different forms of “to be” depending on whether one is talking about something’s identity or just a temporary attribute or state.

The way we think obviously shapes the language we speak, but increasingly scientists are becoming aware of how the language we use shapes the way that we think. [See this 2017 TED talk, “How Language Shapes Thought”, by Lera Boroditsky]

The problem we have with “to be” has great relevance to testers. I don’t just mean treating people properly, however much importance we rightly attach to working successfully with others. More than that, if we shy away from “to be” then it helps us think more carefully and constructively as testers.

This topic has stretched bigger brains than mine, in the fields of philosophy, psychology and linguistics. Just google “general semantics” if you want to give your brain a brisk workout. You might find it tough stuff, but I don’t think you have to master the underlying concept to benefit from its lessons.

Don’t think of it as intellectual navel gazing. All this deep thought has produced some fascinating results, in particular something called E-prime, a form of English that totally dispenses with “to be” in all its forms; no “I am”, “it is”, or “you are”. Users of E-prime don’t simply replace the verb with an alternative. That doesn’t work. It forces you to think and articulate more clearly what you want to say. [See this classic paper by Kellogg, “Speaking in E-prime” PDF, opens in new tab].

“The banana is yellow” becomes “the banana looks yellow”, which starts to change the meaning. “Banana” and “yellow” are not synonyms. The banana’s yellowness becomes apparent only because I am looking at it, and once we introduce the observer we can acknowledge that the banana appears yellow to us now. Tomorrow the banana might appear brown to me as it ripens. Last week it would have looked green.

You probably wouldn’t disagree with any of that, but you might regard it as a bit abstract and pointless. However, shunning “to be” helps us to think more clearly about the products we test, and the information that we report. E-prime therefore has great practical benefits.

The classic definition of software quality came from Gerald Weinberg in his book “Quality Software Management: Systems Thinking”.

“Quality is value to some person”.

Weinberg’s definition reflects some of the clarity of thought that E-prime requires, though he has watered it down somewhat to produce a snappy aphorism. The definition needs to go further, and “is” has to go!

Weinberg makes the crucial point that we must not regard quality as some intrinsic, absolute attribute. It arises from the value it provides to some person. Once you start thinking along those lines you naturally move on to realising that quality provides value to some person, at some moment in time, in a certain context.

Thinking and communicating in E-prime stops us making sweeping, absolute statements. We can’t say “this feature is confusing”. We have to use a more valuable construction such as “this feature confused me”. But we’re just starting. Once we drop the final, total condemnation of saying the feature is confusing, and admit our own involvement, it becomes more natural to think about and explain the reasons. “This feature confused me … when I did … because of …”.

Making the observer, the time and the context explicit helps us by limiting or exposing hidden assumptions. We might or might not find these assumptions valid, but we need to test them, and we need to know about them so we understand what we are really learning as we test the product.

E-prime fits neatly with the scientific method and with the provisional and experimental nature of good testing. Results aren’t true or false. The evidence we gather matches our hypothesis, and therefore gives us greater confidence in our knowledge of the product, or it fails to match up and makes us reconsider what we thought we knew. [See this classic paper by Kellogg & Bourland, “Working with E-prime – some practical notes” PDF, opens in new tab].

The scientific method cannot be accommodated within traditional script-driven testing, which reflects a linear, binary, illusory worldview that pretends to be absolute. It tries to deal in right and wrong, pass and fail, true and false. Such an approach fits in neatly with traditional development techniques, which fetishise the rigours of project management rather than the rigours of the scientific method.

This takes us back to general semantics, which coined the well known maxim that the map is not the territory. Reality and our attempts to model and describe it differ fundamentally from each other. We must not confuse them. Traditional techniques fail largely because they confuse the map with the territory. [See this “Less Wrong” blog post].

In attempting to navigate their way through a complex landscape, exponents of traditional techniques seek the comfort of a map that turns messy, confusing reality into something they can understand and that offers the illusion of being manageable. However, they are managing the process, not the underlying real work. The plan is not the work. The requirements specification is not the requirements. The map is not the territory.

Adopting E-prime in our thinking and communication will probably just make us look like the pedantic awkward squad on a traditional project. But on agile or lean developments E-prime comes into its own. Testers must contribute constructively, constantly, and above all, early. E-prime helps us in all of this. It makes us clarify our thoughts and helps us understand that we gain knowledge provisionally, incrementally and never with absolute certainty.

I was not consciously deploying E-prime during and after the fractious meeting I described earlier. But I had absorbed the precepts sufficiently to instinctively realise that I had two problems; Tony’s response to Paul’s behaviour, and Paul’s response to Tony’s outburst. I really didn’t see it as a matter of “uh oh – Tony is stupid”.

E-prime purists will look askance at my failure to eliminate all forms of “to be” in this article. I checked my writing to ensure that I’ve written what I meant to, and said only what I can justify. Question your use of the verb, and weed out those hidden assumptions and sweeping, absolute statements that close down thought, rather than opening it up. Don’t think you have to be obsessive about it. As far as I am concerned, that would be silly!

Traditional techniques and motivating staff (2010)

This article appeared in the February 2010 edition of Software Testing Club Magazine, now the Testing Planet. The STC has evolved into the wonderful Ministry of Testing, one of the most exciting developments in software testing over the last 20 years.

That might seem a low bar; testing isn’t meant to be a thrill a minute. But the Ministry of Testing has been a gale of fresh air sweeping through the industry, mixing great content and conferences with an approach to testing that has managed to be both entertaining and deeply serious. It has been a consistent voice of sanity and decency in an industry that has had too much cynicism and short sightedness.

I’m moving this article onto my blog from my website, which will shortly be decommissioned. Looking back I was interested to see that I didn’t post this article on the website immediately. I had some reservations about the article. I wondered if I had taken a rather extreme stance. I do believe that rigid standards and processes can be damaging, and I certainly believe that enforcing strict compliance, at the expense of initiative and professional judgement, undermines morale.

However, I thought I had perhaps gone too far and might be seen as dismissing any formality at all, advocating software development as an entirely improvised activity with everyone winging it. That’s not the case. We need some structure, some shape and formality to our work. It’s just that prescriptive standards and processes aren’t sensitive to context and become a straitjacket. This was written in January 2010 and it was a theme I spent a good deal of time on when the ISO 29119 standard was released a few years later and the Stop 29119 campaign swung into action.

So I still largely stand by this article, though I think it is lacking in nuance in some respects. In particular the bald statement “development isn’t engineering”, while true, requires greater nuance, unpacking and explanation. Development isn’t engineering in the sense that engineering is usually understood, and it’s certainly not akin to civil engineering. But it should aspire to be more “engineering like”, while remaining honest about the realities of software development. I was particularly interested to see that I described reality as being chaotic in 2010, a couple of years before I started to learn about Cynefin.

The article

Do we follow the standards or use our initiative?

Recently I’ve been thinking and writing about the effects of testing standards. The more I thought, the more convinced I became that standards, or any rigid processes, can damage the morale, and even the professionalism, of IT professionals if they are not applied wisely.

The problem is that calling them “standards” implies that they are mandatory and should be applied in all cases. The word should be reserved for situations where compliance is essential, e.g. security, good housekeeping or safety critical applications.

I once worked for a large insurance company as an IT auditor in Group Audit. I was approached by Information Services. Would I consider moving to lead a team developing new management information (MI) applications? It sounded interesting, so I said yes.

On my first day in the new role I asked my new manager what I had to do. He stunned me when he said, “You tell me. I’ll give you the contact details for your users. Go and see them. They’re next in line to get an MI application. See what they need, then work out how you’re going to deliver it. Speak to other people to see how they’ve done it, but it’s up to you”.

The company did have standards and processes, but they weren’t rigid and they weren’t very useful in the esoteric world of insurance MI, so we were able to pick and choose how we developed applications.

My users were desperate for a better understanding of their portfolio; what was profitable, and what was unprofitable. I had no trouble getting a manager and a senior statistician to set aside two days to brief me and my assistant. There was just us, a flip chart, and gallons of coffee as they talked us through the market they were competing in, the problems they faced and their need for better information from the underwriting and claims applications with which they did business.

I realised that it was going to be a pig of a job to give them what they needed. It would take several months. However, I could give them about a quarter of what they needed in short order. So we knocked up a quick disposable application in a couple of weeks that delighted them, and then got to work on the really tricky stuff.

The source systems proved to be riddled with errors and poor quality data, so it took longer than expected. However, we’d got the users on our side by giving them something quickly, so they were patient.

It took so long to get phase 1 of the application working to acceptable tolerances that I decided to scrap phase 2, which was nearly fully coded, and rejig the design of the first part so that it could do the full job on its own. That option had been ruled out at the start because there seemed to be insurmountable performance problems.

Our experience with testing had shown that we could make the application run much faster than we’d thought possible, but that the fine tuning of the code to produce accurate MI was a nightmare. It therefore made sense to clone jobs and programs wholesale to extend the first phase and forget about trying to hack the phase 2 code into shape.

The important point is that I was allowed to take a decision that meant binning several hundred hours of coding effort and utterly transforming a design that had been signed off.

I took the decision during a trip to the dentist, discussed it with my assistant on my return, sold the idea to the users and only then did I present my management with a fait accompli. They had no problems with it. They trusted my judgement, and I was taking the users along with me.

The world changed and an outsourcing deal meant I was working for a big supplier, with development being driven by formal processes, rigid standards and contracts. This wasn’t all bad. It did give developers some protection from the sort of unreasonable pressure that could be brought to bear when relationships were less formal. However, it did mean that I never again had the same freedom to use my own initiative and judgement.

The bottom line was that it could be better to do the wrong thing for the corporately correct reason, than to do the right thing the “wrong” way. By “better” I mean better for our careers, and not better for the customers.

Ultimately that is soul destroying. What really gets teams fired up is when developers, testers and users all see themselves as being on the same side, determined to produce a great product.

Development isn’t engineering

Reality is chaotic. Processes are perfectly repeatable only if one pretends that reality is neat, orderly and predictable. The result is strain, tension and developers ordered to do the wrong things for the “right” reasons, to follow the processes mandated by standards and by the contract.

Instead of making developers more “professional” it has exactly the opposite effect. It reduces them to the level of, well, second hand car salesmen, knocking out old cars with no warranty. It’s hardly a crime, but it doesn’t get me excited.

Development and testing become drudgery. Handling the users isn’t a matter of building lasting relationships with fellow professionals. It’s a matter of “managing the stakeholders”, being diplomatic with them rather than candid, and if all else fails telling them “to read the ******* contract”.

This isn’t a rant about contractual development. Contracts don’t have to be written so that the development team is in a strait-jacket. It’s just that traditional techniques fit much more neatly with contracts than agile, or any iterative approach.

Procurement is much simpler if you pretend that traditional, linear techniques are best practice; if you pretend that software development is like civil engineering, and that developing an application is like building a bridge.

Development and testing are really not like that at all. The actual words used should be a good clue. Development is not the same as construction. Construction is what you do when you’ve developed an idea to the point where it can be manufactured, or built.

Traditional techniques were based on that fundamental flaw; the belief that development was engineering, and that repeatable success required greater formality, more tightly defined processes and standards, and less freedom for developers.

Development is exploring

Good development is a matter of investigation, experimentation and exploration. It’s about looking at the possibilities, and evaluating successive versions. It’s not about plodding through a process document.

Different customers, different users and different problems will require different approaches. These various approaches are not radically different from each other, but they are more varied than is allowed for by rigid and formal processes.

Any organisation that requires development teams to adhere to these processes, rather than make their own judgements based on their experience and their users’ needs, frequently requires the developers to do the wrong things.

This is demoralising, and developers working under these constraints have the initiative, enthusiasm and intellectual energy squeezed out of them. As they build their careers in such an atmosphere they become corporate bureaucrats.

They rise to become not development managers, but managers of the development process; not test managers, but managers of the test process. Their productivity is measured in meetings and reports. Sometimes the end product seems a by-product of the real business; doing the process.

If people are to realise their potential they need managers who will do as mine did; who will say, “here is your problem, tell me how you’re going to solve it”.

We need guidance from processes and standards in the same way as we need guidance from more experienced practitioners, but they should be suggestions of possible approaches so that teams don’t flounder around in confused ignorance, don’t try to re-invent the wheel, and don’t blunder into swamps that have consumed previous projects.

If development is exploration it is thrilling and brings out the best in us. People rise to the challenge, learn and grow. They want to do it again and get better at it. If development means plodding through a process document it is a grind.

I know which way has inspired me, which way has given users applications I’ve been proud of. It wasn’t the formal way. It wasn’t the traditional way. Set the developers free!

Testers and coders are both developers (2009)

This article appeared in September 2009 as an opinion piece on the front page of TEST magazine‘s website. I’m moving the article onto my blog from my website, which will be decommissioned soon. It might be an old article, but it remains valid.

The article

When I was a boy I played football non-stop; in organised matches, in playgrounds or in the park, even kicking coal around the street!

There was a strict hierarchy. The good players, the cool kids; they were forwards. The ones who couldn’t play were defenders. If you were really hopeless you were a goalkeeper. Defending was boring. Football was about fun, attacking and scoring goals.

When I moved into IT I found a similar hierarchy. I had passed the programming aptitude test. I was one of the elite. I had a big head, to put it mildly! The operators were the defenders, the ones who couldn’t do the fun stuff. We were vaguely aware they thought the coding kids were irresponsible cowboys, but who cared?

As for testers, well, they were the goalkeepers. Frankly, they were lucky to be allowed to play at all. They did what they were told. Independence? You’re joking, but if they were good they were allowed to climb the ladder and become programmers.

Gradually things changed. Testers became more clearly identified with the users. They weren’t just menial team members. A clear career path opened up for them as testing professionals.

However, that didn’t earn them respect from programmers. Now testing is changing again. Agile gives testers the chance to learn and apply interesting coding skills. Testers can be just as technical as coders. They might code in different ways, for different reasons, but they can be coders too.

That’s great isn’t it? Well, up to a point. It’s fantastic that testers have these exciting opportunities. But I worry that programmers might start respecting the more technical testers for the wrong reason, and that testers who don’t code will still be looked down on. Testers shouldn’t try to impress programmers with their coding skills. That’s just a means to an end.

We’ll still need testers who don’t code, and if testers are to achieve the respect they deserve it is vital that they are valued for all the skills they bring to the team, not just the skills they share with programmers. For a start, we should stop referring to developers and testers. Testers always were part of the development process. In Agile teams they quite definitely are developers. It’s time everyone acknowledged that. Development is a team game.

Football teams who played the way we used to as kids got thrashed if they didn’t grow up and play as a team. Development teams who don’t ditch similar attitudes will be equally ineffective.

Bridging the gap – Agile and the troubled relationship between UX and software engineering (2009)

This article appeared as the cover story for the September 2009 edition of TEST magazine.

I’m moving the article onto my blog from my website, which will be decommissioned soon. If you choose to read the article please bear in mind that it was written in August 2009 and is therefore inevitably dated in some respects, though I think that much of it is still valid, especially where I am discussing the historical problems.

The article

For most of its history software engineering has had great difficulty building applications that users found enjoyable. Far too many applications were frustrating and wasted users’ time.

That has slowly changed with the arrival of web developments, and I expect the spread of Agile development to improve matters further.

I’m going to explain why I think developers have had difficulty building usable applications and why user interaction designers have struggled to get their ideas across. For simplicity I’ll just refer to these designers and psychologists as “UX”. There are a host of labels and acronyms I could have used, and it really wouldn’t have helped in a short article.

Why software engineering didn’t get UX

Software engineering’s problems with usability go back to its roots, when geeks produced applications for fellow geeks. Gradually applications spread from the labs to commerce and government, but crucially users were wage slaves who had no say in whether they would use the application. If they hated it they could always find another job!

Gradually applications spread into the general population until the arrival of the internet meant that anyone might be a user. Now it really mattered if users hated your application. They would be gone for good, into the arms of the competition.

Software engineering had great difficulty coming to terms with this. The methods it had traditionally used were poison to usability. The Waterfall lifecycle was particularly damaging.

The Waterfall had two massive flaws. At its root was the implicit assumption that you can tackle the requirements and design up front, before the build starts. This led to the second problem; iteration was effectively discouraged.

Users cannot know what they want till they’ve seen what is possible and what can work. In particular, UX needs iteration to let analysts and users build on their understanding of what is required.

The Waterfall meant that users could not see and feel what an application was like until acceptance testing at the end when it was too late to correct defects that could be dismissed as cosmetic.

The Waterfall was a project management approach to development, a means of keeping control, not building good products. This made perfect sense to organisations who wanted tight contracts and low costs.

The desire to keep control and make development a more predictable process explained the damaging attempt to turn software engineering into a profession akin to civil engineering.

So developers were sentenced to 20 years hard labour with structured methodologies, painstakingly creating an avalanche of documentation; moving from the current physical system, through the current logical system to a future logical system and finally a future physical system.

However, the guilty secret of software engineering was that translating requirements into a design wasn’t just a difficult task that required a methodical approach; it was a fundamental problem for developers. It wasn’t a problem specific to usability requirements, and it was never resolved by traditional techniques.

The mass of documentation obscured the fact that crucial design decisions weren’t flowing predictably and objectively from the requirements, but were made intuitively by the developers – people who by temperament and training were polar opposites of typical users.

Why UX didn’t get software development

Not surprisingly, given the massive documentation overhead of traditional techniques, and developers’ propensity to pragmatically tailor and trim formal methods, the full process was seldom followed. What actually happened was more informal and opaque to outsiders.

The UX profession understandably had great difficulty working out what was happening. Sadly they didn’t even realise that they didn’t get it. They were hampered by their naivety, their misplaced sense of the importance of UX and their counter-productive instinct to retain their independence from developers.

If developers had traditionally viewed functionality as a matter of what the organisation required, UX went to the other extreme and saw functionality as being about the individual user. Developers ignored the human aspect, but UX ignored commercial realities – always a fast track to irrelevance.

UX took software engineering at face value, tailoring techniques to fit what they thought should be happening rather than the reality. They blithely accepted the damaging concept that usability was all about the interface; that the interface was separate from the application.

This separability concept was flawed on three grounds. Conceptually it was wrong. It ignored the fact that the user experience depends on how the whole application works, not just the interface.

Architecturally it was wrong. Detailed technical design decisions can have a huge impact on the users. Finally separability was wrong organisationally. It left the UX profession stranded on the margins, in a ghetto, available for consultation at the end of the project, and then ignored when their findings were inconvenient.

An astonishing amount of research and effort went into justifying this fallacy, but the real reason UX bought the idea was that it seemed to liberate them from software engineering. Developers could work on the boring guts of the application while UX designed a nice interface that could be bolted on at the end, ready for testing. However, this illusory freedom actually meant isolation and impotence.

The fallacy of separability encouraged reliance on usability testing at the end of the project, on summative testing to reveal defects that wouldn’t be fixed, rather than formative testing to stop these defects being built in the first place.

There’s an argument that there’s no such thing as effective usability testing. If it takes place at the end it’s too late to be effective. If it’s effective then it’s done iteratively during design, and it’s just good design rather than testing.

So UX must be hooked into the development process. It must take place early enough to allow alternative designs to be evaluated. Users must therefore be involved early and often. Many people in UX accept this completely, though the separability fallacy is still alive and well. However, its days are surely numbered. I believe, and hope, that the Agile movement will finally kill it.

Agile and UX – the perfect marriage?

The mutual attraction of Agile and UX isn’t simply a case of “my enemy’s enemy is my friend”. Certainly they do have a common enemy in the Waterfall, but each discipline really does need the other.

Agile gives UX the chance to hook into development, at the points where it needs to be involved to be effective. Sure, with the Waterfall it is possible to smuggle effective UX techniques into a project, but they go against the grain. It takes strong, clear-sighted project managers and test managers to make them work. The schedule and political pressures on these managers to stop wasting time iterating and to sign off each stage are huge.

If UX needs Agile, then equally Agile needs UX if it is to deliver high quality applications. The opening principle of the Agile Manifesto states that “our highest priority is to satisfy the customer through early and continuous delivery of valuable software”.

There is nothing in Agile that necessarily guarantees better usability, but if practitioners believe in that principle then they have to take UX seriously and use UX professionals to interpret users’ real needs and desires. This is particularly important with web applications when developers don’t have direct access to the end users.

There was considerable mutual suspicion between the two communities when Agile first appeared. Agile was wary of UX’s detailed analyses of the users, and suspected that this was a Waterfall style big, up-front requirements phase.

UX meanwhile saw Agile as a technical solution to a social, organisational problem and was rightly sceptical of claims that manual testing would be a thing of the past. Automated testing of the human interaction with the application is a contradiction.

Both sides have taken note of these criticisms. Many in UX now see the value in speeding up the user analysis and focussing on the most important user groups, and Agile has recognised the value of up-front analysis to help them understand the users.

The Agile community is also taking a more rounded view of testing, and how UX can fit into the development process. In particular check out Brian Marick’s four Agile testing quadrants.

UX straddles quadrants two and three. Q2 contains the up-front analysis to shape the product, and Q3 contains the evaluation of the results. Both require manual testing, and use such classic UX tools as personas (fictional representative users) and wireframes (sketches of webpages).

Other people who’re working on the boundary of Agile and UX are Jeff Patton, Jared Spool and Anders Ramsay. They’ve come up with some great ideas, and it’s well worth checking them out. Microsoft have also developed an interesting technique called the RITE method.

This is an exciting field. Agile changes the role of testers and requires them to learn new skills. The integration of UX into Agile will make testing even more varied and interesting.

There will still be tension between UX and software professionals who’ve been used to working remotely from the users. However, Agile should mean that this is a creative tension, with each group supporting and restraining the others.

This is a great opportunity. Testers will get the chance to help create great applications, rather than great documentation!

What happens to usability when development goes offshore? (2009)

This article appeared in the March 2009 edition of Testing Experience magazine, which is no longer published. I’m moving the article onto my blog from my website, which will be decommissioned soon.

If you choose to read the article please bear in mind that it was written in January 2009 and is therefore inevitably dated in some respects, though I think that much of it is still valid.

The references in the article were all structured for a paper magazine. There are no hyperlinks and I have not tried to recreate them or check whether they still work.

The article

Two of the most important trends in software development over the last 20 years have been the increasing number of companies sending development work to cheaper labour markets, and the increasing attention that is paid to the usability of applications.

Developers in Europe and North America cannot have failed to notice the trend to offshore development work, and they worry about the long-term implications.

Usability, however, has had a mixed history. Many organizations and IT professionals have been deeply influenced by the need to ensure that their products and applications are usable; many more are only vaguely aware of this trend and do not take it seriously.

As a result, many developers and testers have missed the significant implications that offshoring has for building usable applications, and underestimate the problems of testing for usability. I will try to explain these problems, and suggest possible remedial actions that testers can take if they find themselves working with offshore developers. I will be looking mainly at India, the largest offshoring destination, because information has been more readily available. However, the problems and lessons apply equally to other offshoring countries.

According to NASSCOM, the main trade body representing the Indian IT industry [1], the numbers of IT professionals employed in India rose from 430,000 in 2001 to 2,010,000 in 2008. The numbers employed by offshore IT companies rose 10 fold, from 70,000 to 700,000.

It is hard to say how many of these are usability professionals. Certainly at the start of the decade there were only a handful in India. Now, according to Jhumkee Iyengar, of User in Design, “a guesstimate would be around 5,000 to 8,000”. At most that’s about 0.4% of the people working in IT. Even if they were all involved in offshore work they would represent no more than 1% of the total.

Does that matter? Jakob Nielsen, the leading usability expert, would argue that it does. His rule of thumb [2] is that “10% of project resources must be devoted to usability to ensure a good quality product”.

Clearly India is nowhere near capable of meeting that figure. To be fair, the same can be said of the rest of the world given that 10% represents Nielsen’s idea of good practice, and most organizations have not reached that stage. Further, India traditionally provided development of “back-office” applications, which are less likely to have significant user interaction.

Nevertheless, the shortage of usability professionals in the offshoring destinations does matter. Increasingly offshore developments have involved greater levels of user interaction, and any shortage of usability expertise in India will damage the quality of products.

Sending development work offshore always introduces management and communication problems. Outsourcing development, even within the same country, poses problems for usability. When the development is both offshored and outsourced, the difficulties in producing a usable application multiply. If there are no usability professionals on hand, the danger is that the developers will not only fail to resolve those problems – they will probably not even recognize that they exist.

Why can outsourcing be a problem for usability?

External software developers are subject to different pressures from internal developers, and this can lead to poorer usability. I believe that external suppliers are less likely to be able to respond to the users’ real needs, and research supports this. [3, 4, 5, 6, 7]

Obviously external suppliers have different goals from internal developers. Their job is to deliver an application that meets the terms of the contract and makes a profit in doing so. Requirements that are missed or are vague are unlikely to be met, and usability requirements all too often fall into one of these categories. This is not simply a matter of a lack of awareness. Usability is a subjective matter, and it is difficult to specify precise, objective, measurable and testable requirements. Indeed, trying too hard to do so can be counter-productive if the resulting requirements are too prescriptive and constrain the design.

A further problem is that the formal nature of contractual relationships tends to push clients towards more traditional, less responsive and less iterative development processes, with damaging results for usability. If users and developers are based in different offices, working for different employers, then rapid informal feedback becomes difficult.

Some of the studies that found these problems date back to the mid 90s. However, they contain lessons that remain relevant now. Many organizations have still not taken these lessons on board, and they are therefore facing the same problems that others confronted 10 or even 20 years ago.

How can offshoring make usability problems worse?

So, if simple outsourcing to a supplier in the same country can be fraught with difficulty, what are the usability problems that organizations face when they offshore?

Much of the discussion of this harks back to an article by Jakob Nielsen in 2002 [2]. Nielsen stirred up plenty of discussion about the problem, much of it critical.

“Offshore design raises the deeper problem of separating interaction designers and usability professionals from the users. User-centered design requires frequent access to users: the more frequent the better.”

If the usability professionals need to be close to the users, can they stay onshore and concentrate on the design while the developers build offshore? Nielsen was emphatic on that point.

“It is obviously not a solution to separate design and implementation since all experience shows that design, usability, and implementers need to proceed in tight co-ordination. Even separating teams across different floors in the same building lowers the quality of the resulting product (for example, because there will be less informal discussions about interpretations of the design spec).”

So, according to Nielsen, the usability professionals have to follow the developers offshore. However, as we’ve seen, the offshore countries have nowhere near enough trained professionals to cover the work. Numbers are increasing, but not by enough to keep pace with the growth in offshore development, never mind the demands of local commerce.

This apparent conundrum has been dismissed by many people who have pointed out, correctly, that offshoring is not an “all or nothing” choice. Usability does not have to follow development. If usability is a concern, then user design work can be retained onshore, and usability expertise can be deployed in both countries. This is true, but it is a rather unfair criticism of Nielsen’s arguments. The problem he describes is real enough. The fact that it can be mitigated by careful planning certainly does not mean the problem can be ignored.

User-centred design assumes that developers, usability analysts and users will be working closely together. Offshoring the developers forces organizations to make a choice between two unattractive options; separating usability professionals from the users, or separating them from the developers.

It is important that organizations acknowledge this dilemma and make the choice explicitly, based on their needs and their market. Every responsible usability professional would be keenly aware that their geographical separation from their users was a problem, and so those organizations that hire usability expertise offshore are at least implicitly acknowledging the problems caused by offshoring. My concern is for those organizations who keep all the usability professionals onshore and either ignore the problems, or assume that they don’t apply in their case.

How not to tackle the problems

Jhumkee Iyengar has studied the responses of organizations wanting to ensure that offshore development will give them usable applications [8]. Typically they have tried to do so without offshore usability expertise. They have used two techniques sadly familiar to those who have studied usability problems; defining the user interaction requirements up-front and sending a final, frozen specification to the offshore developers, or adopting the flawed and fallacious layering approach.

Attempting to define detailed up-front requirements is anathema to good user-centred design. It is an approach consistent with the Waterfall approach and is attractive because it is neat and easy to manage (as I discussed in my article on the V Model in Testing Experience, issue 4).

Building a usable application that allows users and customers to achieve their personal and corporate goals painlessly and efficiently requires iteration, prototyping and user involvement that is both early in the lifecycle and repeated throughout it.

The layering approach was based on the fallacy that the user interface could be separated from the functionality of the application, and that each could be developed separately. This fallacy was very popular in the 80s and 90s. Its influence has persisted, not because it is valid, but because it lends an air of spurious respectability to what people want to do anyway.

Academics expended a huge amount of effort trying to justify this separability. Their arguments, their motives and the consequences of their mistake are worth a full article in their own right. I’ll restrict myself to saying that the notion of separability was flawed on three counts.

  • It was flawed conceptually because usability is a product of the experience of the user with the whole application, not just the interface.
  • It was flawed architecturally, because design decisions taken by system architects can have a huge impact on the user experience.
  • Finally, it was flawed in political and organizational terms because it encouraged usability professionals to retreat into a ghetto, isolated from the hubbub of the developers, where they would work away on an interface that could be bolted onto the application in time for user testing.

Lewis & Rieman [3] memorably savaged the idea that usability professionals could hold themselves aloof from the application design, calling it “the peanut butter theory of usability”:

“Usability is seen as a spread that can be smeared over any design, however dreadful, with good results if the spread is thick enough. If the underlying functionality is confusing, then spread a graphical user interface on it. … If the user interface still has some problems, smear some manuals over it. If the manuals are still deficient, smear on some training which you force users to take.”

If the usability professionals stay onshore, and adopt either the separability or the peanut butter approach, the almost inescapable result is that they will be delegating decisions about usability to the offshore developers.

Developers are just about the worst people to take these decisions. They are too close to the application, and instinctively see workarounds to problems that might appear insoluble to real users.

Developers also have a different mindset when approaching technology. Even if they understand the business context of the applications they can’t unlearn their technical knowledge and experience to see the application as a user would; and this is if developers and users are in the same country. The cultural differences are magnified if they are based in different continents.

The relative lack of maturity of usability in the offshoring destinations means that developers often have an even less sophisticated approach than developers in the client countries. User interaction is regarded as an aesthetic matter restricted to the interface, with the developers more interested in the guts of the application.

Pradeep Henry reported in 2003 that most user interfaces at Indian companies were being designed by programmers, and that in his opinion they had great difficulty switching from their normal technical, system-focused mindset to that of a user. [9]

They also had very little knowledge of user centered design techniques. This is partly a matter of education, but there is more to it. In explaining the shortage of usability expertise in India, Jhumkee Iyengar told me that she believes important factors are the “phenomenal success of Indian IT industry, which leads people to question the need for usability, and the offshore culture, which has traditionally been a ‘back office culture’ not conducive to usability”.

The situation is, however, certainly improving. Although the explosive growth in development work in India, China and Eastern Europe has left the usability profession struggling to keep up, the number of usability experts has grown enormously over the last 10 years. There are nowhere near enough, but there are firms offering this specialist service keen to work with offshoring clients.

This trend is certain to continue because usability is a high value service. It is a hugely attractive route to follow for these offshore destinations, complementing and enhancing the traditional offshore development service.

Testers must warn of the dangers

The significance of all this from the perspective of testers is that, even though usability faces serious threats when development is offshored, there are ways to reduce the dangers and the problems. They cannot be removed entirely, but offshoring offers such cost savings that it will continue to grow, and it is important that testers working for client companies understand these problems and can anticipate them.

Testers may not always, or often, be in a position to influence whether usability expertise is hired locally or offshore. However, they can flag up the risks of whatever approach is used, and adopt an appropriate test strategy.

The most obvious danger is if an application has significant interaction with the user and there is no specialist usability expertise on the project. As I said earlier, this could mean that the project abdicates responsibility for crucial usability decisions to the offshore developers.

Testers should try to prevent a scenario where the interface and user interaction are pieced together offshore, and thrown “over the wall” to the users back in the client’s country for acceptance testing when it may be too late to fix even serious usability defects.

Is it outside the traditional role of a tester to lobby project management to try and change the structure of the project? Possibly, but if testers can see that the project is going to be run in a way that makes it hard to do their job effectively then I believe they have a responsibility to speak out.

I’m not aware of any studies looking at whether outsourcing contracts (or managed service agreements) are written in such prescriptive detail that they restrict the ability of test managers to tailor their testing strategy to the risks they identify. However, going by my experience and the anecdotal evidence I’ve heard, this is not an issue. Testing is not usually covered in detail in contracts, thus leaving considerable scope to test managers who are prepared to take the initiative.

Although I’ve expressed concern about the dangers of relying on a detailed up front specification there is no doubt that if the build is being carried out by offshore developers then they have to be given clear, detailed, unambiguous instructions.

The test manager should therefore set a test strategy that allows for significant static testing of the requirements documents. These should be shaped by walkthroughs and inspections to check that the usability requirements are present, complete, stated in sufficient detail to be testable, yet not defined so precisely that they constrain the design and rule out what might have been perfectly acceptable solutions to the requirements.

Once the offshore developers have been set to work on the specification it is important that there is constant communication with them and ongoing static testing as the design is fleshed out.

Hienadz Drahun leads an offshore interaction design team in Belarus. He stresses the importance of communication. He told me that “communication becomes a crucial point. You need to maintain frequent and direct communication with your development team.”

Dave Cronin of the US Cooper usability design consultancy wrote an interesting article about this in 2004 [10].

“We already know that spending the time to holistically define and design a software product dramatically increases the likelihood that you will deliver an effective and pleasurable experience to your customers, and that communication is one of the critical ingredients to this design process. All this appears to be even more true if you decide to have the product built in India or Eastern Europe.

To be absolutely clear, to successfully outsource product development, it is crucial that every aspect of the product be specifically defined, designed and documented. The kind of hand-waving you may be accustomed to when working with familiar and well-informed developers will no longer suffice.”

Significantly Cronin did not mention testing anywhere in his article, though he does mention “feedback” during the design process.

The limits of usability testing

One of the classic usability mistakes is to place too much reliance on usability testing. In fact, I’ve heard it argued that there is no such thing as usability testing. It’s a provocative argument, but it has some merit.

If usability is dependent only on testing, then it will be left to the end of the development, and serious defects will be discovered too late in the project for them to be fixed.

“They’re only usability problems, the users can work around them” is the cry from managers under pressure to implement on time. Usability must be incorporated into the design stages, with possible solutions being evaluated and refined. Usability is therefore produced not by testing, but by good design practices.

Pradeep Henry called his new team the “Usability Lab” when he introduced usability to Cognizant, the offshore outsourcing company in India. However, the name and the sight of the testing equipment in the lab encouraged people to equate usability with testing. As Henry explained:

“Unfortunately, equating usability with testing leads people to believe that programmers or graphic designers should continue to design the user interface and that usability specialists should be consulted only later for testing.”

Henry renamed his team the Cognizant Usability Group (now the Cognizant Usability Center of Excellence). [9]

Tactical improvements testers can make

So if usability evaluation has to be integrated into the whole development process then what can testers actually do in the absence of usability professionals? Obviously it will be difficult, but if iteration is possible during design, and if you can persuade management that there is a real threat to quality then you can certainly make the situation a lot better.

There is a lot of material readily available to guide you. I would suggest the following.

Firstly, Jakob Nielsen’s Discount Usability Engineering [11] consists of cheap prototyping (maybe just paper based), heuristic (rule based) evaluation and getting users to talk their way through the application, thinking out loud as they work their way through a scenario.

Steve Krug’s “lost our lease” usability testing basically says that any usability testing is better than none, and that quick and crude testing can be both cheap and effective. Krug’s focus is more on the management of this approach rather than the testing techniques themselves, so it fits with Nielsen’s DUE, rather than being an alternative in my opinion. It’s all in his book “Don’t make me think”. [12]

Lockwood & Constantine’s Collaborative Usability Inspections offer a far more formal and rigorous approach, though you may be stretching yourself to take this on without usability professionals. It entails setting up formal walk-throughs of the proposed design, then iteration to remove defects and improve the product. [13, 14, 15]

On a lighter note, Alan Cooper’s book “The inmates are running the asylum” [15] is an entertaining rant on the subject. Cooper’s solution to the problem is his Interaction Design approach. The essence of this is that software development must include a form of functional analysis that seeks to understand the business problem from the perspective of the users, based on their personal and corporate goals, working through scenarios to understand what they will want to do.

Cooper’s Interaction Design strikes a balance between the old, flawed extremes of structured methods (which ignored the individual) and traditional usability (which often paid insufficient attention to the needs of the organization). I recommend this book not because I think that a novice could take this technique on board and make it work, but because it is very readable and might make you question your preconceptions and think about what is desirable and possible.

Longer term improvements

Of course it’s possible that you are working for a company that is still in the process of offshoring and where it is still possible to influence the outcome. It is important that the invitation to tender includes a requirement that suppliers can prove expertise and experience in usability engineering. Additionally, the client should expect potential suppliers to show they can satisfy the following three criteria.

  • Firstly, the supplier should have a process or lifecycle model that not only has usability engineering embedded within it but also demonstrates how the onshore and offshore teams will work together to achieve usability. The process must involve both sides.

    Offshore suppliers have put considerable effort into developing such frameworks. Three examples are Cognizant’s “End-to-End UI Process”, HFI’s “Schaffer-Weinschenk Method™” and Persistent’s “Overlap Usability”.

  • Secondly, suppliers should carry out user testing with users from the country where the application will be used. The cultural differences are too great to use people who happen to be easily available to the offshore developers.

    Remote testing entails usability experts based in one location conducting tests with users based elsewhere, even on another continent. It would probably not be the first choice of most usability professionals, but it is becoming increasingly important. As Jumkhee Iyengar told me, it “may not be the best, but it works and we have had good results. A far cry above no testing.”

  • Finally, suppliers should be willing to send usability professionals to the onshore country for the requirements gathering. This is partly a matter of ensuring that the requirements gathering takes full account of usability principles. It is also necessary so that these usability experts can fully understand the client’s business problem and can communicate it to the developers when they return offshore.

It’s possible that there may still be people in your organization who are sceptical about the value of usability. There has been a lot of work done on the return on investment that user centered design can bring. It’s too big a topic for this article, but a simple internet search on “usability” and “roi” will give you plenty of material.

What about the future?

There seems no reason to expect any significant changes in the trends we’ve seen over the last 10 years. The offshoring countries will continue to produce large numbers of well-educated, technically expert IT professionals. The cost advantage of developing in these countries will continue to attract work there.

Proactive test managers can head off some of the usability problems associated with offshoring. They can help bring about higher quality products even if their employers have not allowed for usability expertise on their projects. However, we should not have unrealistic expectations about what they can achieve. High quality, usable products can only be delivered consistently by organizations that have a commitment to usability and who integrate usability throughout the design process.

Offshoring suppliers will have a huge incentive to keep advancing into user centered design and usability consultancy. The increase in offshore development work creates the need for such services, whilst the specialist and advanced nature of the work gives these suppliers a highly attractive opportunity to move up the value chain, selling more expensive services on the basis of quality rather than cost.

The techniques I have suggested are certainly worthwhile, but they may prove no more than damage limitation. As Hienadz Drahun put it to me; “to get high design quality you need to have both a good initial design and a good amount of iterative usability evaluation. Iterative testing alone is not able to turn a bad product design into a good one. You need both.” Testers alone cannot build usability into an application, any more than they can build in quality.

Testers in the client countries will increasingly have to cope with the problems of working with offshore development. It is important that they learn how to work successfully with offshoring and adapt to it.

They will therefore have to be vigilant about the risks to usability of offshoring, and advise their employers and clients how testing can best mitigate these risks, both on a short term tactical level, i.e. on specific projects where there is no established usability framework, and in the longer term, where there is the opportunity to shape the contracts signed with offshore suppliers.

There will always be a need for test management expertise based in client countries, working with the offshore teams, but it will not be the same job we knew in the 90s.

References

[1] NASSCOM (2009). “Industry Trends, IT-BPO Sector-Overview”.

[2] Nielsen, J. (2002). “Offshore Usability”. Jakob Nielsen’s website.

[3] Lewis, C., Rieman, J. (1994). “Task-Centered User Interface Design: A Practical Introduction”. University of Colorado e-book.

[4] Artman, H. (2002). “Procurer Usability Requirements: Negotiations in Contract Development”. Proceedings of the second Nordic conference on Human-Computer Interaction (NordiCHI) 2002.

[5] Holmlid, S., Artman, H. (2003). “A Tentative Model for Procuring Usable Systems” (NB PDF download). 10th International Conference on Human-Computer Interaction, 2003.

[6] Grudin, J. (1991). “Interactive Systems: Bridging the Gaps Between Developers and Users”. Computer, April 1991, Vol. 24, No. 4 (subscription required).

[7] Grudin, J. (1996). “The Organizational Contexts of Development and Use”. ACM Computing Surveys. Vol 28, issue 1, March 1996, pp 169-171 (subscription required).

[8] Iyengar, J. (2007). “Usability Issues in Offshore Development: an Indian Perspective”. Usability Professionals Association Conference, 2007 (UPA membership required).

[9] Henry, P. (2003). “Advancing UCD While Facing Challenges Working from Offshore”. ACM Interactions, March/April 2003 (subscription required).

[10] Cronin, D. (2004). “Designing for Offshore Development”. Cooper Journal blog.

[11] Nielsen, J. (1994). “Guerrilla HCI: Using Discount Usability Engineering to Penetrate the Intimidation Barrier”, Jakob Nielsen’s website.

[12] Krug, S. (2006). “Don’t Make Me Think!: A Common Sense Approach to Web Usability” 2nd edition. New Riders.

[13] Constantine, L., Lockwood, L. (1999). “Software for Use”. Addison Wesley.

[14] Lockwood, L. (1999). “Collaborative Usability Inspecting – more usable software and sites” (NB PDF download), Constantine and Lockwood website.

[15] Cooper, A. (2004). “The Inmates Are Running the Asylum: Why High-tech Products Drive Us Crazy and How to Restore the Sanity”. Sams.

The seductive and dangerous V model (2008)

Testing Experience

This is an expanded version of an article I wrote for the December 2008 edition of Testing Experience, a magazine which is no longer published. I’m moving the article onto my blog from my website, which will be decommissioned soon.

Inevitably the article is dated in some respects, especially where I discuss possible ways for testers to ameliorate the V Model if they are forced to use it. There’s no mention of exploratory testing. That’s simply because my aim in writing the article was to help people understand the flaws of the V Model and how they can work around them in traditional environments that try to apply the model. A comparison of exploratory and traditional scripted testing techniques was far too big a topic to be shoe-horned in here.

However, I think the article still has value in helping to explain why software development and testing took the paths it followed. For the article I drew heavily on reading and research I carried out for my MSc dissertation, an exercise that taught me a huge amount about the history of software development, and why we ended up where we did.

The references in the article were all structured for a paper magazine. There are no hyperlinks and I have not tried to recreate them or check whether the sources are still available.

The article

The seductive and dangerous V Model

The Project Manager finishes the meeting by firing a question at the Programme Test Manager, and myself, the Project Test Manager.

“The steering board’s asking questions about quality. Are we following a formal testing model?”.

The Programme Test Manager doesn’t hesitate. “We always use the V Model. It’s the industry standard”.

The Project Manager scribbles a note. “Good”.

Fast forward four months, and I’m with the Programme Manager and the test analysts. It’s a round table session, to boost morale and assure everyone that their hard work is appreciated.

The Programme Manager is upbeat. “Of course you all know about the problems we’ve been having, but when you look at the big picture, it’s not so bad. We’re only about 10% behind where we should have been. We’re still on target to hit the implementation date”.

This is a red rag to a bullish test manager, so I jump in. “Yes, but that 10% is all at the expense of the testing window. We’ve lost half our time”.

The Programme Manager doesn’t take offence, and laughs. “Oh, come on! You’re not going to tell me you’ve not seen that before? If you’re unhappy with that then maybe you’re in the wrong job!”.

I’m not in the mood to let it drop. “That doesn’t make it right. There’s always a readiness to accept testing getting squeezed, as if the test window is contingency. If project schedules and budgets overrun, project managers can’t get away with saying ‘oh – that always happens’”.

He smiles and moves smoothly on, whilst the test analysts try to look deadpan, and no doubt some decide that career progression to test management isn’t quite as attractive as they used to think.

Test windows do get squeezed

The Programme Manager was quite right. That’s what happens on Waterfall projects, and saying that you’re using the V Model does nothing to stop that almost routine squeeze.

In “Testing Experience” issue 2, Dieter Arnouts [1] outlined how test management fits into different software development lifecycle models. I want to expand on the V Model, its problems and what testers can do in response.

In the earlier meeting the Programme Test Manager had given an honest and truthful answer, but I wonder whether it was nevertheless wholly misleading. Yes, we were using the V Model, and unquestionably it would be the “test model” of choice for most testing professionals, in the UK at least.

However, I question whether it truly qualifies as a “test model”, and whether its status as best practice, or industry standard, is deserved.

A useful model has to represent the way that testing can and should be carried out. A model must therefore be coherent, reasonably precisely defined and grounded in the realities of software development.

Coherence at the expense of precision

If the V Model, as practised in the UK, has coherence it is at the expense of precision. Worldwide there are several different versions and interpretations. At one extreme is the German V-Model [2], the official project management methodology of the German government. It is roughly equivalent to PRINCE2, but more directly relevant to software development. Unquestionably this is coherent and rigorously defined.

Of course, it’s not the V Model at all, not in the sense that UK testers understand it. For one thing the V stands not for the familiar V-shaped lifecycle model, but for “Vorgehen”, German for “going forward”. However, this model does promote a V-shaped lifecycle, and some seem to think that this is the real V Model, in its pure form.

The US also has a government standard V Model [3], which dates back about 20 years, like its German counterpart. Its scope is rather narrower, being a systems development lifecycle model, but it is still far more detailed and rigorous than most UK practitioners would understand by the V Model.

The understanding that most of us in the UK have of the V Model is probably based on the V Model as it’s taught in the ISEB Foundation Certificate in Software Testing [4]. The syllabus merely describes the Model without providing an illustration, and does not even name the development levels on the left hand side of the V. Nor does it prescribe a set number of levels of testing. Four levels is “common”, but there could be more or fewer on any project.

This makes sense to an experienced practitioner. It is certainly coherent, in that it is intuitive. It is easy for testers to understand, but it is far from precise. Novices have no trouble understanding the model to the level required for a straightforward multiple choice exam, but they can struggle to get to grips with what exactly the V Model is. If they search on the internet their confusion will deepen. Wikipedia is hopelessly confused on the matter, with two separate articles on the V Model, which fail to make a clear distinction between the German model [5] and the descriptive lifecycle model [6] familiar to testers. Unsurprisingly there are many queries on internet forums asking “what is the V Model?”

There are so many variations that in practice the V Model can mean just about whatever you want it to mean. This is a typical, and typically vague, illustration of the model, or rather, of a version of the V Model.

[Figure: a typical V Model diagram]

It is interesting to do a Google images search on “V Model”. One can see a huge range of images illustrating the model. All share the V shape, and all show an arrow or line linking equivalent stages in each leg. However, the nature of the link is vague and inconsistent. Is it static testing of deliverables? Simple review and sign-off of specifications? Early test planning? Even the direction of the arrow varies. The Wikipedia article on the German V Model has the arrows going in different directions in different diagrams. There is no explanation for this. It simply reflects the confusion in the profession about what exactly the V Model is.

Boiled down to its most basic form, the V Model can mean simply that test planning should start early on, and the test plans should be based on the equivalent level of the left hand development leg. In practice, I fear that that is all it usually does mean.

The trouble is that this really isn’t a worthwhile advance on the Waterfall Model. The V Model is no more than a testing variant of the Waterfall. At best, it is a form of damage limitation, mitigating some of the damage that the Waterfall has on testing. To understand the significance of this for the quality of applications we need to look at the history and defects of the Waterfall Model itself.

Why the Waterfall is bad for testing and quality

In 1970 Royce wrote the famous paper [7] depicting the Waterfall Model. It is a strange example of how perceptive and practical advice can be distorted and misapplied. Using the analogy of a waterfall, Royce set up a straw man model to describe how software developments had been managed in the 1960s. He then demolished the model. Unfortunately posterity has credited him with inventing the model.

The Waterfall assumes that a project can be broken up into separate, linear phases and that the output of each phase provides the input for the next phase, thus implying that each would be largely complete before the next one would start. Although Royce did not make the point explicitly, his Waterfall assumes that requirements can, and must, be defined accurately at the first pass, before any significant progress has been made with the design. The problems with the Waterfall, as seen by Royce, are its inability to handle changes, mainly changes forced by test defects, and the inflexibility of the model in dealing with iteration. Further flaws were later identified, but Royce’s criticisms are still valid.

Royce makes the point that prior to testing, all the stages of the project have predictable outputs, i.e. the results of rigorous analysis. The output of the testing stage is inherently unpredictable in the Waterfall, yet this is near the end of the project. Therefore, if the outcome of testing is not what was predicted or wanted, then reworks to earlier stages can wreck the project. The assumption that iteration can be safely restricted to successive stages doesn’t hold. Such reworking could extend right back to the early stages of design, and is not a matter of polishing up the later steps of the previous phase.

Although Royce did not dwell on the difficulty of establishing the requirements accurately at the first pass, this problem did trouble other, later writers. They concluded that not only is it impossible in practice to define the requirements accurately before construction starts, it is wrong in principle because the process of developing the application changes the users’ understanding of what is desirable and possible and thus changes the requirements. Furthermore, there is widespread agreement that the Waterfall is particularly damaging for interactive applications.

The problems of the Waterfall are therefore not because the analysts and developers have screwed up. The problems are inherent in the model. They are the result of following it properly, not badly!

The Waterfall and project management

So if the Waterfall was originally depicted as a straw man in 1970, and if it has been consistently savaged by academics, why is there still debate about it? Respected contemporary textbooks still defend it, such as Hallows’ “Information Systems Project Management” [8], a valuable book frequently used for university courses. The title gives the game away. The Waterfall was shaped by project management requirements, and it therefore facilitates project management. Along with the V Model it is neater, and much easier to manage, plan and control than an iterative approach, which looks messy and unpredictable to the project manager.

Raccoon in 1997 [9] used a revealing phrase, describing the Waterfall.

“If the Waterfall model were wrong, we would stop arguing over it. Though the Waterfall model may not describe the whole truth, it describes an interesting structure that occurs in many well-defined projects and it will continue to describe this truth for a long time to come. I expect the Waterfall model will live on for the next one hundred years and more”.

Note the key phrase, “an interesting structure that occurs in many well-defined projects”.

The Waterfall allows project managers to structure projects neatly, and it is good for project plans, not applications!

It is often argued that in practice the Waterfall is not applied in its pure form; that there is always some limited iteration and that there is invariably some degree of overlap between the phases. Defenders of the model therefore argue that critics don’t understand how it can be used effectively.

Hallows argues that changes are time-consuming and expensive regardless of the model followed. He dismisses the criticism that the Waterfall doesn’t allow a return to an earlier stage by saying that this would be a case of an individual project manager being too inflexible, rather than a problem with the model.

This is naive. The problem is not solved by better project management, because project management itself has contributed towards the problem; or rather, the response of rational project managers to the pressures facing them has.

The dangerous influence of project management had been recognised by the early 1980s. Practitioners had always been contorting development practices to fit their project management model, a point argued forcibly by McCracken & Jackson in 1982 [10].

“Any form of life cycle is a project management structure imposed on system development. To contend that any life cycle scheme, even with variations, can be applied to all system development is either to fly in the face of reality or to assume a life cycle so rudimentary as to be vacuous.”

In hindsight the tone of McCracken & Jackson’s paper, and the lack of response to it for years is reminiscent of Old Testament prophets raging in the wilderness. They were right, but ignored.

The symbiosis between project management and the Waterfall has meant that practitioners have frequently been compelled to employ methods that they knew were ineffective. This is most graphically illustrated by the UK government’s mandated use of the PRINCE2 project management method and SSADM development methodology. These two go hand in hand.

They are not necessarily flawed, and this article does not have room to discuss their merits and problems, but they are associated with a traditional approach such as the Waterfall.

The UK’s National Audit Office stated in 2003 [11] that “PRINCE requires a project to be organised into a number of discrete stages, each of which is expected to deliver end products which meet defined quality requirements. Progress to the next stage of the project depends on successfully meeting the delivery targets for the current stage. The methodology fits particularly well with the ‘waterfall’ approach.”

The NAO says in the same paper that “the waterfall … remains the preferred approach for developing systems where it is very important to get the specification exactly right (for example, in processes which are computationally complex or safety critical)”. This is current advice. The public sector tends to be more risk averse than the private sector. If auditors say an approach is “preferred” then it would take a bold and confident project manager to reject that advice.

This official advice is offered in spite of the persistent criticism that it is never possible to define the requirements precisely in advance in the style assumed by the Waterfall model, and that attempting to do so is possible only if one is prepared to steamroller the users into accepting a system that doesn’t satisfy their goals.

The UK government is therefore promoting the use of a project management method partly because it fits well with a lifecycle that is fundamentally flawed because it has been shaped by project management needs rather than those of software development.

The USA and commercial procurement

The history of the Waterfall in the USA illustrates its durability and provides a further insight into why it will survive for some time to come; its neat fit with commercial procurement practices.

The US military was the main customer for large-scale software development contracts in the 1970s and insisted on formal and rigorous development methods. The US Department of Defense (DoD) did not explicitly mandate the Waterfall, but their insistence in Standard DOD-STD-2167 [12] on a staged development approach, with a heavy up-front documentation overhead, and formal review and sign-off of all deliverables at the end of each stage, effectively ruled out any alternative. The reasons for this were quite explicitly to help the DoD to keep control of procurement.

In the 80s the DoD relaxed its requirements and allowed iterative developments. However, it did not forbid the Waterfall, and the Standard’s continued insistence on formal reviews and audits that were clearly consistent with the Waterfall gave developers and contractors every incentive to stick with that model.

The damaging effects of this were clearly identified to the DoD at the time. A report of the Defense Science Board Task Force [13] criticised the effects of the former Standard, and complained that the reformed version did not go nearly far enough.

However, the Task Force had to acknowledge that “evolutionary development plays havoc with the customary forms of competitive procurement, … and they with it.”

The Task Force contained such illustrious experts as Frederick Brooks, Vic Basili and Barry Boehm. These were reputable insiders, part of a DoD Task Force, not iconoclastic academic rebels. They knew the problems caused by the Waterfall and they understood that the rigid structure of the model provided comfort and reassurance that large projects involving massive amounts of public money were under control. They therefore recommended appropriate remedies, involving early prototyping and staged awarding of contracts. They were ignored. Such was the grip of the Waterfall nearly 20 years after it had first been criticised by Royce.

The DoD did not finally abandon the Waterfall till Military Standard 498 (MIL-STD-498) seven years later in 1994, by which time the Waterfall was embedded in the very soul of the IT profession.

Even now the traditional procurement practices referred to by the Task Force, which fit much more comfortably with the Waterfall and the V Model, are being followed because they facilitate control, not quality. It is surely significant that the V Model is the only testing model that students of the Association of Chartered Certified Accountants learn about. It is the model for accountants and project managers, not developers or testers. The contractual relationship between client and supplier reinforces the rigid project management style of development.

Major George Newberry, a US Air Force officer specialising in software acquisition and responsible for collating USAF input to the defence standards, complained in 1995 [14] about the need to deliver mind-numbing amounts of documentation in US defence projects because of the existing standards.

“DOD-STD-2167A imposes formal reviews and audits that emphasize the Waterfall Model and are often nonproductive ‘dog and pony shows’. The developer spends thousands of staff-hours preparing special materials for these meetings, and the acquirer is then swamped by information overload.”

This is a scenario familiar to any IT professional who has worked on external contracts, especially in the public sector. Payments are tied to the formal reviews and dependent on the production of satisfactory documentation. The danger is that supplier staff become fixated on the production of the material that pays their wages, rather than the real substance of the project.

Nevertheless, as noted earlier, the UK public sector in particular, and large parts of the private sector are still wedded to the Waterfall method and this must surely bias contracts against a commitment to quality.

V for veneer?

What is seductive and damaging about the V Model is that it gives the Waterfall approach credibility. It has given a veneer of respectability to a process that almost guarantees shoddy quality. The most damaging aspect is perhaps the effect on usability.

The V Model discourages active user involvement in evaluating the design, and especially the interface, before the formal user acceptance testing stage. By then it is too late to make significant changes to the design. Usability problems can be dismissed as “cosmetic” and the users are pressured to accept a system that doesn’t meet their needs. This is bad if it is an application for internal corporate users. It is potentially disastrous if it is a web application for customers.

None of this is new to academics or test consultants who’ve had to deal with usability problems. However, what practitioners do in the field can often lag many years behind what academics and specialist consultants know to be good practice. Many organisations are a long, long way from the leading edge.

Rex Black provided a fine metaphor for this quality problem in 2002 [15]. After correctly identifying that V Model projects are driven by cost and schedule constraints, rather than quality, Black argues that the resultant fixing of the implementation date effectively locks the top of the right leg of the V in place, while the pivot point at the bottom slips further to the right, thus creating Black’s “ski slope and cliff”.

The project glides down the ski slope, then crashes into the “quality cliff” of the test execution stages that have been compressed into an impossible timescale.

The Waterfall may have resulted in bad systems, but its massive saving grace for companies and governments alike was that they were developed in projects that offered at least the illusion of being manageable! This suggests, as Raccoon stated [9], that the Waterfall may yet survive another hundred years.

The V Model’s great attractions were that it fitted beautifully into the structure of the Waterfall, it didn’t challenge that model, and it just looks right; comfortable and reassuring.

What can testers do to limit the damage?

I believe strongly that iterative development techniques must be used wherever possible. However, such techniques are beyond the scope of this article. Here I am interested only in explaining why the V Model is defective, why it has such a grip on our profession, and what testers can do to limit its potential damage.

The key question is therefore: how can we improve matters when we find we have to use it? As so often in life, just asking the question is half the battle. It’s crucial that testers shift their mindset from an assumption that the V Model will guide them through to a successful implementation, and instead regard it as a flawed model with a succession of mantraps that must be avoided.

Testers must first accept the counter-intuitive truth that the Waterfall and V Model only work when their precepts are violated. This won’t come as any great surprise to experienced practitioners, though it is a tough lesson for novices to learn.

Developers and testers may follow models and development lifecycles in theory, but often it’s no more than lip service. When it comes to the crunch we all do whatever works and ditch the theory. So why not adopt techniques that work and stop pretending that the V Model does?

In particular, iteration happens anyway! Embrace it. The choice is between planning for iteration and frantically winging it, trying to squeeze in fixes and reworking of specifications.

Even the most rigid Waterfall project would allow some iteration during test execution. It is crucial that testers ensure there is no confusion between test cycles exercising different parts of the solution, and reruns of previous tests to see whether fixes have been applied. Testers must press for realistic allowances for reruns. One cycle to reveal defects and another to retest is planning for failure.

Once this debate has been held (and won!) with project management the tester should extend the argument. Make the point that test execution provides feedback about quality and risks. It cannot be right to defer feedback. Where it is possible to get feedback early it must be taken.

It’s not the testers’ job to get the quality right. It’s not the testers’ responsibility to decide if the application is fit to implement. It’s our responsibility to ensure that the right people get accurate feedback about quality at the right time. That means feedback to analysts and designers early enough to let them fix problems quickly and cheaply. This feedback and correction effectively introduces iteration. Acknowledging this allows us to plan for it.

Defenders of the V Model would argue that that is entirely consistent with the V Model. Indeed it is. That is the point.

However, what the V Model doesn’t do adequately is help testers to force the issue; to provide a statement of sound practice, an effective, practical model that will guide them to a happy conclusion. It is just too vague and wishy-washy. In so far as the V Model means anything, it means to start test planning early, and to base your testing on documents from equivalent stages on the left hand side of the V.

Without these, the V Model is nothing. A fundamental flaw of the V Model is that it is not hooked into the construction stages in the way that its proponents blithely assume. Whoever heard of a development being delayed because a test manager had not been appointed?

“We’ll just crack on”, is the response to that problem. “Once we’ve got someone appointed they can catch up.”

Are requirements ever nailed down accurately and completely before coding starts? In practice, no. The requirements keep evolving, and the design is forced to change. The requirements and design keep changing even as the deadline for the end of coding nears.

“Well, you can push back the delivery dates for the system test plan and the acceptance test plan. Don’t say we’re not sympathetic to the testers!”

What is not acknowledged is that if test planning doesn’t start early, and if the solution is in a state of flux till the last moment, one is not left with a compromised version of the V Model. One is left with nothing whatsoever; no coherent test model.

Testing has become the frantic, last-minute, ulcer-inducing sprint it always was under the Waterfall and that the V Model is supposed to prevent.

It is therefore important that testers agitate for the adoption of a model that honours the good intentions of the V Model, but is better suited to the realities of development and the needs of the testers.

Herzlich’s W Model

An interesting extension of the V Model is Paul Herzlich’s W Model [16].

[Figure: Herzlich’s W Model]

The W Model removes the vague and ambiguous lines linking the left and right legs of the V and replaces them with parallel testing activities, shadowing each of the development activities.

As the project moves down the left leg, the testers carry out static testing (i.e. inspections and walkthroughs) of the deliverables at each stage. Ideally prototyping and early usability testing would be included to test the system design of interactive systems at a time when it would be easy to solve problems. The emphasis would then switch to dynamic testing once the project moves into the integration leg.

There are several interesting aspects to the W Model. Firstly, it drops the arbitrary and unrealistic assumption that there should be a testing stage in the right leg for each development stage in the left leg. Each of the development stages has its testing shadow, within the same leg.

The illustration shows a typical example where there are the same number of stages in each leg, but it’s possible to vary the number and the nature of the testing stages as circumstances require without violating the principles of the model.

Also, it explicitly does not require the test plan for each dynamic test stage to be based on the specification produced in the twin stage on the left hand side. There is no twin stage of course, but this does address one of the undesirable by-products of a common but unthinking adoption of the V Model: a blind insistence that test plans should be generated from these equivalent documents, and only from those documents.

A crucial advantage of the W Model is that it encourages testers to define tests that can be built into the project plan, and on which development activity will be dependent, thus making it harder for test execution to be squeezed at the end of the project.

However, starting formal test execution in parallel with the start of development must not mean token reviews and sign-offs of the documentation at the end of each stage. Commonly under the V Model, and the Waterfall, test managers receive specifications with the request to review and sign off within a few days what the developers hope is a completed document. In such circumstances test managers who detect flaws can be seen as obstructive rather than constructive. Such last minute “reviews” do not count as early testing.

Morton’s Butterfly Model

Another variation on the V Model is the little known Butterfly Model [17] by Stephen Morton, which shares many features of the W Model.

[Figure: the Butterfly Model]

The butterfly metaphor is based on the idea that clusters of testing are required throughout the development to tease out information about the requirements, design and build. These micro-iterations can be structured into the familiar testing stages, and the early static testing stages envisaged by the W Model.

[Figure: Butterfly Model micro-iterations]

In this model these micro-iterations explicitly shape the development during the progression down the development leg. In essence, each micro-iteration can be represented by a butterfly; the left wing for test analysis, the right wing for specification and design, and the body is test execution, the muscle that links and drives the test, which might consist of more than one piece of analysis and design, hence the segmented wings. Sadly, this model does not seem to have been fully fleshed out, and in spite of its attractions it has almost vanished from sight.

Conclusion – the role of education

The beauty of the W and Butterfly Models is that they fully recognise the flaws of the V Model, but they can be overlaid on the V. That allows the devious and imaginative test manager to smuggle a more helpful and relevant testing model into a project committed to the V Model without giving the impression that he or she is doing anything radical or disruptive.

The V Model is so vague that a test manager could argue with a straight face that the essential features of the W or Butterfly are actually features of the V Model as the test manager believes it must be applied in practice. I would regard this as constructive diplomacy rather than spinning a line!

I present the W and Butterfly Models as interesting possibilities but what really matters is that test managers understand the need to force planned iteration into the project schedule, and to hook testing activities into the project plan so that “testing early” becomes meaningful rather than a comforting and irrelevant platitude. It is possible for test managers to do any of this provided they understand the flaws of the V Model and how to improve matters. This takes us onto the matter of education.

The V Model was the only “model” testers learned about when they took the old ISEB Foundation Certificate. Too many testers regarded that as the end of their education in testing. They were able to secure good jobs or contracts with their existing knowledge. Spending more time and money continuing their learning was not a priority.

As a result of this, and the pressures of project management and procurement, the V Model is unchallenged as the state of the art for testing in the minds of many testers, project managers and business managers.

The V Model will not disappear just because practitioners become aware of its problems. However, a keen understanding of its limitations will give them a chance to anticipate these problems and produce higher quality applications.

I don’t have a problem with testers attempting to extend their knowledge and skills through formal qualifications such as ISEB and ISTQB. However, it is desperately important that they don’t think that what they learn from these courses comprises All You Ever Need to Know About The Only Path to True Testing. They’re biased towards traditional techniques and don’t pay sufficient attention to exploratory testing.

Ultimately we are all responsible for our own knowledge and skills; for our own education. We’ve got to go out there and find out what is possible, and to understand what’s going on. Testers need to make sure they’re aware of the work of testing experts such as Cem Kaner, James Bach, Brian Marick, Lisa Crispin and Michael Bolton. These people have put a huge amount of priceless material out on the internet for free. Go and find it!

References

[1] Arnouts, D. (2008). “Test management in different Software development life cycle models”. “Testing Experience” magazine, issue 2, June 2008.

[2] IABG. “Das V-Modell”. This is in German, but there are links to English pages and a downloadable version of the full documentation in English.

[3] US Dept of Transportation, Federal Highway Administration. “Systems Engineering Guidebook for ITS”.

[4] BCS ISEB Foundation Certificate in Software Testing – Syllabus (NB PDF download).

[5] Wikipedia. “V-Model”.

[6] Wikipedia. “V-Model (software development)”.

[7] Royce, W. (1970). “Managing the Development of Large Software Systems”, IEEE Wescon, August 1970.

[8] Hallows, J. (2005). “Information Systems Project Management”. 2nd edition. AMACOM, New York.

[9] Raccoon, L. (1997). “Fifty Years of Progress in Software Engineering”. SIGSOFT Software Engineering Notes Vol 22, Issue 1 (Jan. 1997). pp88-104.

[10] McCracken, D., Jackson, M. (1982). “Life Cycle Concept Considered Harmful”, ACM SIGSOFT Software Engineering Notes, Vol 7 No 2, April 1982. Subscription required.

[11] National Audit Office. (2003).”Review of System Development – Overview”.

[12] Department of Defense Standard 2167 (DOD-STD-2167). (1985) “Defense System Software Development”, US Government defence standard.

[13] “Defense Science Board Task Force On Military Software – Report” (extracts), (1987). ACM SIGAda Ada Letters Volume VIII , Issue 4 (July/Aug. 1988) pp35-46. Subscription required.

[14] Newberry, G. (1995). “Changes from DOD-STD-2167A to MIL-STD-498”, Crosstalk – the Journal of Defense Software Engineering, April 1995.

[15] Black, R. (2002). “Managing the Testing Process”, p415. Wiley.

[16] Herzlich, P. (1993). “The Politics of Testing”. Proceedings of 1st EuroSTAR conference, London, Oct. 25-28, 1993.

[17] Morton, S. (2001). “The Butterfly Model for Test Development”. Sticky Minds website.

The Post Office Horizon IT scandal, part 2 – evidence & the “off piste” issue

In the first post of this three part series about the scandal of the Post Office’s Horizon IT system I explained the concerns I had about the approach to errors and accuracy. In this post I’ll talk about my experience working as an IT auditor investigating frauds, and my strong disapproval for the way the Post Office investigated and prosecuted the Horizon cases.

Evidence, certainty and prosecuting fraud

Although I worked on many fraud cases that resulted in people going to prison I was never required to give evidence in person. This was because we built our case so meticulously, with an overwhelmingly compelling set of evidence, that the fraudsters always pleaded guilty rather than risk antagonising the court with a wholly unconvincing plea of innocence.

We always had to be aware of the need to find out what had happened, rather than simply to sift for evidence that supported our working hypothesis. We had to follow the trail of evidence, but remain constantly alert to the possibility we might miss vital, alternative routes that could lead to a different conclusion. It’s very easy to fall quickly into the mindset that the suspect is definitely guilty and ignore anything that might shake that belief. Working on these investigations gave me great sympathy for the police carrying out detective work. If you want to make any progress you can’t follow up everything, but you have to be aware of the significance of the choices you don’t make.

In these cases there was a clear and obvious distinction between the investigators and the prosecutors. We, the IT auditors, would do enough investigation for us to be confident we had the evidence to support a conviction. We would then present that package of evidence to the police, who were invariably happy to run with a case where someone else had done the leg work. The police would do some confirmatory investigation of their own, but it was our work that would put people in jail. The prosecution of the cases was the responsibility of the Crown Prosecution Service in England & Wales, and the Procurator Fiscal Service in Scotland. That separation of responsibilities helps to guard against some of the dangers that concerned me about bias during investigation.

This separation didn’t apply in the case of the Post Office, which, for anachronistic historical reasons, employs its own prosecutors. It also has its own investigation service. There’s nothing unusual about internal investigators, but when they are working with an in-house prosecution service that creates the danger of unethical behaviour. In the case of the Post Office the conduct of prosecutions was disgraceful.

The usual practice was to charge a sub-postmaster with theft and false accounting, even if the suspect had flagged up a problem with the accounts and there was no evidence that he or she had benefitted from a theft, or even committed one. Under pressure sub-postmasters would usually accept a deal. The more serious charge of theft would be dropped if they pleaded guilty to false accounting, which would allow the Post Office to pursue them for the losses.

What made this practice shameful was that the Post Office knew it had no evidence for theft that would secure a conviction. This doesn’t seem to have troubled them. They knew the suspects were guilty. They were protecting the interests of the Post Office and the end justified the means.

The argument that the prosecution tactics were deplorable is being taken very seriously. The Criminal Cases Review Commission has referred 39 Horizon cases for appeal, on the grounds of “abuse of process” by the prosecution.

The approach taken by Post Office investigators and prosecutors was essentially to try and ignore the weakest points of their case, while concentrating on the strongest points. This strikes me as fundamentally wrong. It is unprofessional and unethical. It runs counter to my experience.

Although I was never called to appear as a witness in court, when I was assembling the evidence to be used in a fraud trial I always prepared on the assumption I would have to face a barrister, or advocate, who had been sufficiently well briefed to home in on any possible areas of doubt, or uncertainty. I had to be prepared to face an aggressive questioner who could understand where weak points might lie in the prosecution case. The main areas of concern were where it was theoretically possible that data might have been tampered with, or where it was possible that someone else had taken the actions that we were pinning on the accused. Our case was only as strong as the weakest link in the chain of evidence. I had to be ready to explain why the jury should be confident “beyond reasonable doubt” that the accused was guilty.

Yes, it was theoretically possible that a systems programmer could have bypassed access controls and tampered with the logs, but it was vanishingly unlikely that they could have set up a web of consistent evidence covering many applications over many months, even years, and that they could have done so without leaving any trace.

In any case, these sysprogs lacked the deep application knowledge required. Some applications developers, and the IT auditors, did have the application knowledge, but they lacked the necessary privileges to subvert access controls before tampering with evidence.

The source code and JCL decks for all the fraud detection programs would have been available to the defence so that an expert witness could dissect them. We not only had to do the job properly, we had to be confident we could justify our code in court.

Another theoretical possibility was that another employee had logged into the accused’s account to make fraudulent transactions, but we could match these transactions against network logs showing that the actions had always been taken from the terminal sitting on the accused’s desk during normal office hours. I could sit at my desk in head office and use a network monitoring tool to watch what a suspect was doing hundreds of miles away. In one case I heard a colleague mention that the police were trailing a suspect around Liverpool that afternoon. I told my colleague to get back to the cops and tell them they were following the wrong guy. Our man was sitting at his desk in Preston and I could see him working. Half an hour later the police phoned back to say we were right.
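
The cross-matching itself was conceptually simple, even if the systems involved were not. Purely as an illustration of the idea, here is a minimal sketch in Python; the record layouts, field names and office-hours rule are my own inventions for the example, not a description of any real system I worked on.

```python
from datetime import datetime

# Hypothetical extracts: an application transaction log and a network session log.
# Field names, formats and values are invented for illustration only.
transactions = [
    {"user": "SUSPECT01", "terminal": "DESK-117",
     "time": datetime(1995, 3, 2, 14, 32)},
]
sessions = [
    {"user": "SUSPECT01", "terminal": "DESK-117",
     "logon": datetime(1995, 3, 2, 8, 55),
     "logoff": datetime(1995, 3, 2, 17, 5)},
]

def corroborated(txn, sessions, office_start=8, office_end=18):
    """A transaction is corroborated if a network session shows the same user
    logged on at the same terminal when the transaction happened, and the
    transaction falls within normal office hours."""
    in_office_hours = office_start <= txn["time"].hour < office_end
    matching_session = any(
        s["user"] == txn["user"]
        and s["terminal"] == txn["terminal"]
        and s["logon"] <= txn["time"] <= s["logoff"]
        for s in sessions
    )
    return in_office_hours and matching_session

# Any transaction that cannot be corroborated is a weak link in the chain of
# evidence; it needs to be investigated, not quietly ignored.
unexplained = [t for t in transactions if not corroborated(t, sessions)]
print(unexplained)  # empty in this illustrative case
```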

In any case, fanciful speculation that our evidence had been manufactured hit the problem of motive; the accused was invariably enjoying a lifestyle well beyond his or her salary, whereas those who might have tampered with evidence had nothing to gain and a secure job, pension and mortgage to lose.

I’ve tried to explain our mindset and thought processes so that you can understand why I was shocked to read about what happened at the Post Office. We investigated and prepared meticulously in case we had to appear in court. That level of professional preparation goes a long way to explaining why we were never called to give evidence. The fraudsters always put their hands up when they realised how strong the evidence was.

Superusers going “off piste”

One of the most contentious aspects of the Horizon case was the prevalence of Transaction Corrections, i.e. corrections applied centrally by IT support staff to correct errors. The Post Office seems to have regarded these as being a routine part of the system, in the wider sense of the word “system”. But it regarded them as being outside the scope of the technical Horizon system. They were just a routine, administrative matter.

I came across an astonishing phrase in the judgment [PDF, opens in new tab, see page 117], lifted from an internal Post Office document. “When we go off piste we use APPSUP”. That is a powerful user privilege which allows users to do virtually anything. It was intended “for unenvisaged ad-hoc live amendment” of data. It had been used on average about once a day, and was assigned on a permanent basis to the IDs of all the IT support staff looking after Horizon.

I’m not sure readers will realise how shocking the phrase “off piste” is in that context to someone with solid IT audit experience in a respectable financial services company. Picture the reaction of a schools inspector coming across an email saying “our teachers are all tooled up with Kalashnikovs in case things get wild in the playground”. It’s not just a question of users holding a superuser privilege all the time, bad though that is. It reveals a lot about the organisation and its systems if staff have to jump in and change live data routinely. An IT shop that can’t control superusers effectively probably doesn’t control much. It’s basic.

Where I worked as an IT auditor nobody was allowed to have an account with which they could create, amend or delete production data. There were elaborate controls applied whenever an ad hoc or emergency change had to be made. We had to be confident in the integrity of our data. If we’d discovered staff having permanent update access to live data, for when they went “off piste”, we’d have raised the roof and wouldn’t have eased off till the matter was fully resolved. And if the company had been facing a court action that was centred on how confident we could be in our systems and data we’d have argued strongly that we should cut our losses and settle once we were aware of the “off piste” problem.
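
To give a flavour of how basic this sort of check is, here is a minimal, hypothetical sketch of the kind of exception report an auditor might run over an extract of privilege assignments. The privilege name APPSUP comes from the judgment; everything else (the field names, the expiry date, the idea of a recorded change ticket) is an assumption made purely for illustration.

```python
from datetime import datetime

# Hypothetical extract of privilege assignments. The expectation being tested:
# a powerful privilege should only ever be granted temporarily, and only with
# a recorded ad hoc or emergency change authorising it.
assignments = [
    {"user": "support01", "privilege": "APPSUP",
     "granted": datetime(2018, 1, 4), "expires": None, "change_ticket": None},
    {"user": "support02", "privilege": "APPSUP",
     "granted": datetime(2018, 6, 1), "expires": datetime(2018, 6, 2),
     "change_ticket": "CHG-4711"},
]

def exceptions(assignments, powerful=("APPSUP",)):
    """Flag powerful privileges that are permanent (no expiry) or that have
    no authorising change record behind them."""
    for a in assignments:
        if a["privilege"] in powerful and (
                a["expires"] is None or a["change_ticket"] is None):
            yield a

for e in exceptions(assignments):
    print(f"{e['user']}: standing or unauthorised {e['privilege']} access")
```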

Were the Post Office’s internal auditors aware of this? Yes, but they clearly did nothing. If I hadn’t discovered that powerful user privileges were out of control on the first day of a two day, high level, IT installation audit I’d have been embarrassed. It’s that basic. However, the Post Office’s internal auditors don’t have the excuse of incompetence. The problem was flagged up by the external auditors Ernst & Young in 2011. If internal audit was unaware of a problem raised by the external auditors they were stealing their salaries.

The only times when work has ever affected my sleep have been when I knew that the police were going to launch dawn raids on suspects’ houses. I would lie in bed thinking about the quality of the evidence I’d gathered. Had I got it all? Had I missed anything? Could I rely on the data and the systems? I worried because I knew that people were going to have the police hammering on their front doors at 6 o’clock in the morning.

I am appalled that Post Office investigators and prosecutors could approach fraud investigations with the attitude “what can we do to get a conviction?”. They pursued the sub-postmasters aggressively, knowing the weaknesses in Horizon and the Post Office; that was disgraceful.

In the final post in this series I’ll look further at the role of internal audit, how it should be independent and its role in keeping an eye on risk. In all those respects the Post Office’s internal auditors have fallen short.

The Post Office Horizon IT scandal, part 1 – errors and accuracy

For the last few years I’ve been following the controversy surrounding the Post Office’s accounting system, Horizon. This controls the accounts of some 11,500 Post Office branches around the UK. There was a series of alleged frauds by sub-postmasters, all of whom protested their innocence. Nevertheless, the Post Office prosecuted these cases aggressively, pushing the supposed perpetrators into financial ruin, and even suicide. The sub-postmasters affected banded together to take a civil action against the Post Office, claiming that no frauds had taken place but that the discrepancies arose from system errors.

I wasn’t surprised to see that the sub-postmasters won their case in December 2019, with the judge providing some scathing criticism of the Post Office, and Fujitsu, the IT supplier, who had to pay £57.75 million to settle the case. Further, in March 2020 the Criminal Cases Review Commission decided to refer for appeal the convictions of 39 subpostmasters, based on the argument that their prosecution involved an “abuse of process”. I will return to the prosecution tactics in my next post.

Having worked as an IT auditor, including fraud investigations, and as a software tester the case intrigued me. It had many features that would have caused me great concern if I had been working at the Post Office and I’d like to discuss a few of them. The case covered a vast amount of detail. If you want to see the full 313 page judgment you can find it here [PDF, opens in new tab].

What caught my eye when I first heard about this case were the arguments about whether the problems were caused by fraud, system error, or user error. As an auditor who worked on the technical side of many fraud cases the idea that there could be any confusion between fraud and system error makes me very uncomfortable. The system design should incorporate whatever controls are necessary to ensure such confusion can’t arise.

When we audited live systems we established what must happen and what must not happen, what the system must do and what it must never do. We would ask how managers could know that the system would do the right things, and never do the wrong things. We then tested the system looking for evidence that these controls were present and effective. We would try to break the system, evading the controls we knew should be there, and trying to exploit missing or ineffective controls. If we succeeded we’d expect, at the least, the system to hold unambiguous evidence about what we had done.
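To give a flavour of what that means in practice, here is a purely hypothetical sketch of a “must never happen” test. The post_transaction() interface and all the data are invented; the point is the shape of the check: attempt the forbidden action, confirm it is blocked, and confirm the attempt leaves unambiguous evidence behind.

```
# Purely hypothetical sketch of a "must never happen" control test.
# post_transaction() stands in for whatever interface the live system
# exposes; the point is the shape of the check, not the API.
audit_log = []

def post_transaction(branch, amount, user, authorised):
    outcome = "accepted" if authorised else "rejected"
    audit_log.append({"user": user, "branch": branch, "amount": amount, "outcome": outcome})
    if not authorised:
        raise PermissionError("unauthorised posting rejected")

# Attempt the forbidden action.
try:
    post_transaction("Anytown", 8000, "auditor", authorised=False)
    raise AssertionError("control failed: unauthorised posting was accepted")
except PermissionError:
    pass

# Even a failed attempt must leave unambiguous evidence.
assert audit_log[-1]["outcome"] == "rejected"
print("control is present, effective, and the attempt left an audit trail")
```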

As for user error, it’s inevitable that users will make mistakes and systems should be designed to allow for that. If a system can’t cope with predictable user mistakes then that is a system failure; “user error” on its own is an inadequate explanation for things going wrong. Mr Justice Fraser, the judge, took the same line. He expected the system “to prevent, detect, identify, report or reduce the risk” of user error. He concluded that controls had been put in place, but they had failed, and that Fujitsu had “inexplicably” chosen to treat one particularly bad example of system error as being the fault of a user.

The explanation for the apparently inexplicable might lie in the legal arguments surrounding the claim by the Post Office and Fujitsu that Horizon was “robust”. The rival parties could not agree even on the definition of “robust” in this context, never mind whether the system was actually robust.

Nobody believed that “robust” meant error free. That would be absurd. No system is perfect, and it was revealed that Horizon had a large number of persistent bugs, some of them serious. The sub-postmasters’ counsel and IT expert argued that “robust” must mean it was extremely unlikely the system could produce the sort of errors that had ruined so many lives. The Post Office confused matters by adopting different definitions at different times; when they were asked to clarify the point they provided an IT industry definition of robustness that sat uneasily with their earlier arguments.

The Post Office approach was essentially top down. Horizon was robust because it could handle any risks that threatened its ability to perform its overall business role. They then took a huge logical leap to claim that because Horizon was robust by their definition it couldn’t be responsible for serious errors at the level of individual branch accounts.

Revealingly, the Post Office and Fujitsu named bugs using the branch where they had first occurred. Two of the most significant were the Dalmellington Bug, discovered at a branch in Ayrshire, and the Callendar Square Bug, also from a Scottish branch, in Falkirk. This naming habit linked bugs to users, not the system.

The Dalmellington Bug entailed a user repeatedly hitting a key when the system froze as she was trying to acknowledge receipt of a consignment of £8,000 in cash. Unknown to her, each time she struck the key she accepted responsibility for a further £8,000. The bug created a discrepancy of £24,000 for which she was held responsible.
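The underlying defect is the sort of thing an idempotency check is meant to prevent. This is a hypothetical sketch in the spirit of such a fix, not Horizon code: the same acknowledgement, submitted repeatedly from a frozen screen, should be recorded exactly once.

```
# Hypothetical sketch: make acknowledgement of a cash consignment idempotent
# so repeated key presses on a frozen screen cannot create duplicates.
accepted_consignments = set()
branch_liability = 0

def acknowledge_consignment(consignment_id, amount):
    """Record liability for a consignment exactly once, however many times
    the acknowledgement is submitted."""
    global branch_liability
    if consignment_id in accepted_consignments:
        return  # duplicate submission from a frozen screen: ignore it
    accepted_consignments.add(consignment_id)
    branch_liability += amount

# The user hits the key four times while the screen appears frozen.
for _ in range(4):
    acknowledge_consignment("REM-001", 8000)

print(branch_liability)  # 8000, not 32000
```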

Similarly, the Callendar Square Bug generated spurious, duplicate financial transactions for which the user was considered to be responsible, even though this was clearly a database problem.

The Horizon system processed millions of transactions a day and did so with near 100% accuracy. The Post Office’s IT expert therefore tried to persuade the judge that the odds were 2 in a million that any particular error could be attributable to the system.

Unsurprisingly the judge rejected this argument. If only 0.0002% of transactions were to go wrong then a typical day’s processing of eight million transactions would lead to 16 errors. It would be innumerate to look at one of those outcomes and argue that there was a 2 in a million chance of it being a system error. That probability would make sense only if one of the eight million were chosen at random. The supposed probability is irrelevant if you have chosen a case for investigation because you know it has a problem.
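The arithmetic is worth setting out, using the round numbers from the argument rather than any precise Horizon statistics:

```
# The round numbers quoted in the argument, not precise Horizon statistics.
transactions_per_day = 8_000_000
error_rate = 2 / 1_000_000        # "2 in a million", i.e. 0.0002%

expected_errors_per_day = transactions_per_day * error_rate
print(expected_errors_per_day)    # 16.0 erroneous transactions in a typical day

# "2 in a million" would only apply to a transaction picked at random.
# A branch investigated because its accounts already show a discrepancy
# has not been picked at random, so that unconditional probability says
# nothing about whether the system caused the discrepancy.
```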

It seemed strange that the Post Office persisted with its flawed perspective. I knew all too well from my own experience of IT audit and testing that different systems, in different contexts, demanded different approaches to accuracy. For financial analysis and modelling it was counter-productive to chase 100% accuracy. It would be too difficult and time consuming. The pursuit might introduce such complexity and fragility into the system that it would fail to produce anything worthwhile, certainly in the timescales required. 98% accuracy might be good enough to give management valuable answers, quickly enough for them to act on. Even 95% could be good enough in some cases.

In other contexts, when dealing with financial transactions and customers’ insurance policies, you really do need a far higher level of accuracy. If you don’t reach 100% you need some way of spotting and handling the exceptions. These are not theoretical edge cases. They are people’s insurance policies or claims payments. Arguing that losing a tiny fraction of 1% is acceptable would have been appallingly irresponsible, and I can’t put enough stress on the point that as IT auditors we would have come down hard, very hard, on anyone who tried to take that line. There are some things the system should always do, and some it should never do. Systems should never lose people’s data. They should never inadvertently produce apparently fraudulent transactions that could destroy small businesses and leave the owners destitute. The amounts at stake in each individual Horizon case were trivial as far as the Post Office was concerned, immaterial in accountancy jargon. But for individual sub-postmasters they were big enough to change, and to ruin, lives.
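If a system genuinely cannot guarantee 100% accuracy, the minimum defence is reconciliation: independent control totals that surface every exception so it can be dealt with, rather than silently absorbed. A minimal sketch, with invented records:

```
# Minimal reconciliation sketch with invented data: compare control totals
# from the source against what was actually posted, and surface every
# exception rather than letting it disappear.
source_records = [
    {"policy": "P001", "amount": 120.00},
    {"policy": "P002", "amount": 340.50},
    {"policy": "P003", "amount": 89.99},
]
posted_records = [
    {"policy": "P001", "amount": 120.00},
    {"policy": "P003", "amount": 89.99},
]

source_total = sum(r["amount"] for r in source_records)
posted_total = sum(r["amount"] for r in posted_records)
posted_ids = {r["policy"] for r in posted_records}
exceptions = [r for r in source_records if r["policy"] not in posted_ids]

if source_total != posted_total or exceptions:
    print(f"Control totals differ: {source_total} vs {posted_total}")
    for r in exceptions:
        print(f"Exception for manual handling: {r['policy']} ({r['amount']})")
```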

The willingness of the Post Office and Fujitsu to absolve the system of blame and accuse users instead was such a constant theme that it produced a three letter acronym I’d never seen before: UEB, or user error bias. Naturally this arose on the claimants’ side. The Post Office never accepted its validity, but it permeated their whole approach: Horizon was robust, therefore any discrepancies must be the fault of users, whether through dishonesty or accident, and they could proceed safely on that basis. I knew from my experience that this was a dreadful mindset with which to approach fraud investigations. I will turn to this in my next post in this series.

“You’re a tester! You can’t tell developers what to do!”

Michael Bolton posted this tweet, to which I responded. When Michael asked for more information I started to reply in a tweet, which turned into a thread, which grew to the point where it became ridiculous, so I turned it into a blog post.

It was a rushed migration of insurance systems following a merger. I was test manager. The requirements were incomplete. That’s being polite. In a migration there are endless edge cases where conversion algorithms struggle to handle old, or odd data. The requirements failed to state what should happen if policies couldn’t be assigned to an appropriate pigeon hole in the new system. They didn’t say we should not lose policyholders’ data, and that became highly controversial.

The lack of detail in the requirements inevitably meant, given the complexities of a big migration, which are seldom appreciated by outsiders, that data was lost on each pass. Only a small percentage was dropped, but in the live migration these would be real policies protecting real people. My team’s testing picked this problem up. It was basic, pretty much the first thing we’d look for. The situation didn’t improve significantly in successive test cycles and it was clear that there would be dropped records in the live runs.

I approached the development team lead and suggested that migration programs should report on the unassigned records and output them to a holding file awaiting manual intervention. Letting them fall through a gap and disappear was amateurish. The testers could flag up the problem, but the fix should come from the users and developers. It wouldn’t be the testers’ job to detect and investigate these problems when it came to the live migration. My test team would not be involved. Our remit extended only to the test runs and the mess would have to be investigated by end users, who lacked our technical skills and experience. The contract wasn’t going to pay for us to do any work when live running started and we would be immediately assigned to other work.
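What I had in mind was nothing exotic. Sketched here with invented field names and mapping rules, it amounts to something like this: anything the conversion rules can’t place goes to a holding file for manual follow-up rather than vanishing.

```
import csv

# Hypothetical sketch of the suggested approach: policies the conversion
# rules cannot place are written to a holding file awaiting manual
# intervention instead of being silently dropped. Names and rules invented.
PRODUCT_MAP = {"MOTOR_COMP": "MC01", "MOTOR_TPFT": "MT02"}

def migrate(old_policies, migrated_path="migrated.csv", holding_path="holding.csv"):
    with open(migrated_path, "w", newline="") as migrated, \
         open(holding_path, "w", newline="") as holding:
        converted = csv.writer(migrated)
        held = csv.writer(holding)
        for policy in old_policies:
            new_code = PRODUCT_MAP.get(policy["product"])
            if new_code is None:
                # Report and retain the record rather than losing it.
                held.writerow([policy["policy_no"], policy["product"], "UNMAPPED"])
            else:
                converted.writerow([policy["policy_no"], new_code])

migrate([
    {"policy_no": "MOT123", "product": "MOTOR_COMP"},
    {"policy_no": "MOT456", "product": "MOTOR_HISTORIC"},  # no mapping: goes to the holding file
])
```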

The lead developer argued that her team couldn’t write code that hadn’t been specifically requested by users. The users’ priority was a rapid migration. Quality and accuracy were unimportant. The users were telling us, informally, that if a small proportion of policies was lost then it was no big deal.

The client didn’t offer a coherent argument to justify their stance and they refused to put in writing anything that would look incriminating in a post mortem, as if they thought that would protect them from a savaging from the auditors. They were too busy trying to force the migration through to get into a serious debate about quality, ethics, or how to handle the profusion of edge cases. The nearest they came to a defence of their position was a vague assertion that policyholders would eventually notice and complain that they’d heard nothing from the insurer about renewing their policies. There were some serious problems with that stance.

Failing to convert responsibly would have made us, as the supplier, complicit in the client’s negligence. There might not have been an explicit requirement not to lose data, but it was basic professionalism on the part of the developers to ensure they didn’t do so. It was absurd to assume that all the policyholders affected would notice before their policies expired. People would find they were uninsured only when they suffered an accident. The press would have loved the story.

We were dealing with motor insurance, which drivers legally must have. It would eventually have come to the attention of the police that the client was accidentally cancelling policies. I knew, from my audit experience, that the police took a very dim view of irresponsible system practices by motor insurers.

The development team leader was honest, but deep reflection about her role was not her style. She had a simplistic understanding of her job and the nature of the relationship with the client. She genuinely believed the developers couldn’t cut a line of code that had not been requested, even though that argument didn’t stand up to any serious scrutiny. Delivering a program for any serious business application involves all sorts of small but vital coding decisions that are of no interest to those who write the requirements but which affect whether or not the program meets the requirements. The team leader, however, dug her heels in and thought I was exceeding my authority by pushing the point.

Frustratingly, her management backed her up, for reasons I could understand but disagreed with. A quick, botched migration suited everyone in the short term. It was my last test project for a few years. I received an approach asking if I was interested in an information security management role. Both the role and the timing were attractive. A break from testing felt like a good idea.

Edit: Reading this again I feel I should offer some defence for the attitude of the development team lead. The behaviour, and the culture, of the client meant that she was under massive pressure. There was not enough time to develop the migration routines, or to fix them in between test runs. Users were bypassing both her and me, the test manager, when they reported defects. They would go straight to the relevant developers and we were both struggling to keep track and enforce any discipline. The users also went behind her back to lean on developers to make last minute, untested changes to code that was already in the release pipeline into production. She made sure her team knew the correct response was, “**** off – it’s more than my job’s worth.”

Throughout the migration the users were acting appallingly, trying to pin the blame on her team for problems they had created. She was in a very difficult position, and that largely explains an attitude that would have seemed absurd and irrational at a calmer time. Her resistance to my requests was infuriating, but I had a lot of sympathy for her, and none whatsoever for the users who were making her life a misery.

There’s an important wider point here. When we’re dealing with “difficult” people it’s helpful to remember that they might be normal people dealing with very difficult problems. Don’t let it get personal. Think about the underlying causes, not the obvious victim.