Teachers, children, testers and leaders (2013)

This article appeared in the March 2013 edition of Testing Planet, which is published by the wonderful Ministry of Testing, one of the most exciting developments in software testing over the last 20 years.

I’m moving this article onto my blog from my website, which will shortly be decommissioned. The article was written in January 2013. Looking at it again I see that I was starting to develop arguments I fleshed out over the next couple of years as part of the Stop 29119 campaign against the testing standard, ISO 29119.

The article

“A tester is someone who knows things can be different” – Gerald Weinberg.

Leaders aren’t necessarily people who do things, or order other people about. To me the important thing about leaders is that they enable other people to do better, whether by inspiration, by example or just by telling them how things can be different – and better. The difference between a leader and a manager is like the difference between a great teacher and, well, the driver of the school bus. Both take children places, but a teacher can take children on a journey that will transform their whole life.

My first year or so in working life after I left university was spent in a fog of confusion. I struggled to make sense of the way companies worked; I must be more stupid than I’d always thought. All these people were charging around, briskly getting stuff done, making money and keeping the world turning; they understood what they were doing and what was going on. They must be smarter than me.

Gradually it dawned on me that very many of them hadn’t a clue. They were no wiser than me. They didn’t really know what was going on either. They thought they did. They had their heads down, working hard, convinced they were contributing to company profits, or at least keeping the losses down.

The trouble was their efforts often didn’t have much to do with the objectives of the organisation, or the true goals of the users and the project in the case of IT. Being busy was confused with being useful. Few people were capable of sitting back, looking at what was going on and seeing what was valuable as opposed to mere work creation.

I saw endless cases of poor work, sloppy service and misplaced focus. I became convinced that we were all working hard doing unnecessary, and even harmful, things for users who quite rightly were distinctly ungrateful. It wasn’t a case of the end justifying the means; it was almost the reverse. The means were only loosely connected to the ends, and we were focussing obsessively on the means without realising that our efforts were doing little to help us achieve our ends.

Formal processes didn’t provide a clear route to our goal. Following the process had become the goal itself. I’m not arguing against processes; just the attitude we often bring to them, confusing the process with the destination, the map with the territory. The quote from Gerald Weinberg absolutely nails the right attitude for testers to bring to their work. There are twin meanings. Testers should know there is a difference between what people expect, or assume, and what really is. They should also know that there is a difference between what is, and what could be.

Testers usually focus on the first sort of difference; seeing the product for what it really is and comparing that to what the users and developers expected. However, the second sort of difference should follow on naturally. What could the product be? What could we be doing better?

Testers have to tell a story, to communicate not just the reality to the stakeholders, but also a glimpse of what could be. Organisations need people who can bring clear headed thinking to confusion, standing up and pointing out that something is wrong, that people are charging around doing the wrong things, that things could be better. Good testers are well suited by instinct to seeing what positive changes are possible. Communicating these possibilities, dispelling the fog, shining a light on things that others would prefer to remain in darkness; these are all things that testers can and should do. And that too is a form of leadership, every bit as much as standing up in front of the troops and giving a rousing speech.

In Hans Christian Andersen’s story, the Emperor’s New Clothes, who showed a glimpse of leadership? Not the emperor, not his courtiers; it was the young boy who called out the truth, that the Emperor was wearing no clothes at all. If testers are not prepared to tell it like it is, to explain why things are different from what others are pretending, to explain how they could be better, then we diminish and demean our profession. Leaders do not have to be all-powerful figures. They can be anyone who makes a difference: teachers, children. Or even testers.

Quality isn’t something, it provides something (2012)

This article appeared in the July 2012 edition of Testing Planet, which is published by the wonderful Ministry of Testing, one of the most exciting developments in software testing over the last 20 years.

The article was written in June 2012, but I don’t think it has dated. It’s about the way we think and work with other people. These are timeless problems. The idea behind E-prime is particularly interesting. Dispensing with the verb “to be” isn’t something to get obsessive or ideological about, but testers should be aware of the important distinction between the way something is and the way it behaves. The original article had only four references so I have checked them, converted them to hyperlinks, and changed the link to Lera Boroditsky’s paper to a link to her TED talk on the same subject.

The article

A few weeks ago two colleagues, who were having difficulty working together, asked me to act as peacekeeper in a tricky-looking meeting in which they were going to try and sort out their working relationship. I’ll call them Tony and Paul. For various reasons they were sparking off each other and creating antagonism that was damaging the whole team.

An hour’s discussion seemed to go reasonably well; Tony talking loudly and passionately, while Paul spoke calmly and softly. Just as I thought we’d reached an accommodation that would allow us all to work together, Tony blurted out, “you are cold and calculating, Paul, that’s the problem”.

Paul reacted as if he’d been slapped in the face, made his excuses and left the meeting. I then spent another 20 minutes talking Tony through what had happened, before separately speaking to Paul about how we should respond.

I told Tony that if he’d wanted to make the point I’d inferred from his comments, and from the whole meeting, then he should have said “your behaviour and attitude towards me throughout this meeting, and when we work together, strike me as cold and calculating, and that makes me very uncomfortable”.

“But I meant that!”, Tony replied. Sadly, he hadn’t said that. Paul had heard the actual words and reacted to them, rather than applying the more dispassionate analysis I had used as an observer. Paul meanwhile found Tony’s exuberant volatility disconcerting, and responded to him in a very studied and measured style that unsettled Tony.

Tony committed two sins. Firstly, he didn’t acknowledge the two-way nature of the problem. It should have been about how he reacted to Paul, rather than trying to dump all the responsibility onto Paul.

Secondly, he said that Paul is cold and calculating, rather than that Paul had acted in a way Tony found cold and calculating, at a certain time, in certain circumstances.

I think we’d all see a huge difference between being “something”, and behaving in a “something” way at a certain time, in a certain situation. The verb “to be” gives us this problem. It can mean, and suggest, many different things and can create fog where we need clarity.

Some languages, such as Spanish, maintain a useful distinction between different forms of “to be” depending on whether one is talking about something’s identity or just a temporary attribute or state.

The way we think obviously shapes the language we speak, but increasingly scientists are becoming aware of how the language we use shapes the way that we think. [See this 2017 TED talk, “How Language Shapes Thought”, by Lera Boroditsky]

The problem we have with “to be” has great relevance to testers. I don’t just mean treating people properly, however much importance we rightly attach to working successfully with others. More than that, shying away from “to be” helps us think more carefully and constructively as testers.

This topic has stretched bigger brains than mine, in the fields of philosophy, psychology and linguistics. Just google “general semantics” if you want to give your brain a brisk workout. You might find it tough stuff, but I don’t think you have to master the underlying concept to benefit from its lessons.

Don’t think of it as intellectual navel gazing. All this deep thought has produced some fascinating results, in particular something called E-prime, a form of English that totally dispenses with “to be” in all its forms; no “I am”, “it is”, or “you are”. Users of E-prime don’t simply replace the verb with an alternative. That doesn’t work. It forces you to think and articulate more clearly what you want to say. [See this classic paper by Kellogg, “Speaking in E-prime” PDF, opens in new tab].

“The banana is yellow” becomes “the banana looks yellow”, which starts to change the meaning. “Banana” and “yellow” are not synonyms. The banana’s yellowness becomes apparent only because I am looking at it, and once we introduce the observer we can acknowledge that the banana appears yellow to us now. Tomorrow the banana might appear brown to me as it ripens. Last week it would have looked green.

You probably wouldn’t disagree with any of that, but you might regard it as a bit abstract and pointless. However, shunning “to be” helps us to think more clearly about the products we test, and the information that we report. E-prime therefore has great practical benefits.

The classic definition of software quality came from Gerald Weinberg in his book “Quality Software Management: Systems Thinking”.

“Quality is value to some person”.

Weinberg’s definition reflects some of the clarity of thought that E-prime requires, though he has watered it down somewhat to produce a snappy aphorism. The definition needs to go further, and “is” has to go!

Weinberg makes the crucial point that we must not regard quality as some intrinsic, absolute attribute. It arises from the value it provides to some person. Once you start thinking along those lines you naturally move on to realising that quality provides value to some person, at some moment in time, in a certain context.

Thinking and communicating in E-prime stops us making sweeping, absolute statements. We can’t say “this feature is confusing”. We have to use a more valuable construction such as “this feature confused me”. But we’re just starting. Once we drop the final, total condemnation of saying the feature is confusing, and admit our own involvement, it becomes more natural to think about and explain the reasons. “This feature confused me … when I did … because of …”.

Making the observer, the time and the context explicit helps us by limiting or exposing hidden assumptions. We might or might not find these assumptions valid, but we need to test them, and we need to know about them so we understand what we are really learning as we test the product.

E-prime fits neatly with the scientific method and with the provisional and experimental nature of good testing. Results aren’t true or false. The evidence we gather matches our hypothesis, and therefore gives us greater confidence in our knowledge of the product, or it fails to match up and makes us reconsider what we thought we knew. [See this classic paper by Kellogg & Bourland, “Working with E-prime – some practical notes” PDF, opens in new tab].

Scientific method cannot be accommodated in traditional script-driven testing, which reflects a linear, binary, illusory worldview, pretending to be absolute. It tries to deal in right and wrong, pass and fail, true and false. Such an approach fits in neatly with traditional development techniques which fetishise the rigours of project management, rather than the rigours of the scientific method.

This takes us back to general semantics, which coined the well known maxim that the map is not the territory. Reality and our attempts to model and describe it differ fundamentally from each other. We must not confuse them. Traditional techniques fail largely because they confuse the map with the territory. [See this “Less Wrong” blog post].

In attempting to navigate their way through a complex landscape, exponents of traditional techniques seek the comfort of a map that turns messy, confusing reality into something they can understand and that offers the illusion of being manageable. However, they are managing the process, not the underlying real work. The plan is not the work. The requirements specification is not the requirements. The map is not the territory.

Adopting E-prime in our thinking and communication will probably just make us look like the pedantic awkward squad on a traditional project. But on agile or lean developments E-prime comes into its own. Testers must contribute constructively, constantly, and above all, early. E-prime helps us in all of this. It makes us clarify our thoughts and helps us understand that we gain knowledge provisionally, incrementally and never with absolute certainty.

I was not consciously deploying E-prime during and after the fractious meeting I described earlier. But I had absorbed the precepts sufficiently to instinctively realise that I had two problems; Tony’s response to Paul’s behaviour, and Paul’s response to Tony’s outburst. I really didn’t see it as a matter of “uh oh – Tony is stupid”.

E-prime purists will look askance at my failure to eliminate all forms of “to be” in this article. I checked my writing to ensure that I’ve written what I meant to, and said only what I can justify. Question your use of the verb, and weed out those hidden assumptions and sweeping, absolute statements that close down thought, rather than opening it up. Don’t think you have to be obsessive about it. As far as I am concerned, that would be silly!

Traditional techniques and motivating staff (2010)

traditional techniques & motivating staffTesting Planet 2020This article appeared in the February 2010 edition of Software Testing Club Magazine, now the Testing Planet. The STC has evolved into the wonderful Ministry of Testing, one of the most exciting developments in software testing over the last 20 years.

That might seem a low bar; testing isn’t meant to be a thrill a minute. But the Ministry of Testing has been a gale of fresh air sweeping through the industry, mixing great content and conferences with an approach to testing that has managed to be both entertaining and deeply serious. It has been a consistent voice of sanity and decency in an industry that has had too much cynicism and short-sightedness.

I’m moving this article onto my blog from my website, which will shortly be decommissioned. Looking back I was interested to see that I didn’t post this article on the website immediately. I had some reservations about the article. I wondered if I had taken a rather extreme stance. I do believe that rigid standards and processes can be damaging, and I certainly believe that enforcing strict compliance, at the expense of initiative and professional judgement, undermines morale.

However, I thought I had perhaps gone too far, and might be seen to be dismissing the idea of any formality, advocating software development as an entirely improvised activity with everyone winging it. That’s not the case. We need to have some structure, some shape and formality to our work. It’s just that prescriptive standards and processes aren’t sensitive to context and become a straitjacket. This was written in January 2010 and it was a theme I spent a good deal of time on when the ISO 29119 standard was released a few years later and the Stop 29119 campaign swung into action.

So I still largely stand by this article, though I think it is lacking in nuance in some respects. In particular the bald statement “development isn’t engineering”, while true, does require greater nuance, unpacking and explanation. Development isn’t engineering in the sense that engineering is usually understood, and it’s certainly not akin to civil engineering. But it should aspire to be more “engineering-like”, while remaining realistic about the nature of software development. I was particularly interested to see that I described reality as being chaotic in 2010, a couple of years before I started to learn about Cynefin.

The article

Do we follow the standards or use our initiative?

Recently I’ve been thinking and writing about the effects of testing standards. The more I thought, the more convinced I became that standards, or any rigid processes, can damage the morale, and even the professionalism, of IT professionals if they are not applied wisely.

The problem is that calling them “standards” implies that they are mandatory and should be applied in all cases. The word should be reserved for situations where compliance is essential, e.g. security, good housekeeping or safety-critical applications.

I once worked for a large insurance company as an IT auditor in Group Audit. I was approached by Information Services. Would I consider moving to lead a team developing new management information (MI) applications? It sounded interesting, so I said yes.

On my first day in the new role I asked my new manager what I had to do. He stunned me when he said, “You tell me. I’ll give you the contact details for your users. Go and see them. They’re next in line to get an MI application. See what they need, then work out how you’re going to deliver it. Speak to other people to see how they’ve done it, but it’s up to you”.

The company did have standards and processes, but they weren’t rigid and they weren’t very useful in the esoteric world of insurance MI, so we were able to pick and choose how we developed applications.

My users were desperate for a better understanding of their portfolio; what was profitable, and what was unprofitable. I had no trouble getting a manager and a senior statistician to set aside two days to brief me and my assistant. There was just us, a flip chart, and gallons of coffee as they talked us through the market they were competing in, the problems they faced and their need for better information from the underwriting and claims applications with which they did business.

I realised that it was going to be a pig of a job to give them what they needed. It would take several months. However, I could give them about a quarter of what they needed in short order. So we knocked up a quick disposable application in a couple of weeks that delighted them, and then got to work on the really tricky stuff.

The source systems proved to be riddled with errors and poor quality data, so it took longer than expected. However, we’d got the users on our side by giving them something quickly, so they were patient.

It took so long to get phase 1 of the application working to acceptable tolerances that I decided to scrap phase 2, which was nearly fully coded, and rejig the design of the first part so that it could do the full job on its own. That option had been ruled out at the start because there seemed to be insurmountable performance problems.

Our experience with testing had shown that we could make the application run much faster than we’d thought possible, but that the fine tuning of the code to produce accurate MI was a nightmare. It therefore made sense to clone jobs and programs wholesale to extend the first phase and forget about trying to hack the phase 2 code into shape.

The important point is that I was allowed to take a decision that meant binning several hundred hours of coding effort and utterly transforming a design that had been signed off.

I took the decision during a trip to the dentist, discussed it with my assistant on my return, sold the idea to the users and only then did I present my management with a fait accompli. They had no problems with it. They trusted my judgement, and I was taking the users along with me.

The world changed and an outsourcing deal meant I was working for a big supplier, with development being driven by formal processes, rigid standards and contracts. This wasn’t all bad. It did give developers some protection from the sort of unreasonable pressure that could be brought to bear when relationships were less formal. However, it did mean that I never again had the same freedom to use my own initiative and judgement.

The bottom line was that it could be better to do the wrong thing for the corporately correct reason, than to do the right thing the “wrong” way. By “better” I mean better for our careers, and not better for the customers.

Ultimately that is soul destroying. What really gets teams fired up is when developers, testers and users all see themselves as being on the same side, determined to produce a great product.

Development isn’t engineering

Reality is chaotic. Processes are perfectly repeatable only if one pretends that reality is neat, orderly and predictable. The result is strain, tension and developers ordered to do the wrong things for the “right” reasons, to follow the processes mandated by standards and by the contract.

Instead of making developers more “professional” it has exactly the opposite effect. It reduces them to the level of, well, second hand car salesmen, knocking out old cars with no warranty. It’s hardly a crime, but it doesn’t get me excited.

Development and testing become drudgery. Handling the users isn’t a matter of building lasting relationships with fellow professionals. It’s a matter of “managing the stakeholders”, being diplomatic with them rather than candid, and if all else fails telling them “to read the ******* contract”.

This isn’t a rant about contractual development. Contracts don’t have to be written so that the development team is in a strait-jacket. It’s just that traditional techniques fit much more neatly with contracts than agile, or any iterative approach.

Procurement is much simpler if you pretend that traditional, linear techniques are best practice; if you pretend that software development is like civil engineering, and that developing an application is like building a bridge.

Development and testing are really not like that at all. The actual words used should be a good clue. Development is not the same as construction. Construction is what you do when you’ve developed an idea to the point where it can be manufactured, or built.

Traditional techniques were based on that fundamental flaw; the belief that development was engineering, and that repeatable success required greater formality, more tightly defined processes and standards, and less freedom for developers.

Development is exploring

Good development is a matter of investigation, experimentation and exploration. It’s about looking at the possibilities, and evaluating successive versions. It’s not about plodding through a process document.

Different customers, different users and different problems will require different approaches. These various approaches are not radically different from each other, but they are more varied than is allowed for by rigid and formal processes.

Any organisation that requires development teams to adhere to these processes, rather than make their own judgements based on their experience and their users’ needs, frequently requires the developers to do the wrong things.

This is demoralising, and developers working under these constraints have the initiative, enthusiasm and intellectual energy squeezed out of them. As they build their careers in such an atmosphere they become corporate bureaucrats.

They rise to become not development managers, but managers of the development process; not test managers, but managers of the test process. Their productivity is measured in meetings and reports. Sometimes the end product seems a by-product of the real business; doing the process.

If people are to realise their potential they need managers who will do as mine did; who will say, “here is your problem, tell me how you’re going to solve it”.

We need guidance from processes and standards in the same way as we need guidance from more experienced practitioners, but they should be suggestions of possible approaches so that teams don’t flounder around in confused ignorance, don’t try to re-invent the wheel, and don’t blunder into swamps that have consumed previous projects.

If development is exploration it is thrilling and brings out the best in us. People rise to the challenge, learn and grow. They want to do it again and get better at it. If development means plodding through a process document it is a grind.

I know which way has inspired me, which way has given users applications I’ve been proud of. It wasn’t the formal way. It wasn’t the traditional way. Set the developers free!

Testers and coders are both developers (2009)

This article appeared in September 2009 as an opinion piece on the front page of TEST magazine‘s website. I’m moving the article onto my blog from my website, which will be decommissioned soon. It might be an old article, but it remains valid.

The article

When I was a boy I played football non-stop; in organised matches, in playgrounds or in the park, even kicking coal around the street!

There was a strict hierarchy. The good players, the cool kids; they were forwards. The ones who couldn’t play were defenders. If you were really hopeless you were a goalkeeper. Defending was boring. Football was about fun, attacking and scoring goals.

When I moved into IT I found a similar hierarchy. I had passed the programming aptitude test. I was one of the elite. I had a big head, to put it mildly! The operators were the defenders, the ones who couldn’t do the fun stuff. We were vaguely aware they thought the coding kids were irresponsible cowboys, but who cared?

As for testers, well, they were the goalkeepers. Frankly, they were lucky to be allowed to play at all. They did what they were told. Independence? You’re joking, but if they were good they were allowed to climb the ladder and become programmers.

Gradually things changed. Testers became more clearly identified with the users. They weren’t just menial team members. A clear career path opened up for testing professionals.

However, that didn’t earn them respect from programmers. Now testing is changing again. Agile gives testers the chance to learn and apply interesting coding skills. Testers can be just as technical as coders. They might code in different ways, for different reasons, but they can be coders too.

That’s great isn’t it? Well, up to a point. It’s fantastic that testers have these exciting opportunities. But I worry that programmers might start respecting the more technical testers for the wrong reason, and that testers who don’t code will still be looked down on. Testers shouldn’t try to impress programmers with their coding skills. That’s just a means to an end.

We’ll still need testers who don’t code, and it’s vital, if testers are to achieve the respect they deserve, that they are valued for all the skills they bring to the team, not just the skills they share with programmers. For a start, we should stop referring to “developers and testers”. Testers always were part of the development process. In Agile teams they quite definitely are developers. It’s time everyone acknowledged that. Development is a team game.

Football teams who played the way we used to as kids got thrashed if they didn’t grow up and play as a team. Development teams who don’t ditch similar attitudes will be equally ineffective.

Bridging the gap – Agile and the troubled relationship between UX and software engineering (2009)

This article appeared as the cover story for the September 2009 edition of TEST magazine.

I’m moving the article onto my blog from my website, which will be decommissioned soon. If you choose to read the article please bear in mind that it was written in August 2009 and is therefore inevitably dated in some respects, though I think that much of it is still valid, especially where I am discussing the historical problems.

The article

For most of its history software engineering has had great difficulty building applications that users found enjoyable. Far too many applications were frustrating and wasted users’ time.

That has slowly changed with the arrival of web developments, and I expect the spread of Agile development to improve matters further.

I’m going to explain why I think developers have had difficulty building usable applications and why user interaction designers have struggled to get their ideas across. For simplicity I’ll just refer to these designers and psychologists as “UX”. There are a host of labels and acronyms I could have used, and it really wouldn’t have helped in a short article.

Why software engineering didn’t get UX

Software engineering’s problems with usability go back to its roots, when geeks produced applications for fellow geeks. Gradually applications spread from the labs to commerce and government, but crucially users were wage slaves who had no say in whether they would use the application. If they hated it they could always find another job!

Gradually applications spread into the general population until the arrival of the internet meant that anyone might be a user. Now it really mattered if users hated your application. They would be gone for good, into the arms of the competition.

Software engineering had great difficulty coming to terms with this. The methods it had traditionally used were poison to usability. The Waterfall lifecycle was particularly damaging.

The Waterfall had two massive flaws. At its root was the implicit assumption that you can tackle the requirements and design up front, before the build starts. This led to the second problem; iteration was effectively discouraged.

Users cannot know what they want till they’ve seen what is possible and what can work. In particular, UX needs iteration to let analysts and users build on their understanding of what is required.

The Waterfall meant that users could not see and feel what an application was like until acceptance testing at the end when it was too late to correct defects that could be dismissed as cosmetic.

The Waterfall was a project management approach to development, a means of keeping control, not building good products. This made perfect sense to organisations who wanted tight contracts and low costs.

The desire to keep control and make development a more predictable process explained the damaging attempt to turn software engineering into a profession akin to civil engineering.

So developers were sentenced to 20 years hard labour with structured methodologies, painstakingly creating an avalanche of documentation; moving from the current physical system, through the current logical system to a future logical system and finally a future physical system.

However, the guilty secret of software engineering was that translating requirements into a design wasn’t just a difficult task that required a methodical approach; it was, and remains, a fundamental problem for developers. It’s not a problem specific to usability requirements, and it was never resolved in traditional techniques.

The mass of documentation obscured the fact that crucial design decisions weren’t flowing predictably and objectively from the requirements, but were made intuitively by the developers – people who by temperament and training were polar opposites of typical users.

Why UX didn’t get software development

Not surprisingly, given the massive documentation overhead of traditional techniques, and developers’ propensity to pragmatically tailor and trim formal methods, the full process was seldom followed. What actually happened was more informal and opaque to outsiders.

The UX profession understandably had great difficulty working out what was happening. Sadly they didn’t even realise that they didn’t get it. They were hampered by their naivety, their misplaced sense of the importance of UX and their counter-productive instinct to retain their independence from developers.

If developers had traditionally viewed functionality as a matter of what the organisation required, UX went to the other extreme and saw functionality as being about the individual user. Developers ignored the human aspect, but UX ignored commercial realities – always a fast track to irrelevance.

UX took software engineering at face value, tailoring techniques to fit what they thought should be happening rather than the reality. They blithely accepted the damaging concept that usability was all about the interface; that the interface was separate from the application.

This separability concept was flawed on three grounds. Conceptually it was wrong. It ignored the fact that the user experience depends on how the whole application works, not just the interface.

Architecturally it was wrong. Detailed technical design decisions can have a huge impact on the users. Finally, separability was wrong organisationally. It left the UX profession stranded on the margins, in a ghetto, available for consultation at the end of the project, and then ignored when its findings were inconvenient.

An astonishing amount of research and effort went into justifying this fallacy, but the real reason UX bought the idea was that it seemed to liberate them from software engineering. Developers could work on the boring guts of the application while UX designed a nice interface that could be bolted on at the end, ready for testing. However, this illusory freedom actually meant isolation and impotence.

The fallacy of separability encouraged reliance on usability testing at the end of the project, on summative testing to reveal defects that wouldn’t be fixed, rather than formative testing to stop these defects being built in the first place.

There’s an argument that there’s no such thing as effective usability testing. If it takes place at the end it’s too late to be effective. If it’s effective then it’s done iteratively during design, and it’s just good design rather than testing.

So UX must be hooked into the development process. It must take place early enough to allow alternative designs to be evaluated. Users must therefore be involved early and often. Many people in UX accept this completely, though the separability fallacy is still alive and well. However, its days are surely numbered. I believe, and hope, that the Agile movement will finally kill it.

Agile and UX – the perfect marriage?

The mutual attraction of Agile and UX isn’t simply a case of “my enemy’s enemy is my friend”. Certainly they do have a common enemy in the Waterfall, but each discipline really does need the other.

Agile gives UX the chance to hook into development, at the points where it needs to be involved to be effective. Sure, with the Waterfall it is possible to smuggle effective UX techniques into a project, but they go against the grain. It takes strong, clear-sighted project managers and test managers to make them work. The schedule and political pressures on these managers to stop wasting time iterating and to sign off each stage are huge.

If UX needs Agile, then equally Agile needs UX if it is to deliver high quality applications. The opening principle of the Agile Manifesto states that “our highest priority is to satisfy the customer through early and continuous delivery of valuable software”.

There is nothing in Agile that necessarily guarantees better usability, but if practitioners believe in that principle then they have to take UX seriously and use UX professionals to interpret users’ real needs and desires. This is particularly important with web applications when developers don’t have direct access to the end users.

There was considerable mutual suspicion between the two communities when Agile first appeared. Agile was wary of UX’s detailed analyses of the users, and suspected that this was a Waterfall style big, up-front requirements phase.

UX meanwhile saw Agile as a technical solution to a social, organisational problem and was rightly sceptical of claims that manual testing would be a thing of the past. Automated testing of the human interaction with the application is a contradiction.

Both sides have taken note of these criticisms. Many in UX now see the value in speeding up the user analysis and focussing on the most important user groups, and Agile has recognised the value of up-front analysis to help them understand the users.

The Agile community is also taking a more rounded view of testing, and how UX can fit into the development process. In particular check out Brian Marick’s four Agile testing quadrants.

UX straddles quadrants two and three. Q2 contains the up-front analysis to shape the product, and Q3 contains the evaluation of the results. Both require manual testing, and use such classic UX tools as personas (fictional representative users) and wireframes (sketches of webpages).

Other people who’re working on the boundary of Agile and UX are Jeff Patton, Jared Spool and Anders Ramsay. They’ve come up with some great ideas, and it’s well worth checking them out. Microsoft have also developed an interesting technique called the RITE method.

This is an exciting field. Agile changes the role of testers and requires them to learn new skills. The integration of UX into Agile will make testing even more varied and interesting.

There will still be tension between UX and software professionals who’ve been used to working remotely from the users. However, Agile should mean that this is a creative tension, with each group supporting and restraining the others.

This is a great opportunity. Testers will get the chance to help create great applications, rather than great documentation!

Business logic security testing (2009)

This article appeared in the June 2009 edition of Testing Experience magazine and the October 2009 edition of Security Acts magazine.

If you choose to read the article please bear in mind that it was written in January 2009 and is therefore inevitably dated in some respects. In particular, ISACA has restructured COBIT, but it remains a useful source. Overall I think the arguments I made in this article are still valid.

The references in the article were all structured for a paper magazine. They were not set up as hyperlinks and I have not tried to recreate them or check whether they still work.

The article

When I started in IT in the 80s the company for which I worked had a closed network restricted to about 100 company locations with no external connections.

Security was divided neatly into physical security, concerned with the protection of the physical assets, and logical security, concerned with the protection of data and applications from abuse or loss.

When applications were built the focus of security was on internal application security. The arrangements for physical security were a given, and didn’t affect individual applications.

There were no outsiders to worry about who might gain access, and so long as the common access controls software was working there was no need for analysts or designers to worry about unauthorized internal access.

Security for the developers was therefore a matter of ensuring that the application reflected the rules of the business; rules such as segregation of responsibilities, appropriate authorization levels, dual authorization of high value payments, reconciliation of financial data.

The world quickly changed and relatively simple, private networks isolated from the rest of the world gave way to more open networks with multiple external connections and to web applications.

Security consequently acquired much greater focus. However, it began to seem increasingly detached from the work of developers. Security management and testing became specialisms in their own right, and not just an aspect of technical management and support.

We developers and testers continued to build our applications, comforted by the thought that the technical security experts were ensuring that the network perimeter was secure.

Nominally, security testing was part of non-functional testing. In reality, it had become somewhat detached from conventional testing.

According to the glossary of the British Computer Society’s Special Interest Group in Software Testing (BCS SIGIST) [1], security testing is determining whether the application meets the specified security requirements.

SIGIST also says that security entails the preservation of confidentiality, integrity and availability of information. Availability means ensuring that authorized users have access to information and associated assets when required. Integrity means safeguarding the accuracy and completeness of information and processing methods. Confidentiality means ensuring that information is accessible only to those authorized to have access.

Penetration testing, and testing the security of the network and infrastructure, are all obviously important, but if you look at security in the round, bearing in mind wider definitions of security (such as SIGIST’s), then these activities can’t be the whole of security testing.

Some security testing has to consist of routine functional testing that is purely a matter of how the internals of the application work. Security testing that is considered and managed as an exercise external to the development, an exercise that follows the main testing, is necessarily limited. It cannot detect defects that are within the application rather than on the boundary.

Within the application, insecure design features or insecure coding might be detected without any deep understanding of the application’s business role. However, like any class of requirements, security requirements will vary from one application to another, depending on the job the application has to do.

If there are control failures that reflect poorly applied or misunderstood business logic, or business rules, then will we as functional testers detect that? Testers test at the boundaries. Usually we think in terms of boundary values for the data, the boundary of the application or the network boundary with the outside world.

Do we pay enough attention to the boundary of what is permissible user behavior? Do we worry enough about abuse by authorized users, employees or outsiders who have passed legitimately through the network and attempt to subvert the application, using it in ways never envisaged by the developers?

I suspect that we do not, and this must be a matter for concern. A Gartner report of 2005 [2] claimed that 75% of attacks are at the application level, not the network level. The types of threats listed in the report all arise from technical vulnerabilities, such as command injection and buffer overflows.

Such application layer vulnerabilities are obviously serious, and must be addressed. However, I suspect too much attention has been given to them at the expense of vulnerabilities arising from failure to implement business logic correctly.

This is my main concern in this article. Such failures can offer great scope for abuse and fraud. Security testing has to be about both the technology and the business.

Problem of fraud and insider abuse

It is difficult to come up with reliable figures about fraud because of its very nature. According to PricewaterhouseCoopers in 2007 [3] the average loss to fraud by companies worldwide over the two years from 2005 was $2.4 million (their survey being biased towards larger companies). This is based on reported fraud, and PWC increased the figure to $3.2 million to allow for unreported frauds.

In addition to the direct costs there were average indirect costs in the form of management time of $550,000 and substantial unquantifiable costs in terms of damage to the brand, staff morale, reduced share prices and problems with regulators.

PWC stated that 76% of their respondents reported the involvement of an outside party, implying that 24% were purely internal. However, when companies were asked for details on one or two frauds, half of the perpetrators were internal and half external.

It would be interesting to know the relative proportions of frauds (by number and value) which exploited internal applications and customer facing web applications but I have not seen any statistics for these.

The U.S. Secret Service and CERT Coordination Center have produced an interesting series of reports on “illicit cyber activity”. In their 2004 report on crimes in the US banking and finance sector [4] they reported that in 70% of the cases the insiders had exploited weaknesses in applications, processes or procedures (such as authorized overrides). 78% of the time the perpetrators were authorized users with active accounts, and in 43% of cases they were using their own account and password.

The enduring problem with fraud statistics is that many frauds are not reported, and many more are not even detected. A successful fraud may run for many years without being detected, and may never be detected. A shrewd fraudster will not steal enough money in one go to draw attention to the loss.

I worked on the investigation of an internal fraud at a UK insurance company that had lasted 8 years, as far back as we were able to analyze the data and produce evidence for the police. The perpetrator had raised 555 fraudulent payments, all for less than £5,000, and had stolen £1.1 million by the time that we received an anonymous tip-off.

The control weaknesses related to an abuse of the authorization process, and a failure of the application to deal appropriately with third party claims payments, which were extremely vulnerable to fraud. These weaknesses would have been present in the original manual process, but the users and developers had not taken the opportunities that a new computer application had offered to introduce more sophisticated controls.

No-one had been negligent or even careless in the design of the application and the surrounding procedures. The trouble was that the requirements had focused on the positive functions of the application, and on replicating the functionality of the previous application, which in turn had been based on the original manual process. There had not been sufficient analysis of how the application could be exploited.

Problem of requirements and negative requirements

Earlier I was careful to talk about failure to implement business logic correctly, rather than implementing requirements. Business logic and requirements will not necessarily be the same.

The requirements are usually written as “the application must do” rather than “the application must not…”. Sometimes the “must not” is obvious to the business. It “goes without saying” – that dangerous phrase!

However, the developers often lack the deep understanding of business logic that users have, and they design and code only the “must do”, not even being aware of the implicit corollary, the “must not”.

As a computer auditor I reviewed a sales application which had a control to ensure that debts couldn’t be written off without review by a manager. At the end of each day a report was run to highlight debts that had been cleared without a payment being received. Any discrepancies were highlighted for management action.

I noticed that it was possible to overwrite the default of today’s date when clearing a debt. Inserting a date in the past meant that the money I’d written off wouldn’t appear on any control report. The report for that date had been run already.
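
To make the weakness concrete, here is a minimal sketch in Python of the negative test that would have caught it, assuming a hugely simplified ledger; the class, method and field names are invented for illustration and bear no relation to the real application.

```python
from datetime import date, timedelta

# Toy model of the write-off control described above. All names are
# illustrative; the real application was nothing like this simple.
class SalesLedger:
    def __init__(self):
        self.write_offs = []  # (reference, effective_date, entry_date)

    def clear_debt(self, reference, effective_date=None):
        entry_date = date.today()                      # when the operator keyed it
        effective_date = effective_date or entry_date  # default the operator could overtype
        self.write_offs.append((reference, effective_date, entry_date))

    def daily_report_by_effective_date(self, run_date):
        # The flawed selection: driven by the user-keyed effective date.
        return [ref for ref, eff, _ in self.write_offs if eff == run_date]

    def daily_report_by_entry_date(self, run_date):
        # The selection the control needed: driven by the system-stamped entry date.
        return [ref for ref, _, ent in self.write_offs if ent == run_date]

ledger = SalesLedger()
today = date.today()
ledger.clear_debt("WO-0555", effective_date=today - timedelta(days=1))  # back-dated write-off

assert "WO-0555" not in ledger.daily_report_by_effective_date(today)  # slips past the control
assert "WO-0555" in ledger.daily_report_by_entry_date(today)          # caught when keyed on entry date
```

The point of the test is not the code; it is that the selection criterion for the control report, not just the write-off function itself, needed to be tested against deliberately hostile input.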

When I mentioned this to the users and the teams who built and tested the application the initial reaction was “but you’re not supposed to do that”, and then they all tried blaming each other. There was a prolonged discussion about the nature of requirements.

The developers were adamant that they’d done nothing wrong because they’d built the application exactly as specified, and the users were responsible for the requirements.

The testers said they’d tested according to the requirements, and it wasn’t their fault.

The users were infuriated at the suggestion that they should have to specify every last little thing that should be obvious – obvious to them anyway.

The reason I was looking at the application, and looking for that particular problem, was because we knew that a close commercial rival had suffered a large fraud when a customer we had in common had bribed an employee of our rival to manipulate the sales control application. As it happened there was no evidence that the same had happened to us, but clearly we were vulnerable.

Testers should be aware of missing or unspoken requirements, implicit assumptions that have to be challenged and tested. Such assumptions and requirements are a particular problem with security requirements, which is why the simple SIGIST definition of security testing I gave above isn’t sufficient – security testing cannot be only about testing the formal security requirements.

However, testers, like developers, are working to tight schedules and budgets. We’re always up against the clock. Often there is barely enough time to carry out all the positive testing that is required, never mind thinking through all the negative testing that would be required to prove that missing or unspoken negative requirements have been met.

Fraudsters, on the other hand, have almost unlimited time to get to know the application and see where the weaknesses are. Dishonest users also have the motivation to work out the weaknesses. Even people who are usually honest can be tempted when they realize that there is scope for fraud.

If we don’t have enough time to do adequate negative testing to see what weaknesses could be exploited, then at least we should be doing a quick informal evaluation of the financial sensitivity of the application and alerting management, and the internal computer auditors, that there is an element of unquantifiable risk. How comfortable are they with that?

If we can persuade project managers and users that we need enough time to test properly, then what can we do?

CobiT and OWASP

If there is time, there are various techniques that testers can adopt to try and detect potential weaknesses or which we can encourage the developers and users to follow to prevent such weaknesses.

I’d like to concentrate on the CobiT (Control Objectives for Information and related Technology) guidelines for developing and testing secure applications (CobiT 4.1 2007 [5]), and the CobiT IT Assurance Guide [6], and the OWASP (Open Web Application Security Project) Testing Guide [7].

Together, CobiT and OWASP cover the whole range of security testing. They can be used together, CobiT being more concerned with what applications do, and OWASP with how applications work.

They both give useful advice about the internal application controls and functionality that developers and users can follow. They can also be used to provide testers with guidance about test conditions. If the developers and users know that the testers will be consulting these guides then they have an incentive to ensure that the requirements and build reflect this advice.

CobiT implicitly assumes a traditional, big up-front design, Waterfall approach. Nevertheless, it’s still potentially useful for Agile practitioners, and it is possible to map from CobiT to Agile techniques, see Gupta [8].

The two most relevant parts are in the CobiT IT Assurance Guide [6]. This is organized into domains, the most directly relevant being “Acquire and Implement”, which covers acquiring and implementing the solution. This is really for auditors, guiding them through a traditional development, explaining the controls and checks they should be looking for at each stage.

It’s interesting as a source of ideas, and as an alternative way of looking at the development, but unless your organization has mandated the developers to follow CobiT there’s no point trying to graft this onto your project.

Of much greater interest are the six CobiT application controls. Whereas the domains are functionally separate and sequential activities, a life-cycle in effect, the application controls are statements of intent that apply to the business area and the application itself. They can be used at any stage of the development. They are:

AC1 Source Data Preparation and Authorization

AC2 Source Data Collection and Entry

AC3 Accuracy, Completeness and Authenticity Checks

AC4 Processing Integrity and Validity

AC5 Output Review, Reconciliation and Error Handling

AC6 Transaction Authentication and Integrity

Each of these controls has stated objectives, and tests that can be made against the requirements, the proposed design and then on the built application. Clearly these are generic statements potentially applicable to any application, but they can serve as a valuable prompt to testers who are willing to adapt them to their own application. They are also a useful introduction for testers to the wider field of business controls.
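
As a sketch of how those prompts might be used in practice, the following mapping pairs each control with the sort of test condition it might suggest. The control titles are CobiT’s; the example conditions are my own, not taken from the guide.

```python
# The control titles come from CobiT 4.1; the prompts are illustrative
# examples of test conditions a tester might derive, not CobiT text.
application_control_prompts = {
    "AC1 Source Data Preparation and Authorization":
        "Can a transaction be raised from a source document that was never approved?",
    "AC2 Source Data Collection and Entry":
        "Are rejected or suspended items tracked until they are corrected and re-entered?",
    "AC3 Accuracy, Completeness and Authenticity Checks":
        "Are key fields validated against reference data rather than accepted as keyed?",
    "AC4 Processing Integrity and Validity":
        "Do control totals reconcile between input, processing and output?",
    "AC5 Output Review, Reconciliation and Error Handling":
        "Is every exception report produced, reviewed, and is the review evidenced?",
    "AC6 Transaction Authentication and Integrity":
        "Can a transaction be altered after authorization without being re-approved?",
}

for control, prompt in application_control_prompts.items():
    print(f"{control}\n  e.g. {prompt}\n")
```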

CobiT rather skates over the question of how the business requirements are defined, but these application controls can serve as a useful basis for validating the requirements.

Unfortunately the CobiT IT Assurance Guide can be downloaded for free only by members of ISACA (Information Systems Audit and Control Association) and costs $165 for non-members to buy. Try your friendly neighborhood Internal Audit department! If they don’t have a copy, well maybe they should.

If you are looking for a more constructive and proactive approach to the requirements then I recommend the Open Web Application Security Project (OWASP) Testing Guide [7]. This is an excellent, accessible document covering the whole range of application security, both technical vulnerabilities and business logic flaws.

It offers good, practical guidance to testers. It also offers a testing framework that is basic, and all the better for that, being simple and practical.

The OWASP testing framework demands early involvement of the testers, and runs from before the start of the project to reviews and testing of live applications.

Phase 1: Before development begins

1A: Review policies and standards

1B: Develop measurement and metrics criteria (ensure traceability)

Phase 2: During definition and design

2A: Review security requirements

2B: Review design and architecture

2C: Create and review UML models

2D: Create and review threat models

Phase 3: During development

3A: Code walkthroughs

3B: Code reviews

Phase 4: During deployment

4A: Application penetration testing

4B: Configuration management testing

Phase 5: Maintenance and operations

5A: Conduct operational management reviews

5B: Conduct periodic health checks

5C: Ensure change verification

OWASP suggests four test techniques for security testing; manual inspections and reviews, code reviews, threat modeling and penetration testing. The manual inspections are reviews of design, processes, policies, documentation and even interviewing people; everything except the source code, which is covered by the code reviews.

A feature of OWASP I find particularly interesting is its fairly explicit admission that the security requirements may be missing or inadequate. This is unquestionably a realistic approach, but usually testing models blithely assume that the requirements need tweaking at most.

The response of OWASP is to carry out what looks rather like reverse engineering of the design into the requirements. After the design has been completed testers should perform UML modeling to derive use cases that “describe how the application works. In some cases, these may already be available”. Obviously in many cases these will not be available, but the clear implication is that even if they are available they are unlikely to offer enough information to carry out threat modeling.

The feature most likely to be missing is the misuse case. These are the dark side of use cases! As envisaged by OWASP the misuse cases shadow the use cases, threatening them, then being mitigated by subsequent use cases.
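
As a minimal sketch of that relationship, assuming invented example content rather than anything taken from the OWASP guide:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UseCase:
    name: str

@dataclass
class MisuseCase:
    name: str
    threatens: UseCase
    mitigated_by: Optional[UseCase] = None

# Invented example: a login use case, the misuse case that shadows it,
# and the mitigating use case that answers the threat.
login = UseCase("Enter username and password")
guessing = MisuseCase("Guess passwords by repeated attempts", threatens=login)
guessing.mitigated_by = UseCase("Lock the account after repeated failed attempts")

# A review of the threat model might simply look for unmitigated misuse cases.
for misuse in [guessing]:
    status = "mitigated" if misuse.mitigated_by else "UNMITIGATED"
    print(f"{misuse.name} (threatens: {misuse.threatens.name}) -> {status}")
```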

The OWASP framework is not designed to be a checklist, to be followed blindly. The important point about using UML is that it permits the tester to decompose and understand the proposed application to the level of detail required for threat modeling, but also with the perspective that threat modeling requires: what can go wrong? What must we prevent? What could the bad guys get up to?

UML is simply a means to that end, and was probably chosen largely because that is what most developers are likely to be familiar with, and therefore UML diagrams are more likely to be available than other forms of documentation. There was certainly some debate in the OWASP community about what the best means of decomposition might be.

Personally, I have found IDEF0 a valuable means of decomposing applications while working as a computer auditor. Full details of this technique can be found at http://www.idef.com [9].

It entails decomposing an application using a hierarchical series of diagrams, each of which has between three and six functions. Each function has inputs, which are transformed into outputs, depending on controls and mechanisms.

Is IDEF0 as rigorous and effective as UML? No, I wouldn’t argue that. When using IDEF0 we did not define the application in anything like the detail that UML would entail. Its value was in allowing us to develop a quick understanding of the crucial functions and issues, and then ask pertinent questions.

Given that certain inputs must be transformed into certain outputs, what are the controls and mechanisms required to ensure that the right outputs are produced?

In working out what the controls were, or ought to be, we’d run through the mantra that the output had to be accurate, complete, authorized, and timely. “Accurate” and “complete” are obvious. “Authorized” meant that the output must have been created or approved by people with the appropriate level of authority. “Timely” meant that the output must not only arrive in the right place, but at the right time. One could also use the six CobiT application controls as prompts.
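
Here is a minimal sketch of how one IDEF0 function might be recorded, with the four-attribute mantra applied as a prompt against each output. The content is a simplified illustration based on the write-off example discussed in this article, not a full decomposition.

```python
# Illustrative sketch: one function from an IDEF0 decomposition, with the
# "accurate, complete, authorized, timely" prompts applied to each output.
idef0_function = {
    "name": "Write off a debt",
    "inputs": ["outstanding debt record"],
    "outputs": ["cancelled debt", "entry on the daily write-off control report"],
    "controls": ["write-off authority limits", "daily management review of the control report"],
    "mechanisms": ["sales operator", "billing system"],
}

control_prompts = ["accurate", "complete", "authorized", "timely"]

for output in idef0_function["outputs"]:
    for prompt in control_prompts:
        print(f"Is the output '{output}' {prompt}? Which control or mechanism ensures that it is?")
```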

In the example I gave above of the debt being written off I had worked down to the level of detail of “write off a debt” and looked at the controls required to produce the right output, “cancelled debts”. I focused on “authorized”, “complete” and “timely”.

Any sales operator could cancel a debt, but that raised the item for management review. That was fine. The problem was with “complete” and “timely”. All write-offs had to be collected for the control report, which was run daily. Was it possible to ensure some write-offs would not appear? Was it possible to over-key the default of the current date? It was possible. If I did so, would the write-off appear on another report? No. Overriding the date therefore meant that the daily control report could easily be bypassed.
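
Expressed as a test, the check might look something like the sketch below. The function names are hypothetical stand-ins for whatever mechanism would drive the real application and read its control report; they are not from any real system.

```python
from datetime import date, timedelta

# Hypothetical stand-ins for the real system under test; in practice these would
# drive the application and read the daily control report.
def write_off_debt(account, amount, effective_date):
    """Record a write-off; the date defaults to today but can be over-keyed."""
    pass  # placeholder

def control_report(report_date):
    """Return the write-offs collected on the control report for the given day."""
    return []  # placeholder

def test_backdated_write_off_still_reaches_a_control_report():
    """Over-keying the default date must not let a write-off escape the daily report."""
    yesterday = date.today() - timedelta(days=1)
    write_off_debt(account="12345", amount=250.00, effective_date=yesterday)

    # The write-off should appear on yesterday's report or today's. In the system
    # I reviewed it appeared on neither, which is the control failure described above.
    reported = control_report(yesterday) + control_report(date.today())
    assert any(item.get("account") == "12345" for item in reported)
```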

The testing that I was carrying out had nothing to do with the original requirements. They were of interest, but not really relevant to what I was trying to do. I was trying to think like a dishonest employee, looking for a weakness I could exploit.

The decomposition of the application is the essential first step of threat modeling. Following that, one should analyze the assets for importance, explore possible vulnerabilities and threats, and create mitigation strategies.

I don’t want to discuss these in depth. There is plenty of material about threat modeling available. OWASP offers good guidance [10, 11]. Microsoft provides some useful advice [12], but its focus is on technical security, whereas OWASP looks at the business logic too. The OWASP testing guide [7] has a section devoted to business logic that serves as a useful introduction.

OWASP’s inclusion of mitigation strategies in the version of threat modeling that it advocates for testers is interesting. This is not normally a tester’s responsibility. However, considering such strategies is a useful way of planning the testing. What controls or protections should we be testing for? I think it also implicitly acknowledges that the requirements and design may well be flawed, and that threat modeling might not have been carried out in circumstances where it really should have been.

This perception is reinforced by OWASP’s advice that testers should ensure that threat models are created as early as possible in the project, and should then be revisited as the application evolves.

What I think is particularly valuable about the application control advice in CobiT and OWASP is that they help us to focus on security as an attribute that can, and must, be built into applications. Security testing then becomes a normal part of functional testing, as well as a specialist technical exercise. Testers must not regard security as an audit concern, with the testing being carried out by quasi-auditors, external to the development.

Getting the auditors on our side

I’ve had a fairly unusual career in that I’ve spent several years in each of software development, IT audit, IT security management, project management and test management. I think that gives me a good understanding of each of these roles, and a sympathetic understanding of the problems and pressures associated with them. It’s also taught me how they can work together constructively.

In most cases this is obvious, but the odd one out is the IT auditor. They have the reputation of being the hard-nosed suits from head office who come in to bayonet the wounded after a disaster! If that is what they do then they are being unprofessional and irresponsible. Good auditors should be pro-active and constructive. They will be happy to work with developers, users and testers to help them anticipate and prevent problems.

Auditors will not do your job for you, and they will rarely be able to give you all the answers. They usually have to spread themselves thinly across an organization, inevitably concentrating on the areas that have problems and pose the greatest risk.

They should not be dictating the controls, but good auditors can provide useful advice. They can act as a valuable sounding board for ideas. They can also be used as reinforcements if the testers are coming under irresponsible pressure to restrict the scope of security testing. Good auditors should be the friend of testers, not our enemy. At the very least you may be able to get access to some useful, but expensive, CobiT material.

Auditors can give you a different perspective and help you ask the right questions, and being able to ask the right questions is much more important than any particular tool or method for testers.

This article tells you something about CobiT and OWASP, and about possible new techniques for approaching security testing. However, I think the most important lesson is that security testing cannot be a completely separate specialism, and that it must also include the exploration of the application’s functionality in a skeptical and inquisitive manner, asking the right questions.

Validating the security requirements is important, but so is exposing the unspoken requirements and disproving the invalid assumptions. It is about letting management see what the true state of the application is – just like the rest of testing.

References

[1] British Computer Society’s Special Interest Group in Software Testing (BCS SIGIST) Glossary.

[2] Gartner Inc. “Now Is the Time for Security at the Application Level” (NB PDF download), 2005.

[3] PricewaterhouseCoopers. “Economic crime: people, culture and controls. The 4th biennial Global Economic Crime Survey”.

[4] US Secret Service. “Insider Threat Study: Illicit Cyber Activity in the Banking and Finance Sector”.

[5] IT Governance Institute. CobiT 4.1, 2007.

[6] IT Governance Institute. CobiT IT Assurance Guide (not free), 2007.

[7] Open Web Application Security Project. OWASP Testing Guide, V3.0, 2008.

[8] Gupta, S. “SOX Compliant Agile Processes”, Agile Alliance Conference, Agile 2008.

[9] IDEF0 Function Modeling Method.

[10] Open Web Application Security Project. OWASP Threat Modeling, 2007.

[11] Open Web Application Security Project. OWASP Code Review Guide “Application Threat Modeling”, 2009.

[12] Microsoft. “Improving Web Application Security: Threats and Countermeasures”, 2003.

Do standards keep testers in the kindergarten? (2009)

This article appeared in the December 2009 edition of Testing Experience magazine, which is no longer published. I’m moving the article onto my blog from my website, which will be decommissioned soon.

Normally when I re-post old articles I provide a warning about them being dated. This one was written in November 2009 but I think that its arguments are still valid. It is only dated in the sense that it doesn’t mention ISO 29119, the current ISO software testing standard, which was released in 2013. This article shows why I was dismayed when ISO 29119 arrived on the scene. I thought that prescriptive testing standards, such as IEEE 829, had had their day. They had failed and we had moved on.

The references in the article were all structured for a paper magazine. There are no hyperlinks and I have not tried to recreate them and check out whether they still work.

The article

Discussion of standards usually starts from the premise that they are intrinsically a good thing, and the debate then moves on to consider what form they should take and how detailed they should be.

Too often sceptics are marginalised. The presumption is that standards are good and beneficial. Those who are opposed to them appear suspect, even unprofessional.

Although the content of standards for software development and testing can be valuable, especially within individual organisations, I do not believe that they should be regarded as generic “standards” for the whole profession. Turning useful guidelines into standards suggests that they should be mandatory.

My particular concern is that the IEEE 829 “Standard for Software and System Test Documentation”, and the many document templates derived from it, encourage a safety first approach to documentation, with testers documenting plans and scripts in slavish detail.

They do so not because the project genuinely requires it, but because they have been encouraged to equate documentation with quality, and they fear that they will look unprofessional and irresponsible in a subsequent review or audit. I think these fears are ungrounded and I will explain why.

A sensible debate about the value of standards must start with a look at what standards are, and the benefits that they bring in general, and specifically to testing.

Often discussion becomes confused because justification for applying standards in one context is transferred to a quite different context without any acknowledgement that the standards and the justification may no longer be relevant in the new context.

Standards can be internal to a particular organisation or they can be external standards attempting to introduce consistency across an industry, country or throughout the world.

I’m not going to discuss legal requirements enforcing minimum standards of safety, such as Health and Safety legislation, or the requirements of the US Food & Drug Administration. That’s the law, and it’s not negotiable.

The justification for technical and product standards is clear. Technical standards introduce consistency, common protocols and terminology. They allow people, services and technology to be connected. Product standards protect consumers and make it easier for them to distinguish cheap, poor quality goods from more expensive but better quality competition.

Standards therefore bring information and mobility to the market and thus have huge economic benefits.

It is difficult to see where standards for software development or testing fit into this. To a limited extent they are technical standards, but only so far as they define the terminology, and that is a somewhat incidental role.

They appear superficially similar to product standards, but software development is not a manufacturing process, and buyers of applications are not in the same position as consumers choosing between rival, inter-changeable products.

Are software development standards more like the standards issued by professional bodies? Again, there’s a superficial resemblance. However, standards such as Generally Accepted Accounting Principles (Generally Accepted Accounting Practice in the UK) are backed up by company law and have a force no-one could dream of applying to software development.

Similarly, standards of professional practice and competence in the professions are strictly enforced and failure to meet these standards is punished.

Where does that leave software development standards? I do believe that they are valuable, but not as standards.

Susan Land gave a good definition and justification for standards in the context of software engineering in her book “Jumpstart CMM-CMMI Software Process Improvements – using IEEE software engineering standards”. [1]

“Standards are consensus-based documents that codify best practice. Consensus-based standards have seven essential attributes that aid in process engineering. They;

  1. Represent the collected experience of others who have been down the same road.
  2. Tell in detail what it means to perform a certain activity.
  3. Help to assure that two parties attach the same meaning to an engineering activity.
  4. Can be attached to or referenced by contracts.
  5. Improve the product.
  6. Protect the business and the buyer.
  7. Increase professional discipline.” (List sequence re-ordered from original).

The first four justifications are for standards in a descriptive form, to aid communication. Standards of this type would have a broader remit than the technical standards I referred to, and they would be guidelines rather than prescriptive. These justifications are not controversial, although the fourth has interesting implications that I will return to later.

The last three justifications hint at compulsion. These are valid justifications, but they are for standards in a prescriptive form and I believe that these justifications should be heavily qualified in the context of testing.

I believe that where testing standards have value they should be advisory, and that the word “standard” is unhelpful. “Standards” implies that they should be mandatory, or that they should at least be considered a level of best practice to which all practitioners should aspire.

Is the idea of “best practice” useful?

I don’t believe that software development standards, specifically the IEEE series, should be mandatory, or that they can be considered best practice. Their value is as guidelines, which would be a far more accurate and constructive term for them.

I do believe that there is a role for mandatory standards in software development. The time-wasting shambles that is created if people don’t follow file naming conventions is just one example. Secure coding standards that tell programmers about security flaws that they must not introduce into their programs are also a good example of standards that should be mandatory.

However, these are local, site-specific standards. They are about consistency, security and good housekeeping, rather than attempting to define an over-arching vision of “best practice”.
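
To make the secure coding example concrete, here is a minimal sketch of the sort of rule such a site-specific standard might mandate: never build SQL statements by concatenating user input, always use parameterised queries. The code is illustrative only and not taken from any particular standard.

```python
import sqlite3

def find_customer_unsafe(conn, name):
    # The kind of code a secure coding standard would ban: user input concatenated
    # into SQL invites SQL injection.
    return conn.execute("SELECT * FROM customers WHERE name = '" + name + "'").fetchall()

def find_customer_safe(conn, name):
    # The kind of code the standard would require: a parameterised query, with the
    # input passed separately from the SQL text.
    return conn.execute("SELECT * FROM customers WHERE name = ?", (name,)).fetchall()

# Quick demonstration with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.execute("INSERT INTO customers VALUES ('Alice')")
print(find_customer_safe(conn, "Alice"))
```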

Testing standards should be treated as guidelines, practices that experienced practitioners would regard as generally sound and which should be understood and regarded as the default approach by inexperienced staff.

Making these practices mandatory “standards”, as if they were akin to technical or product standards and the best approach in any situation, will never ensure that experienced staff do a better job, and will often ensure they do a worse job than if they’d been allowed to use their own judgement.

Testing consultant Ben Simo has clear views on the notion of best practice. He told me;

“‘Best’ only has meaning in context. And even in a narrow context, what we think is best now may not really be the best.

In practice, ‘best practice’ often seems to be either something that once worked somewhere else, or a technical process required to make a computer system do a task. I like for words to mean something. If it isn’t really best, let’s not call it best.

In my experience, things called best practices are falsifiable as not being best, or even good, in common contexts. I like guidelines that help people do their work. The word ‘guideline’ doesn’t imply a command. Guidelines can help set some parameters around what and how to do work and still give the worker the freedom to deviate from the guidelines when it makes sense.”

“Rather than tie people’s hands and minds with standards and best practices, I like to use guidelines that help people think and communicate lessons learned – allowing the more experienced to share some of their wisdom with the novices.”

Such views cannot be dismissed as the musings of maverick testers who can’t abide the discipline and order that professional software development and testing require.

Ben is the President of the Association for Software Testing. His comments will be supported by many testers who can see how they match their own experience. Also, there has been some interesting academic work that justifies such scepticism about standards. Interestingly, it has not come from orthodox IT academics.

Lloyd Roden drew on the work of the Dreyfus brothers as he presented a powerful argument against the idea of “best practice” at Starwest 2009 and the TestNet Najaarsevent. Hubert Dreyfus is a philosopher and psychologist and Stuart Dreyfus works in the fields of industrial engineering and artificial intelligence.

In 1980 they wrote an influential paper that described how people pass through five levels of competence as they move from novice to expert status, and analysed how rules and guidelines helped them along the way. The five levels of the Dreyfus Model of Skills Acquisition can be summarised as follows.

  1. Novices require rules that can be applied in narrowly defined situations, free of the wider context.
  2. Advanced beginners can work with guidelines that are less rigid than the rules that novices require.
  3. Competent practitioners understand the plan and goals, and can evaluate alternative ways to reach the goal.
  4. Proficient practitioners have sufficient experience to foresee the likely result of different approaches and can predict what is likely to be the best outcome.
  5. Experts can intuitively see the best approach. Their vast experience and skill mean that rules and guidelines have no practical value.

For novices the context of the problem presents potentially confusing complications. Rules provide clarity. For experts, understanding the context is crucial and rules are at best an irrelevant hindrance.

Roden argued that we should challenge any references to “best practices”. We should talk about good practices instead, and know when and when not to apply them. He argued that imposing “best practice” on experienced professionals stifles creativity, frustrates the best people and can prompt them to leave.

However, the problem is not simply a matter of “rules for beginners, no rules for experts”. Rules can have unintended consequences, even for beginners.

Chris Atherton, a senior lecturer in psychology at the University of Central Lancashire, made an interesting point in a general, anecdotal discussion about the ways in which learners relate to rules.

“The trouble with rules is that people cling to them for reassurance, and what was originally intended as a guideline quickly becomes a noose.

The issue of rules being constrictive or restrictive to experienced professionals is a really interesting one, because I also see it at the opposite end of the scale, among beginners.”

“Obviously the key difference is that beginners do need some kind of structural scaffold or support; but I think we often fail to acknowledge that the nature of that early support can seriously constrain the possibilities apparent to a beginner, and restrict their later development.”

The issue of whether rules can hinder the development of beginners has significant implications for the way our profession structures its processes. Looking back at work I did at the turn of the decade improving testing processes for an organisation that was aiming for CMMI level 3, I worry about the effect it had.

Independent professional testing was a novelty for this client and the testers were very inexperienced. We did the job to the best of our ability at the time, and our processes were certainly considered best practice by my employers and the client.

The trouble is that people can learn, change and grow faster than strict processes adapt. A year later and I’d have done it better. Two years later, it would have been different and better, and so on.

Meanwhile, the testers would have been gaining in experience and confidence, but the processes I left behind were set in tablets of stone.

As Ben Simo put it; “if an organisation is at a level less than the intent of level 5, CMM seems to often lock in ignorance that existed when the process was created”.

CMMI has its merits but also has dangers. Continuous process improvement is at its heart, but these are incremental advances and refinements in response to analysis of metrics.

Step changes or significant changes in response to a new problem don’t fit comfortably with that approach. Beginners advance from the first stage of the Dreyfus Model, but the context they come to know and accept is one of rigid processes and rules.

Rules, mandatory standards and inflexible processes can hinder the development of beginners. Rigid standards don’t promote quality. They can have the opposite effect if they keep testers in the kindergarten.

IEEE 829 and the motivation behind documentation

One could argue that standards do not have to be mandatory. Software developers are pragmatic, and understand when standards should be mandatory and when they should be discretionary. That is true, but the problem is that the word “standards” strongly implies compulsion. That is the interpretation that most outsiders would place on the word.

People do act on the assumption that the standard should be mandatory, and then regard non-compliance as a failure, deviation or problem. These people include accountants and lawyers, and perhaps most significantly, auditors.

My particular concern is the effect of IEEE 829 testing documentation standard. I wonder if much more than 1% of testers have ever seen a copy of the standard. However, much of its content is very familiar, and its influence is pervasive.

IEEE 829 is a good document with much valuable material in it. It has excellent templates, which provide great examples of how to meticulously document a project.

Or at least they’re great examples of meticulous documentation if that is the right approach for the project. That of course is the question that has to be asked. What is the right approach? Too often the existence of a detailed documentation standard is taken as sufficient justification for detailed documentation.

I’m going to run through two objections to detailed documentation. They are related, but one refers to design and the other to testing. It could be argued that both have their roots in psychology as much as IT.

I believe that the fixation of many projects on documentation, and the highly dubious assumption that quality and planning are synonymous with detailed documentation, have their roots in the structured methods that dominated software development for so long.

These methods were built on the assumption that software development was an engineering discipline, rather than a creative process, and that greater quality and certainty in the development process could be achieved only through engineering style rigour and structure.

Paul Ward, one of the leading developers of structured methods, wrote a series of articles [2] on the history of structured methods, which admitted that they were neither based on empirical research nor subjected to peer-review.

Two other proponents of structured methods, Larry Constantine and Ed Yourdon, admitted that the early investigations were no more than informal “noon-hour” critiques [3].

Fitzgerald, Russo and Stolterman gave a brief history of structured methods in their book “Information Systems Development – Methods in Action” [4] and concluded that “the authors relied on intuition rather than real-world experience that the techniques would work”.

One of the main problem areas for structured methods was the leap from the requirements to the design. Fitzgerald et al wrote that “the creation of hierarchical structure charts from data flow diagrams is poorly defined, thus causing the design to be loosely coupled to the results of the analysis. Coad & Yourdon [5] label this shift as a ‘Grand Canyon’ due to its fundamental discontinuity.”

The solution to this discontinuity, according to the advocates of structured methods, was an avalanche of documentation to help analysts to crawl carefully from the current physical system, through the current logical system to a future logical system and finally a future physical system.

Not surprisingly, given the massive documentation overhead, and developers’ propensity to pragmatically tailor and trim formal methods, this full process was seldom followed. What was actually done was more informal, intuitive, and opaque to outsiders.

An interesting strand of research was pursued by Human-Computer Interaction (HCI) academics such as Curtis, Iscoe and Krasner [6], and Robbins, Hilbert and Redmiles [7].

They attempted to identify the mental processes followed by successful software designers when building designs. Their conclusion was that they did so using a high-speed, iterative process; repeatedly building, proving and refining mental simulations of how the system might work.

Unsuccessful designers couldn’t conceive working simulations, and fixed on designs whose effectiveness they couldn’t test till they’d been built.

Curtis et al wrote;

“Exceptional designers were extremely familiar with the application domain. Their crucial contribution was their ability to map between the behavior required of the application system and the computational structures that implemented this behavior.

In particular, they envisioned how the design would generate the system behavior customers expected, even under exceptional circumstances.”

Robbins et al stressed the importance of iteration;

“The cognitive theory of reflection-in-action observes that designers of complex systems do not conceive a design fully-formed. Instead, they must construct a partial design, evaluate, reflect on, and revise it, until they are ready to extend it further.”

The eminent US software pioneer Robert Glass discussed these studies in his book “Software Conflict 2.0” [8] and observed that;

“people who are not very good at design … tend to build representations of a design rather than models; they are then unable to perform simulation runs; and the result is they invent and are stuck with inadequate design solutions.”

These studies fatally undermine the argument that linear and documentation driven processes are necessary for a quality product and that more flexible, light-weight documentation approaches are irresponsible.

Flexibility and intuition are vital to developers. Heavyweight documentation can waste time and suffocate staff if used when there is no need.

Ironically, it was the heavyweight approach that was founded on guesswork and intuition, and the lightweight approach that has sound conceptual underpinnings.

The lessons of the HCI academics have obvious implications for exploratory testing, which again is rooted in psychology as much as in IT. In particular, the finding by Curtis et al that “exceptional designers were extremely familiar with the application domain” takes us to the heart of exploratory testing.

What matters is not extensive documentation of test plans and scripts, but deep knowledge of the application. These need not be mutually exclusive, but on high-pressure, time-constrained projects it can be hard to do both.

Itkonen, Mäntylä and Lassenius conducted a fascinating experiment at the University of Helsinki in 2007 in which they tried to compare the effectiveness of exploratory testing and test case based testing. [9]

Their findings were that test case testing was no more effective in finding defects. The defects were a mixture of native defects in the application and defects seeded by the researchers. Defects were categorised according to the ease with which they could be found. Defects were also assigned to one of eight defect types (performance, usability etc.).

Exploratory testing scored better for defects at all four levels of “ease of detection”, and in 6 out of the 8 defect type categories. The differences were not considered statistically significant, but it is interesting that exploratory testing had the slight edge given that conventional wisdom for many years was that heavily documented scripting was essential for effective testing.

However, the really significant finding, which the researchers surprisingly did not make great play of, was that the exploratory testing results were achieved with 18% of the effort of the test case testing.

The exploratory testing required 1.5 hours per tester, while the test case testing required an average of 8.5 hours (7 hours of preparation and 1.5 hours of testing); 1.5 hours is roughly 18% of 8.5 hours.

It is possible to criticise the methods of the researchers, particularly their use of students taking a course in software testing, rather than professionals experienced in applying the techniques they were using.

However, exploratory testing has often been presumed to be suitable only for experienced testers, with scripted, test case based testing being more appropriate for the less experienced.

The methods followed by the Helsinki researchers might have been expected to bias the results in favour of test case testing. Therefore, the finding that exploratory testing is at least as effective as test case testing with a fraction of the effort should make proponents of heavily documented test planning pause to reconsider whether it is always appropriate.

Documentation per se does not produce quality. Quality is not necessarily dependent on documentation. Sometimes they can be in conflict.

Firstly, the emphasis on producing the documentation can be a major distraction for test managers. Most of their effort goes into producing, refining and updating plans that often bear little relation to reality.

Meanwhile the team are working hard firming up detailed test cases based on an imperfect and possibly outdated understanding of the application. While the application is undergoing the early stages of testing, with consequent fixes and changes, detailed test plans for the later stages are being built on shifting sand.

You may think that is being too cynical and negative, and that testers will be able to produce useful test cases based on a correct understanding of the system as it is supposed to be delivered to the testing stage in question. However, even if that is so, the Helsinki study shows that this is not a necessary condition for effective testing.

Further, if similar results can be achieved with less than 20% of the effort, how much more could be achieved if the testers were freed from the documentation drudgery in order to carry out more imaginative and proactive testing during the earlier stages of development?

Susan Land’s fourth justification for standards (see start of article) has interesting implications.

Standards “can be attached to or referenced by contracts”. That is certainly true. However, the danger of detailed templates in the form of a standard is that organisations tailor their development practices to the templates rather than the other way round.

If the lawyers fasten onto the standard and write its content into the contract then documentation can become an end and not just a means to an end.

Documentation becomes a “deliverable”. The dreaded phrase “work product” is used, as if the documentation output is a product of similar value to the software.

In truth, the documentation can sometimes become more valuable to the supplier than the software itself, if the payments are staged under the terms of the contract and depend on the production of satisfactory documentation.

I have seen triumphant announcements of “success” following approval of “work products” with the consequent release of payment to the supplier when I have known the underlying project to be in a state of chaos.

Formal, traditional methods attempt to represent a highly complex, even chaotic, process in a defined, repeatable model. These methods often bear only vague similarities to what developers have to do to craft applications.

The end product is usually poor quality, late and over budget. Any review of the development will find constant deviations from the mandated method.

The suppliers, and defenders, of the method can then breathe a sigh of relief. The sacred method was not followed. It was the team’s fault. If only they’d done it by the book! The possibility that the developers’ and testers’ apparent sins were the only reason anything was produced at all is never considered.

What about the auditors?

Adopting standards like IEEE 829 without sufficient thought causes real problems. If the standard doesn’t reflect what really has to be done to bring the project to a successful conclusion then mandated tasks or documents may be ignored or skimped on, with the result that a subsequent review or audit reports on a failure to comply.

An alternative danger is that testers do comply when there is no need, and put too much effort into the wrong things. Often testers arrive late on the project. Sometimes the emphasis is on catching up with plans and documentation that are of dubious value, and are not an effective use of the limited resources and time.

However, if the contract requires it, or if there is a fear of the consequences of an audit, then it could be rational to assign valuable staff to unproductive tasks.

Sadly, auditors are often portrayed as corporate bogey-men. It is assumed that they will conduct audits by following ticklists, with simplistic questions that require yes/no answers. “Have you done x to y, yes or no”.

If the auditees start answering “No, but …” they would be cut off with “So, it’s no”.

I have seen that style of auditing. It is unprofessional and organisations that tolerate it have deeper problems than unskilled, poorly trained auditors. It is senior management that creates the environment in which the ticklist approach thrives. However, I don’t believe it is common. Unfortunately people often assume that this style of auditing is the norm.

IT audit is an interesting example of a job that looks extremely easy at first sight, but is actually very difficult when you get into it.

It is very easy for an inexperienced auditor to do what appears to be a decent job. At least it looks competent to everyone except experienced auditors and those who really understand the area under review.

If auditors are to add value they have to be able to use their judgement, and that has to be based on their own skills and experience as well as formal standards.

They have to be able to analyse a situation and evaluate whether the risks have been identified and whether the controls are appropriate to the level of risk.

It is very difficult to find the right line and you need good experienced auditors to do that. I believe that ideally IT auditors should come from an IT background so that they do understand what is going on; poachers turned gamekeepers if you like.

Too often testers assume that they know what auditors expect, and they do not speak directly to the auditors or check exactly what professional auditing consists of.

They assume that auditors expect to see detailed documentation of every stage, without consideration of whether it truly adds value, promotes quality or helps to manage the risk.

Professional auditors take a constructive and pragmatic approach and can help testers. I want to help testers understand that. When I worked as an IT auditor I used to find it frustrating to discover that people had wasted time on unnecessary and unhelpful actions on the assumption that “the auditors require it”.

Kanwal Mookhey, an IT auditor and founder of NII consulting, wrote an interesting article for the Internal Auditor magazine of May 2008 [10] about auditing IT project management.

He described the checking that auditors should carry out at each stage of a project. He made no mention of the need to see documentation of detailed test plans and scripts whereas he did emphasize the need for early testing.

Kanwal told me.

“I would agree that auditors are – or should be – more inclined to see comprehensive testing, rather than comprehensive test documentation.

Documentation of test results is another matter of course. As an auditor, I would be more keen to know that a broad-based testing manual exists, and that for the system in question, key risks and controls identified during the design phase have been tested for. The test results would provide a higher degree of assurance than exhaustive test plans.”

One of the most significant developments in the field of IT governance in the last few decades has been the US 2002 Sarbanes-Oxley Act, which imposed new standards of reporting, auditing and control for US companies. It has had massive worldwide influence because it applies to the foreign subsidiaries of US companies, and foreign companies that are listed on the US stock exchanges.

The act attracted considerable criticism for the additional overheads it imposed on companies, duplicating existing controls and imposing new ones of dubious value.

Unfortunately, the response to Sarbanes-Oxley verged on the hysterical, with companies, and unfortunately some auditors, reading more into the legislation than a calmer reading could justify. The assumption was that every process and activity should be tied down and documented in great detail.

However, not even Sarbanes-Oxley, supposedly the sacred text of extreme documentation, requires detailed documentation of test plans or scripts. That may be how some people misinterpret the act. It is neither mandated by the act nor recommended in the guidance documents issued by the Institute of Internal Auditors [11] and the Information Systems Audit & Control Association [12].

If anyone tries to justify extensive documentation by telling you that “the auditors will expect it”, call their bluff. Go and speak to the auditors. Explain that what you are doing is planned, responsible and will have sufficient documentation of the test results.

Documentation is never required “for the auditors”. If it is required it is because it is needed to manage the project, or it is a requirement of the project that has to be justified like any other requirement. That is certainly true of safety critical applications, or applications related to pharmaceutical development and manufacture. It is not true in all cases.

IEEE 829 and other standards do have value, but in my opinion their value is not as standards! They do contain some good advice and the fruits of vast experience. However, they should be guidelines to help the inexperienced, and memory joggers for the more experienced.

I hope this article has made people think about whether mandatory standards are appropriate for software development and testing, and whether detailed documentation in the style of IEEE 829 is always needed. I hope that I have provided some arguments and evidence that will help testers persuade others of the need to give testers the freedom to leave the kindergarten and grow as professionals.

References

[1] Land, S. (2005). “Jumpstart CMM-CMMI Software Process Improvements – using IEEE software engineering standards”, Wiley.

[2a] Ward, P. (1991). “The evolution of structured analysis: Part 1 – the early years”. American Programmer, vol 4, issue 11, 1991. pp4-16.

[2b] Ward, P. (1992). “The evolution of structured analysis: Part 2 – maturity and its problems”. American Programmer, vol 5, issue 4, 1992. pp18-29.

[2c] Ward, P. (1992). “The evolution of structured analysis: Part 3 – spin offs, mergers and acquisitions”. American Programmer, vol 5, issue 9, 1992. pp41-53.

[3] Yourdon, E., Constantine, L. (1977) “Structured Design”. Yourdon Press, New York.

[4] Fitzgerald B., Russo N., Stolterman, E. (2002). “Information Systems Development – Methods in Action”, McGraw Hill.

[5] Coad, P., Yourdon, E. (1991). “Object-Oriented Analysis”, 2nd edition. Yourdon Press.

[6] Curtis, B., Iscoe, N., Krasner, H. (1988). “A field study of the software design process for large systems” (NB PDF download). Communications of the ACM, Volume 31, Issue 11 (November 1988), pp1268-1287.

[7] Robbins, J., Hilbert, D., Redmiles, D. (1998). “Extending Design Environments to Software Architecture Design” (NB PDF download). Automated Software Engineering, Vol. 5, No. 3, July 1998, pp261-290.

[8] Glass, R. (2006). “Software Conflict 2.0: The Art and Science of Software Engineering” Developer Dot Star Books.

[9a] Itkonen, J., Mäntylä, M., Lassenius C., (2007). “Defect detection efficiency – test case based vs exploratory testing”. First International Symposium on Empirical Software Engineering and Measurement. (Payment required).

[9b] Itkonen, J. (2008). “Do test cases really matter? An experiment comparing test case based and exploratory testing”.

[10] Mookhey, K. (2008). “Auditing IT Project Management”. Internal Auditor, May 2008, the Institute of Internal Auditors.

[11] The Institute of Internal Auditors (2008). “Sarbanes-Oxley Section 404: A Guide for Management by Internal Controls Practitioners”.

[12] Information Systems Audit and Control Association (2006). “IT Control Objectives for Sarbanes-Oxley 2nd Edition”.

What happens to usability when development goes offshore? (2009)

This article appeared in the March 2009 edition of Testing Experience magazine, which is no longer published. I’m moving the article onto my blog from my website, which will be decommissioned soon.

If you choose to read the article please bear in mind that it was written in January 2009 and is therefore inevitably dated in some respects, though I think that much of it is still valid.

The references in the article were all structured for a paper magazine. There are no hyperlinks and I have not tried to recreate them and check out whether they still work.

The article

Two of the most important trends in software development over the last 20 years have been the increasing number of companies sending development work to cheaper labour markets, and the increasing attention that is paid to the usability of applications.

Developers in Europe and North America cannot have failed to notice the trend to offshore development work, and they worry about the long-term implications.

Usability, however, has had a mixed history. Many organizations and IT professionals have been deeply influenced by the need to ensure that their products and applications are usable; many more are only vaguely aware of this trend and do not take it seriously.

As a result, many developers and testers have missed the significant implications that offshoring has for building usable applications, and underestimate the problems of testing for usability. I will try to explain these problems, and suggest possible remedial actions that testers can take if they find themselves working with offshore developers. I will be looking mainly at India, the largest offshoring destination, because information has been more readily available. However, the problems and lessons apply equally to other offshoring countries.

According to NASSCOM, the main trade body representing the Indian IT industry [1], the number of IT professionals employed in India rose from 430,000 in 2001 to 2,010,000 in 2008. The number employed by offshore IT companies rose tenfold, from 70,000 to 700,000.

It is hard to say how many of these are usability professionals. Certainly at the start of the decade there were only a handful in India. Now, according to Jhumkee Iyengar, of User in Design, “a guesstimate would be around 5,000 to 8,000”. At most that’s about 0.4% of the people working in IT. Even if they were all involved in offshore work they would represent no more than 1% of the total.

Does that matter? Jakob Nielsen, the leading usability expert, would argue that it does. His rule of thumb [2] is that “10% of project resources must be devoted to usability to ensure a good quality product”.

Clearly India is nowhere near capable of meeting that figure. To be fair, the same can be said of the rest of the world given that 10% represents Nielsen’s idea of good practice, and most organizations have not reached that stage. Further, India traditionally provided development of “back-office” applications, which are less likely to have significant user interaction.

Nevertheless, the shortage of usability professionals in the offshoring destinations does matter. Increasingly offshore developments have involved greater levels of user interaction, and any shortage of usability expertise in India will damage the quality of products.

Sending development work offshore always introduces management and communication problems. Outsourcing development, even within the same country, poses problems for usability. When the development is both offshored and outsourced, the difficulties in producing a usable application multiply. If there are no usability professionals on hand, the danger is that the developers will not only fail to resolve those problems – they will probably not even recognize that they exist.

Why can outsourcing be a problem for usability?

External software developers are subject to different pressures from internal developers, and this can lead to poorer usability. I believe that external suppliers are less likely to be able to respond to the users’ real needs, and research supports this [3, 4, 5, 6, 7].

Obviously external suppliers have different goals from internal developers. Their job is to deliver an application that meets the terms of the contract and makes a profit in doing so. Requirements that are missed or are vague are unlikely to be met, and usability requirements all too often fall into one of these categories. This is not simply a matter of a lack of awareness. Usability is a subjective matter, and it is difficult to specify precise, objective, measurable and testable requirements. Indeed, trying too hard to do so can be counter-productive if the resulting requirements are too prescriptive and constrain the design.

A further problem is that the formal nature of contractual relationships tends to push clients towards more traditional, less responsive and less iterative development processes, with damaging results for usability. If users and developers are based in different offices, working for different employers, then rapid informal feedback becomes difficult.

Some of the studies that found these problems date back to the mid 90s. However, they contain lessons that remain relevant now. Many organizations have still not taken these lessons on board, and they are therefore facing the same problems that others confronted 10 or even 20 years ago.

How can offshoring make usability problems worse?

So, if simple outsourcing to a supplier in the same country can be fraught with difficulty, what are the usability problems that organizations face when they offshore?

Much of the discussion of this harks back to an article by Jakob Nielsen in 2002 [2]. Nielsen stirred up plenty of discussion about the problem, much of it critical.

“Offshore design raises the deeper problem of separating interaction designers and usability professionals from the users. User-centered design requires frequent access to users: the more frequent the better.”

If the usability professionals need to be close to the users, can they stay onshore and concentrate on the design while the developers build offshore? Nielsen was emphatic on that point.

“It is obviously not a solution to separate design and implementation since all experience shows that design, usability, and implementers need to proceed in tight co-ordination. Even separating teams across different floors in the same building lowers the quality of the resulting product (for example, because there will be less informal discussions about interpretations of the design spec).”

So, according to Nielsen, the usability professionals have to follow the developers offshore. However, as we’ve seen, the offshore countries have nowhere near enough trained professionals to cover the work. Numbers are increasing, but not by enough to keep pace with the growth in offshore development, never mind the demands of local commerce.

This apparent conundrum has been dismissed by many people who have pointed out, correctly, that offshoring is not an “all or nothing” choice. Usability does not have to follow development. If usability is a concern, then user design work can be retained onshore, and usability expertise can be deployed in both countries. This is true, but it is a rather unfair criticism of Nielsen’s arguments. The problem he describes is real enough. The fact that it can be mitigated by careful planning certainly does not mean the problem can be ignored.

User-centred design assumes that developers, usability analysts and users will be working closely together. Offshoring the developers forces organizations to make a choice between two unattractive options; separating usability professionals from the users, or separating them from the developers.

It is important that organizations acknowledge this dilemma and make the choice explicitly, based on their needs and their market. Every responsible usability professional would be keenly aware that their geographical separation from their users was a problem, and so those organizations that hire usability expertise offshore are at least implicitly acknowledging the problems caused by offshoring. My concern is for those organizations who keep all the usability professionals onshore and either ignore the problems, or assume that they don’t apply in their case.

How not to tackle the problems

Jhumkee Iyengar has studied the responses of organizations wanting to ensure that offshore development will give them usable applications [8]. Typically they have tried to do so without offshore usability expertise. They have used two techniques sadly familiar to those who have studied usability problems; defining the user interaction requirements up-front and sending a final, frozen specification to the offshore developers, or adopting the flawed and fallacious layering approach.

Attempting to define detailed up-front requirements is anathema to good user-centred design. It is consistent with the Waterfall approach and is attractive because it is neat and easy to manage (as I discussed in my article on the V Model in Testing Experience, issue 4).

Building a usable application that allows users and customers to achieve their personal and corporate goals painlessly and efficiently requires iteration, prototyping and user involvement that is both early in the lifecycle and repeated throughout it.

The layering approach was based on the fallacy that the user interface could be separated from the functionality of the application, and that each could be developed separately. This fallacy was very popular in the 80s and 90s. Its influence has persisted, not because it is valid, but because it lends an air of spurious respectability to what people want to do anyway.

Academics expended a huge amount of effort trying to justify this separability. Their arguments, their motives and the consequences of their mistake are worth a full article in their own right. I’ll restrict myself to saying that the notion of separability was flawed on three counts.

  • It was flawed conceptually because usability is a product of the experience of the user with the whole application, not just the interface.
  • It was flawed architecturally, because design decisions taken by system architects can have a huge impact on the user experience.
  • Finally, it was flawed in political and organizational terms because it encouraged usability professionals to retreat into a ghetto, isolated from the hubbub of the developers, where they would work away on an interface that could be bolted onto the application in time for user testing.

Lewis & Rieman [3] memorably savaged the idea that usability professionals could hold themselves aloof from the application design, calling it “the peanut butter theory of usability”

“Usability is seen as a spread that can be smeared over any design, however dreadful, with good results if the spread is thick enough. If the underlying functionality is confusing, then spread a graphical user interface on it. … If the user interface still has some problems, smear some manuals over it. If the manuals are still deficient, smear on some training which you force users to take.”

If the usability professionals stay onshore, and adopt either the separability or the peanut butter approach, the almost inescapable result is that they will be delegating decisions about usability to the offshore developers.

Developers are just about the worst people to take these decisions. They are too close to the application, and instinctively see workarounds to problems that might appear insoluble to real users.

Developers also have a different mindset when approaching technology. Even if they understand the business context of the applications they can’t unlearn their technical knowledge and experience to see the application as a user would; and that is when developers and users are in the same country. The cultural differences are magnified if they are based in different continents.

The relative lack of maturity of usability in the offshoring destinations means that developers often have an even less sophisticated approach than developers in the client countries. User interaction is regarded as an aesthetic matter restricted to the interface, with the developers more interested in the guts of the application.

Pradeep Henry reported in 2003 that most user interfaces at Indian companies were being designed by programmers, and that in his opinion they had great difficulty switching from their normal technical, system-focused mindset to that of a user. [9]

They also had very little knowledge of user centered design techniques. This is partly a matter of education, but there is more to it. In explaining the shortage of usability expertise in India, Jhumkee Iyengar told me that she believes important factors are the “phenomenal success of Indian IT industry, which leads people to question the need for usability, and the offshore culture, which has traditionally been a ‘back office culture’ not conducive to usability”.

The situation is, however, certainly improving. Although the explosive growth in development work in India, China and Eastern Europe has left the usability profession struggling to keep up, the number of usability experts has grown enormously over the last 10 years. There are nowhere near enough, but there are firms offering this specialist service keen to work with offshoring clients.

This trend is certain to continue because usability is a high value service. It is a hugely attractive route to follow for these offshore destinations, complementing and enhancing the traditional offshore development service.

Testers must warn of the dangers

The significance of all this from the perspective of testers is that even though usability faces significant threats when development is offshored, there are ways to reduce the dangers and the problems. They cannot be removed entirely, but offshoring offers such cost savings that it will continue to grow, and it is important that testers working for client companies understand these problems and can anticipate them.

Testers may not always, or often, be in a position to influence whether usability expertise is hired locally or offshore. However, they can flag up the risks of whatever approach is used, and adopt an appropriate test strategy.

The most obvious danger is if an application has significant interaction with the user and there is no specialist usability expertise on the project. As I said earlier, this could mean that the project abdicates responsibility for crucial usability decisions to the offshore developers.

Testers should try to prevent a scenario where the interface and user interaction are pieced together offshore, and thrown “over the wall” to the users back in the client’s country for acceptance testing when it may be too late to fix even serious usability defects.

Is it outside the traditional role of a tester to lobby project management to try and change the structure of the project? Possibly, but if testers can see that the project is going to be run in a way that makes it hard to do their job effectively then I believe they have a responsibility to speak out.

I’m not aware of any studies looking at whether outsourcing contracts (or managed service agreements) are written in such prescriptive detail that they restrict the ability of test managers to tailor their testing strategy to the risks they identify. However, going by my experience and the anecdotal evidence I’ve heard, this is not an issue. Testing is not usually covered in detail in contracts, thus leaving considerable scope to test managers who are prepared to take the initiative.

Although I’ve expressed concern about the dangers of relying on a detailed up front specification there is no doubt that if the build is being carried out by offshore developers then they have to be given clear, detailed, unambiguous instructions.

The test manager should therefore set a test strategy that allows for significant static testing of the requirements documents. These should be shaped by walkthroughs and inspections to check that the usability requirements are present, complete, stated in sufficient detail to be testable, yet not defined so precisely that they constrain the design and rule out what might have been perfectly acceptable solutions to the requirements.

Once the offshore developers have been set to work on the specification it is important that there is constant communication with them and ongoing static testing as the design is fleshed out.

Hienadz Drahun leads an offshore interaction design team in Belarus. He stresses the importance of communication. He told me that “communication becomes a crucial point. You need to maintain frequent and direct communication with your development team.”

Dave Cronin of the US usability design consultancy Cooper wrote an interesting article about this in 2004 [10]:

“We already know that spending the time to holistically define and design a software product dramatically increases the likelihood that you will deliver an effective and pleasurable experience to your customers, and that communication is one of the critical ingredients to this design process. All this appears to be even more true if you decide to have the product built in India or Eastern Europe.

To be absolutely clear, to successfully outsource product development, it is crucial that every aspect of the product be specifically defined, designed and documented. The kind of hand-waving you may be accustomed to when working with familiar and well-informed developers will no longer suffice.”

Significantly, Cronin did not mention testing anywhere in his article, though he did mention “feedback” during the design process.

The limits of usability testing

One of the classic usability mistakes is to place too much reliance on usability testing. In fact, I’ve heard it argued that there is no such thing as usability testing. It’s a provocative argument, but it has some merit.

If usability depends solely on testing, it will be left to the end of development, and serious defects will be discovered too late in the project to be fixed.

“They’re only usability problems, the users can work around them” is the cry from managers under pressure to implement on time. Usability must be incorporated into the design stages, with possible solutions being evaluated and refined. Usability is therefore produced not by testing, but by good design practices.

Pradeep Henry called his new team “Usability Lab” when he introduced usability to Cognizant, the offshore outsourcing company, in India. However, the name and the sight of the testing equipment in the lab encouraged people to equate usability with testing. As Henry explained:

“Unfortunately, equating usability with testing leads people to believe that programmers or graphic designers should continue to design the user interface and that usability specialists should be consulted only later for testing.”

Henry renamed his team the Cognizant Usability Group (now the Cognizant Usability Center of Excellence). [9]

Tactical improvements testers can make

So if usability evaluation has to be integrated into the whole development process then what can testers actually do in the absence of usability professionals? Obviously it will be difficult, but if iteration is possible during design, and if you can persuade management that there is a real threat to quality then you can certainly make the situation a lot better.

There is a lot of material readily available to guide you. I would suggest the following.

Firstly, Jakob Nielsen’s Discount Usability Engineering [11] consists of cheap prototyping (maybe just paper based), heuristic (rule based) evaluation and getting users to talk their way through the application, thinking out loud as they work their way through a scenario.

Steve Krug’s “lost our lease” usability testing basically says that any usability testing is better than none, and that quick and crude testing can be both cheap and effective. Krug’s focus is more on the management of this approach rather than the testing techniques themselves, so it fits with Nielsen’s DUE, rather than being an alternative in my opinion. It’s all in his book “Don’t make me think”. [12]

Lockwood & Constantine’s Collaborative Usability Inspections offer a far more formal and rigorous approach, though you may be stretching yourself to take this on without usability professionals. It entails setting up formal walk-throughs of the proposed design, then iteration to remove defects and improve the product. [13, 14, 15]

On a lighter note, Alan Cooper’s book “The inmates are running the asylum” [15], is an entertaining rant on the subject. Cooper’s solution to the problem is his Interaction Design approach. The essence of this is that software development must include a form of functional analysis that seeks to understand the business problem from the perspective of the users, based on their personal and corporate goals, working through scenarios to understand what they will want to do.

Cooper’s Interaction Design strikes a balance between the old, flawed extremes of structured methods (which ignored the individual) and traditional usability (which often paid insufficient attention to the needs of the organization). I recommend this book not because I think that a novice could take this technique on board and make it work, but because it is very readable and might make you question your preconceptions and think about what is desirable and possible.

Longer term improvements

Of course it’s possible that you are working for a company that is still in the process of offshoring and where it is still possible to influence the outcome. It is important that the invitation to tender includes a requirement that suppliers can prove expertise and experience in usability engineering. Additionally, the client should expect potential suppliers to show they can satisfy the following three criteria.

  • The supplier should have a process or lifecycle model that not only has usability engineering embedded within it but that also demonstrates how the onshore and offshore teams will work together to achieve usability. The process must involve both sides.

    Offshore suppliers have put considerable effort into developing such frameworks. Three examples are Cognizant’s “End-to-End UI Process”, HFI’s “Schaffer-Weinschenk Method™” and Persistent’s “Overlap Usability”.

  • Suppliers should carry out user testing with users from the country where the application will be used. The cultural differences are too great to use people who happen to be easily available to the offshore developers.

    Remote testing entails usability experts based in one location conducting tests with users based elsewhere, even on another continent. It would probably not be the first choice of most usability professionals, but it is becoming increasingly important. As Jhumkee Iyengar told me, it “may not be the best, but it works and we have had good results. A far cry above no testing.”

  • Suppliers should be willing to send usability professionals to the onshore country for the requirements gathering. This is partly a matter of ensuring that the requirements gathering takes full account of usability principles. It is also necessary so that these usability experts can fully understand the client’s business problem and can communicate it to the developers when they return offshore.

It’s possible that there may still be people in your organization who are sceptical about the value of usability. There has been a lot of work done on the return on investment that user centered design can bring. It’s too big a topic for this article, but a simple internet search on “usability” and “roi” will give you plenty of material.

What about the future?

There seems no reason to expect any significant changes in the trends we’ve seen over the last 10 years. The offshoring countries will continue to produce large numbers of well-educated, technically expert IT professionals. The cost advantage of developing in these countries will continue to attract work there.

Proactive test managers can head off some of the usability problems associated with offshoring. They can help bring about higher quality products even if their employers have not allowed for usability expertise on their projects. However, we should not have unrealistic expectations about what they can achieve. High quality, usable products can only be delivered consistently by organizations that have a commitment to usability and who integrate usability throughout the design process.

Offshoring suppliers will have a huge incentive to keep advancing into user centered design and usability consultancy. The increase in offshore development work creates the need for such services, whilst the specialist and advanced nature of the work gives these suppliers a highly attractive opportunity to move up the value chain, selling more expensive services on the basis of quality rather than cost.

The techniques I have suggested are certainly worthwhile, but they may prove no more than damage limitation. As Hienadz Drahun put it to me: “to get high design quality you need to have both a good initial design and a good amount of iterative usability evaluation. Iterative testing alone is not able to turn a bad product design into a good one. You need both.” Testers alone cannot build usability into an application, any more than they can build in quality.

Testers in the client countries will increasingly have to cope with the problems of working with offshore development. It is important that they learn how to work successfully with offshoring and adapt to it.

They will therefore have to be vigilant about the risks to usability of offshoring, and advise their employers and clients how testing can best mitigate these risks, both on a short term tactical level, i.e. on specific projects where there is no established usability framework, and in the longer term, where there is the opportunity to shape the contracts signed with offshore suppliers.

There will always be a need for test management expertise based in client countries, working with the offshore teams, but it will not be the same job we knew in the 90s.

References

[1] NASSCOM (2009). “Industry Trends, IT-BPO Sector-Overview”.

[2] Nielsen, J. (2002). “Offshore Usability”. Jakob Nielsen’s website.

[3] Lewis, C., Rieman, J. (1994). “Task-Centered User Interface Design: A Practical Introduction”. University of Colorado e-book.

[4] Artman, H. (2002). “Procurer Usability Requirements: Negotiations in Contract Development”. Proceedings of the second Nordic conference on Human-Computer Interaction (NordiCHI) 2002.

[5] Holmlid, S., Artman, H. (2003). “A Tentative Model for Procuring Usable Systems” (NB PDF download). 10th International Conference on Human-Computer Interaction, 2003.

[6] Grudin, J. (1991). “Interactive Systems: Bridging the Gaps Between Developers and Users”. Computer, April 1991, Vol. 24, No. 4 (subscription required).

[7] Grudin, J. (1996). “The Organizational Contexts of Development and Use”. ACM Computing Surveys. Vol 28, issue 1, March 1996, pp 169-171 (subscription required).

[8] Iyengar, J. (2007). “Usability Issues in Offshore Development: an Indian Perspective”. Usability Professionals Association Conference, 2007 (UPA membership required).

[9] Henry, P. (2003). “Advancing UCD While Facing Challenges Working from Offshore”. ACM Interactions, March/April 2003 (subscription required).

[10] Cronin D. (2004). “Designing for Offshore Development”. Cooper Journal blog.

[11] Nielsen, J. (1994). “Guerrilla HCI: Using Discount Usability Engineering to Penetrate the Intimidation Barrier”, Jakob Nielsen’s website.

[12] Krug, S. (2006). “Don’t Make Me Think!: A Common Sense Approach to Web Usability” 2nd edition. New Riders.

[13] Constantine, L., Lockwood, L. (1999). “Software for Use”. Addison Wesley.

[14] Lockwood, L. (1999). “Collaborative Usability Inspecting – more usable software and sites” (NB PDF download), Constantine and Lockwood website.

[15] Cooper, A. (2004). “The Inmates Are Running the Asylum: Why High-tech Products Drive Us Crazy and How to Restore the Sanity”. Sams.

The seductive and dangerous V model (2008)

Testing Experience

This is an expanded version of an article I wrote for the December 2008 edition of Testing Experience, a magazine which is no longer published. I’m moving the article onto my blog from my website, which will be decommissioned soon.

Inevitably the article is dated in some respects, especially where I discuss possible ways for testers to ameliorate the V Model if they are forced to use it. There’s no mention of exploratory testing. That’s simply because my aim in writing the article was to help people understand the flaws of the V Model and how they can work around them in traditional environments that try to apply the model. A comparison of exploratory and traditional scripted testing techniques was far too big a topic to be shoe-horned in here.

However, I think the article still has value in helping to explain why software development and testing took the paths they followed. For the article I drew heavily on reading and research I carried out for my MSc dissertation, an exercise that taught me a huge amount about the history of software development, and why we ended up where we did.

The references in the article were all structured for a paper magazine. There are no hyperlinks and I have not tried to recreate them and check out whether they still work.

The article

The seductive and dangerous V Model

The Project Manager finishes the meeting by firing a question at the Programme Test Manager, and myself, the Project Test Manager.

“The steering board’s asking questions about quality. Are we following a formal testing model?”.

The Programme Test Manager doesn’t hesitate. “We always use the V Model. It’s the industry standard”.

The Project Manager scribbles a note. “Good”.

Fast forward four months, and I’m with the Programme Manager and the test analysts. It’s a round table session, to boost morale and assure everyone that their hard work is appreciated.

The Programme Manager is upbeat. “Of course you all know about the problems we’ve been having, but when you look at the big picture, it’s not so bad. We’re only about 10% behind where we should have been. We’re still on target to hit the implementation date”.

This is a red rag to a bullish test manager, so I jump in. “Yes, but that 10% is all at the expense of the testing window. We’ve lost half our time”.

The Programme Manager doesn’t take offence, and laughs. “Oh, come on! You’re not going to tell me you’ve not seen that before? If you’re unhappy with that then maybe you’re in the wrong job!”.

I’m not in the mood to let it drop. “That doesn’t make it right. There’s always a readiness to accept testing getting squeezed, as if the test window is contingency. If project schedules and budgets overrun, project managers can’t get away with saying ‘oh – that always happens'”.

He smiles and moves smoothly on, whilst the test analysts try to look deadpan, and no doubt some decide that career progression to test management isn’t quite as attractive as they used to think.

Test windows do get squeezed

The Programme Manager was quite right. That’s what happens on Waterfall projects, and saying that you’re using the V Model does nothing to stop that almost routine squeeze.

In “Testing Experience” issue 2, Dieter Arnouts [1] outlined how test management fits into different software development lifecycle models. I want to expand on the V Model, its problems and what testers can do in response.

In the earlier meeting the Programme Test Manager had given an honest and truthful answer, but I wonder if it was nevertheless wholly misleading. Yes, we were using the V Model, and unquestionably it would be the “test model” of choice for most testing professionals, in the UK at least.

However, I question whether it truly qualifies as a “test model”, and whether its status as best practice, or industry standard, is deserved.

A useful model has to represent the way that testing can and should be carried out. A model must therefore be coherent, reasonably precisely defined and grounded in the realities of software development.

Coherence at the expense of precision

If the V Model, as practised in the UK, has coherence it is at the expense of precision. Worldwide there are several different versions and interpretations. At one extreme is the German V-Model [2], the official project management methodology of the German government. It is roughly equivalent to PRINCE2, but more directly relevant to software development. Unquestionably this is coherent and rigorously defined.

Of course, it’s not the V Model at all, not in the sense that UK testers understand it. For one thing the V stands not for the familiar V shaped lifecycle model, but for “vorgehens”, German for “going forwards”. However, this model does promote a V shaped lifecycle, and some seem to think that this is the real V Model, in its pure form.

The US also has a government standard V Model [3], which dates back about 20 years, like its German counterpart. Its scope is rather narrower, being a systems development lifecycle model, but it is still far more detailed and more rigorous than most UK practitioners would understand by the V Model.

The understanding that most of us in the UK have of the V Model is probably based on the V Model as it’s taught in the ISEB Foundation Certificate in Software Testing [4]. The syllabus merely describes the Model without providing an illustration, and does not even name the development levels on the left hand side of the V. Nor does it prescribe a set number of levels of testing. Four levels is “common”, but there could be more or fewer on any project.

This makes sense to an experienced practitioner. It is certainly coherent, in that it is intuitive. It is easy for testers to understand, but it is far from precise. Novices have no trouble understanding the model to the level required for a straightforward multiple choice exam, but they can struggle to get to grips with what exactly the V Model is. If they search on the internet their confusion will deepen. Wikipedia is hopelessly confused on the matter, with two separate articles on the V Model, which fail to make a clear distinction between the German model [5] and the descriptive lifecycle model [6] familiar to testers. Unsurprisingly there are many queries on internet forums asking “what is the V Model?”

There are so many variations that in practice the V Model can mean just about whatever you want it to mean. This is a typical, and typically vague, illustration of the model, or rather, of a version of the V Model.

V model 2

It is interesting to do a Google images search on “V Model”. One can see a huge range of images illustrating the model. All share the V shape, and all show an arrow or line linking equivalent stages in each leg. However, the nature of the link is vague and inconsistent. Is it static testing of deliverables? Simple review and sign-off of specifications? Early test planning? Even the direction of the arrow varies. The Wikipedia article on the German V Model has the arrows going in different directions in different diagrams. There is no explanation for this. It simply reflects the confusion in the profession about what exactly the V Model is.

Boiled down to its most basic form, the V Model can mean simply that test planning should start early on, and the test plans should be based on the equivalent level of the left hand development leg. In practice, I fear that that is all it usually does mean.

The trouble is that this really isn’t a worthwhile advance on the Waterfall Model. The V Model is no more than a testing variant of the Waterfall. At best, it is a form of damage limitation, mitigating some of the damage that the Waterfall has on testing. To understand the significance of this for the quality of applications we need to look at the history and defects of the Waterfall Model itself.

Why the Waterfall is bad for testing and quality

In 1970 Royce wrote the famous paper [7] depicting the Waterfall Model. It is a strange example of how perceptive and practical advice can be distorted and misapplied. Using the analogy of a waterfall, Royce set up a straw man model to describe how software developments had been managed in the 1960s. He then demolished the model. Unfortunately posterity has credited him with inventing the model.

The Waterfall assumes that a project can be broken up into separate, linear phases and that the output of each phase provides the input for the next phase, thus implying that each would be largely complete before the next one would start. Although Royce did not make the point explicitly, his Waterfall assumes that requirements can, and must, be defined accurately at the first pass, before any significant progress has been made with the design. The problems with the Waterfall, as seen by Royce, are its inability to handle changes, mainly changes forced by test defects, and the inflexibility of the model in dealing with iteration. Further flaws were later identified, but Royce’s criticisms are still valid.

Royce makes the point that prior to testing, all the stages of the project have predictable outputs, i.e. the results of rigorous analysis. The output of the testing stage is inherently unpredictable in the Waterfall, yet this is near the end of the project. Therefore, if the outcome of testing is not what was predicted or wanted, then reworks to earlier stages can wreck the project. The assumption that iteration can be safely restricted to successive stages doesn’t hold. Such reworking could extend right back to the early stages of design, and is not a matter of polishing up the later steps of the previous phase.

Although Royce did not dwell on the difficulty of establishing the requirements accurately at the first pass, this problem did trouble other, later writers. They concluded that not only is it impossible in practice to define the requirements accurately before construction starts, it is wrong in principle because the process of developing the application changes the users’ understanding of what is desirable and possible and thus changes the requirements. Furthermore, there is widespread agreement that the Waterfall is particularly damaging for interactive applications.

The problems of the Waterfall are therefore not because the analysts and developers have screwed up. The problems are inherent in the model. They are the result of following it properly, not badly!

The Waterfall and project management

So if the Waterfall was originally depicted as a straw man in 1970, and if it has been consistently savaged by academics, why is there still debate about it? Respected contemporary textbooks still defend it, such as Hallows’ “Information Systems Project Management” [8], a valuable book frequently used for university courses. The title gives the game away. The Waterfall was shaped by project management requirements, and it therefore facilitates project management. Along with the V Model it is neater, and much easier to manage, plan and control than an iterative approach, which looks messy and unpredictable to the project manager.

Raccoon [9], writing in 1997, used a revealing phrase to describe the Waterfall:

“If the Waterfall model were wrong, we would stop arguing over it. Though the Waterfall model may not describe the whole truth, it describes an interesting structure that occurs in many well-defined projects and it will continue to describe this truth for a long time to come. I expect the Waterfall model will live on for the next one hundred years and more”.

Note the key phrase, “an interesting structure that occurs in many well-defined projects”.

The Waterfall allows project managers to structure projects neatly, and it is good for project plans, not applications!

It is often argued that in practice the Waterfall is not applied in its pure form; that there is always some limited iteration and that there is invariably some degree of overlap between the phases. Defenders of the model therefore argue that critics don’t understand how it can be used effectively.

Hallows argues that changes are time-consuming and expensive regardless of the model followed. He dismisses the criticism that the Waterfall doesn’t allow a return to an earlier stage by saying that this would be a case of an individual project manager being too inflexible, rather than a problem with the model.

This is naive. The problem is not solved by better project management, because project management itself has contributed towards the problem; or rather the response of rational project managers to the pressures facing them.

The dangerous influence of project management had been recognised by the early 1980s. Practitioners had always been contorting development practices to fit their project management model, a point argued forcibly by McCracken & Jackson in 1982 [10].

“Any form of life cycle is a project management structure imposed on system development. To contend that any life cycle scheme, even with variations, can be applied to all system development is either to fly in the face of reality or to assume a life cycle so rudimentary as to be vacuous.”

In hindsight the tone of McCracken & Jackson’s paper, and the lack of response to it for years is reminiscent of Old Testament prophets raging in the wilderness. They were right, but ignored.

The symbiosis between project management and the Waterfall has meant that practitioners have frequently been compelled to employ methods that they knew were ineffective. This is most graphically illustrated by the UK government’s mandated use of the PRINCE2 project management method and SSADM development methodology. These two go hand in hand.

They are not necessarily flawed, and this article does not have room to discuss their merits and problems, but they are associated with a traditional approach such as the Waterfall.

The UK’s National Audit Office stated in 2003 [11] that “PRINCE requires a project to be organised into a number of discrete stages, each of which is expected to deliver end products which meet defined quality requirements. Progress to the next stage of the project depends on successfully meeting the delivery targets for the current stage. The methodology fits particularly well with the ‘waterfall’ approach.”

The NAO says in the same paper that “the waterfall … remains the preferred approach for developing systems where it is very important to get the specification exactly right (for example, in processes which are computationally complex or safety critical)”. This is current advice. The public sector tends to be more risk averse than the private sector. If auditors say an approach is “preferred” then it would take a bold and confident project manager to reject that advice.

This official advice is offered in spite of the persistent criticism that it is never possible to define the requirements precisely in advance in the style assumed by the Waterfall model, and that attempting to do so is possible only if one is prepared to steamroller the users into accepting a system that doesn’t satisfy their goals.

The UK government is therefore promoting the use of a project management method partly because it fits well with a lifecycle that is fundamentally flawed because it has been shaped by project management needs rather than those of software development.

The USA and commercial procurement

The history of the Waterfall in the USA illustrates its durability and provides a further insight into why it will survive for some time to come; its neat fit with commercial procurement practices.

The US military was the main customer for large-scale software development contracts in the 1970s and insisted on formal and rigorous development methods. The US Department of Defense (DoD) did not explicitly mandate the Waterfall, but their insistence in Standard DOD-STD-2167 [12] on a staged development approach, with a heavy up-front documentation overhead, and formal review and sign-off of all deliverables at the end of each stage, effectively ruled out any alternative. The reasons for this were quite explicitly to help the DoD to keep control of procurement.

In the 80s the DoD relaxed its requirements and allowed iterative developments. However, it did not forbid the Waterfall, and the Standard’s continued insistence on formal reviews and audits that were clearly consistent with the Waterfall gave developers and contractors every incentive to stick with that model.

The damaging effects of this were clearly identified to the DoD at the time. A report of the Defense Science Board Task Force [13] criticised the effects of the former Standard, and complained that the reformed version did not go nearly far enough.

However, the Task Force had to acknowledge that “evolutionary development plays havoc with the customary forms of competitive procurement, … and they with it.”

The Task Force contained such illustrious experts as Frederick Brooks, Vic Basili and Barry Boehm. These were reputable insiders, part of a DoD Task Force, not iconoclastic academic rebels. They knew the problems caused by the Waterfall and they understood that the rigid structure of the model provided comfort and reassurance that large projects involving massive amounts of public money were under control. They therefore recommended appropriate remedies, involving early prototyping and staged awarding of contracts. They were ignored. Such was the grip of the Waterfall nearly 20 years after it had first been criticised by Royce.

The DoD did not finally abandon the Waterfall till Military Standard 498 (MIL-STD-498) seven years later in 1994, by which time the Waterfall was embedded in the very soul of the IT profession.

Even now the traditional procurement practices referred to by the Task Force, which fit much more comfortably with the Waterfall and the V Model, are being followed because they facilitate control, not quality. It is surely significant that the V Model is the only testing model that students of the Association of Chartered Certified Accountants learn about. It is the model for accountants and project managers, not developers or testers. The contractual relationship between client and supplier reinforces the rigid project management style of development.

Major George Newberry, a US Air Force officer specialising in software acquisition and responsible for collating USAF input to the defence standards, complained in 1995 [14] about the need to deliver mind-numbing amounts of documentation in US defence projects because of the existing standards.

“DOD-STD-2167A imposes formal reviews and audits that emphasize the Waterfall Model and are often nonproductive ‘dog and pony shows’. The developer spends thousands of staff-hours preparing special materials for these meetings, and the acquirer is then swamped by information overload.”

This is a scenario familiar to any IT professional who has worked on external contracts, especially in the public sector. Payments are tied to the formal reviews and dependent on the production of satisfactory documentation. The danger is that supplier staff become fixated on the production of the material that pays their wages, rather than the real substance of the project.

Nevertheless, as noted earlier, the UK public sector in particular, and large parts of the private sector are still wedded to the Waterfall method and this must surely bias contracts against a commitment to quality.

V for veneer?

What is seductive and damaging about the V Model is that it gives the Waterfall approach credibility. It has given a veneer of respectability to a process that almost guarantees shoddy quality. The most damaging aspect is perhaps the effect on usability.

The V Model discourages active user involvement in evaluating the design, and especially the interface, before the formal user acceptance testing stage. By then it is too late to make significant changes to the design. Usability problems can be dismissed as “cosmetic” and the users are pressured to accept a system that doesn’t meet their needs. This is bad if it is an application for internal corporate users. It is potentially disastrous if it is a web application for customers.

None of this is new to academics or test consultants who’ve had to deal with usability problems. However, what practitioners do in the field can often lag many years behind what academics and specialist consultants know to be good practice. Many organisations are a long, long way from the leading edge.

Rex Black provided a fine metaphor for this quality problem in 2002 [15]. After correctly identifying that V Model projects are driven by cost and schedule constraints, rather than quality, Black argues that the resultant fixing of the implementation date effectively locks the top of the right leg of the V in place, while the pivot point at the bottom slips further to the right, thus creating Black’s “ski slope and cliff”.

The project glides down the ski slope, then crashes into the “quality cliff” of the test execution stages that have been compressed into an impossible timescale.

The Waterfall may have resulted in bad systems, but its massive saving grace for companies and governments alike was that they were developed in projects that offered at least the illusion of being manageable! This suggests, as Raccoon stated [9], that the Waterfall may yet survive another hundred years.

The V Model’s great attractions were that it fitted beautifully into the structure of the Waterfall, it didn’t challenge that model, and it just looks right; comfortable and reassuring.

What can testers do to limit the damage?

I believe strongly that iterative development techniques must be used wherever possible. However, such techniques are beyond the scope of this article. Here I am interested only in explaining why the V Model is defective, why it has such a grip on our profession, and what testers can do to limit its potential damage.

The key question is therefore; how can we improve matters when we find we have to use it? As so often in life just asking the question is half the battle. It’s crucial that testers shift their mindset from an assumption that the V Model will guide them through to a successful implementation, and instead regard it as a flawed model with a succession of mantraps that must be avoided.

Testers must first accept the counter-intuitive truth that the Waterfall and V Model only work when their precepts are violated. This won’t come as any great surprise to experienced practitioners, though it is a tough lesson for novices to learn.

Developers and testers may follow models and development lifecycles in theory, but often it’s no more than lip service. When it comes to the crunch we all do whatever works and ditch the theory. So why not adopt techniques that work and stop pretending that the V Model does?

In particular, iteration happens anyway! Embrace it. The choice is between planning for iteration and frantically winging it, trying to squeeze in fixes and reworking of specifications.

Even the most rigid Waterfall project would allow some iteration during test execution. It is crucial that testers ensure there is no confusion between test cycles exercising different parts of the solution, and reruns of previous tests to see whether fixes have been applied. Testers must press for realistic allowances for reruns. One cycle to reveal defects and another to retest is planning for failure.

Once this debate has been held (and won!) with project management the tester should extend the argument. Make the point that test execution provides feedback about quality and risks. It cannot be right to defer feedback. Where it is possible to get feedback early it must be taken.

It’s not the testers’ job to get the quality right. It’s not the testers’ responsibility to decide if the application is fit to implement. It’s our responsibility to ensure that the right people get accurate feedback about quality at the right time. That means feedback to analysts and designers early enough to let them fix problems quickly and cheaply. This feedback and correction effectively introduces iteration. Acknowledging this allows us to plan for it.

Defenders of the V Model would argue that that is entirely consistent with the V Model. Indeed it is. That is the point.

However, what the V Model doesn’t do adequately is help testers to force the issue; to provide a statement of sound practice, an effective, practical model that will guide them to a happy conclusion. It is just too vague and wishy washy. In so far as the V Model means anything, it means to start test planning early, and to base your testing on documents from equivalent stages on the left hand side of the V.

Without these, the V Model is nothing. A fundamental flaw of the V Model is that it is not hooked into the construction stages in the way that its proponents blithely assume. Whoever heard of a development being delayed because a test manager had not been appointed?

“We’ll just crack on”, is the response to that problem. “Once we’ve got someone appointed they can catch up.”

Are requirements ever nailed down accurately and completely before coding starts? In practice, no. The requirements keep evolving, and the design is forced to change. The requirements and design keep changing even as the deadline for the end of coding nears.

“Well, you can push back the delivery dates for the system test plan and the acceptance test plan. Don’t say we’re not sympathetic to the testers!”

What is not acknowledged is that if test planning doesn’t start early, and if the solution is in a state of flux till the last moment, one is not left with a compromised version of the V Model. One is left with nothing whatsoever; no coherent test model.

Testing has become the frantic, last minute, ulcer inducing sprint it always was under the Waterfall and that the V Model is supposed to prevent.

It is therefore important that testers agitate for the adoption of a model that honours the good intentions of the V Model, but is better suited to the realities of development and the needs of the testers.

Herzlich’s W Model

An interesting extension of the V Model is Paul Herzlich’s W Model [16].

Herzlich W model

The W Model removes the vague and ambiguous lines linking the left and right legs of the V and replaces them with parallel testing activities, shadowing each of the development activities.

As the project moves down the left leg, the testers carry out static testing (i.e. inspections and walkthroughs) of the deliverables at each stage. Ideally prototyping and early usability testing would be included to test the system design of interactive systems at a time when it would be easy to solve problems. The emphasis would then switch to dynamic testing once the project moves into the integration leg.

There are several interesting aspects to the W Model. Firstly, it drops the arbitrary and unrealistic assumption that there should be a testing stage in the right leg for each development stage in the left leg. Each of the development stages has its testing shadow, within the same leg.

The illustration shows a typical example where there are the same number of stages in each leg, but it’s possible to vary the number and the nature of the testing stages as circumstances require without violating the principles of the model.

Also, it explicitly does not require the test plan for each dynamic test stage to be based on the specification produced in the twin stage on the left hand side. There is no twin stage of course, but this does address one of the undesirable by-products of a common but unthinking adoption of the V Model; a blind insistence that test plans should be generated from these equivalent documents, and only from those documents.

A crucial advantage of the W Model is that it encourages testers to define tests that can be built into the project plan, and on which development activity will be dependent, thus making it harder for test execution to be squeezed at the end of the project.

However, starting formal test execution in parallel with the start of development must not mean token reviews and sign-offs of the documentation at the end of each stage. Commonly under the V Model, and the Waterfall, test managers receive specifications with the request to review and sign off within a few days what the developers hope is a completed document. In such circumstances test managers who detect flaws can be seen as obstructive rather than constructive. Such last minute “reviews” do not count as early testing.

Morton’s Butterfly Model

Another variation on the V Model is the little known Butterfly Model [17] by Stephen Morton, which shares many features of the W Model.

Butterfly Model

The butterfly metaphor is based on the idea that clusters of testing are required throughout the development to tease out information about the requirements, design and build. These micro-iterations can be structured into the familiar testing stages, and the early static testing stages envisaged by the W Model.

Butterfly Model micro-iterations

In this model these micro-iterations explicitly shape the development during the progression down the development leg. In essence, each micro-iteration can be represented by a butterfly: the left wing is test analysis, the right wing is specification and design, and the body is test execution, the muscle that links and drives the test, which might consist of more than one piece of analysis and design, hence the segmented wings. Sadly, this model does not seem to have been fully fleshed out, and in spite of its attractions it has almost vanished from sight.

Conclusion – the role of education

The beauty of the W and Butterfly Models is that they fully recognise the flaws of the V Model, but they can be overlaid on the V. That allows the devious and imaginative test manager to smuggle a more helpful and relevant testing model into a project committed to the V Model without giving the impression that he or she is doing anything radical or disruptive.

The V Model is so vague that a test manager could argue with a straight face that the essential features of the W or Butterfly are actually features of the V Model as the test manager believes it must be applied in practice. I would regard this as constructive diplomacy rather than spinning a line!

I present the W and Butterfly Models as interesting possibilities but what really matters is that test managers understand the need to force planned iteration into the project schedule, and to hook testing activities into the project plan so that “testing early” becomes meaningful rather than a comforting and irrelevant platitude. It is possible for test managers to do any of this provided they understand the flaws of the V Model and how to improve matters. This takes us onto the matter of education.

The V Model was the only “model” testers learned about when they took the old ISEB Foundation Certificate. Too many testers regarded that as the end of their education in testing. They were able to secure good jobs or contracts with their existing knowledge. Spending more time and money continuing their learning was not a priority.

As a result of this, and the pressures of project management and procurement, the V Model is unchallenged as the state of the art for testing in the minds of many testers, project managers and business managers.

The V Model will not disappear just because practitioners become aware of its problems. However, a keen understanding of its limitations will give them a chance to anticipate these problems and produce higher quality applications.

I don’t have a problem with testers attempting to extend their knowledge and skills through formal qualifications such as ISEB and ISTQB. However, it is desperately important that they don’t think that what they learn from these courses comprises All You Ever Need to Know About The Only Path to True Testing. They’re biased towards traditional techniques and don’t pay sufficient attention to exploratory testing.

Ultimately we are all responsible for our own knowledge and skills; for our own education. We’ve got to go out there and find out what is possible, and to understand what’s going on. Testers need to make sure they’re aware of the work of testing experts such as Cem Kaner, James Bach, Brian Marick, Lisa Crispin and Michael Bolton. These people have put a huge amount of priceless material out on the internet for free. Go and find it!

References

[1] Arnouts, D. (2008). “Test management in different Software development life cycle models”. “Testing Experience” magazine, issue 2, June 2008.

[2] IABG. “Das V-Modell”. This is in German, but there are links to English pages and a downloadable version of the full documentation in English.

[3] US Dept of Transportation, Federal Highway Administration. “Systems Engineering Guidebook for ITS”.

[4] BCS ISEB Foundation Certificate in Software Testing – Syllabus (NB PDF download).

[5] Wikipedia. “V-Model”.

[6] Wikipedia. “V-Model (software development)”.

[7] Royce, W. (1970). “Managing the Development of Large Software Systems”, IEEE Wescon, August 1970.

[8] Hallows, J. (2005). “Information Systems Project Management”. 2nd edition. AMACOM, New York.

[9] Raccoon, L. (1997). “Fifty Years of Progress in Software Engineering”. SIGSOFT Software Engineering Notes Vol 22, Issue 1 (Jan. 1997). pp88-104.

[10] McCracken, D., Jackson, M. (1982). “Life Cycle Concept Considered Harmful”, ACM SIGSOFT Software Engineering Notes, Vol 7 No 2, April 1982. Subscription required.

[11] National Audit Office. (2003). “Review of System Development – Overview”.

[12] Department of Defense Standard 2167 (DOD-STD-2167). (1985). “Defense System Software Development”, US Government defence standard.

[13] “Defense Science Board Task Force On Military Software – Report” (extracts), (1987). ACM SIGAda Ada Letters, Volume VIII, Issue 4 (July/Aug. 1988), pp 35-46. Subscription required.

[14] Newberry, G. (1995). “Changes from DOD-STD-2167A to MIL-STD-498”, Crosstalk – the Journal of Defense Software Engineering, April 1995.

[15] Black, R. (2002). “Managing the Testing Process”, p415. Wiley 2002.

[16] Herzlich, P. (1993). “The Politics of Testing”. Proceedings of 1st EuroSTAR conference, London, Oct. 25-28, 1993.

[17] Morton, S. (2001). “The Butterfly Model for Test Development”. Sticky Minds website.

The Post Office Horizon IT scandal, part 3 – audit, risk & perverse incentives

In the first post of this three part series about the scandal of the Post Office’s Horizon IT system I explained the concerns I had about the approach to errors and accuracy. In the second post I talked about my experience working as an IT auditor investigating frauds, and my strong disapproval for the way the Post Office investigated and prosecuted the Horizon cases. In this, the final part, I will look at the role of internal audit and question the apparent lack of action by the Post Office’s internal auditors.

Independence and access to information

There’s a further aspect to the Horizon scandal that troubles me as an ex-auditor. In 2012, after some pressure from a Parliamentary committee, the Post Office commissioned the forensic IT consultancy Second Sight to review Horizon. Second Sight did produce a report that was critical of the system but they could not complete their investigation and issue a final report. They were stymied by the Post Office’s refusal to hand over crucial documents, and they were eventually sacked in 2015. The Post Office ordered Second Sight to hand over or destroy all the evidence it had collected.

An experienced, competent IT audit team should have the technical expertise to conduct its own detailed system review. It was a core part of our job. I can see why in this case it made sense to bring in an outside firm, “for the optics”. However, we would have been keeping a very close eye on the investigation, assisting and co-operating with the investigators as we did with our external auditors. We would have expected the investigators to have the same access rights as we had, and these were very wide ranging.

We always had the right to scope our audits and investigations and we had the right to see any documents or data that we considered relevant. If anyone ever tried to block us we would insist, as a matter of principle, that they should be overruled. This was non-negotiable. If it was possible to stymie audits or investigations by a refusal to co-operate then we could not do our job. This is all covered in the professional standards of the Institute of Internal Auditors. The terms of reference for the Post Office’s Audit, Risk and Compliance Committee make its responsibilities clear.

“The purpose of the charter will be to grant Internal Audit unfettered access to staff, data and systems required in the course of discharging its responsibilities to the Committee…

Ensure internal audit has unrestricted scope, the necessary resources and access to information to fulfil its mandate.”

I am sure that a good internal audit department, under the strong management that I knew, would have stepped in to demand access to the relevant records in the Horizon case on behalf of the external investigators, and would have pursued the investigation themselves if necessary. It’s inconceivable that we would have let the matter drop under management pressure.

Internal auditors must be independent of management, with a direct reporting line to the board to protect them from attempted intimidation. “Abdication of management responsibilities” was the nuclear phrase in our audit department. It was only to be used by the Group Chief Auditor. He put it in the management summary of one of my reports, referring to the UK General Manager. The explosion was impressive. It was the best example of audit independence I’ve seen. The General Manager stormed into the audit department and started aggressively haranguing the Chief Auditor, who listened calmly, then asked, “Have you finished? Ok. The report will not be changed. Goodbye”. I was in awe. You can’t intimidate good auditors. They tend to be strong willed. The weak ones don’t last long, unless they’re part of a low grade and weak audit department that has been captured by the management.

Risk and bonuses

The role of internal audit in the private sector recognises the divergent interests of the executives and the owners. The priority of the auditors is the long term security and health of the company, which means they will often look at problems from a different angle than executives whose priority might be shaped by annual targets, bonuses and the current share price. The auditors keep an eye on the executives, who will often face a conflict of interest.

Humans struggle to think clearly about risk. Mechanical risk matrices like this one (from the Health and Safety Executive, the UK Government inspectorate responsible for regulating workplace safety) serve only to fog thinking. A near certain chance of trivial harm isn’t remotely the same as a trivial chance of catastrophic damage.

UK HSE risk matrix

Senior executives may pretend they are acting in the interests of the company in preventing news of a scandal emerging but their motivation could be the protection of their jobs and bonuses. The company’s true, long term interests might well require early honesty and transparency to avoid the risk of massive reputational damage further down the line when the original scandal is compounded by dishonesty, deflection and covering up. By that time the executives responsible may have moved on, or profited from bonuses they might not otherwise have received.

A recurring theme in the court case was that the Post Office’s senior management, especially Paula Vennells, the chief executive from 2012 to 2019, simply wanted the problem to go away. Their perception seems to have been that the real problem was the litigation, rather than the underlying system problems and the lives that were ruined.

In an email, written in 2015 before she appeared in front of a Parliamentary committee, Vennells wrote.

“Is it possible to access the system remotely?

What is the true answer? I hope it is that we know it is not possible and that we are able to explain why that is. I need to say no it is not possible and that we are sure of this because of xxx [sic] and we know this because we had the system assured.”

Again, in 2015, Vennells instructed an urgent review in response to some embarrassingly well informed blog posts, mainly about the Dalmellington Bug, by a campaigning former sub-postmaster, Tim McCormack. Vennells made it clear what she expected from the review.

“I’m most concerned that we/our suppliers appear to be very lax at handling £24k. And want to know we’ve rectified all the issues raised, if they happened as Tim explains.”

These two examples show the chief executive putting pressure on reviewers to hunt for evidence that would justify the answer she wants. It would be the job of internal auditors to tell the unvarnished truth. No audit manager would frame an audit in such an unprofessional way. Reviews like these would have been automatically assigned to IT auditors at the insurance company where I worked. I wonder who performed them at the Post Office.

When the Horizon court case was settled Vennells issued a statement, an apology of sorts.

“I am pleased that the long-standing issues related to the Horizon system have finally been resolved. It was and remains a source of great regret to me that these colleagues and their families were affected over so many years. I am truly sorry we were unable to find both a solution and a resolution outside of litigation and for the distress this caused.”

That is inadequate. Expressing regret is very different from apologising. I also regret that these lives were ruined, but I hardly have any responsibility. Vennells was “truly sorry” only for the litigation and its consequences, although that litigation was what offered the victims hope and rescue.

Vennells resigned from her post in the spring of 2019, eight months before the conclusion of the Horizon court case. In her last year as chief executive Vennells earned £717,500, only £800 less than the previous year. She lost part of her bonus because the Post Office was still mired in litigation, but it hardly seems to have been a punitive cut. Over the course of her seven years as chief executive, according to the annual reports, she earned £4.5 million, half of which came in the form of bonuses. In that last year when she was penalised for the ongoing litigation she still earned £389,000 in bonuses.

These bonuses are subject to clawback clauses (according to the annual reports, available at the last link):

“which provide for the return of any over-payments in the event of misstatement of the accounts, error or gross misconduct on the part of an Executive Director.”

Bonuses for normal workers reflect excellent performance. In the case of chief executives the criterion seems to be “not actually criminal”.

I have dismissed the risk matrix above for being too mechanical and simplistic. There is a further criticism: it ignores the time it takes for risks to materialise into damage. A risk that is highly unlikely in any particular year might be almost certain over a longer period. It depends how you choose to frame the problem. To apply a crude probability calculation, if the chance of a risk blowing up in a single year is 3%, then there is a 53% chance it will happen at some point over 25 years. If a chief executive is in post for seven years, as Paula Vennells was, there is only a 19% chance of that risk occurring during their tenure.
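To make the arithmetic behind those figures explicit: if a risk has a fixed annual probability p, the chance of it materialising at least once over n years is 1 − (1 − p)^n. The snippet below is only a minimal sketch of that calculation (the Python code and the function name are mine, not anything from the Post Office material), reproducing the 3% example.

```python
def prob_within(annual_probability: float, years: int) -> float:
    """Chance that a risk with a fixed annual probability materialises
    at least once within the given number of years."""
    return 1 - (1 - annual_probability) ** years

# The 3% annual risk used as an illustration above.
print(f"Over 25 years: {prob_within(0.03, 25):.0%}")  # roughly 53%
print(f"Over 7 years:  {prob_within(0.03, 7):.0%}")   # roughly 19%
```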

These are crude calculations, but there is an important and valid underlying point: a risk that might be intolerable to the organisation might be perfectly acceptable to a chief executive who is incentivised to maximise earnings through bonuses and to push troubling risks down the line for someone else to worry about.

No organisation should choose to remain in the intolerable risk cell, yet Vennells took the Post Office there, and it probably made financial sense for her to do so. The Post Office was very likely to lose the Horizon litigation, with massive damage, but that damage wouldn't materialise while she was in post, and it was extremely unlikely that fighting the case aggressively would be regarded as gross misconduct.

Perverse incentives often tempt managers, and also politicians, to ignore the possibility of dreadful outcomes that are unlikely to occur while they are in post and that would be expensive or unpopular to prepare for. The odds are good that irresponsible managers will be rewarded for being wrong and will have left, hefty bonuses in hand, before disaster strikes. On the other hand, you can be sacked for doing the right thing long before the justification becomes obvious.

This is, or at least it should be, a big issue for internal auditors, who have to keep a sharp eye on risk and misaligned incentives. All too often the only people with a clear-eyed, dispassionate understanding of risk are those who are gaming the corporate system. The Post Office's internal auditors fell down on the job here. Even setting aside the human tragedies, the risks to the Post Office posed by the Horizon system and the surrounding litigation should have been seen as intolerable.

Role of internal audit when organisations move from the public to the private sector

This all raises questions about corporate governance and the role of internal audit in bodies like the Post Office that sit between the public and private sectors. The Post Office is owned by the UK government, but with a remit of turning itself into a self-sustaining company without government subsidy. The senior executives were acting like private sector management, but with internal auditors who had a public sector culture, focusing on value for money and petty fraud. There are endless examples of private sector internal auditors losing sight of the big picture. However, a good risk-based audit department will always be thinking of those big risks that could take the company down.

Public bodies are backed by the government and can't fail in the same way as a private company. When they move into the private sector, the management culture and remuneration system expose the organisation to a new world of risks. So how do their internal auditors respond? In the case of the Post Office the answer is: badly. The problems were so serious that the internal auditors would have had a professional responsibility to bypass the senior executives and escalate them to board level, and to the external auditors. There is no sign that this happened. The only conclusion is that the Post Office's internal auditors were either complicit in the Horizon scandal or negligent. At best, they were taking their salaries under false pretences.

Conclusion

At almost every step, over many years, the Post Office handled the Horizon scandal badly, inexcusably so. They could hardly have done worse. There will be endless lessons that can, and will, be drawn from detailed investigation in the inevitable inquiry. However, for software testers and for IT auditors the big lesson to take to heart is that bad software, and dysfunctional corporate practices, hurt people and damage lives. The Post Office's sub-postmasters were hard-working, decent business people trying to make a living and provide for their families. They were ruined by a cynical, incompetent corporation. They will receive substantial compensation, but it is hardly enough. They deserve better.