In defence of lobbying

Lobbying of politicians has acquired a dirty name, and that’s a pity. In fact, I find it rather irritating. Lobbying means campaigning, providing expert analysis and briefing to politicians, who are usually generalists and need that guidance from experts. Good politicians even come looking for it, to gain a better understanding of complex problems.

That’s when lobbying is being done well and responsibly. Some of my recent work counts as lobbying, related to the law of evidence in England and the Post Office Horizon scandal. I’m proud of that. It is a worthwhile activity and can help the political process work better.

Lobbying is quite different from schmoozing old pals to give other chums contracts without the tedious bureaucracy of open tenders. It’s different from offering politicians highly paid sinecures in the hope they’ll work on your behalf rather than on behalf of their constituents. It is certainly different from handing out bribes. Lobbying is not the same as corruption, and corruption is what we’ve been seeing lately in the UK.

Bugs are about more than code

Introduction

Recently I have had to think carefully about the nature of software systems, especially complex ones, and the bugs they contain. In doing so my thinking has been guided by certain beliefs I hold about complex software systems. These beliefs, or principles, are based on my practical experience but also on my studies, which, as well as teaching me much that I didn’t know, have helped me to make sense of what I have done and seen at work. Here are three vital principles I hold to be true.

Principle 1

Complex systems are not like calculators, which are predictable, deterministic instruments: given the same inputs they will always give the same answer. Complex systems are not predictable. We can only say what they will probably do; we cannot be certain. It is particularly important to remember this when working with complex socio-technical systems, i.e. complex systems in the wider sense that include the humans who operate them or are needed to make them work. That covers most, or all, complex software systems.
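
To make Principle 1 concrete, here is a minimal, hypothetical sketch, invented purely for illustration and not drawn from any real system. Once timing and shared state are involved, identical inputs no longer guarantee identical outputs.

```python
# Hypothetical sketch: identical inputs, different outputs.
# A shared balance is updated by many concurrent "tills" without any locking,
# so the final figure depends on timing, not just on the inputs.

import random
import threading
import time

balance = 0

def pay_in(amount: int) -> None:
    global balance
    current = balance                 # read shared state
    if random.random() < 0.5:
        time.sleep(0.001)             # a retry, a slow network, another user...
    balance = current + amount        # write back, possibly losing other updates

threads = [threading.Thread(target=pay_in, args=(10,)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# 100 deposits of 10 "should" give 1000, but the result varies from run to run.
print(balance)
```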

Principle 2

Complex systems are more than the sum of their parts, or at least they are different. A system can be faulty even if all the individual programs, or components, are working correctly. The individual elements can combine with each other, and with the users, in unexpected and possibly damaging ways that could not have been predicted from inspecting the components separately.

Conversely, a system can be working satisfactorily even if some of the components are flawed. This inevitably means that the software code itself, however important it is, cannot be the only factor that determines the quality of the system.
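
As a hypothetical sketch of Principle 2 (my own invented example, nothing to do with any real system), here are two components that each behave exactly as their own specifications require, yet combine into a faulty system because they disagree about units.

```python
# Hypothetical sketch: each component is correct in isolation, but the
# assembled system is wrong because the components assume different units.

def record_sale(amount_pence: int) -> int:
    """Till component: records a sale in pence, exactly as specified."""
    if amount_pence < 0:
        raise ValueError("sale amount cannot be negative")
    return amount_pence

def post_to_ledger(amount_pounds: float) -> float:
    """Accounting component: expects pounds, exactly as specified."""
    return round(float(amount_pounds), 2)

sale = record_sale(499)               # £4.99, correctly recorded as 499 pence
ledger_entry = post_to_ledger(sale)   # the ledger now records £499.00, not £4.99

# Each component passes its own tests; only the system-level combination is wrong.
print(ledger_entry)                   # 499.0 -- plausible looking, wrong by a factor of 100
```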

Principle 3

Individual programs in a system can produce harmful outcomes even if their code was written perfectly. The outcome depends on how the different components, factors and people work together over the life of the system. Perfectly written code can cause a failure long after it has been released when there are changes to the technical, legal, or commercial environment in which the system runs.
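
Here is a hypothetical sketch of Principle 3, again invented purely for illustration: a function that was entirely correct when it was released starts producing wrong answers when the legal environment changes, without a single line of it being edited.

```python
# Hypothetical sketch: correct when written, wrong after the world changed.

STANDARD_VAT_RATE = 0.175   # the correct rate when this code was released

def price_with_vat(net_price: float) -> float:
    """Adds VAT at the standard rate."""
    return round(net_price * (1 + STANDARD_VAT_RATE), 2)

# Years later the rate changes to 20%. The source code is untouched and contains
# no coding error, yet every price the system now produces is wrong.
print(price_with_vat(100.00))   # 117.5, when the correct answer today would be 120.0
```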

The consequences

Bugs in complex systems are therefore inevitable. The absence of bugs in the past does not mean they are absent now, and certainly not that the system will be bug free in the future. The challenge is partly to find bugs, learn from them, and help users to learn how they can use the system safely. But testers should also try to anticipate future bugs, how they might arise, where the system is vulnerable, and learn how and whether users and operators will be able to detect problems and respond. They must then have the communication skills to explain what they have found to the people who need to know.

What we must not do is start off from the assumption that particular elements of the system are reliable and that any problems must have their roots elsewhere in the system. That mindset tends to push blame towards the unfortunate people who operate a flawed system.

Bugs and the Post Office Horizon scandal

Over the last few months I have spent a lot of time on issues raised by the Post Office Horizon scandal. For a detailed account of the scandal I strongly recommend the supplement that Private Eye has produced, written by Richard Brooks and Nick Wallis, “Justice lost in the post”.

While researching this affair I have realised, time and again, how the Post Office and Fujitsu, the outsourced IT services supplier, ignored the three principles I outlined above. Trawling through the judgment of Mr Justice Fraser in Bates v Post Office Ltd (No 6: Horizon Issues, i.e. the second of the two court cases brought by the Justice For Subpostmasters Alliance), which should be compulsory reading for Computer Science students, I was struck by the judge’s discussion of the nature of bugs in computer systems. You can find the full 313 page judgment here [PDF, opens in new tab].

The definition of a bug was at the heart of the second court case. The Post Office and Fujitsu argued that a bug is a coding error, and that the word should not apply to other problems. Counsel for the claimants, i.e. the subpostmasters and subpostmistresses who had been victims of the flawed system, took a broader view: a bug is anything that means the software does not operate as users, or the corporation, expect.

After listening to both sides Fraser came down emphatically on the claimants’ side.

“26 The phrase ‘bugs, errors or defects’ is sufficiently wide to capture the many different faults or characteristics by which a computer system might not work correctly… Computer professionals will often refer simply to ‘code’, and a software bug can refer to errors within a system’s source code, but ‘software bugs’ has become more of a general term and is not restricted, in my judgment, to meaning an error or defect specifically within source code, or even code in an operating system.

Source code is not the only type of software used in a system, particularly in a complex system such as Horizon which uses numerous applications or programmes too. Reference data is part of the software of most modern systems, and this can be changed without the underlying code necessarily being changed. Indeed, that is one of the attractions of reference data. Software bug means something within a system that causes it to cause an incorrect or unexpected result. During Mr de Garr Robinson’s cross-examination of Mr Roll, he concentrated on ‘code’ very specifically and carefully [de Garr Robinson was the lawyer representing the Post Office and Roll was a witness for the claimants who gave evidence about problems with Horizon that he had seen when he worked for Fujitsu]. There is more to the criticisms levelled at Horizon by the claimants than complaints merely about bugs within the Horizon source code.

27 Bugs, errors or defects is not a phrase restricted solely to something contained in the source code, or any code. It includes, for example, data errors, data packet errors, data corruption, duplication of entries, errors in reference data and/or the operation of the system, as well as a very wide type of different problems or defects within the system. ‘Bugs, errors or defects’ is wide enough wording to include a wide range of alleged problems with the system.”
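
Fraser’s point about reference data is easy to make concrete. The following hypothetical sketch is my own illustration, not a claim about Horizon’s design: the system’s behaviour changes, and goes wrong, although the source code never changes.

```python
# Hypothetical sketch: reference data is part of the software.
# The code below never changes, yet the system's behaviour does.

FEE_TABLE = {                  # reference data, maintained separately from the code
    "postal_order": 1.25,
    "car_tax_renewal": 0.00,
}

def fee_for(product: str) -> float:
    """Looks up the fee for a product; the code is trivially correct."""
    return FEE_TABLE[product]

# A later reference data release contains a mis-keyed value...
FEE_TABLE["postal_order"] = 12.50

# ...and every subsequent transaction is wrong, although a review of the source
# code, on its own, would find nothing amiss.
print(fee_for("postal_order"))   # 12.5 instead of 1.25
```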

The determination of the Post Office and Fujitsu to limit the definition of bugs to source code was part of a policy of blaming users for all errors that were not obviously caused by the source code. This is clear from repeated comments from witnesses and from Mr Justice Fraser in the judgment. “User error” was the default explanation for all problems.

This stance was taken to an extreme with “phantom transactions”. These were transactions that were generated by the system but recorded as if they had been made by a user (see in particular paragraphs 209 to 214 of Fraser’s judgment).

In paragraph 212 Fraser refers to a Fujitsu problem report.

“However, the conclusion reached by Fujitsu and recorded in the PEAK was as follows:

‘Phantom transactions have not been proven in circumstances which preclude user error. In all cases where these have occurred a user error related cause can be attributed to the phenomenon.'”

This is striking. These phantom transactions had been observed by Royal Mail engineers. They were known to exist. But they were dismissed as a cause of problems unless it could be proven that user error was not responsible. If Fujitsu could imagine a scenario where user error might have been responsible for a problem they would rule out the possibility that a phantom transaction could have been the cause, even if the phantom had occurred. The PEAK (error report) would simply be closed off, whether or not the subpostmaster agreed.

This culture of blaming users rather than software was illustrated by a case of the system “working as designed” when its behaviour clearly confused and misled users. In fact the system was acting contrary to user commands. In certain circumstances if a user entered the details for a transaction, but did not commit it, the system would automatically complete the transaction with no further user intervention, which might result in a loss to the subpostmaster.

The Post Office, in a witness statement, described this as a “design quirk”. However, the Post Office’s barrister, Mr de Garr Robinson, in his cross-examination of Jason Coyne, an IT consultant hired by the subpostmasters, was able to convince Nick Wallis (one of the authors of “Justice lost in the post”) that there wasn’t a bug.

“Mr de Garr Robinson directs Mr Coyne to Angela van den Bogerd’s witness statement which notes this is a design quirk of Horizon. If a bunch of products sit in a basket for long enough on the screen Horizon will turn them into a sale automatically.

‘So this isn’t evidence of Horizon going wrong, is it?’ asks Mr de Garr Robinson. ‘It is an example of Horizon doing what it was supposed to do.’

‘It is evidence of the system doing something without the user choosing to do it.’ retorts Mr Coyne.

But that is not the point. It is not a bug in the system.”

Not a bug? I would contest that very strongly. If I were auditing a system with this “quirk” I would want to establish the reasons for the system’s behaviour. Was this feature deliberately designed into the system? Or was it an accidental by-product of the system design? Whatever the answer, it would be simply the start of a more detailed scrutiny of technical explanations, understanding of the nature of bugs, the reasons for a two-stage committal of data, and the reasons why those two stages were not always applied. I would not consider “working as designed” to be an acceptable answer.
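
To show the sort of thing I mean by a two-stage committal of data, here is a hypothetical sketch, written purely to illustrate the behaviour described above and not as a claim about how Horizon actually worked: an idle timeout quietly performs the second, committing stage on the user’s behalf.

```python
# Hypothetical sketch of a two-stage committal in which an idle timeout performs
# the second stage. "Working as designed", yet contrary to what the user has
# actually chosen to do. This is an illustration only, not Horizon's design.

import time

class Basket:
    def __init__(self, idle_timeout_seconds: float) -> None:
        self.items = []
        self.committed = False
        self._idle_timeout = idle_timeout_seconds
        self._last_touched = time.monotonic()

    def add(self, item: str, value: float) -> None:
        """Stage one: the user enters the details of a transaction."""
        self.items.append((item, value))
        self._last_touched = time.monotonic()

    def commit(self) -> None:
        """Stage two: the user explicitly settles the transaction."""
        self.committed = True

    def poll(self) -> None:
        """Called periodically by the system, not by the user."""
        idle = time.monotonic() - self._last_touched
        if not self.committed and idle > self._idle_timeout:
            # The sale is completed with no further user intervention.
            self.commit()
```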

The Post Office’s failure to grasp the nature of complex systems

A further revealing illustration of the Post Office’s attitude towards user error came in a witness statement provided for the Common Issues trial, the first of the two court cases brought by the Justice For Subpostmasters Alliance. This first trial was about the contractual relationship between the Post Office and subpostmasters. The statement came from Angela van den Bogerd. At the time she was People Services Director for the Post Office, but over the previous couple of decades she had been in senior technical positions, responsible for Horizon and its deployment. She described herself in court as “not an IT expert”. That is an interesting statement to consider alongside some of the comments in her witness statement.

“[78]… the Subpostmaster has complete control over the branch accounts and transactions only enter the branch accounts with the Subpostmaster’s (or his assistant’s) knowledge.

[92] I describe Horizon to new users as a big calculator. I think this captures the essence of the system in that it records the transactions inputted into it, and then adds or subtracts from the branch cash or stock holdings depending on whether it was a credit or debit transaction.”

“Complete control”? That confirms her admission that she is not an IT expert. I would never have been bold, or reckless, enough to claim that I was in complete control of any complex IT system for which I was responsible. The better I understood the system the less inclined I would be to make such a claim. Likening Horizon to a calculator is particularly revealing. See Principle 1 above. When I have tried to explain the nature of complex systems I have also used the calculator analogy, but as an illustration of what a complex system is not.

If a senior manager responsible for Horizon could use such a fundamentally mistaken analogy, and be prepared to insert it in a witness statement for a court case, it reveals how poorly equipped the Post Office management was to deal with the issues raised by Horizon. When we are confronted by complexity it is a natural reaction to try and construct a mental model that simplifies the problems and makes them understandable. This can be helpful. Indeed it is often essential if we are to make any sense of complexity. I have written about this here in my blog series “Dragons of the unknown”.

However, even the best models become dangerous if we lose sight of their limitations and start to think that they are exact representations of reality. They are no longer fallible aids to understanding, but become deeply deceptive.

If you think a computer system is like a calculator then you will approach problems with the wrong attitude. Calculators are completely reliable. Errors are invariably the result of users’ mistakes, “finger trouble”. That is exactly how senior Post Office managers, like Angela van den Bogerd, regarded the Horizon problems.

BugsZero

The Horizon scandal has implications for the argument that software developers can produce systems that have no bugs, that zero bugs is an attainable target. Arlo Belshee is a prominent exponent of this idea, of BugsZero as it is called. Here is a short introduction.

Before discussing anyone’s approach to bugs it is essential that we are clear what they mean by a bug. Belshee has a useful definition, which he provided in this talk in Singapore in 2016. (The conference website has a useful introduction to the talk.)

3:50 “The definition (of a bug) I use is anything that would frustrate, confuse or annoy a human and is potentially visible to a human other than the person who is currently actively writing (code).”

This definition is close to Justice Fraser’s (see above); “a bug is anything that means the software does not operate as users, or the corporation, expect”. However, I think that both definitions are limited.

BugsZero is a big topic, and I don’t have the time or expertise to do it justice, but for the purposes of this blog I’m happy to concede that it is possible for good coders to deliver exactly what they intend to, so that the code itself, within a particular program, will not act in ways that will “frustrate, confuse or annoy a human”, or at least a human who can detect the problem. That is the limitation of the definition. Not all faults with complex software will be detected. Some are not even detectable. Our inability to see them does not mean they are absent. Bugs can produce incorrect but plausible answers to calculations, or they can corrupt data, without users being able to see that a problem exists.
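
Here is a small, hypothetical example of what I mean by an incorrect but plausible answer, one that no user could detect by inspecting the output: a report total accumulated as binary floating point.

```python
# Hypothetical sketch: a total that looks entirely plausible on a report
# but is not what the arithmetic "should" produce.

premiums = [0.10] * 1_000_000          # a million £0.10 premiums

naive_total = sum(premiums)            # accumulated as binary floating point
print(f"{naive_total:.2f}")            # prints 100000.00 -- looks fine
print(naive_total == 100_000.00)       # False: the stored figure is subtly wrong

# Accumulating in whole pence avoids the drift.
exact_total = sum([10] * 1_000_000) / 100
print(exact_total == 100_000.00)       # True
```

Whether a discrepancy like that matters depends entirely on the context; the point is that nobody reading the printed total could tell it was there.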

I speak from experience here. It might even be impossible for technical system experts to identify errors with confidence. It is not always possible to know whether a complex system is accurate. The insurance finance systems I used to work on were notoriously difficult to understand and manipulate. 100% accuracy was never a serious, practicable goal. As I wrote in “Fix on failure – a failure to understand failure”;

“With complex financial applications an honest and constructive answer to the question ‘is the application correct?’ would be some variant on ‘what do you mean by correct?’, or ‘I don’t know. It depends’. It might be possible to say the application is definitely not correct if it is producing obvious garbage. But the real difficulty is distinguishing between the seriously inaccurate, but plausible, and the acceptably inaccurate that is good enough to be useful. Discussion of accuracy requires understanding of critical assumptions, acceptable margins of error, confidence levels, the nature and availability of oracles, and the business context of the application.”

It is therefore misleading to define bugs as being potentially visible to users. Nevertheless, Belshee’s definition is useful provided that qualification is accepted. However, in the same talk, Belshee goes on to make further statements I do not accept.

19:55 “A bug is an encoded developer mistake.”

28:50 “A bug is a mistake by a developer.”

This is a developer-centric view of systems. It is understandable if developers focus on the bugs for which they are responsible. However, if you look at the systems, and bugs, from the other end, from the perspective of users when a bug has resulted in frustration, confusion or annoyance, the responsibility for the problem is irrelevant. The disappointed human is uninterested in whether the problem is with the coding, the design, the interaction of programs or components, or whatever. All that matters is that the system is buggy.

There is a further complication. The coder may well have delivered code that was perfect when it was written and released. But perfect code can create later problems if the world in which it operates changes. See Principle 3 above. This aspect of software is not sufficiently appreciated; it has caused me a great deal of trouble in my career (see the section “Across time, not just at a point in time” in this blog, about working with Big Data).

Belshee does say that developers should take responsibility for bugs that manifest themselves elsewhere, even if their code was written correctly. He also makes it clear, when talking about fault tolerant systems (17:22 in the talk above), that faults can arise “when the behaviour of the world is not as we expect”.

However, he also says that the system “needs to work exactly as the developer thought if it’s going to recover”. That’s too high a bar for complex socio-technical systems. The most anyone can say, and it’s an ambitious target, is that the programs have been developed exactly as the developers intended. Belshee is correct at the program level; if the programs were not built as the developers intended then recovery will be very much harder. But at the system level we need to be clear and outspoken about the limits of what we can know, and about the inevitability that bugs are present.

If we start to raise hopes that systems might be perfect and bug-free because we believe that we can deliver perfectly written code then we are setting ourselves up for unpleasant recriminations when our users suffer from bugs. It is certainly laudable to eradicate sloppy and cavalier coding and it might be noble for developers to be willing to assume responsibility for all bugs. But it could leave them exposed to legal recriminations if the wider world believes that software developers can and should ensure systems are free of bugs. This is where lawyers might become involved and that is why I’m unhappy about the name BugsZero, and the undeliverable promise that it implies.

Unnoticed and undetectable bugs in the Horizon case

The reality that a bug might be real and damaging but not detected by users, or even detectable by them, was discussed in the Horizon case.

“[972] Did the Horizon IT system itself alert Subpostmasters of such bugs, errors or defects… and if so how?

[973] Answer: Although the experts were agreed that the extent to which any IT system can automatically alert its users to bugs within the system itself is necessarily limited, and although Horizon has automated checks which would detect certain bugs, they were also agreed that there are types of bugs which would not be detected by such checks. Indeed, the evidence showed that some bugs lay undiscovered in the Horizon system for years. This issue is very easy, therefore, to answer. The correct answer is very short. The answer… is ‘No, the Horizon system did not alert SPMs’. The second part of the issue does not therefore arise.”

That is a significant extract from an important judgment. A senior judge directly addressed the question of system reliability and pronounced himself satisfied that a complex system cannot be expected to have adequate controls to warn users of all errors.

This is more than an abstract, philosophical debate about proof, evidence and what we can know. In England there is a legal presumption that computer evidence is reliable. This made a significant contribution to the Horizon scandal. Both parties in a court case are obliged to disclose documents which might either support or undermine their case, so that the other side has a chance to inspect and challenge them. The Post Office and Fujitsu did not disclose anything that would have cast doubt on their computer evidence. That failure to disclose meant it was impossible for the subpostmasters being prosecuted to challenge the presumption that the evidence was reliable. The subpostmasters didn’t know about the relevant system problems, and they didn’t even know that that knowledge had been withheld from them.

Replacing the presumption of computer reliability

There are two broad approaches that can be taken in response to the presumption that computer evidence is reliable and the ease with which it can be abused, apart of course from ignoring the problem and accepting that injustice is a price worth paying for judicial convenience. England can scrap the presumption, which would require the party seeking to use the evidence to justify its reliability. Or the rules over disclosure can be strengthened to try and ensure that all relevant information about systems is revealed. Some blend of the two approaches seems most likely.

I have recently contributed to a paper entitled “Recommendations for the probity of computer evidence”. It has been submitted to the Ministry of Justice, which is responsible for the courts in England & Wales, and is available from the Digital Evidence and Electronic Signature Law Review.

The paper argues that the presumption of computer reliability should be replaced by a two stage approach when reliability is challenged. The provider of the data should first be required to provide evidence to demonstrate that they are in control of their systems, that they record and track all bugs, fixes, changes and releases, and that they have implemented appropriate security standards and processes.

If the party wishing to rely on the computer evidence cannot provide a satisfactory response in this first stage then the presumption of reliability should be reversed. The second stage would require them to prove that none of the failings revealed in the first stage might affect the reliability of the computer evidence.

Whatever approach is taken, IT professionals would have to offer an opinion on their systems. How reliable are the systems? What relevant evidence might there be that systems are reliable, or unreliable? Can they demonstrate that they are in control of their systems? Can they reassure senior managers who will have to put their name to a legal document and will be very keen to avoid the humiliation that has been heaped upon Post Office and Fujitsu executives, with the possibility of worse to come?

A challenging future

The extent to which we can rely on computers poses uncomfortable challenges for the English law now, but it will be an increasingly difficult problem for the IT world over the coming years. What can we reasonably say about the systems we work with? How reliable are they? What do we know about the bugs? Are we sufficiently clear about the nature of our systems to brief managers who will have to appear in court, or certify legal documents?

It will be essential that developers and testers are clear in their own minds, and in their communications, about what bugs are. They are not just coding errors, and we must try to ensure people outside IT understand that. Testers must also be able to communicate clearly what they have learned about systems, and they must never say or do anything that suggests systems will be free of bugs.

Testers will have to think carefully about the quality of their evidence, not just about the quality of their systems. How good is our evidence? How far can we go in making confident statements of certainty? What do we still not know, and what is the significance of that missing knowledge? Much of this will be a question of good management. But organisations will need good testers, very good testers, who can explain what we know, and what we don’t know, about complex systems; testers who have the breadth of knowledge, the insight, and the communication skills to tell a convincing story to those who require the unvarnished truth.

We will need confident, articulate testers who can explain that a lack of certainty about how complex systems will behave is an honest, clear-sighted statement of truth. It is not an admission of weakness or incompetence. Too many people in IT have built good careers on bullshitting, on pretending they are more confident and certain than they have any right to be. Systems will inevitably become more complex. IT people will increasingly be dragged into litigation, and as the Post Office and Fujitsu executives have found, misplaced and misleading confidence and bluster in court have excruciating personal consequences. Unpleasant though these consequences are, they hardly compare with the tragedies endured by the subpostmasters and subpostmistresses, whose lives were ruined by corporations who insisted that their complex software was reliable.

The future might be difficult, even stressful, for software testers, but they will have a valuable, essential role to play in helping organisations and users to gain a better understanding of the fallible nature of software. To say the future will be interesting is an understatement; it will present exciting challenges and there should be great opportunities for the best testers.

The last straw – the project that convinced me to resign

Once upon a time there was a project, on which I was the test manager, which prompted me to take the drastic career change I had been mulling over for a year or so.

This project was with a large UK government department. I was the test manager for the supplier who also provided the development team. A rival IT services company ran the Service Delivery function. It was a very interesting (i.e. poisonous) three way relationship!

The development was a six month project to customise an off the shelf package that was the basis for a rule based expert system. It would help users to decide if applicants were eligible on health grounds for welfare payments. The idea was that we would develop a pilot for live running in one particular part of the organisation.

The crucial constraint was that the development had to be fast so that all parties could understand the base product better and learn the right lessons for the customisation required for the rest of the organisation. However, it would be a live application that would shape decisions which would have a huge impact on the lives of poor and vulnerable people. It was a serious matter, not just a throwaway prototype. There was also a strong emphasis on accessibility. If any corners were cut elsewhere there was no question of skimping on accessibility issues.

The stress on accessibility was a key insight into later problems. There were about 100 potential users for the pilot, which was expected to run for only a couple of years till it would be replaced by a more sophisticated version as part of the full programme. So I asked what accessibility needs the current staff had, so I could factor that into my risk assessment. Obviously, or so I thought, the needs of those staff should have the highest priority in testing.

I was bluntly told by the client that I was not allowed to ask that question. We had to assume that any type of user with special needs might be recruited within the lifetime of the application. That was fair enough, but it didn’t sit easily with the expectation that the development would be as fast as possible. It was clear that it would be better to deliver late, or even to fail altogether, rather than to produce an application with accessibility problems. The reason? This government department was responsible for ensuring that other organisations complied with accessibility legislation, so the client was insistent that everything must be done by the book.

I gradually became aware that this was a wider problem than concern about accessibility. At that time there had been a string of embarrassing stories in the British magazine Private Eye about government IT failures. These were well researched and highly critical articles. They were referred to occasionally and it dawned on me that the client was terrified of being in a future article.

When I say they were terrified I mean their fear dictated their whole approach to the job. Working in the private sector I had been used to seeing people take risks, doing what was necessary to get the project in. Sometimes these risks were reckless, sometimes they were shrewdly calculated. But I had never seen a client who was too terrified to take any reasonable risks.

The bottom line wasn’t that the pilot had to be on time, or to work, or to be within budget. The bottom line was that the client managers had to be bullet proof if things went wrong. They were so concerned about protecting themselves that they were refusing to create the conditions for success. They were almost guaranteeing failure, but a failure for which they would not be blamed.

The response of client management was to retain all of the normal processes and project governance. These were specified in the contract. Testing was required to produce the full set of horribly obsolete, IEEE 829 style documentation. Payment to the supplier was staged with each chunk of money being released as each document passed a quality gate.

The only concession to speed was that the project team was expected to do everything faster than normal.

I expended a huge amount of effort, working long hours, writing the test plans. One of my greatest assets, as far as my employer was concerned, was that I produced polished, high quality, impressive looking documentation.

In this case the documents were produced on shifting sands because neither I nor the development team had an adequate understanding of either the base product or what the tailored version would look like. We were able to acquire that understanding later, once our documents had been written and approved. We were always working half-blind, catching up later with our understanding.

However, everyone was happy. At least everyone at a senior level was happy as my beautiful documents sailed through the quality gates. The client paid out the money, and we had a celebratory night out every time we got another payment.

Meanwhile the testing team were miserable. They were stressed as they worked hard trying to piece together their preparations using unhelpful and barely relevant plans. The situation was rescued by my fantastic deputy test manager. I was the better test manager, in a skilled corporate operator sense. She was extremely experienced and capable, and she was the better hands on test manager. We were happy to admit this, with wry smiles, to each other, and we made a good team in those difficult circumstances. While I was working hard to produce useless test plans that would nevertheless pay our salaries, she drew up a set of informal test plans that the testers could work from.

When it came to test execution we winged it frantically. The official plans were shelfware, and we had to constantly adapt and improvise using the informal plans my deputy had prepared. We could have done it so much better and less stressfully if we had been able to plan and prepare on a realistic basis for the specific problems of this application.

What were the main differences between the two sets of plans?

The official plans did not deal with risk. The client refused to acknowledge in writing the real risk that motivated them, the fear of bad publicity. They insisted, infuriatingly, that everything had to be fully tested. They would not prioritise the testing so we could adapt the plan and concentrate on the crucial, high priority testing if we started to run out of time. They would only sign off a test plan that pretended it would be possible to do everything in time. So our informal test plans had to allow for the inevitable, and assign priorities based on our growing knowledge of the client and the system.

The client also refused to accept that using a new package would inevitably make it hard to predict how long it would take to tailor and develop applications. We were not allowed any flexibility in the schedule. Yet we were obliged to follow the full formal process, until such point as they would concede that we could skip or adapt steps. They would never do so until it was too late, by which time we had already acted unilaterally.

A particular annoyance was the insistence that the detailed plans should identify the type of people required to develop and test the system at each stage of development, and plan for recruiting this set of perfect people. There was a basic team in place, with the skills and experience to do the job, more or less. The schedule meant we had to do the best we could with the people available. That was fine, but I had to waste ridiculous amounts of time arguing about whether they were the right people, or whether we should look for different people. There wasn’t the budget or the time to go looking for a different team, but the plans had to be drawn up on the basis that we would try to develop and test with a perfect team – if we wanted to get the plans signed off and to be paid.

Another source of friction was that the client had completely ignored the non-functional requirements, with the result that the supplier architects had drawn up a requirements specification that was no more than vague mush: a set of motherhood statements and untestable criteria that were useful for a service level agreement, but not for testing.

We had to re-open the user workshops to address the non-functional requirements, so we could get some meaningful input from the users as a basis for our testing. This caused real trouble, because it was a direct, explicit statement by the test team that the official plans and process were inadequate, even though they had been signed off by the supplier and client. My deputy and I were insistent and did not budge on this point. It was clear that our stance was considered insulting to the professionalism of the client. I had already decided I was going to resign soon, so I was happy to take all the blame, and unpopularity, for rocking the boat.

What were the lessons learned by the client and supplier? Well, there were no useful lessons taken on board. We were only slightly late, the quality was acceptable given that it was a pilot, there were no serious accessibility issues and we didn’t miss our budget by too much. The fact that the project team, and especially the testers, hated the experience wasn’t relevant. Nor was it acknowledged that the only reason we achieved anything was because we didn’t follow the standard driven, officially documented plan.

That was a crucial insight into what happened. There were two projects in effect. There was the official one, a beautifully planned and documented project, and a messy, chaotic project. There was little relation between the two. The messy project was the one that mattered. It delivered. The pristine document driven project was the only one that was visible and judged, but it did nothing, except bring in the money.

I thought it was a ludicrous way to work, but it was clearly the way ahead at that client. It wasn’t that people were working like this because they thought it was better than Agile, which would have been well suited to that project, if there had been a willingness to do it properly. There was no conscious rejection of Agile. Client management seemed to think vaguely that Agile meant doing the same things, but magically somehow doing them faster. In that organisation people were following prescriptive processes and standards because it gave them a sense of protection when they failed. Perversely this way of working created stress and misery for most of the people on the project. Protection? I don’t think so.

Anyway, I had had enough of this style of working, so I handed in my notice and left. The full IT development programme at that client was scheduled to run for another three years. About a year later it was cancelled.

Teachers, children, testers and leaders (2013)

This article appeared in the March 2013 edition of Testing Planet, which is published by the wonderful Ministry of Testing, one of the most exciting developments in software testing over the last 20 years.

I’m moving this article onto my blog from my website, which will shortly be decommissioned. The article was written in January 2013. Looking at it again I see that I was starting to develop arguments I fleshed out over the next couple of years as part of the Stop 29119 campaign against the testing standard, ISO 29119.

The article

“A tester is someone who knows things can be different” – Gerald Weinberg.

Leaders aren’t necessarily people who do things, or order other people about. To me the important thing about leaders is that they enable other people to do better, whether by inspiration, by example or just by telling them how things can be different – and better. The difference between a leader and a manager is like the difference between a great teacher and, well, the driver of the school bus. Both take children places, but a teacher can take children on a journey that will transform their whole life.

My first year or so in working life after I left university was spent in a fog of confusion. I struggled to make sense of the way companies worked; I must be more stupid than I’d always thought. All these people were charging around, briskly getting stuff done, making money and keeping the world turning; they understood what they were doing and what was going on. They must be smarter than me.

Gradually it dawned on me that very many of them hadn’t a clue. They were no wiser than me. They didn’t really know what was going on either. They thought they did. They had their heads down, working hard, convinced they were contributing to company profits, or at least keeping the losses down.

The trouble was their efforts often didn’t have much to do with the objectives of the organisation, or the true goals of the users and the project in the case of IT. Being busy was confused with being useful. Few people were capable of sitting back, looking at what was going on and seeing what was valuable as opposed to mere work creation.

I saw endless cases of poor work, sloppy service and misplaced focus. I became convinced that we were all working hard doing unnecessary, and even harmful, things for users who quite rightly were distinctly ungrateful. It wasn’t a case of the end justifying the means; it was almost the reverse. The means were only loosely connected to the ends, and we were focussing obsessively on the means without realising that our efforts were doing little to help us achieve our ends.

Formal processes didn’t provide a clear route to our goal. Following the process had become the goal itself. I’m not arguing against processes; just the attitude we often bring to them, confusing the process with the destination, the map with the territory. The quote from Gerald Weinberg absolutely nails the right attitude for testers to bring to their work. There are twin meanings. Testers should know there is a difference between what people expect, or assume, and what really is. They should also know that there is a difference between what is, and what could be.

Testers usually focus on the first sort of difference; seeing the product for what it really is and comparing that to what the users and developers expected. However, the second sort of difference should follow on naturally. What could the product be? What could we be doing better?

Testers have to tell a story, to communicate not just the reality to the stakeholders, but also a glimpse of what could be. Organisations need people who can bring clear headed thinking to confusion, standing up and pointing out that something is wrong, that people are charging around doing the wrong things, that things could be better. Good testers are well suited by instinct to seeing what positive changes are possible. Communicating these possibilities, dispelling the fog, shining a light on things that others would prefer to remain in darkness; these are all things that testers can and should do. And that too is a form of leadership, every bit as much as standing up in front of the troops and giving a rousing speech.

In Hans Christian Andersen’s story, The Emperor’s New Clothes, who showed a glimpse of leadership? Not the emperor, not his courtiers; it was the young boy who called out the truth, that the Emperor was wearing no clothes at all. If testers are not prepared to tell it like it is, to explain why things are different from what others are pretending, to explain how they could be better, then we diminish and demean our profession. Leaders do not have to be all-powerful figures. They can be anyone who makes a difference; teachers, children. Or even testers.

Quality isn’t something, it provides something (2012)

This article appeared in the July 2012 edition of Testing Planet, which is published by the wonderful Ministry of Testing, one of the most exciting developments in software testing over the last 20 years.

The article was written in June 2012, but I don’t think it has dated. It’s about the way we think and work with other people. These are timeless problems. The idea behind E-prime is particularly interesting. Dispensing with the verb “to be” isn’t something to get obsessive or ideological about, but testers should be aware of the important distinction between the way something is and the way it behaves. The original article had only four references so I have checked them, converted them to hyperlinks, and changed the link to Lera Boroditsky’s paper to a link to her TED talk on the same subject.

The article

A few weeks ago two colleagues, who were having difficulty working together, asked me to act as peacekeeper in a tricky looking meeting in which they were going to try and sort out their working relationship. I’ll call them Tony and Paul. For various reasons they were sparking off each other and creating antagonism that was damaging the whole team.

An hour’s discussion seemed to go reasonably well; Tony talking loudly and passionately, while Paul spoke calmly and softly. Just as I thought we’d reached an accommodation that would allow us all to work together Tony blurted out, “you are cold and calculating, Paul, that’s the problem”.

Paul reacted as if he’d been slapped in the face, made his excuses and left the meeting. I then spent another 20 minutes talking Tony through what had happened, before separately speaking to Paul about how we should respond.

I told Tony that if he’d wanted to make the point I’d inferred from his comments, and from the whole meeting, then he should have said “your behaviour and attitude towards me throughout this meeting, and when we work together, strike me as cold and calculating, and that makes me very uncomfortable”.

“But I meant that!”, Tony replied. Sadly, he hadn’t said that. Paul had heard the actual words and reacted to them, rather than applying the more dispassionate analysis I had used as an observer. Paul meanwhile found Tony’s exuberant volatility disconcerting, and responded to him in a very studied and measured style that unsettled Tony.

Tony committed two sins. Firstly, he didn’t acknowledge the two way nature of the problem. It should have been about how he reacted to Paul, rather than trying to dump all the responsibility onto Paul.

Secondly, he said that Paul is cold and calculating, rather than that Paul had acted in a way Tony found cold and calculating, at a certain time, in certain circumstances.

I think we’d all see a huge difference between being “something”, and behaving in a “something” way at a certain time, in a certain situation. The verb “to be” gives us this problem. It can mean, and suggest, many different things and can create fog where we need clarity.

Some languages, such as Spanish, maintain a useful distinction between different forms of “to be” depending on whether one is talking about something’s identity or just a temporary attribute or state.

The way we think obviously shapes the language we speak, but increasingly scientists are becoming aware of how the language we use shapes the way that we think. [See this 2017 TED talk, “How Language Shapes Thought”, by Lera Boroditsky]

The problem we have with “to be” has great relevance to testers. I don’t just mean treating people properly, however much importance we rightly attach to working successfully with others. More than that, if we shy away from “to be” then it helps us think more carefully and constructively as testers.

This topic has stretched bigger brains than mine, in the fields of philosophy, psychology and linguistics. Just google “general semantics” if you want to give your brain a brisk workout. You might find it tough stuff, but I don’t think you have to master the underlying concept to benefit from its lessons.

Don’t think of it as intellectual navel gazing. All this deep thought has produced some fascinating results, in particular something called E-prime, a form of English that totally dispenses with “to be” in all its forms; no “I am”, “it is”, or “you are”. Users of E-prime don’t simply replace the verb with an alternative. That doesn’t work. It forces you to think and articulate more clearly what you want to say. [See this classic paper by Kellogg, “Speaking in E-prime” PDF, opens in new tab].

“The banana is yellow” becomes “the banana looks yellow”, which starts to change the meaning. “Banana” and “yellow” are not synonyms. The banana’s yellowness becomes apparent only because I am looking at it, and once we introduce the observer we can acknowledge that the banana appears yellow to us now. Tomorrow the banana might appear brown to me as it ripens. Last week it would have looked green.

You probably wouldn’t disagree with any of that, but you might regard it as a bit abstract and pointless. However, shunning “to be” helps us to think more clearly about the products we test, and the information that we report. E-prime therefore has great practical benefits.

The classic definition of software quality came from Gerald Weinberg in his book “Quality Software Management: Systems Thinking”.

“Quality is value to some person”.

Weinberg’s definition reflects some of the clarity of thought that E-prime requires, though he has watered it down somewhat to produce a snappy aphorism. The definition needs to go further, and “is” has to go!

Weinberg makes the crucial point that we must not regard quality as some intrinsic, absolute attribute. It arises from the value it provides to some person. Once you start thinking along those lines you naturally move on to realising that quality provides value to some person, at some moment in time, in a certain context.

Thinking and communicating in E-prime stops us making sweeping, absolute statements. We can’t say “this feature is confusing”. We have to use a more valuable construction such as “this feature confused me”. But we’re just starting. Once we drop the final, total condemnation of saying the feature is confusing, and admit our own involvement, it becomes more natural to think about and explain the reasons. “This feature confused me … when I did … because of …”.

Making the observer, the time and the context explicit helps us by limiting or exposing hidden assumptions. We might or might not find these assumptions valid, but we need to test them, and we need to know about them so we understand what we are really learning as we test the product.

E-prime fits neatly with the scientific method and with the provisional and experimental nature of good testing. Results aren’t true or false. The evidence we gather matches our hypothesis, and therefore gives us greater confidence in our knowledge of the product, or it fails to match up and makes us reconsider what we thought we knew. [See this classic paper by Kellogg & Bourland, “Working with E-prime – some practical notes” PDF, opens in new tab].

Scientific method cannot be accommodated in traditional script-driven testing, which reflects a linear, binary, illusory worldview, pretending to be absolute. It tries to deal in right and wrong, pass and fail, true and false. Such an approach fits in neatly with traditional development techniques which fetishise the rigours of project management, rather than the rigours of the scientific method.

This takes us back to general semantics, which coined the well known maxim that the map is not the territory. Reality and our attempts to model and describe it differ fundamentally from each other. We must not confuse them. Traditional techniques fail largely because they confuse the map with the territory. [See this “Less Wrong” blog post].

In attempting to navigate their way through a complex landscape, exponents of traditional techniques seek the comfort of a map that turns messy, confusing reality into something they can understand and that offers the illusion of being manageable. However, they are managing the process, not the underlying real work. The plan is not the work. The requirements specification is not the requirements. The map is not the territory.

Adopting E-prime in our thinking and communication will probably just make us look like the pedantic awkward squad on a traditional project. But on agile or lean developments E-prime comes into its own. Testers must contribute constructively, constantly, and above all, early. E-prime helps us in all of this. It makes us clarify our thoughts and helps us understand that we gain knowledge provisionally, incrementally and never with absolute certainty.

I was not consciously deploying E-prime during and after the fractious meeting I described earlier. But I had absorbed the precepts sufficiently to instinctively realise that I had two problems; Tony’s response to Paul’s behaviour, and Paul’s response to Tony’s outburst. I really didn’t see it as a matter of “uh oh – Tony is stupid”.

E-prime purists will look askance at my failure to eliminate all forms of “to be” in this article. I checked my writing to ensure that I’ve written what I meant to, and said only what I can justify. Question your use of the verb, and weed out those hidden assumptions and sweeping, absolute statements that close down thought, rather than opening it up. Don’t think you have to be obsessive about it. As far as I am concerned, that would be silly!

Traditional techniques and motivating staff (2010)

This article appeared in the February 2010 edition of Software Testing Club Magazine, now the Testing Planet. The STC has evolved into the wonderful Ministry of Testing, one of the most exciting developments in software testing over the last 20 years.

That might seem a low bar; testing isn’t meant to be a thrill a minute. But the Ministry of Testing has been a gale of fresh air sweeping through the industry, mixing great content and conferences with an approach to testing that has managed to be both entertaining and deeply serious. It has been a consistent voice of sanity and decency in an industry that has had too much cynicism and short sightedness.

I’m moving this article onto my blog from my website, which will shortly be decommissioned. Looking back I was interested to see that I didn’t post this article on the website immediately. I had some reservations about the article. I wondered if I had taken a rather extreme stance. I do believe that rigid standards and processes can be damaging, and I certainly believe that enforcing strict compliance, at the expense of initiative and professional judgement, undermines morale.

However, I thought I had perhaps gone too far, and that I might be seen as dismissing the idea of any formality and advocating software development as an entirely improvised activity with everyone winging it. That’s not the case. We need to have some structure, some shape and formality to our work. It’s just that prescriptive standards and processes aren’t sensitive to context and become a straitjacket. This was written in January 2010 and it was a theme I spent a good deal of time on when the ISO 29119 standard was released a few years later and the Stop 29119 campaign swung into action.

So I still largely stand by this article, though I think it is lacking in nuance in some respects. In particular the bald statement “development isn’t engineering”, while true, does require greater nuance, unpacking and explanation. Development isn’t engineering in the sense that engineering is usually understood, and it’s certainly not akin to civil engineering. But it should aspire to be more “engineering like”, while remaining realistic about the reality of software development. I was particularly interested to see that I described reality as being chaotic in 2010, a couple of years before I started to learn about Cynefin.

The article

Do we follow the standards or use our initiative?

Recently I’ve been thinking and writing about the effects of testing standards. The more I thought, the more convinced I became that standards, or any rigid processes, can damage the morale, and even the professionalism, of IT professionals if they are not applied wisely.

The problem is that calling them “standards” implies that they are mandatory and should be applied in all cases. The word should be reserved for situations where compliance is essential, e.g. security, good housekeeping or safety critical applications.

I once worked for a large insurance company as an IT auditor in Group Audit. I was approached by Information Services. Would I consider moving to lead a team developing new management information (MI) applications? It sounded interesting, so I said yes.

On my first day in the new role I asked my new manager what I had to do. He stunned me when he said, “You tell me. I’ll give you the contact details for your users. Go and see them. They’re next in line to get an MI application. See what they need, then work out how you’re going to deliver it. Speak to other people to see how they’ve done it, but it’s up to you”.

The company did have standards and processes, but they weren’t rigid and they weren’t very useful in the esoteric world of insurance MI, so we were able to pick and choose how we developed applications.

My users were desperate for a better understanding of their portfolio; what was profitable, and what was unprofitable. I had no trouble getting a manager and a senior statistician to set aside two days to brief me and my assistant. There was just us, a flip chart, and gallons of coffee as they talked us through the market they were competing in, the problems they faced and their need for better information from the underwriting and claims applications with which they did business.

I realised that it was going to be a pig of a job to give them what they needed. It would take several months. However, I could give them about a quarter of what they needed in short order. So we knocked up a quick disposable application in a couple of weeks that delighted them, and then got to work on the really tricky stuff.

The source systems proved to be riddled with errors and poor quality data, so it took longer than expected. However, we’d got the users on our side by giving them something quickly, so they were patient.

It took so long to get phase 1 of the application working to acceptable tolerances that I decided to scrap phase 2, which was nearly fully coded, and rejig the design of the first part so that it could do the full job on its own. That option had been ruled out at the start because there seemed to be insurmountable performance problems.

Our experience with testing had shown that we could make the application run much faster than we’d thought possible, but that the fine tuning of the code to produce accurate MI was a nightmare. It therefore made sense to clone jobs and programs wholesale to extend the first phase and forget about trying to hack the phase 2 code into shape.

The important point is that I was allowed to take a decision that meant binning several hundred hours of coding effort and utterly transforming a design that had been signed off.

I took the decision during a trip to the dentist, discussed it with my assistant on my return, sold the idea to the users and only then did I present my management with a fait accompli. They had no problems with it. They trusted my judgement, and I was taking the users along with me.

The world changed and an outsourcing deal meant I was working for a big supplier, with development being driven by formal processes, rigid standards and contracts. This wasn’t all bad. It did give developers some protection from the sort of unreasonable pressure that could be brought to bear when relationships were less formal. However, it did mean that I never again had the same freedom to use my own initiative and judgement.

The bottom line was that it could be better to do the wrong thing for the corporately correct reason, than to do the right thing the “wrong” way. By “better” I mean better for our careers, and not better for the customers.

Ultimately that is soul destroying. What really gets teams fired up is when developers, testers and users all see themselves as being on the same side, determined to produce a great product.

Development isn’t engineering

Reality is chaotic. Processes are perfectly repeatable only if one pretends that reality is neat, orderly and predictable. The result is strain, tension and developers ordered to do the wrong things for the “right” reasons, to follow the processes mandated by standards and by the contract.

Instead of making developers more “professional” it has exactly the opposite effect. It reduces them to the level of, well, second hand car salesmen, knocking out old cars with no warranty. It’s hardly a crime, but it doesn’t get me excited.

Development and testing become drudgery. Handling the users isn’t a matter of building lasting relationships with fellow professionals. It’s a matter of “managing the stakeholders”, being diplomatic with them rather than candid, and if all else fails telling them “to read the ******* contract”.

This isn’t a rant about contractual development. Contracts don’t have to be written so that the development team is in a strait-jacket. It’s just that traditional techniques fit much more neatly with contracts than agile, or any iterative approach.

Procurement is much simpler if you pretend that traditional, linear techniques are best practice; if you pretend that software development is like civil engineering, and that developing an application is like building a bridge.

Development and testing are really not like that at all. The actual words used should be a good clue. Development is not the same as construction. Construction is what you do when you’ve developed an idea to the point where it can be manufactured, or built.

Traditional techniques were based on that fundamental flaw; the belief that development was engineering, and that repeatable success required greater formality, more tightly defined processes and standards, and less freedom for developers.

Development is exploring

Good development is a matter of investigation, experimentation and exploration. It’s about looking at the possibilities, and evaluating successive versions. It’s not about plodding through a process document.

Different customers, different users and different problems will require different approaches. These various approaches are not radically different from each other, but they are more varied than is allowed for by rigid and formal processes.

Any organisation that requires development teams to adhere to these processes, rather than make their own judgements based on their experience and their users’ needs, frequently requires the developers to do the wrong things.

This is demoralising, and developers working under these constraints have the initiative, enthusiasm and intellectual energy squeezed out of them. As they build their careers in such an atmosphere they become corporate bureaucrats.

They rise to become not development managers, but managers of the development process; not test managers, but managers of the test process. Their productivity is measured in meetings and reports. Sometimes the end product seems a by-product of the real business; doing the process.

If people are to realise their potential they need managers who will do as mine did; who will say, “here is your problem, tell me how you’re going to solve it”.

We need guidance from processes and standards in the same way as we need guidance from more experienced practitioners, but they should be suggestions of possible approaches so that teams don’t flounder around in confused ignorance, don’t try to re-invent the wheel, and don’t blunder into swamps that have consumed previous projects.

If development is exploration it is thrilling and brings out the best in us. People rise to the challenge, learn and grow. They want to do it again and get better at it. If development means plodding through a process document it is a grind.

I know which way has inspired me, which way has given users applications I’ve been proud of. It wasn’t the formal way. It wasn’t the traditional way. Set the developers free!

Testers and coders are both developers (2009)

This article appeared in September 2009 as an opinion piece on the front page of TEST magazine’s website. I’m moving the article onto my blog from my website, which will be decommissioned soon. It might be an old article, but it remains valid.

The article

When I was a boy I played football non-stop; in organised matches, in playgrounds or in the park, even kicking coal around the street!

There was a strict hierarchy. The good players, the cool kids; they were forwards. The ones who couldn’t play were defenders. If you were really hopeless you were a goalkeeper. Defending was boring. Football was about fun, attacking and scoring goals.

When I moved into IT I found a similar hierarchy. I had passed the programming aptitude test. I was one of the elite. I had a big head, to put it mildly! The operators were the defenders, the ones who couldn’t do the fun stuff. We were vaguely aware they thought the coding kids were irresponsible cowboys, but who cared?

As for testers, well, they were the goalkeepers. Frankly, they were lucky to be allowed to play at all. They did what they were told. Independence? You’re joking, but if they were good they were allowed to climb the ladder and become programmers.

Gradually things changed. Testers became more clearly identified with the users. They weren’t just menial team members. A clear career path opened up for testing professionals.

However, that didn’t earn them respect from programmers. Now testing is changing again. Agile gives testers the chance to learn and apply interesting coding skills. Testers can be just as technical as coders. They might code in different ways, for different reasons, but they can be coders too.

That’s great isn’t it? Well, up to a point. It’s fantastic that testers have these exciting opportunities. But I worry that programmers might start respecting the more technical testers for the wrong reason, and that testers who don’t code will still be looked down on. Testers shouldn’t try to impress programmers with their coding skills. That’s just a means to an end.

We’ll still need testers who don’t code, and if testers are to achieve the respect they deserve they must be valued for all the skills they bring to the team, not just the skills they share with programmers. For a start, we should stop referring to developers and testers. Testers always were part of the development process. In Agile teams they quite definitely are developers. It’s time everyone acknowledged that. Development is a team game.

Football teams who played the way we used to as kids got thrashed if they didn’t grow up and play as a team. Development teams who don’t ditch similar attitudes will be equally ineffective.

Bridging the gap – Agile and the troubled relationship between UX and software engineering (2009)

This article appeared as the cover story for the September 2009 edition of TEST magazine.

I’m moving the article onto my blog from my website, which will be decommissioned soon. If you choose to read the article please bear in mind that it was written in August 2009 and is therefore inevitably dated in some respects, though I think that much of it is still valid, especially where I am discussing the historical problems.

The article

For most of its history software engineering has had great difficulty building applications that users found enjoyable. Far too many applications were frustrating and wasted users’ time.

That has slowly changed with the arrival of web developments, and I expect the spread of Agile development to improve matters further.

I’m going to explain why I think developers have had difficulty building usable applications and why user interaction designers have struggled to get their ideas across. For simplicity I’ll just refer to these designers and psychologists as “UX”. There are a host of labels and acronyms I could have used, and it really wouldn’t have helped in a short article.

Why software engineering didn’t get UX

Software engineering’s problems with usability go back to its roots, when geeks produced applications for fellow geeks. Gradually applications spread from the labs to commerce and government, but crucially users were wage slaves who had no say in whether they would use the application. If they hated it they could always find another job!

Gradually applications spread into the general population until the arrival of the internet meant that anyone might be a user. Now it really mattered if users hated your application. They would be gone for good, into the arms of the competition.

Software engineering had great difficulty coming to terms with this. The methods it had traditionally used were poison to usability. The Waterfall lifecycle was particularly damaging.

The Waterfall had two massive flaws. At its root was the implicit assumption that you can tackle the requirements and design up front, before the build starts. This led to the second problem; iteration was effectively discouraged.

Users cannot know what they want till they’ve seen what is possible and what can work. In particular, UX needs iteration to let analysts and users build on their understanding of what is required.

The Waterfall meant that users could not see and feel what an application was like until acceptance testing at the end when it was too late to correct defects that could be dismissed as cosmetic.

The Waterfall was a project management approach to development, a means of keeping control, not building good products. This made perfect sense to organisations who wanted tight contracts and low costs.

The desire to keep control and make development a more predictable process explained the damaging attempt to turn software engineering into a profession akin to civil engineering.

So developers were sentenced to 20 years hard labour with structured methodologies, painstakingly creating an avalanche of documentation; moving from the current physical system, through the current logical system to a future logical system and finally a future physical system.

However, the guilty secret of software engineering was that translating requirements into a design wasn’t just a difficult task that required a methodical approach; it’s a fundamental problem for developers. It’s not a problem specific to usability requirements, and it was never resolved in traditional techniques.

The mass of documentation obscured the fact that crucial design decisions weren’t flowing predictably and objectively from the requirements, but were made intuitively by the developers – people who by temperament and training were polar opposites of typical users.

Why UX didn’t get software development

Not surprisingly, given the massive documentation overhead of traditional techniques, and developers’ propensity to pragmatically tailor and trim formal methods, the full process was seldom followed. What actually happened was more informal and opaque to outsiders.

The UX profession understandably had great difficulty working out what was happening. Sadly they didn’t even realise that they didn’t get it. They were hampered by their naivety, their misplaced sense of the importance of UX and their counter-productive instinct to retain their independence from developers.

If developers had traditionally viewed functionality as a matter of what the organisation required, UX went to the other extreme and saw functionality as being about the individual user. Developers ignored the human aspect, but UX ignored commercial realities – always a fast track to irrelevance.

UX took software engineering at face value, tailoring techniques to fit what they thought should be happening rather than the reality. They blithely accepted the damaging concept that usability was all about the interface; that the interface was separate from the application.

This separability concept was flawed on three grounds. Conceptually it was wrong. It ignored the fact that the user experience depends on how the whole application works, not just the interface.

Architecturally it was wrong. Detailed technical design decisions can have a huge impact on the users. Finally separability was wrong organisationally. It left the UX profession stranded on the margins, in a ghetto, available for consultation at the end of the project, and then ignored when their findings were inconvenient.

An astonishing amount of research and effort went into justifying this fallacy, but the real reason UX bought the idea was that it seemed to liberate them from software engineering. Developers could work on the boring guts of the application while UX designed a nice interface that could be bolted on at the end, ready for testing. However, this illusory freedom actually meant isolation and impotence.

The fallacy of separability encouraged reliance on usability testing at the end of the project, on summative testing to reveal defects that wouldn’t be fixed, rather than formative testing to stop these defects being built in the first place.

There’s an argument that there’s no such thing as effective usability testing. If it takes place at the end it’s too late to be effective. If it’s effective then it’s done iteratively during design, and it’s just good design rather than testing.

So UX must be hooked into the development process. It must take place early enough to allow alternative designs to be evaluated. Users must therefore be involved early and often. Many people in UX accept this completely, though the separability fallacy is still alive and well. However, its days are surely numbered. I believe, and hope, that the Agile movement will finally kill it.

Agile and UX – the perfect marriage?

The mutual attraction of Agile and UX isn’t simply a case of “my enemy’s enemy is my friend”. Certainly they do have a common enemy in the Waterfall, but each discipline really does need the other.

Agile gives UX the chance to hook into development, at the points where it needs to be involved to be effective. Sure, with the Waterfall it is possible to smuggle effective UX techniques into a project, but they go against the grain. It takes strong, clear-sighted project managers and test managers to make them work. The schedule and political pressures on these managers to stop wasting time iterating and to sign off each stage are huge.

If UX needs Agile, then equally Agile needs UX if it is to deliver high quality applications. The opening principle of the Agile Manifesto states that “our highest priority is to satisfy the customer through early and continuous delivery of valuable software”.

There is nothing in Agile that necessarily guarantees better usability, but if practitioners believe in that principle then they have to take UX seriously and use UX professionals to interpret users’ real needs and desires. This is particularly important with web applications when developers don’t have direct access to the end users.

There was considerable mutual suspicion between the two communities when Agile first appeared. Agile was wary of UX’s detailed analyses of the users, and suspected that this was a Waterfall style big, up-front requirements phase.

UX meanwhile saw Agile as a technical solution to a social, organisational problem and was rightly sceptical of claims that manual testing would be a thing of the past. Automated testing of the human interaction with the application is a contradiction.

Both sides have taken note of these criticisms. Many in UX now see the value in speeding up the user analysis and focussing on the most important user groups, and Agile has recognised the value of up-front analysis to help them understand the users.

The Agile community is also taking a more rounded view of testing, and how UX can fit into the development process. In particular check out Brian Marick’s four Agile testing quadrants.

UX straddles quadrants two and three. Q2 contains the up-front analysis to shape the product, and Q3 contains the evaluation of the results. Both require manual testing, and use such classic UX tools as personas (fictional representative users) and wireframes (sketches of webpages).

Other people who’re working on the boundary of Agile and UX are Jeff Patton, Jared Spool and Anders Ramsay. They’ve come up with some great ideas, and it’s well worth checking them out. Microsoft have also developed an interesting technique called the RITE method.

This is an exciting field. Agile changes the role of testers and requires them to learn new skills. The integration of UX into Agile will make testing even more varied and interesting.

There will still be tension between UX and software professionals who’ve been used to working remotely from the users. However, Agile should mean that this is a creative tension, with each group supporting and restraining the others.

This is a great opportunity. Testers will get the chance to help create great applications, rather than great documentation!

What happens to usability when development goes offshore? (2009)

This article appeared in the March 2009 edition of Testing Experience magazine, which is no longer published. I’m moving the article onto my blog from my website, which will be decommissioned soon.

If you choose to read the article please bear in mind that it was written in January 2009 and is therefore inevitably dated in some respects, though I think that much of it is still valid.

The references in the article were all structured for a paper magazine. There are no hyperlinks and I have not tried to recreate them and check out whether they still work.

The article

Two of the most important trends in software development over the last 20 years have been the increasing number of companies sending development work to cheaper labour markets, and the increasing attention that is paid to the usability of applications.

Developers in Europe and North America cannot have failed to notice the trend to offshore development work, and they worry about the long-term implications.

Usability, however, has had a mixed history. Many organizations and IT professionals have been deeply influenced by the need to ensure that their products and applications are usable; many more are only vaguely aware of this trend and do not take it seriously.

As a result, many developers and testers have missed the significant implications that offshoring has for building usable applications, and underestimate the problems of testing for usability. I will try to explain these problems, and suggest possible remedial actions that testers can take if they find themselves working with offshore developers. I will be looking mainly at India, the largest offshoring destination, because information has been more readily available. However, the problems and lessons apply equally to other offshoring countries.

According to NASSCOM, the main trade body representing the Indian IT industry [1], the numbers of IT professionals employed in India rose from 430,000 in 2001 to 2,010,000 in 2008. The numbers employed by offshore IT companies rose 10 fold, from 70,000 to 700,000.

It is hard to say how many of these are usability professionals. Certainly at the start of the decade there were only a handful in India. Now, according to Jhumkee Iyengar, of User in Design, “a guesstimate would be around 5,000 to 8,000”. At most that’s about 0.4% of the people working in IT. Even if they were all involved in offshore work they would represent no more than 1% of the total.

Does that matter? Jakob Nielsen, the leading usability expert, would argue that it does. His rule of thumb [2] is that “10% of project resources must be devoted to usability to ensure a good quality product”.

Clearly India is nowhere near capable of meeting that figure. To be fair, the same can be said of the rest of the world given that 10% represents Nielsen’s idea of good practice, and most organizations have not reached that stage. Further, India traditionally provided development of “back-office” applications, which are less likely to have significant user interaction.
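Purely as an illustration, the proportions quoted above can be checked with a quick back-of-the-envelope calculation. The sketch below uses the rough figures cited in this article, so treat its output as indicative only.

```python
# Back-of-the-envelope check of the proportions quoted above. All inputs
# are the rough figures cited in the text, not precise data.
it_professionals_india = 2_010_000   # NASSCOM figure for 2008
offshore_it_staff = 700_000          # employed by offshore IT companies, 2008

for usability_professionals in (5_000, 8_000):   # the guesstimate range
    share_of_all_it = usability_professionals / it_professionals_india
    share_of_offshore = usability_professionals / offshore_it_staff
    print(f"{usability_professionals:>5}: {share_of_all_it:.1%} of all IT staff, "
          f"{share_of_offshore:.1%} of offshore staff")

# Prints roughly 0.2-0.4% and 0.7-1.1% respectively - an order of
# magnitude short of Nielsen's 10% rule of thumb for project resources.
```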

Nevertheless, the shortage of usability professionals in the offshoring destinations does matter. Increasingly offshore developments have involved greater levels of user interaction, and any shortage of usability expertise in India will damage the quality of products.

Sending development work offshore always introduces management and communication problems. Outsourcing development, even within the same country, poses problems for usability. When the development is both offshored and outsourced, the difficulties in producing a usable application multiply. If there are no usability professionals on hand, the danger is that the developers will not only fail to resolve those problems – they will probably not even recognize that they exist.

Why can outsourcing be a problem for usability?

External software developers are subject to different pressures from internal developers, and this can lead to poorer usability. I believe that external suppliers are less likely to be able to respond to the users’ real needs, and research supports this. [3, 4, 5, 6, 7]

Obviously external suppliers have different goals from internal developers. Their job is to deliver an application that meets the terms of the contract and makes a profit in doing so. Requirements that are missed or are vague are unlikely to be met, and usability requirements all too often fall into one of these categories. This is not simply a matter of a lack of awareness. Usability is a subjective matter, and it is difficult to specify precise, objective, measurable and testable requirements. Indeed, trying too hard to do so can be counter-productive if the resulting requirements are too prescriptive and constrain the design.

A further problem is that the formal nature of contractual relationships tends to push clients towards more traditional, less responsive and less iterative development processes, with damaging results for usability. If users and developers are based in different offices, working for different employers, then rapid informal feedback becomes difficult.

Some of the studies that found these problems date back to the mid 90s. However, they contain lessons that remain relevant now. Many organizations have still not taken these lessons on board, and they are therefore facing the same problems that others confronted 10 or even 20 years ago.

How can offshoring make usability problems worse?

So, if simple outsourcing to a supplier in the same country can be fraught with difficulty, what are the usability problems that organizations face when they offshore?

Much of the discussion of this harks back to an article by Jakob Nielsen in 2002 [2], which stirred up plenty of debate, much of it critical.

“Offshore design raises the deeper problem of separating interaction designers and usability professionals from the users. User-centered design requires frequent access to users: the more frequent the better.”

If the usability professionals need to be close to the users, can they stay onshore and concentrate on the design while the developers build offshore? Nielsen was emphatic on that point.

“It is obviously not a solution to separate design and implementation since all experience shows that design, usability, and implementers need to proceed in tight co-ordination. Even separating teams across different floors in the same building lowers the quality of the resulting product (for example, because there will be less informal discussions about interpretations of the design spec).”

So, according to Nielsen, the usability professionals have to follow the developers offshore. However, as we’ve seen, the offshore countries have nowhere near enough trained professionals to cover the work. Numbers are increasing, but not by enough to keep pace with the growth in offshore development, never mind the demands of local commerce.

This apparent conundrum has been dismissed by many people who have pointed out, correctly, that offshoring is not an “all or nothing” choice. Usability does not have to follow development. If usability is a concern, then user design work can be retained onshore, and usability expertise can be deployed in both countries. This is true, but it is a rather unfair criticism of Nielsen’s arguments. The problem he describes is real enough. The fact that it can be mitigated by careful planning certainly does not mean the problem can be ignored.

User-centred design assumes that developers, usability analysts and users will be working closely together. Offshoring the developers forces organizations to make a choice between two unattractive options; separating usability professionals from the users, or separating them from the developers.

It is important that organizations acknowledge this dilemma and make the choice explicitly, based on their needs and their market. Every responsible usability professional would be keenly aware that their geographical separation from their users was a problem, and so those organizations that hire usability expertise offshore are at least implicitly acknowledging the problems caused by offshoring. My concern is for those organizations who keep all the usability professionals onshore and either ignore the problems, or assume that they don’t apply in their case.

How not to tackle the problems

Jhumkee Iyengar has studied the responses of organizations wanting to ensure that offshore development will give them usable applications [8]. Typically they have tried to do so without offshore usability expertise. They have used two techniques sadly familiar to those who have studied usability problems; defining the user interaction requirements up-front and sending a final, frozen specification to the offshore developers, or adopting the flawed and fallacious layering approach.

Attempting to define detailed up-front requirements is anathema to good user-centred design. It is consistent with the Waterfall approach and is attractive because it is neat and easy to manage (as I discussed in my article on the V Model in Testing Experience, issue 4).

Building a usable application that allows users and customers to achieve their personal and corporate goals painlessly and efficiently requires iteration, prototyping and user involvement that is both early in the lifecycle and repeated throughout it.

The layering approach was based on the fallacy that the user interface could be separated from the functionality of the application, and that each could be developed separately. This fallacy was very popular in the 80s and 90s. Its influence has persisted, not because it is valid, but because it lends an air of spurious respectability to what people want to do anyway.

Academics expended a huge amount of effort trying to justify this separability. Their arguments, their motives and the consequences of their mistake are worth a full article in their own right. I’ll restrict myself to saying that the notion of separability was flawed on three counts.

  • It was flawed conceptually because usability is a product of the experience of the user with the whole application, not just the interface.
  • It was flawed architecturally, because design decisions taken by system architects can have a huge impact on the user experience.
  • Finally, it was flawed in political and organizational terms because it encouraged usability professionals to retreat into a ghetto, isolated from the hubbub of the developers, where they would work away on an interface that could be bolted onto the application in time for user testing.

Lewis & Rieman [3] memorably savaged the idea that usability professionals could hold themselves aloof from the application design, calling it “the peanut butter theory of usability”:

“Usability is seen as a spread that can be smeared over any design, however dreadful, with good results if the spread is thick enough. If the underlying functionality is confusing, then spread a graphical user interface on it. … If the user interface still has some problems, smear some manuals over it. If the manuals are still deficient, smear on some training which you force users to take.”

If the usability professionals stay onshore, and adopt either the separability or the peanut butter approach, the almost inescapable result is that they will be delegating decisions about usability to the offshore developers.

Developers are just about the worst people to take these decisions. They are too close to the application, and instinctively see workarounds to problems that might appear insoluble to real users.

Developers also have a different mindset when approaching technology. Even if they understand the business context of the applications they can’t unlearn their technical knowledge and experience to see the application as a user would; and this is if developers and users are in the same country. The cultural differences are magnified if they are based in different continents.

The relative lack of maturity of usability in the offshoring destinations means that developers often have an even less sophisticated approach than developers in the client countries. User interaction is regarded as an aesthetic matter restricted to the interface, with the developers more interested in the guts of the application.

Pradeep Henry reported in 2003 that most user interfaces at Indian companies were being designed by programmers, and that in his opinion they had great difficulty switching from their normal technical, system-focused mindset to that of a user. [9]

They also had very little knowledge of user centered design techniques. This is partly a matter of education, but there is more to it. In explaining the shortage of usability expertise in India, Jhumkee Iyengar told me that she believes important factors are the “phenomenal success of Indian IT industry, which leads people to question the need for usability, and the offshore culture, which has traditionally been a ‘back office culture’ not conducive to usability”.

The situation is, however, certainly improving. Although the explosive growth in development work in India, China and Eastern Europe has left the usability profession struggling to keep up, the number of usability experts has grown enormously over the last 10 years. There are nowhere near enough, but there are firms offering this specialist service keen to work with offshoring clients.

This trend is certain to continue because usability is a high value service. It is a hugely attractive route to follow for these offshore destinations, complementing and enhancing the traditional offshore development service.

Testers must warn of the dangers

The significance of all this from the perspective of testers is that even though usability faces significant threats when development is offshored, there are ways to reduce the dangers and the problems. They cannot be removed entirely, but offshoring offers such cost savings that it will continue to grow, and it is important that testers working for client companies understand these problems and can anticipate them.

Testers may not always, or often, be in a position to influence whether usability expertise is hired locally or offshore. However, they can flag up the risks of whatever approach is used, and adopt an appropriate test strategy.

The most obvious danger is if an application has significant interaction with the user and there is no specialist usability expertise on the project. As I said earlier, this could mean that the project abdicates responsibility for crucial usability decisions to the offshore developers.

Testers should try to prevent a scenario where the interface and user interaction are pieced together offshore, and thrown “over the wall” to the users back in the client’s country for acceptance testing when it may be too late to fix even serious usability defects.

Is it outside the traditional role of a tester to lobby project management to try and change the structure of the project? Possibly, but if testers can see that the project is going to be run in a way that makes it hard to do their job effectively then I believe they have a responsibility to speak out.

I’m not aware of any studies looking at whether outsourcing contracts (or managed service agreements) are written in such prescriptive detail that they restrict the ability of test managers to tailor their testing strategy to the risks they identify. However, going by my experience and the anecdotal evidence I’ve heard, this is not an issue. Testing is not usually covered in detail in contracts, thus leaving considerable scope to test managers who are prepared to take the initiative.

Although I’ve expressed concern about the dangers of relying on a detailed up-front specification, there is no doubt that if the build is being carried out by offshore developers then they have to be given clear, detailed, unambiguous instructions.

The test manager should therefore set a test strategy that allows for significant static testing of the requirements documents. These should be shaped by walkthroughs and inspections to check that the usability requirements are present, complete, stated in sufficient detail to be testable, yet not defined so precisely that they constrain the design and rule out what might have been perfectly acceptable solutions to the requirements.
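As a purely illustrative sketch, the spirit of those static checks could even be captured in a few lines of code. Every rule and requirement below is invented for the example; in practice such a script would only supplement walkthroughs and inspections, never replace them.

```python
# Illustrative sketch only: a crude way to flag usability requirements that
# fail the static checks described above. The check rules and requirement
# texts are invented examples, not taken from any real project.
CHECKS = {
    "has acceptance criteria": lambda r: bool(r["acceptance_criteria"].strip()),
    "is measurable": lambda r: any(ch.isdigit() for ch in r["acceptance_criteria"]),
    "does not prescribe a design": lambda r: "must use" not in r["text"].lower(),
}

requirements = [
    {"text": "A new claims handler can register a straightforward claim unaided",
     "acceptance_criteria": "9 out of 10 first-time users complete the task within 5 minutes"},
    {"text": "The system must use a tabbed interface for claim entry",
     "acceptance_criteria": ""},
]

for req in requirements:
    failed = [name for name, check in CHECKS.items() if not check(req)]
    verdict = "OK" if not failed else "needs review: " + ", ".join(failed)
    print(f"- {req['text']}: {verdict}")
```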

Once the offshore developers have been set to work on the specification it is important that there is constant communication with them and ongoing static testing as the design is fleshed out.

Hienadz Drahun leads an offshore interaction design team in Belarus. He stresses the importance of communication. He told me that “communication becomes a crucial point. You need to maintain frequent and direct communication with your development team.”

Dave Cronin of the US Cooper usability design consultancy wrote an interesting article about this in 2004 [10].

“We already know that spending the time to holistically define and design a software product dramatically increases the likelihood that you will deliver an effective and pleasurable experience to your customers, and that communication is one of the critical ingredients to this design process. All this appears to be even more true if you decide to have the product built in India or Eastern Europe.

To be absolutely clear, to successfully outsource product development, it is crucial that every aspect of the product be specifically defined, designed and documented. The kind of hand-waving you may be accustomed to when working with familiar and well-informed developers will no longer suffice.”

Significantly Cronin did not mention testing anywhere in his article, though he does mention “feedback” during the design process.

The limits of usability testing

One of the classic usability mistakes is to place too much reliance on usability testing. In fact, I’ve heard it argued that there is no such thing as usability testing. It’s a provocative argument, but it has some merit.

If usability is dependent only on testing, then it will be left to the end of the development, and serious defects will be discovered too late in the project for them to be fixed.

“They’re only usability problems, the users can work around them” is the cry from managers under pressure to implement on time. Usability must be incorporated into the design stages, with possible solutions being evaluated and refined. Usability is therefore produced not by testing, but by good design practices.

Pradeep Henry called his new team “Usability Lab” when he introduced usability to Cognizant, the offshore outsourcing company, in India. However, the name and the sight of the testing equipment in the lab encouraged people to equate usability with testing. As Henry explained:

“Unfortunately, equating usability with testing leads people to believe that programmers or graphic designers should continue to design the user interface and that usability specialists should be consulted only later for testing.”

Henry renamed his team the Cognizant Usability Group (now the Cognizant Usability Center of Excellence). [9]

Tactical improvements testers can make

So if usability evaluation has to be integrated into the whole development process then what can testers actually do in the absence of usability professionals? Obviously it will be difficult, but if iteration is possible during design, and if you can persuade management that there is a real threat to quality then you can certainly make the situation a lot better.

There is a lot of material readily available to guide you. I would suggest the following.

Firstly, Jakob Nielsen’s Discount Usability Engineering [11] consists of cheap prototyping (maybe just paper-based), heuristic (rule-based) evaluation and getting users to talk their way through the application, thinking out loud as they work their way through a scenario. A rough sketch of one way to record heuristic findings follows these suggestions.

Steve Krug’s “lost our lease” usability testing basically says that any usability testing is better than none, and that quick and crude testing can be both cheap and effective. Krug’s focus is more on the management of this approach rather than the testing techniques themselves, so it fits with Nielsen’s DUE, rather than being an alternative in my opinion. It’s all in his book “Don’t make me think”. [12]

Lockwood & Constantine’s Collaborative Usability Inspections offer a far more formal and rigorous approach, though you may be stretching yourself to take this on without usability professionals. It entails setting up formal walk-throughs of the proposed design, then iteration to remove defects and improve the product. [13, 14]

On a lighter note, Alan Cooper’s book “The Inmates Are Running the Asylum” [15] is an entertaining rant on the subject. Cooper’s solution to the problem is his Interaction Design approach. The essence of this is that software development must include a form of functional analysis that seeks to understand the business problem from the perspective of the users, based on their personal and corporate goals, working through scenarios to understand what they will want to do.

Cooper’s Interaction Design strikes a balance between the old, flawed extremes of structured methods (which ignored the individual) and traditional usability (which often paid insufficient attention to the needs of the organization). I recommend this book not because I think that a novice could take this technique on board and make it work, but because it is very readable and might make you question your preconceptions and think about what is desirable and possible.
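To make the heuristic (rule-based) evaluation suggestion above a little more concrete, here is a minimal sketch of how a team without usability specialists might record its findings. The data structures and function names are invented for this example; the heuristics listed are Nielsen’s familiar ten, and the severity scale is his 0 to 4 rating.

```python
# Illustrative sketch only: logging heuristic evaluation findings against
# Nielsen's ten usability heuristics. Structure and names are invented.
from dataclasses import dataclass

NIELSEN_HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognise, diagnose and recover from errors",
    "Help and documentation",
]

@dataclass
class Finding:
    screen: str      # where the problem was observed
    heuristic: str   # which heuristic it violates
    severity: int    # Nielsen's rating: 0 (not a problem) to 4 (catastrophe)
    note: str

findings: list[Finding] = []

def record(screen: str, heuristic: str, severity: int, note: str) -> None:
    """Log a finding, rejecting anything not tied to a known heuristic."""
    if heuristic not in NIELSEN_HEURISTICS:
        raise ValueError(f"Unknown heuristic: {heuristic}")
    findings.append(Finding(screen, heuristic, severity, note))

record("Claims summary", "Visibility of system status", 3,
       "No feedback while overnight totals are recalculated")

# Report the most serious problems first, for discussion with the developers.
for f in sorted(findings, key=lambda f: f.severity, reverse=True):
    print(f.severity, f.screen, ":", f.note)
```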

Longer term improvements

Of course it’s possible that you are working for a company that is still in the process of offshoring and where it is still possible to influence the outcome. It is important that the invitation to tender includes a requirement that suppliers can prove expertise and experience in usability engineering. Additionally, the client should expect potential suppliers to show they can satisfy the following three criteria.

  • The supplier should have a process or lifecycle model that not only has usability engineering embedded within it but that also demonstrates how the onshore and offshore teams will work together to achieve usability. The process must involve both sides.

    Offshore suppliers have put considerable effort into developing such frameworks. Three examples are Cognizant’s “End-to-End UI Process”, HFI’s “Schaffer-Weinschenk Method™” and Persistent’s “Overlap Usability”.

  • Secondly, suppliers should carry out user testing with users from the country where the application will be used. The cultural differences are too great to use people who happen to be easily available to the offshore developers.

    Remote testing entails usability experts based in one location conducting tests with users based elsewhere, even on another continent. It would probably not be the first choice of most usability professionals, but it is becoming increasingly important. As Jhumkee Iyengar told me, it “may not be the best, but it works and we have had good results. A far cry above no testing.”

  • Finally, suppliers should be willing to send usability professionals to the onshore country for the requirements gathering. This is partly a matter of ensuring that the requirements gathering takes full account of usability principles. It is also necessary so that these usability experts can fully understand the client’s business problem and can communicate it to the developers when they return offshore.

It’s possible that there may still be people in your organization who are sceptical about the value of usability. There has been a lot of work done on the return on investment that user centered design can bring. It’s too big a topic for this article, but a simple internet search on “usability” and “roi” will give you plenty of material.

What about the future?

There seems no reason to expect any significant changes in the trends we’ve seen over the last 10 years. The offshoring countries will continue to produce large numbers of well-educated, technically expert IT professionals. The cost advantage of developing in these countries will continue to attract work there.

Proactive test managers can head off some of the usability problems associated with offshoring. They can help bring about higher quality products even if their employers have not allowed for usability expertise on their projects. However, we should not have unrealistic expectations about what they can achieve. High quality, usable products can only be delivered consistently by organizations that have a commitment to usability and who integrate usability throughout the design process.

Offshoring suppliers will have a huge incentive to keep advancing into user centered design and usability consultancy. The increase in offshore development work creates the need for such services, whilst the specialist and advanced nature of the work gives these suppliers a highly attractive opportunity to move up the value chain, selling more expensive services on the basis of quality rather than cost.

The techniques I have suggested are certainly worthwhile, but they may prove no more than damage limitation. As Hienadz Drahun put it to me; “to get high design quality you need to have both a good initial design and a good amount of iterative usability evaluation. Iterative testing alone is not able to turn a bad product design into a good one. You need both.” Testers alone cannot build usability into an application, any more than they can build in quality.

Testers in the client countries will increasingly have to cope with the problems of working with offshore development. It is important that they learn how to work successfully with offshoring and adapt to it.

They will therefore have to be vigilant about the risks to usability of offshoring, and advise their employers and clients how testing can best mitigate these risks, both on a short term tactical level, i.e. on specific projects where there is no established usability framework, and in the longer term, where there is the opportunity to shape the contracts signed with offshore suppliers.

There will always be a need for test management expertise based in client countries, working with the offshore teams, but it will not be the same job we knew in the 90s.

References

[1] NASSCOM (2009). “Industry Trends, IT-BPO Sector-Overview”.

[2] Nielsen, J. (2002). “Offshore Usability”. Jakob Nielsen’s website.

[3] Lewis, C., Rieman, J. (1994). “Task-Centered User Interface Design: A Practical Introduction”. University of Colorado e-book.

[4] Artman, H. (2002). “Procurer Usability Requirements: Negotiations in Contract Development”. Proceedings of the second Nordic conference on Human-Computer Interaction (NordiCHI) 2002.

[5] Holmlid, S. Artman, H. (2003). “A Tentative Model for Procuring Usable Systems” (NB PDF download). 10th International Conference on Human-Computer Interaction, 2003.

[6] Grudin, J. (1991). “Interactive Systems: Bridging the Gaps Between Developers and Users”. Computer, April 1991, Vol. 24, No. 4 (subscription required).

[7] Grudin, J. (1996). “The Organizational Contexts of Development and Use”. ACM Computing Surveys. Vol 28, issue 1, March 1996, pp 169-171 (subscription required).

[8] Iyengar, J. (2007). “Usability Issues in Offshore Development: an Indian Perspective”. Usability Professionals Association Conference, 2007 (UPA membership required).

[9] Henry, P. (2003). “Advancing UCD While Facing Challenges Working from Offshore”. ACM Interactions, March/April 2003 (subscription required).

[10] Cronin D. (2004). “Designing for Offshore Development”. Cooper Journal blog.

[11] Nielsen, J. (1994). “Guerrilla HCI: Using Discount Usability Engineering to Penetrate the Intimidation Barrier”, Jakob Nielsen’s website.

[12] Krug, S. (2006). “Don’t Make Me Think!: A Common Sense Approach to Web Usability” 2nd edition. New Riders.

[13] Constantine, L. Lockwood, L. (1999). “Software for Use”. Addison Wesley.

[14] Lockwood, L. (1999). “Collaborative Usability Inspecting – more usable software and sites” (NB PDF download), Constantine and Lockwood website.

[15] Cooper, A. (2004). “The Inmates Are Running the Asylum: Why High-tech Products Drive Us Crazy and How to Restore the Sanity”. Sams.

The seductive and dangerous V model (2008)

This is an expanded version of an article I wrote for the December 2008 edition of Testing Experience, a magazine which is no longer published. I’m moving the article onto my blog from my website, which will be decommissioned soon.

Inevitably the article is dated in some respects, especially where I discuss possible ways for testers to ameliorate the V Model if they are forced to use it. There’s no mention of exploratory testing. That’s simply because my aim in writing the article was to help people understand the flaws of the V Model and how they can work around them in traditional environments that try to apply the model. A comparison of exploratory and traditional scripted testing techniques was far too big a topic to be shoe-horned in here.

However, I think the article still has value in helping to explain why software development and testing took the paths they followed. For the article I drew heavily on reading and research I carried out for my MSc dissertation, an exercise that taught me a huge amount about the history of software development, and why we ended up where we did.

The references in the article were all structured for a paper magazine. There are no hyperlinks and I have not tried to recreate them and check out whether they still work.

The article

The seductive and dangerous V Model

The Project Manager finishes the meeting by firing a question at the Programme Test Manager, and myself, the Project Test Manager.

“The steering board’s asking questions about quality. Are we following a formal testing model?”.

The Programme Test Manager doesn’t hesitate. “We always use the V Model. It’s the industry standard”.

The Project Manager scribbles a note. “Good”.

Fast forward four months, and I’m with the Programme Manager and the test analysts. It’s a round table session, to boost morale and assure everyone that their hard work is appreciated.

The Programme Manager is upbeat. “Of course you all know about the problems we’ve been having, but when you look at the big picture, it’s not so bad. We’re only about 10% behind where we should have been. We’re still on target to hit the implementation date”.

This is a red rag to a bullish test manager, so I jump in. “Yes, but that 10% is all at the expense of the testing window. We’ve lost half our time”.

The Programme Manager doesn’t take offence, and laughs. “Oh, come on! You’re not going to tell me you’ve not seen that before? If you’re unhappy with that then maybe you’re in the wrong job!”.

I’m not in the mood to let it drop. “That doesn’t make it right. There’s always a readiness to accept testing getting squeezed, as if the test window is contingency. If project schedules and budgets overrun, project managers can’t get away with saying ‘oh – that always happens’”.

He smiles and moves smoothly on, whilst the test analysts try to look deadpan, and no doubt some decide that career progression to test management isn’t quite as attractive as they used to think.

Test windows do get squeezed

The Programme Manager was quite right. That’s what happens on Waterfall projects, and saying that you’re using the V Model does nothing to stop that almost routine squeeze.

In “Testing Experience” issue 2, Dieter Arnouts [1] outlined how test management fits into different software development lifecycle models. I want to expand on the V Model, its problems and what testers can do in response.

In the earlier meeting the Programme Test Manager had given an honest and truthful answer, but I wonder if he was actually wholly misleading. Yes, we were using the V Model, and unquestionably it would be the “test model” of choice for most testing professionals, in the UK at least.

However, I question whether it truly qualifies as a “test model”, and whether its status as best practice, or industry standard, is deserved.

A useful model has to represent the way that testing can and should be carried out. A model must therefore be coherent, reasonably precisely defined and grounded in the realities of software development.

Coherence at the expense of precision

If the V Model, as practised in the UK, has coherence it is at the expense of precision. Worldwide there are several different versions and interpretations. At one extreme is the German V-Model [2], the official project management methodology of the German government. It is roughly equivalent to PRINCE2, but more directly relevant to software development. Unquestionably this is coherent and rigorously defined.

Of course, it’s not the V Model at all, not in the sense that UK testers understand it. For one thing the V stands not for the familiar V shaped lifecycle model, but for “vorgehens”, German for “going forwards”. However, this model does promote a V shaped lifecycle, and some seem to think that this is the real V Model, in its pure form.

The US also has a government standard V Model [3], which dates back about 20 years, like its German counterpart. Its scope is rather narrower, being a systems development lifecycle model, but it is still far more detailed and rigorous than most UK practitioners would understand by the V Model.

The understanding that most of us in the UK have of the V Model is probably based on the V Model as it’s taught in the ISEB Foundation Certificate in Software Testing [4]. The syllabus merely describes the Model without providing an illustration, and does not even name the development levels on the left-hand side of the V. Nor does it prescribe a set number of levels of testing. Four levels is “common”, but there could be more or fewer on any project.

This makes sense to an experienced practitioner. It is certainly coherent, in that it is intuitive. It is easy for testers to understand, but it is far from precise. Novices have no trouble understanding the model to the level required for a straightforward multiple choice exam, but they can struggle to get to grips with what exactly the V Model is. If they search on the internet their confusion will deepen. Wikipedia is hopelessly confused on the matter, with two separate articles on the V Model, which fail to make a clear distinction between the German model [5] and the descriptive lifecycle model [6] familiar to testers. Unsurprisingly there are many queries on internet forums asking “what is the V Model?”

There are so many variations that in practice the V Model can mean just about whatever you want it to mean. This is a typical, and typically vague, illustration of the model, or rather, of a version of the V Model.


It is interesting to do a Google images search on “V Model”. One can see a huge range of images illustrating the model. All share the V shape, and all show an arrow or line linking equivalent stages in each leg. However, the nature of the link is vague and inconsistent. Is it static testing of deliverables? Simple review and sign-off of specifications? Early test planning? Even the direction of the arrow varies. The Wikipedia article on the German V Model has the arrows going in different directions in different diagrams. There is no explanation for this. It simply reflects the confusion in the profession about what exactly the V Model is.

Boiled down to its most basic form, the V Model can mean simply that test planning should start early on, and the test plans should be based on the equivalent level of the left hand development leg. In practice, I fear that that is all it usually does mean.

The trouble is that this really isn’t a worthwhile advance on the Waterfall Model. The V Model is no more than a testing variant of the Waterfall. At best, it is a form of damage limitation, mitigating some of the damage that the Waterfall has on testing. To understand the significance of this for the quality of applications we need to look at the history and defects of the Waterfall Model itself.

Why the Waterfall is bad for testing and quality

In 1970 Royce wrote the famous paper [7] depicting the Waterfall Model. It is a strange example of how perceptive and practical advice can be distorted and misapplied. Using the analogy of a waterfall, Royce set up a straw man model to describe how software developments had been managed in the 1960s. He then demolished the model. Unfortunately, posterity has credited him with inventing the model.

The Waterfall assumes that a project can be broken up into separate, linear phases and that the output of each phase provides the input for the next phase, thus implying that each would be largely complete before the next one would start. Although Royce did not make the point explicitly, his Waterfall assumes that requirements can, and must, be defined accurately at the first pass, before any significant progress has been made with the design. The problems with the Waterfall, as seen by Royce, are its inability to handle changes, mainly changes forced by test defects, and the inflexibility of the model in dealing with iteration. Further flaws were later identified, but Royce’s criticisms are still valid.

Royce makes the point that prior to testing, all the stages of the project have predictable outputs, i.e. the results of rigorous analysis. The output of the testing stage is inherently unpredictable in the Waterfall, yet this is near the end of the project. Therefore, if the outcome of testing is not what was predicted or wanted, then reworks to earlier stages can wreck the project. The assumption that iteration can be safely restricted to successive stages doesn’t hold. Such reworking could extend right back to the early stages of design, and is not a matter of polishing up the later steps of the previous phase.

Although Royce did not dwell on the difficulty of establishing the requirements accurately at the first pass, this problem did trouble other, later writers. They concluded that not only is it impossible in practice to define the requirements accurately before construction starts, it is wrong in principle because the process of developing the application changes the users’ understanding of what is desirable and possible and thus changes the requirements. Furthermore, there is widespread agreement that the Waterfall is particularly damaging for interactive applications.

The problems of the Waterfall are therefore not because the analysts and developers have screwed up. The problems are inherent in the model. They are the result of following it properly, not badly!

The Waterfall and project management

So if the Waterfall was originally depicted as a straw man in 1970, and if it has been consistently savaged by academics, why is there still debate about it? Respected contemporary textbooks still defend it, books such as Hallows’ “Information Systems Project Management” [8], a valuable text frequently used on university courses. The title gives the game away. The Waterfall was shaped by project management requirements, and it therefore facilitates project management. Along with the V Model it is neater, and much easier to manage, plan and control, than an iterative approach, which looks messy and unpredictable to the project manager.

Raccoon, writing in 1997 [9], used a revealing phrase when describing the Waterfall:

“If the Waterfall model were wrong, we would stop arguing over it. Though the Waterfall model may not describe the whole truth, it describes an interesting structure that occurs in many well-defined projects and it will continue to describe this truth for a long time to come. I expect the Waterfall model will live on for the next one hundred years and more”.

Note the key phrase, “an interesting structure that occurs in many well-defined projects”.

The Waterfall allows project managers to structure projects neatly, and it is good for project plans, not applications!

It is often argued that in practice the Waterfall is not applied in its pure form; that there is always some limited iteration and that there is invariably some degree of overlap between the phases. Defenders of the model therefore argue that critics don’t understand how it can be used effectively.

Hallows argues that changes are time-consuming and expensive regardless of the model followed. He dismisses the criticism that the Waterfall doesn’t allow a return to an earlier stage by saying that this would be a case of an individual project manager being too inflexible, rather than a problem with the model.

This is naive. The problem is not solved by better project management, because project management itself has contributed to the problem; or rather, the responses of rational project managers to the pressures facing them have.

The dangerous influence of project management had been recognised by the early 1980s. Practitioners had always been contorting development practices to fit their project management model, a point argued forcibly by McCracken & Jackson in 1982 [10].

“Any form of life cycle is a project management structure imposed on system development. To contend that any life cycle scheme, even with variations, can be applied to all system development is either to fly in the face of reality or to assume a life cycle so rudimentary as to be vacuous.”

In hindsight, the tone of McCracken & Jackson’s paper, and the lack of response to it for years, are reminiscent of Old Testament prophets raging in the wilderness. They were right, but they were ignored.

The symbiosis between project management and the Waterfall has meant that practitioners have frequently been compelled to employ methods that they knew were ineffective. This is most graphically illustrated by the UK government’s mandated use of the PRINCE2 project management method and SSADM development methodology. These two go hand in hand.

They are not necessarily flawed, and this article does not have room to discuss their merits and problems, but they are associated with a traditional approach such as the Waterfall.

The UK’s National Audit Office stated in 2003 [11] that “PRINCE requires a project to be organised into a number of discrete stages, each of which is expected to deliver end products which meet defined quality requirements. Progress to the next stage of the project depends on successfully meeting the delivery targets for the current stage. The methodology fits particularly well with the ‘waterfall’ approach.”

The NAO says in the same paper that “the waterfall … remains the preferred approach for developing systems where it is very important to get the specification exactly right (for example, in processes which are computationally complex or safety critical)”. This is current advice. The public sector tends to be more risk averse than the private sector. If auditors say an approach is “preferred” then it would take a bold and confident project manager to reject that advice.

This official advice is offered in spite of the persistent criticism that it is never possible to define the requirements precisely in advance in the style assumed by the Waterfall Model, and that attempting to do so succeeds only if one is prepared to steamroller the users into accepting a system that doesn’t satisfy their goals.

The UK government is therefore promoting the use of a project management method partly because it fits well with a lifecycle that is fundamentally flawed, having been shaped by project management needs rather than those of software development.

The USA and commercial procurement

The history of the Waterfall in the USA illustrates its durability and provides a further insight into why it will survive for some time to come: its neat fit with commercial procurement practices.

The US military was the main customer for large-scale software development contracts in the 1970s and insisted on formal and rigorous development methods. The US Department of Defense (DoD) did not explicitly mandate the Waterfall, but their insistence in Standard DOD-STD-2167 [12] on a staged development approach, with a heavy up-front documentation overhead, and formal review and sign-off of all deliverables at the end of each stage, effectively ruled out any alternative. The reasons for this were quite explicitly to help the DoD to keep control of procurement.

In the 1980s the DoD relaxed its requirements with the revised standard, DOD-STD-2167A, and allowed iterative development. However, the revision did not forbid the Waterfall, and its continued insistence on formal reviews and audits that were clearly consistent with the Waterfall gave developers and contractors every incentive to stick with that model.

The damaging effects of this were clearly identified to the DoD at the time. A report of the Defense Science Board Task Force [13] criticised the effects of the former Standard, and complained that the reformed version did not go nearly far enough.

However, the Task Force had to acknowledge that “evolutionary development plays havoc with the customary forms of competitive procurement, … and they with it.”

The Task Force contained such illustrious experts as Frederick Brooks, Vic Basili and Barry Boehm. These were reputable insiders, part of a DoD Task Force, not iconoclastic academic rebels. They knew the problems caused by the Waterfall and they understood that the rigid structure of the model provided comfort and reassurance that large projects involving massive amounts of public money were under control. They therefore recommended appropriate remedies, involving early prototyping and staged awarding of contracts. They were ignored. Such was the grip of the Waterfall nearly 20 years after it had first been criticised by Royce.

The DoD did not finally abandon the Waterfall until Military Standard 498 (MIL-STD-498), seven years later in 1994, by which time the Waterfall was embedded in the very soul of the IT profession.

Even now the traditional procurement practices referred to by the Task Force, which fit much more comfortably with the Waterfall and the V Model, are being followed because they facilitate control, not quality. It is surely significant that the V Model is the only testing model that students of the Association of Chartered Certified Accountants learn about. It is the model for accountants and project managers, not developers or testers. The contractual relationship between client and supplier reinforces the rigid project management style of development.

Major George Newberry, a US Air Force officer specialising in software acquisition and responsible for collating USAF input to the defence standards, complained in 1995 [14] about the need to deliver mind-numbing amounts of documentation in US defence projects because of the existing standards.

“DOD-STD-2167A imposes formal reviews and audits that emphasize the Waterfall Model and are often nonproductive ‘dog and pony shows’. The developer spends thousands of staff-hours preparing special materials for these meetings, and the acquirer is then swamped by information overload.”

This is a scenario familiar to any IT professional who has worked on external contracts, especially in the public sector. Payments are tied to the formal reviews and dependent on the production of satisfactory documentation. The danger is that supplier staff become fixated on the production of the material that pays their wages, rather than the real substance of the project.

Nevertheless, as noted earlier, the UK public sector in particular, and large parts of the private sector, are still wedded to the Waterfall method, and this must surely bias contracts against a commitment to quality.

V for veneer?

What is seductive and damaging about the V Model is that it gives the Waterfall approach credibility. It has given a veneer of respectability to a process that almost guarantees shoddy quality. The most damaging aspect is perhaps the effect on usability.

The V Model discourages active user involvement in evaluating the design, and especially the interface, before the formal user acceptance testing stage. By then it is too late to make significant changes to the design. Usability problems can be dismissed as “cosmetic” and the users are pressured to accept a system that doesn’t meet their needs. This is bad if it is an application for internal corporate users. It is potentially disastrous if it is a web application for customers.

None of this is new to academics or test consultants who’ve had to deal with usability problems. However, what practitioners do in the field can often lag many years behind what academics and specialist consultants know to be good practice. Many organisations are a long, long way from the leading edge.

Rex Black provided a fine metaphor for this quality problem in 2002 [15]. After correctly identifying that V Model projects are driven by cost and schedule constraints, rather than quality, Black argues that the resultant fixing of the implementation date effectively locks the top of the right leg of the V in place, while the pivot point at the bottom slips further to the right, thus creating Black’s “ski slope and cliff”.

The project glides down the ski slope, then crashes into the “quality cliff” of the test execution stages that have been compressed into an impossible timescale.
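A toy calculation makes the mechanics of Black’s squeeze clear. The numbers below are invented purely for illustration: the end date is fixed, so every week of slippage in the left leg comes straight out of test execution.

```python
# Toy arithmetic (invented numbers) showing how a fixed implementation date
# converts slippage in the development leg into compressed test execution.
planned_total_weeks = 40   # implementation date fixed: top of the right leg is locked
planned_build_weeks = 28   # analysis, design and coding as planned
planned_test_weeks = planned_total_weeks - planned_build_weeks    # 12 weeks

actual_build_weeks = 35    # the pivot at the bottom of the V slips to the right
remaining_test_weeks = planned_total_weeks - actual_build_weeks   # 5 weeks

compression = planned_test_weeks / remaining_test_weeks
print(f"test execution squeezed from {planned_test_weeks} to "
      f"{remaining_test_weeks} weeks ({compression:.1f}x compression)")
```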

The Waterfall may have resulted in bad systems, but its massive saving grace for companies and governments alike was that they were developed in projects that offered at least the illusion of being manageable! This suggests, as Raccoon stated [9], that the Waterfall may yet survive another hundred years.

The V Model’s great attractions were that it fitted beautifully into the structure of the Waterfall, it didn’t challenge that model, and it just looks right; comfortable and reassuring.

What can testers do to limit the damage?

I believe strongly that iterative development techniques must be used wherever possible. However, such techniques are beyond the scope of this article. Here I am interested only in explaining why the V Model is defective, why it has such a grip on our profession, and what testers can do to limit its potential damage.

The key question is therefore: how can we improve matters when we find we have to use it? As so often in life, just asking the question is half the battle. It’s crucial that testers shift their mindset from an assumption that the V Model will guide them through to a successful implementation, and instead regard it as a flawed model with a succession of mantraps that must be avoided.

Testers must first accept the counter-intuitive truth that the Waterfall and V Model only work when their precepts are violated. This won’t come as any great surprise to experienced practitioners, though it is a tough lesson for novices to learn.

Developers and testers may follow models and development lifecycles in theory, but often it’s no more than lip service. When it comes to the crunch we all do whatever works and ditch the theory. So why not adopt techniques that work and stop pretending that the V Model does?

In particular, iteration happens anyway! Embrace it. The choice is between planning for iteration and frantically winging it, trying to squeeze in fixes and reworking of specifications.

Even the most rigid Waterfall project would allow some iteration during test execution. It is crucial that testers ensure there is no confusion between test cycles exercising different parts of the solution, and reruns of previous tests to see whether fixes have been applied. Testers must press for realistic allowances for reruns. One cycle to reveal defects and another to retest is planning for failure.
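As a rough sketch of what planning for iteration can look like in a test schedule, the example below gives each execution cycle an explicit allowance for reruns. The cycle names and numbers are purely illustrative, not a recommendation for any particular project.

```python
# An illustrative test execution schedule with explicit rerun allowances.
# Names and numbers are hypothetical; the point is that retests of fixes
# are planned for rather than squeezed in at the end.
from dataclasses import dataclass

@dataclass
class TestCycle:
    name: str           # the part of the solution this cycle exercises
    planned_runs: int   # first run plus an allowance for retesting fixes

schedule = [
    TestCycle("core business functions", planned_runs=3),
    TestCycle("interfaces to external systems", planned_runs=3),
    TestCycle("end-to-end business scenarios", planned_runs=2),
]

for cycle in schedule:
    print(f"{cycle.name}: {cycle.planned_runs} planned runs")
print("total planned execution passes:",
      sum(cycle.planned_runs for cycle in schedule))
```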

Once this debate has been held (and won!) with project management the tester should extend the argument. Make the point that test execution provides feedback about quality and risks. It cannot be right to defer feedback. Where it is possible to get feedback early it must be taken.

It’s not the testers’ job to get the quality right. It’s not the testers’ responsibility to decide if the application is fit to implement. It’s our responsibility to ensure that the right people get accurate feedback about quality at the right time. That means feedback to analysts and designers early enough to let them fix problems quickly and cheaply. This feedback and correction effectively introduces iteration. Acknowledging this allows us to plan for it.

Defenders of the V Model would argue that that is entirely consistent with the V Model. Indeed it is. That is the point.

However, what the V Model doesn’t do adequately is help testers to force the issue; to provide a statement of sound practice, an effective, practical model that will guide them to a happy conclusion. It is just too vague and wishy washy. In so far as the V Model means anything, it means to start test planning early, and to base your testing on documents from equivalent stages on the left hand side of the V.

Without these, the V Model is nothing. A fundamental flaw of the V Model is that it is not hooked into the construction stages in the way that its proponents blithely assume. Whoever heard of a development being delayed because a test manager had not been appointed?

“We’ll just crack on”, is the response to that problem. “Once we’ve got someone appointed they can catch up.”

Are requirements ever nailed down accurately and completely before coding starts? In practice, no. The requirements keep evolving, and the design is forced to change. The requirements and design keep changing even as the deadline for the end of coding nears.

“Well, you can push back the delivery dates for the system test plan and the acceptance test plan. Don’t say we’re not sympathetic to the testers!”

What is not acknowledged is that if test planning doesn’t start early, and if the solution is in a state of flux till the last moment, one is not left with a compromised version of the V Model. One is left with nothing whatsoever; no coherent test model.

Testing then becomes the frantic, last-minute, ulcer-inducing sprint it always was under the Waterfall, and which the V Model is supposed to prevent.

It is therefore important that testers agitate for the adoption of a model that honours the good intentions of the V Model but is better suited to the realities of development and the needs of testers.

Herzlich’s W Model

An interesting extension of the V Model is Paul Herzlich’s W Model [16].

Herzlich W model

The W Model removes the vague and ambiguous lines linking the left and right legs of the V and replaces them with parallel testing activities, shadowing each of the development activities.

As the project moves down the left leg, the testers carry out static testing (i.e. inspections and walkthroughs) of the deliverables at each stage. Ideally prototyping and early usability testing would be included to test the system design of interactive systems at a time when it would be easy to solve problems. The emphasis would then switch to dynamic testing once the project moves into the integration leg.

There are several interesting aspects to the W Model. Firstly, it drops the arbitrary and unrealistic assumption that there should be a testing stage in the right leg for each development stage in the left leg. Each of the development stages has its testing shadow, within the same leg.

The illustration shows a typical example where there are the same number of stages in each leg, but it’s possible to vary the number and the nature of the testing stages as circumstances require without violating the principles of the model.

Also, it explicitly does not require the test plan for each dynamic test stage to be based on the specification produced in a twin stage on the left-hand side. There is no twin stage, of course, but this does address one of the undesirable by-products of a common but unthinking adoption of the V Model: a blind insistence that test plans should be generated from the equivalent documents, and only from those documents.

A crucial advantage of the W Model is that it encourages testers to define tests that can be built into the project plan, and on which development activity will be dependent, thus making it harder for test execution to be squeezed at the end of the project.
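One way to make the shadowing concrete is to pair each development activity with its parallel testing activity, as in the sketch below. The activity names are hypothetical and chosen only to illustrate the idea; they are not taken from Herzlich’s paper.

```python
# A minimal sketch of the W Model idea: every development activity has a
# parallel testing activity within the same leg, instead of a vague link
# to a stage in the opposite leg. Activity names are illustrative only.
W_MODEL_SHADOWS = [
    ("requirements definition", "static review of requirements, early usability tests"),
    ("system design",           "static review of design, prototyping"),
    ("detailed design",         "inspections and walkthroughs of detailed design"),
    ("coding",                  "unit test design and execution"),
    ("integration",             "integration test execution"),
    ("system build",            "system and acceptance test execution"),
]

for development_activity, testing_shadow in W_MODEL_SHADOWS:
    print(f"{development_activity:24} -> {testing_shadow}")
```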

However, starting formal test execution in parallel with the start of development must not mean token reviews and sign-offs of the documentation at the end of each stage. Commonly under the V Model, and the Waterfall, test managers receive specifications with the request to review and sign off within a few days what the developers hope is a completed document. In such circumstances test managers who detect flaws can be seen as obstructive rather than constructive. Such last minute “reviews” do not count as early testing.

Morton’s Butterfly Model

Another variation on the V Model is the little-known Butterfly Model [17] by Stephen Morton, which shares many features with the W Model.

Butterfly Model

The butterfly metaphor is based on the idea that clusters of testing are required throughout the development to tease out information about the requirements, design and build. These micro-iterations can be structured into the familiar testing stages, and the early static testing stages envisaged by the W Model.

In this model these micro-iterations explicitly shape the development during the progression down the development leg.

Butterfly Model micro-iterations

In essence, each micro-iteration can be represented by a butterfly: the left wing stands for test analysis, the right wing for test specification and design, and the body is test execution, the muscle that links and drives the test, which might consist of more than one piece of analysis and design, hence the segmented wings. Sadly, the model does not seem to have been fully fleshed out, and in spite of its attractions it has almost vanished from sight.

Conclusion – the role of education

The beauty of the W and Butterfly Models is that they fully recognise the flaws of the V Model, but they can be overlaid on the V. That allows the devious and imaginative test manager to smuggle a more helpful and relevant testing model into a project committed to the V Model without giving the impression that he or she is doing anything radical or disruptive.

The V Model is so vague that a test manager could argue with a straight face that the essential features of the W or Butterfly are actually features of the V Model as the test manager believes it must be applied in practice. I would regard this as constructive diplomacy rather than spinning a line!

I present the W and Butterfly Models as interesting possibilities, but what really matters is that test managers understand the need to force planned iteration into the project schedule, and to hook testing activities into the project plan so that “testing early” becomes meaningful rather than a comforting and irrelevant platitude. Test managers can do all of this, provided they understand the flaws of the V Model and how to improve matters. That takes us on to the matter of education.

The V Model was the only “model” testers learned about when they took the old ISEB Foundation Certificate. Too many testers regarded that as the end of their education in testing. They were able to secure good jobs or contracts with their existing knowledge. Spending more time and money continuing their learning was not a priority.

As a result of this, and the pressures of project management and procurement, the V Model is unchallenged as the state of the art for testing in the minds of many testers, project managers and business managers.

The V Model will not disappear just because practitioners become aware of its problems. However, a keen understanding of its limitations will give them a chance to anticipate these problems and produce higher quality applications.

I don’t have a problem with testers attempting to extend their knowledge and skills through formal qualifications such as ISEB and ISTQB. However, it is desperately important that they don’t think that what they learn from these courses comprises All You Ever Need to Know About The Only Path to True Testing. These syllabuses are biased towards traditional techniques and don’t pay sufficient attention to exploratory testing.

Ultimately we are all responsible for our own knowledge and skills; for our own education. We’ve got to go out there and find out what is possible, and to understand what’s going on. Testers need to make sure they’re aware of the work of testing experts such as Cem Kaner, James Bach, Brian Marick, Lisa Crispin and Michael Bolton. These people have put a huge amount of priceless material on the internet for free. Go and find it!

References

[1] Arnouts, D. (2008). “Test management in different Software development life cycle models”. “Testing Experience” magazine, issue 2, June 2008.

[2] IABG. “Das V-Modell”. This is in German, but there are links to English pages and a downloadable version of the full documentation in English.

[3] US Dept of Transportation, Federal Highway Administration. “Systems Engineering Guidebook for ITS”.

[4] BCS ISEB Foundation Certificate in Software Testing – Syllabus (NB PDF download).

[5] Wikipedia. “V-Model”.

[6] Wikipedia. “V-Model (software development)”.

[7] Royce, W. (1970). “Managing the Development of Large Software Systems”, IEEE Wescon, August 1970.

[8] Hallows, J. (2005). “Information Systems Project Management”. 2nd edition. AMACOM, New York.

[9] Raccoon, L. (1997). “Fifty Years of Progress in Software Engineering”. SIGSOFT Software Engineering Notes, Vol 22, Issue 1 (Jan. 1997), pp. 88-104.

[10] McCracken, D., Jackson, M. (1982). “Life Cycle Concept Considered Harmful”, ACM SIGSOFT Software Engineering Notes, Vol 7 No 2, April 1982. Subscription required.

[11] National Audit Office (2003). “Review of System Development – Overview”.

[12] Department of Defense Standard 2167 (DOD-STD-2167) (1985). “Defense System Software Development”, US Government defence standard.

[13] “Defense Science Board Task Force On Military Software – Report” (extracts) (1987). ACM SIGAda Ada Letters, Volume VIII, Issue 4 (July/Aug. 1988), pp. 35-46. Subscription required.

[14] Newberry, G. (1995). “Changes from DOD-STD-2167A to MIL-STD-498”, Crosstalk – the Journal of Defense Software Engineering, April 1995.

[15] Black, R. (2002). “Managing the Testing Process”, p415. Wiley 2002.

[16] Herzlich, P. (1993). “The Politics of Testing”. Proceedings of 1st EuroSTAR conference, London, Oct. 25-28, 1993.

[17] Morton, S. (2001). “The Butterfly Model for Test Development”. Sticky Minds website.