What happens to usability when development goes offshore? (2009)

This article appeared in the March 2009 edition of Testing Experience magazine, which is no longer published. I’m moving the article onto my blog from my website, which will be decommissioned soon.

If you choose to read the article please bear in mind that it was written in January 2009 and is therefore inevitably dated in some respects, though I think that much of it is still valid.

The references in the article were all structured for a paper magazine. There are no hyperlinks, and I have not tried to recreate them or to check whether the sources are still available.

The article

Two of the most important trends in software development over the last 20 years have been the increasing number of companies sending development work to cheaper labour markets, and the increasing attention that is paid to the usability of applications.

Developers in Europe and North America cannot have failed to notice the trend of offshoring development work, and they worry about its long-term implications.

Usability, however, has had a mixed history. Many organizations and IT professionals have been deeply influenced by the need to ensure that their products and applications are usable; many more are only vaguely aware of this trend and do not take it seriously.

As a result, many developers and testers have missed the significant implications that offshoring has for building usable applications, and underestimate the problems of testing for usability. I will try to explain these problems, and suggest possible remedial actions that testers can take if they find themselves working with offshore developers. I will be looking mainly at India, the largest offshoring destination, because information has been more readily available. However, the problems and lessons apply equally to other offshoring countries.

According to NASSCOM, the main trade body representing the Indian IT industry [1], the numbers of IT professionals employed in India rose from 430,000 in 2001 to 2,010,000 in 2008. The numbers employed by offshore IT companies rose 10 fold, from 70,000 to 700,000.

It is hard to say how many of these are usability professionals. Certainly at the start of the decade there were only a handful in India. Now, according to Jhumkee Iyengar, of User in Design, “a guesstimate would be around 5,000 to 8,000”. At most that’s about 0.4% of the people working in IT. Even if they were all involved in offshore work they would represent no more than 1% of the total.

Does that matter? Jakob Nielsen, the leading usability expert, would argue that it does. His rule of thumb [2] is that “10% of project resources must be devoted to usability to ensure a good quality product”.

Clearly India is nowhere near capable of meeting that figure. To be fair, the same can be said of the rest of the world given that 10% represents Nielsen’s idea of good practice, and most organizations have not reached that stage. Further, India traditionally provided development of “back-office” applications, which are less likely to have significant user interaction.

Nevertheless, the shortage of usability professionals in the offshoring destinations does matter. Increasingly offshore developments have involved greater levels of user interaction, and any shortage of usability expertise in India will damage the quality of products.

Sending development work offshore always introduces management and communication problems. Outsourcing development, even within the same country, poses problems for usability. When the development is both offshored and outsourced, the difficulties in producing a usable application multiply. If there are no usability professionals on hand, the danger is that the developers will not only fail to resolve those problems – they will probably not even recognize that they exist.

Why can outsourcing be a problem for usability?

External software developers are subject to different pressures from internal developers, and this can lead to poorer usability. I believe that external suppliers are less likely to be able to respond to the users’ real needs, and research supports this. [3, 4, 5, 6, 7]

Obviously external suppliers have different goals from internal developers. Their job is to deliver an application that meets the terms of the contract and makes a profit in doing so. Requirements that are missed or are vague are unlikely to be met, and usability requirements all too often fall into one of these categories. This is not simply a matter of a lack of awareness. Usability is a subjective matter, and it is difficult to specify precise, objective, measurable and testable requirements. Indeed, trying too hard to do so can be counter-productive if the resulting requirements are too prescriptive and constrain the design.

A further problem is that the formal nature of contractual relationships tends to push clients towards more traditional, less responsive and less iterative development processes, with damaging results for usability. If users and developers are based in different offices, working for different employers, then rapid informal feedback becomes difficult.

Some of the studies that found these problems date back to the mid 90s. However, they contain lessons that remain relevant now. Many organizations have still not taken these lessons on board, and they are therefore facing the same problems that others confronted 10 or even 20 years ago.

How can offshoring make usability problems worse?

So, if simple outsourcing to a supplier in the same country can be fraught with difficulty, what are the usability problems that organizations face when they offshore?

Much of the discussion of this harks back to an article Jakob Nielsen wrote in 2002 [2], which stirred up plenty of debate, much of it critical.

“Offshore design raises the deeper problem of separating interaction designers and usability professionals from the users. User-centered design requires frequent access to users: the more frequent the better.”

If the usability professionals need to be close to the users, can they stay onshore and concentrate on the design while the developers build offshore? Nielsen was emphatic on that point.

“It is obviously not a solution to separate design and implementation since all experience shows that design, usability, and implementers need to proceed in tight co-ordination. Even separating teams across different floors in the same building lowers the quality of the resulting product (for example, because there will be less informal discussions about interpretations of the design spec).”

So, according to Nielsen, the usability professionals have to follow the developers offshore. However, as we’ve seen, the offshore countries have nowhere near enough trained professionals to cover the work. Numbers are increasing, but not by enough to keep pace with the growth in offshore development, never mind the demands of local commerce.

This apparent conundrum has been dismissed by many people who have pointed out, correctly, that offshoring is not an “all or nothing” choice. Usability does not have to follow development. If usability is a concern, then user design work can be retained onshore, and usability expertise can be deployed in both countries. This is true, but it is a rather unfair criticism of Nielsen’s arguments. The problem he describes is real enough. The fact that it can be mitigated by careful planning certainly does not mean the problem can be ignored.

User-centred design assumes that developers, usability analysts and users will be working closely together. Offshoring the developers forces organizations to make a choice between two unattractive options; separating usability professionals from the users, or separating them from the developers.

It is important that organizations acknowledge this dilemma and make the choice explicitly, based on their needs and their market. Every responsible usability professional would be keenly aware that their geographical separation from their users was a problem, and so those organizations that hire usability expertise offshore are at least implicitly acknowledging the problems caused by offshoring. My concern is for those organizations who keep all the usability professionals onshore and either ignore the problems, or assume that they don’t apply in their case.

How not to tackle the problems

Jhumkee Iyengar has studied the responses of organizations wanting to ensure that offshore development will give them usable applications [8]. Typically they have tried to do so without offshore usability expertise. They have used two techniques sadly familiar to those who have studied usability problems; defining the user interaction requirements up-front and sending a final, frozen specification to the offshore developers, or adopting the flawed and fallacious layering approach.

Attempting to define detailed up-front requirements is anathema to good user-centred design. It is consistent with the Waterfall model and is attractive because it is neat and easy to manage (as I discussed in my article on the V Model in Testing Experience, issue 4).

Building a usable application that allows users and customers to achieve their personal and corporate goals painlessly and efficiently requires iteration, prototyping and user involvement that is both early in the lifecycle and repeated throughout it.

The layering approach was based on the fallacy that the user interface could be separated from the functionality of the application, and that each could be developed separately. This fallacy was very popular in the 80s and 90s. Its influence has persisted, not because it is valid, but because it lends an air of spurious respectability to what people want to do anyway.

Academics expended a huge amount of effort trying to justify this separability. Their arguments, their motives and the consequences of their mistake are worth a full article in their own right. I’ll restrict myself to saying that the notion of separability was flawed on three counts.

  • It was flawed conceptually because usability is a product of the experience of the user with the whole application, not just the interface.
  • It was flawed architecturally, because design decisions taken by system architects can have a huge impact on the user experience.
  • Finally, it was flawed in political and organizational terms because it encouraged usability professionals to retreat into a ghetto, isolated from the hubbub of the developers, where they would work away on an interface that could be bolted onto the application in time for user testing.

Lewis & Rieman [3] memorably savaged the idea that usability professionals could hold themselves aloof from the application design, calling it “the peanut butter theory of usability”

“Usability is seen as a spread that can be smeared over any design, however dreadful, with good results if the spread is thick enough. If the underlying functionality is confusing, then spread a graphical user interface on it. … If the user interface still has some problems, smear some manuals over it. If the manuals are still deficient, smear on some training which you force users to take.”

If the usability professionals stay onshore, and adopt either the separability or the peanut butter approach, the almost inescapable result is that they will be delegating decisions about usability to the offshore developers.

Developers are just about the worst people to take these decisions. They are too close to the application, and instinctively see workarounds to problems that might appear insoluble to real users.

Developers also have a different mindset when approaching technology. Even if they understand the business context of the applications they cannot unlearn their technical knowledge and experience to see the application as a user would, and that is when developers and users are in the same country. The cultural differences are magnified when they are based on different continents.

The relative lack of maturity of usability in the offshoring destinations means that developers often have an even less sophisticated approach than developers in the client countries. User interaction is regarded as an aesthetic matter restricted to the interface, with the developers more interested in the guts of the application.

Pradeep Henry reported in 2003 that most user interfaces at Indian companies were being designed by programmers, and that in his opinion they had great difficulty switching from their normal technical, system-focused mindset to that of a user. [9]

They also had very little knowledge of user centered design techniques. This is partly a matter of education, but there is more to it. In explaining the shortage of usability expertise in India, Jhumkee Iyengar told me that she believes important factors are the “phenomenal success of Indian IT industry, which leads people to question the need for usability, and the offshore culture, which has traditionally been a ‘back office culture’ not conducive to usability”.

The situation is, however, certainly improving. Although the explosive growth in development work in India, China and Eastern Europe has left the usability profession struggling to keep up, the number of usability experts has grown enormously over the last 10 years. There are nowhere near enough, but there are firms offering this specialist service keen to work with offshoring clients.

This trend is certain to continue because usability is a high value service. It is a hugely attractive route to follow for these offshore destinations, complementing and enhancing the traditional offshore development service.

Testers must warn of the dangers

The significance of all this from the perspective of testers is that even though usability faces significant threats when development is offshored, there are ways to reduce the dangers and the problems. They cannot be removed entirely, but offshoring offers such cost savings that it will continue to grow, so it is important that testers working for client companies understand these problems and can anticipate them.

Testers may not always, or often, be in a position to influence whether usability expertise is hired locally or offshore. However, they can flag up the risks of whatever approach is used, and adopt an appropriate test strategy.

The most obvious danger is if an application has significant interaction with the user and there is no specialist usability expertise on the project. As I said earlier, this could mean that the project abdicates responsibility for crucial usability decisions to the offshore developers.

Testers should try to prevent a scenario where the interface and user interaction are pieced together offshore, and thrown “over the wall” to the users back in the client’s country for acceptance testing when it may be too late to fix even serious usability defects.

Is it outside the traditional role of a tester to lobby project management to try and change the structure of the project? Possibly, but if testers can see that the project is going to be run in a way that makes it hard to do their job effectively then I believe they have a responsibility to speak out.

I’m not aware of any studies looking at whether outsourcing contracts (or managed service agreements) are written in such prescriptive detail that they restrict the ability of test managers to tailor their testing strategy to the risks they identify. However, going by my experience and the anecdotal evidence I’ve heard, this is not an issue. Testing is not usually covered in detail in contracts, thus leaving considerable scope to test managers who are prepared to take the initiative.

Although I’ve expressed concern about the dangers of relying on a detailed up-front specification, there is no doubt that if the build is being carried out by offshore developers then they have to be given clear, detailed, unambiguous instructions.

The test manager should therefore set a test strategy that allows for significant static testing of the requirements documents. These should be subjected to walkthroughs and inspections to check that the usability requirements are present, complete and stated in sufficient detail to be testable, yet not defined so precisely that they constrain the design and rule out what might have been perfectly acceptable solutions. For example, a requirement that a first-time user can complete a routine transaction unaided within a few minutes is testable without dictating the design; one that prescribes the exact screen layout is not.

Once the offshore developers have been set to work on the specification it is important that there is constant communication with them and ongoing static testing as the design is fleshed out.

Hienadz Drahun leads an offshore interaction design team in Belarus. He stresses the importance of communication. He told me that “communication becomes a crucial point. You need to maintain frequent and direct communication with your development team.”

Dave Cronin of the US Cooper usability design consultancy wrote an interesting article about this in 2004 [10].

“We already know that spending the time to holistically define and design a software product dramatically increases the likelihood that you will deliver an effective and pleasurable experience to your customers, and that communication is one of the critical ingredients to this design process. All this appears to be even more true if you decide to have the product built in India or Eastern Europe.

To be absolutely clear, to successfully outsource product development, it is crucial that every aspect of the product be specifically defined, designed and documented. The kind of hand-waving you may be accustomed to when working with familiar and well-informed developers will no longer suffice.”

Significantly, Cronin did not mention testing anywhere in his article, though he did mention “feedback” during the design process.

The limits of usability testing

One of the classic usability mistakes is to place too much reliance on usability testing. In fact, I’ve heard it argued that there is no such thing as usability testing. It’s a provocative argument, but it has some merit.

If usability work is left to testing alone, it will be left to the end of development, and serious defects will be discovered too late in the project for them to be fixed.

“They’re only usability problems, the users can work around them” is the cry from managers under pressure to implement on time. Usability must be incorporated into the design stages, with possible solutions being evaluated and refined. Usability is therefore produced not by testing, but by good design practices.

Pradeep Henry called his new team “Usability Lab” when he introduced usability to Cognizant, the offshore outsourcing company, in India. However, the name and the sight of the testing equipment in the lab encouraged people to equate usability with testing. As Henry explained;

“Unfortunately, equating usability with testing leads people to believe that programmers or graphic designers should continue to design the user interface and that usability specialists should be consulted only later for testing.”

Henry renamed his team the Cognizant Usability Group (now the Cognizant Usability Center of Excellence). [9]

Tactical improvements testers can make

So if usability evaluation has to be integrated into the whole development process then what can testers actually do in the absence of usability professionals? Obviously it will be difficult, but if iteration is possible during design, and if you can persuade management that there is a real threat to quality then you can certainly make the situation a lot better.

There is a lot of material readily available to guide you. I would suggest the following.

Firstly, Jakob Nielsen’s Discount Usability Engineering [11] consists of cheap prototyping (maybe just paper based), heuristic (rule based) evaluation and getting users to talk their way through the application, thinking out loud as they work their way through a scenario.

Steve Krug’s “lost our lease” usability testing basically says that any usability testing is better than none, and that quick and crude testing can be both cheap and effective. Krug’s focus is more on the management of this approach rather than the testing techniques themselves, so it fits with Nielsen’s DUE, rather than being an alternative in my opinion. It’s all in his book “Don’t make me think”. [12]

Lockwood & Constantine’s Collaborative Usability Inspections offer a far more formal and rigorous approach, though you may be stretching yourself to take this on without usability professionals. It entails setting up formal walk-throughs of the proposed design, then iteration to remove defects and improve the product. [13, 14, 15]

On a lighter note, Alan Cooper’s book “The inmates are running the asylum” [15] is an entertaining rant on the subject. Cooper’s solution to the problem is his Interaction Design approach. The essence of this is that software development must include a form of functional analysis that seeks to understand the business problem from the perspective of the users, based on their personal and corporate goals, working through scenarios to understand what they will want to do.

Cooper’s Interaction Design strikes a balance between the old, flawed extremes of structured methods (which ignored the individual) and traditional usability (which often paid insufficient attention to the needs of the organization). I recommend this book not because I think that a novice could take this technique on board and make it work, but because it is very readable and might make you question your preconceptions and think about what is desirable and possible.

Longer term improvements

Of course it’s possible that you are working for a company that is still in the process of offshoring and where it is still possible to influence the outcome. It is important that the invitation to tender includes a requirement that suppliers can prove expertise and experience in usability engineering. Additionally, the client should expect potential suppliers to show they can satisfy the following three criteria.

  • Firstly, the supplier should have a process or lifecycle model that not only has usability engineering embedded within it but that also demonstrates how the onshore and offshore teams will work together to achieve usability. The process must involve both sides.

    Offshore suppliers have put considerable effort into developing such frameworks. Three examples are Cognizant’s “End-to-End UI Process”, HFI’s “Schaffer-Weinschenk Method™” and Persistent’s “Overlap Usability”.

  • Secondly, suppliers should carry out user testing with users from the country where the application will be used. The cultural differences are too great to use people who happen to be easily available to the offshore developers.

    Remote testing entails usability experts based in one location conducting tests with users based elsewhere, even on another continent. It would probably not be the first choice of most usability professionals, but it is becoming increasingly important. As Jhumkee Iyengar told me, it “may not be the best, but it works and we have had good results. A far cry above no testing.”

  • Finally, suppliers should be willing to send usability professionals to the onshore country for the requirements gathering. This is partly a matter of ensuring that the requirements gathering takes full account of usability principles. It is also necessary so that these usability experts can fully understand the client’s business problem and can communicate it to the developers when they return offshore.

It’s possible that there may still be people in your organization who are sceptical about the value of usability. There has been a lot of work done on the return on investment that user centered design can bring. It’s too big a topic for this article, but a simple internet search on “usability” and “roi” will give you plenty of material.

What about the future?

There seems no reason to expect any significant changes in the trends we’ve seen over the last 10 years. The offshoring countries will continue to produce large numbers of well-educated, technically expert IT professionals. The cost advantage of developing in these countries will continue to attract work there.

Proactive test managers can head off some of the usability problems associated with offshoring. They can help bring about higher quality products even if their employers have not allowed for usability expertise on their projects. However, we should not have unrealistic expectations about what they can achieve. High quality, usable products can only be delivered consistently by organizations that have a commitment to usability and who integrate usability throughout the design process.

Offshoring suppliers will have a huge incentive to keep advancing into user centered design and usability consultancy. The increase in offshore development work creates the need for such services, whilst the specialist and advanced nature of the work gives these suppliers a highly attractive opportunity to move up the value chain, selling more expensive services on the basis of quality rather than cost.

The techniques I have suggested are certainly worthwhile, but they may prove no more than damage limitation. As Hienadz Drahun put it to me; “to get high design quality you need to have both a good initial design and a good amount of iterative usability evaluation. Iterative testing alone is not able to turn a bad product design into a good one. You need both.” Testers alone cannot build usability into an application, any more than they can build in quality.

Testers in the client countries will increasingly have to cope with the problems of working with offshore development. It is important that they learn how to work successfully with offshoring and adapt to it.

They will therefore have to be vigilant about the risks to usability of offshoring, and advise their employers and clients how testing can best mitigate these risks, both on a short term tactical level, i.e. on specific projects where there is no established usability framework, and in the longer term, where there is the opportunity to shape the contracts signed with offshore suppliers.

There will always be a need for test management expertise based in client countries, working with the offshore teams, but it will not be the same job we knew in the 90s.

References

[1] NASSCOM (2009). “Industry Trends, IT-BPO Sector-Overview”.

[2] Nielsen, J. (2002). “Offshore Usability”. Jakob Nielsen’s website.

[3] Lewis, C., Rieman, J. (1994). “Task-Centered User Interface Design: A Practical Introduction”. University of Colorado e-book.

[4] Artman, H. (2002). “Procurer Usability Requirements: Negotiations in Contract Development”. Proceedings of the second Nordic conference on Human-Computer Interaction (NordiCHI) 2002.

[5] Holmlid, S., Artman, H. (2003). “A Tentative Model for Procuring Usable Systems” (NB PDF download). 10th International Conference on Human-Computer Interaction, 2003.

[6] Grudin, J. (1991). “Interactive Systems: Bridging the Gaps Between Developers and Users”. Computer, April 1991, Vol. 24, No. 4 (subscription required).

[7] Grudin, J. (1996). “The Organizational Contexts of Development and Use”. ACM Computing Surveys. Vol 28, issue 1, March 1996, pp 169-171 (subscription required).

[8] Iyengar, J. (2007). “Usability Issues in Offshore Development: an Indian Perspective”. Usability Professionals Association Conference, 2007 (UPA membership required).

[9] Henry, P. (2003). “Advancing UCD While Facing Challenges Working from Offshore”. ACM Interactions, March/April 2003 (subscription required).

[10] Cronin D. (2004). “Designing for Offshore Development”. Cooper Journal blog.

[11] Nielsen, J. (1994). “Guerrilla HCI: Using Discount Usability Engineering to Penetrate the Intimidation Barrier”, Jakob Nielsen’s website.

[12] Krug, S. (2006). “Don’t Make Me Think!: A Common Sense Approach to Web Usability” 2nd edition. New Riders.

[13] Constantine, L., Lockwood, L. (1999). “Software for Use”. Addison Wesley.

[14] Lockwood, L. (1999). “Collaborative Usability Inspecting – more usable software and sites” (NB PDF download), Constantine and Lockwood website.

[15] Cooper, A. (2004). “The Inmates Are Running the Asylum: Why High-tech Products Drive Us Crazy and How to Restore the Sanity”. Sams.

The Post Office Horizon IT scandal, part 3 – audit, risk & perverse incentives

In the first post of this three part series about the scandal of the Post Office’s Horizon IT system I explained the concerns I had about the approach to errors and accuracy. In the second post I talked about my experience working as an IT auditor investigating frauds, and my strong disapproval of the way the Post Office investigated and prosecuted the Horizon cases. In this, the final part, I will look at the role of internal audit and question the apparent lack of action by the Post Office’s internal auditors.

Independence and access to information

There’s a further aspect to the Horizon scandal that troubles me as an ex-auditor. In 2012, after some pressure from a Parliamentary committee, the Post Office commissioned the forensic IT consultancy Second Sight to review Horizon. Second Sight did produce a report that was critical of the system but they could not complete their investigation and issue a final report. They were stymied by the Post Office’s refusal to hand over crucial documents, and they were eventually sacked in 2015. The Post Office ordered Second Sight to hand over or destroy all the evidence it had collected.

An experienced, competent IT audit team should have the technical expertise to conduct its own detailed system review. It was a core part of our job. I can see why in this case it made sense to bring in an outside firm, “for the optics”. However, we would have been keeping a very close eye on the investigation, assisting and co-operating with the investigators as we did with our external auditors. We would have expected the investigators to have the same access rights as we had, and these were very wide ranging.

We always had the right to scope our audits and investigations, and to see any documents or data that we considered relevant. If anyone ever tried to block us we would insist, as a matter of principle, that they should be overruled. This was non-negotiable. If it was possible to stymie audits or investigations by a refusal to co-operate then we could not do our job. This is all covered in the professional standards of the Institute of Internal Auditors. The terms of reference for the Post Office’s Audit, Risk and Compliance Committee make its responsibilities clear.

“The purpose of the charter will be to grant Internal Audit unfettered access to staff, data and systems required in the course of discharging its responsibilities to the Committee…

Ensure internal audit has unrestricted scope, the necessary resources and access to information to fulfil its mandate.”

I am sure that a good internal audit department, under the strong management that I knew, would have stepped in to demand access to the relevant records in the Horizon case on behalf of the external investigators, and would have pursued the investigation themselves if necessary. It’s inconceivable that we would have let the matter drop under management pressure.

Internal auditors must be independent of management, with a direct reporting line to the board to protect them from attempted intimidation. “Abdication of management responsibilities” was the nuclear phrase in our audit department. It was only to be used by the Group Chief Auditor. He put it in the management summary of one of my reports, referring to the UK General Manager. The explosion was impressive. It was the best example of audit independence I’ve seen. The General Manager stormed into the audit department and started aggressively haranguing the Chief Auditor, who listened calmly then asked. “Have you finished? Ok. The report will not be changed. Goodbye”. I was in awe. You can’t intimidate good auditors. They tend to be strong willed. The weak ones don’t last long, unless they’re part of a low grade and weak audit department that has been captured by the management.

Risk and bonuses

The role of internal audit in the private sector recognises the divergent interests of the executives and the owners. The priority of the auditors is the long term security and health of the company, which means they will often look at problems from a different angle than executives whose priority might be shaped by annual targets, bonuses and the current share price. The auditors keep an eye on the executives, who will often face a conflict of interest.

Humans struggle to think clearly about risk. Mechanical risk matrices like this one (from the Health and Safety Executive, the UK Government inspectorate responsible for regulating workplace safety) serve only to fog thinking. A near certain chance of trivial harm isn’t remotely the same as a trivial chance of catastrophic damage.

UK HSE risk matrix

Senior executives may pretend they are acting in the interests of the company in preventing news of a scandal emerging but their motivation could be the protection of their jobs and bonuses. The company’s true, long term interests might well require early honesty and transparency to avoid the risk of massive reputational damage further down the line when the original scandal is compounded by dishonesty, deflection and covering up. By that time the executives responsible may have moved on, or profited from bonuses they might not otherwise have received.

A recurring theme in the court case was that the Post Office’s senior management, especially Paula Vennells, the chief executive from 2012 to 2019, simply wanted the problem to go away. Their perception seems to have been that the real problem was the litigation, rather than the underlying system problems and the lives that were ruined.

In an email, written in 2015 before she appeared in front of a Parliamentary committee, Vennells wrote.

“Is it possible to access the system remotely?

What is the true answer? I hope it is that we know it is not possible and that we are able to explain why that is. I need to say no it is not possible and that we are sure of this because of xxx [sic] and we know this because we had the system assured.”

Again, in 2015, Vennells instructed an urgent review in response to some embarrassingly well informed blog posts, mainly about the Dalmellington Bug, by a campaigning former sub-postmaster, Tim McCormack. Vennells made it clear what she expected from the review.

“I’m most concerned that we/our suppliers appear to be very lax at handling £24k. And want to know we’ve rectified all the issues raised, if they happened as Tim explains.”

These two examples show the chief executive putting pressure on reviewers to hunt for evidence that would justify the answer she wants. It would be the job of internal auditors to tell the unvarnished truth. No audit manager would frame an audit in such an unprofessional way. Reviews like these would have been automatically assigned to IT auditors at the insurance company where I worked. I wonder who performed them at the Post Office.

When the Horizon court case was settled Vennells issued a statement, an apology of sorts.

“I am pleased that the long-standing issues related to the Horizon system have finally been resolved. It was and remains a source of great regret to me that these colleagues and their families were affected over so many years. I am truly sorry we were unable to find both a solution and a resolution outside of litigation and for the distress this caused.”

That is inadequate. Expressing regret is very different from apologising. I also regret that these lives were ruined, but I hardly have any responsibility. Vennells was “truly sorry” only for the litigation and its consequences, although that litigation was what offered the victims hope and rescue.

Vennells resigned from her post in the spring of 2019, eight months before the conclusion of the Horizon court case. In her last year as chief executive Vennells earned £717,500, only £800 less than the previous year. She lost part of her bonus because the Post Office was still mired in litigation, but it hardly seems to have been a punitive cut. Over the course of her seven years as chief executive, according to the annual reports, she earned £4.5 million, half of which came in the form of bonuses. In that last year when she was penalised for the ongoing litigation she still earned £389,000 in bonuses.

These bonuses are subject to clawback clauses (according to the annual reports, available at the last link);

“which provide for the return of any over-payments in the event of misstatement of the accounts, error or gross misconduct on the part of an Executive Director.”

Bonuses for normal workers reflect excellent performance. In the case of chief executives the criterion seems to be “not actually criminal”.

I have dismissed the risk matrix above for being too mechanical and simplistic. There’s a further criticism; it ignores the time it takes for risks to materialise into damage. A risk that is highly unlikely in any particular year might be almost certain over a longer period. It depends how you choose to frame the problem. To apply a crude probability calculation, if the chance of a risk blowing up in a single year is 3%, then there is a 53% chance it will happen at some point over 25 years. If a chief executive is in post for seven years, as Paula Vennells was, there is only a 19% chance of the risk materialising during their tenure.
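The arithmetic behind those figures is simply the complement of the chance that the risk never materialises: with an annual probability p, assumed constant and independent from year to year, the chance of at least one occurrence in n years is 1 - (1 - p)^n. Here is a minimal sketch of the calculation (my own illustration, not part of the original analysis):

```python
def prob_at_least_once(annual_probability: float, years: int) -> float:
    # Chance the risk materialises at least once over the period, assuming
    # each year is independent and the annual probability stays constant.
    return 1 - (1 - annual_probability) ** years

print(f"{prob_at_least_once(0.03, 25):.0%}")  # roughly 53% over 25 years
print(f"{prob_at_least_once(0.03, 7):.0%}")   # roughly 19% over a seven-year tenure
```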

These are crude calculations, but there is an important and valid underlying point; a risk that might be intolerable to the organisation might be perfectly acceptable to a chief executive who is incentivised to maximise earnings through bonuses, and push troubling risks down the line for someone else to worry about.

No organisation should choose to remain in the intolerable risk cell, yet Vennells took the Post Office there and it probably made financial sense for her. The Post Office was very likely to lose the Horizon litigation, with massive damage. It wouldn’t happen while she was in post, and it would be extremely unlikely that fighting the case aggressively would be regarded as gross misconduct.

Perverse incentives often tempt managers, and also politicians, to ignore the possibility of dreadful outcomes that are unlikely to materialise while they are in post and that would be expensive or unpopular to prepare for. The odds are good that irresponsible management will be rewarded for being wrong and will have left with their hefty bonuses before disaster strikes. On the other hand you can get sacked for doing the right thing long before the justification becomes obvious.

This is, or at least it should be, a big issue for internal auditors who have to keep a sharp eye on risk and misaligned incentives. All too often the only people with a clear eyed, dispassionate understanding of risk are those who are gaming the corporate system. The Post Office’s internal auditors fell down on the job here. Even setting aside the human tragedies, the risks to the Post Office posed by the Horizon system and the surrounding litigation should have been seen as intolerable.

Role of internal audit when organisations move from the public to private sector

This all raises questions about corporate governance and the role of internal audit in bodies like the Post Office that sit between the public and private sectors. The Post Office is owned by the UK government, but with a remit of turning itself into a self-sustaining company without government subsidy. The senior executives were acting like private sector management, but with internal auditors who had a public sector culture, focusing on value for money and petty fraud. There are endless examples of private sector internal auditors losing sight of the big picture. However, a good risk-based audit department will always be thinking of those big risks that could take the company down.

Public bodies are backed by the government and can’t fail in the same way as a private company. When they move into the private sector, the management culture and remuneration system exposes the organisation to a new world of risks. So how do their internal auditors respond? In the case of the Post Office the answer is; badly. The problems were so serious that the internal auditors would have had a professional responsibility to bypass the senior executives and escalate them to board level, and to the external auditors. There is no sign that it happened. The only conclusion is that the Post Office’s internal auditors were either complicit in the Horizon scandal, or negligent. At best, they were taking their salaries under false pretences.

Conclusion

At almost every step, over many years, the Post Office handled the Horizon scandal badly, inexcusably so. They could hardly have done worse. There will be endless lessons that can, and will, be drawn from detailed investigation in what must be the inevitable inquiry. However, for software testers and for IT auditors the big lesson they should take to heart is that bad software, and dysfunctional corporate practices, hurt people and damage lives. The Post Office’s subpostmasters were hard working, decent business people trying to make a living and provide for their families. They were ruined by a cynical, incompetent corporation. They will receive substantial compensation, but it’s hardly enough. They deserve better.

The dragons of the unknown; part 5 – accident investigations and treating people fairly

Introduction

This is the fifth post in a series about problems that fascinate me, that I think are important and interesting. The series draws on important work from the fields of safety critical systems and from the study of complexity, specifically complex socio-technical systems. This was the theme of my keynote at EuroSTAR in The Hague (November 12th-15th 2018).

The first post was a reflection, based on personal experience, on the corporate preference for building bureaucracy rather than dealing with complex reality, “Facing the dragons part 1 – corporate bureaucracies”. The second post was about the nature of complex systems, “part 2 – crucial features of complex systems”. The third followed on from part 2, and talked about the impossibility of knowing exactly how complex socio-technical systems will behave with the result that it is impossible to specify them precisely, “part 3 – I don’t know what’s going on”.

The fourth post “part 4 – a brief history of accident models” looks at accident models, i.e. the way that safety experts mentally frame accidents when they try to work out what caused them. This post looks at weaknesses of the way that we have traditionally investigated accidents and failures, assuming neat linearity with clear cause and effect. In particular, our use of root cause analysis, and our willingness to blame people for accidents, is hard to justify.

The limitations of root cause analysis

root cause (fishbone) diagram

Once you accept that complex systems can’t have clear and neat links between causes and effects then the idea of root cause analysis becomes impossible to sustain. “Fishbone” cause and effect diagrams (like those used in Six Sigma) illustrate traditional thinking, that it is possible to track back from an adverse event to find a root cause that was both necessary and sufficient to bring it about.

The assumption of linearity with tidy causes and effects is no more than wishful thinking. Like the Domino Model (see “part 4 – a brief history of accident models”) it encourages people to think there is a single cause, and to stop looking when they’ve found it. It doesn’t even offer the insight of the Swiss Cheese Model (also see part 4) that there can be multiple contributory causes, all of them necessary but none of them sufficient to produce an accident. That is a key idea. When complex systems go wrong there is rarely a single cause; causes are necessary, but not sufficient.

complex airline system

Here is a more realistic depiction of a complex socio-technical system. It is a representation of the operations control system for an airline. The specifics don’t matter. It is simply a good illustration of how messy a real, complex system looks when we try to depict it.

This is actually very similar to the insurance finance applications diagram I drew up for Y2K (see “part 1 – corporate bureaucracies”). There was no neat linearity. My diagram looked just like this, with a similar number of nodes, or systems, most of which had multiple two-way interfaces with others. And that was just at the level of applications. There was some intimidating complexity within these systems.

As there is no single cause of failure the search for a root cause can be counter-productive. There are always flaws, bugs, problems, deviances from process, variations. So you can always fix on something that has gone wrong. But it’s not really a meaningful single cause. It’s arbitrary.

The root cause is just where you decide to stop looking. The cause is not something you discover. It’s something you choose and construct. The search for a root cause can mean attention will focus on something that is not inherently dangerous, something that had previously “failed” repeatedly but without any accident. The response might prevent that particular failure and therefore ensure there’s no recurrence of an identical accident. However, introducing a change, even if it’s a fix, to one part of a complex system affects the system in unpredictable ways. The change therefore creates new possibilities for failure that are unknown, even unknowable.

It’s always been hard, even counter-intuitive, to accept that we can have accidents & disasters without any new failure of a component, or even without any technical failure that investigators can identify and without external factors interfering with the system and its operators. We can still have air crashes for which no cause is ever found. The pressure to find an answer, any plausible answer, means there has always been an overwhelming temptation to fix the blame on people, on human error.

Human error – it’s the result of a problem, not the cause

If there’s an accident you can always find someone who screwed up, or who didn’t follow the rules, the standard, or the official process. One problem with that is the same applies when everything goes well. Something that troubled me in audit was realising that every project had problems, every application had bugs when it went live, and there were always deviations from the standards. But the reason smart people were deviating wasn’t that they were irresponsible. They were doing what they had to do to deliver the project. Variation was a sign of success as much as failure. Beating people up didn’t tell us anything useful, and it was appallingly unfair.

One of the rewarding aspects of working as an IT auditor was conducting post-implementation reviews and being able to defend developers who were being blamed unfairly for problem projects. The business would give them impossible jobs, complacently assuming the developers would pick up all the flak for the inevitable problems. When auditors, like me, called them out for being cynical and irresponsible they hated it. They used to say it was because I had a developer background and was angling for my next job. I didn’t care because I was right. Working in a good audit department requires you to build up a thick skin, and some healthy arrogance.

There always was some deviation from standards, and the tougher the challenge the more obvious they would be, but these allegedly deviant developers were the only reason anything was delivered at all, albeit by cutting a few corners.

It’s an ethical issue. Saying the cause of an accident is that people screwed up is opting for an easy answer that doesn’t offer any useful insights for the future and just pushes problems down the line.

Sidney Dekker used a colourful analogy. Dumping the blame on an individual after an accident is “peeing in your pants management” (PDF, opens in new tab).

“You feel relieved, but only for a short while… you start to feel cold and clammy and nasty. And you start stinking. And, oh by the way, you look like a fool.”

Putting the blame on human error doesn’t just stink. It obscures the deeper reasons for failure. It is the result of a problem, not the cause. It also encourages organisations to push for greater automation, in the vain hope that will produce greater safety and predictability, and fewer accidents.

The ironies of automation

An important part of the motivation to automate systems is that humans are seen as unreliable & inefficient. So they are replaced by automation, but that leaves the humans with jobs that are even more complex and even more vulnerable to errors. The attempt to remove errors creates fresh possibilities for even worse errors. As Lisanne Bainbridge wrote in a 1983 paper “The ironies of automation”;

“The more advanced a control system is… the more crucial may be the contribution of the human operator.”

There are all sorts of twists to this. Automation can mean the technology does all the work and operators have to watch a machine that’s in a steady-state, with nothing to respond to. That means they can lose attention & not intervene when they need to. If intervention is required the danger is that vital alerts will be lost if the system is throwing too much information at operators. There is a difficult balance to be struck between denying operators feedback, and thus lulling them into a sense that everything is fine, and swamping them with information. Further, if the technology is doing deeply complicated processing, are the operators really equipped to intervene? Will the system allow operators to override? Bainbridge makes the further point;

“The designer who tries to eliminate the operator still leaves the operator to do the tasks which the designer cannot think how to automate.”

This is a vital point. Systems are becoming more complex and the tasks left to the humans become ever more demanding. System designers have only a very limited understanding of what people will do with their systems. They don’t know. The only certainty is that people will respond and do things that are hard, or impossible, to predict. That is bound to deviate from formal processes, which have been defined in advance, but these deviations, or variations, will be necessary to make the systems work.

Acting on the assumption that these deviations are necessarily errors and “the cause” when a complex socio-technical system fails is ethically wrong. However, there is a further twist to the problem, summed up by the Law of Stretched Systems.

Stretched systems

Lawrence Hirschhorn’s Law of Stretched Systems is similar to the Fundamental Law of Traffic Congestion. New roads create more demand to use them, so new roads generate more traffic. Likewise, improvements to systems result in demands that the system, and the people, must do more. Hirschhorn seems to have come up with the law informally, but it has been popularised by the safety critical community, especially by David Woods and Richard Cook.

“Every system operates always at its capacity. As soon as there is some improvement, some new technology, we stretch it.”

And the corollary, furnished by Woods and Cook.

“Under resource pressure, the benefits of change are taken in increased productivity, pushing the system back to the edge of the performance envelope.”

Every change and improvement merely adds to the stress that operators are coping with. The obvious response is to place more emphasis on ergonomics and human factors, to try and ensure that the systems are tailored to the users’ needs and as easy to use as possible. That might be important, but it hardly resolves the problem. These improvements are themselves subject to the Law of Stretched Systems.

This was all first noticed in the 1990s after the First Gulf War. The US Army hadn’t been in serious combat for 18 years. Technology had advanced massively. Throughout the 1980s the army reorganised, putting more emphasis on technology and training. The intention was that the technology should ease the strain on users, reduce fatigue and be as simple to operate as possible. It didn’t pan out that way when the new army went to war. Anthony H. Cordesman and Abraham Wagner analysed in depth the lessons of the conflict. They were particularly interested in how the technology had been used.

“Virtually every advance in ergonomics was exploited to ask military personnel to do more, do it faster, and do it in more complex ways… New tactics and technology simply result in altering the pattern of human stress to achieve a new intensity and tempo of combat.”

Improvements in technology create greater demands on the technology – and the people who operate it. Competitive pressures push companies towards the limits of the system. If you introduce an enhancement to ease the strain on users then managers, or senior officers, will insist on exploiting the change. Complex socio-technical systems always operate at the limits.

This applies not only to soldiers operating high tech equipment. It applies also to the ordinary infantry soldier. In 1860 the British army was worried that troops had to carry 27kg into combat (PDF, opens in new tab). The load has now risen to 58kg. US soldiers have to carry almost 9kg of batteries alone. The Taliban called NATO troops “donkeys”.

These issues don’t apply only to the military. They’ve prompted a huge amount of new thinking in safety critical industries, in particular healthcare and air transport.

The overdose – system behaviour is not explained by the behaviour of its component technology

Remember the traditional argument that any system that was not deterministic was inherently buggy and badly designed? See “part 2 – crucial features of complex systems”.

In reality that applies only to individual components, and even then complexity & thus bugginess can be inescapable. When you’re looking at the whole socio-technical system it just doesn’t stand up.

Introducing new controls, alerts and warnings doesn’t just increase the complexity of the technology, as I mentioned earlier with the MIG jet designers (see part 4). These new features add to the burden on the people. Alerts and error messages can swamp users of complex systems, so they miss the information they really need to know.

I can’t recommend strongly enough the story told by Bob Wachter in “The overdose: harm in a wired hospital”.

A patient at a hospital in California received an overdose of 38½ times the correct amount. Investigation showed that the technology worked fine. All the individual systems and components performed as designed. They flagged up potential errors before they happened. So someone obviously screwed up. That would have been the traditional verdict. However, the hospital allowed Wachter to interview everyone involved in each of the steps. He observed how the systems were used in real conditions, not in a demonstration or test environment. Over five articles he told a compelling story that will force any fair reader to admit “yes, I’d have probably made the same error in those circumstances”.

Happily the patient survived the overdose. The hospital staff involved were not disciplined and were allowed to return to work. The hospital had to think long and hard about how it would try to prevent such mistakes recurring. The uncomfortable truth they had to confront was that there were no simple answers. Blaming human error was a cop out. Adding more alerts would compound the problems staff were already facing; one of the causes of the mistake was the volume of alerts swamping staff making it hard, or impossible, to sift out the vital warnings from the important and the merely useful.

One of the hard lessons was that focussing on making individual components more reliable had harmed the overall system. The story is an important illustration of the maxim in the safety critical community that trying to make systems safer can make them less safe.

Some system changes were required and made, but the hospital realised that the deeper problem was organisational and cultural. They made the brave decision to allow Wachter to publicise his investigation and his series of articles is well worth reading.

The response of the safety critical community to such problems and the necessary trade offs that a practical response requires, is intriguing with important lessons for software testers. I shall turn to this in my next post, “part 6 – Safety II, a new way of looking at safety”.

The dragons of the unknown; part 2 – crucial features of complex systems

Introduction

This is the second post in a series about problems that fascinate me, that I think are important and interesting. The series draws on important work from the fields of safety critical systems and from the study of complexity, specifically complex socio-technical systems. This was the theme of my keynote at EuroSTAR in The Hague (November 12th-15th 2018).

The first post was a reflection, based on personal experience, on the corporate preference for building bureaucracy rather than dealing with complex reality, “The dragons of the unknown; part 1 – corporate bureaucracies”. This post is about the nature of complex systems and discusses some features that have significant implications for testing. We have been slow to recognise the implications of these features.

Complex systems are probabilistic (stochastic) not deterministic

A deterministic system will always produce the same output, starting from a given initial state and receiving the same input. Probabilistic, or stochastic, systems are inherently unpredictable and therefore non-deterministic. Stochastic is defined by the Oxford English Dictionary as “having a random probability distribution or pattern that may be analysed statistically but may not be predicted precisely.”
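
To make the distinction concrete, here is a minimal Python sketch (the function names and numbers are mine, purely for illustration) contrasting a deterministic calculation with a stochastic one.

```python
import random

def deterministic_total(amounts):
    # Same input, same initial state: always the same output.
    return sum(amounts)

def stochastic_settlement(amounts, failure_rate=0.01):
    # The same input can produce different outputs, because the outcome
    # depends on chance events (crudely modelled here with random()).
    return sum(a for a in amounts if random.random() > failure_rate)

payments = [100, 250, 75]
print(deterministic_total(payments))    # always 425
print(stochastic_settlement(payments))  # usually 425, occasionally less
```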

Traditionally, non-determinism meant a system was badly designed, inherently buggy, and untestable. Testers needed deterministic systems to do their job. It was therefore the job of designers to produce systems that were deterministic, and testers would demonstrate whether or not the systems met that benchmark. Any non-determinism meant a bug had to be removed.

Is that right, or is it nonsense? Well, neither; or rather, it depends on the context you choose, on what you choose to look at. You can restrict yourself to a context where determinism holds true, or you can expand your horizons. The traditional approach to determinism is correct, but only within carefully defined limits.

You can argue, quite correctly, that a computer program cannot have the properties of a true complex system. A program does what it’s coded to do: outputs can always be predicted from the inputs, provided you’re clever enough and you have enough time. For a single, simple program that is certainly true. A fearsomely complicated program might not be meaningfully deterministic, but we can respond constructively to that with careful design, and sensitivity to the needs of testing and maintenance. However, the wider we draw the context beyond individual programs, the weaker our confidence becomes that we can know what should happen.

Once you’re looking at complex socio-technical systems, i.e. systems where people interact with complex technology, then any reasonable confidence that we can predict outcomes accurately has evaporated. These are the reasons.

Even if the system is theoretically still deterministic we don’t have brains the size of a planet, so for practical purposes the system becomes non-deterministic.

The safety critical systems community likes to talk about tractable and intractable systems. They know that the complex socio-technical systems they work with are intractable, which means that they can’t even describe with confidence how they are supposed to work (a problem I will return to). Does that rule out the possibility of offering a meaningful opinion about whether they are working as intended?

That has huge implications for testing artificial intelligence, autonomous vehicles and other complex technologies. Of course testers will have to offer the best information they can, but they shouldn’t pretend they can say these systems are working “as intended” because the danger is that we are assuming some artificial and unrealistic definition of “as intended” that will fit the designers’ limited understanding of what the system will do. I will be returning to that. We don’t know what complex systems will do.

In a deeply complicated system things will change that we are unaware of. There will always be factors we don’t know about, or whose impact we can’t know about. Y2K changed the way I thought about systems. Experience had made us extremely humble and modest about what we knew, but there was a huge amount of stuff we didn’t even know we didn’t know. At the end of the lengthy, meticulous job of fixing and testing we thought we’d allowed for everything, in the high risk, date sensitive areas at least. We were amazed how many fresh problems we found when we got hold of a dedicated mainframe LPAR, effectively our own mainframe, and booted it up with future dates.

We discovered that there were vital elements (operating system utilities, old vendor tools etc) lurking in the underlying infrastructure that didn’t look like they could cause a problem, but which interacted with application code in ways we could not have predicted when run with Y2K dates. The fixed systems had run satisfactorily with overrides to the system date in test environments that were built to mirror production, but they crashed when they ran on a mainframe running at future system dates. We were experts, but we hadn’t known what we didn’t know.

The behaviour of these vastly complicated systems was indistinguishable from complex, unpredictable systems. When a test passes with such a system there are strict limits to what we should say with confidence about the system.

As Michael Bolton tweeted;

“A ‘passing’ test doesn’t mean ‘no problem’. It means ‘no problem *observed*. This time. With these inputs. So far. On my machine’.”

So, even if you look at the system from a narrow technical perspective, the computerised system only, the argument that a good system has to be deterministic is weak. We’ve traditionally tested systems as if they were calculators, which should always produce the same answers from the same sequence of button presses. That is a limited perspective. When you factor in humans then the ideal of determinism disintegrates.

In any case there are severe limits to what we can say about the whole system from our testing of the components. A complex system behaves differently from the aggregation of its components. It is more than the sum. That brings us to an important feature of complex systems. They are emergent. I’ll discuss this in the next section.

My point here is that the system that matters is the wider system. In the case of safety critical systems, the whole, wider system decides whether people live or die.

Instead of thinking of systems as being deterministic, we have to accept that complex socio-technical systems are stochastic. Any conclusions we reach should reflect probability rather than certainty. We cannot know what will happen, just what is likely. We have to learn about the factors that are likely to tip the balance towards good outcomes, and those that are more likely to give us bad outcomes.
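
As an illustration only (the scenario, numbers and helper function are hypothetical), a tester dealing with a stochastic system might report an observed rate rather than a binary verdict, along these lines.

```python
import random

def run_scenario():
    # Stand-in for exercising a complex socio-technical workflow once;
    # in reality the outcome depends on people, timing and environment.
    return random.random() > 0.05  # True = acceptable outcome observed

trials = 1000
successes = sum(run_scenario() for _ in range(trials))
print(f"Acceptable outcome observed in {successes / trials:.1%} of {trials} trials")
# A statement of likelihood, not a claim that the system 'passes'.
```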

I can’t stress strongly enough that lack of determinism in socio-technical systems is not a flaw, it’s an intrinsic part of the systems. We must accept that and work with it. I must also stress that I am not dismissing the idea of determinism or of trying to learn as much as possible about the behaviour of individual programs and components. If we lose sight of what is happening within these it becomes even more confusing when we try to look at a bigger picture. Likewise, I am certainly not arguing against Test Driven Development, which is a valuable approach for coding. Cling to determinism whenever you can, but accept its limits – and abandon all hope that it will be available when you have to learn about the behaviour of complex socio-technical systems.

We have to deal with whole systems as well as components, and that brings me to the next point. It’s no good thinking about breaking the system down into its components and assuming we can learn all we need to by looking at them individually. Complex systems have emergent behaviour.

Complex systems are emergent; the whole is greater than the sum of the parts

It doesn’t make sense to talk of an H2O molecule being wet. Wetness is what you get from a whole load of them. The behaviour or the nature of the components in isolation doesn’t tell you about the behaviour or nature of the whole. However, the whole is entirely consistent with the elements. The H2O molecules are governed by the laws of chemistry and physics, and that remains so regardless of whether they are combined. But once they are combined they become water, which is unquestionably wet and is governed by the laws of fluid dynamics. If you look at the behaviour of free surface water in the oceans under the influence of wind then you are dealing with a stochastic process. Individual waves are unpredictable, but reasonable predictions can be made about the behaviour of a long series of waves.

As you draw back and look at the wider picture, rather than the low level components, you see the components combining in ways that couldn’t possibly have been predicted simply by looking at them individually and trying to extrapolate.

Starlings offer another good illustration of emergence. These birds combine in huge flocks to form murmurations, amazing, constantly evolving aerial patterns that look as if a single brain is in control. The individual birds are aware of only seven others, rather than the whole murmuration. They concentrate on those neighbours and respond to their movements. Their behaviour isn’t any different from what they can do on their own. However well you understood the individual starling and its behaviour, you could not possibly predict what these birds do together.
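
A toy simulation in the spirit of Craig Reynolds’ boids makes the point. Each simulated bird follows only simple local rules about its nearest neighbours, yet flock-like patterns emerge that appear nowhere in the rules themselves. This is a rough sketch with made-up coefficients, not a model of real starlings.

```python
import random

class Bird:
    def __init__(self):
        self.x, self.y = random.uniform(0, 100), random.uniform(0, 100)
        self.vx, self.vy = random.uniform(-1, 1), random.uniform(-1, 1)

def nearest(bird, flock, k=7):
    # Each bird reacts only to its k nearest neighbours, not the whole flock.
    others = [b for b in flock if b is not bird]
    return sorted(others, key=lambda b: (b.x - bird.x) ** 2 + (b.y - bird.y) ** 2)[:k]

def step(flock):
    for bird in flock:
        neighbours = nearest(bird, flock)
        # Alignment: drift towards the neighbours' average heading.
        avg_vx = sum(b.vx for b in neighbours) / len(neighbours)
        avg_vy = sum(b.vy for b in neighbours) / len(neighbours)
        # Cohesion: drift towards the neighbours' centre of mass.
        cx = sum(b.x for b in neighbours) / len(neighbours)
        cy = sum(b.y for b in neighbours) / len(neighbours)
        bird.vx += 0.05 * (avg_vx - bird.vx) + 0.01 * (cx - bird.x)
        bird.vy += 0.05 * (avg_vy - bird.vy) + 0.01 * (cy - bird.y)
    for bird in flock:
        bird.x += bird.vx
        bird.y += bird.vy

flock = [Bird() for _ in range(50)]
for _ in range(100):
    step(flock)
# No individual rule mentions a flock shape, yet coordinated movement emerges.
```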


Likewise with computer systems: even if all of the components are well understood and working as intended, the behaviour of the whole is different from what you’d expect from simply looking at those components. This applies especially when humans are in the loop. Not only is the whole different from the sum of the parts, the whole system will evolve and adapt unpredictably as people find out what they have to do to make the system work, as they patch it and cover for problems and as they try to make it work better. This is more than a matter of changing code to enhance the system. It is about how people work with the system.

Safety is an emergent property of complex systems. The safety critical experts know that they cannot offer a meaningful opinion just by looking at the individual components. They have to look at how the whole system works.

In complex systems success & failure are not absolutes

Success & failure are not absolutes. A system might be flawed, even broken, but still valuable to someone. There is no right, simple answer to the question “Is it working? Are the figures correct?”

Appropriate answers might be “I don’t know. It depends. What do you mean by ‘working’? What is ‘correct’? Who is it supposed to be working for?”

The insurance finance systems I used to work on were notoriously difficult to understand and manipulate. 100% accuracy was never a serious, practicable goal. As I wrote in “Fix on failure – a failure to understand failure”;

“With complex financial applications an honest and constructive answer to the question ‘is the application correct?’ would be some variant on ‘what do you mean by correct?’, or ‘I don’t know. It depends’. It might be possible to say the application is definitely not correct if it is producing obvious garbage. But the real difficulty is distinguishing between the seriously inaccurate, but plausible, and the acceptably inaccurate that is good enough to be useful. Discussion of accuracy requires understanding of critical assumptions, acceptable margins of error, confidence levels, the nature and availability of oracles, and the business context of the application.”

I once had to lead a project to deliver a new sub-system that would be integrated into the main financial decision support system. There were two parallel projects, each tackling a different line of insurance. I would then be responsible for integrating the new sub-systems into the overall system, a big job in itself.

The other project manager wanted to do his job perfectly. I wanted to do whatever was necessary to build an acceptable system in time. I succeeded. The other guy delivered late and missed the implementation window. I had to carry on with the integration without his beautiful baby.

By the time the next window came around there were no developers available to make the changes needed to bring it all up to date. The same happened next time, and then the next time, and then… and eventually it was scrapped without ever going live.

If you compared the two sub-systems in isolation there was no question that the other man’s was far better than the one I lashed together. Mine was flawed but gave the business what they needed, when they needed it. The other was slightly more accurate but far more elegant, logical, efficient and lovingly crafted. And it was utterly useless. The whole decision support system was composed of sub-systems like mine, flawed, full of minor errors, needing constant nursing, but extremely valuable to the business. If we had chased perfection we would never have been able to deliver anything useful. Even if we had ever achieved perfection it would have been fleeting as the shifting sands of the operational systems that fed it introduced new problems.

The difficult lesson we had to learn was that flaws might have been uncomfortable but they were an inescapable feature of these systems. If they were to be available when the business needed them they had to run with all these minor flaws.

Richard Cook expanded on this point in his classic and highly influential 1998 article “How complex systems fail”. He put it succinctly.

“Complex systems run in degraded mode.”

Cook’s arguments ring true to those who have worked with complex systems, but they haven’t been widely appreciated in the circles of senior management where budgets, plans and priorities are set.

Complex systems are impossible to specify precisely

Cook’s 1998 paper is important, and I strongly recommend it, but it wasn’t quite ground breaking. John Gall wrote a slightly whimsical and comical book that elaborated on the same themes back in 1975: “Systemantics: how systems work and especially how they fail”. Despite the jokey tone he made serious arguments about the nature of complex systems and the way that organisations deal, and fail to deal, with them. Here is a selection of his observations.

“Large systems usually operate in failure mode.”

“The behaviour of complex systems… living or non-living, is unpredictable.”

“People in systems do not do what the system says they are doing.”

“Failure to function as expected is an intrinsic feature of systems.”

John Gall wrote that fascinating and hugely entertaining book more than forty years ago. He nailed it when he discussed the problems we’d face with complex socio-technical systems. How can we say the system is working properly if we know neither how it is working, nor how it is supposed to work? Or what the people are doing within the system?

The complex systems we have to deal with are usually socio-technical systems. They operate in a social setting, with humans. People make the systems work and they have to make decisions under pressure in order to keep the system running. Different people will do different things. Even the same person might act differently at different times. That makes the outcomes from such a system inherently unpredictable. How can we specify such a system? What does it even mean to talk of specifying an unpredictable system?

That’s something that the safety critical experts focus on. People die because software can trip up humans even when it is working smoothly as designed. This has received a lot of attention in medical circles. I’ll come back to that in a later post.

That is the reality of complex socio-technical systems. These systems are impossible to specify with complete accuracy or confidence, and certainly not at the start of any development. Again, this is not a bug, but an inescapable feature of complex socio-technical systems. Any failure may well be in our expectations, a flaw in our assumptions and knowledge, and not necessarily the system.

This reflected my experience with the insurance finance systems, especially for Y2K, and it was also something I had to think seriously about when I was an IT auditor. I will turn to that in my next post, “part 3 – I don’t know what’s going on”.

The dragons of the unknown; part 1 – corporate bureaucracies

Introduction

This is the first post in a series about problems that fascinate me, that I think are important and interesting. The series will draw on important work from the fields of safety critical systems and from the study of complexity, specifically complex socio-technical systems. I’m afraid I will probably dwell longer on problems than answers. One of the historical problems with software development and testing has been an eagerness to look for and accept easy, but wrong answers. We have been reluctant to face up to reality when we are dealing with complexity, which doesn’t offer simple or clear answers.

This was the theme of my keynote at EuroSTAR in The Hague (November 12th-15th 2018).

Complexity is intimidating and it’s tempting to pretend the world is simpler than it is. We’ve been too keen to try & reshape reality so that it will look like something we can manage neatly. That mindset often dovetails neatly with the pressures of corporate life and it is possible to go far in large organisations while denying and evading reality. It is, however, bullshit.

A bit about my background

When I left university I went to work for one of the big, international accountancy firms as a trainee chartered accountant. It was a bewildering experience. I felt clueless. I didn’t understand what was going on. I never did feel comfortable that I understood what we were doing. It wasn’t that I was dimmer than my colleagues. I was the only one who seemed to question what was going on and I felt confused. Everyone else took it all at face value but the work we were doing seemed to provide no value to anyone.

At best we were running through a set of rituals to earn a fee that paid our salaries. The client got a clean, signed off set of accounts, but I struggled to see what value the information we produced might have for anyone. None of the methods we used seemed designed to tell us anything useful about what our clients were doing. It all felt like a charade. I was being told to do things and I just couldn’t see how anything made sense. I may as well have been trying to manage that flamingo from Alice in Wonderland. The image may seem a strange one to use, but it appeals to me; it sums up my confusion well. What on earth was the point of all these processes? They might as well have been flamingos for all the use they seemed. I hadn’t a clue.

I moved to a life assurance company, managing the foreign currency bank accounts. That entailed shuffling tens of millions of dollars around the world every day to maximise the overnight interest we earned. The job had a highly unattractive combination of stress and boredom. A simple, single mistake in my projections of the cash flowing through the accounts on one day would cost far more than my annual salary. The projections weren’t an arithmetical exercise. They required judgment and trade offs of various factors. Getting it right produced a sense of relief rather than satisfaction.

The most interesting part of the job was using the computer systems to remotely manage the New York bank accounts (which was seriously cutting edge for the early 1980s) and discussing with the IT people how they worked. So I decided on a career switch into IT, a decision I never regretted, and arrived in the IT division of a major insurance company. I loved it. I think my business background got me more interesting roles in development, involving lots of analysis and design as well as coding.

After a few years I had the chance to move into computer audit, part of the group audit department. It was a marvellous learning experience, seeing how IT fitted into the wider business, and seeing all the problems of IT from a business perspective. That transformed my outlook and helped me navigate my way round corporate bureaucracies, but once I learned to see problems and irresponsible bullshit I couldn’t keep my mouth shut. I didn’t want to, and that’s because of my background, my upbringing, and training. I acquired the reputation for being an iconoclast, an awkward bastard. I couldn’t stand bullshit.

The rise of bullshit jobs

My ancestors had real jobs, tough, physical jobs as farmhands, stonemasons and domestic servants, till the 20th century when they managed to work their way up into better occupations, like shopkeeping, teaching and sales, interspersed with spells in the military during the two world wars. They were still real jobs, where it was obvious if you didn’t do anything worthwhile, if you weren’t achieving anything.

I had a very orthodox Scottish Presbyterian upbringing. We were taught to revere books and education. We should work hard, learn, stand our ground when we know we are right and always argue our case. We should always respect those who have earned respect, regardless of where they are in society.

In the original Star Trek Scotty’s accent may have been dodgy, but that character was authentic. It was my father’s generation. As a Star Trek profile puts it; “rank means nothing to Scotty if you’re telling him how to do his job”.

A few years ago Better Software magazine introduced an article I wrote by saying I was never afraid to voice my opinion. I was rather embarrassed when I saw that. Am I really opinionated and argumentative? Well, probably (definitely, says my wife). When I think that I’m right I find it hard to shut up. Nobody does righteous certainty better than Scottish Presbyterians! In that, at least, we are world class, but I have to admit, it’s not always an attractive quality (and the addictive yearning for certainty becomes very dangerous when you are dealing with complex systems). However, that ingrained attitude, along with my experience in audit, did prepare me well to analyse and challenge dysfunctional corporate practices, to challenge bullshit and there has never been any shortage of that.

Why did corporations embrace harmful practices? A major factor was that they had become too big, complex and confusing for anyone to understand what was going on, never mind exercise effective control. The complexity of the corporation itself is difficult enough to cope with, but the problems it faces and the environment it operates in have also become more complex.

I’m not advocating some radical Year Zero destruction of corporate bureaucracy. Large organisations are so baffling and difficult to manage that without some form of bureaucracy nothing would happen. All would be confusion and chaos. But it is difficult to keep the bureaucracy under control and in proportion to the organisation’s real needs and purpose. There is an almost irresistible tendency for the bureaucracy to become the master rather than the servant.

Long and painful, if educational, experience has allowed me to distill the lessons I’ve learned into seven simple statements.

  • Modern corporations, the environment they’re operating in and the problems they face are too complex for anyone to control or understand.
  • Corporations have been taken over by managers and run for their own benefit, rather than customers, shareholders, the workforce or wider society.
  • Managers need an elaborate bureaucracy to maintain even a semblance of control, though it’s only the bureaucracy they control, not the underlying reality.
  • These managers struggle to understand the jobs of the people who do the productive work.
  • So the managers value protocol and compliance with the bureaucracy over technical expertise.
  • The purpose of the corporate bureaucracy therefore becomes the smooth running of the bureaucracy.
  • Hence the proliferation of jobs that provide no real value and exist only so that the corporate bureaucracy can create the illusion of working effectively.

I have written about this phenomenon in a blog series, “Corporate bureaucracy and testing”, and I have also reflected, in “Testing: valuable or bullshit?”, on the specific threat to testing if it becomes a low skilled, low value corporate bullshit job.

The aspect of this problem that I want to focus on in this series is our desire to simplify complexity. We furnish simple explanations for complex problems. I did this as a child when I decided the wind was caused by trees waving their branches. My theory fitted what I observed, and it was certainly much easier for a five year old to understand than variations in atmospheric pressure. We also make convenient, but flawed, assumptions that turn a messy, confusing, complex problem into one that we are confident we can deal with. The danger is that in doing so we completely lose sight of the real problem while we focus on a construct of our own imagination. This is hardly a recent phenomenon.

The German military planners of World War One provide a striking example of this escape from reality. They fully appreciated what a modern, industrial war would be like with huge armies and massively destructive armaments. The politicians didn’t get it, but according to Barbara Tuchman, in “The Guns of August”, the German military staff did understand. They just didn’t know how to respond. So they planned to win the sort of war they were already familiar with, a 19th century war.

(General Moltke, the German Chief of Staff,) said to the Kaiser in 1906, ‘It will be a long war that will not be settled by a decisive battle but by a long wearisome struggle with a country that will not be overcome until its whole national force is broken, and a war that will utterly exhaust our own people, even if we are victorious.’ It went against human nature, however – and the nature of General Staffs – to follow the logic of his own prophecy. Amorphous and without limits, the concept of a long war could not be scientifically planned for as could the orthodox, predictable and simple solution of decisive battle and short war. The younger Moltke was already Chief of Staff when he made his prophecy, but neither he nor his Staff, nor the Staff of any other country made any effort to plan for a long war.

The military planners yearned for a problem that allowed an “orthodox, predictable and simple solution”, so they redefined the problem to fit that longing. The results were predictably horrific.

There is a phrase for the mental construct the military planners chose to work with; an “envisioned world” (PDF – opens in new tab). That paper, by David Woods, Paul Feltovich, Robert Hoffman, and Axel Roesler is a fairly short and clear introduction to the dangers of approaching complex systems with a set of naively simplistic assumptions. Our natural, human bias towards over-simplification has various features. In each case the danger is that we opt for a simplified perspective, rather than a more realistic one.

We like to think of activities as a series of discrete steps that can be analysed individually, rather than as continuous processes that cannot meaningfully be broken down. Similarly, we prefer to see processes as being separable and independent, rather than envisage them all interacting with the wider world. We are inclined to consider activities as if they were sequential when they actually happen simultaneously. We instinctively want to assume homogeneity rather than heterogeneity, so we mentally class similar things as if they were exactly the same, thus losing sight of nuance and important distinctions; we assume regularity when the reality is irregular. We look at elements as if there is only one perspective when there might be multiple viewpoints. We like to assume any rules or principles are universal when they might really be local and conditional, relevant only to the current context. We inspect the surface and shy away from considering deep analysis that might reveal awkward complications and subtleties.

These are all relevant considerations for testers, but there are three more that are all related and are particularly important when trying to learn how complex socio-technical systems work.

  • We look on problems as if they are static objects, when we should be thinking of them as dynamic, flowing processes. If we focus on the static then we lose sight of the problems or opportunities that might arise as the problem, or the application, changes over time or space.
  • We treat problems as if they are simple and mechanical, rather than organic with unpredictable, emergent properties. The implicit assumption is that we can know how whole systems will behave simply by looking at the behaviour of the components.
  • We pretend that the systems are subject to linear causes and effects with the same cause always producing the same effect. The possibility of tipping points and cascading effects is ignored.

Complex socio-technical systems are not static, simple or linear. Testers have to recognise that and frame their testing to take account of the reality, that these systems are dynamic, organic and non-linear. If they don’t and if they try to restrict themselves to the parts of the system that can be treated as mechanical rather than truly complex, the great danger is that testing will become just another pointless, bureaucratic job producing nothing of any real value.

I have worked both as an external auditor and an internal auditor. Internal audit has a focus and a mindset that allows it to deliver great value, when it is done well. External audit has been plagued by a flawed business model that is struggling with the complexity of modern corporations and their accounts. The external audit model requires masses of inexperienced, relatively lowly paid staff, carrying out unskilled checking of the accounts and producing output of dubious value. The result can fairly be described as a crisis of relevance for external audit.

I don’t want to see testing suffer the same fate, but that is likely if we try to define the job as one that can be carried out by large squads of poorly skilled testers. We can’t afford to act as if the job is easy. That is the road to irrelevance. In order to remain relevant we must try to meet the real needs of those who employ us. That requires us to deal with the world as it is, not as we would like it to be.

My spell in IT audit forced me to think seriously about all these issues for the first time. The audit department in which I worked was very professional and enlightened, with some very good, very bright people. We carried out valuable, risk-based auditing when that was at the leading edge of internal audit practice. Many organisations have still not caught up and are mired in low-value, low-skilled, compliance checking. That style of auditing falls squarely into the category of pointless, bullshit jobs. It is performing a ritual for the sake of appearances.

My spell as an auditor transformed my outlook. I had to look at, and understand the bigger picture, how the various business critical applications fitted together, and what the implications were of changing them. We had to confront bullshitters and “challenge the intellectual inadequates”, as the Group Chief Auditor put it. We weren’t just allowed to challenge bullshit; it was our duty. Our organisational independence meant that nobody could pull rank on us, or go over our heads.

I never had a good understanding of what the company was doing with IT till I moved into audit. The company paid me enough money to enjoy a good lifestyle while I played with fun technology. As an auditor I had to think seriously about how IT kept the company competitive and profitable. I had to understand how everything fitted together, understand the risks we faced and the controls we needed.

I could no longer just say “well, shit happens”. I had to think “what sort of shit?”, “how bad is it?”, “what shit can we live with?”, “what shit have we really, really got to avoid?”, “what are the knock-on implications?”, “can we recover from it?”, “how do we recover?”, “what does ‘happen’ mean anyway?”, “who does it happen to?”, “where does it happen?”.

Everything that mattered fitted together. If it was stand alone, then it almost certainly didn’t matter and we had more important stuff to worry about. The more I learned the more humble I became about the limits of my knowledge. It gradually dawned on me how few people had a good overall understanding of how the company worked, and this lesson was hammered home when we reached Y2K.

When I was drafted onto the Y2K programme as a test manager I looked at the plans drawn up by the Y2K architects for my area, which included the complex finance systems on which I had been working. The plans were a hopelessly misleading over-simplification. There were only three broad systems defined, covering 1,175 modules. I explained that it was nonsense, but I couldn’t say for sure what the right answer was, just that it was a lot more.

I wrote SAS programs to crawl through the production libraries, schedules, datasets and access control records to establish all the links and outputs. I drew up an overview that identified 20 separate interfacing applications with 3,000 modules. That was a shock to management because it had already been accepted that there would not be enough time to test the lower number thoroughly.
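
The original analysis was done with SAS against mainframe libraries, schedules and access control records. The sketch below is only a hypothetical modern analogue in Python (the job and dataset names are invented), showing the general idea: infer the links between applications from which jobs write and read which datasets.

```python
from collections import defaultdict

# Hypothetical extract of a job schedule: (job, datasets read, datasets written).
schedule = [
    ("CLAIMS_LOAD",  ["CLAIMS.INPUT"],    ["CLAIMS.MASTER"]),
    ("FIN_SUMMARY",  ["CLAIMS.MASTER"],   ["FINANCE.SUMMARY"]),
    ("MI_REPORTING", ["FINANCE.SUMMARY"], ["MI.EXTRACT"]),
]

# Index which jobs produce each dataset.
writers = defaultdict(list)
for job, _, outputs in schedule:
    for dataset in outputs:
        writers[dataset].append(job)

# A job depends on whichever jobs write the datasets it reads.
dependencies = defaultdict(set)
for job, inputs, _ in schedule:
    for dataset in inputs:
        dependencies[job].update(writers[dataset])

for job, upstream in dependencies.items():
    print(f"{job} depends on {sorted(upstream)}")
```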

My employers realised I was the only available person who had any idea of the complexity of both the technical and business issues. They put me in charge of the development team as well as the testers. That was an unusual outcome for a test manager identifying a fundamental problem. I might not have considered myself an expert, but I had proved my value by demonstrating how much we didn’t know. That awareness was crucial.

That Y2K programme may be 20 years in the past, but it was painfully clear at the time that we had totally lost sight of the complexity of these finance applications. I was able to provide realistic advice only because of my deep expertise, and thereafter I was always uncomfortably aware that I never again had the time to acquire such deep knowledge.

These applications, for all their complexity, were at least rigidly bounded. We might not have known what was going on within them, but we knew where the limits lay. They were all internal to the corporation with a tightly secured perimeter. That is a different world from today. The level of complexity has increased vastly. Web applications are built on layers of abstraction that render the infrastructure largely opaque. These applications aren’t even notionally under the control of organisations in the way that our complex insurance applications were. That makes their behaviour impossible to control precisely, and even to predict, as I will discuss in my next post, “part 2 – crucial features of complex systems”.

Precertification of low risk digital products by FDA

Occasionally I am asked why I use Twitter. “Why do you need to know what people have had for breakfast? Why get involved with all those crazies?”. I always answer that it’s easy to avoid the bores and trolls (all the easier if one is a straight, white male I suspect) and Twitter is a fantastic way of keeping in touch with interesting people, ideas and developments.

A good recent example was this short series of tweets from Griffin Jones. This was the first I’d heard of the pre-certification program proposed by the Food and Drug Administration (FDA), the USA’s federal body regulating food, drugs and medical devices.

Griffin is worried that IT certification providers will rush to sell their services. My first reaction was to agree, but on consideration I’m cautiously more optimistic.

Precertification would be for organisations, not individuals. The certification controversy in software testing relates to certifying individuals through ISTQB. FDA precertification is aimed at organisations, which would need “an existing track record in developing, testing, and maintaining software products demonstrating a culture of quality and organizational excellence measured and tracked by Key Performance Indicators (KPIs) or other similar measures.” That quote is from the notification for the pilot program for the precertification scheme, so it doesn’t necessarily mean the same criteria would apply to the final scheme. However, the FDA’s own track record of highly demanding standards (no, not like ISO 29119) that are applied with pragmatism provides grounds for optimism.

Sellers of CMMi and TMMi consultancy might hope this would give them a boost, but I’ve not heard much about these in recent years. It could be a tough sell for consultancies to push these models at the FDA when it wants to adopt more lightweight governance for products that are relatively low risk to consumers.

The FDA action plan (PDF, opens in new tab) that announced the precertification program did contain a word that jumped out at me. The FDA will precertify companies “who demonstrate a culture of quality and organizational excellence based on objective criteria”.

“Objective” might provide an angle for ISO 29119 proponents to exploit. A standard can provide an apparently objective basis for reviewing testing. If you don’t understand testing you can check for compliance with the standard. In a sense that is objective. Checkers are not bringing their own subjective opinions to the exercise. Or are they? The check is based on the assumption that the standard is relevant, and that the exercise is useful. In the absence of any evidence of efficacy, and there is no such evidence for ISO 29119, then using ISO 29119 as the benchmark is a subjective choice. It is used because it makes the job easier; it facilitates checking for compliance, it has nothing to do with good testing.

“Objective” should mean something different, and more constructive, to the FDA. They expect evidence of testing to be sufficient in quality and quantity so that third parties would have to come to the same conclusion if they review it, without interpretation by the testers. Check out Griffin Jones’ talk about evidence on YouTube.


Incidentally, the FDA’s requirements are strikingly similar to the professional standards of the Institute of Internal Auditors (IIA). In order to form an audit opinion auditors must gather sufficient information that is “factual, adequate, and convincing so that a prudent, informed person would reach the same conclusions as the auditor.” The IIA also has an interesting warning in its Global Technology Audit Guide, “Management of IT Auditing”. It warns IT auditors of the pitfalls of auditing against standards or benchmarks that might be dated or useless just because they want something to “audit against”.

So will ISO, or some large consultancies, try to influence the FDA to endorse ISO 29119 on the grounds that it would provide an objective benchmark against which to assess testing? That wouldn’t surprise me at all. What would surprise me is if the FDA bought into it. I like to think they are too smart for that. I am concerned that some day external political pressure might force adoption of ISO 29119. There was a hint of that in the fallout from the problems with the US’s Healthcare.gov website. Politicians who are keen to see action, any action, in a field they don’t understand always worry me. That’s another subject, however, and I hope it stays that way.

Quantum computing; a whole new field of bewilderment

At primary school in London I was taught about different number bases, concentrating on binary arithmetic. This was part of an attempt in the 1960s to bring mathematics up to date. The introduction of binary to the curriculum was prompted by the realisation that computers would become hugely more significant. It was fascinating to realise that there was nothing inevitable or sacrosanct about using base 10 and I enjoyed it all.

When I started working in IT I quickly picked up the technical side. I understood binary and with a bit of work, keeping a clear head, everything made sense. It was mostly based on Boolean logic. Everything was true or false, on or off, pass or fail, 1 or 0.

Now this simple state of affairs applied only to the technology. When you started to deal with humans, with organisations, with messy social reality, it was important to step back from a simple binary worldview. You can’t develop worthwhile applications, or test them, if you assume that the job simply requires stepping through a process that is akin to a computer program, with every decision being a straightforward binary, yes/no, pass/fail. I’ve talked about that at length elsewhere, e.g. here: “binary opinions – yes or no?”.

Recently I’ve been reading about quantum computing. I’d vaguely known about the subject before. I knew it was a radically different approach to computing, but I hadn’t thought through just how radical the differences are, or the implications for testing. It’s not just testing of course. When a completely new form of computing comes on the scene, one that leaves all our expectations, assumptions and beliefs in tatters, there will be huge ramifications throughout IT. Traditional computers won’t vanish; there will still be plenty of jobs working with them. People will continue to specialise in specific areas of computing. The problem for testers will be that quantum computers are going to be used for new applications, to do things that were previously impossible, and traditional testing techniques won’t necessarily be readily transferable.

What’s so different about quantum computing?

I’m not going to try and provide an introduction to quantum computers. If you think you understand them and you haven’t done some serious study of quantum physics then you’re kidding yourself. And as Richard Feynman said;

“If you think you understand quantum physics, you don’t understand quantum physics.”

If a primary school introduction to binary arithmetic allowed me to get to grips with traditional, digital computers, then the equivalent for quantum computers would be at least an undergraduate degree in physics. Here is a good overview.

I’m just going to run through three features of quantum computers that make my head spin, that have persuaded me that everything I knew about computers will be useless and I will be as well prepared as a peasant trying to get to grips with Excel after stumbling through a time portal from the Middle Ages.

First, instead of bits we have qubits. Bits can be 0 or 1. Once set to a value they don’t change until we perform some operation. The possible states of a qubit are excited or relaxed, which could be considered equivalent to 0 or 1. The fundamental difference is that a qubit can be both at the same time, or rather a mixture of the two. This property is called superposition. However, when the qubit is measured it will be frozen as one or the other.

The second weird feature of qubits is that different qubits can influence each other. In traditional computing if we operate on a bit then no other bits will change. If they do it’s because the logic of the program controls those changes. In quantum computing that doesn’t apply. Operating on one qubit can change the state of others; entangled qubits are not independent of each other, they are linked. This is called entanglement.

The third weird thing about quantum computing is that algorithms are probabilistic, not deterministic. A classical digital computer will run through an algorithm and produce the right answer, or rather it will produce a predictable answer that is consistent with the algorithm’s logic. A quantum computer won’t always give the right answer straight away. It might have to send the same input through the same process many times and the results should converge on the right answer, probably.
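
A classical simulation can at least illustrate the flavour of that last point. The sketch below is entirely illustrative, not a real quantum algorithm: it repeats a noisy computation many times and takes the most frequent result, which is roughly how the output of a probabilistic machine has to be interpreted.

```python
import random
from collections import Counter

def noisy_computation(correct_answer=42, error_rate=0.3):
    # Stand-in for one run of a probabilistic algorithm: usually right,
    # sometimes wrong, never guaranteed on any single run.
    if random.random() < error_rate:
        return random.randint(0, 100)
    return correct_answer

results = Counter(noisy_computation() for _ in range(500))
answer, count = results.most_common(1)[0]
print(f"Most frequent result: {answer} ({count} of 500 runs)")
```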

I’m telling you this not because I think it will give you any sort of useful understanding of quantum computing. A spot of humility is in order. I pass on this information in the hope you realise you have as little idea as I do.

In summary, quantum computing is moving way beyond dear old familiar Boolean logic. Boolean algebra is still valid, but it is only relevant to one part of a wider reality and quantum computers will allow us to explore beyond the Boolean boundaries.

I have done my share of white box testing, delving into the guts of applications to understand what is going on. I know I will never be in a position to do that with quantum computers, even assuming that they are in widespread use before I retire. I doubt if there are many, if any testers, who will be better placed than I am.

What will quantum computers be used for?

If white box testing poses a tough challenge, what about black box testing? As I said earlier, quantum computers will be used for problems that traditional computers can’t help with. One application will be the transformation of cryptography. Quantum computing will probably render existing cryptographic techniques obsolete. Other likely uses will be new ways to search and interrogate massive databases, and financial modelling that requires fiendishly intricate calculations on huge volumes of data. These are the applications that caught my eye, because of my background in financial services. A big problem for testers will be, how do you evaluate the output if the computer is doing something that can’t be replicated by other means? This isn’t a new problem, but quantum computers will massively increase the difficulty.
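
One partial answer is that for some of these problems checking a result is far cheaper than producing it. Factoring, the problem underpinning much current cryptography, is the classic example: however the factors were found, and even if we have no way of reproducing the search, a tester can verify a claimed factorisation trivially. A minimal sketch:

```python
def verify_factorisation(n, factors):
    # We may be unable to reproduce the computation that found the factors,
    # but multiplying them back together is a cheap, reliable check.
    product = 1
    for factor in factors:
        if factor <= 1:
            return False
        product *= factor
    return product == n

print(verify_factorisation(15, [3, 5]))  # True
print(verify_factorisation(15, [2, 7]))  # False
```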

Blind computing guiding the blind?

I wrote about the problems I’d encountered working with big data a few years ago. Our approach could be summed up simply as;

  • build a deep understanding of the domain and the data, and the essential relationships or rules,
  • get the business or domain experts heavily involved and pick their brains,
  • ensure you influence the design so that it is testable, ideally so that the solution enables testing of separate segments which can be isolated so you can bring your acquired knowledge and understanding to bear.

The first two points will always be relevant regardless of the technology. The third one has been picked up by quantum computing scientists who have developed a technique called blind quantum computing.

The blind technique involves running a separate, simpler(!), quantum computer called a verifier alongside the server running the test application. The verifier feeds the server small, discrete calculations to perform. It does this in such a way that only the verifier, and the testers, can know what the answer should be, while the server knows only what operations it should perform. This is the simplest explanation of blind quantum computing I’ve been able to find.
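
As a classical analogy only (this is not how blind quantum computing works at the physical level), the verifier pattern resembles seeding an untrusted worker with small calculations whose answers the verifier already knows.

```python
import random

def untrusted_server(task):
    # Stand-in for the powerful machine we cannot inspect directly.
    a, b = task
    return a * b

def verifier(server, checks=20):
    # Feed the server small calculations with known answers, mixed in with
    # the real workload, and see whether it gets them right.
    for _ in range(checks):
        a, b = random.randint(1, 99), random.randint(1, 99)
        if server((a, b)) != a * b:
            return False
    return True

print("Server passed spot checks:", verifier(untrusted_server))
```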

An obvious observation is that this is hardly black box testing as we have known it. Who programs and controls the verifier? It is a quantum computer, and I doubt if ISTQB accreditation will ever get you very far here.

Fields of bewilderment

I don’t have any answers here, and I strongly suspect easy answers will never be available to testers who work with quantum computers. That, perhaps, is the point to take away from my article. Writing this has felt less like trying to spread some understanding of an immensely difficult subject, and more like an account of a bewildered ramble through my freshly expanded fields of ignorance.

I can’t state with confidence that software testers will ever move into quantum computing. Perhaps the serious testing will be done by quantum computing scientists. That raises its own dangers. Will they have the right instincts, an appropriate sceptical and questioning approach? There is already talk of quantum computers permitting the development of perfect software. Apparently they will be able to work their way through every conceivable permutation of data to establish that the application was built to its specification. Of course there is a huge, familiar old question being begged there; how does anyone know the spec was right?
Opening of the poem 'Youth and Hope' from which the quote 'perplexity is the beginning of knowledge' comes

I worry that good testers may be frozen out of working with quantum computers, and that the skills and questions they do have, and that will remain relevant, will be lost. Whatever happens I am confident that testers imbued with the principles of Context-Driven Testing will be better prepared than those who know only what they learned to pass ISTQB exams, or who are accustomed to following prescriptive standards. Whatever the future holds for us it will be interesting! After all, perplexity is the beginning of knowledge.

Presentations

This page will carry slide decks for presentations I’ve given at conferences – as I get round to adding them. All the presentations are in PDF format and open in new tabs.

Farewell to “pass or fail”?

I gave this presentation, “Farewell to ‘pass or fail’” at Expo:QA 2014 in Madrid in May 2014. It shows how auditing and software testing have faced similar challenges and how each profession can learn from the other.

Standards – promoting quality or restricting competition?

This presentation is the one that attracted so much attention and controversy at CAST 2014 in New York. It is a critique of software testing standards from the perspective of economics.

ISO – the dog that hasn’t barked

On August 12th I gave a talk at CAST 2014, the conference of the Association for Software Testing (AST) in New York, “Standards – promoting quality or restricting competition?” It was mainly about the new ISO 29119 software testing standard, though I also wove in arguments about ISTQB certification.


I was staggered at the response. Iain McCowatt asked what we should do in response. Karen Johnson proposed a petition, which subsequently became two. Iain set up a petition through the International Society for Software Testing (ISST), directly targeted at ISO.

Karen’s petition is a more general manifesto for all professional testers to sign if they agree with its stance on certification and standards.

I strongly commend both the petition and the manifesto to all testers.

Eric Proegler also set up an AST special interest group to monitor and review the issue.

This action was not confined to the conference. In the last three weeks there has been a blizzard of activity and discussion on social media and throughout the testing community. Many people have blogged and spoken out. I gave a brief interview to Infoworld.com, and wrote an article for the uTest blog.

Huib Schoots has drawn together many of the relevant articles on his blog. It’s a valuable resource.

However, my own blog has been embarrassingly silent. I’ve not had time, till now, to write anything here. I’ve seen a big rise in traffic as people have hunted down articles I have previously written. August was the busiest month since I started the blog in 2010.

Significant and sustained opposition

Finally I have had a chance to set some thoughts down. I never expected my talk to get such a huge reaction. The response has not been because of what I said. It has happened because it came at the right time. I caught the mood. Many influential people, whom I respect, realised that it was time to speak out, and to do it in unison.

There has been significant and sustained opposition to ISO 29119 as it has developed over the years. However, it has been piecemeal. Individuals have argued passionately, authoritatively and persuasively against the standard, and they have been ignored.

The most interesting response since my talk, however, has been from the dog that didn’t bark, ISO. Neither ISO nor the 29119 working group has come out with a defence of the standard or the way that it was developed.

They have been accused of failing to provide a credible justification for the standard, of ignoring and excluding opponents, and of engaging in rent-seeking by promoting a factional interest as a generic standard.

And their response? Silence. In three weeks we have heard nothing. In their silence ISO have effectively conceded all the points that the Stop 29119 campaigners have been arguing.

There have been some sporadic and entirely unconvincing attempted defences of the standard, mixed up with the odd spot of risibly offensive ad hominem attacks. Collectively the weakness of these defences exposes ISO 29119 for the sham that it is.

Defence #1 – “It’s a standard”

This is possibly the weakest and most wrong-headed argument. ISO are trading on their brand, their image as a disinterested promoter of quality. It is disappointing how many people accept anything that ISO does at face value.

It is important that promoters of any standard justify it, demonstrate that it is relevant to those who will have to use it, and show that it enjoys a consensus amongst that community. In practice, ISO can spin the argument around.

Once ISO anoints a document with the magic word “standard” too many people suspend their own judgement. They look for opponents to justify their opposition, with reference to the detail they object to in the standard.

In the absence of such detailed arguments against the standard’s content, people are happy to say “it’s a standard, therefore it is a good thing”. That argument might impress those who know nothing about testing or ISO 29119. It lacks any credibility amongst those who know what they are talking about.

Defence #2 – “It’s better than nothing”

Whoops. Sorry. I have to backtrack rapidly. This argument is even worse than the last one. Something that is lousy is emphatically not better than nothing. Just doing something because it’s better than doing nothing? Oh dear. It’s like being lost in a strange city, without a map, and putting a blindfold over your eyes. Well, why not? You’re lost. You don’t know what you’re doing. Blindfolding yourself has to be better than nothing. Right? Oh dear.

This is the politicians’ fallacy, as explained in the classic comedy “Yes Minister”. Thanks to Scott Nickell for reminding me about it (see comments below).


If an organisation’s testing is so disorganised and immature that ISO 29119 might look like the answer then it is ripe ground for a far better approach. Incompetence is not a justification for wheeling in ISO 29119. It is a justification for doing a better job. Next argument please.

Defence #3 – “ISO 29119 doesn’t have to be damaging. It’s possible to work around it.”

The arguments in favour of ISO 29119 aren’t really getting a whole lot more convincing. Sure, you might be able to mitigate some of the worst effects by tailoring your compliance, or ignoring parts that seem particularly unhelpful. However, that is putting effort into ensuring you aren’t any worse off than you would be if you did nothing.

Also, if standards and rules are so detailed and impractical that they cannot be followed in full then it exposes people to arbitrary enforcement. Once users start tailoring the standard they will leave themselves open to the accusation that they did not comply in full. If things go well there will be no thanks for tailored compliance.

If there are problems, and there are always problems, then any post mortem will reveal that the standard wasn’t followed in full. The implication will be that the problems were caused by deviation, not that the deviation was the only reason anything worthwhile was achieved. Testers and developers will rightly fear that retribution will be arbitrary and quite unrelated to the level of care and professionalism they brought to their work.

Further, ISO 29119 is being marketed with an appeal to fear, as a way of protecting individuals’ backsides if things go wrong. If managers buy into the standard on that basis, are they really likely to take a chance with tailoring the standard?

ISO 29119 is also being pushed to buyers who don’t understand what good practice in testing means. Are they not likely to play safe by insisting in contracts and internal standards that ISO 29119 is complied with?

No, the possibility that we can “work around it” is not a credible argument in favour of ISO 29119.

Defence #4 – “The dissenters are just self-interested”

The standard is apparently a threat to our “craft” and to our businesses. Well, there’s more to it than that, but even if that were the only objection it would still be an entirely valid one. We have argued strenuously that the standard is anti-competitive. A riposte that we fear it will hit us financially is merely to concede the argument in a graceless way.

Anyway, if I were chasing the money I could have made a lot more, an awful lot more, by taking lucrative test management contracts to do poor quality work managing the testing process on dysfunctional projects.

I can do great documentation. I am literate, organised and have the knack of getting tuned into corporate politics. My ability to churn out high quality, professional looking documents that would get the money in from clients was a great asset to my last employer.

Testing? Sorry, I forgot about that. I thought we were talking about documentation and money.

Defence #5 – “They’re whingers who wouldn’t get involved.”

This point ignores the argument that the ISO process for developing standards is carefully designed to exclude those who disagree. Again, the “whingers” line spins the argument around. It is not our responsibility to justify our failure to get involved with rent seekers and try to persuade their committees to desist.

I have seen the ISO 29119 working group’s schedule for meetings worldwide over the last few years. In the unlikely event that I would have been allowed to join, my expenses would have wiped out a huge slice of my income. It would certainly have cost me far more than we’ve spent on family holidays in that period. And for what? I’d have sacrificed all that time and money in order to be ignored. Those gallant souls who have tried to fight from the inside have been ground down and spat out by the system. That’s how the system works, how it is intended to work.

No, I didn’t get involved. I had better things to do with my time. In any case, it seems that my campaigning from the outside has not been ignored!

So what now?

We keep going. All this activity has just been the start. ISO is not going to withdraw the standard. However, the blogs, articles and petitions lay down a marker. They show that the standard was not developed according to any plausible definition of consensus, and that it lacks credibility.

The opposition will strengthen the resolve of testers who wish to stop their organisations buying into the standard. It will provide ammunition to those who want to dissuade lawyers and purchasing managers from requiring compliance with ISO 29119.

This one is going to run. We are not going away, and if ISO continue to ignore us then they will simply make themselves look foolish. Do they care? Or are they confident they have enough power and can just sail on regardless?