“Privileged accesses” – an insight into incompetence at Fujitsu and the Post Office

Recently I have been thinking and writing about corporate governance failings at the Post Office during the two decades of the Post Office Horizon scandal. Having worked in software development, testing and IT audit I have experience that is relevant to several aspects of the scandal. I have a further slice of experience I have not yet commented on publicly. That is largely because I should not talk about experiences with clients when I worked for IBM. However, I have decided to break that rule, and I feel justified for two reasons. Firstly, I think it offers a useful insight into failings at the Post Office and Fujitsu. Secondly, my clients all set, and met, a far higher standard than we have seen in the long-running Horizon scandal. Nothing I write will embarrass them or IBM, quite the opposite.

I keep going back to the management letter [PDF, opens in new tab] issued by Ernst & Young (E&Y), the Post Office’s external auditors, following the 2011 audit. The letter was commented on in the Horizon Issues court case, Bates v Post Office Ltd (No 6: Horizon Issues) [PDF, opens in new tab].

To normal people this 43 page letter is incomprehensible and boring. It lists a series of major and minor problems with Fujitsu’s management of the IT service it provided to the Post Office. Only people who have worked in this field will feel comfortable interpreting the letter and its significance.

The letter draws attention to problems that E&Y came across in the course of their audit. As the introduction says:

“Our review of the company’s systems of internal control is carried out to help us express an opinion on the accounts of the company as a whole. This work is not primarily directed towards the discovery of weaknesses, the detection of fraud or other irregularities (other than those which would influence us in forming that opinion) and should not, therefore, be relied upon to show that no other weaknesses exist or areas require attention. Accordingly, the comments in this letter refer only to those matters that have come to our attention during the course of our normal audit work and do not attempt to indicate all possible improvements that a special review might develop.”

E&Y did not conduct a full technical audit. They were concerned with assessing whether the financial accounts offered a true and fair view of the financial position of the company. Their assessment of internal control was only sufficiently detailed to allow them to form an opinion on the company accounts.

It is, or it should be, monumentally embarrassing for the internal auditors if the external auditors find long-standing control problems. The internal auditors should have the staff, expertise and time to detect these problems and ensure that they are resolved long before external auditors spot them. The external auditors are around for only a few weeks or months, and it is not their primary responsibility to find problems like this. I wrote about this from the perspective of an IT auditor last year (see section “Superusers going ‘off piste’”).

The specific issue in the management letter that rightly attracted most attention in the Horizon Issues case was the poor control over user IDs with high privilege levels. Not only did this highlight the need to improve Fujitsu’s management of the IT service and the oversight provided by the Post Office, it also pointed to an ineffective internal audit function at the Post Office, and previously at the Royal Mail before the Post Office was hived off.

When I was reading through the E&Y management letter I was struck by how familiar the problems were. When I worked for IBM I spent three years as an information security manager. My background had been in software development, testing and IT audit. The contract on which I was working was winding down and one day my phone rang and I was made an interesting offer. Service Delivery Security wanted another information security manager to work with new outsourced accounts. My background showed I had a grasp of security issues, the ability to run projects, and a track record of working with clients without triggering unseemly brawls or litigation. So I was a plausible candidate. I would rely on the deeply technical experts and make sure that IBM and the client got what they wanted.

The job entailed working with the client right at the start of the outsourcing deal, for a few months either side of the cutover. An important responsibility was reaching agreement with the client about the detail of what IBM would provide.

All the issues relating to privileged access raised by E&Y in their management letter were within my remit. The others, mainly change management, were dealt with by the relevant experts. Each outsourcing contract required us to reach agreement on the full detail of the service by a set date, typically within a few months of the service cutover. In one case we had to reach agreement before service even started. On the service cutover date all staff transferring to IBM were required to continue working to exactly the same processes and standards until they were told to do something new.

I had to set up a series of meetings and workshops with the client and work through the detail of the security service. We would agree all the tedious but vital details; password lengths and formats, the processes required for authorising and reviewing new accounts and access privileges, logging and review of accesses, security incident response actions. It went on and on.

For each item we would document the IBM recommended action or setting. Alongside that we had to record what the client was currently doing. Finally we would agree the client’s requirement for the future service. If the future requirement entailed work by IBM to improve on what the client was currently doing that would entail a charge. If the client wanted something lower than the IBM recommendation then it was important that we had evidence that IBM was required to do something we would usually regard as unsatisfactory. This happened only rarely, and with good reason. The typical reason was that the client’s business meant the risk did not justify the tighter, and more expensive, control.

We also had to ensure that all the mainframe systems and servers were inventoried, and the settings documented. That was a huge job, but I farmed that out to the unenthusiastic platform experts. For all these platforms and settings we also had to agree how they should be configured in future.

The next step, and my final involvement, would be to set up a project plan to make all the changes required to bring the service up to the standard that the client needed. A new project manager would come in to run that boring project.

After three clients I felt I had learned a lot but staying in the job was going to mean endless repetition of very similar assignments. I also had some disagreements about IBM’s approach to outsourcing security services that meant I was unlikely to get promoted. I was doing a very good job at my current level and it was clearly recognised that I would only cause trouble if I were given more power! It’s true. I would have done. So I secured a move back to test management.

I enjoyed those three years because it gave me the chance to work with some very interesting clients. These were big, blue chip names; AstraZeneca, Boots (the UK retailer), and Nokia (when they were utterly dominant in the mobile phone market). I don’t have any qualms about naming these clients because they were all very thorough, professional and responsible.

The contrast with the Post Office and Fujitsu is striking. Fujitsu won the Post Office outsourcing contract [PDF, opens in new tab] in 1996 for an initial eight years. Yet, 15 years later, by which time the contract had been extended twice, E&Y reported that Fujitsu had not set up the control regime IBM demanded we create, with client agreement, at the very start of an outsourcing contract. The problems had still not been fully resolved by 2015.

Getting these basics correct is vital if corporations want to show that they are in control of their systems. If users have high privilege levels without effective authorisation, logging and monitoring then the corporation cannot have confidence in its data, which can be changed without permission and without a record of who took what action. Nobody can have confidence in the integrity of the systems. That has clear implications for the Horizon scandal. The Post Office insisted that Horizon was reliable when the reality was that Fujitsu did not apply the controls to justify that confidence.
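To make the point concrete, here is a minimal Python sketch of the kind of reconciliation a reviewer might run over a log of privileged actions. It is an illustration only. The record layout, the user IDs and the change references (CHG-1001 and so on) are hypothetical, and nothing here is taken from the E&Y letter or from any Fujitsu or IBM system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class PrivilegedAction:
    """One logged action by a high-privilege user ID (hypothetical layout)."""
    user_id: str
    action: str
    timestamp: datetime
    approval_ref: Optional[str]  # change ticket authorising the action, if any


def unauthorised_actions(log, approved_refs):
    """Return logged privileged actions that have no matching approval."""
    return [
        entry for entry in log
        if entry.approval_ref is None or entry.approval_ref not in approved_refs
    ]


# Hypothetical log entries and approvals, for illustration only.
log = [
    PrivilegedAction("dba01", "UPDATE branch_accounts", datetime(2011, 3, 1, 9, 30), "CHG-1001"),
    PrivilegedAction("dba02", "DELETE transaction", datetime(2011, 3, 1, 22, 15), None),
    PrivilegedAction("dba01", "UPDATE reference_data", datetime(2011, 3, 2, 11, 5), "CHG-9999"),
]
approved_refs = {"CHG-1001"}

exceptions = unauthorised_actions(log, approved_refs)
for entry in exceptions:
    print(entry.timestamp.isoformat(), entry.user_id, entry.action, entry.approval_ref)
```

The check is only as good as the log. If privileged users can act without being logged at all there is nothing to reconcile, which is why logging and monitoring matter as much as authorisation.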

Fujitsu may have failed to manage the service properly, but the Post Office is equally culpable. Outsourcing an IT service is not a matter of handing over responsibility then forgetting about it. The service has to be specified precisely then monitored carefully and constantly.

Why were the two corporations so incompetent and so negligent for so long? Why were the Post Office and Fujitsu so much less responsible and careful than IBM, AstraZeneca, Boots and Nokia?

Why did the Royal Mail’s and subsequently the Post Office’s internal auditors not detect problems with the outsourced service and force through an effective response?

When I became an information security manager I was told a major reason we had to tie the service down tightly was in case we ended up in court. We had to be able to demonstrate that we were in control of the systems, that we could prove the integrity of the data and the processing. So why did Fujitsu and the Post Office choose not to act as responsibly?

I was working in a well-trodden field. None of the issues we were dealing with were remotely new. The appropriate responses were very familiar. They were the mundane basics that every company using IT has to get right. Lay observers might think that the outsourcing arrangement was responsible for the failure of management control by distancing user management from the service providers. That would be wrong. The slackness seen at Fujitsu is more likely to occur in an in-house operation that has grown and evolved gradually. An outsourcing agreement should mean that everything is tied down precisely, and that was my experience.

I have worked as an IT auditor, and I have been an information security manager on big outsourcing contracts. I know how these jobs should be done and it amazes me to see that one of our major rivals was able to get away with such shoddy practices at the very time I was in the outsourcing game. Fujitsu still has the Post Office contract. That is astonishing.

Bugs are about more than code

Introduction

Recently I have had to think carefully about the nature of software systems, especially complex ones, and the bugs they contain. In doing so my thinking has been guided by certain beliefs I hold about complex software systems. These beliefs, or principles, are based on my practical experience but also on my studies, which, as well as teaching me much that I didn’t know, have helped me to make sense of what I have done and seen at work. Here are three vital principles I hold to be true.

Principle 1

Complex systems are not like calculators, which are predictable, deterministic instruments, i.e. they will always give the same answer from the same inputs. Complex systems are not predictable. We can only predict what they will probably do, but we cannot be certain. It is particularly important to remember this when working with complex socio-technical systems, i.e. complex systems, in the wider sense, that include humans, which are operated by people or require people to make them work. That covers most, or all, complex software systems.

Principle 2

Complex systems are more than the sum of their parts, or at least they are different. A system can be faulty even if all the individual programs, or components, are working correctly. The individual elements can combine with each other, and with the users, in unexpected and possibly damaging ways that could not have been predicted from inspecting the components separately.

Conversely, a system can be working satisfactorily even if some of the components are flawed. This inevitably means that the software code itself, however important it is, cannot be the only factor that determines the quality of the system.
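A small, entirely hypothetical Python sketch may help to illustrate Principle 2. Each function below does exactly what its own specification says, yet the combination produces a wrong balance because one works in pence and the other in pounds. The function names and figures are invented for the illustration.

```python
def till_total_pence(items_pence):
    """Component A: sums a basket. Specification: result is in pence."""
    return sum(items_pence)


def post_to_ledger(balance_pounds, amount_pounds):
    """Component B: adds an amount to a balance. Specification: both in pounds."""
    return balance_pounds + amount_pounds


basket = [250, 175]  # two items costing 2.50 and 1.75
total = till_total_pence(basket)          # correct according to A's specification
new_balance = post_to_ledger(100, total)  # integration fault: pence passed as pounds

print(total, new_balance)
```

Testing either function in isolation would find nothing wrong. The fault exists only in the way the two are joined, so inspecting the components separately would never reveal it.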

Principle 3

Individual programs in a system can produce harmful outcomes even if their code was written perfectly. The outcome depends on how the different components, factors and people work together over the life of the system. Perfectly written code can cause a failure long after it has been released when there are changes to the technical, legal, or commercial environment in which the system runs.
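Principle 3 can be sketched in the same spirit. In this hypothetical Python example the first function was correct on the day it was released, because the levy rate really was 10% then. The rate, the dates and the function names are all assumptions made up for the illustration.

```python
from datetime import date
from decimal import Decimal

PENNY = Decimal("0.01")
LEVY_RATE = Decimal("0.10")  # hypothetical rate, correct on the release date


def levy_due(net_amount):
    """Meets its original specification exactly: levy at the fixed rate."""
    return (net_amount * LEVY_RATE).quantize(PENNY)


def levy_due_dated(net_amount, on_date, rate_history):
    """Variant that looks up the rate in force on the transaction date."""
    in_force = [rate for start, rate in sorted(rate_history) if start <= on_date]
    if not in_force:
        raise ValueError("no rate in force on that date")
    return (net_amount * in_force[-1]).quantize(PENNY)


# Hypothetical history: the rate rises two years after the code was released.
rate_history = [
    (date(2010, 1, 1), Decimal("0.10")),
    (date(2012, 1, 1), Decimal("0.12")),
]

amount = Decimal("200.00")
unchanged_code = levy_due(amount)
dated_lookup = levy_due_dated(amount, date(2012, 6, 1), rate_history)
print(unchanged_code, dated_lookup)
```

Nobody touched `levy_due`, and it still does what it was written to do. It is now wrong for every transaction after the rate change, and the answers it gives remain perfectly plausible.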

The consequences

Bugs in complex systems are therefore inevitable. The absence of bugs in the past does not mean they are absent now, and certainly not that the system will be bug free in the future. The challenge is partly to find bugs, learn from them, and help users to learn how they can use the system safely. But testers should also try to anticipate future bugs, how they might arise, where the system is vulnerable, and learn how and whether users and operators will be able to detect problems and respond. They must then have the communication skills to explain what they have found to the people who need to know.

What we must not do is start off from the assumption that particular elements of the system are reliable and that any problems must have their roots elsewhere in the system. That mindset tends to push blame towards the unfortunate people who operate a flawed system.

Bugs and the Post Office Horizon scandal

Over the last few months I have spent a lot of time on issues raised by the Post Office Horizon scandal. For a detailed account of the scandal I strongly recommend the supplement that Private Eye has produced, written by Richard Brooks and Nick Wallis, “Justice lost in the post”.

While researching this affair I have realised, time and again, how the Post Office and Fujitsu, the outsourced IT services supplier, ignored the three principles I outlined. While trawling through the judgment of Mr Justice Fraser in Bates v Post Office Ltd (No 6: Horizon Issues), the second of the two court cases brought by the Justice For Subpostmasters Alliance, I was struck by the judge’s discussion of the nature of bugs in computer systems. The judgment should be compulsory reading for Computer Science students. You can find the full 313 page judgment here [PDF, opens in new tab].

The definition of a bug was at the heart of the second court case. The Post Office and Fujitsu argued that a bug is a coding error, and the word should not apply to other problems. The counsel for the claimants, i.e. the subpostmasters and subpostmistresses who had been victims of the flawed system, took a broader view; a bug is anything that means the software does not operate as users, or the corporation, expect.

After listening to both sides Fraser came down emphatically on the claimants’ side.

“26 The phrase ‘bugs, errors or defects’ is sufficiently wide to capture the many different faults or characteristics by which a computer system might not work correctly… Computer professionals will often refer simply to ‘code’, and a software bug can refer to errors within a system’s source code, but ‘software bugs’ has become more of a general term and is not restricted, in my judgment, to meaning an error or defect specifically within source code, or even code in an operating system.

Source code is not the only type of software used in a system, particularly in a complex system such as Horizon which uses numerous applications or programmes too. Reference data is part of the software of most modern systems, and this can be changed without the underlying code necessarily being changed. Indeed, that is one of the attractions of reference data. Software bug means something within a system that causes it to cause an incorrect or unexpected result. During Mr de Garr Robinson’s cross-examination of Mr Roll, he concentrated on ‘code’ very specifically and carefully [de Garr Robinson was the lawyer representing the Post Office and Roll was a witness for the claimants who gave evidence about problems with Horizon that he had seen when he worked for Fujitsu]. There is more to the criticisms levelled at Horizon by the claimants than complaints merely about bugs within the Horizon source code.

27 Bugs, errors or defects is not a phrase restricted solely to something contained in the source code, or any code. It includes, for example, data errors, data packet errors, data corruption, duplication of entries, errors in reference data and/or the operation of the system, as well as a very wide type of different problems or defects within the system. ‘Bugs, errors or defects’ is wide enough wording to include a wide range of alleged problems with the system.”

The determination of the Post Office and Fujitsu to limit the definition of bugs to source code was part of a policy of blaming users for all errors that were not obviously caused by the source code. This is clear from repeated comments from witnesses and from Mr Justice Fraser in the judgment. “User error” was the default explanation for all problems.

Phantom transactions or bugs?

This stance of blaming the users if they were confused by Horizon’s design was taken to an extreme with “phantom transactions”. These were transactions generated by the system but which were recorded as if they had been made by a user (see in particular paragraphs 209 to 214 of Fraser’s judgment).

In paragraph 212 Fraser refers to a Fujitsu problem report.

“However, the conclusion reached by Fujitsu and recorded in the PEAK was as follows:

‘Phantom transactions have not been proven in circumstances which preclude user error. In all cases where these have occurred a user error related cause can be attributed to the phenomenon.’”

This is striking. These phantom transactions had been observed by Royal Mail engineers. They were known to exist. But they were dismissed as a cause of problems unless it could be proven that user error was not responsible. If Fujitsu could imagine a scenario where user error might have been responsible for a problem they would rule out the possibility that a phantom transaction could have been the cause, even if the phantom had occurred. The PEAK (error report) would simply be closed off, whether or not the subpostmaster agreed.

This culture of blaming users rather than software was illustrated by a case of the system “working as designed” when its behaviour clearly confused and misled users. In fact the system was acting contrary to user commands. In certain circumstances if a user entered the details for a transaction, but did not commit it, the system would automatically complete the transaction with no further user intervention, which might result in a loss to the subpostmaster.

The Post Office, in a witness statement, described this as a “design quirk”. However, the Post Office’s barrister, Mr de Garr Robinson, in his cross-examination of Jason Coyne, an IT consultant hired by the subpostmasters, was able to convince Nick Wallis (one of the authors of “Justice lost in the post”) that there wasn’t a bug.

“Mr de Garr Robinson directs Mr Coyne to Angela van den Bogerd’s witness statement which notes this is a design quirk of Horizon. If a bunch of products sit in a basket for long enough on the screen Horizon will turn them into a sale automatically.

‘So this isn’t evidence of Horizon going wrong, is it?’ asks Mr de Garr Robinson. ‘It is an example of Horizon doing what it was supposed to do.’

‘It is evidence of the system doing something without the user choosing to do it.’ retorts Mr Coyne.

But that is not the point. It is not a bug in the system.”

Not a bug? I would contest that very strongly. If I were auditing a system with this “quirk” I would want to establish the reasons for the system’s behaviour. Was this feature deliberately designed into the system? Or was it an accidental by-product of the system design? Whatever the answer, it would be simply the start of a more detailed scrutiny of technical explanations, understanding of the nature of bugs, the reasons for a two-stage committal of data, and the reasons why those two stages were not always applied. I would not consider “working as designed” to be an acceptable answer.
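The behaviour at issue, a basket left on screen that turns itself into a sale, can be sketched in a few lines of Python. This is a purely hypothetical model written to show the questions an auditor would ask. It is not a reconstruction of Horizon’s code, and the class, the timeout and the `committed_by` field are my own assumptions.

```python
class Basket:
    """Hypothetical till basket with an idle timeout (not Horizon's code)."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.items = []
        self.last_activity = None
        self.committed_by = None  # "user", "system", or None while still open

    def add_item(self, name, now):
        """The user keys in an item; nothing is committed yet."""
        self.items.append(name)
        self.last_activity = now

    def commit(self):
        """Second stage: the user explicitly completes the sale."""
        self.committed_by = "user"

    def tick(self, now):
        """Called periodically by the system, with no user involvement."""
        if self.committed_by is not None or not self.items:
            return
        if now - self.last_activity >= self.timeout_seconds:
            # "Working as designed": the sale completes with no user command.
            self.committed_by = "system"


basket = Basket(timeout_seconds=300)
basket.add_item("stamps", now=0)
basket.tick(now=100)
state_before_timeout = basket.committed_by
basket.tick(now=300)
state_after_timeout = basket.committed_by
print(state_before_timeout, state_after_timeout)
```

Even in this toy version the audit questions are obvious. Why have a two-stage commit if the system can skip the second stage? Is the user warned? Does the stored record show that the system, not the user, completed the transaction? The sketch records the originator. A system that recorded every such sale as a user action would leave no way to tell the two apart afterwards.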

The Post Office’s failure to grasp the nature of complex systems

A further revealing illustration of the Post Office’s attitude towards user error came in a witness statement provided for the Common Issues trial, the first of the two court cases brought by the Justice For Subpostmasters Alliance. This first trial was about the contractual relationship between the Post Office and subpostmasters. The statement came from Angela van den Bogerd. At the time she was People Services Director for the Post Office, but over the previous couple of decades she had been in senior technical positions, responsible for Horizon and its deployment. She described herself in court as “not an IT expert”. That is an interesting statement to consider alongside some of the comments in her witness statement.

“[78]… the Subpostmaster has complete control over the branch accounts and transactions only enter the branch accounts with the Subpostmaster’s (or his assistant’s) knowledge.

[92] I describe Horizon to new users as a big calculator. I think this captures the essence of the system in that it records the transactions inputted into it, and then adds or subtracts from the branch cash or stock holdings depending on whether it was a credit or debit transaction.”

“Complete control”? That confirms her admission that she is not an IT expert. I would never have been bold, or reckless, enough to claim that I was in complete control of any complex IT system for which I was responsible. The better I understood the system the less inclined I would be to make such a claim. Likening Horizon to a calculator is particularly revealing. See Principle 1 above. When I have tried to explain the nature of complex systems I have also used the calculator analogy, but as an illustration of what a complex system is not.

If a senior manager responsible for Horizon could use such a fundamentally mistaken analogy, and be prepared to insert it in a witness statement for a court case, it reveals how poorly equipped the Post Office management was to deal with the issues raised by Horizon. When we are confronted by complexity it is a natural reaction to try and construct a mental model that simplifies the problems and makes them understandable. This can be helpful. Indeed it is often essential if we are to make any sense of complexity. I have written about this in my blog series “Dragons of the unknown”.

However, even the best models become dangerous if we lose sight of their limitations and start to think that they are exact representations of reality. They are no longer fallible aids to understanding, but become deeply deceptive.

If you think a computer system is like a calculator then you will approach problems with the wrong attitude. Calculators are completely reliable. Errors are invariably the result of users’ mistakes, “finger trouble”. That is exactly how senior Post Office managers, like Angela van den Bogerd, regarded the Horizon problems.

BugsZero

The Horizon scandal has implications for the argument that software developers can produce systems that have no bugs, that zero bugs is an attainable target. Arlo Belshee is a prominent exponent of this idea, of BugsZero as it is called. Here is a short introduction.

Before discussing anyone’s approach to bugs it is essential that we are clear what they mean by a bug. Belshee has a useful definition, which he provided in this talk in Singapore in 2016. (The conference website has a useful introduction to the talk.)

3:50 “The definition (of a bug) I use is anything that would frustrate, confuse or annoy a human and is potentially visible to a human other than the person who is currently actively writing (code).”

This definition is close to the claimants’ definition that Justice Fraser endorsed (see above); “a bug is anything that means the software does not operate as users, or the corporation, expect”. However, I think that both definitions are limited.

BugsZero is a big topic, and I don’t have the time or expertise to do it justice, but for the purposes of this blog I’m happy to concede that it is possible for good coders to deliver exactly what they intend to, so that the code itself, within a particular program, will not act in ways that will “frustrate, confuse or annoy a human”, or at least a human who can detect the problem. That is the limitation of the definition. Not all faults with complex software will be detected. Some are not even detectable. Our inability to see them does not mean they are absent. Bugs can produce incorrect but plausible answers to calculations, or they can corrupt data, without users being able to see that a problem exists.

I speak from experience here. It might even be impossible for technical system experts to identify errors with confidence. It is not always possible to know whether a complex system is accurate. The insurance finance systems I used to work on were notoriously difficult to understand and manipulate. 100% accuracy was never a serious, practicable goal. As I wrote in “Fix on failure – a failure to understand failure”;

“With complex financial applications an honest and constructive answer to the question ‘is the application correct?’ would be some variant on ‘what do you mean by correct?’, or ‘I don’t know. It depends’. It might be possible to say the application is definitely not correct if it is producing obvious garbage. But the real difficulty is distinguishing between the seriously inaccurate, but plausible, and the acceptably inaccurate that is good enough to be useful. Discussion of accuracy requires understanding of critical assumptions, acceptable margins of error, confidence levels, the nature and availability of oracles, and the business context of the application.”

It is therefore misleading to define bugs as being potentially visible to users. Nevertheless, Belshee’s definition is useful provided that that qualification is accepted. However, in the same talk, Belshee goes on to make further statements I do not accept.

19:55 “A bug is an encoded developer mistake.”

28:50 “A bug is a mistake by a developer.”

This is a developer-centric view of systems. It is understandable if developers focus on the bugs for which they are responsible. However, if you look at the systems, and bugs, from the other end, from the perspective of users when a bug has resulted in frustration, confusion or annoyance, the responsibility for the problem is irrelevant. The disappointed human is uninterested in whether the problem is with the coding, the design, the interaction of programs or components, or whatever. All that matters is that the system is buggy.

There is a further complication. The coder may well have delivered code that was perfect when it was written and released. But perfect code can create later problems if the world in which it operates changes. See Principle 3 above. This aspect of software is not sufficiently appreciated; it has caused me a great deal of trouble in my career (see the section “Across time, not just at a point in time” in this blog, about working with Big Data).

Belshee does say that developers should take responsibility for bugs that manifest themselves elsewhere, even if their code was written correctly. He also makes it clear, when talking about fault tolerant systems (17:22 in the talk above), that faults can arise “when the behaviour of the world is not as we expect”.

However he also says that the system “needs to work exactly as the developer thought if it’s going to recover”. That’s too high a bar for complex socio-technical systems. The most anyone can say, and it’s an ambitious target, is that the programs have been developed exactly as the developers intended. Belshee is correct at the program level; if the programs were not built as the developers intended then recovery will be very much harder. But at the system level we need to be clear and outspoken about the limits of what we can know, and about the inevitability that bugs are present.

If we start to raise hopes that systems might be perfect and bug-free because we believe that we can deliver perfectly written code then we are setting ourselves up for unpleasant recriminations when our users suffer from bugs. It is certainly laudable to eradicate sloppy and cavalier coding and it might be noble for developers to be willing to assume responsibility for all bugs. But it could leave them exposed to legal recriminations if the wider world believes that software developers can and should ensure systems are free of bugs. This is where lawyers might become involved and that is why I’m unhappy about the name BugsZero, and the undeliverable promise that it implies.

Unnoticed and undetectable bugs in the Horizon case

The reality that a bug might be real and damaging but not detected by users, or even detectable by them, was discussed in the Horizon case.

“[972] Did the Horizon IT system itself alert Subpostmasters of such bugs, errors or defects… and if so how?

[973] Answer: Although the experts were agreed that the extent to which any IT system can automatically alert its users to bugs within the system itself is necessarily limited, and although Horizon has automated checks which would detect certain bugs, they were also agreed that there are types of bugs which would not be detected by such checks. Indeed, the evidence showed that some bugs lay undiscovered in the Horizon system for years. This issue is very easy, therefore, to answer. The correct answer is very short. The answer… is ‘No, the Horizon system did not alert SPMs’. The second part of the issue does not therefore arise.”

That is a significant extract from an important judgment. A senior judge directly addressed the question of system reliability and pronounced that he is satisfied that a complex system cannot be expected to have adequate controls to warn users of all errors.

This is more than an abstract, philosophical debate about proof, evidence and what we can know. In England there is a legal presumption that computer evidence is reliable. This made a significant contribution to the Horizon scandal. Both parties in a court case are obliged to disclose documents which might either support or undermine their case, so that the other side has a chance to inspect and challenge them. The Post Office and Fujitsu did not disclose anything that would have cast doubt on their computer evidence. That failure to disclose meant it was impossible for the subpostmasters being prosecuted to challenge the presumption that the evidence was reliable. The subpostmasters didn’t know about the relevant system problems, and they didn’t even know that that knowledge had been withheld from them.

Replacing the presumption of computer reliability

There are two broad approaches that can be taken in response to the presumption that computer evidence is reliable and the ease with which it can be abused, apart of course from ignoring the problem and accepting that injustice is a price worth paying for judicial convenience. England can scrap the presumption, which would require the party seeking to use the evidence to justify its reliability. Or the rules over disclosure can be strengthened to try and ensure that all relevant information about systems is revealed. Some blend of the two approaches seems most likely.

I have recently contributed to a paper entitled “Recommendations for the probity of computer evidence”. It has been submitted to the Ministry of Justice, which is responsible for the courts in England & Wales, and is available from the Digital Evidence and Electronic Signature Law Review.

The paper argues that the presumption of computer reliability should be replaced by a two stage approach when reliability is challenged. The provider of the data should first be required to provide evidence to demonstrate that they are in control of their systems, that they record and track all bugs, fixes, changes and releases, and that they have implemented appropriate security standards and processes.

If the party wishing to rely on the computer evidence cannot provide a satisfactory response in this first stage then the presumption of reliability should be reversed. The second stage would require them to prove that none of the failings revealed in the first stage might affect the reliability of the computer evidence.

Whatever approach is taken, IT professionals would have to offer an opinion on their systems. How reliable are the systems? What relevant evidence might there be that systems are reliable, or unreliable? Can they demonstrate that they are in control of their systems? Can they reassure senior managers who will have to put their name to a legal document and will be very keen to avoid the humiliation that has been heaped upon Post Office and Fujitsu executives, with the possibility of worse to come?

A challenging future

The extent to which we can rely on computers poses uncomfortable challenges for English law now, but it will be an increasingly difficult problem for the IT world over the coming years. What can we reasonably say about the systems we work with? How reliable are they? What do we know about the bugs? Are we sufficiently clear about the nature of our systems to brief managers who will have to appear in court, or certify legal documents?

It will be essential that developers and testers are clear in their own minds, and in their communications, about what bugs are. They are not just coding errors, and we must try to ensure people outside IT understand that. Testers must also be able to communicate clearly what they have learned about systems, and they must never say or do anything that suggests systems will be free of bugs.

Testers will have to think carefully about the quality of their evidence, not just about the quality of their systems. How good is our evidence? How far can we go in making confident statements of certainty? What do we still not know, and what is the significance of that missing knowledge? Much of this will be a question of good management. But organisations will need good testers, very good testers, who can explain what we know, and what we don’t know, about complex systems; testers who have the breadth of knowledge, the insight, and the communication skills to tell a convincing story to those who require the unvarnished truth.

We will need confident, articulate testers who can explain that a lack of certainty about how complex systems will behave is an honest, clear sighted, statement of truth. It is not an admission of weakness or incompetence. Too many people in IT have built good careers on bullshitting, on pretending they are more confident and certain than they have any right to be. Systems will inevitably become more complex. IT people will increasingly be dragged into litigation, and as the Post Office and Fujitsu executives have found, misplaced and misleading confidence and bluster in court have excruciating personal consequences. Unpleasant though these consequences are, they hardly compare with the tragedies endured by the subpostmasters and subpostmistresses, whose lives were ruined by corporations who insisted that their complex software was reliable.

The future might be difficult, even stressful, for software testers, but they will have a valuable, essential role to play in helping organisations and users to gain a better understanding of the fallible nature of software. To say the future will be interesting is an understatement; it will present exciting challenges and there should be great opportunities for the best testers.

The last straw – the project that convinced me to resign

Once upon a time there was a project, on which I was the test manager, which prompted me to take the drastic career change I had been mulling over for a year or so.

This project was with a large UK government department. I was the test manager for the supplier who also provided the development team. A rival IT services company ran the Service Delivery function. It was a very interesting (i.e. poisonous) three way relationship!

The development was a six month project to customise an off the shelf package that was the basis for a rule based expert system. It would help users to decide if applicants were eligible on health grounds for welfare payments. The idea was that we would develop a pilot for live running in one particular part of the organisation.

The crucial constraint was that the development had to be fast so that all parties could understand the base product better and learn the right lessons for the customisation required for the rest of the organisation. However, it would be a live application that would shape decisions which would have a huge impact on the lives of poor and vulnerable people. It was a serious matter, not just a throwaway prototype. There was also a strong emphasis on accessibility. If any corners were cut elsewhere there was no question of skimping on accessibility issues.

The stress on accessibility was a key insight into later problems. There were about 100 potential users for the pilot, which was expected to run for only a couple of years till it would be replaced by a more sophisticated version as part of the full programme. So I asked what accessibility needs the current staff had, so I could factor that into my risk assessment. Obviously, or so I thought, the needs of those staff should have the highest priority in testing.

I was bluntly told by the client that I was not allowed to ask that question. We had to assume that any type of user with special needs might be recruited within the lifetime of the application. That was fair enough, but it didn’t sit easily with the expectation that the development would be as fast as possible. It was clear that it would be better to deliver late, or even to fail altogether, rather than to produce an application with accessibility problems. The reason? This government department was responsible for ensuring that other organisations complied with accessibility legislation, so the client was insistent that everything must be done by the book.

I gradually became aware that this was a wider problem than concern about accessibility. At that time there had been a string of embarrassing stories in the British magazine Private Eye about government IT failures. These were well researched and highly critical articles. They were referred to occasionally and it dawned on me that the client was terrified of being in a future article.

When I say they were terrified I mean their fear dictated their whole approach to the job. Working in the private sector I had been used to seeing people take risks, doing what was necessary to get the project in. Sometimes these risks were reckless, sometimes they were shrewdly calculated. But I had never seen a client who was too terrified to take any reasonable risks.

The bottom line wasn’t that the pilot had to be on time, or to work, or to be within budget. The bottom line was that the client managers had to be bullet proof if things went wrong. They were so concerned about protecting themselves that they were refusing to create the conditions for success. They were almost guaranteeing failure, but a failure for which they would not be blamed.

The response of client management was to retain all of the normal processes and project governance. These were specified in the contract. Testing was required to produce the full set of horribly obsolete, IEEE 829 style documentation. Payment to the supplier was staged with each chunk of money being released as each document passed a quality gate.

The only concession to speed was that the project team was expected to do everything faster than normal.

I expended a huge amount of effort, working long hours, writing the test plans. One of my great assets to my employer was that I produced great, high quality, impressive looking documentation.

In this case the documents were produced on shifting sands because neither I nor the development team had an adequate understanding of either the base product or what the tailored version would look like. We were able to acquire that understanding later, once our documents had been written and approved. We were always working half-blind, catching up later with our understanding.

However, everyone was happy. At least everyone at a senior level was happy as my beautiful documents sailed through the quality gates. The client paid out the money, and we had a celebratory night out every time we got another payment.

Meanwhile the testing team were miserable. They were stressed as they worked hard trying to piece together their preparations using unhelpful and barely relevant plans. The situation was rescued by my fantastic deputy test manager. I was the better test manager, in a skilled corporate operator sense. She was extremely experienced and capable, and she was the better hands on test manager. We were happy to admit this, with wry smiles, to each other, and we made a good team in those difficult circumstances. While I was working hard to produce useless test plans that would nevertheless pay our salaries, she drew up a set of informal test plans that the testers could work from.

When it came to test execution we winged it frantically. The official plans were shelfware, and we had to constantly adapt and improvise using the informal plans my deputy had prepared. We could have done it so much better and less stressfully if we had been able to plan and prepare on a realistic basis for the specific problems of this application.

What were the main differences between the two sets of plans?

The official plans did not deal with risk. The client refused to acknowledge in writing the real risk that motivated them, the fear of bad publicity. They insisted, infuriatingly, that everything had to be fully tested. They would not prioritise the testing so we could adapt the plan and concentrate on the crucial, high priority testing if we started to run out of time. They would only sign off a test plan that pretended it would be possible to do everything in time. So our informal test plans had to allow for the inevitable, and assign priorities based on our growing knowledge of the client and the system.

The client also refused to accept that using a new package would inevitably make it hard to predict how long it would take to tailor and develop applications. We were not allowed any flexibility in the schedule. Yet we were obliged to follow the full formal process, until such point as they would concede that we could skip or adapt steps. They would never do so until it was too late, by which time we had already acted unilaterally.

A particular annoyance was the insistence that the detailed plans should identify the type of people required to develop and test the system at each stage of development, and plan for recruiting this set of perfect people. There was a basic team in place, with the skills and experience to do the job, more or less. The schedule meant we had to do the best we could with the people available. That was fine, but I had to waste ridiculous amounts of time arguing about whether they were the right people, or whether we should look for different people. There wasn’t the budget or the time to go looking for a different team, but the plans had to be drawn up on the basis that we would try to develop and test with a perfect team – if we wanted to get the plans signed off and to be paid.

Another source of friction was that the client had completely ignored the non-functional requirements, with the result that the supplier architects had drawn up a requirements specification that was no more than vague mush; a set of motherhood statements and untestable criteria that were useful for a service level agreement, but not testing.

We had to re-open the user workshops to address the non-functional requirements, so we could get some meaningful input from the users as a basis for our testing. This caused real trouble, because it was a direct, explicit statement by the test team that the official plans and process were inadequate, even though they had been signed off by the supplier and client. My deputy and I did not budge on this point and we were insistent. It was clear that our stance was considered insulting to the professionalism of the client. I had already decided I was going to resign soon, so I was happy to take all the blame, and unpopularity, for rocking the boat.

What were the lessons learned by the client and supplier? Well, there were no useful lessons taken on board. We were only slightly late, the quality was acceptable given that it was a pilot, there were no serious accessibility issues and we didn’t miss our budget by too much. The fact that the project team, and especially the testers, hated the experience wasn’t relevant. Nor was it acknowledged that the only reason we achieved anything was because we didn’t follow the standard driven, officially documented plan.

That was a crucial insight into what happened. There were two projects in effect. There was the official one, a beautifully planned and documented project, and a messy, chaotic project. There was little relation between the two. The messy project was the one that mattered. It delivered. The pristine document driven project was the only one that was visible and judged, but it did nothing, except bring in the money.

I thought it was a ludicrous way to work, but it was clearly the way ahead at that client. It wasn’t that people were working like this because they thought it was better than Agile, which would have been well suited to that project, if there had been a willingness to do it properly. There was no conscious rejection of Agile. Client management seemed to think vaguely that Agile meant doing the same things, but magically somehow doing them faster. In that organisation people were following prescriptive processes and standards because it gave them a sense of protection when they failed. Perversely this way of working created stress and misery for most of the people on the project. Protection? I don’t think so.

Anyway, I had had enough of this style of working, so I handed in my notice and left. The full IT development programme at that client was scheduled to run for another three years. About a year later it was cancelled.

Teachers, children, testers and leaders (2013)

This article appeared in the March 2013 edition of Testing Planet, which is published by the wonderful Ministry of Testing, one of the most exciting developments in software testing over the last 20 years.

I’m moving this article onto my blog from my website, which will shortly be decommissioned. The article was written in January 2013. Looking at it again I see that I was starting to develop arguments I fleshed out over the next couple of years as part of the Stop 29119 campaign against the testing standard, ISO 29119.

The article

“A tester is someone who knows things can be different” – Gerald Weinberg.

Leaders aren’t necessarily people who do things, or order other people about. To me the important thing about leaders is that they enable other people to do better, whether by inspiration, by example or just by telling them how things can be different – and better. The difference between a leader and a manager is like the difference between a great teacher and, well, the driver of the school bus. Both take children places, but a teacher can take children on a journey that will transform their whole life.

My first year or so in working life after I left university was spent in a fog of confusion. I struggled to make sense of the way companies worked; I must be more stupid than I’d always thought. All these people were charging around, briskly getting stuff done, making money and keeping the world turning; they understood what they were doing and what was going on. They must be smarter than me.

Gradually it dawned on me that very many of them hadn’t a clue. They were no wiser than me. They didn’t really know what was going on either. They thought they did. They had their heads down, working hard, convinced they were contributing to company profits, or at least keeping the losses down.

The trouble was their efforts often didn’t have much to do with the objectives of the organisation, or the true goals of the users and the project in the case of IT. Being busy was confused with being useful. Few people were capable of sitting back, looking at what was going on and seeing what was valuable as opposed to mere work creation.

I saw endless cases of poor work, sloppy service and misplaced focus. I became convinced that we were all working hard doing unnecessary, and even harmful, things for users who quite rightly were distinctly ungrateful. It wasn’t a case of the end justifying the means; it was almost the reverse. The means were only loosely connected to the ends, and we were focussing obsessively on the means without realising that our efforts were doing little to help us achieve our ends.

Formal processes didn’t provide a clear route to our goal. Following the process had become the goal itself. I’m not arguing against processes; just the attitude we often bring to them, confusing the process with the destination, the map with the territory. The quote from Gerald Weinberg absolutely nails the right attitude for testers to bring to their work. There are twin meanings. Testers should know there is a difference between what people expect, or assume, and what really is. They should also know that there is a difference between what is, and what could be.

Testers usually focus on the first sort of difference; seeing the product for what it really is and comparing that to what the users and developers expected. However, the second sort of difference should follow on naturally. What could the product be? What could we be doing better?

Testers have to tell a story, to communicate not just the reality to the stakeholders, but also a glimpse of what could be. Organisations need people who can bring clear headed thinking to confusion, standing up and pointing out that something is wrong, that people are charging around doing the wrong things, that things could be better. Good testers are well suited by instinct to seeing what positive changes are possible. Communicating these possibilities, dispelling the fog, shining a light on things that others would prefer to remain in darkness; these are all things that testers can and should do. And that too is a form of leadership, every bit as much as standing up in front of the troops and giving a rousing speech.

In Hans Christian Andersen’s story, the Emperor’s New Clothes, who showed a glimpse of leadership? Not the emperor, not his courtiers; it was the young boy who called out the truth, that the Emperor was wearing no clothes at all. If testers are not prepared to tell it like it is, to explain why things are different from what others are pretending, to explain how they could be better, then we diminish and demean our profession. Leaders do not have to be all-powerful figures. They can be anyone who makes a difference; teachers, children. Or even testers.

Quality isn’t something, it provides something (2012)

This article appeared in the July 2012 edition of Testing Planet, which is published by the wonderful Ministry of Testing, one of the most exciting developments in software testing over the last 20 years.

The article was written in June 2012, but I don’t think it has dated. It’s about the way we think and work with other people. These are timeless problems. The idea behind E-prime is particularly interesting. Dispensing with the verb “to be” isn’t something to get obsessive or ideological about, but testers should be aware of the important distinction between the way something is and the way it behaves. The original article had only four references so I have checked them, converted them to hyperlinks, and changed the link to Lera Boroditsky’s paper to a link to her TED talk on the same subject.

The article

A few weeks ago two colleagues, who were having difficulty working together, asked me to act as peacekeeper in a tricky looking meeting in which they were going to try and sort out their working relationship. I’ll call them Tony and Paul. For various reasons they were sparking off each other and creating antagonism that was damaging the whole team.

An hour’s discussion seemed to go reasonably well; Tony talking loudly and passionately, while Paul spoke calmly and softly. Just as I thought we’d reached an accommodation that would allow us all to work together Tony blurted out, “you are cold and calculating, Paul, that’s the problem”.

Paul reacted as if he’d been slapped in the face, made his excuses and left the meeting. I then spent another 20 minutes talking Tony through what had happened, before separately speaking to Paul about how we should respond.

I told Tony that if he’d wanted to make the point I’d inferred from his comments, and from the whole meeting, then he should have said “your behaviour and attitude towards me throughout this meeting, and when we work together, strike me as cold and calculating, and that makes me very uncomfortable”.

“But I meant that!”, Tony replied. Sadly, he hadn’t said that. Paul had heard the actual words and reacted to them, rather than applying the more dispassionate analysis I had used as an observer. Paul meanwhile found Tony’s exuberant volatility disconcerting, and responded to him in a very studied and measured style that unsettled Tony.

Tony committed two sins. Firstly, he didn’t acknowledge the two way nature of the problem. It should have been about how he reacted to Paul, rather than trying to dump all the responsibility onto Paul.

Secondly, he said that Paul is cold and calculating, rather than acting in a way Tony found cold and calculating, at a certain time, in certain circumstances.

I think we’d all see a huge difference between being “something”, and behaving in a “something” way at a certain time, in a certain situation. The verb “to be” gives us this problem. It can mean, and suggest, many different things and can create fog where we need clarity.

Some languages, such as Spanish, maintain a useful distinction between different forms of “to be” depending on whether one is talking about something’s identity or just a temporary attribute or state.

The way we think obviously shapes the language we speak, but increasingly scientists are becoming aware of how the language we use shapes the way that we think. [See this 2017 TED talk, “How Language Shapes Thought”, by Lera Boroditsky]

The problem we have with “to be” has great relevance to testers. I don’t just mean treating people properly, however much importance we rightly attach to working successfully with others. More than that, if we shy away from “to be” then it helps us think more carefully and constructively as testers.

This topic has stretched bigger brains than mine, in the fields of philosophy, psychology and linguistics. Just google “general semantics” if you want to give your brain a brisk workout. You might find it tough stuff, but I don’t think you have to master the underlying concept to benefit from its lessons.

Don’t think of it as intellectual navel gazing. All this deep thought has produced some fascinating results, in particular something called E-prime, a form of English that totally dispenses with “to be” in all its forms; no “I am”, “it is”, or “you are”. Users of E-prime don’t simply replace the verb with an alternative. That doesn’t work. It forces you to think and articulate more clearly what you want to say. [See this classic paper by Kellogg, “Speaking in E-prime” PDF, opens in new tab].

“The banana is yellow” becomes “the banana looks yellow”, which starts to change the meaning. “Banana” and “yellow” are not synonyms. The banana’s yellowness becomes apparent only because I am looking at it, and once we introduce the observer we can acknowledge that the banana appears yellow to us now. Tomorrow the banana might appear brown to me as it ripens. Last week it would have looked green.

You probably wouldn’t disagree with any of that, but you might regard it as a bit abstract and pointless. However, shunning “to be” helps us to think more clearly about the products we test, and the information that we report. E-prime therefore has great practical benefits.

The classic definition of software quality came from Gerald Weinberg in his book “Quality Software Management: Systems Thinking”.

“Quality is value to some person”.

Weinberg’s definition reflects some of the clarity of thought that E-prime requires, though he has watered it down somewhat to produce a snappy aphorism. The definition needs to go further, and “is” has to go!

Weinberg makes the crucial point that we must not regard quality as some intrinsic, absolute attribute. It arises from the value it provides to some person. Once you start thinking along those lines you naturally move on to realising that quality provides value to some person, at some moment in time, in a certain context.

Thinking and communicating in E-prime stops us making sweeping, absolute statements. We can’t say “this feature is confusing”. We have to use a more valuable construction such as “this feature confused me”. But we’re just starting. Once we drop the final, total condemnation of saying the feature is confusing, and admit our own involvement, it becomes more natural to think about and explain the reasons. “This feature confused me … when I did … because of …”.

Making the observer, the time and the context explicit helps us by limiting or exposing hidden assumptions. We might or might not find these assumptions valid, but we need to test them, and we need to know about them so we understand what we are really learning as we test the product.

E-prime fits neatly with the scientific method and with the provisional and experimental nature of good testing. Results aren’t true or false. The evidence we gather matches our hypothesis, and therefore gives us greater confidence in our knowledge of the product, or it fails to match up and makes us reconsider what we thought we knew. [See this classic paper by Kellogg & Bourland, “Working with E-prime – some practical notes” PDF, opens in new tab].

Scientific method cannot be accommodated in traditional script-driven testing, which reflects a linear, binary, illusory worldview, pretending to be absolute. It tries to deal in right and wrong, pass and fail, true and false. Such an approach fits in neatly with traditional development techniques which fetishise the rigours of project management, rather than the rigours of the scientific method.

This takes us back to general semantics, which coined the well known maxim that the map is not the territory. Reality and our attempts to model and describe it differ fundamentally from each other. We must not confuse them. Traditional techniques fail largely because they confuse the map with the territory. [See this “Less Wrong” blog post].

In attempting to navigate their way through a complex landscape, exponents of traditional techniques seek the comfort of a map that turns messy, confusing reality into something they can understand and that offers the illusion of being manageable. However, they are managing the process, not the underlying real work. The plan is not the work. The requirements specification is not the requirements. The map is not the territory.

Adopting E-prime in our thinking and communication will probably just make us look like the pedantic awkward squad on a traditional project. But on agile or lean developments E-prime comes into its own. Testers must contribute constructively, constantly, and above all, early. E-prime helps us in all of this. It makes us clarify our thoughts and helps us understand that we gain knowledge provisionally, incrementally and never with absolute certainty.

I was not consciously deploying E-prime during and after the fractious meeting I described earlier. But I had absorbed the precepts sufficiently to instinctively realise that I had two problems; Tony’s response to Paul’s behaviour, and Paul’s response to Tony’s outburst. I really didn’t see it as a matter of “uh oh – Tony is stupid”.

E-prime purists will look askance at my failure to eliminate all forms of “to be” in this article. I checked my writing to ensure that I’ve written what I meant to, and said only what I can justify. Question your use of the verb, and weed out those hidden assumptions and sweeping, absolute statements that close down thought, rather than opening it up. Don’t think you have to be obsessive about it. As far as I am concerned, that would be silly!

Traditional techniques and motivating staff (2010)

This article appeared in the February 2010 edition of Software Testing Club Magazine, now the Testing Planet. The STC has evolved into the wonderful Ministry of Testing, one of the most exciting developments in software testing over the last 20 years.

That might seem a low bar; testing isn’t meant to be a thrill a minute. But the Ministry of Testing has been a gale of fresh air sweeping through the industry, mixing great content and conferences with an approach to testing that has managed to be both entertaining and deeply serious. It has been a consistent voice of sanity and decency in an industry that has had too much cynicism and short sightedness.

I’m moving this article onto my blog from my website, which will shortly be decommissioned. Looking back I was interested to see that I didn’t post this article on the website immediately. I had some reservations about the article. I wondered if I had taken a rather extreme stance. I do believe that rigid standards and processes can be damaging, and I certainly believe that enforcing strict compliance, at the expense of initiative and professional judgement, undermines morale.

However, I thought I had perhaps gone too far and might have been thought to be dismissing the idea of any formality, and that I might be seen to be advocating software development as an entirely improvised activity with everyone winging it. That’s not the case. We need to have some structure, some shape and formality to our work. It’s just that prescriptive standards and processes aren’t sensitive to context and become a straitjacket. This was written in January 2010 and it was a theme I spent a good deal of time on when the ISO 29119 standard was released a few years later and the Stop 29119 campaign swung into action.

So I still largely stand by this article, though I think it is lacking in nuance in some respects. In particular the bald statement “development isn’t engineering”, while true, does require unpacking and explanation. Development isn’t engineering in the sense that engineering is usually understood, and it’s certainly not akin to civil engineering. But it should aspire to be more “engineering like”, while remaining realistic about the reality of software development. I was particularly interested to see that I described reality as being chaotic in 2010, a couple of years before I started to learn about Cynefin.

The article

Do we follow the standards or use our initiative?

Recently I’ve been thinking and writing about the effects of testing standards. The more I thought, the more convinced I became that standards, or any rigid processes, can damage the morale, and even the professionalism, of IT professionals if they are not applied wisely.

The problem is that calling them “standards” implies that they are mandatory and should be applied in all cases. The word should be reserved for situations where compliance is essential, eg security, good housekeeping or safety critical applications.

I once worked for a large insurance company as an IT auditor in Group Audit. I was approached by Information Services. Would I consider moving to lead a team developing new management information (MI) applications? It sounded interesting, so I said yes.

On my first day in the new role I asked my new manager what I had to do. He stunned me when he said, “You tell me. I’ll give you the contact details for your users. Go and see them. They’re next in line to get an MI application. See what they need, then work out how you’re going to deliver it. Speak to other people to see how they’ve done it, but it’s up to you”.

The company did have standards and processes, but they weren’t rigid and they weren’t very useful in the esoteric world of insurance MI, so we were able to pick and choose how we developed applications.

My users were desperate for a better understanding of their portfolio; what was profitable, and what was unprofitable. I had no trouble getting a manager and a senior statistician to set aside two days to brief me and my assistant. There was just us, a flip chart, and gallons of coffee as they talked us through the market they were competing in, the problems they faced and their need for better information from the underwriting and claims applications with which they did business.

I realised that it was going to be a pig of a job to give them what they needed. It would take several months. However, I could give them about a quarter of what they needed in short order. So we knocked up a quick disposable application in a couple of weeks that delighted them, and then got to work on the really tricky stuff.

The source systems proved to be riddled with errors and poor quality data, so it took longer than expected. However, we’d got the users on our side by giving them something quickly, so they were patient.

It took so long to get phase 1 of the application working to acceptable tolerances that I decided to scrap phase 2, which was nearly fully coded, and rejig the design of the first part so that it could do the full job on its own. That option had been ruled out at the start because there seemed to be insurmountable performance problems.

Our experience with testing had shown that we could make the application run much faster than we’d thought possible, but that the fine tuning of the code to produce accurate MI was a nightmare. It therefore made sense to clone jobs and programs wholesale to extend the first phase and forget about trying to hack the phase 2 code into shape.

The important point is that I was allowed to take a decision that meant binning several hundred hours of coding effort and utterly transforming a design that had been signed off.

I took the decision during a trip to the dentist, discussed it with my assistant on my return, sold the idea to the users and only then did I present my management with a fait accompli. They had no problems with it. They trusted my judgement, and I was taking the users along with me.

The world changed and an outsourcing deal meant I was working for a big supplier, with development being driven by formal processes, rigid standards and contracts. This wasn’t all bad. It did give developers some protection from the sort of unreasonable pressure that could be brought to bear when relationships were less formal. However, it did mean that I never again had the same freedom to use my own initiative and judgement.

The bottom line was that it could be better to do the wrong thing for the corporately correct reason, than to do the right thing the “wrong” way. By “better” I mean better for our careers, and not better for the customers.

Ultimately that is soul destroying. What really gets teams fired up is when developers, testers and users all see themselves as being on the same side, determined to produce a great product.

Development isn’t engineering

Reality is chaotic. Processes are perfectly repeatable only if one pretends that reality is neat, orderly and predictable. The result is strain, tension and developers ordered to do the wrong things for the “right” reasons, to follow the processes mandated by standards and by the contract.

Instead of making developers more “professional” it has exactly the opposite effect. It reduces them to the level of, well, second hand car salesmen, knocking out old cars with no warranty. It’s hardly a crime, but it doesn’t get me excited.

Development and testing become drudgery. Handling the users isn’t a matter of building lasting relationships with fellow professionals. It’s a matter of “managing the stakeholders”, being diplomatic with them rather than candid, and if all else fails telling them “to read the ******* contract”.

This isn’t a rant about contractual development. Contracts don’t have to be written so that the development team is in a strait-jacket. It’s just that traditional techniques fit much more neatly with contracts than agile, or any iterative approach.

Procurement is much simpler if you pretend that traditional, linear techniques are best practice; if you pretend that software development is like civil engineering, and that developing an application is like building a bridge.

Development and testing are really not like that at all. The actual words used should be a good clue. Development is not the same as construction. Construction is what you do when you’ve developed an idea to the point where it can be manufactured, or built.

Traditional techniques were based on that fundamental flaw; the belief that development was engineering, and that repeatable success required greater formality, more tightly defined processes and standards, and less freedom for developers.

Development is exploring

Good development is a matter of investigation, experimentation and exploration. It’s about looking at the possibilities, and evaluating successive versions. It’s not about plodding through a process document.

Different customers, different users and different problems will require different approaches. These various approaches are not radically different from each other, but they are more varied than is allowed for by rigid and formal processes.

Any organisation that requires development teams to adhere to these processes, rather than make their own judgements based on their experience and their users’ needs, frequently requires the developers to do the wrong things.

This is demoralising, and developers working under these constraints have the initiative, enthusiasm and intellectual energy squeezed out of them. As they build their careers in such an atmosphere they become corporate bureaucrats.

They rise to become not development managers, but managers of the development process; not test managers, but managers of the test process. Their productivity is measured in meetings and reports. Sometimes the end product seems a by-product of the real business; doing the process.

If people are to realise their potential they need managers who will do as mine did; who will say, “here is your problem, tell me how you’re going to solve it”.

We need guidance from processes and standards in the same way as we need guidance from more experienced practitioners, but they should be suggestions of possible approaches so that teams don’t flounder around in confused ignorance, don’t try to re-invent the wheel, and don’t blunder into swamps that have consumed previous projects.

If development is exploration it is thrilling and brings out the best in us. People rise to the challenge, learn and grow. They want to do it again and get better at it. If development means plodding through a process document it is a grind.

I know which way has inspired me, which way has given users applications I’ve been proud of. It wasn’t the formal way. It wasn’t the traditional way. Set the developers free!

Testers and coders are both developers (2009)

This article appeared in September 2009 as an opinion piece on the front page of TEST magazine’s website. I’m moving the article onto my blog from my website, which will be decommissioned soon. It might be an old article, but it remains valid.

The article

When I was a boy I played football non-stop; in organised matches, in playgrounds or in the park, even kicking coal around the street!

There was a strict hierarchy. The good players, the cool kids; they were forwards. The ones who couldn’t play were defenders. If you were really hopeless you were a goalkeeper. Defending was boring. Football was about fun, attacking and scoring goals.

When I moved into IT I found a similar hierarchy. I had passed the programming aptitude test. I was one of the elite. I had a big head, to put it mildly! The operators were the defenders, the ones who couldn’t do the fun stuff. We were vaguely aware they thought the coding kids were irresponsible cowboys, but who cared?

As for testers, well, they were the goalkeepers. Frankly, they were lucky to be allowed to play at all. They did what they were told. Independence? You’re joking, but if they were good they were allowed to climb the ladder and become programmers.

Gradually things changed. Testers became more clearly identified with the users. They weren’t just menial team members. A clear career path opened up for them as testing professionals.

However, that didn’t earn them respect from programmers. Now testing is changing again. Agile gives testers the chance to learn and apply interesting coding skills. Testers can be just as technical as coders. They might code in different ways, for different reasons, but they can be coders too.

That’s great isn’t it? Well, up to a point. It’s fantastic that testers have these exciting opportunities. But I worry that programmers might start respecting the more technical testers for the wrong reason, and that testers who don’t code will still be looked down on. Testers shouldn’t try to impress programmers with their coding skills. That’s just a means to an end.

We’ll still need testers who don’t code and it’s vital that if testers are to achieve the respect they deserve then they must be valued for all the skills they bring to the team, not just the skills they share with programmers. For a start, we should stop referring to developers and testers. Testers always were part of the development process. In Agile teams they quite definitely are developers. It’s time everyone acknowledged that. Development is a team game.

Football teams who played the way we used to as kids got thrashed if they didn’t grow up and play as a team. Development teams who don’t ditch similar attitudes will be equally ineffective.

Bridging the gap – Agile and the troubled relationship between UX and software engineering (2009)

This article appeared as the cover story for the September 2009 edition of TEST magazine.

I’m moving the article onto my blog from my website, which will be decommissioned soon. If you choose to read the article please bear in mind that it was written in August 2009 and is therefore inevitably dated in some respects, though I think that much of it is still valid, especially where I am discussing the historical problems.

The article

For most of its history software engineering has had great difficulty building applications that users found enjoyable. Far too many applications were frustrating and wasted users’ time.

That has slowly changed with the arrival of web developments, and I expect the spread of Agile development to improve matters further.

I’m going to explain why I think developers have had difficulty building usable applications and why user interaction designers have struggled to get their ideas across. For simplicity I’ll just refer to these designers and psychologists as “UX”. There are a host of labels and acronyms I could have used, and it really wouldn’t have helped in a short article.

Why software engineering didn’t get UX

Software engineering’s problems with usability go back to its roots, when geeks produced applications for fellow geeks. Gradually applications spread from the labs to commerce and government, but crucially users were wage slaves who had no say in whether they would use the application. If they hated it they could always find another job!

Gradually applications spread into the general population until the arrival of the internet meant that anyone might be a user. Now it really mattered if users hated your application. They would be gone for good, into the arms of the competition.

Software engineering had great difficulty coming to terms with this. The methods it had traditionally used were poison to usability. The Waterfall lifecycle was particularly damaging.

The Waterfall had two massive flaws. At its root was the implicit assumption that you can tackle the requirements and design up front, before the build starts. This led to the second problem; iteration was effectively discouraged.

Users cannot know what they want till they’ve seen what is possible and what can work. In particular, UX needs iteration to let analysts and users build on their understanding of what is required.

The Waterfall meant that users could not see and feel what an application was like until acceptance testing at the end when it was too late to correct defects that could be dismissed as cosmetic.

The Waterfall was a project management approach to development, a means of keeping control, not building good products. This made perfect sense to organisations who wanted tight contracts and low costs.

The desire to keep control and make development a more predictable process explained the damaging attempt to turn software engineering into a profession akin to civil engineering.

So developers were sentenced to 20 years hard labour with structured methodologies, painstakingly creating an avalanche of documentation; moving from the current physical system, through the current logical system to a future logical system and finally a future physical system.

However, the guilty secret of software engineering was that translating requirements into a design wasn’t just a difficult task that required a methodical approach; it’s a fundamental problem for developers. It’s not a problem specific to usability requirements, and it was never resolved in traditional techniques.

The mass of documentation obscured the fact that crucial design decisions weren’t flowing predictably and objectively from the requirements, but were made intuitively by the developers – people who by temperament and training were polar opposites of typical users.

Why UX didn’t get software development

Not surprisingly, given the massive documentation overhead of traditional techniques, and developers’ propensity to pragmatically tailor and trim formal methods, the full process was seldom followed. What actually happened was more informal and opaque to outsiders.

The UX profession understandably had great difficulty working out what was happening. Sadly they didn’t even realise that they didn’t get it. They were hampered by their naivety, their misplaced sense of the importance of UX and their counter-productive instinct to retain their independence from developers.

If developers had traditionally viewed functionality as a matter of what the organisation required, UX went to the other extreme and saw functionality as being about the individual user. Developers ignored the human aspect, but UX ignored commercial realities – always a fast track to irrelevance.

UX took software engineering at face value, tailoring techniques to fit what they thought should be happening rather than the reality. They blithely accepted the damaging concept that usability was all about the interface; that the interface was separate from the application.

This separability concept was flawed on three grounds. Conceptually it was wrong. It ignored the fact that the user experience depends on how the whole application works, not just the interface.

Architecturally it was wrong. Detailed technical design decisions can have a huge impact on the users. Finally separability was wrong organisationally. It left the UX profession stranded on the margins, in a ghetto, available for consultation at the end of the project, and then ignored when their findings were inconvenient.

An astonishing amount of research and effort went into justifying this fallacy, but the real reason UX bought the idea was that it seemed to liberate them from software engineering. Developers could work on the boring guts of the application while UX designed a nice interface that could be bolted on at the end, ready for testing. However, this illusory freedom actually meant isolation and impotence.

The fallacy of separability encouraged reliance on usability testing at the end of the project, on summative testing to reveal defects that wouldn’t be fixed, rather than formative testing to stop these defects being built in the first place.

There’s an argument that there’s no such thing as effective usability testing. If it takes place at the end it’s too late to be effective. If it’s effective then it’s done iteratively during design, and it’s just good design rather than testing.

So UX must be hooked into the development process. It must take place early enough to allow alternative designs to be evaluated. Users must therefore be involved early and often. Many people in UX accept this completely, though the separability fallacy is still alive and well. However, its days are surely numbered. I believe, and hope, that the Agile movement will finally kill it.

Agile and UX – the perfect marriage?

The mutual attraction of Agile and UX isn’t simply a case of “my enemy’s enemy is my friend”. Certainly they do have a common enemy in the Waterfall, but each discipline really does need the other.

Agile gives UX the chance to hook into development, at the points where it needs to be involved to be effective. Sure, with the Waterfall it is possible to smuggle effective UX techniques into a project, but they go against the grain. It takes strong, clear-sighted project managers and test managers to make them work. The schedule and political pressures on these managers to stop wasting time iterating and to sign off each stage are huge.

If UX needs Agile, then equally Agile needs UX if it is to deliver high quality applications. The opening principle of the Agile Manifesto states that “our highest priority is to satisfy the customer through early and continuous delivery of valuable software”.

There is nothing in Agile that necessarily guarantees better usability, but if practitioners believe in that principle then they have to take UX seriously and use UX professionals to interpret users’ real needs and desires. This is particularly important with web applications when developers don’t have direct access to the end users.

There was considerable mutual suspicion between the two communities when Agile first appeared. Agile was wary of UX’s detailed analyses of the users, and suspected that this was a Waterfall style big, up-front requirements phase.

UX meanwhile saw Agile as a technical solution to a social, organisational problem and was rightly sceptical of claims that manual testing would be a thing of the past. Automated testing of the human interaction with the application is a contradiction.

Both sides have taken note of these criticisms. Many in UX now see the value in speeding up the user analysis and focussing on the most important user groups, and Agile has recognised the value of up-front analysis to help them understand the users.

The Agile community is also taking a more rounded view of testing, and how UX can fit into the development process. In particular check out Brian Marick’s four Agile testing quadrants.

UX straddles quadrants two and three. Q2 contains the up-front analysis to shape the product, and Q3 contains the evaluation of the results. Both require manual testing, and use such classic UX tools as personas (fictional representative users) and wireframes (sketches of webpages).

Other people who’re working on the boundary of Agile and UX are Jeff Patton, Jared Spool and Anders Ramsay. They’ve come up with some great ideas, and it’s well worth checking them out. Microsoft have also developed an interesting technique called the RITE method.

This is an exciting field. Agile changes the role of testers and requires them to learn new skills. The integration of UX into Agile will make testing even more varied and interesting.

There will still be tension between UX and software professionals who’ve been used to working remotely from the users. However, Agile should mean that this is a creative tension, with each group supporting and restraining the others.

This is a great opportunity. Testers will get the chance to help create great applications, rather than great documentation!

Business logic security testing (2009)

This article appeared in the June 2009 edition of Testing Experience magazine and the October 2009 edition of Security Acts magazine.

If you choose to read the article please bear in mind that it was written in January 2009 and is therefore inevitably dated in some respects. In particular, ISACA has restructured COBIT, but it remains a useful source. Overall I think the arguments I made in this article are still valid.

The references in the article were all structured for a paper magazine. They were not set up as hyperlinks and I have not tried to recreate them and check out whether they still work.

The article

When I started in IT in the 80s the company for which I worked had a closed network restricted to about 100 company locations with no external connections.

Security was divided neatly into physical security, concerned with the protection of the physical assets, and logical security, concerned with the protection of data and applications from abuse or loss.

When applications were built the focus of security was on internal application security. The arrangements for physical security were a given, and didn’t affect individual applications.

There were no outsiders to worry about who might gain access, and so long as the common access controls software was working there was no need for analysts or designers to worry about unauthorized internal access.

Security for the developers was therefore a matter of ensuring that the application reflected the rules of the business; rules such as segregation of responsibilities, appropriate authorization levels, dual authorization of high value payments, reconciliation of financial data.

The world quickly changed and relatively simple, private networks isolated from the rest of the world gave way to more open networks with multiple external connections and to web applications.

Security consequently acquired much greater focus. However, it began to seem increasingly detached from the work of developers. Security management and testing became specialisms in their own right, and not just an aspect of technical management and support.

We developers and testers continued to build our applications, comforted by the thought that the technical security experts were ensuring that the network perimeter was secure.

Nominally security testing was a part of non-functional testing. In reality, it had become somewhat detached from conventional testing.

According to the glossary of the British Computer Society’s Special Interest Group in Software Testing (BCS SIGIST) [1], security testing is determining whether the application meets the specified security requirements.

SIGIST also says that security entails the preservation of confidentiality, integrity and availability of information. Availability means ensuring that authorized users have access to information and associated assets when required. Integrity means safeguarding the accuracy and completeness of information and processing methods. Confidentiality means ensuring that information is accessible only to those authorized to have access.

Penetration testing, and testing the security of the network and infrastructure, are all obviously important, but if you look at security in the round, bearing in mind wider definitions of security (such as SIGIST’s), then these activities can’t be the whole of security testing.

Some security testing has to consist of routine functional testing that is purely a matter of how the internals of the application work. Security testing that is considered and managed as an exercise external to the development, an exercise that follows the main testing, is necessarily limited. It cannot detect defects that are within the application rather than on the boundary.

Within the application, insecure design features or insecure coding might be detected without any deep understanding of the application’s business role. However, like any class of requirements, security requirements will vary from one application to another, depending on the job the application has to do.

If there are control failures that reflect poorly applied or misunderstood business logic, or business rules, then will we as functional testers detect that? Testers test at the boundaries. Usually we think in terms of boundary values for the data, the boundary of the application or the network boundary with the outside world.

Do we pay enough attention to the boundary of what is permissible user behavior? Do we worry enough about abuse by authorized users, employees or outsiders who have passed legitimately through the network and attempt to subvert the application, using it in ways never envisaged by the developers?

I suspect that we do not, and this must be a matter for concern. A Gartner report of 2005 [2] claimed that 75% of attacks are at the application level, not the network level. The types of threats listed in the report all arise from technical vulnerabilities, such as command injection and buffer overflows.

Such application layer vulnerabilities are obviously serious, and must be addressed. However, I suspect too much attention has been given to them at the expense of vulnerabilities arising from failure to implement business logic correctly.

This is my main concern in this article. Such failures can offer great scope for abuse and fraud. Security testing has to be about both the technology and the business.

Problem of fraud and insider abuse

It is difficult to come up with reliable figures about fraud because of its very nature. According to PriceWaterhouseCoopers in 2007 [3] the average loss to fraud by companies worldwide over the two years from 2005 was $2.4 million (their survey being biased towards larger companies). This is based on reported fraud, and PWC increased the figure to $3.2 million to allow for unreported frauds.

In addition to the direct costs there were average indirect costs in the form of management time of $550,000 and substantial unquantifiable costs in terms of damage to the brand, staff morale, reduced share prices and problems with regulators.

PWC stated that 76% of their respondents reported the involvement of an outside party, implying that 24% were purely internal. However, when companies were asked for details on one or two frauds, half of the perpetrators were internal and half external.

It would be interesting to know the relative proportions of frauds (by number and value) which exploited internal applications and customer facing web applications but I have not seen any statistics for these.

The U.S. Secret Service and CERT Coordination Center have produced an interesting series of reports on “illicit cyber activity”. In their 2004 report on crimes in the US banking and finance sector [4] they reported that in 70% of the cases the insiders had exploited weaknesses in applications, processes or procedures (such as authorized overrides). 78% of the time the perpetrators were authorized users with active accounts, and in 43% of cases they were using their own account and password.

The enduring problem with fraud statistics is that many frauds are not reported, and many more are not even detected. A successful fraud may run for many years without being detected, and may never be detected. A shrewd fraudster will not steal enough money in one go to draw attention to the loss.

I worked on the investigation of an internal fraud at a UK insurance company that had lasted 8 years, as far back as we were able to analyze the data and produce evidence for the police. The perpetrator had raised 555 fraudulent payments, all for less than £5,000, and had stolen £1.1 million by the time that we received an anonymous tip off.

The control weaknesses related to an abuse of the authorization process, and a failure of the application to deal appropriately with third party claims payments, which were extremely vulnerable to fraud. These weaknesses would have been present in the original manual process, but the users and developers had not taken the opportunities that a new computer application had offered to introduce more sophisticated controls.
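To illustrate the sort of control a computer application makes possible, here is a minimal sketch in Python. It is not a description of what the company did or later built. The figures above imply an average of roughly £2,000 per payment (£1.1 million over 555 payments), so no single payment stood out. The sketch therefore adds up payments that individually fall below an authorization limit and flags any user whose running total is large. The limits, the `raised_by` field and the function name are all assumptions made for the example.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical limits, chosen only for illustration.
SINGLE_PAYMENT_LIMIT = Decimal("5000")
CUMULATIVE_REVIEW_LIMIT = Decimal("20000")


def users_needing_review(payments):
    """Return users whose sub-limit payments add up to more than the review limit.

    `payments` is an iterable of (raised_by, amount) pairs.
    """
    totals = defaultdict(Decimal)  # Decimal() starts each total at zero
    for raised_by, amount in payments:
        if amount < SINGLE_PAYMENT_LIMIT:
            totals[raised_by] += amount
    return {user: total for user, total in totals.items()
            if total > CUMULATIVE_REVIEW_LIMIT}


# Five payments just under the limit from one user, one ordinary payment from another.
payments = [("user-A", Decimal("4900"))] * 5 + [("user-B", Decimal("3000"))]
flagged = users_needing_review(payments)
print(flagged)
```

A real control would need a time window, and would probably group by payee and claim as well as by user, but even a crude cumulative check looks at behaviour that a per-payment limit cannot see.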

No-one had been negligent or even careless in the design of the application and the surrounding procedures. The trouble was that the requirements had focused on the positive functions of the application, and on replicating the functionality of the previous application, which in turn had been based on the original manual process. There had not been sufficient analysis of how the application could be exploited.

Problem of requirements and negative requirements

Earlier I was careful to talk about failure to implement business logic correctly, rather than implementing requirements. Business logic and requirements will not necessarily be the same.

The requirements are usually written as “the application must do” rather than “the application must not…”. Sometimes the “must not” is obvious to the business. It “goes without saying” – that dangerous phrase!

However, the developers often lack the deep understanding of business logic that users have, and they design and code only the “must do”, not even being aware of the implicit corollary, the “must not”.
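The gap between “must do” and “must not” can be sketched with a dual authorization rule of the kind mentioned earlier. This is an illustrative Python example only; the threshold, the names and the exception class are hypothetical. The stated requirement is that a high value payment needs a second authorization. The unstated corollary is that the second person must not be the one who raised the payment.

```python
HIGH_VALUE_LIMIT = 10_000  # hypothetical threshold


class AuthorizationError(Exception):
    """Raised when a payment breaks the authorization rules."""


def release_payment(amount, requested_by, approved_by=None):
    """Release a payment, enforcing a hypothetical dual authorization rule."""
    if amount >= HIGH_VALUE_LIMIT:
        # The "must do" that appears in the requirements.
        if approved_by is None:
            raise AuthorizationError("second authorization required")
        # The "must not" that goes without saying, and so is easily left out.
        if approved_by == requested_by:
            raise AuthorizationError("requester cannot approve their own payment")
    return "released"


def is_blocked(amount, requested_by, approved_by=None):
    """Helper for negative tests: True if the payment is refused."""
    try:
        release_payment(amount, requested_by, approved_by)
    except AuthorizationError:
        return True
    return False
```

A test suite derived only from the written requirement would check that `release_payment(20_000, "alice", "bob")` succeeds and that a missing approver is refused. Only a negative test such as `is_blocked(20_000, "alice", "alice")` would reveal whether the second check was ever built.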

As a computer auditor I reviewed a sales application which had a control to ensure that debts couldn’t be written off without review by a manager. At the end of each day a report was run to highlight debts that had been cleared without a payment being received. Any discrepancies were highlighted for management action.

I noticed that it was possible to overwrite the default of today’s date when clearing a debt. Inserting a date in the past meant that the money I’d written off wouldn’t appear on any control report. The report for that date had been run already.
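The weakness can be shown in a few lines of Python. This is a sketch under stated assumptions, not the design of the real application: the field names are hypothetical, and the idea of also recording a system-generated processing date is one possible remedy rather than the fix that was actually adopted. A report that selects on the date typed in by the user misses a backdated write-off, while a report that selects on the date the system processed the entry still catches it.

```python
from datetime import date

# Each write-off holds the date the user entered and the date the system
# actually processed it. Both field names are hypothetical.
write_offs = [
    {"debt_id": 1, "entered_date": date(2009, 1, 15), "processed_date": date(2009, 1, 15)},
    # Backdated: processed on 15 January but given a date in the past.
    {"debt_id": 2, "entered_date": date(2009, 1, 2), "processed_date": date(2009, 1, 15)},
]


def report_by_entered_date(items, run_date):
    """Daily control report keyed on the user-supplied date (the weak version)."""
    return [w["debt_id"] for w in items if w["entered_date"] == run_date]


def report_by_processed_date(items, run_date):
    """Daily control report keyed on the system processing date."""
    return [w["debt_id"] for w in items if w["processed_date"] == run_date]


run_date = date(2009, 1, 15)
weak_report = report_by_entered_date(write_offs, run_date)
strong_report = report_by_processed_date(write_offs, run_date)
print(weak_report, strong_report)
```

The backdated entry appears only in the second report. The report for 2 January had already been run, so in the first version the write-off never reaches a manager at all.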

When I mentioned this to the users and the teams who built and tested the application the initial reaction was “but you’re not supposed to do that”, and then they all tried blaming each other. There was a prolonged discussion about the nature of requirements.

The developers were adamant that they’d done nothing wrong because they’d built the application exactly as specified, and the users were responsible for the requirements.

The testers said they’d tested according to the requirements, and it wasn’t their fault.

The users were infuriated at the suggestion that they should have to specify every last little thing that should be obvious – obvious to them anyway.

The reason I was looking at the application, and looking for that particular problem, was because we knew that a close commercial rival had suffered a large fraud when a customer we had in common had bribed an employee of our rival to manipulate the sales control application. As it happened there was no evidence that the same had happened to us, but clearly we were vulnerable.

Testers should be aware of missing or unspoken requirements, implicit assumptions that have to be challenged and tested. Such assumptions and requirements are a particular problem with security requirements, which is why the simple SIGIST definition of security testing I gave above isn’t sufficient – security testing cannot be only about testing the formal security requirements.

However, testers, like developers, are working to tight schedules and budgets. We’re always up against the clock. Often there is barely enough time to carry out all the positive testing that is required, never mind thinking through all the negative testing that would be required to prove that missing or unspoken negative requirements have been met.

Fraudsters, on the other hand, have almost unlimited time to get to know the application and see where the weaknesses are. Dishonest users also have the motivation to work out the weaknesses. Even people who are usually honest can be tempted when they realize that there is scope for fraud.

If we don’t have enough time to do adequate negative testing to see what weaknesses could be exploited, then at least we should be doing a quick informal evaluation of the financial sensitivity of the application and alerting management, and the internal computer auditors, that there is an element of unquantifiable risk. How comfortable are they with that?

If we can persuade project managers and users that we need enough time to test properly, then what can we do?

CobiT and OWASP

If there is time, there are various techniques that testers can adopt to try and detect potential weaknesses or which we can encourage the developers and users to follow to prevent such weaknesses.

I’d like to concentrate on the CobiT (Control Objectives for Information and related Technology) guidelines for developing and testing secure applications (CobiT 4.1 2007 [5]), and the CobiT IT Assurance Guide [6], and the OWASP (Open Web Application Security Project) Testing Guide [7].

Together, CobiT and OWASP cover the whole range of security testing. They can be used together, CobiT being more concerned with what applications do, and OWASP with how applications work.

They both give useful advice about the internal application controls and functionality that developers and users can follow. They can also be used to provide testers with guidance about test conditions. If the developers and users know that the testers will be consulting these guides then they have an incentive to ensure that the requirements and build reflect this advice.

CobiT implicitly assumes a traditional, big up-front design, Waterfall approach. Nevertheless, it’s still potentially useful for Agile practitioners, and it is possible to map from CobiT to Agile techniques, see Gupta [8].

The two most relevant parts are in the CobiT IT Assurance Guide [6]. This is organized into domains, the most directly relevant being “Acquire and Implement” the solution. This is really for auditors, guiding them through a traditional development, explaining the controls and checks they should be looking for at each stage.

It’s interesting as a source of ideas, and as an alternative way of looking at the development, but unless your organization has mandated the developers to follow CobiT there’s no point trying to graft this onto your project.

Of much greater interest are the six CobiT application controls. Whereas the domains are functionally separate and sequential activities, a life-cycle in effect, the application controls are statements of intent that apply to the business area and the application itself. They can be used at any stage of the development. They are;

AC1 Source Data Preparation and Authorization

AC2 Source Data Collection and Entry

AC3 Accuracy, Completeness and Authenticity Checks

AC4 Processing Integrity and Validity

AC5 Output Review, Reconciliation and Error Handling

AC6 Transaction Authentication and Integrity

Each of these controls has stated objectives, and tests that can be made against the requirements, the proposed design and then on the built application. Clearly these are generic statements potentially applicable to any application, but they can serve as a valuable prompt to testers who are willing to adapt them to their own application. They are also a useful introduction for testers to the wider field of business controls.

CobiT rather skates over the question of how the business requirements are defined, but these application controls can serve as a useful basis for validating the requirements.

Unfortunately the CobiT IT Assurance Guide can be downloaded for free only by members of ISACA (Information Systems Audit and Control Association) and costs $165 for non-members to buy. Try your friendly neighborhood Internal Audit department! If they don’t have a copy, well maybe they should.

If you are looking for a more constructive and proactive approach to the requirements then I recommend the Open Web Application Security Project (OWASP) Testing Guide [7]. This is an excellent, accessible document covering the whole range of application security, both technical vulnerabilities and business logic flaws.

It offers good, practical guidance to testers. It also offers a testing framework that is basic, and all the better for that, being simple and practical.

The OWASP testing framework demands early involvement of the testers, and runs from before the start of the project to reviews and testing of live applications.

Phase 1: Before development begins

1A: Review policies and standards

1B: Develop measurement and metrics criteria (ensure traceability)

Phase 2: During definition and design

2A: Review security requirements

2B: Review design and architecture

2C: Create and review UML models

2D: Create and review threat models

Phase 3: During development

3A: Code walkthroughs

3B: Code reviews

Phase 4: During deployment

4A: Application penetration testing

4B: Configuration management testing

Phase 5: Maintenance and operations

5A: Conduct operational management reviews

5B: Conduct periodic health checks

5C: Ensure change verification

OWASP suggests four test techniques for security testing; manual inspections and reviews, code reviews, threat modeling and penetration testing. The manual inspections are reviews of design, processes, policies, documentation and even interviewing people; everything except the source code, which is covered by the code reviews.

A feature of OWASP I find particularly interesting is its fairly explicit admission that the security requirements may be missing or inadequate. This is unquestionably a realistic approach, but usually testing models blithely assume that the requirements need tweaking at most.

The response of OWASP is to carry out what looks rather like reverse engineering of the design into the requirements. After the design has been completed testers should perform UML modeling to derive use cases that “describe how the application works. In some cases, these may already be available”. Obviously in many cases these will not be available, but the clear implication is that even if they are available they are unlikely to offer enough information to carry out threat modeling.

[Figure: OWASP threat modelling UML]

The feature most likely to be missing is the misuse case. These are the dark side of use cases! As envisaged by OWASP the misuse cases shadow the use cases, threatening them, then being mitigated by subsequent use cases.

The OWASP framework is not designed to be a checklist, to be followed blindly. The important point about using UML is that it permits the tester to decompose and understand the proposed application to the level of detail required for threat modeling, but also with the perspective that threat modeling requires; i.e. what can go wrong? what must we prevent? what could the bad guys get up to?

UML is simply a means to that end, and was probably chosen largely because that is what most developers are likely to be familiar with, and therefore UML diagrams are more likely to be available than other forms of documentation. There was certainly some debate in the OWASP community about what the best means of decomposition might be.

Personally, I have found IDEF0 a valuable means of decomposing applications while working as a computer auditor. Full details of this technique can be found at http://www.idef.com [9].

It entails decomposing an application using a hierarchical series of diagrams, each of which has between three and six functions. Each function has inputs, which are transformed into outputs, depending on controls and mechanisms.

[Figure: IDEF0]

Is IDEF0 as rigorous and effective as UML? No, I wouldn’t argue that. When using IDEF0 we did not define the application in anything like the detail that UML would entail. Its value was in allowing us to develop a quick understanding of the crucial functions and issues, and then ask pertinent questions.

Given that certain inputs must be transformed into certain outputs, what are the controls and mechanisms required to ensure that the right outputs are produced?

In working out what the controls were, or ought to be, we’d run through the mantra that the output had to be accurate, complete, authorized, and timely. “Accurate” and “complete” are obvious. “Authorized” meant that the output must have been created or approved by people with the appropriate level of authority. “Timely” meant that the output must not only arrive in the right place, but at the right time. One could also use the six CobiT application controls as prompts.

In the example I gave above of the debt being written off I had worked down to the level of detail of “write off a debt” and looked at the controls required to produce the right output, “cancelled debts”. I focused on “authorized”, “complete” and “timely”.

Any sales operator could cancel a debt, but that raised the item for management review. That was fine. The problem was with “complete” and “timely”. All write-offs had to be collected for the control report, which was run daily. Was it possible to ensure some write-offs would not appear? Was it possible to over-key the default of the current date? It was possible. If I did so, would the write-off appear on another report? No. The control failure therefore meant that the control report could be easily bypassed.
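The IDEF0-style questioning described above can be sketched as a small data structure. This is an illustration only, not part of IDEF0 or any tool: the `Function` class, the `ATTRIBUTES` tuple and the example values are all hypothetical. Each function records its inputs, outputs, controls and mechanisms, and any attribute from the “accurate, complete, authorized, timely” mantra that has no recorded control is turned into a question to ask.

```python
from dataclasses import dataclass, field

# The four prompts used when working out what the controls ought to be.
ATTRIBUTES = ("accurate", "complete", "authorized", "timely")


@dataclass
class Function:
    """One box in a decomposition: inputs are transformed into outputs."""
    name: str
    inputs: list
    outputs: list
    controls: dict = field(default_factory=dict)    # attribute -> control identified so far
    mechanisms: list = field(default_factory=list)  # people and systems doing the work

    def open_questions(self):
        """Return the attributes for which no control has been identified."""
        return [a for a in ATTRIBUTES if a not in self.controls]


# Hypothetical record of the write-off function at the start of the review.
write_off = Function(
    name="Write off a debt",
    inputs=["outstanding debt", "operator request"],
    outputs=["cancelled debts"],
    controls={"authorized": "write-off appears on daily report for management review"},
    mechanisms=["sales operator", "sales application"],
)

for attribute in write_off.open_questions():
    print(f"How do we know the output '{write_off.outputs[0]}' is {attribute}?")
```

The value is in the questions rather than the structure. In the write-off example it was the “complete” and “timely” questions that exposed the backdating weakness.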

The testing that I was carrying out had nothing to do with the original requirements. They were of interest, but not really relevant to what I was trying to do. I was trying to think like a dishonest employee, looking for a weakness I could exploit.

The decomposition of the application is the essential first step of threat modeling. Following that, one should analyze the assets for importance, explore possible vulnerabilities and threats, and create mitigation strategies.

I don’t want to discuss these in depth. There is plenty of material about threat modeling available. OWASP offers good guidance, [10] and [11]. Microsoft provides some useful advice [12], but its focus is on technical security, whereas OWASP looks at the business logic too. The OWASP testing guide [7] has a section devoted to business logic that serves as a useful introduction.

OWASP’s inclusion of mitigation strategies in the version of threat modeling that it advocates for testers is interesting. This is not normally a tester’s responsibility. However, considering such strategies is a useful way of planning the testing. What controls or protections should we be testing for? I think it also implicitly acknowledges that the requirements and design may well be flawed, and that threat modeling might not have been carried out in circumstances where it really should have been.

This perception is reinforced by OWASP’s advice that testers should ensure that threat models are created as early as possible in the project, and should then be revisited as the application evolves.

What I think is particularly valuable about the application control advice in CobiT and OWASP is that they help us to focus on security as an attribute that can, and must, be built into applications. Security testing then becomes a normal part of functional testing, as well as a specialist technical exercise. Testers must not regard security as an audit concern, with the testing being carried out by quasi-auditors, external to the development.

Getting the auditors on our side

I’ve had a fairly unusual career in that I’ve spent several years in each of software development, IT audit, IT security management, project management and test management. I think that gives me a good understanding of each of these roles, and a sympathetic understanding of the problems and pressures associated with them. It’s also taught me how they can work together constructively.

In most cases this is obvious, but the odd one out is the IT auditor. They have the reputation of being the hard-nosed suits from head office who come in to bayonet the wounded after a disaster! If that is what they do then they are being unprofessional and irresponsible. Good auditors should be pro-active and constructive. They will be happy to work with developers, users and testers to help them anticipate and prevent problems.

Auditors will not do your job for you, and they will rarely be able to give you all the answers. They usually have to spread themselves thinly across an organization, inevitably concentrating on the areas with problems and which pose the greatest risk.

They should not be dictating the controls, but good auditors can provide useful advice. They can act as a valuable sounding board, for bouncing ideas off. They can also be used as reinforcements if the testers are coming under irresponsible pressure to restrict the scope of security testing. Good auditors should be the friend of testers, not our enemy. At least you may be able to get access to some useful, but expensive, CobiT material.

Auditors can give you a different perspective and help you ask the right questions, and being able to ask the right questions is much more important than any particular tool or method for testers.

This article tells you something about CobiT and OWASP, and about possible new techniques for approaching testing of security. However, I think the most important lesson is that security testing cannot be a completely separate specialism, and that security testing must also include the exploration of the application’s functionality in a skeptical and inquisitive manner, asking the right questions.

Validating the security requirements is important, but so is exposing the unspoken requirements and disproving the invalid assumptions. It is about letting management see what the true state of the application is – just like the rest of testing.

References

[1] British Computer Society’s Special Interest Group in Software Testing (BCS SIGIST) Glossary.

[2] Gartner Inc. “Now Is the Time for Security at the Application Level” (NB PDF download), 2005.

[3] PriceWaterhouseCoopers. “Economic crime- people, culture and controls. The 4th biennial Global Economic Crime Survey”.

[4] US Secret Service. “Insider Threat Study: Illicit Cyber Activity in the Banking and Finance Sector”.

[5] IT Governance Institute. CobiT 4.1, 2007.

[6] IT Governance Institute. CobiT IT Assurance Guide (not free), 2007.

[7] Open Web Application Security Project. OWASP Testing Guide, V3.0, 2008.

[8] Gupta, S. “SOX Compliant Agile Processes”, Agile Alliance Conference, Agile 2008.

[9] IDEF0 Function Modeling Method.

[10] Open Web Application Security Project. OWASP Threat Modeling, 2007.

[11] Open Web Application Security Project. OWASP Code Review Guide “Application Threat Modeling”, 2009.

[12] Microsoft. “Improving Web Application Security: Threats and Countermeasures”, 2003.

Do standards keep testers in the kindergarten? (2009)

This article appeared in the December 2009 edition of Testing Experience magazine, which is no longer published. I’m moving the article onto my blog from my website, which will be decommissioned soon.

Normally when I re-post old articles I provide a warning about them being dated. This one was written in November 2009 but I think that its arguments are still valid. It is only dated in the sense that it doesn’t mention ISO 29119, the current ISO software testing standard, which was released in 2013. This article shows why I was dismayed when ISO 29119 arrived on the scene. I thought that prescriptive testing standards, such as IEEE 829, had had their day. They had failed and we had moved on.

The references in the article were all structured for a paper magazine. There are no hyperlinks and I have not tried to recreate them and check out whether they still work.

The article

Discussion of standards usually starts from the premise that they are intrinsically a good thing, and the debate then moves on to consider what form they should take and how detailed they should be.

Too often sceptics are marginalised. The presumption is that standards are good and beneficial. Those who are opposed to them appear suspect, even unprofessional.

Although the content of standards for software development and testing can be valuable, especially within individual organisations, I do not believe that they should be regarded as generic “standards” for the whole profession. Turning useful guidelines into standards suggests that they should be mandatory.

My particular concern is that the IEEE 829 “Standard for Software and System Test Documentation”, and the many document templates derived from it, encourage a safety first approach to documentation, with testers documenting plans and scripts in slavish detail.

They do so not because the project genuinely requires it, but because they have been encouraged to equate documentation with quality, and they fear that they will look unprofessional and irresponsible in a subsequent review or audit. I think these fears are ungrounded and I will explain why.

A sensible debate about the value of standards must start with a look at what standards are, and the benefits that they bring in general, and specifically to testing.

Often discussion becomes confused because justification for applying standards in one context is transferred to a quite different context without any acknowledgement that the standards and the justification may no longer be relevant in the new context.

Standards can be internal to a particular organisation or they can be external standards attempting to introduce consistency across an industry, country or throughout the world.

I’m not going to discuss legal requirements enforcing minimum standards of safety, such as Health and Safety legislation, or the requirements of the US Food & Drug Administration. That’s the law, and it’s not negotiable.

The justification for technical and product standards is clear. Technical standards introduce consistency, common protocols and terminology. They allow people, services and technology to be connected. Product standards protect consumers and make it easier for them to distinguish cheap, poor quality goods from more expensive but better quality competition.

Standards therefore bring information and mobility to the market and thus have huge economic benefits.

It is difficult to see where standards for software development or testing fit into this. To a limited extent they are technical standards, but only so far as they define the terminology, and that is a somewhat incidental role.

They appear superficially similar to product standards, but software development is not a manufacturing process, and buyers of applications are not in the same position as consumers choosing between rival, inter-changeable products.

Are software development standards more like the standards issued by professional bodies? Again, there’s a superficial resemblance. However, standards such as Generally Accepted Accounting Principles (Generally Accepted Accounting Practice in the UK) are backed up by company law and have a force no-one could dream of applying to software development.

Similarly, standards of professional practice and competence in the professions are strictly enforced and failure to meet these standards is punished.

Where does that leave software development standards? I do believe that they are valuable, but not as standards.

Susan Land gave a good definition and justification for standards in the context of software engineering in her book “Jumpstart CMM-CMMI Software Process Improvements – using IEEE software engineering standards”. [1]

“Standards are consensus-based documents that codify best practice. Consensus-based standards have seven essential attributes that aid in process engineering. They;

  1. Represent the collected experience of others who have been down the same road.
  2. Tell in detail what it means to perform a certain activity.
  3. Help to assure that two parties attach the same meaning to an engineering activity.
  4. Can be attached to or referenced by contracts.
  5. Improve the product.
  6. Protect the business and the buyer.
  7. Increase professional discipline.” (List sequence re-ordered from original).

The first four justifications are for standards in a descriptive form, to aid communication. Standards of this type would have a broader remit than the technical standards I referred to, and they would be guidelines rather than prescriptive. These justifications are not controversial, although the fourth has interesting implications that I will return to later.

The last three justifications hint at compulsion. These are valid justifications, but they are for standards in a prescriptive form and I believe that these justifications should be heavily qualified in the context of testing.

I believe that where testing standards have value they should be advisory, and that the word “standard” is unhelpful. “Standards” implies that they should be mandatory, or that they should at least be considered a level of best practice to which all practitioners should aspire.

Is the idea of “best practice” useful?

I don’t believe that software development standards, specifically the IEEE series, should be mandatory, or that they can be considered best practice. Their value is as guidelines, which would be a far more accurate and constructive term for them.

I do believe that there is a role for mandatory standards in software development. The time-wasting shambles that is created if people don’t follow file naming conventions is just one example. Secure coding standards that tell programmers about security flaws that they must not introduce into their programs are also a good example of standards that should be mandatory.

However, these are local, site-specific standards. They are about consistency, security and good housekeeping, rather than attempting to define an over-arching vision of “best practice”.

Testing standards should be treated as guidelines, practices that experienced practitioners would regard as generally sound and which should be understood and regarded as the default approach by inexperienced staff.

Making these practices mandatory “standards”, as if they were akin to technical or product standards and the best approach in any situation, will never ensure that experienced staff do a better job, and will often ensure they do a worse job than if they’d been allowed to use their own judgement.

Testing consultant Ben Simo has clear views on the notion of best practice. He told me;

“‘Best’ only has meaning in context. And even in a narrow context, what we think is best now may not really be the best.

In practice, ‘best practice’ often seems to be either something that once worked somewhere else, or a technical process required to make a computer system do a task. I like for words to mean something. If it isn’t really best, let’s not call it best.

In my experience, things called best practices are falsifiable as not being best, or even good, in common contexts. I like guidelines that help people do their work. The word ‘guideline’ doesn’t imply a command. Guidelines can help set some parameters around what and how to do work and still give the worker the freedom to deviate from the guidelines when it makes sense.

Rather than tie people’s hands and minds with standards and best practices, I like to use guidelines that help people think and communicate lessons learned – allowing the more experienced to share some of their wisdom with the novices.”

Such views cannot be dismissed as the musings of maverick testers who can’t abide the discipline and order that professional software development and testing require.

Ben is the President of the Association for Software Testing. His comments will be supported by many testers who see how they match their own experience. Also, there has been some interesting academic work that justifies such scepticism about standards. Interestingly, it has not come from orthodox IT academics.

Lloyd Roden drew on the work of the Dreyfus brothers as he presented a powerful argument against the idea of “best practice” at Starwest 2009 and the TestNet Najaarsevent. Hubert Dreyfus is a philosopher and psychologist and Stuart Dreyfus works in the fields of industrial engineering and artificial intelligence.

In 1980 they wrote an influential paper that described how people pass through five levels of competence as they move from novice to expert status, and analysed how rules and guidelines helped them along the way. The five levels of the Dreyfus Model of Skills Acquisition can be summarised as follows.

  1. Novices require rules that can be applied in narrowly defined situations, free of the wider context.
  2. Advanced beginners can work with guidelines that are less rigid than the rules that novices require.
  3. Competent practitioners understand the plan and goals, and can evaluate alternative ways to reach the goal.
  4. Proficient practitioners have sufficient experience to foresee the likely result of different approaches and can predict what is likely to be the best outcome.
  5. Experts can intuitively see the best approach. Their vast experience and skill mean that rules and guidelines have no practical value.

For novices the context of the problem presents potentially confusing complications. Rules provide clarity. For experts, understanding the context is crucial and rules are at best an irrelevant hindrance.

Roden argued that we should challenge any references to “best practices”. We should talk about good practices instead, and know when and when not to apply them. He argued that imposing “best practice” on experienced professionals stifles creativity, frustrates the best people and can prompt them to leave.

However, the problem is not simply a matter of “rules for beginners, no rules for experts”. Rules can have unintended consequences, even for beginners.

Chris Atherton, a senior lecturer in psychology at the University of Central Lancashire, made an interesting point in a general, anecdotal discussion about the ways in which learners relate to rules.

“The trouble with rules is that people cling to them for reassurance, and what was originally intended as a guideline quickly becomes a noose.

The issue of rules being constrictive or restrictive to experienced professionals is a really interesting one, because I also see it at the opposite end of the scale, among beginners.

Obviously the key difference is that beginners do need some kind of structural scaffold or support; but I think we often fail to acknowledge that the nature of that early support can seriously constrain the possibilities apparent to a beginner, and restrict their later development.”

The issue of whether rules can hinder the development of beginners has significant implications for the way our profession structures its processes. Looking back at work I did at the turn of the decade improving testing processes for an organisation that was aiming for CMMI level 3, I worry about the effect it had.

Independent professional testing was a novelty for this client and the testers were very inexperienced. We did the job to the best of our ability at the time, and our processes were certainly considered best practice by my employers and the client.

The trouble is that people can learn, change and grow faster than strict processes adapt. A year later and I’d have done it better. Two years later, it would have been different and better, and so on.

Meanwhile, the testers would have been gaining in experience and confidence, but the processes I left behind were set in tablets of stone.

As Ben Simo put it; “if an organisation is at a level less than the intent of level 5, CMM seems to often lock in ignorance that existed when the process was created”.

CMMI has its merits but also has dangers. Continuous process improvement is at its heart, but these are incremental advances and refinements in response to analysis of metrics.

Step changes or significant changes in response to a new problem don’t fit comfortably with that approach. Beginners advance from the first stage of the Dreyfus Model, but the context they come to know and accept is one of rigid processes and rules.

Rules, mandatory standards and inflexible processes can hinder the development of beginners. Rigid standards don’t promote quality. They can have the opposite effect if they keep testers in the kindergarten.

IEEE829 and the motivation behind documentation

One could argue that standards do not have to be mandatory. Software developers are pragmatic, and understand when standards should be mandatory and when they should be discretionary. That is true, but the problem is that the word “standards” strongly implies compulsion. That is the interpretation that most outsiders would place on the word.

People do act on the assumption that the standard should be mandatory, and then regard non-compliance as a failure, deviation or problem. These people include accountants and lawyers, and perhaps most significantly, auditors.

My particular concern is the effect of the IEEE 829 testing documentation standard. I wonder if much more than 1% of testers have ever seen a copy of the standard. However, much of its content is very familiar, and its influence is pervasive.

IEEE 829 is a good document with much valuable material in it. It has excellent templates, which provide great examples of how to meticulously document a project.

Or at least they’re great examples of meticulous documentation if that is the right approach for the project. That of course is the question that has to be asked. What is the right approach? Too often the existence of a detailed documentation standard is taken as sufficient justification for detailed documentation.

I’m going to run through two objections to detailed documentation. They are related, but one refers to design and the other to testing. It could be argued that both have their roots in psychology as much as IT.

I believe that the fixation of many projects on documentation, and the highly dubious assumption that quality and planning are synonymous with detailed documentation, have their roots in the structured methods that dominated software development for so long.

These methods were built on the assumption that software development was an engineering discipline, rather than a creative process, and that greater quality and certainty in the development process could be achieved only through engineering style rigour and structure.

Paul Ward, one of the leading developers of structured methods, wrote a series of articles [2] on the history of structured methods, which admitted that they were neither based on empirical research nor subjected to peer-review.

Two other proponents of structured methods, Larry Constantine and Ed Yourdon, admitted that the early investigations were no more than informal “noon-hour” critiques [3].

Fitzgerald, Russo and Stolterman gave a brief history of structured methods in their book “Information Systems Development – Methods in Action” [4] and concluded that “the authors relied on intuition rather than real-world experience that the techniques would work”.

One of the main problem areas for structured methods was the leap from the requirements to the design. Fitzgerald et al wrote that “the creation of hierarchical structure charts from data flow diagrams is poorly defined, thus causing the design to be loosely coupled to the results of the analysis. Coad & Yourdon [5] label this shift as a ‘Grand Canyon’ due to its fundamental discontinuity.”

The solution to this discontinuity, according to the advocates of structured methods, was an avalanche of documentation to help analysts to crawl carefully from the current physical system, through the current logical system to a future logical system and finally a future physical system.

Not surprisingly, given the massive documentation overhead, and developers’ propensity to pragmatically tailor and trim formal methods, this full process was seldom followed. What was actually done was more informal, intuitive, and opaque to outsiders.

An interesting strand of research was pursued by Human Computer Interface academics such as Curtis, Iscoe and Krasner [6], and Robbins, Hilbert and Redmiles [7].

They attempted to identify the mental processes followed by successful software designers when building designs. Their conclusion was that they did so using a high-speed, iterative process; repeatedly building, proving and refining mental simulations of how the system might work.

Unsuccessful designers couldn’t conceive working simulations, and fixed on designs whose effectiveness they couldn’t test till they’d been built.

Curtis et al wrote;

“Exceptional designers were extremely familiar with the application domain. Their crucial contribution was their ability to map between the behavior required of the application system and the computational structures that implemented this behavior.

In particular, they envisioned how the design would generate the system behavior customers expected, even under exceptional circumstances.”

Robbins et al stressed the importance of iteration;

“The cognitive theory of reflection-in-action observes that designers of complex systems do not conceive a design fully-formed. Instead, they must construct a partial design, evaluate, reflect on, and revise it, until they are ready to extend it further.”

The eminent US software pioneer Robert Glass discussed these studies in his book “Software Conflict 2.0” [8] and observed that;

“people who are not very good at design … tend to build representations of a design rather than models; they are then unable to perform simulation runs; and the result is they invent and are stuck with inadequate design solutions.”

These studies fatally undermine the argument that linear and documentation-driven processes are necessary for a quality product and that more flexible, lightweight documentation approaches are irresponsible.

Flexibility and intuition are vital to developers. Heavyweight documentation can waste time and suffocate staff if used when there is no need.

Ironically, it was the heavyweight approach that was founded on guesswork and intuition, and the lightweight approach that has sound conceptual underpinnings.

The lessons of the HCI academics have obvious implications for exploratory testing, which again is rooted in psychology as much as in IT. In particular, the finding by Curtis et al that “exceptional designers were extremely familiar with the application domain” takes us to the heart of exploratory testing.

What matters is not extensive documentation of test plans and scripts, but deep knowledge of the application. These need not be mutually exclusive, but on high-pressure, time-constrained projects it can be hard to do both.

Itkonen, Mäntylä and Lassenius conducted a fascinating experiment at the University of Helsinki in 2007 in which they tried to compare the effectiveness of exploratory testing and test case based testing. [9]

Their findings were that test case testing was no more effective than exploratory testing in finding defects. The defects were a mixture of native defects in the application and defects seeded by the researchers. Defects were categorised according to the ease with which they could be found. Defects were also assigned to one of eight defect types (performance, usability etc.).

Exploratory testing scored better for defects at all four levels of “ease of detection”, and in 6 out of the 8 defect type categories. The differences were not considered statistically significant, but it is interesting that exploratory testing had the slight edge given that conventional wisdom for many years was that heavily documented scripting was essential for effective testing.

However, the really significant finding, which the researchers surprisingly did not make great play of, was that the exploratory testing results were achieved with 18% of the effort of the test case testing.

The exploratory testing required 1.5 hours per tester, and the test case testing required an average of 8.5 hours (7 hours preparation and 1.5 hours testing).
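The 18% figure follows directly from those hours. As a quick check, here is a minimal sketch of the arithmetic; the function and constant names are mine, purely for illustration, and the only inputs are the hours quoted above.

```python
# Hours per tester, as quoted above from the Helsinki experiment.
EXPLORATORY_HOURS = 1.5      # exploratory testing: the testing session only
TEST_CASE_PREP_HOURS = 7.0   # test case testing: preparing the test cases
TEST_CASE_RUN_HOURS = 1.5    # test case testing: executing the test cases


def effort_ratio(exploratory: float, prep: float, run: float) -> float:
    """Exploratory effort as a fraction of the total test case based effort.

    Illustrative helper, not taken from the study itself.
    """
    return exploratory / (prep + run)


ratio = effort_ratio(EXPLORATORY_HOURS, TEST_CASE_PREP_HOURS, TEST_CASE_RUN_HOURS)
print(f"Exploratory effort relative to test case testing: {ratio:.1%}")
```

That is 1.5 hours against 8.5 hours, which rounds to the 18% quoted. Almost all of the difference is the 7 hours of preparation, since the time spent actually testing was the same in both cases.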

It is possible to criticise the methods of the researchers, particularly their use of students taking a course in software testing, rather than professionals experienced in applying the techniques they were using.

However, exploratory testing has often been presumed to be suitable only for experienced testers, with scripted, test case based testing being more appropriate for the less experienced.

The methods followed by the Helsinki researchers might have been expected to bias the results in favour of test case testing. Therefore, the finding that exploratory testing is at least as effective as test case testing with a fraction of the effort should make proponents of heavily documented test planning pause to reconsider whether it is always appropriate.

Documentation per se does not produce quality. Quality is not necessarily dependent on documentation. Sometimes they can be in conflict.

Firstly, the emphasis on producing the documentation can be a major distraction for test managers. Most of their effort goes into producing, refining and updating plans that often bear little relation to reality.

Meanwhile the team are working hard firming up detailed test cases based on an imperfect and possibly outdated understanding of the application. While the application is undergoing the early stages of testing, with consequent fixes and changes, detailed test plans for the later stages are being built on shifting sand.

You may think that is being too cynical and negative, and that testers will be able to produce useful test cases based on a correct understanding of the system as it is supposed to be delivered to the testing stage in question. However, even if that is so, the Helsinki study shows that this is not a necessary condition for effective testing.

Further, if similar results can be achieved with less than 20% of the effort, how much more could be achieved if the testers were freed from the documentation drudgery in order to carry out more imaginative and proactive testing during the earlier stages of development?

Susan Land’s fourth justification for standards (see start of article) has interesting implications.

Standards “can be attached to or referenced by contracts”. That is certainly true. However, the danger of detailed templates in the form of a standard is that organisations tailor their development practices to the templates rather than the other way round.

If the lawyers fasten onto the standard and write its content into the contract then documentation can become an end and not just a means to an end.

Documentation becomes a “deliverable”. The dreaded phrase “work product” is used, as if the documentation output is a product of similar value to the software.

In truth, sometimes it is more valuable if the payments are staged under the terms of the contract, and dependent on the production of satisfactory documentation.

I have seen triumphant announcements of “success” following approval of “work products” with the consequent release of payment to the supplier when I have known the underlying project to be in a state of chaos.

Formal, traditional methods attempt to represent a highly complex, even chaotic, process in a defined, repeatable model. These methods often bear only vague similarities to what developers have to do to craft applications.

The end product is usually poor quality, late and over budget. Any review of the development will find constant deviations from the mandated method.

The suppliers, and defenders, of the method can then breathe a sigh of relief. The sacred method was not followed. It was the team’s fault. If only they’d done it by the book! The possibility that the developers’ and testers’ apparent sins were the only reason anything was produced at all is never considered.

What about the auditors?

Adopting standards like IEEE 829 without sufficient thought causes real problems. If the standard doesn’t reflect what really has to be done to bring the project to a successful conclusion then mandated tasks or documents may be ignored or skimped on, with the result that a subsequent review or audit reports on a failure to comply.

An alternative danger is that testers do comply when there is no need, and put too much effort into the wrong things. Often testers arrive late on the project. Sometimes the emphasis is on catching up with plans and documentation that are of dubious value, and are not an effective use of the limited resources and time.

However, if the contract requires it, or if there is a fear of the consequences of an audit, then it could be rational to assign valuable staff to unproductive tasks.

Sadly, auditors are often portrayed as corporate bogey-men. It is assumed that they will conduct audits by following ticklists, with simplistic questions that require yes/no answers. “Have you done x to y, yes or no?”

If the auditees started answering “No, but …” they would be cut off with “So, it’s no”.

I have seen that style of auditing. It is unprofessional, and organisations that tolerate it have deeper problems than unskilled, poorly trained auditors. It is senior management that creates the environment in which the ticklist approach thrives. However, I don’t believe it is common. Unfortunately people often assume that this style of auditing is the norm.

IT audit is an interesting example of a job that looks extremely easy at first sight, but is actually very difficult when you get into it.

It is very easy for an inexperienced auditor to do what appears to be a decent job. At least it looks competent to everyone except experienced auditors and those who really understand the area under review.

If auditors are to add value they have to be able to use their judgement, and that has to be based on their own skills and experience as well as formal standards.

They have to be able to analyse a situation and evaluate whether the risks have been identified and whether the controls are appropriate to the level of risk.

It is very difficult to find the right line and you need good experienced auditors to do that. I believe that ideally IT auditors should come from an IT background so that they do understand what is going on; poachers turned gamekeepers if you like.

Too often testers assume that they know what auditors expect, and they do not speak directly to the auditors or check exactly what professional auditing consists of.

They assume that auditors expect to see detailed documentation of every stage, without consideration of whether it truly adds value, promotes quality or helps to manage the risk.

Professional auditors take a constructive and pragmatic approach and can help testers. I want to help testers understand that. When I worked as an IT auditor I found it frustrating that people had wasted time on unnecessary and unhelpful actions on the assumption that “the auditors require it”.

Kanwal Mookhey, an IT auditor and founder of NII Consulting, wrote an interesting article for the Internal Auditor magazine of May 2008 [10] about auditing IT project management.

He described the checking that auditors should carry out at each stage of a project. He made no mention of the need to see documentation of detailed test plans and scripts whereas he did emphasize the need for early testing.

Kanwal told me;

“I would agree that auditors are – or should be – more inclined to see comprehensive testing, rather than comprehensive test documentation.

Documentation of test results is another matter of course. As an auditor, I would be more keen to know that a broad-based testing manual exists, and that for the system in question, key risks and controls identified during the design phase have been tested for. The test results would provide a higher degree of assurance than exhaustive test plans.”

One of the most significant developments in the field of IT governance in the last few decades has been the US 2002 Sarbanes-Oxley Act, which imposed new standards of reporting, auditing and control for US companies. It has had massive worldwide influence because it applies to the foreign subsidiaries of US companies, and foreign companies that are listed on the US stock exchanges.

The act attracted considerable criticism for the additional overheads it imposed on companies, duplicating existing controls and imposing new ones of dubious value.

Unfortunately, the response to Sarbanes-Oxley verged on the hysterical, with companies, and regrettably some auditors, reading more into the legislation than a calmer reading could justify. The assumption was that every process and activity should be tied down and documented in great detail.

However, not even Sarbanes-Oxley, supposedly the sacred text of extreme documentation, requires detailed documentation of test plans or scripts. That may be how some people misinterpret the act, but such documentation is neither mandated by the act nor recommended in the guidance documents issued by the Institute of Internal Auditors [11] and the Information Systems Audit & Control Association [12].

If anyone tries to justify extensive documentation by telling you that “the auditors will expect it”, call their bluff. Go and speak to the auditors. Explain that what you are doing is planned, responsible and will have sufficient documentation of the test results.

Documentation is never required “for the auditors”. If it is required it is because it is needed to manage the project, or it is a requirement of the project that has to be justified like any other requirement. That is certainly true of safety critical applications, or applications related to pharmaceutical development and manufacture. It is not true in all cases.

IEEE 829 and other standards do have value, but in my opinion their value is not as standards! They do contain some good advice and the fruits of vast experience. However, they should be guidelines to help the inexperienced, and memory joggers for the more experienced.

I hope this article has made people think about whether mandatory standards are appropriate for software development and testing, and whether detailed documentation in the style of IEEE 829 is always needed. I hope that I have provided some arguments and evidence that will help testers persuade others of the need to give testers the freedom to leave the kindergarten and grow as professionals.

References

[1] Land, S. (2005). “Jumpstart CMM-CMMI Software Process Improvements – using IEEE software engineering standards”, Wiley.

[2a] Ward, P. (1991). “The evolution of structured analysis: Part 1 – the early years”. American Programmer, vol 4, issue 11, 1991. pp4-16.

[2b] Ward, P. (1992). “The evolution of structured analysis: Part 2 – maturity and its problems”. American Programmer, vol 5, issue 4, 1992. pp18-29.

[2c] Ward, P. (1992). “The evolution of structured analysis: Part 3 – spin offs, mergers and acquisitions”. American Programmer, vol 5, issue 9, 1992. pp41-53.

[3] Yourdon, E., Constantine, L. (1977). “Structured Design”. Yourdon Press, New York.

[4] Fitzgerald, B., Russo, N., Stolterman, E. (2002). “Information Systems Development – Methods in Action”, McGraw Hill.

[5] Coad, P., Yourdon, E. (1991). “Object-Oriented Analysis”, 2nd edition. Yourdon Press.

[6] Curtis, B., Iscoe, N., Krasner, H. (1988). “A field study of the software design process for large systems”. Communications of the ACM, Volume 31, Issue 11 (November 1988), pp1268-1287.

[7] Robbins, J., Hilbert, D., Redmiles, D. (1998). “Extending Design Environments to Software Architecture Design”. Automated Software Engineering, Vol. 5, No. 3, July 1998, pp261-290.

[8] Glass, R. (2006). “Software Conflict 2.0: The Art and Science of Software Engineering”, Developer Dot Star Books.

[9a] Itkonen, J., Mäntylä, M., Lassenius, C. (2007). “Defect detection efficiency – test case based vs exploratory testing”. First International Symposium on Empirical Software Engineering and Measurement.

[9b] Itkonen, J. (2008). “Do test cases really matter? An experiment comparing test case based and exploratory testing”.

[10] Mookhey, K. (2008). “Auditing IT Project Management”. Internal Auditor, May 2008, the Institute of Internal Auditors.

[11] The Institute of Internal Auditors (2008). “Sarbanes-Oxley Section 404: A Guide for Management by Internal Controls Practitioners”.

[12] Information Systems Audit and Control Association (2006). “IT Control Objectives for Sarbanes-Oxley 2nd Edition”.