The mysterious role of external audit in the Post Office Scandal


Introduction


Over the last few years when I have written about the Post Office Scandal I have concentrated on the failings of the Post Office and Fujitsu to manage software development and the IT service responsibly, and on the abject failure of the Post Office’s corporate governance. I have made only passing mention of the role of the external auditors for almost the full duration of the scandal. This has been for two linked reasons.

Firstly, the prime culprits are the Post Office and Fujitsu; they have not yet been held accountable in any meaningful sense. The legal and political establishment are also partly responsible, and there has been little public scrutiny of their failings. It is right that campaigners and writers keep the focus on them.

Secondly, I have cast the external auditors as secondary villains largely because of the limits of their role, and certainly not because of exemplary conduct. Their duty was to assess the truth and fairness of the Post Office’s corporate accounts. Should they have uncovered the problems with Horizon, or at least raised sufficient concern to ensure that these problems would be investigated? I believe the answer is yes, and that they probably failed to do so. However, the failure is largely a result of the flawed and dated business model for external audit – certainly at the level of large, complex corporations. Many smaller audit firms do a good job. The Big 4 audit firms, i.e. EY, PricewaterhouseCoopers (PwC), KPMG and Deloitte, do not. I had no great wish to dive into the complex swamp of the external audit industry. That time has now come!

Statistical nonsense and the purpose of Horizon

Ernst & Young (EY) are one of the Big 4. They audited the Royal Mail from 1986, throughout the development and implementation of Horizon, and through most of the cover-up. EY continued to audit the Post Office after it split off from Royal Mail in 2012. In 2018 they were finally replaced by PwC, one of their Big 4 rivals.

Even if EY did fail I doubt if any of the Big 4 would have done any better. In this piece I will touch only on those factors relevant to the Post Office Scandal.

I was prompted to address the issue of external audit by an exchange on Twitter. Tim McCormack posted a screenshot of a quote by Warwick Tatford, the prosecuting barrister in Seema Misra’s 2010 trial. He offered this argument in his closing statement [PDF, opens in new tab].

“I conceded in my opening speech to you that no computer system is infallible. There are computer glitches with any system. Of course there are. But Horizon is clearly a robust system, used at the time we are concerned with in 14,000 post offices. Mr Bayfield (a Post Office manager) talked about 14 million transactions a day. It has got to work, has it not, otherwise the whole Post Office would fall apart? So there may be glitches. There may be serious glitches. That is perfectly possible as a theoretical possibility, but as a whole the system works as has been shown in practice.”

I tweeted about this argument, calling it a confused mess. It should have been torn apart by a well briefed defence counsel.

Warwick Tatford was guilty of a version of the Prosecutor’s Fallacy, and it is appalling that it took more than 10 years for the legal establishment to realise that Seema Misra’s prosecution was unsafe.

The odds against winning the UK National Lottery jackpot are about 45 million to 1. Tatford’s logic would mean that if someone claims that the £10 million they’ve suddenly acquired came from a lottery win they are obviously lying, aren’t they? Obviously not. The probability of a particular individual being lucky might be extremely low, but the probability of a winner emerging somewhere in a huge population approaches certainty.

The prosecutor’s reasoning was like arguing that because a run of 10 heads from 10 successive coin tosses is extremely unlikely (a probability of 1 in 1,024), we can dismiss the possibility of it happening if we get 1,000 people each to perform 10 tosses, and that any participant who does report a run of 10 heads is, per se, a dishonest witness. In fact the odds would be almost 2 to 1 on (about 62.4%) that it would happen to someone in that group. Even if Horizon performed reliably at most branches that does not mean courts can dismiss the possibility that there were serious errors at some branches.
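To make the arithmetic concrete, here is a minimal Python sketch of that calculation, using the same figures as above. There is nothing Horizon-specific about it; it is just the standard probability calculation.

```python
# Chance of a single participant tossing 10 heads in a row
p_single = 0.5 ** 10                       # 1/1024, roughly 0.1%

# Chance that at least one of 1,000 participants does so
n = 1000
p_at_least_one = 1 - (1 - p_single) ** n   # roughly 0.624

print(f"One participant: 1 in {round(1 / p_single)}")
print(f"At least one of {n} participants: {p_at_least_one:.1%}")
```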

The prosecutor was arguing that if Horizon worked “as a whole”, despite bugs, then evidence from the system was sufficiently reliable to convict, even without corroboration. Crucially, the prosecutor confused different purposes of Horizon:

A- a source for the corporate accounts,
B- the Post Office managing branches,
C- subpostmasters managing their own branches,
D- a source of criminal evidence.

Bugs could render Horizon hopelessly unfit for purposes C and D, even B, while it remained adequate for A. External auditors were concerned only with Horizon’s reliability for A. Maybe they suspected correctly that it was inadequate for purposes B, C, and D, but errors at branch level were no more significant than rounding errors; they were “immaterial” to external auditors.

The flawed industry model for external audit means that these auditors have an incentive not to see errors they do not have a direct responsibility to detect and analyse. Such errors were unquestionably within the remit of Post Office Internal Audit, but they slept through the whole scandal. [I wrote about the abysmal failure of Post Office Internal Audit at some length last year in the Digital Evidence and Electronic Signature Law Review.]

An interesting response to my tweet

Twitter threads are often over-simplified and lack nuance. Peter Crowley disagreed with my observations about Horizon’s role in feeding the corporate accounts.

In explaining why I was wrong Peter Crowley offered an argument with which I largely agree, but which does not refute my point.

“No, it’s not adequate for A. A large volume of microfails is a macrofail, full stop. Just because some SPMs are wrongly overpaid, some wrongly underpaid, and the differences net off does NOT make the system adequate, unless you assess the failing and reserve for compensation.”

In the case of Horizon a large number of minor errors may well have amounted to a “macrofail”, a failure of the system to comply with its business purpose of providing data to the corporate accounts. I don’t know. I strongly suspect that Peter Crowley is correct, but I have not seen the matter addressed explicitly so I cannot be certain.

This is not a question to which I have given much attention up till now, for the reasons I gave above, and in my tweets. Even if Horizon was adequate for the Post Office’s published financial accounts that tells us nothing about its suitability for its other vital purposes – the purposes that were being assessed in court.

The prosecution barrister was almost certainly being naive, as his other comments in the trial suggest, rather than attempting to mislead the jury. It is absurd to argue that if a system is fit for one particular purpose it can safely be considered reliable for all purposes. One has to be lamentably ignorant about complex financial systems to utter such nonsense. Sadly such naive ignorance is common.

Nevertheless, Peter Crowley did home in on an important point with significant implications. I did not want to address these in my tweet thread and they deserve a far more considered response than I can provide on Twitter.

What was the significance of low level errors in Horizon?

Horizon might, maybe, have been okay for producing the corporate accounts if the errors at individual branches did not distort the financial reports issued to investors and the public. That is possible. However, what would be the basis for such confidence if we don’t understand the full impact of the low level errors?

There was no system audit. Post Office Internal Audit was invisible. The only reason to believe in Horizon was the stream of assertions without supporting evidence from the Post Office and Fujitsu – from executives who quite clearly did not want to see problems.

External auditors’ role is to vouch for the overall truth and fairness of the accounts, not guarantee that they are 100% accurate and free of error. They calculate the materiality figure for the audit. This is the threshold beyond which any errors “could reasonably be expected to influence the economic decisions of users taken on the basis of the financial statements” (International Standard on Auditing, UK 320 [PDF, opens in new tab], “Materiality in Planning and Performing an Audit”).

If the materiality figure is £50,000 then any errors at that level and above are material. Any that are below are not material. The threshold is a mixture of the quantitative and qualitative. The auditors have to assess the numerical amount and the possible impact in different contexts.

External auditors cannot check every transaction to satisfy themselves that the accounts give a true and fair picture. They sample their way through the accounts using a chosen sample size, and a sampling interval that will give them the number of items they want. The higher the interval the fewer transactions are inspected.

The size of the sample and the sampling interval depend on the auditors’ assessment of the materiality threshold for the corporation, the risks assessed by the auditors, and the quality of control that is being exercised by management. The greater the confidence external auditors have in management of risks and the control regime the more relaxed they can be about the possibility that their sampling might miss errors. They can then justify a smaller sample that they will check. The less confidence they have the greater the sample size must be, and the more work they have to do.

Once the auditors have assessed risk management and internal controls they perform a simple arithmetical calculation that can be crudely described as the materiality threshold divided by a factor reflecting their confidence in the management regime (the less confidence they have, the larger the factor, and so the smaller the interval and the larger the sample). This gives the sampling interval, assuming they are sampling based on monetary units rather than physical items or accounts. For example, if the interval is £50,000 then they will guarantee to hit and examine every transaction at that level and above. A transaction of £5,000 would have a 10% chance of being selected.
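For readers who want to see the mechanics, here is a simplified Python sketch of monetary unit sampling. It illustrates the principle described above with invented figures; it is not EY’s actual methodology or tooling.

```python
import random

def monetary_unit_sample(transactions, interval, seed=None):
    """Select items for testing by stepping through the cumulative monetary
    total one sampling interval at a time. Any item at least as large as the
    interval is guaranteed to be selected; a smaller item is selected with
    probability value / interval."""
    rng = random.Random(seed)
    next_hit = rng.uniform(0, interval)   # random start within the first interval
    cumulative = 0.0
    selected = []
    for value in transactions:
        cumulative += value
        if next_hit <= cumulative:        # this item covers a sampling hit point
            selected.append(value)
            while next_hit <= cumulative: # skip any further hit points inside it
                next_hit += interval
    return selected

# With an interval of £50,000 the £120,000 item is always selected,
# while the £5,000 item is selected about 10% of the time over repeated runs.
print(monetary_unit_sample([120_000, 5_000, 800, 49_999], interval=50_000, seed=1))
```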

In addition to the overall materiality threshold they have to calculate a further threshold for particular types of transactions, or types of account. This figure, the performance materiality, reflects the possibility of smaller errors in specific contexts accumulating to an error that would exceed the corporate materiality threshold.

This takes us straight back to Peter Crowley’s point about Horizon branch errors undermining confidence that the overall system could be adequate for the corporate financial accounts; “A large volume of microfails is a macrofail, full stop”. In this case that is probably true, although I don’t think the full stop is appropriate. We don’t know, and the external auditors must take the blame for that.
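A toy calculation shows why the netting off of errors offers no comfort. The figures below are invented purely for illustration; they are not Post Office numbers.

```python
# Invented figures, for illustration only - not Post Office data.
corporate_materiality = 50_000     # £
performance_materiality = 37_500   # £, deliberately set below the corporate figure

# 200 branch-level errors, none of them remotely material on its own
branch_errors = [150, -300, 1_200, 75, -450] * 40

net_error = sum(branch_errors)                   # over- and under-payments net off
gross_error = sum(abs(e) for e in branch_errors)

print(f"Net misstatement:   £{net_error:,}")     # £27,000 - looks comfortably immaterial
print(f"Gross misstatement: £{gross_error:,}")   # £87,000 - exceeds both thresholds
```

The net figure looks reassuring only because wrongly overpaid and wrongly underpaid branches cancel each other out; the gross figure is what tells you whether the system is under control.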

Question time for EY?

There are several awkward questions the external auditors should answer at the Williams Inquiry. There are many valid explanations for why we might not know what is happening in a complex system, but a failure to ask pertinent questions is not one of them.

Did EY ask the right questions?

So did EY themselves ask challenging, relevant questions and perform appropriate substantive tests to establish whether Horizon’s bugs, which might have been individually immaterial, combined to compromise the integrity of the data provided to the corporate accounts?

What about “performance materiality”?

Did EY assign a lower level of performance materiality to the branch accounts because of the Horizon branch errors? If not, why not? If so, what was that threshold? What was the basis for the calculation? What reason did EY have to believe that the accumulation of immaterial branch errors did not add up to material error at the corporate level?

The forensic accounting firm Second Sight, which was commissioned to investigate Horizon, reported in 2015 that:

“22.11 … for most of the past five years, substantial credits have been made to Post Office’s Profit and Loss Account as a result of unreconciled balances held by Post Office in its Suspense Account.

22.12. It is, in our view, probable that some of those entries should have been re-credited to branches to offset losses previously charged.”

Were EY aware of the lack of control indicated by a failure to investigate and reconcile suspense account entries? Did they consider whether this had implications for their performance materiality assessment? Poor management of suspense accounts is a classic warning sign of possible misstatement in the accounts and also of fraud.

What about the superusers?

Did the lack of control that EY reported [PDF, opens in new tab] in 2011 over superusers, which had still not been fully addressed by 2015, influence their assessment of the potential for financial misstatement arising from Horizon, and the calculation of performance materiality? If not, why not? These superusers were user accounts with powerful privileges that allowed them to amend production data.

What about the risk of unauthorised and untested changes to Horizon?

EY also reported in 2011 that the Post Office and Fujitsu did not apply normal, basic management controls over the testing and release of system changes.

“We noted that POL (Post Office Limited) are not usually involved in testing fixes or maintenance changes to the in-scope applications; we were unable to identify an internal control with the third party service provider (Fujitsu) to authorize fixes and maintenance changes prior to development for the in-scope applications. There is an increased risk that unauthorised and inappropriate changes are deployed if they are not adequately authorised, tested and approved prior to migration to the production environment.”

How could EY place trust in a system that was being managed in such an amateurish, shambolic manner?

The lack of control over superusers and dreadful management of changes to Horizon were reported in EY’s 2011 management letter setting out “internal control matters” to the Post Office board and senior management. Eleanor Shaikh made a Freedom of Information request for copies of EY’s management letters from preceding years. These would have been extremely interesting, but the Post Office refused to divulge them on the unconvincing grounds that it was a “vexatious request”. “Embarrassing request” might have been more accurate.

Reserving for compensation and the Post Office as a “going concern”

Given the crucial role of Horizon in providing the only evidence for large numbers of prosecutions, and the widespread public concern about the reliability of Horizon and the accuracy of the evidence, did EY consider whether the Royal Mail or Post Office should provide reserves for compensation (as suggested by Peter Crowley)?

Did EY assess whether the possibility of compensation might be a “going concern” issue? The evidence suggests that the external auditors either did not consider this issue at all, or chose to assume, as their successors PwC explicitly did, that even if compensation were to leave the corporation insolvent it could always draw on unlimited government support and therefore would always “be able to meet its liabilities as they fall due” (the usual phrasing to acknowledge that a company is a going concern). As the sole shareholder, for whose benefit external audit was produced, the government had a duty to pay attention to that risk.

If EY had no concerns about the possibility of compensation we are entitled to infer they had full confidence in Horizon. That only raises the question of the basis for that confidence.

The Ismay Report and the lack of any system audit

So what was the basis for EY’s confidence in Horizon? We know from the Post Office’s internal Ismay Report [PDF, opens in new tab] in 2010 that EY had not performed a system audit. This report, entitled “Response to Challenges Regarding Systems Integrity”, makes no mention of any system audit, whether performed by internal or external audit.

“Ernst & Young and Deloittes are both aware of the issue from the media and we have discussed the pros and cons of reports with them. Both would propose significant caveats and would have limits on their ability to stand in court, therefore we have not pursued this further.”

What were the caveats that EY insisted on if they were to conduct a system audit, and what were the limits they put on providing evidence in court?

That quote from the Ismay Report reveals that EY were familiar with the public concern about Horizon and discussed this with the Post Office. The whole report also shows that EY must have known there had been no credible independent system audit. A further quote from Ismay is interesting.

“The external audit that EY perform does include tests of POL’s IT and finance control environment but the audit scope and materiality mean that EY would not give a specific opinion on the systems from this.”

This makes it clear that in 2010 EY were not in any position to offer an opinion on Horizon. The Post Office therefore knew that EY’s audits did not offer an opinion at system level, and EY knew that there had been no appropriate audit at that level. The question that must be asked repeatedly is – what was the basis for everyone’s confidence in Horizon, other than wishful thinking?

An embarrassing public intervention from the accountancy profession and the trade press

Ten months before the Ismay Report was issued, in October 2009, Accountancy Age reported on the mounting concerns that Horizon had flaws which misstated branch accounts. The article quoted a senior representative of the Institute of Chartered Accountants in England and Wales, and it stated that the newspaper asked the Post Office whether it would perform an IT audit of Horizon. Richard Anning, head of the ICAEW IT faculty, said:

“You need to make sure that your accounting system is bullet proof.

Whether they have an IT audit or not, they need to understand what was happening.”

Accountancy Age reported that the Post Office declined to comment when asked if it would undertake a system audit. The magazine also attached an editorial comment to the story:

“The Post Office should consider an IT audit to show it has taken the matter seriously. Although it may be small sums of money involved, perception is everything and it could not consider going back into bank services with an accounting system that had doubts attached to it.”

Such a call from the ICAEW, and in the trade press, must have caused serious, high level discussion within the Post Office, the wider Royal Mail, and within EY. Yet 10 months later, after these public and authoritative calls for the Post Office to remove doubts about Horizon, the corporation issued a report saying that after discussions with EY there would be no system audit. The reasoning was that commissioning professional scrutiny of Horizon would suggest that the Post Office itself shared the widespread doubts.

“It is also important to be crystal clear about any review if one were commissioned – any investigation would need to be disclosed in court. Although we would be doing the review to comfort others, any perception that POL doubts its own systems would mean that all criminal prosecutions would have to be stayed. It would also beg a question for the Court of Appeal over past prosecutions and imprisonments.”

Did EY press for an appropriate system audit following the 2009 call from the ICAEW and Accountancy Age? If not, why not? What was EY’s response to that lack of action?

How did EY plan and conduct subsequent external audits in the knowledge that Horizon had not been audited rigorously, that the Post Office was determined not to allow any system audit, and that the ICAEW had expressed concern? What difference did this knowledge make? If EY carried on as before how can they justify that?

What did EY think about Post Office internal control?

How confident was EY in the Royal Mail’s and Post Office’s internal control regime, and specifically Internal Audit, in the light of the feeble response to the superuser problem, the pitifully inadequate change management practices, and the lack of any credible system audit?

After working in a highly competent financial services internal audit department I know how humiliating it would be for internal auditors if their external counterparts highlighted basic control failings and then reported four years later that problems had not been resolved. This would be an obvious sign that the internal auditors were ineffective. The fact that Fujitsu managed the IT service does not let anyone at the Post Office off the hook. The systems belonged to the Post Office, who managed the contract, monitored the service, and carried ultimate responsibility.

The level of confidence that external auditors have in internal control is a crucial factor in planning audits. This issue, and the question of whether EY applied strict performance materiality thresholds to Horizon’s branch accounts, contribute to the wider concern about the effectiveness of the external audit business model.

External audit – a broken business model

The assessments of materiality, and of performance materiality have a direct impact on the profitability of an audit. The more confidence that the external auditors have in risk management and internal control the less substantive testing they have to do.

If external auditors pretend that there is a tight and effective regime of risk management and internal control, and that financial systems possess processing integrity, then they can justify setting higher thresholds for materiality and performance materiality and correspondingly higher sampling intervals. Put crudely, the external auditors’ rating of internal control has a very strong influence on the amount of work the audit team must perform; better ratings mean less work and more profit.

If auditors judge that the corporation under audit is a badly managed mess, and they are working to a fixed audit fee then they have a problem. They can perform a rigorous audit and make a loss, or they can pretend that everything is tight and reliable, and make a profit. I saw the dilemma playing out in practice with the control rating being manipulated to fit the audit fee. It left me disillusioned with the external audit industry (as I wrote here).

The dilemma presents a massive conflict of interest to the auditors. It undermines their independence, and compromises the business model for external audit.

The auditors have a financial incentive to ignore problems. Executive management and the board of directors of the company under audit have a story they want to tell investors, suppliers, and external stakeholders. The commercial pressure on the auditors is to reassure the world that this story is true and fair. If they cause trouble they will lose the audit contract to a less scrupulous rival.

This conflict of interest is exacerbated by the revolving door of senior staff moving between the Big 4 and clients. Rod Ismay, who wrote the notorious report bearing his name, joined the Post Office direct from Ernst & Young where he had been a senior audit manager. It is a cosy and lucrative arrangement, but it hardly inspires confidence that external auditors will challenge irresponsible clients.

Conclusion – to be provided by the Williams Inquiry?

I am glad that Peter Crowley prompted me into setting out my thoughts about external auditors’ possible, or likely, failings over Horizon. I stand by my statement that Horizon could have been perfectly adequate as a source for the external accounts and yet completely inadequate for the purposes of the subpostmasters. However, it is worth taking a close look at whether Horizon really was adequate for the high level accounts. That turns attention to the conduct of some extremely richly paid professionals who assured the world over many years that the Post Office’s accounts based on Horizon were sound. If they had shown any inclination to ask unwelcome questions this dreadful scandal might have been brought to a conclusion years earlier.

EY will have to come up with some compelling and persuasive answers to the Williams Inquiry to remove the suspicion that they chose not to ask the right questions, that they chose not to challenge the Royal Mail and Post Office. Everything I have learned about external audit and its relations with internal audit makes me suspicious about EY’s conduct. I will be very surprised if evidence appears that will make me change my mind.

I hope that the Williams Inquiry will prompt some serious debate in the media and amongst politicians about whether the current external audit model works. I doubt if any other member of audit’s Big 4 would have performed any better than EY, and that is a scandal in its own right.

If one of the world’s leading audit firms failed to spot that the Post Office was being steered into insolvency over three decades then we are entitled to ask – what is the point of external auditors apart from providing rich rewards to the partners who own them?

The myth of perfect software – IT audit and governance aspects of the Post Office scandal

This is a 21 minute presentation I prepared for the Kent Centre for European and Comparative Law, at the University of Kent. It was for an event devoted to the Post Office scandal on 21st May 2022.

People often ask: “How could the Post Office scandal go on for so long?” “Why did nobody realise what was going wrong?” “Why did nobody speak out?”

Part of the problem was a willful naivety about the fallibility of complex software systems. Too many people in important positions at the Post Office were ignorant about the nature of software and apparently extremely reluctant to learn. They wanted to retain their illusions. The people who should have educated them did not do their job. There is no such thing as perfect software unless we are talking about utterly trivial applications. The question, the challenge, is how we respond so that people do not get hurt.

The Post Office could hardly have failed this challenge more appallingly or more disgracefully. This was a scandal of IT management and corporate governance for which responsibility reaches to the highest level. The role of IT audit was crucial in this failure, as I explain in this talk.

The talk was based on a lengthy article “The Post Office IT scandal – why IT audit is essential for effective corporate governance” that I wrote for the Digital Evidence and Electronic Signature Law Review and which was published in March 2022.

“The Great Post Office Scandal” by Nick Wallis – my review

Nick Wallis’s book “The Great Post Office Scandal” should be compulsory reading for anyone setting out on a career as a software developer, tester, risk manager or internal auditor – or indeed anyone starting to study the law. The author sets out, in exhaustive detail, the story of how the Post Office’s Horizon system, developed by Fujitsu and managed appallingly by both corporations, ruined thousands of lives.

Nick Wallis followed the scandal for over a decade, showing remarkable and commendable commitment to an important cause. Nobody could have written a more complete account of this scandal – and it is a story that the world has to hear.

A tale of flawed software and corporate malpractice might sound dull, but Nick Wallis never loses sight of the human impact. Throughout the book he weaves into his narrative the stories of individuals who have suffered heartbreaking persecution and tragedy. The result is a highly readable, deeply moving and gripping book.

Before reading the book I was already very familiar with the scandal, but I was still shocked on a professional and human level. Nick Wallis has uncovered a wealth of detail that will dismay those, like me, who have worked in IT in more responsible, competent, and professional companies.

If there is any justice “The Great Post Office Scandal” will become a classic. It should be widely read throughout the IT world. People working with these complex IT systems should reflect on how IT affects people and remember Jerry Weinberg’s words: “no matter what they tell you, it’s always a people problem.” Horizon was a human tragedy, caused by people – not technology.

Nick Wallis focuses on the human and legal story rather than the technical issues, but he devotes one brief chapter to the difficulties of working with large, complex systems. His explanation is well informed and he describes clearly how these systems change and evolve rather than resembling a static machine. A separate chapter relates the disgracefully amateurish development by ICL/Fujitsu. This account is shocking, but Nick’s scathing version of the development’s history has been borne out by the evidence that has emerged in the official inquiry.

Setting aside the technology, the book illustrates aspects of modern Britain that should make us feel deeply uncomfortable. It was not inevitable that the subpostmasters would be vindicated. If a different judge, less technically aware than Justice Fraser, had been appointed to hear the group litigation brought by the subpostmasters it is very likely that the outcome would have been different. It is easy to see how the Post Office could have got away with its appalling behaviour and crushed the innocent victims, leaving them financially ruined, their reputations destroyed, their health broken. The justice system and legal profession have difficult lessons to learn.

However, I hope that Nick Wallis’s book will reach a much wider audience than computer experts and lawyers. It is a deeply moving version of one of the oldest stories in the world; it is a classic tale of good people fighting back to overcome fearsome odds and defeat the villains. It is a wonderful read.

External auditors – “arrogant beyond their competence”

Anyone who has paid close attention to my writing and ranting over the years may have gleaned that I have great respect for good internal auditors, none whatsoever for the poor ones, and a mixture of concern, irritation and anger over the conduct of the external audit profession.

External auditors, particularly the Big Four, are addicted to a seriously flawed business model that serves their clients, the shareholders, and wider society desperately poorly, but happens to enrich the partners. This blog post explains the root of my disenchantment, the time I spent as a trainee accountant with a firm that was then one of the Big Eight, now Big Four after various mergers and the collapse in disgrace of Arthur Andersen.

This all happened in the 1980s and there’s little chance of being sued, but I will just list what I witnessed, without naming names. Sadly the problems are not historical. If anything they are worse now. I regularly came into contact with the external auditors later in my career. I remained unimpressed. Norman Marks, who was chief audit executive at major global corporations for more than 20 years, summed up the problem pithily in 2010 (see comment 3 in the linked article). Marks’ verdict exactly matches my own experience.

“Management is often able to hide fraudulent transactions or estimates from the auditors. This is an inherent risk. The staff who actually talk to the accountants and others involved in day-to-day activities are junior and inexperienced. The partners and managers are, in general, not as proficient as they believe they are. Most internal auditors would join me in assessing the external audit partners and senior managers as arrogant beyond their competence.”

Here is my list of external audit misconduct, all of which I witnessed at first hand.

* Auditors approved an inadequate set of accounts that had been drawn up for a client by a different department of the same firm. Then, when the client announced it was going to launch on the Stock Exchange, senior people frantically tried to cover the tracks before the independent accountants got their hands on the books. There was also one manager equally frantically trying to preserve the paper trail so he wouldn’t take the career-ending rap for the whole thing.

* A partner ordered a team to stay on a client’s site for a few days longer than necessary when there was no work to do, so the firm could keep charging the client. A side benefit was that the audit team could keep claiming lucrative expenses, to be charged to the client – obviously.

* I was phoned at home one weekend and told to go straight to the office on Monday morning and not to the client site. The whole audit team gathered at the office. A partner told us that the client was being taken over and the factory, in a struggling part of Ayrshire, where we were working was almost certain to be closed down. There would not be a public announcement for several days. In the meantime our focus was to be on work that would facilitate the takeover, but without alerting any of the people who were about to lose their jobs. It was not an enjoyable week.

* Auditors ignored a company buying back its own shares rather than checking to see whether it had done so legally. When I queried the legality the response was “It could be illegal? Really? Don’t worry. Just approve it.”

* An audit manager allowed a client to record the purchases of some vehicles from the manufacturer in the following year. But they’d been delivered, and sold on at the very end of the previous year. The sales were recorded on the last day of the year. That meant the revenue looked like pure profit. “They can’t do that” I said. “Don’t worry” I was told. “Just approve it.”

* An audit manager took an audit team out on a Friday jolly to do a wages spot check at a client, conveniently located close to a decent pub that did very good meals. The team was made up of people who were at a loose end in the office and who wanted a lengthy pub lunch at the client’s expense. The client went bust a few months later, with everyone losing their jobs.

* I was told by an audit team leader that he hadn’t had to touch his salary for months because he was raking in so much on expenses.

* A partner called in the audit team to tell us that the agreed audit fee with the client had been frozen at the previous year’s level, so it was a cut in real terms. We’d therefore have to do less work than last year if we were to make a profit. We would report that internal control had improved, so we could justify doing less checking. The audit team leader chipped in, “but we already know internal control has got worse, so we should be doing more work”. The partner replied, “you don’t understand – you MUST report that internal control has improved”.

* On my very first day on a client site after completing the graduate induction course I was told to check out the client’s internal control system. I was handed a large sheet with a complex flowchart of organisational responsibilities. “Go to the accounts department and find someone who will look at it. Ask them if anything has changed since last year.” I was incredulous but I did what I was told. I found someone who was willing to talk to me. I asked if anything had changed. He looked at me as if I was an idiot, glanced at the chart and said “Naw, nothing has changed.” I reported back. “Good” said the audit team leader. “Internal control is the same as last year.” This was for a multinational company, one of Scotland’s largest. I would not have had a clue if I was being lied to, which may well have been why I was sent.

* I moved to a life assurance company to work in their investment division. Their accounts were in a mess and it was impossible to do a proper reconciliation of the US dollar bank accounts. That is, the opening balances plus credits and minus debits (for which, crucially, we had evidence) gave a figure wildly different from the closing balances. There were a huge number of entries in the bank accounts that we could not explain. One of my jobs was to investigate and resolve this. My predecessor had a breakdown and covered up the shambles until the discrepancy became too big to hide. It was a very fast-moving and stressful environment, so I could see how he had lost control.

I had not completed this accounting salvage operation when the external auditors arrived. They were from one of the Big 8. I was ordered to hide all the embarrassing paperwork and stonewall the auditors. I put all the bank statements, invoices, dividend dockets, receipts etc in the basement garage and kept telling the young graduate trainee external auditor that I would “get around to providing the documentation”. I never did and, as predicted by my manager, the auditors soon gave up and ignored the problem. Was it a material problem? Unquestionably. Millions of dollars were flowing through these accounts every day.

I was ashamed to be part of this shabby exercise, but everyone else seemed relaxed about it. My employer wanted to fool the auditors. The auditors wanted to be fooled. I switched to a far more respectable and enjoyable career in IT – with a rival insurance company which had a much healthier culture.

The Post Office’s external auditors up till July 2018 were EY (formerly known as Ernst & Young). They are not the prime suspects in the Post Office’s Horizon IT scandal. However, the external auditors’ business model makes them far too dependent on client executives, and far too willing to miss or overlook obvious problems. This is an issue that governments do not seem willing to address.

“Privileged accesses” – an insight into incompetence at Fujitsu and the Post Office

Recently I have been thinking and writing about corporate governance failings at the Post Office during the two decades of the Post Office Horizon scandal. Having worked in software development, testing and IT audit I have experience that is relevant to several aspects of the scandal. I have a further slice of experience I have not yet commented on publicly. That is largely because I should not talk about experiences with clients when I worked for IBM. However, I have decided to break that rule, and I feel justified for two reasons. Firstly, I think it offers a useful insight into failings at the Post Office and Fujitsu. Secondly, my clients all set, and met, a far higher standard than we have seen in the long-running Horizon scandal. Nothing I write will embarrass them or IBM, quite the opposite.

I keep going back to the management letter, [PDF, opens in new tab] issued by Ernst & Young (E&Y), the Post Office’s external auditors, following the 2011 audit. The letter was commented on in the Horizon Issues court case, Bates v Post Office Ltd (No 6: Horizon Issues), [PDF, opens in new tab].

To normal people this 43 page letter is incomprehensible and boring. It lists a series of major and minor problems with Fujitsu’s management of the IT service it provided to the Post Office. Only people who have worked in this field will feel comfortable interpreting the letter and its significance.

The letter draws attention to problems that E&Y came across in the course of their audit. As the introduction says:

“Our review of the company’s systems of internal control is carried out to help us express an opinion on the accounts of the company as a whole. This work is not primarily directed towards the discovery of weaknesses, the detection of fraud or other irregularities (other than those which would influence us in forming that opinion) and should not, therefore, be relied upon to show that no other weaknesses exist or areas require attention. Accordingly, the comments in this letter refer only to those matters that have come to our attention during the course of our normal audit work and do not attempt to indicate all possible improvements that a special review might develop.”

E&Y did not conduct a full technical audit. They were concerned with assessing whether the financial accounts offered a true and fair view of the financial position of the company. Their assessment of internal control was only sufficiently detailed to allow them to form an opinion on the company accounts.

It is, or it should be, monumentally embarrassing for the internal auditors if the external auditors find long-standing control problems. The internal auditors should have the staff, expertise and time to detect these problems and ensure that they are resolved long before external auditors spot them. The external auditors are around for only a few weeks or months, and it is not their primary responsibility to find problems like this. I wrote about this from the perspective of an IT auditor last year (see section “Superusers going ‘off piste'”).

The specific issue in the management letter that rightly attracted most attention in the Horizon Issues case was the poor control over user IDs with high privilege levels. Not only did this highlight the need to improve Fujitsu’s management of the IT service and the oversight provided by the Post Office, it also pointed to an ineffective internal audit function at the Post Office, and previously at the Royal Mail before the Post Office was hived off.

When I was reading through the E&Y management letter I was struck by how familiar the problems were. When I worked for IBM I spent three years as an information security manager. My background had been in software development, testing and IT audit. The contract on which I was working was winding down and one day my phone rang and I was made an interesting offer. Service Delivery Security wanted another information security manager to work with new outsourced accounts. My background showed I had a grasp of security issues, the ability to run projects, and a track record of working with clients without triggering unseemly brawls or litigation. So I was a plausible candidate. I would rely on the deeply technical experts and make sure that IBM and the client got what they wanted.

The job entailed working with the client right at the start of the outsourcing deal, for a few months either side of the cutover. An important responsibility was reaching agreement with the client about the detail of what IBM would provide.

All the issues relating to privileged access raised by E&Y in their management letter were within my remit. The others, mainly change management, were dealt with by the relevant experts. Each outsourcing contract required us to reach agreement on the full detail of the service by a set date, typically within a few months of the service cutover. In one case we had to reach agreement before service even started. On the service cutover date all staff transferring to IBM were required to continue working to exactly the same processes and standards until they were told to do something new.

I had to set up a series of meetings and workshops with the client and work through the detail of the security service. We would agree all the tedious but vital details; password lengths and formats, the processes required for authorising and reviewing new accounts and access privileges, logging and review of accesses, security incident response actions. It went on and on.

For each item we would document the IBM recommended action or setting. Alongside that we had to record what the client was currently doing. Finally we would agree the client’s requirement for the future service. If the future requirement entailed work by IBM to improve on what the client was currently doing that would entail a charge. If the client wanted something lower than the IBM recommendation then it was important that we had evidence that IBM was required to do something we would usually regard as unsatisfactory. This happened only rarely, and with good reason. The typical reason was that the client’s business meant the risk did not justify the tighter, and more expensive, control.
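To give a flavour of the record we kept for each item, here is a minimal sketch in Python. The structure and the field names are invented for this illustration; they are not IBM’s templates.

```python
from dataclasses import dataclass

@dataclass
class ControlAgreement:
    """One line of the agreed security service: the supplier's recommendation,
    the client's current practice, and the contracted future requirement.
    Field names are illustrative only."""
    control: str              # e.g. "Minimum password length"
    recommended: str          # the supplier's standard recommendation
    current_practice: str     # what the client does at service cutover
    agreed_target: str        # the requirement for the future service
    chargeable_change: bool   # True if reaching the target is extra, billable work
    agreed_exception: bool = False  # the client knowingly accepted a weaker setting

password_rule = ControlAgreement(
    control="Minimum password length",
    recommended="8 characters with complexity rules",
    current_practice="6 characters, no complexity rules",
    agreed_target="8 characters with complexity rules",
    chargeable_change=True,
)
```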

We also had to ensure that all the mainframe systems and servers were inventoried, and the settings documented. That was a huge job, but I farmed that out to the unenthusiastic platform experts. For all these platforms and settings we also had to agree how they should be configured in future.

The next step, and my final involvement, would be to set up a project plan to make all the changes required to bring the service up to the standard that the client needed. A new project manager would come in to run that boring project.

After three clients I felt I had learned a lot but staying in the job was going to mean endless repetition of very similar assignments. I also had some disagreements about IBM’s approach to outsourcing security services that meant I was unlikely to get promoted. I was doing a very good job at my current level and it was clearly recognised that I would only cause trouble if I were given more power! It’s true. I would have done. So I secured a move back to test management.

I enjoyed those three years because it gave me the chance to work with some very interesting clients. These were big, blue chip names; AstraZeneca, Boots (the UK retailer), and Nokia (when they were utterly dominant in the mobile phone market). I don’t have any qualms about naming these clients because they were all very thorough, professional and responsible.

The contrast with the Post Office and Fujitsu is striking. Fujitsu won the Post Office outsourcing contract [PDF, opens in new tab] in 1996 for an initial eight years. Yet, 15 years later, by which time the contract had been extended twice, E&Y reported that Fujitsu had not set up the control regime IBM demanded we create, with client agreement, at the very start of an outsourcing contract. The problems had still not been fully resolved by 2015.

Getting these basics correct is vital if corporations want to show that they are in control of their systems. If users have high privilege levels without effective authorisation, logging and monitoring then the corporation cannot have confidence in its data, which can be changed without permission and without a record of who took what action. Nobody can have confidence in the integrity of the systems. That has clear implications for the Horizon scandal. The Post Office insisted that Horizon was reliable when the reality was that Fujitsu did not apply the controls to justify that confidence.
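The sort of check this implies is not sophisticated. A few lines of Python, with invented account data, illustrate the basic questions any auditor, internal or external, should be able to ask of privileged accounts.

```python
# Invented data for illustration - these are not Horizon accounts.
accounts = [
    {"id": "APPSUP01", "privileged": True,  "authorised": True,  "logged": True,  "days_since_review": 30},
    {"id": "SYSADM09", "privileged": True,  "authorised": False, "logged": False, "days_since_review": 400},
    {"id": "CLERK12",  "privileged": False, "authorised": True,  "logged": True,  "days_since_review": 30},
]

def control_failures(account, max_review_age=90):
    """Return the basic control failures for one privileged account."""
    if not account["privileged"]:
        return []
    failures = []
    if not account["authorised"]:
        failures.append("no documented authorisation")
    if not account["logged"]:
        failures.append("activity not logged")
    if account["days_since_review"] > max_review_age:
        failures.append("access rights not reviewed recently")
    return failures

for acc in accounts:
    problems = control_failures(acc)
    if problems:
        print(acc["id"], "->", "; ".join(problems))
```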

Fujitsu may have failed to manage the service properly, but the Post Office is equally culpable. Outsourcing an IT service is not a matter of handing over responsibility then forgetting about it. The service has to be specified precisely then monitored carefully and constantly.

Why were the two corporations so incompetent and so negligent for so long? Why were the Post Office and Fujitsu so much less responsible and careful than IBM, AstraZeneca, Boots and Nokia?

Why did the Royal Mail’s and subsequently the Post Office’s internal auditors not detect problems with the outsourced service and force through an effective response?

When I became an information security manager I was told a major reason we had to tie the service down tightly was in case we ended up in court. We had to be able to demonstrate that we were in control of the systems, that we could prove the integrity of the data and the processing. So why did Fujitsu and the Post Office choose not to act as responsibly?

I was working in a well-trodden field. None of the issues we were dealing with were remotely new. The appropriate responses were very familiar. They were the mundane basics that every company using IT has to get right. Lay observers might think that the outsourcing arrangement was responsible for the failure of management control by distancing user management from the service providers. That would be wrong. The slackness seen at Fujitsu is more likely to occur in an in-house operation that has grown and evolved gradually. An outsourcing agreement should mean that everything is tied down precisely, and that was my experience.

I have worked as an IT auditor, and I have been an information security manager on big outsourcing contracts. I know how these jobs should be done and it amazes me to see that one of our major rivals was able to get away with such shoddy practices at the very time I was in the outsourcing game. Fujitsu still has the Post Office contract. That is astonishing.

Bugs are about more than code


Introduction

Recently I have had to think carefully about the nature of software systems, especially complex ones, and the bugs they contain. In doing so my thinking has been guided by certain beliefs I hold about complex software systems. These beliefs, or principles, are based on my practical experience but also on my studies, which, as well as teaching me much that I didn’t know, have helped me to make sense of what I have done and seen at work. Here are three vital principles I hold to be true.

Principle 1

Complex systems are not like calculators, which are predictable, deterministic instruments, i.e. they will always give the same answer from the same inputs. Complex systems are not predictable. We can only predict what they will probably do, but we cannot be certain. It is particularly important to remember this when working with complex socio-technical systems, i.e. complex systems, in the wider sense, that include humans, which are operated by people or require people to make them work. That covers most, or all, complex software systems.

Principle 2

Complex systems are more than the sum of their parts, or at least they are different. A system can be faulty even if all the individual programs, or components, are working correctly. The individual elements can combine with each other, and with the users, in unexpected and possibly damaging ways that could not have been predicted from inspecting the components separately.

Conversely, a system can be working satisfactorily even if some of the components are flawed. This inevitably means that the software code itself, however important it is, cannot be the only factor that determines the quality of the system.
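A trivial, invented example makes the point. Each of the two functions below meets its own specification, yet the combined system is wrong because the components make different assumptions about units.

```python
def till_total_pounds(items):
    """Till component: returns the day's takings in POUNDS. Correct in isolation."""
    return sum(items)

def post_to_ledger(amount_in_pence, ledger):
    """Accounting component: expects amounts in PENCE. Also correct in isolation."""
    ledger.append(round(amount_in_pence))
    return ledger

ledger = []
takings = till_total_pounds([12.50, 3.20, 7.30])  # about 23.0 pounds
post_to_ledger(takings, ledger)                   # recorded as 23 pence

# Neither component contains a coding error, yet the combined system
# understates the takings by a factor of 100 - a fault that only appears
# when the parts are put together.
print(ledger)   # [23]
```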

Principle 3

Individual programs in a system can produce harmful outcomes even if their code was written perfectly. The outcome depends on how the different components, factors and people work together over the life of the system. Perfectly written code can cause a failure long after it has been released when there are changes to the technical, legal, or commercial environment in which the system runs.

The consequences

Bugs in complex systems are therefore inevitable. The absence of bugs in the past does not mean they are absent now, and certainly not that the system will be bug free in the future. The challenge is partly to find bugs, learn from them, and help users to learn how they can use the system safely. But testers should also try to anticipate future bugs, how they might arise, where the system is vulnerable, and learn how and whether users and operators will be able to detect problems and respond. They must then have the communication skills to explain what they have found to the people who need to know.

What we must not do is start off from the assumption that particular elements of the system are reliable and that any problems must have their roots elsewhere in the system. That mindset tends to push blame towards the unfortunate people who operate a flawed system.

Bugs and the Post Office Horizon scandal

Over the last few months I have spent a lot of time on issues raised by the Post Office Horizon scandal. For a detailed account of the scandal I strongly recommend the supplement that Private Eye has produced, written by Richard Brooks and Nick Wallis, “Justice lost in the post”.

When I have been researching this affair I have realised, time and again, how the Post Office and Fujitsu, the outsourced IT services supplier, ignored the three principles I outlined. While trawling through the judgment of Mr Justice Fraser in Bates v Post Office Ltd (No 6: Horizon Issues, i.e. the second of the two court cases brought by the Justice For Subpostmasters Alliance), which should be compulsory reading for Computer Science students, I was struck by the judge’s discussion of the nature of bugs in computer systems. You can find the full 313 page judgment here [PDF, opens in new tab].

The definition of a bug was at the heart of the second court case. The Post Office, and Fujitsu (the outsourced IT services supplier) argued that a bug is a coding error, and the word should not apply to other problems. The counsel for the claimants, i.e. the subpostmasters and subpostmistresses who had been victims of the flawed system, took a broader view; a bug is anything that means the software does not operate as users, or the corporation, expect.

After listening to both sides Fraser came down emphatically on the claimants’ side.

“26 The phrase ‘bugs, errors or defects’ is sufficiently wide to capture the many different faults or characteristics by which a computer system might not work correctly… Computer professionals will often refer simply to ‘code’, and a software bug can refer to errors within a system’s source code, but ‘software bugs’ has become more of a general term and is not restricted, in my judgment, to meaning an error or defect specifically within source code, or even code in an operating system.

Source code is not the only type of software used in a system, particularly in a complex system such as Horizon which uses numerous applications or programmes too. Reference data is part of the software of most modern systems, and this can be changed without the underlying code necessarily being changed. Indeed, that is one of the attractions of reference data. Software bug means something within a system that causes it to cause an incorrect or unexpected result. During Mr de Garr Robinson’s cross-examination of Mr Roll, he concentrated on ‘code’ very specifically and carefully [de Garr Robinson was the lawyer representing the Post Office and Roll was a witness for the claimants who gave evidence about problems with Horizon that he had seen when he worked for Fujitsu]. There is more to the criticisms levelled at Horizon by the claimants than complaints merely about bugs within the Horizon source code.

27 Bugs, errors or defects is not a phrase restricted solely to something contained in the source code, or any code. It includes, for example, data errors, data packet errors, data corruption, duplication of entries, errors in reference data and/or the operation of the system, as well as a very wide type of different problems or defects within the system. ‘Bugs, errors or defects’ is wide enough wording to include a wide range of alleged problems with the system.”

The determination of the Post Office and Fujitsu to limit the definition of bugs to source code was part of a policy of blaming users for all errors that were not obviously caused by the source code. This is clear from repeated comments from witnesses and from Mr Justice Fraser in the judgment. “User error” was the default explanation for all problems.

Phantom transactions or bugs?

This stance of blaming the users if they were confused by Horizon’s design was taken to an extreme with “phantom transactions”. These were transactions generated by the system but which were recorded as if they had been made by a user (see in particular paragraphs 209 to 214 of Fraser’s judgment).

In paragraph 212 Fraser refers to a Fujitsu problem report.

“However, the conclusion reached by Fujitsu and recorded in the PEAK was as follows:

‘Phantom transactions have not been proven in circumstances which preclude user error. In all cases where these have occurred a user error related cause can be attributed to the phenomenon.'”

This is striking. These phantom transactions had been observed by Royal Mail engineers. They were known to exist. But they were dismissed as a cause of problems unless it could be proven that user error was not responsible. If Fujitsu could imagine a scenario where user error might have been responsible for a problem they would rule out the possibility that a phantom transaction could have been the cause, even if the phantom had occurred. The PEAK (error report) would simply be closed off, whether or not the subpostmaster agreed.

This culture of blaming users rather than software was illustrated by a case of the system “working as designed” when its behaviour clearly confused and misled users. In fact the system was acting contrary to user commands. In certain circumstances if a user entered the details for a transaction, but did not commit it, the system would automatically complete the transaction with no further user intervention, which might result in a loss to the subpostmaster.
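The behaviour described is easy to sketch. The code below is purely illustrative, my guess at the general shape of a timeout-driven auto-completion; it is emphatically not Horizon’s actual code.

```python
import time

AUTO_SETTLE_AFTER_SECONDS = 5   # illustrative timeout, not a real Horizon setting

class Basket:
    """Items entered on screen but not yet committed by the user."""
    def __init__(self):
        self.items = []
        self.opened_at = time.monotonic()
        self.committed = False

    def add(self, description, amount):
        self.items.append((description, amount))

def auto_settle(basket, branch_ledger):
    """System housekeeping, not a user action: a stale, uncommitted basket
    is turned into a completed sale without the user choosing to do it."""
    stale = time.monotonic() - basket.opened_at > AUTO_SETTLE_AFTER_SECONDS
    if stale and basket.items and not basket.committed:
        total = sum(amount for _, amount in basket.items)
        branch_ledger.append(("auto-settled sale", total))
        basket.committed = True
        # If the customer never completed the purchase, the branch accounts now
        # record a sale that did not happen - a discrepancy the subpostmaster
        # would be expected to make good.
```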

The Post Office, in a witness statement, described this as a “design quirk”. However, the Post Office’s barrister, Mr de Garr Robinson, in his cross-examination of Jason Coyne, an IT consultant hired by the subpostmasters, was able to convince Nick Wallis (one of the authors of “Justice lost in the post”) that there wasn’t a bug.

“Mr de Garr Robinson directs Mr Coyne to Angela van den Bogerd’s witness statement which notes this is a design quirk of Horizon. If a bunch of products sit in a basket for long enough on the screen Horizon will turn them into a sale automatically.

‘So this isn’t evidence of Horizon going wrong, is it?’ asks Mr de Garr Robinson. ‘It is an example of Horizon doing what it was supposed to do.’

‘It is evidence of the system doing something without the user choosing to do it.’ retorts Mr Coyne.

But that is not the point. It is not a bug in the system.”

Not a bug? I would contest that very strongly. If I were auditing a system with this “quirk” I would want to establish the reasons for the system’s behaviour. Was this feature deliberately designed into the system? Or was it an accidental by-product of the system design? Whatever the answer, it would be simply the start of a more detailed scrutiny of technical explanations, understanding of the nature of bugs, the reasons for a two-stage committal of data, and the reasons why those two stages were not always applied. I would not consider “working as designed” to be an acceptable answer.
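To make the two-stage committal point concrete, here is a minimal sketch, in Python and entirely hypothetical, of how an auto-completion timeout can post a transaction the user never chose to commit. Nothing here is Horizon's actual design; the class, the timeout and the housekeeping sweep are all my own assumptions, purely for illustration.

```python
import time

AUTO_COMMIT_AFTER_SECONDS = 30 * 60  # assumed timeout, purely illustrative

class Session:
    def __init__(self):
        self.pending = []      # stage one: entered but not yet committed by the user
        self.committed = []    # stage two: posted to the branch account

    def enter(self, item, amount):
        self.pending.append({"item": item, "amount": amount, "entered_at": time.time()})

    def commit(self):
        # Stage two is supposed to require an explicit user action...
        self.committed.extend(self.pending)
        self.pending.clear()

    def housekeeping(self):
        # ...but a background sweep quietly commits anything left sitting in the
        # basket for too long. The user never pressed "commit", yet the branch
        # account changes, and any discrepancy lands on the subpostmaster.
        if self.pending and time.time() - self.pending[0]["entered_at"] > AUTO_COMMIT_AFTER_SECONDS:
            self.commit()
```

The sketch shows why “working as designed” is beside the point. A design that silently commits what the user left uncommitted will still frustrate, confuse and mislead, and that is precisely what needs to be investigated.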

The Post Office’s failure to grasp the nature of complex systems

A further revealing illustration of the Post Office’s attitude towards user error came in a witness statement provided for the Common Issues trial, the first of the two court cases brought by the Justice For Subpostmasters Alliance. This first trial was about the contractual relationship between the Post Office and subpostmasters. The statement came from Angela van den Bogerd. At the time she was People Services Director for the Post Office, but over the previous couple of decades she had been in senior technical positions, responsible for Horizon and its deployment. She described herself in court as “not an IT expert”. That is an interesting statement to consider alongside some of the comments in her witness statement.

“[78]… the Subpostmaster has complete control over the branch accounts and transactions only enter the branch accounts with the Subpostmaster’s (or his assistant’s) knowledge.

[92] I describe Horizon to new users as a big calculator. I think this captures the essence of the system in that it records the transactions inputted into it, and then adds or subtracts from the branch cash or stock holdings depending on whether it was a credit or debit transaction.”

“Complete control”? That confirms her admission that she is not an IT expert. I would never have been bold, or reckless, enough to claim that I was in complete control of any complex IT system for which I was responsible. The better I understood the system the less inclined I would be to make such a claim. Likening Horizon to a calculator is particularly revealing. See Principle 1 above. When I have tried to explain the nature of complex systems I have also used the calculator analogy, but as an illustration of what a complex system is not.

If a senior manager responsible for Horizon could use such a fundamentally mistaken analogy, and be prepared to insert it in a witness statement for a court case, it reveals how poorly equipped the Post Office management was to deal with the issues raised by Horizon. When we are confronted by complexity it is a natural reaction to try and construct a mental model that simplifies the problems and makes them understandable. This can be helpful. Indeed it is often essential if we are to make any sense of complexity. I have written about this here in my blog series “Dragons of the unknown”.

However, even the best models become dangerous if we lose sight of their limitations and start to think that they are exact representations of reality. They are no longer fallible aids to understanding, but become deeply deceptive.

If you think a computer system is like a calculator then you will approach problems with the wrong attitude. Calculators are completely reliable. Errors are invariably the result of users’ mistakes, “finger trouble”. That is exactly how senior Post Office managers, like Angela van den Bogerd, regarded the Horizon problems.

BugsZero

The Horizon scandal has implications for the argument that software developers can produce systems that have no bugs, that zero bugs is an attainable target. Arlo Belshee is a prominent exponent of this idea, of BugsZero as it is called. Here is a short introduction.

Before discussing anyone’s approach to bugs it is essential that we are clear what they mean by a bug. Belshee has a useful definition, which he provided in this talk in Singapore in 2016. (The conference website has a useful introduction to the talk.)

3:50 “The definition (of a bug) I use is anything that would frustrate, confuse or annoy a human and is potentially visible to a human other than the person who is currently actively writing (code).”

This definition is close to Justice Fraser’s (see above); “a bug is anything that means the software does not operate as users, or the corporation, expect”. However, I think that both definitions are limited.

BugsZero is a big topic, and I don’t have the time or expertise to do it justice, but for the purposes of this blog I’m happy to concede that it is possible for good coders to deliver exactly what they intend to, so that the code itself, within a particular program, will not act in ways that will “frustrate, confuse or annoy a human”, or at least a human who can detect the problem. That is the limitation of the definition. Not all faults with complex software will be detected. Some are not even detectable. Our inability to see them does not mean they are absent. Bugs can produce incorrect but plausible answers to calculations, or they can corrupt data, without users being able to see that a problem exists.

I speak from experience here. It might even be impossible for technical system experts to identify errors with confidence. It is not always possible to know whether a complex system is accurate. The insurance finance systems I used to work on were notoriously difficult to understand and manipulate. 100% accuracy was never a serious, practicable goal. As I wrote in “Fix on failure – a failure to understand failure”;

“With complex financial applications an honest and constructive answer to the question ‘is the application correct?’ would be some variant on ‘what do you mean by correct?’, or ‘I don’t know. It depends’. It might be possible to say the application is definitely not correct if it is producing obvious garbage. But the real difficulty is distinguishing between the seriously inaccurate, but plausible, and the acceptably inaccurate that is good enough to be useful. Discussion of accuracy requires understanding of critical assumptions, acceptable margins of error, confidence levels, the nature and availability of oracles, and the business context of the application.”
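As a deliberately trivial illustration of an incorrect but plausible answer, here is an invented Python fragment; it is not taken from any system I worked on, and the figures mean nothing. The point is simply that two totals can disagree without either looking like garbage.

```python
from decimal import Decimal

# 10,000 identical instalments of 19.99 - invented data for illustration only.
instalments = [19.99] * 10_000

float_total = sum(instalments)                            # binary floating point
exact_total = sum(Decimal("19.99") for _ in instalments)  # exact decimal arithmetic

print(float_total)   # typically a fraction of a penny away from 199900.00
print(exact_total)   # 199900.00
# Neither figure is obviously wrong. Without an independent oracle, a user
# has no way of telling which, if either, is "correct".
```

Scale that up to interleaved premiums, reserves and reinsurance calculations and the question “is it correct?” becomes exactly as slippery as described above.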

It is therefore misleading to define bugs as being potentially visible to users. Nevertheless, Belshee’s definition is useful provided that that qualification is accepted. However, in the same talk, Belshee goes on to make further statements I do not accept.

19:55 “A bug is an encoded developer mistake.”

28:50 “A bug is a mistake by a developer.”

This is a developer-centric view of systems. It is understandable if developers focus on the bugs for which they are responsible. However, if you look at the systems, and bugs, from the other end, from the perspective of users when a bug has resulted in frustration, confusion or annoyance, the responsibility for the problem is irrelevant. The disappointed human is uninterested in whether the problem is with the coding, the design, the interaction of programs or components, or whatever. All that matters is that the system is buggy.

There is a further complication. The coder may well have delivered code that was perfect when it was written and released. But perfect code can create later problems if the world in which it operates changes. See Principle 3 above. This aspect of software is not sufficiently appreciated; it has caused me a great deal of trouble in my career (see the section “Across time, not just at a point in time” in this blog, about working with Big Data).
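A small, hypothetical example may help. The eight-digit account rule below is invented, but the pattern is common: the code is a faithful implementation of the world as it was, and the world then moves on.

```python
import re

# The rule was true of every account when the code shipped - an invented
# example, but typical of assumptions baked into "perfect" code.
ACCOUNT_PATTERN = re.compile(r"^\d{8}$")

def is_valid_account(account_id: str) -> bool:
    # No developer mistake here: this exactly matches the original business rule.
    return bool(ACCOUNT_PATTERN.match(account_id))

print(is_valid_account("12345678"))    # True
# Years later the business issues ten-digit accounts. The unchanged code now
# rejects legitimate customers - a bug to its users, though no line of it is
# wrong relative to the specification it was built against.
print(is_valid_account("1234567890"))  # False
```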

Belshee does say that developers should take responsibility for bugs that manifest themselves elsewhere, even if their code was written correctly. He also makes it clear, when talking about fault tolerant systems (17:22 in the talk above), that faults can arise “when the behaviour of the world is not as we expect”.

However he also says that the system “needs to work exactly as the developer thought if it’s going to recover”. That’s too high a bar for complex socio-technical systems. The most anyone can say, and it’s an ambitious target, is that the programs have been developed exactly as the developers intended. Belshee is correct at the program level; if the programs were not built as the developers intended then recovery will be very much harder. But at the system level we need to be clear and outspoken about the limits of what we can know, and about the inevitability that bugs are present.

If we start to raise hopes that systems might be perfect and bug-free because we believe that we can deliver perfectly written code then we are setting ourselves up for unpleasant recriminations when our users suffer from bugs. It is certainly laudable to eradicate sloppy and cavalier coding and it might be noble for developers to be willing to assume responsibility for all bugs. But it could leave them exposed to legal recriminations if the wider world believes that software developers can and should ensure systems are free of bugs. This is where lawyers might become involved and that is why I’m unhappy about the name BugsZero, and the undeliverable promise that it implies.

Unnoticed and undetectable bugs in the Horizon case

The reality that a bug might be real and damaging but not detected by users, or even detectable by them, was discussed in the Horizon case.

“[972] Did the Horizon IT system itself alert Subpostmasters of such bugs, errors or defects… and if so how?

[973] Answer: Although the experts were agreed that the extent to which any IT system can automatically alert its users to bugs within the system itself is necessarily limited, and although Horizon has automated checks which would detect certain bugs, they were also agreed that there are types of bugs which would not be detected by such checks. Indeed, the evidence showed that some bugs lay undiscovered in the Horizon system for years. This issue is very easy, therefore, to answer. The correct answer is very short. The answer… is ‘No, the Horizon system did not alert SPMs’. The second part of the issue does not therefore arise.”

That is a significant extract from an important judgment. A senior judge directly addressed the question of system reliability and pronounced that he was satisfied that a complex system cannot be expected to have adequate controls to warn users of all errors.

This is more than an abstract, philosophical debate about proof, evidence and what we can know. In England there is a legal presumption that computer evidence is reliable. This made a significant contribution to the Horizon scandal. Both parties in a court case are obliged to disclose documents which might either support or undermine their case, so that the other side has a chance to inspect and challenge them. The Post Office and Fujitsu did not disclose anything that would have cast doubt on their computer evidence. That failure to disclose meant it was impossible for the subpostmasters being prosecuted to challenge the presumption that the evidence was reliable. The subpostmasters didn’t know about the relevant system problems, and they didn’t even know that that knowledge had been withheld from them.

Replacing the presumption of computer reliability

There are two broad approaches that can be taken in response to the presumption that computer evidence is reliable and the ease with which it can be abused, apart of course from ignoring the problem and accepting that injustice is a price worth paying for judicial convenience. England can scrap the presumption, which would require the party seeking to use the evidence to justify its reliability. Or the rules over disclosure can be strengthened to try and ensure that all relevant information about systems is revealed. Some blend of the two approaches seems most likely.

I have recently contributed to a paper entitled “Recommendations for the probity of computer evidence”. It has been submitted to the Ministry of Justice, which is responsible for the courts in England & Wales, and is available from the Digital Evidence and Electronic Signature Law Review.

The paper argues that the presumption of computer reliability should be replaced by a two stage approach when reliability is challenged. The provider of the data should first be required to provide evidence to demonstrate that they are in control of their systems, that they record and track all bugs, fixes, changes and releases, and that they have implemented appropriate security standards and processes.

If the party wishing to rely on the computer evidence cannot provide a satisfactory response in this first stage then the presumption of reliability should be reversed. The second stage would require them to prove that none of the failings revealed in the first stage might affect the reliability of the computer evidence.

Whatever approach is taken, IT professionals would have to offer an opinion on their systems. How reliable are the systems? What relevant evidence might there be that systems are reliable, or unreliable? Can they demonstrate that they are in control of their systems? Can they reassure senior managers who will have to put their name to a legal document and will be very keen to avoid the humiliation that has been heaped upon Post Office and Fujitsu executives, with the possibility of worse to come?

A challenging future

The extent to which we can rely on computers poses uncomfortable challenges for the English law now, but it will be an increasingly difficult problem for the IT world over the coming years. What can we reasonably say about the systems we work with? How reliable are they? What do we know about the bugs? Are we sufficiently clear about the nature of our systems to brief managers who will have to appear in court, or certify legal documents?

It will be essential that developers and testers are clear in their own minds, and in their communications, about what bugs are. They are not just coding errors, and we must try to ensure people outside IT understand that. Testers must also be able to communicate clearly what they have learned about systems, and they must never say or do anything that suggests systems will be free of bugs.

Testers will have to think carefully about the quality of their evidence, not just about the quality of their systems. How good is our evidence? How far can we go in making confident statements of certainty? What do we still not know, and what is the significance of that missing knowledge? Much of this will be a question of good management. But organisations will need good testers, very good testers, who can explain what we know, and what we don’t know, about complex systems; testers who have the breadth of knowledge, the insight, and the communication skills to tell a convincing story to those who require the unvarnished truth.

We will need confident, articulate testers who can explain that a lack of certainty about how complex systems will behave is an honest, clear sighted, statement of truth. It is not an admission of weakness or incompetence. Too many people in IT have built good careers on bullshitting, on pretending they are more confident and certain than they have any right to be. Systems will inevitably become more complex. IT people will increasingly be dragged into litigation, and as the Post Office and Fujitsu executives have found, misplaced and misleading confidence and bluster in court have excruciating personal consequences. Unpleasant though these consequences are, they hardly compare with the tragedies endured by the subpostmasters and subpostmistresses, whose lives were ruined by corporations who insisted that their complex software was reliable.

The future might be difficult, even stressful, for software testers, but they will have a valuable, essential role to play in helping organisations and users to gain a better understanding of the fallible nature of software. To say the future will be interesting is an understatement; it will present exciting challenges and there should be great opportunities for the best testers.

Business logic security testing (2009)


This article appeared in the June 2009 edition of Testing Experience magazine and the October 2009 edition of Security Acts magazine.

If you choose to read the article please bear in mind that it was written in January 2009 and is therefore inevitably dated in some respects. In particular, ISACA has restructured COBIT, but it remains a useful source. Overall I think the arguments I made in this article are still valid.

The references in the article were all structured for a paper magazine. They were not set up as hyperlinks and I have not tried to recreate them and check out whether they still work.

The article

When I started in IT in the 80s the company for which I worked had a closed network restricted to about 100 company locations with no external connections.

Security was divided neatly into physical security, concerned with the protection of the physical assets, and logical security, concerned with the protection of data and applications from abuse or loss.

When applications were built the focus of security was on internal application security. The arrangements for physical security were a given, and didn’t affect individual applications.

There were no outsiders to worry about who might gain access, and so long as the common access controls software was working there was no need for analysts or designers to worry about unauthorized internal access.

Security for the developers was therefore a matter of ensuring that the application reflected the rules of the business; rules such as segregation of responsibilities, appropriate authorization levels, dual authorization of high value payments, reconciliation of financial data.

The world quickly changed and relatively simple, private networks isolated from the rest of the world gave way to more open networks with multiple external connections and to web applications.

Security consequently acquired much greater focus. However, it began to seem increasingly detached from the work of developers. Security management and testing became specialisms in their own right, and not just an aspect of technical management and support.

We developers and testers continued to build our applications, comforted by the thought that the technical security experts were ensuring that the network perimeter was secure.

Nominally security testing was a part of non-functional testing. In reality, it had become somewhat detached from conventional testing.

According to the glossary of the British Computer Society’s Special Interest Group in Software Testing (BCS SIGIST) [1], security testing is determining whether the application meets the specified security requirements.

SIGIST also says that security entails the preservation of confidentiality, integrity and availability of information. Availability means ensuring that authorized users have access to information and associated assets when required. Integrity means safeguarding the accuracy and completeness of information and processing methods. Confidentiality means ensuring that information is accessible only to those authorized to have access.

Penetration testing, and testing the security of the network and infrastructure, are all obviously important, but if you look at security in the round, bearing in mind wider definitions of security (such as SIGIST’s), then these activities can’t be the whole of security testing.

Some security testing has to consist of routine functional testing that is purely a matter of how the internals of the application work. Security testing that is considered and managed as an exercise external to the development, an exercise that follows the main testing, is necessarily limited. It cannot detect defects that are within the application rather than on the boundary.

Within the application, insecure design features or insecure coding might be detected without any deep understanding of the application’s business role. However, like any class of requirements, security requirements will vary from one application to another, depending on the job the application has to do.

If there are control failures that reflect poorly applied or misunderstood business logic, or business rules, then will we as functional testers detect that? Testers test at the boundaries. Usually we think in terms of boundary values for the data, the boundary of the application or the network boundary with the outside world.

Do we pay enough attention to the boundary of what is permissible user behavior? Do we worry enough about abuse by authorized users, employees or outsiders who have passed legitimately through the network and attempt to subvert the application, using it in ways never envisaged by the developers?

I suspect that we do not, and this must be a matter for concern. A Gartner report of 2005 [2] claimed that 75% of attacks are at the application level, not the network level. The types of threats listed in the report all arise from technical vulnerabilities, such as command injection and buffer overflows.

Such application layer vulnerabilities are obviously serious, and must be addressed. However, I suspect too much attention has been given to them at the expense of vulnerabilities arising from failure to implement business logic correctly.

This is my main concern in this article. Such failures can offer great scope for abuse and fraud. Security testing has to be about both the technology and the business.

Problem of fraud and insider abuse

It is difficult to come up with reliable figures about fraud because of its very nature. According to PriceWaterhouseCoopers in 2007 [3] the average loss to fraud by companies worldwide over the two years from 2005 was $2.4 million (their survey being biased towards larger companies). This is based on reported fraud, and PWC increased the figure to $3.2 million to allow for unreported frauds.

In addition to the direct costs there were average indirect costs in the form of management time of $550,000 and substantial unquantifiable costs in terms of damage to the brand, staff morale, reduced share prices and problems with regulators.

PWC stated that 76% of their respondents reported the involvement of an outside party, implying that 24% were purely internal. However, when companies were asked for details on one or two frauds, half of the perpetrators were internal and half external.

It would be interesting to know the relative proportions of frauds (by number and value) which exploited internal applications and customer facing web applications but I have not seen any statistics for these.

The U.S. Secret Service and CERT Coordination Center have produced an interesting series of reports on “illicit cyber activity”. In their 2004 report on crimes in the US banking and finance sector [4] they reported that in 70% of the cases the insiders had exploited weaknesses in applications, processes or procedures (such as authorized overrides). 78% of the time the perpetrators were authorized users with active accounts, and in 43% of cases they were using their own account and password.

The enduring problem with fraud statistics is that many frauds are not reported, and many more are not even detected. A successful fraud may run for many years without being detected, and may never be detected. A shrewd fraudster will not steal enough money in one go to draw attention to the loss.

I worked on the investigation of an internal fraud at a UK insurance company that had lasted 8 years, as far back as we were able to analyze the data and produce evidence for the police. The perpetrator had raised 555 fraudulent payments, all for less than £5,000, and had stolen £1.1 million by the time that we received an anonymous tip off.

The control weaknesses related to an abuse of the authorization process, and a failure of the application to deal appropriately with third party claims payments, which were extremely vulnerable to fraud. These weaknesses would have been present in the original manual process, but the users and developers had not taken the opportunities that a new computer application had offered to introduce more sophisticated controls.

No-one had been negligent or even careless in the design of the application and the surrounding procedures. The trouble was that the requirements had focused on the positive functions of the application, and on replicating the functionality of the previous application, which in turn had been based on the original manual process. There had not been sufficient analysis of how the application could be exploited.

Problem of requirements and negative requirements

Earlier I was careful to talk about failure to implement business logic correctly, rather than implementing requirements. Business logic and requirements will not necessarily be the same.

The requirements are usually written as “the application must do” rather than “the application must not…”. Sometimes the “must not” is obvious to the business. It “goes without saying” – that dangerous phrase!

However, the developers often lack the deep understanding of business logic that users have, and they design and code only the “must do”, not even being aware of the implicit corollary, the “must not”.

As a computer auditor I reviewed a sales application which had a control to ensure that debts couldn’t be written off without review by a manager. At the end of each day a report was run to highlight debts that had been cleared without a payment being received. Any discrepancies were highlighted for management action.

I noticed that it was possible to overwrite the default of today’s date when clearing a debt. Inserting a date in the past meant that the money I’d written off wouldn’t appear on any control report. The report for that date had been run already.
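For anyone who wants to see the shape of that control gap, here is a minimal sketch under my own assumptions; it is not the real application's code. It simply contrasts selecting write-offs by the keyed write-off date with selecting them by the date they were entered.

```python
from datetime import date, timedelta

write_offs = []  # each entry: (customer, amount, write_off_date, entered_on)

def write_off(customer, amount, write_off_date=None):
    # The screen defaulted to today's date but let the operator over-key it.
    write_offs.append((customer, amount, write_off_date or date.today(), date.today()))

def daily_control_report(report_date):
    # The report selected on the write-off date the operator keyed in...
    return [w for w in write_offs if w[2] == report_date]

write_off("routine case", 400)                                         # appears tonight
write_off("suspect case", 4999, write_off_date=date.today() - timedelta(days=1))

print(daily_control_report(date.today()))
# Only the routine case appears. Yesterday's report has already been run, so the
# back-dated write-off never surfaces anywhere. Selecting on the date of entry
# ('entered_on', the fourth field) instead would close the gap.
```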

When I mentioned this to the users and the teams who built and tested the application the initial reaction was “but you’re not supposed to do that”, and then they all tried blaming each other. There was a prolonged discussion about the nature of requirements.

The developers were adamant that they’d done nothing wrong because they’d built the application exactly as specified, and the users were responsible for the requirements.

The testers said they’d tested according to the requirements, and it wasn’t their fault.

The users were infuriated at the suggestion that they should have to specify every last little thing that should be obvious – obvious to them anyway.

The reason I was looking at the application, and looking for that particular problem, was because we knew that a close commercial rival had suffered a large fraud when a customer we had in common had bribed an employee of our rival to manipulate the sales control application. As it happened there was no evidence that the same had happened to us, but clearly we were vulnerable.

Testers should be aware of missing or unspoken requirements, implicit assumptions that have to be challenged and tested. Such assumptions and requirements are a particular problem with security requirements, which is why the simple SIGIST definition of security testing I gave above isn’t sufficient – security testing cannot be only about testing the formal security requirements.

However, testers, like developers, are working to tight schedules and budgets. We’re always up against the clock. Often there is barely enough time to carry out all the positive testing that is required, never mind thinking through all the negative testing that would be required to prove that missing or unspoken negative requirements have been met.

Fraudsters, on the other hand, have almost unlimited time to get to know the application and see where the weaknesses are. Dishonest users also have the motivation to work out the weaknesses. Even people who are usually honest can be tempted when they realize that there is scope for fraud.

If we don’t have enough time to do adequate negative testing to see what weaknesses could be exploited then at least we should be doing a quick informal evaluation of the financial sensitivity of the application and alerting management, and the internal computer auditors, that there is an element of unquantifiable risk. How comfortable are they with that?

If we can persuade project managers and users that we need enough time to test properly, then what can we do?

CobiT and OWASP

If there is time, there are various techniques that testers can adopt to try and detect potential weaknesses or which we can encourage the developers and users to follow to prevent such weaknesses.

I’d like to concentrate on the CobiT (Control Objectives for Information and related Technology) guidelines for developing and testing secure applications (CobiT 4.1 2007 [5]), and the CobiT IT Assurance Guide [6], and the OWASP (Open Web Application Security Project) Testing Guide [7].

Together, CobiT and OWASP cover the whole range of security testing. They can be used together, CobiT being more concerned with what applications do, and OWASP with how applications work.

They both give useful advice about the internal application controls and functionality that developers and users can follow. They can also be used to provide testers with guidance about test conditions. If the developers and users know that the testers will be consulting these guides then they have an incentive to ensure that the requirements and build reflect this advice.

CobiT implicitly assumes a traditional, big up-front design, Waterfall approach. Nevertheless, it’s still potentially useful for Agile practitioners, and it is possible to map from CobiT to Agile techniques, see Gupta [8].

The two most relevant parts are in the CobiT IT Assurance Guide [6]. This is organized into domains, the most directly relevant being “Acquire and Implement”, which covers acquiring and implementing the solution. This part of the guide is really for auditors, guiding them through a traditional development, explaining the controls and checks they should be looking for at each stage.

It’s interesting as a source of ideas, and as an alternative way of looking at the development, but unless your organization has mandated the developers to follow CobiT there’s no point trying to graft this onto your project.

Of much greater interest are the six CobiT application controls. Whereas the domains are functionally separate and sequential activities, a life-cycle in effect, the application controls are statements of intent that apply to the business area and the application itself. They can be used at any stage of the development. They are;

AC1 Source Data Preparation and Authorization

AC2 Source Data Collection and Entry

AC3 Accuracy, Completeness and Authenticity Checks

AC4 Processing Integrity and Validity

AC5 Output Review, Reconciliation and Error Handling

AC6 Transaction Authentication and Integrity

Each of these controls has stated objectives, and tests that can be made against the requirements, the proposed design and then on the built application. Clearly these are generic statements potentially applicable to any application, but they can serve as a valuable prompt to testers who are willing to adapt them to their own application. They are also a useful introduction for testers to the wider field of business controls.

CobiT rather skates over the question of how the business requirements are defined, but these application controls can serve as a useful basis for validating the requirements.
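By way of illustration only, here is one way a tester might turn those generic controls into prompts for a specific application. The questions are my own, applied to the debt write-off example from earlier in this article; nothing here is prescribed by CobiT itself.

```python
# My own illustrative prompts, mapping some of the CobiT application controls
# onto the debt write-off example discussed earlier. Adapt freely.

cobit_prompts = {
    "AC1 Source Data Preparation and Authorization":
        "Who may initiate a write-off, and is that restriction enforced on screen?",
    "AC3 Accuracy, Completeness and Authenticity Checks":
        "Can the operator key a date, amount or customer that bypasses validation?",
    "AC4 Processing Integrity and Validity":
        "Does a back-dated write-off still update the ledger consistently?",
    "AC5 Output Review, Reconciliation and Error Handling":
        "Does every write-off, whatever its date, appear on exactly one control report?",
}

for control, test_idea in cobit_prompts.items():
    print(f"{control}\n  -> {test_idea}")
```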

Unfortunately the CobiT IT Assurance Guide can be downloaded for free only by members of ISACA (Information Systems Audit and Control Association) and costs $165 for non-members to buy. Try your friendly neighborhood Internal Audit department! If they don’t have a copy, well maybe they should.

If you are looking for a more constructive and proactive approach to the requirements then I recommend the Open Web Application Security Project (OWASP) Testing Guide [7]. This is an excellent, accessible document covering the whole range of application security, both technical vulnerabilities and business logic flaws.

It offers good, practical guidance to testers. It also offers a testing framework that is basic, and all the better for that, being simple and practical.

The OWASP testing framework demands early involvement of the testers, and runs from before the start of the project to reviews and testing of live applications.

Phase 1: Before development begins

1A: Review policies and standards

1B: Develop measurement and metrics criteria (ensure traceability)

Phase 2: During definition and design

2A: Review security requirements

2B: Review design and architecture

2C: Create and review UML models

2D: Create and review threat models

Phase 3: During development

3A: Code walkthroughs

3B: Code reviews

Phase 4: During deployment

4A: Application penetration testing

4B: Configuration management testing

Phase 5: Maintenance and operations

5A: Conduct operational management reviews

5B: Conduct periodic health checks

5C: Ensure change verification

OWASP suggests four test techniques for security testing; manual inspections and reviews, code reviews, threat modeling and penetration testing. The manual inspections are reviews of design, processes, policies, documentation and even interviewing people; everything except the source code, which is covered by the code reviews.

A feature of OWASP I find particularly interesting is its fairly explicit admission that the security requirements may be missing or inadequate. This is unquestionably a realistic approach, but usually testing models blithely assume that the requirements need tweaking at most.

The response of OWASP is to carry out what looks rather like reverse engineering of the design into the requirements. After the design has been completed testers should perform UML modeling to derive use cases that “describe how the application works. In some cases, these may already be available”. Obviously in many cases these will not be available, but the clear implication is that even if they are available they are unlikely to offer enough information to carry out threat modeling.

The feature most likely to be missing is the misuse case. These are the dark side of use cases! As envisaged by OWASP the misuse cases shadow the use cases, threatening them, then being mitigated by subsequent use cases.
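To show the shadowing idea in a more concrete form, here is a small sketch of a use case, the misuse case that threatens it, and a mitigating use case, again using the write-off example from earlier. The structure and field names are my own invention, not part of the OWASP guide.

```python
# An invented, simplified representation of the use case / misuse case pairing.

use_case = {
    "name": "Write off a customer debt",
    "actor": "Sales operator",
    "steps": ["select debt", "accept or amend the write-off date", "confirm"],
}

misuse_case = {
    "name": "Back-date a write-off so it misses the daily control report",
    "threatens": use_case["name"],
    "actor": "Dishonest operator, or an outsider who has bribed one",
    "steps": ["select debt", "over-key the date with yesterday's", "confirm"],
}

mitigating_use_case = {
    "name": "Report write-offs by date of entry, not by keyed transaction date",
    "mitigates": misuse_case["name"],
    "test_idea": "enter a back-dated write-off and confirm it still appears on "
                 "the next control report",
}
```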

The OWASP framework is not designed to be a checklist, to be followed blindly. The important point about using UML is that it permits the tester to decompose and understand the proposed application to the level of detail required for threat modeling, but also with the perspective that threat modeling requires; i.e. what can go wrong? what must we prevent? what could the bad guys get up to?

UML is simply a means to that end, and was probably chosen largely because that is what most developers are likely to be familiar with, and therefore UML diagrams are more likely to be available than other forms of documentation. There was certainly some debate in the OWASP community about what the best means of decomposition might be.

Personally, I have found IDEF0 a valuable means of decomposing applications while working as a computer auditor. Full details of this technique can be found at http://www.idef.com [9].

It entails decomposing an application using a hierarchical series of diagrams, each of which has between three and six functions. Each function has inputs, which are transformed into outputs, depending on controls and mechanisms.

Is IDEF0 as rigorous and effective as UML? No, I wouldn’t argue that. When using IDEF0 we did not define the application in anything like the detail that UML would entail. Its value was in allowing us to develop a quick understanding of the crucial functions and issues, and then ask pertinent questions.

Given that certain inputs must be transformed into certain outputs, what are the controls and mechanisms required to ensure that the right outputs are produced?

In working out what the controls were, or ought to be, we’d run through the mantra that the output had to be accurate, complete, authorized, and timely. “Accurate” and “complete” are obvious. “Authorized” meant that the output must have been created or approved by people with the appropriate level of authority. “Timely” meant that the output must not only arrive in the right place, but at the right time. One could also use the six CobiT application controls as prompts.

In the example I gave above of the debt being written off I had worked down to the level of detail of “write off a debt” and looked at the controls required to produce the right output, “cancelled debts”. I focused on “authorized”, “complete” and “timely”.

Any sales operator could cancel a debt, but that raised the item for management review. That was fine. The problem was with “complete” and “timely”. All write-offs had to be collected for the control report, which was run daily. Was it possible to ensure some write-offs would not appear? Was it possible to over-key the default of the current date? It was possible. If I did so, would the write-off appear on another report? No. The control failure therefore meant that the control report could be easily bypassed.

The testing that I was carrying out had nothing to do with the original requirements. They were of interest, but not really relevant to what I was trying to do. I was trying to think like a dishonest employee, looking for a weakness I could exploit.

The decomposition of the application is the essential first step of threat modeling. Following that, one should analyze the assets for importance, explore possible vulnerabilities and threats, and create mitigation strategies.

I don’t want to discuss these in depth. There is plenty of material about threat modeling available. OWASP offers good guidance, [10] and [11]. Microsoft provides some useful advice [12], but its focus is on technical security, whereas OWASP looks at the business logic too. The OWASP testing guide [7] has a section devoted to business logic that serves as a useful introduction.

OWASP’s inclusion of mitigation strategies in the version of threat modeling that it advocates for testers is interesting. This is not normally a tester’s responsibility. However, considering such strategies is a useful way of planning the testing. What controls or protections should we be testing for? I think it also implicitly acknowledges that the requirements and design may well be flawed, and that threat modeling might not have been carried out in circumstances where it really should have been.

This perception is reinforced by OWASP’s advice that testers should ensure that threat models are created as early as possible in the project, and should then be revisited as the application evolves.

What I think is particularly valuable about the application control advice in CobiT and OWASP is that they help us to focus on security as an attribute that can, and must, be built into applications. Security testing then becomes a normal part of functional testing, as well as a specialist technical exercise. Testers must not regard security as an audit concern, with the testing being carried out by quasi-auditors, external to the development.

Getting the auditors on our side

I’ve had a fairly unusual career in that I’ve spent several years in each of software development, IT audit, IT security management, project management and test management. I think that gives me a good understanding of each of these roles, and a sympathetic understanding of the problems and pressures associated with them. It’s also taught me how they can work together constructively.

In most cases this is obvious, but the odd one out is the IT auditor. They have the reputation of being the hard-nosed suits from head office who come in to bayonet the wounded after a disaster! If that is what they do then they are being unprofessional and irresponsible. Good auditors should be pro-active and constructive. They will be happy to work with developers, users and testers to help them anticipate and prevent problems.

Auditors will not do your job for you, and they will rarely be able to give you all the answers. They usually have to spread themselves thinly across an organization, inevitably concentrating on the areas with problems and which pose the greatest risk.

They should not be dictating the controls, but good auditors can provide useful advice. They can act as a valuable sounding board, for bouncing ideas off. They can also be used as reinforcements if the testers are coming under irresponsible pressure to restrict the scope of security testing. Good auditors should be the friend of testers, not our enemy. At least you may be able to get access to some useful, but expensive, CobiT material.

Auditors can give you a different perspective and help you ask the right questions, and being able to ask the right questions is much more important than any particular tool or method for testers.

This article tells you something about CobiT and OWASP, and about possible new techniques for approaching testing of security. However, I think the most important lesson is that security testing cannot be a completely separate specialism, and that security testing must also include the exploration of the application’s functionality in a skeptical and inquisitive manner, asking the right questions.

Validating the security requirements is important, but so is exposing the unspoken requirements and disproving the invalid assumptions. It is about letting management see what the true state of the application is – just like the rest of testing.

References

[1] British Computer Society’s Special Interest Group in Software Testing (BCS SIGIST) Glossary.

[2] Gartner Inc. “Now Is the Time for Security at the Application Level” (NB PDF download), 2005.

[3] PriceWaterhouseCoopers. “Economic crime- people, culture and controls. The 4th biennial Global Economic Crime Survey”.

[4] US Secret Service. “Insider Threat Study: Illicit Cyber Activity in the Banking and Finance Sector”.

[5] IT Governance Institute. CobiT 4.1, 2007.

[6] IT Governance Institute. CobiT IT Assurance Guide (not free), 2007.

[7] Open Web Application Security Project. OWASP Testing Guide, V3.0, 2008.

[8] Gupta, S. “SOX Compliant Agile Processes”, Agile Alliance Conference, Agile 2008.

[9] IDEF0 Function Modeling Method.

[10] Open Web Application Security Project. OWASP Threat Modeling, 2007.

[11] Open Web Application Security Project. OWASP Code Review Guide “Application Threat Modeling”, 2009.

[12] Microsoft. “Improving Web Application Security: Threats and Countermeasures”, 2003.

Do standards keep testers in the kindergarten? (2009)


This article appeared in the December 2009 edition of Testing Experience magazine, which is no longer published. I’m moving the article onto my blog from my website, which will be decommissioned soon.

Normally when I re-post old articles I provide a warning about them being dated. This one was written in November 2009 but I think that its arguments are still valid. It is only dated in the sense that it doesn’t mention ISO 29119, the current ISO software testing standard, which was released in 2013. This article shows why I was dismayed when ISO 29119 arrived on the scene. I thought that prescriptive testing standards, such as IEEE 829, had had their day. They had failed and we had moved on.

The references in the article were all structured for a paper magazine. There are no hyperlinks and I have not tried to recreate them and check out whether they still work.

The article

Discussion of standards usually starts from the premise that they are intrinsically a good thing, and the debate then moves on to consider what form they should take and how detailed they should be.

Too often sceptics are marginalised. The presumption is that standards are good and beneficial. Those who are opposed to them appear suspect, even unprofessional.

Although the content of standards for software development and testing can be valuable, especially within individual organisations, I do not believe that they should be regarded as generic “standards” for the whole profession. Turning useful guidelines into standards suggests that they should be mandatory.

My particular concern is that the IEEE 829 “Standard for Software and System Test Documentation”, and the many document templates derived from it, encourage a safety first approach to documentation, with testers documenting plans and scripts in slavish detail.

They do so not because the project genuinely requires it, but because they have been encouraged to equate documentation with quality, and they fear that they will look unprofessional and irresponsible in a subsequent review or audit. I think these fears are ungrounded and I will explain why.

A sensible debate about the value of standards must start with a look at what standards are, and the benefits that they bring in general, and specifically to testing.

Often discussion becomes confused because justification for applying standards in one context is transferred to a quite different context without any acknowledgement that the standards and the justification may no longer be relevant in the new context.

Standards can be internal to a particular organisation or they can be external standards attempting to introduce consistency across an industry, country or throughout the world.

I’m not going to discuss legal requirements enforcing minimum standards of safety, such as Health and Safety legislation, or the requirements of the US Food & Drug Administration. That’s the law, and it’s not negotiable.

The justification for technical and product standards is clear. Technical standards introduce consistency, common protocols and terminology. They allow people, services and technology to be connected. Product standards protect consumers and make it easier for them to distinguish cheap, poor quality goods from more expensive but better quality competition.

Standards therefore bring information and mobility to the market and thus have huge economic benefits.

It is difficult to see where standards for software development or testing fit into this. To a limited extent they are technical standards, but only so far as they define the terminology, and that is a somewhat incidental role.

They appear superficially similar to product standards, but software development is not a manufacturing process, and buyers of applications are not in the same position as consumers choosing between rival, inter-changeable products.

Are software development standards more like the standards issued by professional bodies? Again, there’s a superficial resemblance. However, standards such as Generally Accepted Accounting Principles (Generally Accepted Accounting Practice in the UK) are backed up by company law and have a force no-one could dream of applying to software development.

Similarly, standards of professional practice and competence in the professions are strictly enforced and failure to meet these standards is punished.

Where does that leave software development standards? I do believe that they are valuable, but not as standards.

Susan Land gave a good definition and justification for standards in the context of software engineering in her book “Jumpstart CMM-CMMI Software Process Improvements – using IEEE software engineering standards”. [1]

“Standards are consensus-based documents that codify best practice. Consensus-based standards have seven essential attributes that aid in process engineering. They;

  1. Represent the collected experience of others who have been down the same road.
  2. Tell in detail what it means to perform a certain activity.
  3. Help to assure that two parties attach the same meaning to an engineering activity.
  4. Can be attached to or referenced by contracts.
  5. Improve the product.
  6. Protect the business and the buyer.
  7. Increase professional discipline.” (List sequence re-ordered from original).

The first four justifications are for standards in a descriptive form, to aid communication. Standards of this type would have a broader remit than the technical standards I referred to, and they would be guidelines rather than prescriptive. These justifications are not controversial, although the fourth has interesting implications that I will return to later.

The last three justifications hint at compulsion. These are valid justifications, but they are for standards in a prescriptive form and I believe that these justifications should be heavily qualified in the context of testing.

I believe that where testing standards have value they should be advisory, and that the word “standard” is unhelpful. “Standards” implies that they should be mandatory, or that they should at least be considered a level of best practice to which all practitioners should aspire.

Is the idea of “best practice” useful?

I don’t believe that software development standards, specifically the IEEE series, should be mandatory, or that they can be considered best practice. Their value is as guidelines, which would be a far more accurate and constructive term for them.

I do believe that there is a role for mandatory standards in software development. The time-wasting shambles that is created if people don’t follow file naming conventions is just one example. Secure coding standards that tell programmers about security flaws that they must not introduce into their programs are also a good example of standards that should be mandatory.

However, these are local, site-specific standards. They are about consistency, security and good housekeeping, rather than attempting to define an over-arching vision of “best practice”.

Testing standards should be treated as guidelines, practices that experienced practitioners would regard as generally sound and which should be understood and regarded as the default approach by inexperienced staff.

Making these practices mandatory “standards”, as if they were akin to technical or product standards and the best approach in any situation, will never ensure that experienced staff do a better job, and will often ensure they do a worse job than if they’d been allowed to use their own judgement.

Testing consultant Ben Simo has clear views on the notion of best practice. He told me;

“‘Best’ only has meaning in context. And even in a narrow context, what we think is best now may not really be the best.

In practice, ‘best practice’ often seems to be either something that once worked somewhere else, or a technical process required to make a computer system do a task. I like for words to mean something. If it isn’t really best, let’s not call it best.

In my experience, things called best practices are falsifiable as not being best, or even good, in common contexts. I like guidelines that help people do their work. The word ‘guideline’ doesn’t imply a command. Guidelines can help set some parameters around what and how to do work and still give the worker the freedom to deviate from the guidelines when it makes sense.”

“Rather than tie people’s hands and minds with standards and best practices, I like to use guidelines that help people think and communicate lessons learned – allowing the more experienced to share some of their wisdom with the novices.”

Such views cannot be dismissed as the musings of maverick testers who can’t abide the discipline and order that professional software development and testing require.

Ben is the President of the Association of Software Testing. His comments will be supported by many testers who see how they match their own experience. Also, there has been some interesting academic work that justifies such scepticism about standards. Interestingly, it has not come from orthodox IT academics.

Lloyd Roden drew on the work of the Dreyfus brothers as he presented a powerful argument against the idea of “best practice” at Starwest 2009 and the TestNet Najaarsevent. Hubert Dreyfus is a philosopher and psychologist and Stuart Dreyfus works in the fields of industrial engineering and artificial intelligence.

In 1980 they wrote an influential paper that described how people pass through five levels of competence as they move from novice to expert status, and analysed how rules and guidelines helped them along the way. The five levels of the Dreyfus Model of Skills Acquisition can be summarised as follows.

  1. Novices require rules that can be applied in narrowly defined situations, free of the wider context.
  2. Advanced beginners can work with guidelines that are less rigid than the rules that novices require.
  3. Competent practitioners understand the plan and goals, and can evaluate alternative ways to reach the goal.
  4. Proficient practitioners have sufficient experience to foresee the likely result of different approaches and can predict what is likely to be the best outcome.
  5. Experts can intuitively see the best approach. Their vast experience and skill mean that rules and guidelines have no practical value.

For novices the context of the problem presents potentially confusing complications. Rules provide clarity. For experts, understanding the context is crucial and rules are at best an irrelevant hindrance.

Roden argued that we should challenge any references to “best practices”. We should talk about good practices instead, and know when and when not to apply them. He argued that imposing “best practice” on experienced professionals stifles creativity, frustrates the best people and can prompt them to leave.

However, the problem is not simply a matter of “rules for beginners, no rules for experts”. Rules can have unintended consequences, even for beginners.

Chris Atherton, a senior lecturer in psychology at the University of Central Lancashire, made an interesting point in a general, anecdotal discussion about the ways in which learners relate to rules.

“The trouble with rules is that people cling to them for reassurance, and what was originally intended as a guideline quickly becomes a noose.

The issue of rules being constrictive or restrictive to experienced professionals is a really interesting one, because I also see it at the opposite end of the scale, among beginners.”

“Obviously the key difference is that beginners do need some kind of structural scaffold or support; but I think we often fail to acknowledge that the nature of that early support can seriously constrain the possibilities apparent to a beginner, and restrict their later development.”

The issue of whether rules can hinder the development of beginners has significant implications for the way our profession structures its processes. Looking back at work I did at the turn of the decade improving testing processes for an organisation that was aiming for CMMI level 3, I worry about the effect it had.

Independent professional testing was a novelty for this client and the testers were very inexperienced. We did the job to the best of our ability at the time, and our processes were certainly considered best practice by my employers and the client.

The trouble is that people can learn, change and grow faster than strict processes adapt. A year later and I’d have done it better. Two years later, it would have been different and better, and so on.

Meanwhile, the testers would have been gaining in experience and confidence, but the processes I left behind were set in tablets of stone.

As Ben Simo put it; “if an organisation is at a level less than the intent of level 5, CMM seems to often lock in ignorance that existed when the process was created”.

CMMI has its merits but also has dangers. Continuous process improvement is at its heart, but these are incremental advances and refinements in response to analysis of metrics.

Step changes or significant changes in response to a new problem don’t fit comfortably with that approach. Beginners advance from the first stage of the Dreyfus Model, but the context they come to know and accept is one of rigid processes and rules.

Rules, mandatory standards and inflexible processes can hinder the development of beginners. Rigid standards don’t promote quality. They can have the opposite effect if they keep testers in the kindergarten.

IEEE 829 and the motivation behind documentation

One could argue that standards do not have to be mandatory. Software developers are pragmatic, and understand when standards should be mandatory and when they should be discretionary. That is true, but the problem is that the word “standards” strongly implies compulsion. That is the interpretation that most outsiders would place on the word.

People do act on the assumption that the standard should be mandatory, and then regard non-compliance as a failure, deviation or problem. These people include accountants and lawyers, and perhaps most significantly, auditors.

My particular concern is the effect of the IEEE 829 testing documentation standard. I wonder whether much more than 1% of testers have ever seen a copy of the standard. However, much of its content is very familiar, and its influence is pervasive.

IEEE 829 is a good document with much valuable material in it. It has excellent templates, which provide great examples of how to meticulously document a project.

Or at least they’re great examples of meticulous documentation if that is the right approach for the project. That of course is the question that has to be asked. What is the right approach? Too often the existence of a detailed documentation standard is taken as sufficient justification for detailed documentation.

I’m going to run through two objections to detailed documentation. They are related, but one refers to design and the other to testing. It could be argued that both have their roots in psychology as much as IT.

I believe that the fixation of many projects on documentation, and the highly dubious assumption that quality and planning are synonymous with detailed documentation, have their roots in the structured methods that dominated software development for so long.

These methods were built on the assumption that software development was an engineering discipline, rather than a creative process, and that greater quality and certainty in the development process could be achieved only through engineering style rigour and structure.

Paul Ward, one of the leading developers of structured methods, wrote a series of articles [2] on the history of structured methods, in which he admitted that the methods were neither based on empirical research nor subjected to peer review.

Two other proponents of structured methods, Larry Constantine and Ed Yourdon, admitted that the early investigations were no more than informal “noon-hour critiques” [3].

Fitzgerald, Russo and Stolterman gave a brief history of structured methods in their book “Information Systems Development – Methods in Action” [4] and concluded that “the authors relied on intuition rather than real-world experience that the techniques would work”.

One of the main problem areas for structured methods was the leap from the requirements to the design. Fitzgerald et al wrote that “the creation of hierarchical structure charts from data flow diagrams is poorly defined, thus causing the design to be loosely coupled to the results of the analysis. Coad & Yourdon [5] label this shift as a ‘Grand Canyon’ due to its fundamental discontinuity.”

The solution to this discontinuity, according to the advocates of structured methods, was an avalanche of documentation to help analysts to crawl carefully from the current physical system, through the current logical system to a future logical system and finally a future physical system.

Not surprisingly, given the massive documentation overhead, and developers’ propensity to pragmatically tailor and trim formal methods, this full process was seldom followed. What was actually done was more informal, intuitive, and opaque to outsiders.

An interesting strand of research was pursued by Human Computer Interface academics such as Curtis, Iscoe and Krasner [6], and Robbins, Hilbert and Redmiles [7].

They attempted to identify the mental processes followed by successful software designers when building designs. Their conclusion was that they did so using a high-speed, iterative process; repeatedly building, proving and refining mental simulations of how the system might work.

Unsuccessful designers couldn’t conceive working simulations, and fixed on designs whose effectiveness they couldn’t test till they’d been built.

Curtis et al wrote:

“Exceptional designers were extremely familiar with the application domain. Their crucial contribution was their ability to map between the behavior required of the application system and the computational structures that implemented this behavior.

“In particular, they envisioned how the design would generate the system behavior customers expected, even under exceptional circumstances.”

Robbins et al stressed the importance of iteration:

“The cognitive theory of reflection-in-action observes that designers of complex systems do not conceive a design fully-formed. Instead, they must construct a partial design, evaluate, reflect on, and revise it, until they are ready to extend it further.”

The eminent US software pioneer Robert Glass discussed these studies in his book “Software Conflict 2.0” [8] and observed that:

“people who are not very good at design … tend to build representations of a design rather than models; they are then unable to perform simulation runs; and the result is they invent and are stuck with inadequate design solutions.”

These studies fatally undermine the argument that linear, documentation-driven processes are necessary for a quality product and that more flexible, lightweight documentation approaches are irresponsible.

Flexibility and intuition are vital to developers. Heavyweight documentation can waste time and suffocate staff if used when there is no need.

Ironically, it was the heavyweight approach that was founded on guesswork and intuition, and the lightweight approach that has sound conceptual underpinnings.

The lessons of the HCI academics have obvious implications for exploratory testing, which again is rooted in psychology as much as in IT. In particular, the finding by Curtis et al that “exceptional designers were extremely familiar with the application domain” takes us to the heart of exploratory testing.

What matters is not extensive documentation of test plans and scripts, but deep knowledge of the application. These need not be mutually exclusive, but on high-pressure, time-constrained projects it can be hard to do both.

Itkonen, Mäntylä and Lassenius conducted a fascinating experiment at the University of Helsinki in 2007 in which they tried to compare the effectiveness of exploratory testing and test case based testing. [9]

Their finding was that test case based testing was no more effective than exploratory testing at finding defects. The defects were a mixture of native defects in the application and defects seeded by the researchers. Defects were categorised according to the ease with which they could be found. Defects were also assigned to one of eight defect types (performance, usability etc.).

Exploratory testing scored better for defects at all four levels of “ease of detection”, and in 6 out of the 8 defect type categories. The differences were not considered statistically significant, but it is interesting that exploratory testing had the slight edge given that conventional wisdom for many years was that heavily documented scripting was essential for effective testing.

However, the really significant finding, which the researchers surprisingly did not make great play of, was that the exploratory testing results were achieved with 18% of the effort of the test case testing.

The exploratory testing required 1.5 hours per tester, and the test case testing required an average of 8.5 hours (7 hours preparation and 1.5 hours testing).
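As a quick sanity check on those figures, here is a minimal sketch; the hours are those reported in the study, the variable names are mine.

    # Rough check of the effort figures reported by Itkonen et al.
    exploratory_hours = 1.5            # execution only, no scripted preparation
    test_case_hours = 7.0 + 1.5        # preparation plus execution

    ratio = exploratory_hours / test_case_hours
    print(f"Exploratory effort as a share of test case effort: {ratio:.0%}")
    # Prints roughly 18%, matching the figure quoted above.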

It is possible to criticise the methods of the researchers, particularly their use of students taking a course in software testing, rather than professionals experienced in applying the techniques they were using.

However, exploratory testing has often been presumed to be suitable only for experienced testers, with scripted, test case based testing being more appropriate for the less experienced.

The methods followed by the Helsinki researchers might have been expected to bias the results in favour of test case testing. Therefore, the finding that exploratory testing is at least as effective as test case testing with a fraction of the effort should make proponents of heavily documented test planning pause to reconsider whether it is always appropriate.

Documentation per se does not produce quality. Quality is not necessarily dependent on documentation. Sometimes they can be in conflict.

Firstly, the emphasis on producing the documentation can be a major distraction for test managers. Most of their effort goes into producing, refining and updating plans that often bear little relation to reality.

Meanwhile the team are working hard firming up detailed test cases based on an imperfect and possibly outdated understanding of the application. While the application is undergoing the early stages of testing, with consequent fixes and changes, detailed test plans for the later stages are being built on shifting sand.

You may think that is being too cynical and negative, and that testers will be able to produce useful test cases based on a correct understanding of the system as it is supposed to be delivered to the testing stage in question. However, even if that is so, the Helsinki study shows that this is not a necessary condition for effective testing.

Further, if similar results can be achieved with less than 20% of the effort, how much more could be achieved if the testers were freed from the documentation drudgery in order to carry out more imaginative and proactive testing during the earlier stages of development?

Susan Land’s fourth justification for standards (see start of article) has interesting implications.

Standards “can be attached to or referenced by contracts”. That is certainly true. However, the danger of detailed templates in the form of a standard is that organisations tailor their development practices to the templates rather than the other way round.

If the lawyers fasten onto the standard and write its content into the contract then documentation can become an end and not just a means to an end.

Documentation becomes a “deliverable”. The dreaded phrase “work product” is used, as if the documentation output is a product of similar value to the software.

In truth, the documentation sometimes is more valuable, at least to the supplier, if the payments are staged under the terms of the contract and depend on the production of satisfactory documentation.

I have seen triumphant announcements of “success” following approval of “work products” with the consequent release of payment to the supplier when I have known the underlying project to be in a state of chaos.

Formal, traditional methods attempt to represent a highly complex, even chaotic, process in a defined, repeatable model. These methods often bear only vague similarities to what developers have to do to craft applications.

The end product is usually poor quality, late and over budget. Any review of the development will find constant deviations from the mandated method.

The suppliers, and defenders, of the method can then breathe a sigh of relief. The sacred method was not followed. It was the team’s fault. If only they’d done it by the book! The possibility that the developers’ and testers’ apparent sins were the only reason anything was produced at all is never considered.

What about the auditors?

Adopting standards like IEEE 829 without sufficient thought causes real problems. If the standard doesn’t reflect what really has to be done to bring the project to a successful conclusion then mandated tasks or documents may be ignored or skimped on, with the result that a subsequent review or audit reports on a failure to comply.

An alternative danger is that testers do comply when there is no need, and put too much effort into the wrong things. Often testers arrive late on the project. Sometimes the emphasis is on catching up with plans and documentation that are of dubious value, and are not an effective use of the limited resources and time.

However, if the contract requires it, or if there is a fear of the consequences of an audit, then it could be rational to assign valuable staff to unproductive tasks.

Sadly, auditors are often portrayed as corporate bogey-men. It is assumed that they will conduct audits by following ticklists, with simplistic questions that require yes/no answers. “Have you done x to y, yes or no”.

If the auditees start answering “No, but …” they would be cut off with “So, it’s no”.

I have seen that style of auditing. It is unprofessional and organisations that tolerate it have deeper problems than unskilled, poorly trained auditors. It is senior management that creates the environment in which the ticklist approach thrives. However, I don’t believe it is common. Unfortunately people often assume that this style of auditing is the norm.

IT audit is an interesting example of a job that looks extremely easy at first sight, but is actually very difficult when you get into it.

It is very easy for an inexperienced auditor to do what appears to be a decent job. At least it looks competent to everyone except experienced auditors and those who really understand the area under review.

If auditors are to add value they have to be able to use their judgement, and that has to be based on their own skills and experience as well as formal standards.

They have to be able to analyse a situation and evaluate whether the risks have been identified and whether the controls are appropriate to the level of risk.

It is very difficult to find the right line and you need good experienced auditors to do that. I believe that ideally IT auditors should come from an IT background so that they do understand what is going on; poachers turned gamekeepers if you like.

Too often testers assume that they know what auditors expect, and they do not speak directly to the auditors or check exactly what professional auditing consists of.

They assume that auditors expect to see detailed documentation of every stage, without consideration of whether it truly adds value, promotes quality or helps to manage the risk.

Professional auditors take a constructive and pragmatic approach and can help testers. I want to help testers understand that. When I worked as an IT auditor I used to find it frustrating to discover that people had wasted time on unnecessary and unhelpful actions on the assumption that “the auditors require it”.

Kanwal Mookhey, an IT auditor and founder of NII consulting, wrote an interesting article for the Internal Auditor magazine of May 2008 [10] about auditing IT project management.

He described the checking that auditors should carry out at each stage of a project. He made no mention of the need to see documentation of detailed test plans and scripts, whereas he did emphasise the need for early testing.

Kanwal told me:

“I would agree that auditors are – or should be – more inclined to see comprehensive testing, rather than comprehensive test documentation.

“Documentation of test results is another matter of course. As an auditor, I would be more keen to know that a broad-based testing manual exists, and that for the system in question, key risks and controls identified during the design phase have been tested for. The test results would provide a higher degree of assurance than exhaustive test plans.”

One of the most significant developments in the field of IT governance in the last few decades has been the US Sarbanes-Oxley Act of 2002, which imposed new standards of reporting, auditing and control for US companies. It has had massive worldwide influence because it applies to the foreign subsidiaries of US companies, and to foreign companies that are listed on the US stock exchanges.

The act attracted considerable criticism for the additional overheads it imposed on companies, duplicating existing controls and imposing new ones of dubious value.

Unfortunately, the response to Sarbanes-Oxley verged on the hysterical, with companies, and unfortunately some auditors, reading more into the legislation than a calmer reading could justify. The assumption was that every process and activity should be tied down and documented in great detail.

However, not even Sarbanes-Oxley, supposedly the sacred text of extreme documentation, requires detailed documentation of test plans or scripts. That may be how some people misinterpret the act. It is neither mandated by the act nor recommended in the guidance documents issued by the Institute of Internal Auditors [11] and the Information Systems Audit & Control Association [12].

If anyone tries to justify extensive documentation by telling you that “the auditors will expect it”, call their bluff. Go and speak to the auditors. Explain that what you are doing is planned, responsible and will have sufficient documentation of the test results.

Documentation is never required “for the auditors”. If it is required it is because it is needed to manage the project, or it is a requirement of the project that has to be justified like any other requirement. That is certainly true of safety critical applications, or applications related to pharmaceutical development and manufacture. It is not true in all cases.

IEEE 829 and other standards do have value, but in my opinion their value is not as standards! They do contain some good advice and the fruits of vast experience. However, they should be guidelines to help the inexperienced, and memory joggers for the more experienced.

I hope this article has made people think about whether mandatory standards are appropriate for software development and testing, and whether detailed documentation in the style of IEEE 829 is always needed. I hope that I have provided some arguments and evidence that will help testers persuade others of the need to give testers the freedom to leave the kindergarten and grow as professionals.

References

[1] Land, S. (2005). “Jumpstart CMM-CMMI Software Process Improvements – using IEEE software engineering standards”, Wiley.

[2a] Ward, P. (1991). “The evolution of structured analysis: Part 1 – the early years”. American Programmer, vol 4, issue 11, 1991. pp4-16.

[2b] Ward, P. (1992). “The evolution of structured analysis: Part 2 – maturity and its problems”. American Programmer, vol 5, issue 4, 1992. pp18-29.

[2c] Ward, P. (1992). “The evolution of structured analysis: Part 3 – spin offs, mergers and acquisitions”. American Programmer, vol 5, issue 9, 1992. pp41-53.

[3] Yourdon, E., Constantine, L. (1977) “Structured Design”. Yourdon Press, New York.

[4] Fitzgerald B., Russo N., Stolterman, E. (2002). “Information Systems Development – Methods in Action”, McGraw Hill.

[5] Coad, P., Yourdon, E. (1991). “Object-Oriented Analysis”, 2nd edition. Yourdon Press.

[6] Curtis, B., Iscoe, N., Krasner, H. (1988). “A field study of the software design process for large systems” (NB PDF download). Communications of the ACM, Volume 31, Issue 11 (November 1988), pp1268-1287.

[7] Robbins, J., Hilbert, D., Redmiles, D. (1998). “Extending Design Environments to Software Architecture Design” (NB PDF download). Automated Software Engineering, Vol. 5, No. 3, July 1998, pp261-290.

[8] Glass, R. (2006). “Software Conflict 2.0: The Art and Science of Software Engineering” Developer Dot Star Books.

[9a] Itkonen, J., Mäntylä, M., Lassenius, C. (2007). “Defect detection efficiency – test case based vs exploratory testing”. First International Symposium on Empirical Software Engineering and Measurement. (Payment required).

[9b] Itkonen, J. (2008). “Do test cases really matter? An experiment comparing test case based and exploratory testing”.

[10] Mookhey, K. (2008). “Auditing IT Project Management”. Internal Auditor, May 2008, the Institute of Internal Auditors.

[11] The Institute of Internal Auditors (2008). “Sarbanes-Oxley Section 404: A Guide for Management by Internal Controls Practitioners”.

[12] Information Systems Audit and Control Association (2006). “IT Control Objectives for Sarbanes-Oxley 2nd Edition”.

The Post Office Horizon IT scandal, part 2 – evidence & the “off piste” issue

In the first post of this three part series about the scandal of the Post Office’s Horizon IT system I explained the concerns I had about the approach to errors and accuracy. In this post I’ll talk about my experience working as an IT auditor investigating frauds, and my strong disapproval for the way the Post Office investigated and prosecuted the Horizon cases.

Evidence, certainty and prosecuting fraud

Although I worked on many fraud cases that resulted in people going to prison I was never required to give evidence in person. This was because we built our case so meticulously, with an overwhelmingly compelling set of evidence, that the fraudsters always pleaded guilty rather than risk antagonising the court with a wholly unconvincing plea of innocence.

We always had to be aware of the need to find out what had happened, rather than simply to sift for evidence that supported our working hypothesis. We had to follow the trail of evidence, but remain constantly alert to the possibility we might miss vital, alternative routes that could lead to a different conclusion. It’s very easy to fall quickly into the mindset that the suspect is definitely guilty and ignore anything that might shake that belief. Working on these investigations gave me great sympathy for the police carrying out detective work. If you want to make any progress you can’t follow up everything, but you have to be aware of the significance of the choices you don’t make.

In these cases there was a clear and obvious distinction between the investigators and the prosecutors. We, the IT auditors, would do enough investigation for us to be confident we had the evidence to support a conviction. We would then present that package of evidence to the police, who were invariably happy to run with a case where someone else had done the leg work. The police would do some confirmatory investigation of their own, but it was our work that would put people in jail. The prosecution of the cases was the responsibility of the Crown Prosecution Service in England & Wales, and the Procurator Fiscal Service in Scotland. That separation of responsibilities helps to guard against some of the dangers that concerned me about bias during investigation.

This separation didn’t apply in the case of the Post Office, which, for anachronistic historical reasons, employs its own prosecutors. It also has its own investigation service. There’s nothing unusual about internal investigators, but when they are working with an in-house prosecution service that creates the danger of unethical behaviour. In the case of the Post Office the conduct of prosecutions was disgraceful.

The usual practice was to charge a sub-postmaster with theft and false accounting, even if the suspect had flagged up a problem with the accounts and there was no evidence that he or she had benefitted from a theft, or even committed one. Under pressure sub-postmasters would usually accept a deal. The more serious charge of theft would be dropped if they pleaded guilty to false accounting, which would allow the Post Office to pursue them for the losses.

What made this practice shameful was that the Post Office knew it had no evidence for theft that would secure a conviction. This doesn’t seem to have troubled them. They knew the suspects were guilty. They were protecting the interests of the Post Office and the end justified the means.

The argument that the prosecution tactics were deplorable is being taken very seriously. The Criminal Cases Review Commission has referred 39 Horizon cases for appeal, on the grounds of “abuse of process” by the prosecution.

The approach taken by Post Office investigators and prosecutors was essentially to try and ignore the weakest points of their case, while concentrating on the strongest points. This strikes me as fundamentally wrong. It is unprofessional and unethical. It runs counter to my experience.

Although I was never called to appear as a witness in court, when I was assembling the evidence to be used in a fraud trial I always prepared on the assumption I would have to face a barrister, or advocate, who had been sufficiently well briefed to home in on any possible areas of doubt, or uncertainty. I had to be prepared to face an aggressive questioner who could understand where weak points might lie in the prosecution case. The main areas of concern were where it was theoretically possible that data might have been tampered with, or where it was possible that someone else had taken the actions that we were pinning on the accused. Our case was only as strong as the weakest link in the chain of evidence. I had to be ready to explain why the jury should be confident “beyond reasonable doubt” that the accused was guilty.

Yes, it was theoretically possible that a systems programmer could have bypassed access controls and tampered with the logs, but it was vanishingly unlikely that they could have set up a web of consistent evidence covering many applications over many months, even years, and that they could have done so without leaving any trace.

In any case, these sysprogs lacked the deep application knowledge required. Some applications developers, and the IT auditors, did have the application knowledge, but they lacked the necessary privileges to subvert access controls before tampering with evidence.

The source code and JCL decks for all the fraud detection programs would have been available to the defence so that an expert witness could dissect them. We not only had to do the job properly, we had to be confident we could justify our code in court.

Another theoretical possibility was that another employee had logged into the accused’s account to make fraudulent transactions, but we could match these transactions against network logs showing that the actions had always been taken from the terminal sitting on the accused’s desk during normal office hours. I could sit at my desk in head office and use a network monitoring tool to watch what a suspect was doing hundreds of miles away. In one case I heard a colleague mention that the police were trailing a suspect around Liverpool that afternoon. I told my colleague to get back to the cops and tell them they were following the wrong guy. Our man was sitting at his desk in Preston and I could see him working. Half an hour later the police phoned back to say we were right.
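To make the cross-checking idea concrete, here is a minimal sketch. The record layouts, field names and values are invented purely for illustration; the real logs and monitoring tools were nothing like this simple.

    from datetime import datetime

    # Invented, simplified records: a suspect transaction and the network
    # sessions recorded for the terminal on the accused's desk.
    transactions = [
        {"id": "T1001", "user": "suspect01", "time": datetime(1995, 3, 2, 14, 35)},
    ]
    sessions = [
        {"user": "suspect01", "terminal": "PRESTON-014",
         "logon": datetime(1995, 3, 2, 9, 1), "logoff": datetime(1995, 3, 2, 17, 2)},
    ]

    def corroborating_sessions(txn, sessions):
        """Return sessions that place the user at a known terminal at the time
        of the transaction; no match would be a weak link in the evidence."""
        return [s for s in sessions
                if s["user"] == txn["user"]
                and s["logon"] <= txn["time"] <= s["logoff"]]

    for txn in transactions:
        matches = corroborating_sessions(txn, sessions)
        print(txn["id"], [m["terminal"] for m in matches] or "NO CORROBORATION")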

In any case, fanciful speculation that our evidence had been manufactured hit the problem of motive; the accused was invariably enjoying a lifestyle well beyond his or her salary, whereas those who might have tampered with evidence had nothing to gain and a secure job, pension and mortgage to lose.

I’ve tried to explain our mindset and thought processes so that you can understand why I was shocked to read about what happened at the Post Office. We investigated and prepared meticulously in case we had to appear in court. That level of professional preparation goes a long way to explaining why we were never called to give evidence. The fraudsters always put their hands up when they realised how strong the evidence was.

Superusers going “off piste”

One of the most contentious aspects of the Horizon case was the prevalence of Transaction Corrections, i.e. corrections applied centrally by IT support staff to correct errors. The Post Office seems to have regarded these as being a routine part of the system, in the wider sense of the word “system”. But it regarded them as being outside the scope of the technical Horizon system. They were just a routine, administrative matter.

I came across an astonishing phrase in the judgment [PDF, opens in new tab, see page 117], lifted from an internal Post Office document. “When we go off piste we use APPSUP”. That is a powerful user privilege which allows users to do virtually anything. It was intended “for unenvisaged ad-hoc live amendment” of data. It had been used on average about once a day, and was assigned on a permanent basis to the IDs of all the IT support staff looking after Horizon.

I’m not sure readers will realise how shocking the phrase “off piste” is in that context to someone with solid IT audit experience in a respectable financial services company. Picture the reaction of a schools inspector coming across an email saying “our teachers are all tooled up with Kalashnikovs in case things get wild in the playground”. It’s not just a question of users holding a superuser privilege all the time, bad though that is. It reveals a lot about the organisation and its systems if staff have to jump in and change live data routinely. An IT shop that can’t control superusers effectively probably doesn’t control much. It’s basic.

Where I worked as an IT auditor nobody was allowed to have an account with which they could create, amend or delete production data. There were elaborate controls applied whenever an ad hoc or emergency change had to be made. We had to be confident in the integrity of our data. If we’d discovered staff having permanent update access to live data, for when they went “off piste”, we’d have raised the roof and wouldn’t have eased off till the matter was fully resolved. And if the company had been facing a court action that was centred on how confident we could be in our systems and data we’d have argued strongly that we should cut our losses and settle once we were aware of the “off piste” problem.
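To show how basic that control is, here is the sort of check an auditor could sketch against an extract of privilege assignments. The data layout and user names are invented; APPSUP is the privilege named in the judgment.

    # Flag powerful privileges that are held permanently rather than granted
    # temporarily under an emergency change procedure.
    POWERFUL_PRIVILEGES = {"APPSUP"}        # illustrative list; APPSUP comes from the judgment

    privilege_assignments = [               # invented extract for illustration
        {"user": "support01", "privilege": "APPSUP",   "expiry": None},
        {"user": "oncall02",  "privilege": "APPSUP",   "expiry": "2011-06-30"},
        {"user": "clerk03",   "privilege": "READONLY", "expiry": None},
    ]

    findings = [a for a in privilege_assignments
                if a["privilege"] in POWERFUL_PRIVILEGES and a["expiry"] is None]

    for f in findings:
        print(f"Audit finding: {f['user']} holds {f['privilege']} with no expiry")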

Were the Post Office’s, or Fujitsu’s, internal auditors aware of this? Yes, but they clearly did nothing. If I hadn’t discovered that powerful user privileges were out of control on the first day of a two day, high level, IT installation audit I’d have been embarrassed. It’s that basic. However, the Post Office’s internal auditors don’t have the excuse of incompetence. The problem was flagged up by the external auditors Ernst & Young in 2011. If internal audit was unaware of a problem raised by the external auditors they were stealing their salaries.

The only times when work has ever affected my sleep have been when I knew that the police were going to launch dawn raids on suspects’ houses. I would lie in bed thinking about the quality of the evidence I’d gathered. Had I got it all? Had I missed anything? Could I rely on the data and the systems? I worried because I knew that people were going to have the police hammering on their front doors at 5 o’clock in the morning.

I am appalled that Post Office investigators and prosecutors could approach fraud investigations with the attitude “what can we do to get a conviction?”. They pursued the sub-postmasters aggressively, knowing the weaknesses in Horizon and the Post Office; that was disgraceful.

In the final post in this series I’ll look further at the role of internal audit, how it should be independent and its role in keeping an eye on risk. In all those respects the Post Office’s internal auditors have fallen short.

The Post Office Horizon IT scandal, part 1 – errors and accuracy

For the last few years I’ve been following the controversy surrounding the Post Office’s accounting system, Horizon. This controls the accounts of some 11,500 Post Office branches around the UK. There was a series of alleged frauds by sub-postmasters, all of whom protested their innocence. Nevertheless, the Post Office prosecuted these cases aggressively, pushing the supposed perpetrators into financial ruin, and even suicide. The sub-postmasters affected banded together to take a civil action against the Post Office, claiming that no frauds had taken place but that the discrepancies arose from system errors.

I wasn’t surprised to see that the sub-postmasters won their case in December 2019, with the judge providing some scathing criticism of the Post Office and of Fujitsu, the IT supplier. The Post Office had to pay £57.75 million to settle the case. Further, in March 2020 the Criminal Cases Review Commission decided to refer for appeal the convictions of 39 sub-postmasters, based on the argument that their prosecution involved an “abuse of process”. I will return to the prosecution tactics in my next post.

Having worked as an IT auditor, including fraud investigations, and as a software tester the case intrigued me. It had many features that would have caused me great concern if I had been working at the Post Office and I’d like to discuss a few of them. The case covered a vast amount of detail. If you want to see the full 313 page judgment you can find it here [PDF, opens in new tab].

What caught my eye when I first heard about this case were the arguments about whether the problems were caused by fraud, system error, or user error. As an auditor who worked on the technical side of many fraud cases the idea that there could be any confusion between fraud and system error makes me very uncomfortable. The system design should incorporate whatever controls are necessary to ensure such confusion can’t arise.

When we audited live systems we established what must happen and what must not happen, what the system must do and what it must never do. We would ask how managers could know that the system would do the right things, and never do the wrong things. We then tested the system looking for evidence that these controls were present and effective. We would try to break the system, evading the controls we knew should be there, and trying to exploit missing or ineffective controls. If we succeeded we’d expect, at the least, the system to hold unambiguous evidence about what we had done.
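In practice each “must happen” or “must never happen” becomes a concrete check for evidence. A hypothetical sketch, with an invented record format, of the flavour of such a control test:

    # Hypothetical control test: every centrally applied correction must be
    # authorised and must carry a reason. The record layout is invented.
    corrections = [
        {"id": "C1", "amount": -8000, "applied_by": "support01",
         "authorised_by": "manager07", "reason": "duplicate dispatch reversed"},
        {"id": "C2", "amount": 150, "applied_by": "support01",
         "authorised_by": None, "reason": ""},
    ]

    def unauthorised_or_unexplained(corrections):
        """Return the IDs of corrections that breach the control."""
        return [c["id"] for c in corrections
                if not c["authorised_by"] or not c["reason"]]

    failures = unauthorised_or_unexplained(corrections)
    if failures:
        print("Control failure, corrections lacking authorisation or reason:", failures)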

As for user error, it’s inevitable that users will make mistakes and systems should be designed to allow for that. “User error” is an inadequate explanation for things going wrong. If the system doesn’t help users avoid error then that is a system failure. Mr Justice Fraser, the judge, took the same line. He expected the system “to prevent, detect, identify, report or reduce the risk” of user error. He concluded that controls had been put in place, but they had failed and that Fujitsu had “inexplicably” chosen to treat one particularly bad example of system error as being the fault of a user.

The explanation for the apparently inexplicable might lie in the legal arguments surrounding the claim by the Post Office and Fujitsu that Horizon was “robust”. The rival parties could not agree even on the definition of “robust” in this context, never mind whether the system was actually robust.

Nobody believed that “robust” meant error-free. That would be absurd. No system is perfect, and it was revealed that Horizon had a large and persistent number of bugs, some of them serious. The sub-postmasters’ counsel and IT expert argued that “robust” must mean that it was extremely unlikely the system could produce the sort of errors that had ruined so many lives. The Post Office confused matters by adopting different definitions at different times, which was made clear when they were asked to clarify the point and they provided an IT industry definition of robustness that sat uneasily with their earlier arguments.

The Post Office approach was essentially top down. Horizon was robust because it could handle any risks that threatened its ability to perform its overall business role. They then took a huge logical leap to claim that because Horizon was robust by their definition it couldn’t be responsible for serious errors at the level of individual branch accounts.

Revealingly, the Post Office and Fujitsu named bugs using the branch where they had first occurred. Two of the most significant were the Dalmellington Bug, discovered at a branch in Ayrshire, and the Callendar Square Bug, also from a Scottish branch, in Falkirk. This naming habit linked bugs to users, not the system.

The Dalmellington Bug [PDF, opens in new tab – see para 163+] entailed a user repeatedly hitting a key when the system froze as she was trying to record the transfer of £8,000 in cash from her main branch to a sub-branch. Unknown to her, each time she struck the key she was confirming dispatch of a further £8,000 to the other office. The bug created a discrepancy of £24,000 for which she was held responsible.

Similarly, the Callendar Square Bug generated spurious, duplicate financial transactions for which the user was considered to be responsible, even though this was clearly a technical problem related to the database, the messaging software, the communications link, or some combination.
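Both bugs produced duplicate transactions, the classic duplicate-submission problem. I have no knowledge of the Horizon code itself; the sketch below simply illustrates, with invented names, the standard defence: make the confirmation idempotent, so a repeated key press on a frozen screen cannot post a new transfer.

    # Illustrative only: a dispatch confirmation guarded by an idempotency key,
    # so repeated confirmations of the same dispatch are ignored.
    confirmed_dispatches = set()
    ledger = []

    def confirm_dispatch(dispatch_id, amount):
        """Record a cash transfer once, however many times it is confirmed."""
        if dispatch_id in confirmed_dispatches:
            return "already recorded"       # repeat key press: do not post again
        confirmed_dispatches.add(dispatch_id)
        ledger.append({"dispatch": dispatch_id, "amount": amount})
        return "recorded"

    for _ in range(3):                      # the user hits the key three times
        confirm_dispatch("D-MAIN-TO-SUB-001", 8000)

    print(len(ledger), "transfer(s) posted")  # 1, not 3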

The Horizon system processed millions of transactions a day and did so with near 100% accuracy. The Post Office’s IT expert therefore tried to persuade the judge that the odds were 2 in a million that any particular error could be attributable to the system.

Unsurprisingly the judge rejected this argument. If only 0.0002% of transactions were to go wrong then a typical day’s processing of eight million transactions would lead to 16 errors. It would be innumerate to look at one of those outcomes and argue that there was a 2 in a million chance of it being a system error. That probability would make sense only if one of the eight million were chosen at random. The supposed probability is irrelevant if you have chosen a case for investigation because you know it has a problem.
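The arithmetic, using the figures quoted above, is easy to set out; the fallacy lies in applying the unconditional probability to a case that was selected precisely because it showed a discrepancy.

    # The expert's "2 in a million" figure applied to a day's volume.
    error_rate = 2 / 1_000_000              # the claimed 0.0002% chance of a system error per transaction
    transactions_per_day = 8_000_000

    expected_errors_per_day = error_rate * transactions_per_day
    print(expected_errors_per_day)          # 16.0 expected system errors every day

    # Once an account is investigated *because* it shows a discrepancy, the
    # 2-in-a-million prior no longer applies; the relevant question is how
    # likely a system error is among the cases already known to be discrepant.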

It seemed strange that the Post Office persisted with its flawed perspective. I knew all too well from my own experience of IT audit and testing that different systems, in different contexts, demanded different approaches to accuracy. For financial analysis and modelling it was counter-productive to chase 100% accuracy. It would be too difficult and time consuming. The pursuit might introduce such complexity and fragility to the system that it would fail to produce anything worthwhile, certainly in the timescales required. 98% accuracy might be good enough to give valuable answers to management, quickly enough for them to exploit them. Even 95% could be good enough in some cases.

In other contexts, when dealing with financial transactions and customers’ insurance policies, you really do need a far higher level of accuracy. If you don’t reach 100% you need some way of spotting and handling the exceptions. These are not theoretical edge cases. They are people’s insurance policies or claims payments. Arguing that losing a tiny fraction of 1% is acceptable would have been appallingly irresponsible, and I can’t put enough stress on the point that as IT auditors we would have come down hard, very hard, on anyone who tried to take that line. There are some things the system should always do, and some it should never do. Systems should never lose people’s data. They should never inadvertently produce apparently fraudulent transactions that could destroy small businesses and leave the owners destitute. The amounts at stake in each individual Horizon case were trivial as far as the Post Office was concerned, immaterial in accountancy jargon. But for individual sub-postmasters they were big enough to change, and to ruin, lives.

The willingness of the Post Office and Fujitsu to absolve the system of blame and accuse users instead was such a constant theme that it produced a three letter acronym I’d never seen before: UEB, or user error bias. Naturally this arose on the claimants’ side. The Post Office never accepted its validity, but it permeated their whole approach: Horizon was robust, therefore any discrepancies must be the fault of users, whether dishonestly or accidentally, and they could proceed safely on that basis. I knew from my experience that this was a dreadful mindset with which to approach fraud investigations. I will turn to this in my next post in this series.