The Post Office Horizon IT scandal, part 1 – errors and accuracy

For the last few years I’ve been following the controversy surrounding the Post Office’s accounting system, Horizon. This controls the accounts of some 11,500 Post Office branches around the UK. There was a series of alleged frauds by sub-postmasters, all of whom protested their innocence. Nevertheless, the Post Office prosecuted these cases aggressively, pushing the supposed perpetrators into financial ruin, and even suicide. The sub-postmasters affected banded together to take a civil action against the Post Office, claiming that no frauds had taken place but that the discrepancies arose from system errors.

I wasn’t surprised to see that the sub-postmasters won their case in December 2019, with the judge providing some scathing criticism of the Post Office, and Fujitsu, the IT supplier, who had to pay £57.75 million to settle the case. Further, in March 2020 the Criminal Cases Review Commission decided to refer for appeal the convictions of 39 subpostmasters, based on the argument that their prosecution involved an “abuse of process”. I will return to the prosecution tactics in my next post.

Having worked as an IT auditor, including fraud investigations, and as a software tester the case intrigued me. It had many features that would have caused me great concern if I had been working at the Post Office and I’d like to discuss a few of them. The case covered a vast amount of detail. If you want to see the full 313 page judgment you can find it here [PDF, opens in new tab].

What caught my eye when I first heard about this case were the arguments about whether the problems were caused by fraud, system error, or user error. As an auditor who worked on the technical side of many fraud cases the idea that there could be any confusion between fraud and system error makes me very uncomfortable. The system design should incorporate whatever controls are necessary to ensure such confusion can’t arise.

When we audited live systems we established what must happen and what must not happen, what the system must do and what it must never do. We would ask how managers could know that the system would do the right things, and never do the wrong things. We then tested the system looking for evidence that these controls were present and effective. We would try to break the system, evading the controls we knew should be there, and trying to exploit missing or ineffective controls. If we succeeded we’d expect, at the least, the system to hold unambiguous evidence about what we had done.

As for user error, it’s inevitable that users will make mistakes and systems should be designed to allow for that. “User error” is an inadequate explanation for things going wrong. If the system can’t do that then it is a system failure. Mr Justice Fraser, the judge, took the same line. He expected the system “to prevent, detect, identify, report or reduce the risk” of user error. He concluded that controls had been put in place, but they had failed and that Fujitsu had “inexplicably” chosen to treat one particularly bad example of system error as being the fault of a user.

The explanation for the apparently inexplicable might lie in the legal arguments surrounding the claim by the Post Office and Fujitsu that Horizon was “robust”. The rival parties could not agree even on the definition of “robust” in this context, never mind whether the system was actually robust.

Nobody believed that “robust” meant error free. That would be absurd. No system is perfect and it was revealed that Horizon had a large, and persistent number of bugs, some serious. The sub-postmasters’ counsel and IT expert argued that “robust” must mean that it was extremely unlikely the system could produce the sort of errors that had ruined so many lives. The Post Office confused matters by adopting different definitions at different times, which was made clear when they were asked to clarify the point and they provided an IT industry definition of robustness that sat uneasily with their earlier arguments.

The Post Office approach was essentially top down. Horizon was robust because it could handle any risks that threatened its ability to perform its overall business role. They then took a huge logical leap to claim that because Horizon was robust by their definition it couldn’t be responsible for serious errors at the level of individual branch accounts.

Revealingly, the Post Office and Fujitsu named bugs using the branch where they had first occurred. Two of the most significant were the Dalmellington Bug, discovered at a branch in Ayrshire, and the Callendar Square Bug, also from a Scottish branch, in Falkirk. This naming habit linked bugs to users, not the system.

The Dalmellington Bug entailed a user repeatedly hitting a key when the system froze as she was trying to acknowledge receipt of a consignment of £8,000 in cash. Unknown to her each time she struck the key she accepted responsibility for a further £8,000. The bug created a discrepancy of £24,000 for which she was held responsible.

Similarly, the Callendar Square Bug generated spurious, duplicate financial transactions for which the user was considered to be responsible, even though this was clearly a database problem.

The Horizon system processed millions of transactions a day and did so with near 100% accuracy. The Post Office’s IT expert therefore tried to persuade the judge that the odds were 2 in a million that any particular error could be attributable to the system.

Unsurprisingly the judge rejected this argument. If only 0.0002% of transactions were to go wrong then a typical day’s processing of eight million transactions would lead to 16 errors. It would be innumerate to look at one of those outcomes and argue that there was a 2 in a million chance of it being a system error. That probability would make sense only if one of the eight million were chosen at random. The supposed probability is irrelevant if you have chosen a case for investigation because you know it has a problem.

It seemed strange that the Post Office persisted with its flawed perspective. I knew all too well from my own experience of IT audit and testing that different systems, in different contexts, demanded different approaches to accuracy. For financial analysis and modelling it was counter-productive to chase 100% accuracy. It would be too difficult and time consuming. The pursuit might introduce such complexity and fragility to the system that it would fail to produce anything worthwhile, certainly in the timescales required. 98% accuracy might be good enough to give valuable answers to management, quickly enough for them to exploit them. Even 95% could be good enough in some cases.

In other contexts, when dealing with financial transactions and customers’ insurance policies you really do need a far higher level of accuracy. If you don’t reach 100% you need some way of spotting and handling the exceptions. These are not theoretical edge cases. They are people’s insurance policies or claims payments. Arguing that losing a tiny fraction of 1% is acceptable, would have been appallingly irresponsible, and I can’t put enough stress on the point that as IT auditors we would have come down hard, very hard, on anyone who tried to take that line. There are some things the system should always do, and some it should never do. Systems should never lose people’s data. They should never inadvertently produce apparently fraudulent transactions that could destroy small businesses and leave the owners destitute. The amounts at stake in each individual Horizon case were trivial as far as the Post Office was concerned, immaterial in accountancy jargon. But for individual sub-postmasters they were big enough to change, and to ruin, lives.

The willingness of the Post Office and Fujitsu to absolve the system of blame and accuse users instead was such a constant theme that it produced a three letter acronym I’d never seen before; UEB, or user error bias. Naturally this arose on the claimants’ side. The Post Office never accepted its validity, but it permeated their whole approach; Horizon was robust, therefore any discrepancies must be the fault of users, whether dishonestly or accidentally, and they could proceed safely on that basis. I knew from my experience that this was a dreadful mindset with which to approach fraud investigations. I will turn to this in my next post in this series.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.