The mysterious role of external audit in the Post Office Scandal

Introduction

Over the last few years when I have written about the Post Office Scandal I have concentrated on the failings of the Post Office and Fujitsu to manage software development and the IT service responsibly, and on the abject failure of the Post Office’s corporate governance. I have made only passing mention of the role of the external auditors, who were in place for almost the full duration of the scandal. This has been for two linked reasons.

Firstly, the prime culprits are the Post Office and Fujitsu; they have not yet been held accountable in any meaningful sense. The legal and political establishment are also partly responsible, and there has been little public scrutiny of their failings. It is right that campaigners and writers keep the focus on them.

Secondly, I have cast the external auditors as secondary villains largely because of the nature of their role, and certainly not because their conduct was exemplary. Their duty was to assess the truth and fairness of the Post Office’s corporate accounts. Should they have uncovered the problems with Horizon, or at least raised sufficient concern to ensure that these problems would be investigated? I believe the answer is yes, and that they probably failed. However, the failure is largely a result of the flawed and dated business model for external audit – certainly at the level of large, complex corporations. Many smaller audit firms do a good job. The Big 4 audit firms, i.e. EY, PricewaterhouseCoopers (PwC), KPMG and Deloitte, do not. I had no great wish to dive into the complex swamp of the external audit industry. That time has now come!

Statistical nonsense and the purpose of Horizon

Ernst & Young (EY) are one of the Big 4. They audited the Royal Mail from 1986, throughout the development and implementation of Horizon, and through most of the cover-up. EY continued to audit the Post Office after it split off from Royal Mail in 2012. In 2018 they were finally replaced by PwC, one of their Big 4 rivals.

Even if EY did fail I doubt if any of the Big 4 would have done any better. In this piece I will touch only on those factors relevant to the Post Office Scandal.

I was prompted to address the issue of external audit by an exchange on Twitter. Tim McCormack posted a screenshot of a quote by Warwick Tatford, the prosecuting barrister in Seema Misra’s 2010 trial. He offered this argument in his closing statement [PDF, opens in new tab].

“I conceded in my opening speech to you that no computer system is infallible. There are computer glitches with any system. Of course there are. But Horizon is clearly a robust system, used at the time we are concerned with in 14,000 post offices. Mr Bayfield (a Post Office manager) talked about 14 million transactions a day. It has got to work, has it not, otherwise the whole Post Office would fall apart? So there may be glitches. There may be serious glitches. That is perfectly possible as a theoretical possibility, but as a whole the system works as has been shown in practice.”

I tweeted about this argument, calling it a confused mess. It should have been torn apart by a well-briefed defence counsel.

Warwick Tatford was guilty of a version of the Prosecutor’s Fallacy, and it is appalling that it took more than 10 years for the legal establishment to realise that Seema Misra’s prosecution was unsafe.

The odds against winning the UK National Lottery jackpot are about 45 million to 1. Tatford’s logic would mean that if someone claims that the £10 million they’ve suddenly acquired came from a lottery win they are obviously lying, aren’t they? Obviously not. The probability of a particular individual being lucky might be extremely low, but the probability of a winner emerging somewhere in a huge population approaches certainty.

The prosecutor’s reasoning was like arguing that because a run of 10 heads from 10 coin tosses is extremely unlikely (a probability of 1 in 1,024), we can dismiss the possibility of it happening if we get 1,000 people each to perform the 10 tosses, and that if a participant does report a run of 10 heads that is, in itself, proof that they are a dishonest witness. In fact the odds are almost 2 to 1 on (a probability of 62.4%) that it would happen to at least one person in that group. Even if Horizon performed reliably at most branches, that does not mean courts could dismiss the possibility that there were serious errors at some branches.
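For anyone who wants to check the arithmetic, here is a quick sketch of the calculation, using the 1,000 participants and 10 tosses from my example:

```python
# Probability that at least one of 1,000 people tossing a fair coin 10 times
# gets 10 heads - the outcome the prosecutor's logic would have us dismiss.
p_single = 0.5 ** 10                         # 1 in 1,024 for any one person
p_at_least_one = 1 - (1 - p_single) ** 1000  # complement of "nobody does it"
print(p_single)        # 0.0009765625
print(p_at_least_one)  # roughly 0.624, i.e. almost 2 to 1 on
```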

The prosecutor was arguing that if Horizon worked “as a whole”, despite bugs, then evidence from the system was sufficiently reliable to convict, even without corroboration. Crucially the prosecutor confused different purposes of Horizon:

A- a source for the corporate accounts,
B- the Post Office managing branches,
C- subpostmasters managing their own branches,
D- a source of criminal evidence.

Bugs could render Horizon hopelessly unfit for purposes C and D, even B, while it remained adequate for A. External auditors were concerned only with Horizon’s reliability for A. Maybe they suspected correctly that it was inadequate for purposes B, C, and D, but errors at branch level were no more significant than rounding errors; they were “immaterial” to external auditors.

The flawed industry model for external audit means that these auditors have an incentive not to see errors they do not have a direct responsibility to detect and analyse. Such errors were unquestionably within the remit of Post Office Internal Audit, but they slept through the whole scandal. [I wrote about the abysmal failure of Post Office Internal Audit at some length last year in the Digital Evidence and Electronic Signature Law Review.]

An interesting response to my tweet

Twitter threads are often over-simplified and lack nuance. Peter Crowley disagreed with my observations about Horizon’s role in feeding the corporate accounts.

In explaining why I was wrong Peter Crowley offered an argument with which I largely agree, but which does not refute my point.

“No, it’s not adequate for A. A large volume of microfails is a macrofail, full stop. Just because some SPMs are wrongly overpaid, some wrongly underpaid, and the differences net off does NOT make the system adequate, unless you assess the failing and reserve for compensation.”

In the case of Horizon a large number of minor errors may well have amounted to a “macrofail”, a failure of the system to comply with its business purpose of providing data to the corporate accounts. I don’t know. I strongly suspect that Peter Crowley is correct, but I have not seen the matter addressed explicitly so I cannot be certain.

This is not a question to which I have given much attention up till now, for the reasons I gave above, and in my tweets. Even if Horizon was adequate for the Post Office’s published financial accounts that tells us nothing about its suitability for its other vital purposes – the purposes that were being assessed in court.

The prosecution barrister was almost certainly being naive, as his other comments in the trial suggest, rather than attempting to mislead the jury. It is absurd to argue that if a system is fit for one particular purpose it can safely be considered reliable for all purposes. One has to be lamentably ignorant about complex financial systems to utter such nonsense. Sadly such naive ignorance is common.

Nevertheless, Peter Crowley did home in on an important point with significant implications. I did not want to address these in my tweet thread and they deserve a far more considered response than I can provide on Twitter.

What was the significance of low level errors in Horizon?

Horizon might, maybe, have been okay for producing the corporate accounts if the errors at individual branches did not distort the financial reports issued to investors and the public. That is possible. However, what would be the basis for such confidence if we don’t understand the full impact of the low level errors?

There was no system audit. Post Office Internal Audit was invisible. The only reason to believe in Horizon was the stream of assertions without supporting evidence from the Post Office and Fujitsu – from executives who quite clearly did not want to see problems.

External auditors’ role is to vouch for the overall truth and fairness of the accounts, not guarantee that they are 100% accurate and free of error. They calculate the materiality figure for the audit. This is the threshold beyond which any errors “could reasonably be expected to influence the economic decisions of users taken on the basis of the financial statements” (International Standard on Auditing, UK 320 [PDF, opens in new tab], “Materiality in Planning and Performing an Audit”).

If the materiality figure is £50,000 then any errors at that level and above are material. Any that are below are not material. The threshold is a mixture of the quantitative and qualitative. The auditors have to assess the numerical amount and the possible impact in different contexts.

External auditors cannot check every transaction to satisfy themselves that the accounts give a true and fair picture. They sample their way through the accounts using a chosen sample size, and a sampling interval that will give them the number of items they want. The higher the interval the fewer transactions are inspected.

The size of the sample and the sampling interval depend on the auditors’ assessment of the materiality threshold for the corporation, the risks assessed by the auditors, and the quality of control that is being exercised by management. The greater the confidence external auditors have in management of risks and the control regime the more relaxed they can be about the possibility that their sampling might miss errors. They can then justify a smaller sample that they will check. The less confidence they have the greater the sample size must be, and the more work they have to do.

Once the auditors have assessed risk management and internal controls they perform a simple arithmetical calculation that can be crudely described as the materiality threshold divided by the confidence score in the management regime. This gives the sampling interval, assuming they are sampling based on monetary units rather than physical items or accounts. For example if the interval is £50,000 then they will guarantee to hit and examine every transaction at that level and above. A transaction of £5,000 would have a 10% chance of being selected.
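As a rough illustration, here is a sketch of how the monetary unit sampling arithmetic can work. The £150,000 materiality figure and the factor of 3 are invented purely to reproduce the £50,000 interval from my example, and I have expressed the divisor as a risk factor that grows as confidence in internal control falls:

```python
def sampling_plan(materiality, risk_factor, transaction_values):
    """Crude sketch of monetary unit sampling (illustrative numbers only).

    A higher risk_factor (less confidence in risk management and internal
    control) gives a smaller interval and therefore more testing.
    """
    interval = materiality / risk_factor
    # Probability that a given transaction is caught by the sampling;
    # anything at or above the interval is certain to be examined.
    probabilities = {value: min(value / interval, 1.0)
                     for value in transaction_values}
    return interval, probabilities

interval, probs = sampling_plan(materiality=150_000, risk_factor=3.0,
                                transaction_values=[5_000, 25_000, 50_000, 80_000])
print(interval)  # 50000.0
print(probs)     # {5000: 0.1, 25000: 0.5, 50000: 1.0, 80000: 1.0}
```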

In addition to the overall materiality threshold they have to calculate a further threshold for particular types of transactions, or types of account. This figure, the performance materiality, reflects the possibility of smaller errors in specific contexts accumulating to an error that would exceed the corporate materiality threshold.

This takes us straight back to Peter Crowley’s point about Horizon branch errors undermining confidence that the overall system could be adequate for the corporate financial accounts; “A large volume of microfails is a macrofail, full stop”. In this case that is probably true, although I don’t think the full stop is appropriate. We don’t know, and the external auditors must take the blame for that.
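The arithmetic behind that point is trivial. The figures below are entirely invented; they only show how errors that are individually immaterial can aggregate past a threshold:

```python
# Entirely hypothetical figures: thousands of branches, each carrying a
# small error that no one would call material on its own.
branch_errors = [120.0] * 11_500       # e.g. an average error of £120 per branch
performance_materiality = 1_000_000    # an assumed threshold for the branch accounts

total_error = sum(branch_errors)
print(total_error)                             # 1380000.0
print(total_error > performance_materiality)   # True - the microfails add up to a macrofail
```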

Question time for EY?

There are several awkward questions the external auditors should answer at the Williams Inquiry. There are many valid explanations for why we might not know what is happening in a complex system, but a failure to ask pertinent questions is not one of them.

Did EY ask the right questions?

So did EY themselves ask challenging, relevant questions and perform appropriate substantive tests to establish whether Horizon’s bugs, which might have been individually immaterial, combined to compromise the integrity of the data provided to the corporate accounts?

What about “performance materiality”?

Did EY assign a lower level of performance materiality to the branch accounts because of the Horizon branch errors? If not, why not? If so, what was that threshold? What was the basis for the calculation? What reason did EY have to believe that the accumulation of immaterial branch errors did not add up to material error at the corporate level?

The forensic accounting firm Second Sight, which was commissioned to investigate Horizon, reported in 2015 that:

“22.11 … for most of the past five years, substantial credits have been made to Post Office’s Profit and Loss Account as a result of unreconciled balances held by Post Office in its Suspense Account.

22.12. It is, in our view, probable that some of those entries should have been re-credited to branches to offset losses previously charged.”

Were EY aware of the lack of control indicated by a failure to investigate and reconcile suspense account entries? Did they consider whether this had implications for their performance materiality assessment? Poor management of suspense accounts is a classic warning sign of possible misstatement in the accounts and also of fraud.

What about the superusers?

Did the lack of control that EY reported [PDF, opens in new tab] in 2011 over superusers, which had still not been fully addressed by 2015, influence their assessment of the potential for financial misstatement arising from Horizon, and the calculation of performance materiality? If not, why not? These superusers were user accounts with powerful privileges that allowed them to amend production data.

What about the risk of unauthorised and untested changes to Horizon?

EY also reported in 2011 that the Post Office and Fujitsu did not apply normal, basic management controls over the testing and release of system changes.

“We noted that POL (Post Office Limited) are not usually involved in testing fixes or maintenance changes to the in-scope applications; we were unable to identify an internal control with the third party service provider (Fujitsu) to authorize fixes and maintenance changes prior to development for the in-scope applications. There is an increased risk that unauthorised and inappropriate changes are deployed if they are not adequately authorised, tested and approved prior to migration to the production environment.”

How could EY place trust in a system that was being managed in such an amateurish, shambolic manner?

The lack of control over superusers and dreadful management of changes to Horizon were reported in EY’s 2011 management letter setting out “internal control matters” to the Post Office board and senior management. Eleanor Shaikh made a Freedom of Information request for copies of EY’s management letter from preceding years. These would have been extremely interesting, but the Post Office refused to divulge them on the unconvincing grounds that it was a “vexatious request”. “Embarrassing request” might have been more accurate.

Reserving for compensation and the Post Office as a “going concern”

Given the crucial role of Horizon in providing the only evidence for large numbers of prosecutions, and the widespread public concern about the reliability of Horizon and the accuracy of the evidence, did EY consider whether the Royal Mail or Post Office should provide reserves for compensation (as suggested by Peter Crowley)?

Did EY assess whether the possibility of compensation might be a “going concern” issue? The evidence suggests that the external auditors either did not consider this issue at all, or chose to assume, as their successors PwC explicitly did, that even if compensation were to leave the corporation insolvent it could always draw on unlimited government support and therefore would always “be able to meet its liabilities as they fall due” (the usual phrasing to acknowledge that a company is a going concern). As the sole shareholder, for whose benefit external audit was produced, the government had a duty to pay attention to that risk.

If EY had no concerns about the possibility of compensation we are entitled to infer they had full confidence in Horizon. That only raises the question of the basis for that confidence.

The Ismay Report and the lack of any system audit

So what was the basis for EY’s confidence in Horizon? We know from the Post Office’s internal Ismay Report [PDF, opens in new tab] in 2010 that EY had not performed a system audit. This report, entitled “Response to Challenges Regarding Systems Integrity”, makes no mention of any system audit, whether performed by internal or external audit.

“Ernst & Young and Deloittes are both aware of the issue from the media and we have discussed the pros and cons of reports with them. Both would propose significant caveats and would have limits on their ability to stand in court, therefore we have not pursued this further.”

What were the caveats that EY insisted on if they were to conduct a system audit, and what were the limits they put on providing evidence in court?

That quote from the Ismay Report reveals that EY were familiar with the public concern about Horizon and discussed this with the Post Office. The whole report also shows that EY must have known there had been no credible independent system audit. A further quote from Ismay is interesting.

“The external audit that EY perform does include tests of POL’s IT and finance control environment but the audit scope and materiality mean that EY would not give a specific opinion on the systems from this.”

This makes it clear that in 2010 EY were not in any position to offer an opinion on Horizon. The Post Office therefore knew that EY’s audits did not offer an opinion at system level, and EY knew that there had been no appropriate audit at that level. The question that must be asked repeatedly is – what was the basis for everyone’s confidence in Horizon, other than wishful thinking?

An embarrassing public intervention from the accountancy profession and the trade press

Ten months before the Ismay Report was issued, in October 2009, Accountancy Age reported on the mounting concerns that Horizon had flaws which misstated branch accounts. The article quoted a senior representative of the Institute of Chartered Accountants in England and Wales, and it stated that the magazine had asked the Post Office whether it would perform an IT audit of Horizon. Richard Anning, head of the ICAEW IT faculty, said:

“You need to make sure that your accounting system is bullet proof.

Whether they have an IT audit or not, they need to understand what was happening.”

Accountancy Age reported that the Post Office declined to comment when asked if it would undertake a system audit. The magazine also attached an editorial comment to the story:

“The Post Office should consider an IT audit to show it has taken the matter seriously. Although it may be small sums of money involved, perception is everything and it could not consider going back into bank services with an accounting system that had doubts attached to it.”

Such a call from the ICAEW, and in the trade press, must have caused serious, high level discussion within the Post Office, the wider Royal Mail, and within EY. Yet 10 months later, after these public, and authoritative, calls for the Post Office to remove doubts about Horizon, the corporation issued a report saying that after discussions with EY there would be no system audit. The reasoning was that commissioning professional scrutiny of Horizon would suggest that the Post Office shared the widespread doubts.

“It is also important to be crystal clear about any review if one were commissioned – any investigation would need to be disclosed in court. Although we would be doing the review to comfort others, any perception that POL doubts its own systems would mean that all criminal prosecutions would have to be stayed. It would also beg a question for the Court of Appeal over past prosecutions and imprisonments.”

Did EY press for an appropriate system audit following the 2009 call from the ICAEW and Accountancy Age? If not, why not? What was EY’s response to that lack of action?

How did EY plan and conduct subsequent external audits in the knowledge that Horizon had not been audited rigorously, that the Post Office was determined not to allow any system audit, and that the ICAEW had expressed concern? What difference did this knowledge make? If EY carried on as before how can they justify that?

What did EY think about Post Office internal control?

How confident was EY in the Royal Mail’s and Post Office’s internal control regime, and specifically Internal Audit, in the light of the feeble response to the superuser problem, the pitifully inadequate change management practices, and the lack of any credible system audit?

After working in a highly competent financial services internal audit department I know how humiliating it would be for internal auditors if their external counterparts highlighted basic control failings and then reported four years later that problems had not been resolved. This would be an obvious sign that the internal auditors were ineffective. The fact that Fujitsu managed the IT service does not let anyone at the Post Office off the hook. The systems belonged to the Post Office, who managed the contract, monitored the service, and carried ultimate responsibility.

The level of confidence that external auditors have in internal control is a crucial factor in planning audits. This issue, and the question of whether EY applied strict performance materiality thresholds to Horizon’s branch accounts, contribute to the wider concern about the effectiveness of the external audit business model.

External audit – a broken business model

The assessments of materiality and of performance materiality have a direct impact on the profitability of an audit. The more confidence the external auditors have in risk management and internal control, the less substantive testing they have to do.

If external auditors pretend that there is a tight and effective regime of risk management and internal control, and that financial systems possess processing integrity, then they can justify setting higher thresholds for materiality and performance materiality and correspondingly higher sampling intervals. Put crudely, the external auditors’ rating of internal control has a very strong influence on the amount of work the audit team must perform; better ratings mean less work and more profit.

If auditors judge that the corporation under audit is a badly managed mess, and they are working to a fixed audit fee then they have a problem. They can perform a rigorous audit and make a loss, or they can pretend that everything is tight and reliable, and make a profit. I saw the dilemma playing out in practice with the control rating being manipulated to fit the audit fee. It left me disillusioned with the external audit industry (as I wrote here).

The dilemma presents a massive conflict of interest to the auditors. It undermines their independence, and compromises the business model for external audit.

The auditors have a financial incentive to ignore problems. Executive management and the board of directors of the company under audit have a story they want to tell investors, suppliers, and external stakeholders. The commercial pressure on the auditors is to reassure the world that this story is true and fair. If they cause trouble they will lose the audit contract to a less scrupulous rival.

This conflict of interest is exacerbated by the revolving door of senior staff moving between the Big 4 and clients. Rod Ismay, who wrote the notorious report bearing his name, joined the Post Office direct from Ernst & Young where he had been a senior audit manager. It is a cosy and lucrative arrangement, but it hardly inspires confidence that external auditors will challenge irresponsible clients.

Conclusion – to be provided by the Williams Inquiry?

I am glad that Peter Crowley prompted me into setting out my thoughts about external auditors’ possible, or likely, failings over Horizon. I stand by my statement that Horizon could have been perfectly adequate as a source for the external accounts and yet completely inadequate for the purposes of the subpostmasters. However, it is worth taking a close look at whether Horizon really was adequate for the high level accounts. That turns attention to the conduct of some extremely richly paid professionals who assured the world over many years that the Post Office’s accounts based on Horizon were sound. If they had shown any inclination to ask unwelcome questions this dreadful scandal might have been brought to a conclusion years earlier.

EY will have to come up with some compelling and persuasive answers to the Williams Inquiry to remove the suspicion that they chose not to ask the right questions, that they chose not to challenge the Royal Mail and Post Office. Everything I have learned about external audit and its relations with internal audit makes me suspicious about EY’s conduct. I will be very surprised if evidence appears that will make me change my mind.

I hope that the Williams Inquiry will prompt some serious debate in the media and amongst politicians about whether the current external audit model works. I doubt if any other member of audit’s Big 4 would have performed any better than EY, and that is a scandal in its own right.

If one of the world’s leading audit firms failed to spot that the Post Office was being steered into insolvency over three decades then we are entitled to ask – what is the point of external auditors apart from providing rich rewards to the partners who own them?

The myth of perfect software – IT audit and governance aspects of the Post Office scandal

This is a 21 minute presentation I prepared for the Kent Centre for European and Comparative Law, at the University of Kent. It was for an event devoted to the Post Office scandal on 21st May 2022.

People often ask: “How could the Post Office scandal go on for so long?” “Why did nobody realise what was going wrong?” “Why did nobody speak out?”

Part of the problem was a willful naivety about the fallibility of complex software systems. Too many people in important positions at the Post Office were ignorant about the nature of software and apparently extremely reluctant to learn. They wanted to retain their illusions. The people who should have educated them did not do their job. There is no such thing as perfect software unless we are talking about utterly trivial applications. The question, the challenge, is how we respond so that people do not get hurt.

The Post Office could hardly have failed this challenge more appallingly or more disgracefully. This was a scandal of IT management and corporate governance for which responsibility reaches to the highest level. The role of IT audit was crucial in this failure, as I explain in this talk.

The talk was based on a lengthy article “The Post Office IT scandal – why IT audit is essential for effective corporate governance” that I wrote for the Digital Evidence and Electronic Signature Law Review and which was published in March 2022.

“Privileged accesses” – an insight into incompetence at Fujitsu and the Post Office

Recently I have been thinking and writing about corporate governance failings at the Post Office during the two decades of the Post Office Horizon scandal. Having worked in software development, testing and IT audit I have experience that is relevant to several aspects of the scandal. I have a further slice of experience I have not yet commented on publicly. That is largely because I should not talk about experiences with clients when I worked for IBM. However, I have decided to break that rule, and I feel justified for two reasons. Firstly, I think it offers a useful insight into failings at the Post Office and Fujitsu. Secondly, my clients all set, and met, a far higher standard than we have seen in the long-running Horizon scandal. Nothing I write will embarrass them or IBM, quite the opposite.

I keep going back to the management letter [PDF, opens in new tab] issued by Ernst & Young (E&Y), the Post Office’s external auditors, following the 2011 audit. The letter was commented on in the Horizon Issues court case, Bates v Post Office Ltd (No 6: Horizon Issues) [PDF, opens in new tab].

To normal people this 43 page letter is incomprehensible and boring. It lists a series of major and minor problems with Fujitsu’s management of the IT service it provided to the Post Office. Only people who have worked in this field will feel comfortable interpreting the letter and its significance.

The letter draws attention to problems that E&Y came across in the course of their audit. As the introduction says:

“Our review of the company’s systems of internal control is carried out to help us express an opinion on the accounts of the company as a whole. This work is not primarily directed towards the discovery of weaknesses, the detection of fraud or other irregularities (other than those which would influence us in forming that opinion) and should not, therefore, be relied upon to show that no other weaknesses exist or areas require attention. Accordingly, the comments in this letter refer only to those matters that have come to our attention during the course of our normal audit work and do not attempt to indicate all possible improvements that a special review might develop.”

E&Y did not conduct a full technical audit. They were concerned with assessing whether the financial accounts offered a true and fair view of the financial position of the company. Their assessment of internal control was only sufficiently detailed to allow them to form an opinion on the company accounts.

It is, or it should be, monumentally embarrassing for the internal auditors if the external auditors find long-standing control problems. The internal auditors should have the staff, expertise and time to detect these problems and ensure that they are resolved long before external auditors spot them. The external auditors are around for only a few weeks or months, and it is not their primary responsibility to find problems like this. I wrote about this from the perspective of an IT auditor last year (see section “Superusers going ‘off piste'”).

The specific issue in the management letter that rightly attracted most attention in the Horizon Issues case was the poor control over user IDs with high privilege levels. Not only did this highlight the need to improve Fujitsu’s management of the IT service and the oversight provided by the Post Office, it also pointed to an ineffective internal audit function at the Post Office, and previously the Royal Mail before the Post Office was hived off.

When I was reading through the E&Y management letter I was struck by how familiar the problems were. When I worked for IBM I spent three years as an information security manager. My background had been in software development, testing and IT audit. The contract on which I was working was winding down and one day my phone rang and I was made an interesting offer. Service Delivery Security wanted another information security manager to work with new outsourced accounts. My background showed I had a grasp of security issues, the ability to run projects, and a track record of working with clients without triggering unseemly brawls or litigation. So I was a plausible candidate. I would rely on the deeply technical experts and make sure that IBM and the client got what they wanted.

The job entailed working with the client right at the start of the outsourcing deal, for a few months either side of the cutover. An important responsibility was reaching agreement with the client about the detail of what IBM would provide.

All the issues relating to privileged access raised by E&Y in their management letter were within my remit. The others, mainly change management, were dealt with by the relevant experts. Each outsourcing contract required us to reach agreement on the full detail of the service by a set date, typically within a few months of the service cutover. In one case we had to reach agreement before service even started. On the service cutover date all staff transferring to IBM were required to continue working to exactly the same processes and standards until they were told to do something new.

I had to set up a series of meetings and workshops with the client and work through the detail of the security service. We would agree all the tedious but vital details: password lengths and formats, the processes required for authorising and reviewing new accounts and access privileges, logging and review of accesses, security incident response actions. It went on and on.

For each item we would document the IBM recommended action or setting. Alongside that we had to record what the client was currently doing. Finally we would agree the client’s requirement for the future service. If the future requirement entailed work by IBM to improve on what the client was currently doing that would entail a charge. If the client wanted something lower than the IBM recommendation then it was important that we had evidence that IBM was required to do something we would usually regard as unsatisfactory. This happened only rarely, and with good reason. The typical reason was that the client’s business meant the risk did not justify the tighter, and more expensive, control.
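To make the shape of those agreements concrete, here is a minimal sketch of the kind of record we kept for each control item. The structure, field names and example values are my own invention for illustration, not IBM’s actual documentation format:

```python
from dataclasses import dataclass

@dataclass
class SecurityControlItem:
    """One line of the service agreement described above (names invented)."""
    control: str                   # e.g. "review of privileged (superuser) accounts"
    supplier_recommendation: str   # what the supplier would normally recommend
    client_current: str            # what the client was doing at service cutover
    agreed_requirement: str        # the contractual requirement going forward
    chargeable_change: bool        # True if reaching the requirement is extra work

item = SecurityControlItem(
    control="review of privileged (superuser) accounts",
    supplier_recommendation="monthly review, all privileged access logged",
    client_current="ad hoc review, no logging",
    agreed_requirement="monthly review, all privileged access logged",
    chargeable_change=True,
)
```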

We also had to ensure that all the mainframe systems and servers were inventoried, and the settings documented. That was a huge job, but I farmed that out to the unenthusiastic platform experts. For all these platforms and settings we also had to agree how they should be configured in future.

The next step, and my final involvement, would be to set up a project plan to make all the changes required to bring the service up to the standard that the client needed. A new project manager would come in to run that boring project.

After three clients I felt I had learned a lot but staying in the job was going to mean endless repetition of very similar assignments. I also had some disagreements about IBM’s approach to outsourcing security services that meant I was unlikely to get promoted. I was doing a very good job at my current level and it was clearly recognised that I would only cause trouble if I were given more power! It’s true. I would have done. So I secured a move back to test management.

I enjoyed those three years because it gave me the chance to work with some very interesting clients. These were big, blue chip names: AstraZeneca, Boots (the UK retailer), and Nokia (when they were utterly dominant in the mobile phone market). I don’t have any qualms about naming these clients because they were all very thorough, professional and responsible.

The contrast with the Post Office and Fujitsu is striking. Fujitsu won the Post Office outsourcing contract [PDF, opens in new tab] in 1996 for an initial eight years. Yet, 15 years later, by which time the contract had been extended twice, E&Y reported that Fujitsu had not set up the control regime IBM demanded we create, with client agreement, at the very start of an outsourcing contract. The problems had still not been fully resolved by 2015.

Getting these basics correct is vital if corporations want to show that they are in control of their systems. If users have high privilege levels without effective authorisation, logging and monitoring then the corporation cannot have confidence in its data, which can be changed without permission and without a record of who took what action. Nobody can have confidence in the integrity of the systems. That has clear implications for the Horizon scandal. The Post Office insisted that Horizon was reliable when the reality was that Fujitsu did not apply the controls to justify that confidence.

Fujitsu may have failed to manage the service properly, but the Post Office is equally culpable. Outsourcing an IT service is not a matter of handing over responsibility then forgetting about it. The service has to be specified precisely then monitored carefully and constantly.

Why were the two corporations so incompetent and so negligent for so long? Why were the Post Office and Fujitsu so much less responsible and careful than IBM, AstraZeneca, Boots and Nokia?

Why did the Royal Mail’s and subsequently the Post Office’s internal auditors not detect problems with the outsourced service and force through an effective response?

When I became an information security manager I was told a major reason we had to tie the service down tightly was in case we ended up in court. We had to be able to demonstrate that we were in control of the systems, that we could prove the integrity of the data and the processing. So why did Fujitsu and the Post Office choose not to act as responsibly?

I was working in a well-trodden field. None of the issues we were dealing with were remotely new. The appropriate responses were very familiar. They were the mundane basics that every company using IT has to get right. Lay observers might think that the outsourcing arrangement was responsible for the failure of management control by distancing user management from the service providers. That would be wrong. The slackness seen at Fujitsu is more likely to occur in an in-house operation that has grown and evolved gradually. An outsourcing agreement should mean that everything is tied down precisely, and that was my experience.

I have worked as an IT auditor, and I have been an information security manager on big outsourcing contracts. I know how these jobs should be done and it amazes me to see that one of our major rivals was able to get away with such shoddy practices at the very time I was in the outsourcing game. Fujitsu still has the Post Office contract. That is astonishing.

Business logic security testing (2009)

This article appeared in the June 2009 edition of Testing Experience magazine and the October 2009 edition of Security Acts magazine.

If you choose to read the article please bear in mind that it was written in January 2009 and is therefore inevitably dated in some respects. In particular, ISACA has restructured COBIT, but it remains a useful source. Overall I think the arguments I made in this article are still valid.

The references in the article were all structured for a paper magazine. They were not set up as hyperlinks and I have not tried to recreate them and check out whether they still work.

The article

When I started in IT in the 80s the company for which I worked had a closed network restricted to about 100 company locations with no external connections.

Security was divided neatly into physical security, concerned with the protection of the physical assets, and logical security, concerned with the protection of data and applications from abuse or loss.

When applications were built the focus of security was on internal application security. The arrangements for physical security were a given, and didn’t affect individual applications.

There were no outsiders to worry about who might gain access, and so long as the common access controls software was working there was no need for analysts or designers to worry about unauthorized internal access.

Security for the developers was therefore a matter of ensuring that the application reflected the rules of the business; rules such as segregation of responsibilities, appropriate authorization levels, dual authorization of high value payments, reconciliation of financial data.

The world quickly changed and relatively simple, private networks isolated from the rest of the world gave way to more open networks with multiple external connections and to web applications.

Security consequently acquired much greater focus. However, it began to seem increasingly detached from the work of developers. Security management and testing became specialisms in their own right, and not just an aspect of technical management and support.

We developers and testers continued to build our applications, comforted by the thought that the technical security experts were ensuring that the network perimeter was secure.

Nominally security testing was a part of non-functional testing. In reality, it had become somewhat detached from conventional testing.

According to the glossary of the British Computer Society’s Special Interest Group in Software Testing (BCS SIGIST) [1], security testing is determining whether the application meets the specified security requirements.

SIGIST also says that security entails the preservation of confidentiality, integrity and availability of information. Availability means ensuring that authorized users have access to information and associated assets when required. Integrity means safeguarding the accuracy and completeness of information and processing methods. Confidentiality means ensuring that information is accessible only to those authorized to have access.

Penetration testing, and testing the security of the network and infrastructure, are all obviously important, but if you look at security in the round, bearing in mind wider definitions of security (such as SIGIST’s), then these activities can’t be the whole of security testing.

Some security testing has to consist of routine functional testing that is purely a matter of how the internals of the application work. Security testing that is considered and managed as an exercise external to the development, an exercise that follows the main testing, is necessarily limited. It cannot detect defects that are within the application rather than on the boundary.

Within the application, insecure design features or insecure coding might be detected without any deep understanding of the application’s business role. However, like any class of requirements, security requirements will vary from one application to another, depending on the job the application has to do.

If there are control failures that reflect poorly applied or misunderstood business logic, or business rules, then will we as functional testers detect that? Testers test at the boundaries. Usually we think in terms of boundary values for the data, the boundary of the application or the network boundary with the outside world.

Do we pay enough attention to the boundary of what is permissible user behavior? Do we worry enough about abuse by authorized users, employees or outsiders who have passed legitimately through the network and attempt to subvert the application, using it in ways never envisaged by the developers?

I suspect that we do not, and this must be a matter for concern. A Gartner report of 2005 [2] claimed that 75% of attacks are at the application level, not the network level. The types of threats listed in the report all arise from technical vulnerabilities, such as command injection and buffer overflows.

Such application layer vulnerabilities are obviously serious, and must be addressed. However, I suspect too much attention has been given to them at the expense of vulnerabilities arising from failure to implement business logic correctly.

This is my main concern in this article. Such failures can offer great scope for abuse and fraud. Security testing has to be about both the technology and the business.

Problem of fraud and insider abuse

It is difficult to come up with reliable figures about fraud because of its very nature. According to PricewaterhouseCoopers in 2007 [3] the average loss to fraud by companies worldwide over the two years from 2005 was $2.4 million (their survey being biased towards larger companies). This is based on reported fraud, and PWC increased the figure to $3.2 million to allow for unreported frauds.

In addition to the direct costs there were average indirect costs in the form of management time of $550,000 and substantial unquantifiable costs in terms of damage to the brand, staff morale, reduced share prices and problems with regulators.

PWC stated that 76% of their respondents reported the involvement of an outside party, implying that 24% were purely internal. However, when companies were asked for details on one or two frauds, half of the perpetrators were internal and half external.

It would be interesting to know the relative proportions of frauds (by number and value) which exploited internal applications and customer facing web applications but I have not seen any statistics for these.

The U.S. Secret Service and CERT Coordination Center have produced an interesting series of reports on “illicit cyber activity”. In their 2004 report on crimes in the US banking and finance sector [4] they reported that in 70% of the cases the insiders had exploited weaknesses in applications, processes or procedures (such as authorized overrides). 78% of the time the perpetrators were authorized users with active accounts, and in 43% of cases they were using their own account and password.

The enduring problem with fraud statistics is that many frauds are not reported, and many more are not even detected. A successful fraud may run for many years without being detected, and may never be detected. A shrewd fraudster will not steal enough money in one go to draw attention to the loss.

I worked on the investigation of an internal fraud at a UK insurance company that had lasted 8 years, as far back as we were able to analyze the data and produce evidence for the police. The perpetrator had raised 555 fraudulent payments, all for less than £5,000, and had stolen £1.1 million by the time that we received an anonymous tip off.

The control weaknesses related to an abuse of the authorization process, and a failure of the application to deal appropriately with third party claims payments, which were extremely vulnerable to fraud. These weaknesses would have been present in the original manual process, but the users and developers had not taken the opportunities that a new computer application had offered to introduce more sophisticated controls.

No-one had been negligent or even careless in the design of the application and the surrounding procedures. The trouble was that the requirements had focused on the positive functions of the application, and on replicating the functionality of the previous application, which in turn had been based on the original manual process. There had not been sufficient analysis of how the application could be exploited.

Problem of requirements and negative requirements

Earlier I was careful to talk about failure to implement business logic correctly, rather than implementing requirements. Business logic and requirements will not necessarily be the same.

The requirements are usually written as “the application must do” rather than “the application must not…”. Sometimes the “must not” is obvious to the business. It “goes without saying” – that dangerous phrase!

However, the developers often lack the deep understanding of business logic that users have, and they design and code only the “must do”, not even being aware of the implicit corollary, the “must not”.

As a computer auditor I reviewed a sales application which had a control to ensure that debts couldn’t be written off without review by a manager. At the end of each day a report was run to highlight debts that had been cleared without a payment being received. Any discrepancies were highlighted for management action.

I noticed that it was possible to overwrite the default of today’s date when clearing a debt. Inserting a date in the past meant that the money I’d written off wouldn’t appear on any control report. The report for that date had been run already.
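A minimal sketch, with invented field names, shows why the backdating worked: the control report selected write-offs by the date keyed on the transaction, and yesterday’s report had already been run.

```python
from datetime import date

# Invented, simplified representation of the write-off records.
write_offs = [
    {"account": "A123", "amount": 950, "write_off_date": date.today()},
    {"account": "B456", "amount": 980, "write_off_date": date(2009, 1, 5)},  # backdated
]

def daily_control_report(records, report_date):
    # The flaw: the report filters on the user-supplied write-off date,
    # so a write-off keyed with a past date never appears on any report.
    return [r for r in records if r["write_off_date"] == report_date]

print(daily_control_report(write_offs, date.today()))  # the backdated item escapes review
```

Selecting report entries on a system-generated entry timestamp, rather than on a date the operator could overtype, would have closed that gap.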

When I mentioned this to the users and the teams who built and tested the application the initial reaction was “but you’re not supposed to do that”, and then they all tried blaming each other. There was a prolonged discussion about the nature of requirements.

The developers were adamant that they’d done nothing wrong because they’d built the application exactly as specified, and the users were responsible for the requirements.

The testers said they’d tested according to the requirements, and it wasn’t their fault.

The users were infuriated at the suggestion that they should have to specify every last little thing that should be obvious – obvious to them anyway.

The reason I was looking at the application, and looking for that particular problem, was because we knew that a close commercial rival had suffered a large fraud when a customer we had in common had bribed an employee of our rival to manipulate the sales control application. As it happened there was no evidence that the same had happened to us, but clearly we were vulnerable.

Testers should be aware of missing or unspoken requirements, implicit assumptions that have to be challenged and tested. Such assumptions and requirements are a particular problem with security requirements, which is why the simple SIGIST definition of security testing I gave above isn’t sufficient – security testing cannot be only about testing the formal security requirements.

However, testers, like developers, are working to tight schedules and budgets. We’re always up against the clock. Often there is barely enough time to carry out all the positive testing that is required, never mind thinking through all the negative testing that would be required to prove that missing or unspoken negative requirements have been met.

Fraudsters, on the other hand, have almost unlimited time to get to know the application and see where the weaknesses are. Dishonest users also have the motivation to work out the weaknesses. Even people who are usually honest can be tempted when they realize that there is scope for fraud.

If we don’t have enough time to do adequate negative testing to see what weaknesses could be exploited then at least we should be doing a quick informal evaluation of the financial sensitivity of the application and alerting management, and the internal computer auditors, that there is an element of unquantifiable risk. How comfortable are they with that?

If we can persuade project managers and users that we need enough time to test properly, then what can we do?

CobiT and OWASP

If there is time, there are various techniques that testers can adopt to try and detect potential weaknesses or which we can encourage the developers and users to follow to prevent such weaknesses.

I’d like to concentrate on the CobiT (Control Objectives for Information and related Technology) guidelines for developing and testing secure applications (CobiT 4.1 2007 [5]), and the CobiT IT Assurance Guide [6], and the OWASP (Open Web Application Security Project) Testing Guide [7].

Together, CobiT and OWASP cover the whole range of security testing. They can be used together, CobiT being more concerned with what applications do, and OWASP with how applications work.

They both give useful advice about the internal application controls and functionality that developers and users can follow. They can also be used to provide testers with guidance about test conditions. If the developers and users know that the testers will be consulting these guides then they have an incentive to ensure that the requirements and build reflect this advice.

CobiT implicitly assumes a traditional, big up-front design, Waterfall approach. Nevertheless, it’s still potentially useful for Agile practitioners, and it is possible to map from CobiT to Agile techniques, see Gupta [8].

The two most relevant parts are in the CobiT IT Assurance Guide [6]. This is organized into domains, the most directly relevant being “Acquire and Implement” the solution. This is really for auditors, guiding them through a traditional development, explaining the controls and checks they should be looking for at each stage.

It’s interesting as a source of ideas, and as an alternative way of looking at the development, but unless your organization has mandated the developers to follow CobiT there’s no point trying to graft this onto your project.

Of much greater interest are the six CobiT application controls. Whereas the domains are functionally separate and sequential activities, a life-cycle in effect, the application controls are statements of intent that apply to the business area and the application itself. They can be used at any stage of the development. They are:

AC1 Source Data Preparation and Authorization

AC2 Source Data Collection and Entry

AC3 Accuracy, Completeness and Authenticity Checks

AC4 Processing Integrity and Validity

AC5 Output Review, Reconciliation and Error Handling

AC6 Transaction Authentication and Integrity

Each of these controls has stated objectives, and tests that can be made against the requirements, the proposed design and then on the built application. Clearly these are generic statements potentially applicable to any application, but they can serve as a valuable prompt to testers who are willing to adapt them to their own application. They are also a useful introduction for testers to the wider field of business controls.

CobiT rather skates over the question of how the business requirements are defined, but these application controls can serve as a useful basis for validating the requirements.

Unfortunately the CobiT IT Assurance Guide can be downloaded for free only by members of ISACA (Information Systems Audit and Control Association) and costs $165 for non-members to buy. Try your friendly neighborhood Internal Audit department! If they don’t have a copy, well maybe they should.

If you are looking for a more constructive and proactive approach to the requirements then I recommend the Open Web Application Security Project (OWASP) Testing Guide [7]. This is an excellent, accessible document covering the whole range of application security, both technical vulnerabilities and business logic flaws.

It offers good, practical guidance to testers. It also offers a testing framework that is basic, and all the better for that, being simple and practical.

The OWASP testing framework demands early involvement of the testers, and runs from before the start of the project to reviews and testing of live applications.

Phase 1: Before development begins

1A: Review policies and standards

1B: Develop measurement and metrics criteria (ensure traceability)

Phase 2: During definition and design

2A: Review security requirements

2B: Review design and architecture

2C: Create and review UML models

2D: Create and review threat models

Phase 3: During development

3A: Code walkthroughs

3B: Code reviews

Phase 4: During deployment

4A: Application penetration testing

4B: Configuration management testing

Phase 5: Maintenance and operations

5A: Conduct operational management reviews

5B: Conduct periodic health checks

5C: Ensure change verification

OWASP suggests four test techniques for security testing: manual inspections and reviews, code reviews, threat modeling and penetration testing. The manual inspections are reviews of design, processes, policies, documentation and even interviewing people; everything except the source code, which is covered by the code reviews.

A feature of OWASP I find particularly interesting is its fairly explicit admission that the security requirements may be missing or inadequate. This is unquestionably a realistic approach, but usually testing models blithely assume that the requirements need tweaking at most.

The response of OWASP is to carry out what looks rather like reverse engineering of the design into the requirements. After the design has been completed testers should perform UML modeling to derive use cases that “describe how the application works.

In some cases, these may already be available”. Obviously in many cases these will not be available, but the clear implication is that even if they are available they are unlikely to offer enough information to carry out threat modeling.
The feature most likely to be missing is the misuse case. These are the dark side of use cases! As envisaged by OWASP the misuse cases shadow the use cases, threatening them, then being mitigated by subsequent use cases.
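As a simple illustration of the idea, here is a hypothetical use case, the misuse case that shadows it, and a mitigating control, borrowing the debt write-off example from earlier in this article. The structure is my own shorthand, not OWASP notation:

```python
# A use case, the misuse case that threatens it, and a mitigating control.
use_case = {
    "name": "Write off a debt",
    "actor": "Sales operator",
    "outcome": "Debt cleared and listed on the daily control report",
}
misuse_case = {
    "name": "Backdate a write-off",
    "actor": "Dishonest operator",
    "threatens": use_case["name"],
    "outcome": "Write-off never appears on any control report",
}
mitigation = {
    "name": "Select report entries by system entry date, not keyed date",
    "mitigates": misuse_case["name"],
}
```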

The OWASP framework is not designed to be a checklist, to be followed blindly. The important point about using UML is that it permits the tester to decompose and understand the proposed application to the level of detail required for threat modeling, but also with the perspective that threat modeling requires; i.e. what can go wrong? what must we prevent? what could the bad guys get up to?

UML is simply a means to that end, and was probably chosen largely because that is what most developers are likely to be familiar with, and therefore UML diagrams are more likely to be available than other forms of documentation. There was certainly some debate in the OWASP community about what the best means of decomposition might be.

Personally, I have found IDEF0 a valuable means of decomposing applications while working as a computer auditor. Full details of this technique can be found at http://www.idef.com [9].

It entails decomposing an application using a hierarchical series of diagrams, each of which has between three and six functions. Each function has inputs, which are transformed into outputs, depending on controls and mechanisms.

IDEF0
Is IDEF0 as rigorous and effective as UML? No, I wouldn’t argue that. When using IDEF0 we did not define the application in anything like the detail that UML would entail. Its value was in allowing us to develop a quick understanding of the crucial functions and issues, and then ask pertinent questions.

Given that certain inputs must be transformed into certain outputs, what are the controls and mechanisms required to ensure that the right outputs are produced?

In working out what the controls were, or ought to be, we’d run through the mantra that the output had to be accurate, complete, authorized, and timely. “Accurate” and “complete” are obvious. “Authorized” meant that the output must have been created or approved by people with the appropriate level of authority. “Timely” meant that the output must not only arrive in the right place, but at the right time. One could also use the six CobiT application controls as prompts.
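Pulling the IDEF0 function box and that mantra together, a minimal sketch of how such a decomposition might be recorded is shown below. The structure is my own simplification rather than the IDEF0 standard, and the contents of the write-off function, which revisits the debt example discussed next, are hypothetical.

```python
# A minimal sketch of one IDEF0-style function box: inputs transformed into
# outputs, governed by controls and carried out by mechanisms. All details
# are invented for illustration.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Idef0Function:
    name: str
    inputs: List[str]
    outputs: List[str]
    controls: List[str]        # must make the outputs accurate, complete, authorized, timely
    mechanisms: List[str]      # who or what performs the transformation
    children: List["Idef0Function"] = field(default_factory=list)

write_off = Idef0Function(
    name="Write off a debt",
    inputs=["outstanding debt"],
    outputs=["cancelled debt", "entry on the daily control report"],
    controls=["management review of the daily control report",
              "write-off authority limits"],
    mechanisms=["sales operator", "billing system"],
)

# The auditor's questions fall straight out of the structure: given these
# inputs and outputs, are the controls sufficient to make the outputs
# accurate, complete, authorized and timely?
```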

In the example I gave above of the debt being written off I had worked down to the level of detail of “write off a debt” and looked at the controls required to produce the right output, “cancelled debts”. I focused on “authorized”, “complete” and “timely”.

Any sales operator could cancel a debt, but doing so flagged the item for management review. That was fine. The problem was with “complete” and “timely”. All write-offs had to be collected for the control report, which was run daily. Was it possible to ensure some write-offs would not appear? Was it possible to over-key the default of the current date? It was possible. If I did so, would the write-off appear on another report? No. That weakness meant the control report could easily be bypassed.
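The kind of test that exposes this weakness can be sketched as a toy, self-contained example. The functions, field names and report logic below are invented, not the real billing system; the assertion deliberately fails, which is the point, because the back-dated write-off never reaches the daily control report.

```python
# A toy, runnable illustration of the control gap described above.
from datetime import date, timedelta

write_offs = []

def record_write_off(amount, effective_date=None):
    # The weakness: the operator can over-key the default of the current date.
    write_offs.append({"amount": amount, "date": effective_date or date.today()})

def daily_control_report(run_date):
    # The daily report only picks up write-offs keyed with that day's date.
    return [w for w in write_offs if w["date"] == run_date]

def test_backdated_write_off_still_appears_on_a_control_report():
    yesterday = date.today() - timedelta(days=1)
    record_write_off(5000, effective_date=yesterday)      # over-keyed date
    reported = daily_control_report(date.today())
    # This assertion fails, exposing that the control can be bypassed.
    assert any(w["amount"] == 5000 for w in reported), \
        "back-dated write-off never appears on the daily control report"
```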

The testing that I was carrying out had nothing to do with the original requirements. They were of interest, but not really relevant to what I was trying to do. I was trying to think like a dishonest employee, looking for a weakness I could exploit.

The decomposition of the application is the essential first step of threat modeling. Following that, one should analyze the assets for importance, explore possible vulnerabilities and threats, and create mitigation strategies.
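The output of those steps can be as simple as a threat register. A minimal sketch, with every detail invented, might look like this.

```python
# A minimal sketch of a threat register entry; the asset, ratings and
# mitigation are invented for illustration.
threat_register = [
    {
        "asset": "customer debt records",
        "importance": "high",
        "threat": "dishonest employee cancels debts for an accomplice",
        "vulnerability": "daily control report can be bypassed by over-keying the date",
        "mitigation": "report write-offs by entry date as well as effective date",
    },
]

for entry in threat_register:
    print(f"[{entry['importance']}] {entry['asset']}: "
          f"{entry['threat']} -> {entry['mitigation']}")
```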

I don’t want to discuss these in depth. There is plenty of material about threat modeling available. OWASP offers good guidance [10], [11]. Microsoft provides some useful advice [12], but its focus is on technical security, whereas OWASP looks at the business logic too. The OWASP testing guide [7] has a section devoted to business logic that serves as a useful introduction.

OWASP’s inclusion of mitigation strategies in the version of threat modeling that it advocates for testers is interesting. This is not normally a tester’s responsibility. However, considering such strategies is a useful way of planning the testing. What controls or protections should we be testing for? I think it also implicitly acknowledges that the requirements and design may well be flawed, and that threat modeling might not have been carried out in circumstances where it really should have been.

This perception is reinforced by OWASP’s advice that testers should ensure that threat models are created as early as possible in the project, and should then be revisited as the application evolves.

What I think is particularly valuable about the application control advice in CobiT and OWASP is that it helps us to focus on security as an attribute that can, and must, be built into applications. Security testing then becomes a normal part of functional testing, as well as a specialist technical exercise. Testers must not regard security as an audit concern, with the testing being carried out by quasi-auditors, external to the development.

Getting the auditors on our side

I’ve had a fairly unusual career in that I’ve spent several years in each of software development, IT audit, IT security management, project management and test management. I think that gives me a good understanding of each of these roles, and a sympathetic understanding of the problems and pressures associated with them. It’s also taught me how they can work together constructively.

In most cases this is obvious, but the odd one out is the IT auditor. They have the reputation of being the hard-nosed suits from head office who come in to bayonet the wounded after a disaster! If that is what they do then they are being unprofessional and irresponsible. Good auditors should be pro-active and constructive. They will be happy to work with developers, users and testers to help them anticipate and prevent problems.

Auditors will not do your job for you, and they will rarely be able to give you all the answers. They usually have to spread themselves thinly across an organization, inevitably concentrating on the areas that have problems and pose the greatest risk.

They should not be dictating the controls, but good auditors can provide useful advice. They can act as a valuable sounding board, for bouncing ideas off. They can also be used as reinforcements if the testers are coming under irresponsible pressure to restrict the scope of security testing. Good auditors should be the friend of testers, not our enemy. At least you may be able to get access to some useful, but expensive, CobiT material.

Auditors can give you a different perspective and help you ask the right questions, and being able to ask the right questions is much more important than any particular tool or method for testers.

This article tells you something about CobiT and OWASP, and about possible new ways of approaching security testing. However, I think the most important lesson is that security testing cannot be a completely separate specialism, and that it must include exploring the application’s functionality in a skeptical and inquisitive manner, asking the right questions.

Validating the security requirements is important, but so is exposing the unspoken requirements and disproving the invalid assumptions. It is about letting management see what the true state of the application is – just like the rest of testing.

References

[1] British Computer Society’s Special Interest Group in Software Testing (BCS SIGIST) Glossary.

[2] Gartner Inc. “Now Is the Time for Security at the Application Level” (NB PDF download), 2005.

[3] PricewaterhouseCoopers. “Economic crime: people, culture and controls. The 4th biennial Global Economic Crime Survey”.

[4] US Secret Service. “Insider Threat Study: Illicit Cyber Activity in the Banking and Finance Sector”.

[5] IT Governance Institute. CobiT 4.1, 2007.

[6] IT Governance Institute. CobiT IT Assurance Guide (not free), 2007.

[7] Open Web Application Security Project. OWASP Testing Guide, V3.0, 2008.

[8] Gupta, S. “SOX Compliant Agile Processes”, Agile Alliance Conference, Agile 2008.

[9] IDEF0 Function Modeling Method.

[10] Open Web Application Security Project. OWASP Threat Modeling, 2007.

[11] Open Web Application Security Project. OWASP Code Review Guide “Application Threat Modeling”, 2009.

[12] Microsoft. “Improving Web Application Security: Threats and Countermeasures”, 2003.

An abdication of managerial responsibility?

The two recent Boeing 737 MAX crashes have been grimly absorbing for software developers and testers. It seems that the crashes were caused by MCAS, a system intended to prevent stalls, responding to false data from a sensor by forcing the planes into steep dives despite the pilots’ attempts to make the planes climb. The MCAS problem may have been a necessary condition for disaster, but it clearly was not sufficient. There were many other factors involved. Most strikingly, it seems that MCAS itself may have been working as specified, but there were problems in the original design and in the way it interfaces with the sensor and the crew.

I have no wish to go into all this in serious detail (yet), but I read an article on the Bloomberg website, “Boeing’s 737 Max software outsourced to $9-an-hour engineers”, which contained many sentences and phrases that jumped off the screen at me. These snippets all point towards issues that concern me, that I’ve been talking and writing about recently, or that I have long been aware of. I’d like to run through them. I’ll use a brief quote from the Bloomberg article in each section before discussing the implications. All software designers and testers should reflect on these issues.

The commoditization of software development and testing

“Boeing has also expanded a design center in Moscow. At a meeting with a chief 787 engineer in 2008, one staffer complained about sending drawings back to a team in Russia 18 times before they understood that the smoke detectors needed to be connected to the electrical system, said Cynthia Cole, a former Boeing engineer who headed the engineers’ union from 2006 to 2010.

‘Engineering started becoming a commodity’, said Vance Hilderman, who co-founded a company called TekSci that supplied aerospace contract engineers and began losing work to overseas competitors in the early 2000s.”

The threat of testing becoming a commodity has been a long standing concern amongst testers. To a large extent we’re already there. However, I’d assumed, naively perhaps, that this was a route chosen by organisations that could get away with poor testing, in the short term at least. I was deeply concerned to see it happening in a safety critical industry.

To summarise the problem: if software development and testing are seen as commodities, bought and sold on the basis of price, then commercial pressures will push quality downwards. The inevitable pressure sends costs and prices spiralling down to the level set by the lowest cost supplier, regardless of value. Testing is particularly vulnerable. When the value of the testing is low, whatever cost does remain becomes more visible and harder to justify.

There is pressure to keep reducing costs, and if you’re getting little value from testing just about any cost-cutting measure is going to look attractive. If you head down the route of outsourcing, offshoring and increasing commoditization, losing sight of value, you will lock yourself into a vicious circle of poor quality.

Iain McCowatt’s EuroSTAR webinar on “The commoditization of testing” is worth watching.

ETTO – the efficiency-thoroughness trade-off

“…the planemakers say global design teams add efficiency as they work around the clock.”

Ah! There we have it! Efficiency. Isn’t that a good thing? Of course it is. But there is an inescapable trade-off, and organisations must understand what they are doing. There is a tension between the need to deliver a safe, reliable product or service, and the pressure to do so at the lowest possible cost. The idea of ETTO, the efficiency-thoroughness trade-off, was popularised by Erik Hollnagel.

Making the organisation more efficient means it is less likely to achieve all of its important goals; pursuing vital goals, such as safety, comes at the expense of efficiency. Chasing efficiency, in turn, eliminates margins of error and engineering redundancy, with potentially dangerous results. This is well recognised in safety critical industries, obviously including air transport. I’ve discussed this further in my blog, “The dragons of the unknown; part 6 – Safety II, a new way of looking at safety”.

Drift into failure

“’Boeing was doing all kinds of things, everything you can imagine, to reduce cost, including moving work from Puget Sound, because we’d become very expensive here,’ said Rick Ludtke, a former Boeing flight controls engineer laid off in 2017. ‘All that’s very understandable if you think of it from a business perspective. Slowly over time it appears that’s eroded the ability for Puget Sound designers to design.’”

“Slowly over time”. That’s the crucial phrase. Organisations drift gradually into failure. People are working under pressure, constantly making the trade off between efficiency and thoroughness. They keep the show on the road, but the pressure never eases. So margins are increasingly shaved. The organisation finds new and apparently smarter ways of working. Redundancy is eliminated. The workers adapt the official processes. The organisation seems efficient, profitable and safe. Then BANG! Suddenly it isn’t. The factors that had made it successful turn out to be responsible for disaster.

“Drifting into failure” is an important concept to understand for anyone working with complex systems that people will have to use, and for anyone trying to make sense of how big organisations should work, and really do work. See my blog “The dragons of the unknown; part 4 – a brief history of accident models” for a quick introduction to the drift into failure. The idea was developed by Sidney Dekker. Check out his work.

Conway’s Law

“But outsourcing has long been a sore point for some Boeing engineers, who, in addition to fearing job losses, say it has led to communications issues and mistakes.”

This takes me to one of my favourites, Conway’s Law. In essence it states that the design of systems corresponds to the design of the organisation. It’s not a normative rule, saying that this should (or shouldn’t) happen. It merely says that it generally does happen. Traditionally the organisation’s design shaped the technology. Nowadays the causation might be reversed, with the technology shaping the organisation. Conway’s Law was intended as a sound heuristic, never a hard and fast rule.

Conway's Law

a slide from one of my courses

Perhaps it is less generally applicable today, but for large, long established corporations I think it still generally holds true.

I’m going to let you in on a little trade secret of IT auditors. Conway’s Law was a huge influence on the way we audited systems and development projects.

corollary to Conway's Law

another slide from one of my courses

Audits were always strictly time boxed. We had to be selective in how we used our time and what we looked at. Modern internal auditing is risk based, meaning we would focus on the risks that posed the greatest threat to the organisation, concentrating on the areas most exposed to risk and looking for assurance that the risks were being managed effectively.

Conway’s Law guided the auditors towards low hanging fruit. We knew that we were most likely to find problems at the interfaces, and these were likely to be particularly serious. This was also my experience as a test manager. In both jobs I saw the same pattern unfold when different development teams, or different companies worked on different parts of a project.

Development teams would be locked into their delivery schedule before the high level requirements were clear and complete, or even mutually consistent. The different teams, especially if they were in different companies, based their estimates on assumptions that were flawed, or inconsistent with other teams’ assumptions. Under pressure to reduce estimates and deliver quickly, each team might assume they’d be able to do the minimum necessary, especially at the interfaces; other teams would pick up the trickier stuff.

This would create gaps at the interfaces, and cries of “but I thought you were going to do that – we can’t possibly cope in time”. Or the data that was passed from one suite couldn’t be processed by the next one. Both might have been built correctly to their separate specs, but they weren’t consistent. The result would be last minute solutions, hastily lashed together, with inevitable bugs and unforeseen problems down the line – ready to be exposed by the auditors.
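One cheap way to surface those inconsistent assumptions early is an interface check that both teams agree to run against their own output. A minimal sketch, with invented field names and types, might look like this.

```python
# A minimal sketch of an interface (contract) check two teams could agree on
# and run long before integration; all names and types are invented.
EXPECTED_BY_TEAM_B = {"account_id": str, "amount_pence": int, "currency": str}

def check_contract(record: dict) -> list:
    problems = []
    for name, expected_type in EXPECTED_BY_TEAM_B.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    return problems

# Team A, under pressure, sends the amount as a decimal string in pounds.
record_from_team_a = {"account_id": "A123", "amount_pence": "49.99"}
print(check_contract(record_from_team_a))
# ['amount_pence should be int', 'missing field: currency']
```

Both suites might be built correctly to their separate specs; a shared check like this exposes the inconsistency while it is still cheap to fix.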

Splitting the work across continents and suppliers always creates big management problems. You have to be prepared for these. The additional co-ordination, chasing, reporting and monitoring takes a lot of effort. This all poses big problems for test managers, who have to be strong, perceptive and persuasive to ensure that the testing is planned consistently across the whole solution.

It is tempting, but dangerous, to allow the testing to be segmented. The different sub-systems are tested according to the assumptions that the build teams find convenient. That might be the easy option at the planning stage, but it doesn’t seem so clever when the whole system is bolted together and crashes as the full implications emerge of all those flawed assumptions, long after they should have been identified and challenged.

Outsourcing and global teams don’t provide a quick fix. Without strong management and a keen awareness of the risks it’s a sure way to let serious problems slip through into production. Surely safety critical industries would be smarter, more responsible? I learned all this back in the 1990s. It’s not new, and when I read Bloomberg’s account of Boeing’s engineering practices I swore, quietly and angrily.

Consequences

“During the crashes of Lion Air and Ethiopian Airlines planes that killed 346 people, investigators suspect, the MCAS system pushed the planes into uncontrollable dives because of bad data from a single sensor.

That design violated basic principles of redundancy for generations of Boeing engineers, and the company apparently never tested to see how the software would respond, Lemme said. ‘It was a stunning fail,’ he said. ‘A lot of people should have thought of this problem – not one person – and asked about it.’”
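Purely to illustrate the redundancy principle Lemme describes, and emphatically not Boeing’s design, a cross-check between two redundant readings can be sketched in a few lines; the threshold is invented.

```python
# An illustration of the redundancy principle only, not the real MCAS logic.
DISAGREEMENT_LIMIT_DEGREES = 5.5   # invented threshold

def safe_to_act(sensor_a: float, sensor_b: float) -> bool:
    # With redundant sensors a system can refuse to act automatically
    # when the readings disagree, instead of trusting a single value.
    return abs(sensor_a - sensor_b) <= DISAGREEMENT_LIMIT_DEGREES

print(safe_to_act(4.0, 4.3))    # True  - the readings agree
print(safe_to_act(4.0, 74.5))   # False - one sensor is clearly faulty
```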

So the consequences of commoditization, ETTO, the drift into failure and complacency about developing and testing complex, safety critical systems with global teams all came together disastrously in the Lion Air and Ethiopian Airlines crashes.

A lot of people should certainly have thought of this problem. As a former IT auditor I thought of this passage by Norman Marks, a distinguished commentator on auditing. Writing about risk-based auditing he said:

A jaw-dropping moment happened when I explained my risk assessment and audit plan to the audit committee of the oil company where I was CAE (Tosco Corp.). The CEO asked whether I had considered risks relating to the blending of gasoline, diesel, and jet fuel.

As it happened, I had — but it was not considered high risk; it was more a compliance issue than anything else. But, when I talked to the company’s executives I heard that when Exxon performed an enterprise-wide risk assessment, this area had been identified as their #1 risk!

Poorly-blended jet fuel could lead to Boeing 747s dropping out of the sky into densely-packed urban areas — with the potential to bankrupt the largest (at that time) company in the world. A few years later, I saw the effect of poor blending of diesel fuel when Southern California drivers had major problems and fingers were pointed at us as well as a few other oil companies.

In training courses, when I’ve been talking about the big risks that keep the top management awake at night I’ve used this very example; planes crashing. In big corporations it’s easy for busy people to obsess about the smaller risks, those that delay projects, waste money, or disrupt day to day work. These problems hit us all the time. Disasters happen rarely and we can lose sight of the way the organisation is drifting into catastrophic failure.

That’s where auditors, and I believe testers too, come in. They should be thinking about these big risks. In the case of Boeing the engineers, developers and testers should have spoken out about the problems. The internal auditors should certainly have been looking out for such risks, and they are the people with the organisational independence and power to object. They have to be listened to.

An abdication of management responsibility?

“Boeing also has disclosed that it learned soon after Max deliveries began in 2017 that a warning light that might have alerted crews to the issue with the sensor wasn’t installed correctly in the flight-display software. A Boeing statement in May, explaining why the company didn’t inform regulators at the time, said engineers had determined it wasn’t a safety issue.

‘Senior company leadership,’ the statement added, ‘was not involved in the review.’”

Senior management was not involved in the review. Doubtless there are a host of reasons why they were not involved. The bottom line, however, is that it was their responsibility. I spent six years as an IT auditor. In that time only one of my audits led to the group’s chief auditor using that nuclear phrase, which incidentally was not directed at IT management. A very senior executive was accused of “abdicating managerial responsibility”. The result was a spectacular display of bad temper and attempted intimidation of the auditors. We didn’t back down. That controversy related to shady behaviour at a subsidiary where the IT systems were being abused and frauds had become routine. It hardly compared to a management culture that led to hundreds of avoidable deaths.

One of the core tenets of Safety II, the new way of looking at safety, is that there is never a single, root cause for failure in complex systems. There are always multiple causes, all of them necessary, but none of them sufficient, on their own, for disaster. The Boeing 737-MAX case bears that out. No one person was responsible. No single act led to disaster. The fault lies with the corporate culture as a whole, with a culture of leadership that abdicated responsibility, that “wasn’t involved”.

Auditors and testing – a rant justified by experience

A couple of weeks ago I was drawn into a discussion on Twitter about auditors and testing. At the time I was on holiday in a delightfully faraway part of Galloway, in south west Scotland.

One of the attractions of the cottage where we were staying was that it lacked a mobile (cell) phone signal, never mind internet access. Only when we happened to be in a pub or restaurant could I sneak onto wifi discreetly, without incurring a disapproving look from my wife.

Having worked as both an IT auditor and a tester, and having both strong opinions and an argumentative nature, I had plenty to say on the subject. That had to wait till I returned (via New York, but that’s another subject) when I unleashed a rant on Twitter. Here is that thread, in a more readable format. It might be a rant, but it is based on extensive experience.

Auditors looking for items they can check that MUST be called test cases? That’s a big, flashing, warning sign they have a lousy conceptual grasp of auditing. It’s true, but missing the point, to say that’s old fashioned. It’s like saying the problem with ISO 29119 is it’s old fashioned.

The crucial point is it’s bad, unprofessional auditing. The company that taught me to audit was promoting good auditing 30 years ago. If anyone had remained ignorant of the transformation in software development in the last 30 years you’d call them idiots, not old-fashioned.

A test case is just a name for a receptacle. It’s a bucket of ideas. Who cares about the bucket? Ideas and evidence really matter to auditors, who live and die by evidence; they expect compelling evidence that the auditees have been thinking about what they are doing. A lack of useful evidence showing what testing has been performed, or a lack of thought about how to test, should be certain ways to attract criticism from auditors. The IT auditors’ governance model COBIT5 mentions “test cases” once (in passing). It mentions “ideas” 32 times and “evidence” 16 times.

COBIT5 isn’t just about testing of course. Its principles apply across the whole range of IT, and testing is no exception. Auditors should expect testers to have:

  • a clear vision or strategy of how testing should be performed in their organisation,
  • a clear (but not necessarily detailed) plan for testing the product,
  • relevant, contemporary evidence that justifies and leads inescapably to the conclusions, lessons and insights that the testers derived and reported from their testing.

That’s what auditors should expect. Some (or many?) organisations are locked into a pattern of low quality and low value auditing. They define auditing as brainless compliance checking that is performed by low quality staff who don’t understand what they’re doing. Their work is worthless. As a result audit is held in low esteem in the organisation. Smart people don’t want to work there. Therefore audit must be defined in such a way that low quality staff are able to carry it out.

This is inexcusable. At best it is negligence. Maintaining that model of auditing requires willful ignorance of what the audit profession stipulates. It is damaging and contributes towards the creation of a dysfunctional culture. Nevertheless it is cheap and ensures there are no good auditors who might pose uncomfortable, challenging questions to senior managers.

However, this doesn’t mean there are never times when auditors do need to see test cases. If a contract has been stupidly written so that test cases must be produced and visible then there’s no wriggle room. It’s just the same (and just as stupid) as if the contract says testers must wear pink shirts. It might be stupid but it is a contractual deliverable; auditors will want to see proof of compliance. As Griffin Jones pointed out on seeing my tweet, “often (the contract) is stupidly written – thus the need to get involved with the contracting organization. The problem is bigger than test or SW dev”.

I fully agree with Griffin. Testers should get involved in contractual discussions that will influence their work, in order to anticipate and head off unhelpful contractual terms.

I would add that testers should ask to see the original contract. Contractual terms are sometimes misinterpreted as they are passed through the organisation to the testers. It might be possible to produce the required evidence by smarter means.

Apart from such tiresome contractual requirements, demanding to see “test cases” is a classic case of confusing form and content. It’s unprofessional. That’s not just my opinion; it’s not novel or radical. It’s simply orthodox, professional opinion. Anyone who says otherwise is clueless or bullshitting. Either way they must be resisted. Clueless bullshitters can enjoy good, lucrative careers, but do huge damage. I’ve no respect for them.

The US Food and Drug Administration’s “General Principles of Software Validation” do pose a problem. They date back to 1997, updated in 2002. They are creakily old. They mention test cases many times, but they were written when it was assumed that testing meant writing test cases. The term seems to be used as jargon for tests. If testing satisfies FDA criteria then there’s no obvious reason why you can’t just call planned tests “test cases”.

There’s no requirement to produce test scripts as well as test cases, but expected results with objective pass/fail criteria are required. That doesn’t, and mustn’t, mean testers should be looking only for the expected results. The underlying principle is that compliance should follow the “least burdensome” approach and the FDA do say that they are open to considering alternative approaches to comply with the requirements in a way that is less burdensome.
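A minimal sketch of such a planned test record, with an objective pass/fail criterion and room for observations beyond the expected result, might look like this. The field names and the example are invented, not an FDA template.

```python
# A minimal sketch, assuming nothing beyond the text above: a planned test
# with an expected result, an objective pass/fail criterion, and space to
# record the unexpected. All names are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PlannedTest:
    case_id: str
    purpose: str
    expected_result: str
    actual_result: Optional[str] = None
    observations: List[str] = field(default_factory=list)  # beyond the expected result

    def passed(self) -> Optional[bool]:
        if self.actual_result is None:
            return None                                     # not yet run
        return self.actual_result == self.expected_result   # objective criterion

check = PlannedTest(
    case_id="TC-042",
    purpose="Reject an infusion rate above the configured maximum",
    expected_result="rate rejected with alarm E17",
)
check.actual_result = "rate rejected with alarm E17"
check.observations.append("alarm cleared itself after two seconds - worth querying")
print(check.passed())   # True, but the observation still needs to be followed up
```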

Further, the FDA does not have a problem with Agile development (PDF, opens in new tab), and they also approve of exploratory testing, as explained by Michael Bolton and James Bach.