This news story in the Guardian grabbed my attention. The Nationwide Building Society mistakenly processed 704,426 payments a second time.
The Independent carried a less detailed report on the story, but it contained a quote that irked me (my emphases).
The society instead blamed an “inputting” error by an operative at its Swindon HQ. The phantom transactions were removed from customers’ accounts overnight, the bank said.
Jenny Groves, divisional director for customer experience, said: “Nationwide wishes to apologise to those customers affected by an issue which has affected some of our debit card customers.”
She said those put into the red would have all charges “refunded in full and any costs associated with this error will be reimbursed in full. None of our customers will suffer financial loss as a result of this one-off error”.
Wow! This is 2012, and a big bank is making excuses that didn’t wash 30 years ago. I’ve worked extensively with big batch financial systems. Here are some basic, utterly fundamental precepts that were well known by developers before I even knew what a computer was.
People screw up. Sometimes they do it in ways you expect, often in ways that surprise you. The only certainty is that they screw up.
You process every payment accurately, no exceptions.
You never, ever, process payments twice. It is a big deal. It’s not just about keeping your job, or staying out of jail. It’s about self respect. It’s about going to sleep at night knowing you’re a competent professional, not an irresponsible cowboy who gets it right only some of the time.
The user requirements will not state every requirement that is absolute and non-negotiable. Some requirements are so fundamental and essential that the users will assume that “they go without saying”. If such requirements do not appear in specifications then you will look stupid if you subsequently pretend that they didn’t matter, or that you believe the users did not really require them. Processing payments accurately, once, and once only falls into this category.
It is the system designers’ responsibility to build these unstated, fundamental requirements into the application, even if the business analysts missed them.
It is the testers’ responsibility to test the application against these unstated, fundamental requirements.
All this means that financial applications need carefully designed controls to ensure that the right things always happen and the wrong things never do. It means that the application needs built-in checks to detect these “one off human errors”. The techniques are ancient, at least in computing, and maybe that’s part of the problem. They’re boring, pedantic old-school stuff.
The main techniques are control files to keep track of files as they are being processed, hash totals and record counts to show that all records have been processed, and file version numbers so that the application can check that the right files are being processed, and processed only once.
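To make the idea concrete, here is a minimal sketch of those controls in Python. The file layout, field names and the register of processed sequence numbers are all invented for illustration; a real batch suite would do this with trailer records, a control file and proper logging.

```python
# Hypothetical sketch of classic batch-file controls: a trailer record
# carries the expected record count and a hash total of the amounts,
# and a register of file sequence numbers stops a file running twice.

def hash_total(records):
    # Sum of payment amounts in pence: a crude hash total in the
    # old-school sense (any deterministic total over the records works).
    return sum(r["amount_pence"] for r in records)

def validate_batch(batch, processed_sequences):
    """Refuse to process a batch unless its record count and hash
    total match the trailer, and its sequence number is new."""
    if batch["sequence"] in processed_sequences:
        raise ValueError(f"file {batch['sequence']} has already been processed")
    records = batch["records"]
    if len(records) != batch["trailer"]["record_count"]:
        raise ValueError("record count mismatch: records lost or duplicated")
    if hash_total(records) != batch["trailer"]["hash_total"]:
        raise ValueError("hash total mismatch: amounts corrupted")
    processed_sequences.add(batch["sequence"])
    return True
```

Running the same file through `validate_batch` a second time raises an error instead of silently paying everyone twice, which is precisely the check that appears to have been missing or switched off at Nationwide.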
These techniques are boring and fiddly, but they work. Unfortunately they frequently trip up test runs. Control files and version numbers have to be reset after a run is halted. It’s easy to lose track and have to explain that the failure was an embarrassing test setup problem, rather than a genuine defect.
It’s much simpler to forget about these controls, or to switch them off for testing, or even switch them off in live running (it happens) when they complicate restarts after problems.
I said earlier that testers have a responsibility to test unstated, fundamental requirements. Actually, that was a slightly tricky one. Of course it is perfectly true, but sadly some project managers, and even whole organisations, prefer to put pressure on testers to script tests only against written requirements.
If you are testing a financial payment application and you’re not testing to see if every payment is processed accurately, once, and only once then you’re not really testing. Such “testing” is an embarrassment to the testing profession.
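The once-and-only-once check is easy to express as a test. Here is a sketch in Python; the `Ledger` class and payment ids are invented for illustration, standing in for whatever the real application uses to post payments.

```python
# Hypothetical sketch of a once-and-only-once test: a tiny ledger that
# keys each payment on a unique id, and a test that replays the same
# payment file twice and checks the balance moved exactly once.

class Ledger:
    def __init__(self):
        self.balance = 0          # pence
        self.applied = set()      # payment ids already posted

    def apply(self, payment_id, amount):
        if payment_id in self.applied:
            return False          # duplicate: refuse (and log, in real life)
        self.applied.add(payment_id)
        self.balance += amount
        return True

def test_payment_applied_once_and_only_once():
    ledger = Ledger()
    file_of_payments = [("PAY-001", -5000)]   # one debit of £50.00
    for pid, amount in file_of_payments * 2:  # replay the whole file twice
        ledger.apply(pid, amount)
    assert ledger.balance == -5000            # debited once, not twice
```

Whether or not the written requirements mention duplicates, a test pack for a payment system that lacks a replay test like this is not doing its job.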
Organisations that skimp on effective testing, that don’t understand the value of thoughtful, risk-based controls, and that blame “human error” when there is a management or systemic failure are placing their customers and reputation at risk. They are inviting humiliating press coverage and they deserve it.
I had to get that off my chest. People screw up. Human error is inevitable. Testers have to show how it can happen. It’s so much less embarrassing to read it in a test report than a national newspaper. That’s all.