Posted by: James Christie | August 27, 2015

They can’t handle the truth

Have you ever had to deal with managers or users who were sceptical about the time and effort a piece of work would take? Have you ever complained in vain about a project that was clearly doomed to fail right from the start? Have you ever felt that a project was being planned on the basis of totally unjustified optimism?

If you’ve been in IT for a while there’s a good chance you’ve answered “yes” to at least one of these questions. Over the years I grew wearily familiar with the pattern of willful refusal to consider anything but the happy path to a smooth, speedy delivery of everything on the wish list, within a comical budget that is “challenging, I admit, but realistic if we all pull together”.

Over time I gradually came to realise that many senior managers and stakeholders didn’t want the truth. They wanted the fiction; they wanted to be lied to, because knowing the truth would make them responsible for dealing with it. In their world it is better to be deceived and then firefight a failing project than to deal honestly with likely problems and uncertainty. Above all, they can’t bring themselves to deal with the truth of uncertainty. It is far more comfortable to pretend that uncertainty is evidence of lack of competence, that problems can be anticipated, that risks can be ignored or managed out of existence, that complexity can be eliminated by planning and documentation (and by standards).

Telling the truth – a brave act in an unfair world

Perhaps the toughest roles in IT are those that are senior enough to be accountable for the results, but too junior to beat uncomfortable truths into the brains of those who really don’t want to know.

These budding fall guys have the nous and experience to see what is going to happen. One of the rarely acknowledged skills of these battle-scarred veterans is the ability to judge the right moment and the right way to start shouting the truth loudly. Reveal all too early and they can be written off as negative, defeatist, “not a team player”. Reveal it too late and they will be castigated for covering up imminent failure, and for failing to comply with some standard or process. Everyone fails to comply. Not everyone is going to be kicked for it, but late deliverers of bad news are dead meat.

Of course that’s not fair, but that’s hardly the point. Fairness isn’t relevant if the culture is one where rationality, prudence and pragmatism all lead to crazy behaviour because that is what is rewarded. People rationally adapt to the requirement to stop thinking when they see others being punished for honesty and insight.

What is an estimate?

So what’s the answer? The easy one is, “run, and run fast”. Get out and find a healthier culture. However, if you’re staying then you have to deal with the problem of handling senior people who can’t handle the truth.

It is important to be clear in your own mind about what you are being asked for when you have to estimate. Is it a quote? Is there an implied instruction that something must be delivered by a certain date? Are there certain deliverables that are needed by that date, and others that can wait? Could it be a starting point for negotiation? See this article I wrote a few years ago.

Honesty is non-negotiable

It’s a personal stance, but honesty about uncertainty and the likelihood of serious but unforeseeable problems is non-negotiable. I know others have thought I have a rather casual attitude towards job security and contract renewal! However, I can’t stomach the idea of lingering for years in an unhealthy culture. And it’s not as if honesty means telling the senior guys who don’t want the truth that they are morons (even if they are).

Honesty requires clear thinking, and careful explanation of doubt and uncertainty. It means being a good communicator, so that the guys who take the big decisions have a better understanding that your problems will quickly become their problems. It requires careful gathering of relevant information if you are ordered into a project death march so that you can present a compelling case for a rethink when there might still be time for the senior managers and stakeholders to save face. Having the savvy to help the deliberately ignorant to handle the truth really is a valuable skill. Perhaps Jack Nicholson’s character from “A Few Good Men” isn’t such a great role model, however. His honesty in that memorable scene resulted in him being arrested!

Posted by: James Christie | August 4, 2015

Personal statement of opposition to ISO 29119 based on principle

Introduction

I believe that prescriptive standards are inappropriate for software testing. This paper states my objections in principle, i.e. it explains why I believe that the decision to develop ISO 29119 was fundamentally misguided. These objections are valid without any consideration of the detailed content of the standard, which testers cannot view for themselves unless they or their employers buy a set. Members of the Context Driven School of testing (CDT) were making these arguments before ISO 29119 was issued, or even conceived.

This is not a conventional article in essay style. It is more in the nature of a work of reference, providing sources, links and a basis for further work. I expect to update it periodically. The original version arose from work that I did for the Association for Software Testing’s (AST) Committee on Standards and Professional Practices.

My objections in principle fall into four categories, based on regulatory theory and practice, the nature of software development, the social sciences, and the need for fair competition. These objections are based on academic and practical evidence. A full explanation of my objections could run to book length. I will therefore restrict myself to a brief statement of each objection, with supporting links.

The AST’s mission and the principles of CDT have informed and guided my analysis, so I shall start by quoting them.

AST mission

The AST’s mission, as stated on its website, is as follows.

“The Association for Software Testing is dedicated to advancing the understanding of the science and practice of software testing according to Context-Driven principles.

The Association for Software Testing (AST) is an international non-profit professional association with members in over 50 countries. AST is dedicated and strives to build a testing community that views the role of testing as skilled, relevant, and essential to the production of faster, better, and less expensive software products. We value a scientific approach to developing and evaluating techniques, processes, and tools. We believe that a self-aware, self-critical attitude is essential to understanding and assessing the impact of new ideas on the practice of testing.”

CDT principles

The seven basic principles of Context-Driven Testing (CDT) are as follows.

  1. “The value of any practice depends on its context.
  2. There are good practices in context, but there are no best practices.
  3. People, working together, are the most important part of any project’s context.
  4. Projects unfold over time in ways that are often not predictable.
  5. The product is a solution. If the problem isn’t solved, the product doesn’t work.
  6. Good software testing is a challenging intellectual process.
  7. Only through judgment and skill, exercised cooperatively throughout the entire project, are we able to do the right things at the right times to effectively test our products.”

1 – Objections based on regulatory theory and practice

1a – Principles and rules

There has been much discussion in recent years about the relative merits of principles and rules in regulating and influencing behavior. For the purposes of this paper I will define principles as non-specific statements of what is expected, and rules as detailed and specific statements of what must be done.

I believe that for a complex, context dependent and cognitively demanding activity such as software testing it is unhelpful to present a binary choice between either principles or rules. It is far more effective to combine a set of general principles applying to all testers with specific rules that are relevant to particular organizations and settings. See Braithwaite’s “Rules and Principles: A Theory of Legal Certainty”. For these rules and principles to work effectively there should be constant discussion, or regulatory conversations (PDF, opens in a new tab), between regulators and those being regulated about the meaning and application of the principles.

The situation is confused because standards in legal discussions are usually assumed to be synonymous with principles. A standard is usually conceived as a clear statement of what must be achieved by an activity, rather than how it should be performed in detail. Standards for software testing have traditionally been pitched at the detailed level of rules, e.g. IEEE 829.

A standard that would be appropriate for software testing would therefore be brief, clear and non-specific.

A suitable example of such a standard would be the International Standards for the Professional Practice of Internal Auditing.

1b – Regulators

Following on from the last point I believe that the specific requirements of industry regulators are of paramount importance for testers in those industries. Any testing standards should be framed as principles in such a way that they are consistent with those requirements and not attempt to provide competing, detailed rules.

As an example, the US Food and Drug Administration offers clear advice about what is required, focusing on the need for testing to provide high quality evidence that is capable of standing up in court. The FDA advice does not specify exactly how this should be done, but does allow companies to adopt the “least burdensome approach”. See “General Principles of Software Validation; Final Guidance for Industry and FDA Staff”, 2002.

1c – Accountability

A frequent justification of the need for software testing standards is that testers should be accountable to stakeholders: testers must demonstrate that they are providing valuable information, effectively and efficiently.

I agree that accountability is important, but I do not believe that prescriptive standards meet that need. In evidence I point to the requirements of the audit profession.

1c1 – IIA standards

The standards of the global Institute of Internal Auditors offer no support for generic, prescriptive testing standards. The Global Technology Audit Guide, “Management of IT Audit” (PDF, opens in a new tab), 1st edition, 2006 describes the Snowflake Theory of IT Audit:

“Every IT environment is unique and represents a unique set of risks. The differences make it increasingly difficult to take a generic or checklist approach to auditing.”

In the section “Frameworks and Standards” the Audit Guide says:

“One challenge auditors face when executing IT audit work is knowing what to audit against. Most organizations have not fully developed IT control baselines for all applications and technologies. The rapid evolution of technology could likely render any baselines useless after a short period of time.”

Although the IIA Global Technology Audit Guides refer to ISO standards “for consideration” as baselines against which to audit, they make no mention of software testing standards. They emphasize the need to understand the different mix of standards, methods and tools that each organization uses, and they stress that auditors should not expect to see any set of “best practices” implemented wholesale without customization.

1c2 – Information Systems Audit and Control Association

COBIT 5 is ISACA’s model for IT governance. It stresses the need for clear organization-specific testing standards, and for careful documentation of test plans and results. However, there is no mention of organizations incorporating external standards. “Best practices” are to be used as a “reference when improving and tailoring”. Organizations should “employ code inspection, test-driven development practices, automated testing, continuous integration, walk-throughs and testing of applications as appropriate.” COBIT 5 has countless references to various external standards, but none to testing standards.

COBIT 5 is important because it is widely accepted as an effective means of complying with the requirements of Sarbanes-Oxley.

1c3 – End of binary opinions

The audit profession has traditionally offered binary opinions on whether accounts were true and fair, or internal controls were present or lacking. Increasingly the profession requires more nuanced reporting about the risks that matter, the risks that keep stakeholders up at night. This requires testers to offer more valuable information about the quality of products than merely saying how many test cases were run and passed. Testers have to tell a valuable story rather than rely on filling in generic, standards-based templates.

1c4 – Lifebelt for inexperienced investigators

Prescriptive standards act as a lifebelt for investigators and auditors who lack the experience and confidence to make their own judgments. They search for something they can use to give them reassuring answers about how a job should be done. The US Government Accountability Office, in its March 2015 report (PDF, opens in a new tab) on the problems with the Healthcare.gov website, checked the project for compliance with the IEEE 829 test documentation standard. The last revision of IEEE 829 was made in 2008, and IEEE itself acknowledges that the contents of standards more than five years old “do not wholly reflect the present state of the art”. In fact IEEE 829 is hopelessly out of date. It is now quite irrelevant to modern testing.

Standards have credibility and huge influence simply from their status as standards. If testers are to have standards it is essential that they are relevant, credible and framed in a way that does not mislead investigators.

2 – Objections based on the nature of software development

2a – Experience with IEEE 829

2a1 – Documentation obsession

IEEE 829 was for many years the pre-eminent testing standard. It had a huge influence on testing. Many organizations based their testing processes and documents on this standard and its templates.

The result was a widespread fixation with excessively large, uninformative documents, which became the focus of testing teams’ activities at the expense of the real work of acquiring knowledge about the product under test.

2a2 – Traditional methods

IEEE 829 was aligned conceptually with traditional methods, especially Structured Methods, which were based on the assumption that linear, documentation-driven processes are necessary for a quality product and that more flexible, lightweight approaches to documentation were irresponsible. However, Structured Methods were based on intuition rather than evidence, and studies showed that a document-driven approach did not match the way people think when creating software. There is a lack of evidence that the detailed, document-driven approach associated with IEEE 829 was ever an effective, generic approach to testing.

2b – Complexity

Prescriptive processes are ill-equipped to handle complex activities like software development. In software development it is impossible to state with certainty at an early stage what the end product should look like. This undermines the rationale for a heavy investment in advance testing documentation.

2c – Cynefin

The Cynefin Framework is a valuable way to help us to make sense of the differing responses that different situations require. Software development and testing are inherently complex, and therefore “best practice” and prescriptive standards are inappropriate. A flexible, iterative approach is required.

3 – Objections based on the social sciences

3a – The nature of expertise

Prescriptive standards do not take account of how people acquire skills and apply their knowledge and experience in cognitively demanding jobs such as testing.

3a1 – Tacit knowledge

Michael Polanyi and Harry Collins have offered valuable arguments about how we apply knowledge. Polanyi introduced the term “tacit knowledge” to describe the knowledge we have and use but cannot articulate; Collins built on his work. Foisting prescriptive standards onto skilled people is counterproductive and encourages them to concentrate on the means rather than the ends.

3a2 – Schön’s reflection-in-action

Donald Schön (PDF, opens in a new tab) argues that creative professionals, such as software designers or architects, have an iterative approach to developing ideas. Much of their knowledge is tacit. They can’t write down all of their knowledge as a neatly defined process. To gain access to what they know they have to perform the creative act so they can learn, reflect on what they’ve learned and then apply the new knowledge. This is inconsistent with following a prescriptive process.

3b – Goal displacement

Losing sight of the true goals and focusing on methods, processes and documentation is a constant danger of prescriptive standards. Not only does this reflect my own experience; there is also a wealth of academic studies backing it up.

3b1 – Trained incapacity

A full century ago, in 1914, Thorstein Veblen identified the problem of trained incapacity. People who are trained in specific skills can lack the ability to adapt. Their response has worked in the past, and they apply it regardless thereafter. They focus on responding in the way they have been trained, and cannot see that the circumstances require a different response. Their training has rendered them incapable of doing the job effectively unless it fits their mental framework.

3b2 – Ways of seeing

In 1935 Kenneth Burke built on Veblen’s work, arguing that trained incapacity meant that one’s abilities become blindnesses. People can focus on the means or the ends, not both, and their specific training in prescriptive methods or processes leads them to focus on the means. They do not even see what they are missing.

3b3 – Conformity to the rules

Robert Merton made the point more explicitly in 1957.

“Adherence to the rules… becomes an end in itself… Formalism, even ritualism, ensues with an unchallenged insistence upon punctilious adherence to formalized procedures. This may be exaggerated to the point where primary concern with conformity to the rules interferes with the achievement of the purposes of the organization”.

So the problem had been recognised before software development was even in its infancy.

3c – Defense against anxiety

3c1 – Social defenses

Isabel Menzies Lyth (PDF, opens in a new tab) provided a different slant in the 1950s using her experience in psychoanalysis. Her specialism was analyzing the social dynamics of organizations.

She argued that the main factor shaping an organization’s structure and processes was the need for people to cope with stress and anxiety. “Social defenses” were therefore built to help people cope. The defenses identified by Menzies Lyth included rigid processes that removed discretion and the need for decision making, hierarchical staffing structures, increased specialization, and people being managed as fungible (i.e. readily interchangeable) units, rather than skilled professionals.

3c2 – Transitional objects

Donald Winnicott’s contribution (PDF, opens in a new tab) was the idea of the transitional object. This is something that helps infants to cope with loosening the bonds with their mother. Babies don’t distinguish between themselves and their mother. Objects like security blankets and teddy bears give them something comforting to cling onto while they come to terms with the beginnings of independence in a big, scary world.

David Wastell linked the work of Menzies Lyth and Winnicott. He found that developers used Structured Methods as transitional objects, i.e. as a defence mechanism to alleviate the stress and anxiety of a difficult job.

Wastell could see no evidence that Structured Methods worked. The evidence was that the resulting systems were no better than the old ones, took much longer to develop and were more expensive. Managers became hooked on technique and lost sight of the true goal.

“Methodology becomes a fetish, a procedure used with pathological rigidity for its own sake, not as a means to an end. Used in this way, methodology provides a relief against anxiety; it insulates the practitioner from the risks and uncertainties of real engagement with people and problems.”

3d – Loss of communication

Effective two-way communication requires effort, “interpretive labor”. The anthropologist David Graeber (PDF, opens in a new tab) argues that the greater the degree of force or compulsion, and the greater the bureaucratic regime of rules and forms, the less communication there is. Those who issue the orders don’t need to understand, and therefore don’t take the trouble to understand, the complexities of the situation they’re managing. This problem works against regulatory conversations (see 1a).

4 – Objections based on the need for competition in testing services

4a – ISO brand advantage

ISO has the reputation for being the global leader in standardization. Any standard that it issues has a huge advantage over alternative methods, simply because of the ISO brand. It is therefore vital that any testing standard is both credible and widely accepted on its own merits. My view, based on the evidence I have set out, is that a detailed, prescriptive standard could not meet those tests.

4b – Market for lemons

Buyers of testing services are often ill informed about the quality of the service that they buy. Economists recognize that where there is asymmetric information (PDF, opens in a new tab) purchasers are vulnerable and the market is distorted. Naive buyers at used car auctions cannot distinguish between good cars and lemons. This puts the sellers of lemons at an advantage. They can get a higher price than their product is worth; sellers of better products find it difficult to reach the price they want and are likely to leave the market, which becomes dominated by poor products.

For the reasons I have outlined, any prescriptive testing standard will help sellers of poor testing services to sell plausible “lemon” services. That will make it harder for sellers of high quality testing services to gain a worthwhile price; prices will drift downwards as testing is commoditized, sold on price rather than quality.

4c – Compulsion through contracts

Any ISO standard is likely to be referenced in contracts by lawyers and managers. This will introduce compulsion. This was the profession’s experience with IEEE 829. Even when IEEE 829 was not directly mandated it had huge influence on many organizations’ test strategies.

If standards are detailed, prescriptive and mandatory then that will reduce the flexibility that testers need for context driven testing. This would not fit the principles driven style of regulation that I outlined as being desirable in 1a. It would also lead to poorer communication, as described in 3d.

Posted by: James Christie | March 12, 2015

Standards – a charming illusion of action

The other day I posted an article I’d written that appeared on the uTest blog a few weeks ago. It was a follow up to an article I wrote last year about ISO 29119. Pmhut (the Project Management Hut website) provided an interesting comment.

“…are you sure that the ISO standards will be really enforced on testing – notably if they don’t really work? After all, lawyers want to get paid and clients want their projects done (regardless of how big the clients are).”

Well, as I answered, whether or not ISO 29119 works is, in a sense, irrelevant. Whether or not it is adopted and enforced will not depend on its value or efficacy. ISO 29119 might go against the grain of good software development and testing, but it is very much aligned with a hugely pervasive trend in bureaucratic, corporate life.

I pointed the commenter to an article I wrote on “Teddy Bear Methods”. People cling to methods not because they work, but because they gain comfort from doing so. That is the only way they can deal with difficult, stressful jobs in messy and complex environments. I could also have pointed to this article “Why do we think we’re different?”, in which I talk about goal displacement, our tendency to focus on what we can manage while losing sight of what we’re supposed to be managing.

A lesson from Afghanistan

I was mulling over this when I started to read a fascinating-looking book I was given at Christmas: “Heirs to Forgotten Kingdoms” by Gerard Russell, a deep specialist in the Middle East and a fluent Arabic and Farsi speaker.

The book is about minority religions in the Middle East. Russell is a former diplomat in the British Foreign Office. The foreword was by Rory Stewart, the British Conservative MP. Stewart was writing about his lack of surprise that Russell, a man deeply immersed in the culture of the region, had left the diplomatic service, then added:

“Foreign services and policy makers now want ‘management competency’ – slick and articulate plans, not nuance, deep knowledge, and complexity.”

That sentence resonated with me, and reminded me of a blistering passage from Stewart’s great book “The Places in Between”, his account of walking through the mountains of Afghanistan in early 2002 in the immediate aftermath of the expulsion of the Taliban and the NATO intervention.

Rory Stewart is a fascinating character, far removed from the modern identikit politician. The book is almost entirely a dispassionate account of his adventures and the people whom he met and who provided him with hospitality. Towards the end he lets rip, giving his brutally honest and well-informed perspective of the inadequacies of the western, bureaucratic, managerial approach to building a democratic state where none had previously existed.

It’s worth quoting at some length.

“I now had half a dozen friends working in embassies, thinktanks, international development agencies, the UN and the Afghan government, controlling projects worth millions of dollars. A year before they had been in Kosovo or East Timor and in a year’s time they would have been moved to Iraq or Washington or New York.

Their objective was (to quote the United Nations Assistance Mission for Afghanistan) ‘The creation of a centralised, broad-based, multi-ethnic government committed to democracy, human rights and the rule of law’. They worked twelve- or fourteen- hour days, drafting documents for heavily-funded initiatives on ‘democratisation’, ‘enhancing capacity’, ‘gender’, ‘sustainable development,’ ‘skills training’ or ‘protection issues’. They were mostly in their late twenties or early thirties, with at least two degrees – often in international law, economics or development. They came from middle class backgrounds in Western countries and in the evenings they dined with each other and swapped anecdotes about corruption in the Government and the incompetence of the United Nations. They rarely drove their 4WDs outside Kabul because they were forbidden to do so by their security advisers. There were people who were experienced and well informed about conditions in rural areas of Afghanistan. But such people were barely fifty individuals out of many thousands. Most of the policy makers knew next to nothing about the villages where 90% of the population of Afghanistan lived…

Their policy makers did not have the time, structures or resources for a serious study of an alien culture. They justified their lack of knowledge and experience by focusing on poverty and implying that dramatic cultural differences did not exist. They acted as though villagers were interested in all the priorities of international organisations, even when they were mutually contradictory…

Critics have accused this new breed of administrators of neo-colonialism. But in fact their approach is not that of a nineteenth-century colonial officer. Colonial administrations may have been racist and exploitative but they did at least work seriously at the business of understanding the people they were governing. They recruited people prepared to spend their entire careers in dangerous provinces of a single alien nation. They invested in teaching administrators and military officers the local language…

Post-conflict experts have got the prestige without the effort or stigma of imperialism. Their implicit denial of the difference between cultures is the new mass brand of international intervention. Their policy fails but no one notices. There are no credible monitoring bodies and there is no one to take formal responsibility. Individual officers are never in any one place and rarely in any one organisation long enough to be adequately assessed. The colonial enterprise could be judged by the security or revenue it delivered, but neo-colonialists have no such performance criteria. In fact their very uselessness benefits them. By avoiding any serious action or judgement they, unlike their colonial predecessors, are able to escape accusations of racism, exploitation and oppression.

Perhaps it is because no one requires more than a charming illusion of action in the developing world. If the policy makers know little about the Afghans, the public knows even less, and few care about policy failure when the effects are felt only in Afghanistan.”

Stewart’s experience and insight, backed up by the recent history of Afghanistan, allow him to present an irrefutable case. Yet, in the eyes of pretty much everyone who matters he is wrong. Governments and the military are prepared to ignore the evidence and place their trust in irrelevant and failed techniques rather than confront the awful truth; they don’t know what they’re doing and they can’t know the answers.

Vast sums of money and millions of lives are at stake. Yet very smart and experienced people will cling on to things that don’t work, and will repeat their mistakes in the future. Stewart, meanwhile, is very unlikely to be allowed anywhere near the levers of power in the United Kingdom. Being right isn’t necessarily a great career move.

Deep knowledge, nuance and complexity

I’m conscious that I’m mixing up quite different subjects here. Software development and testing are very different activities from state building. However, both are complex and difficult. Governments fail repeatedly at something as important and high-profile as constructing new, democratic states, and do so without feeling the need to reconsider their approach. If that can happen in the glare of publicity, is it likely that corporations will refrain from adopting and enforcing standards just because they don’t work? Whether or not they work barely matters. Such approaches fit the mindset and culture of many organisations, especially large bureaucracies, and once they have been adopted it is very difficult to persuade those organisations to abandon them.

Any approach to testing that is based on standardisation is doomed to fail unless you define success in a way that is consistent with the flawed assumptions of the standardisation. What’s the answer? Not adopting standards that don’t work is an obvious start, but that doesn’t take you very far. You’ve got to acknowledge those things that Stewart referred to in his foreword to Gerard Russell’s book: answers aren’t easy; they require deep knowledge, an understanding of nuance and an acceptance of complexity.

A video worth watching

Finally, I’d strongly recommend this video of Rory Stewart being interviewed by Harry Kreisler of the University of California about his experiences and the problems I’ve been discussing. I’ve marked the parts I found most interesting.

34:00; Stewart is asked about applying abstract ideas in practice.

40:20; Stewart talks about a modernist approach of applying measurement, metrics and standardisation in contexts where they are irrelevant.

47:05; Harry Kreisler and then Stewart talk about participants failing to spot the obvious, that their efforts are futile.

49:33; Stewart says that his Harvard students regarded him as a colourful contrarian and believed that all Afghanistan needed was a new plan and new resources.

Posted by: James Christie | March 10, 2015

ISO 29119: Why is the Debate One-Sided?

This article originally appeared on the uTest blog on February 23rd 2015.

In August last year the Stop 29119 campaign and petition kicked off at the CAST conference in New York.


In September I wrote on the uTest blog about why the new ISO/IEEE 29119 software testing standards are a danger to good testing and the whole testing profession.

I was amazed at the commotion that Stop 29119 caused. It was the biggest talking point in testing in 2014. Six months on it’s time to look back. What has actually happened?

The remarkable answer is – very little. The Stop 29119 campaigners haven’t given up. There has been a steady stream of blogs and articles. However, there has been no real debate; the discussion has been almost entirely one sided.

There has been only one response from ISO. In September Dr Stuart Reid, the convenor of the working group that produced the standard, issued a statement attempting to rebut the arguments of Stop 29119. That was it. ISO then retreated into its bunker and ignored invitations to debate.

Dr Reid’s response was interesting, both in its content and the way it engaged with the arguments of Stop 29119. The Stop 29119 petition was initiated by the board of the International Society for Software Testing. ISST’s website had a link to the petition, and a long list of blogs and articles from highly credible testing experts criticising ISO 29119. It is a basic rule of debate that one always tackles an opponent’s strongest points. However, Dr Reid ignored these authoritative arguments and responded to a series of points that he quoted from the comments on the petition site.

To be more accurate, Dr Reid paraphrased a selection of the comments and criticisms from elsewhere, framing them in a way that made it easier to refute them. Some of these points were no more than strawmen.

Cem Kaner, for example, argued that IEEE adopts a “software engineering standards process that I see as a closed vehicle that serves the interests of a relatively small portion of the software engineering community… The imposition of a standard that imposes practices and views on a community that would not otherwise agree to them, is a political power play”.

Dr Reid presented such arguments as “no-one outside the Working Group is allowed to participate” and “the standards ‘movement’ is politicized and driven by big business to the exclusion of others”.

These arguments were then dismissed by stating that anyone can join the Working Group, which consists of people from all parts of the industry. Dr Reid also emphasized that “consensus” applies only to those within the ISO process, failing to address the criticism that this excludes those who believe, with compelling evidence, that ISO-style standardization is inappropriate for testing.

These criticisms had been made forcefully for many years, in articles and at conferences, yet Dr Reid blithely presented the strawman that “no-one knew about the standards and the Working Group worked in isolation”. He then effortlessly demolished the argument that no-one was making.

What of the content? There were concerns about how ISO 29119 deals with Agile and Exploratory Testing. For example, Rikard Edgren offered a critique arguing that the standards tried but failed to deal with Agile. Similarly, Huib Schoots argued that a close reading of the standards revealed that the writers didn’t understand exploratory testing at all.

These are serious arguments that defenders of the standard must deal with if they are to appear credible. What was the ISO response?

Dr Reid reduced such concerns to bland and inaccurate statements that “the standards represent an old-fashioned view and do not address testing on agile projects” and “the Testing Standards do not allow exploratory testing to be used”. Again these were strawmen that he could dismiss easily.

I could go on to highlight in detail other flaws in the ISO response; the failure to address the criticism that the standards weren’t based on research or experience that demonstrates the validity of that approach; the failure to answer the concern that the standards will lead to compulsion by the back door; the failure to address the charge from the founders of Context Driven Testing that the standards are the antithesis of CDT; the evasion of the documented links between certification and standards.

In the case of research Dr Reid told us of the distinctly underwhelming claims from a Finnish PhD thesis (PDF, opens in a new tab) that the standards represent “a feasible process model for a practical organisation with some limitations”. These limitations are pretty serious; “too detailed” and “the standard model is top heavy”. It’s interesting to note that the PhD study was produced before ISO 29119 part 3 was issued; the study does not mention part 3 in the references. The study can therefore offer no support for the heavyweight documentation approach that ISO 29119 embodies.

So instead of standards based on credible research we see a search for any research offering even lukewarm support for standards that have already been developed. That is not the way to advance knowledge and practice.

These are all huge concerns, and the testing community has received no satisfactory answers. As I said, we should always confront our opponents’ strongest arguments in a debate. In this case I’ve run through the only arguments that ISO have presented. Is it any wonder that the Stop 29119 campaigners don’t believe we have been given any credible answers at all?

What will ISO do? Does it wish to avoid public discussion in the hope that the ISO brand and the magic word “standards” will help it embed the standards in the profession? That might have worked in the past. Now, in the era of social media and blogging, there is no hiding place. Anyone searching for information about ISO 29119 will have no difficulty finding persuasive arguments against it. They will not find equally strong arguments in favour of the standards. That seems to be ISO’s choice.

Posted by: James Christie | February 9, 2015

“A novel-long standard” A1QA interview

I was asked to take part in this interview about ISO 29119 by Elizabeth Soroka of A1QA. The interview ran in January 2015 under the headline “A novel-long standard; interview with James Christie”.

James, you’ve tried so many IT fields. Can you explain why you switched into auditing?

I worked for a big insurance company. They had just re-organized their Audit department. One of the guys who worked there knew me and thought I’d be well suited to audit. I was a developer who had moved on into a mixture of programming and systems analysis. However, I had studied accountancy at university and spent a couple of years working in accountancy and insurance investment, so I had a wider business perspective than most devs. I think that was a major reason for me being approached.

I turned down the opportunity because I was enjoying my job and I wanted to finish the project I was responsible for. The Audit department kept in touch with me and I gradually realised that it would be a much more interesting role than I’d thought. A couple of years later another opportunity came up at a time when I was doing less interesting work so I jumped at the chance. It was a great decision. I learned a huge amount about how IT fitted into the business.

As a person with an audit background, do you think standards improve software testing or block it?

They don’t improve testing. I don’t think there’s any evidence to support that assertion. The most that ISO 29119 defenders have come up with is the claim that you can do good testing using the standard. That’s arguable, but even if it is true it is a very weak defence for making something a standard. It’s basically saying that ISO 29119 isn’t necessarily harmful.

I wouldn’t have said that ISO 29119 blocks testing. It’s a distraction from testing because it focuses attention on the documentation, rather than the real testing. An auditor should expect three things: a clear idea of how testing will be performed, evidence that explains what testing was done, and an explanation of the significance of the results.

ISO 29119, and the previous testing standard IEEE 829, emphasize heavy advance documentation and deal pitifully with the final reports. Auditors should expect an over-arching test strategy saying “this is our approach to testing in this organization”. They should also expect an explanation of how that strategy will be interpreted for the project in question.

Detailed test case specifications shouldn’t impress auditors any more than detailed project plans would convince anyone that the project was successful. ISO 29119 says that “test cases shall be recorded in the test case specification” and “the test case specification shall be approved by the stakeholders”.

That means that if testers are to be compliant with the standard they have to document their planned testing in detail, then get the documents approved by many people who can’t be expected to understand all that detail. Trying to comply with the standard will create a mountain of unnecessary paper. As I said, it’s a distraction from the real work.

You started the campaign “STOP 29119”. Tell us a few words about the standard?

I don’t claim that I started the campaign. The people who deserve most credit for that are probably Karen Johnson and Iain McCowatt, who responded so energetically to my talk at CAST 2014 in New York.

ISO 29119 is an ambitious attempt, in ISO’s words “to define an internationally-agreed set of standards for software testing that can be used by any organization when performing any form of software testing.”

The full standard will consist of five documents: glossary, processes, documentation, techniques and, finally, keyword-driven testing. So far the first three documents have been issued, i.e. the glossary, processes and documentation. The fourth document, test techniques, is due to be issued any time now. The fifth, on keyword-driven testing, should come out in 2015.

The campaign has called on ISO to withdraw the standard. However, I would happily settle for damaging its credibility as a standard for “any organization when performing any form of software testing”. That aim is more than just being ambitious. It stretches credulity.

Testing standards are beneficial for testing (I hope you agree): they implement some new practices and can school the untutored. Still, what is wrong with the 29119 standard?

The content of ISO 29119 is very old-fashioned. It is based on a world view from the 1970s and 1980s that confused rigour and professionalism with massive documentation. It really is the last place to go to look for new ideas. Newcomers to testing should be encouraged to look elsewhere for ideas about how to perform good testing.

Testing standards can be beneficial in a particular organization. They may even be beneficial in industries that have specific needs, such as medical devices and drugs, and financial services. However, they have to be very carefully written and they must maintain a clear distinction between true standards and overly prescriptive guidance. ISO 29119 fails to make the distinction. It is far too detailed and prescriptive.

The three documents that have been issued so far add up to 89,000 words over 270 pages. That’s as long as many novels. In fact it’s as long as George Orwell’s “Animal Farm” plus Erich Maria Remarque’s “All Quiet on the Western Front” combined. It’s almost exactly the same length as Orwell’s “1984” and Jane Austen’s “Persuasion”.

That is ridiculously long for a standard. The Institute of Internal Auditors’ “International Standards for the Professional Practice of Internal Auditing” runs to only 26 pages and 8,000 words. The IIA’s standards are high level statements of principle, covering all types of auditing. More detailed guidance about how to perform audits in particular fields is published separately. That guidance doesn’t amount to a series of “you shall do x, y & z”. It offers auditors advice on potential problems, and gives useful tips to guide the inexperienced. The difference between standards and guidance is crucial, and ISO blurs that distinction.

The defenders of ISO 29119 argue that tailored compliance is possible; testers don’t have to follow the full standard. There are two problems with that. Tailored compliance requires agreement from all of the stakeholders for all of the tasks that won’t be performed, and documents that won’t be produced. There are hundreds of mandatory tasks and documents, so even tailored compliance imposes a huge bureaucratic overhead. The second problem is that tailored compliance will look irresponsible. The marketing of the standard appeals to fear. Stuart Reid has put it explicitly.

“Imagine something goes noticeably wrong. How easy will you find it to explain that your testing doesn’t comply with international testing standards? So, can you afford not to use them?”

Anyone who is motivated by that to introduce ISO 29119 is likely to believe that full compliance must be safer and more responsible than tailored compliance. The old IEEE 829 test documentation standard also permitted tailored compliance. That wasn’t the way it worked out in practice. Organizations which followed the standard didn’t tailor their compliance and produced far too much wasteful documentation. ISO should have thought more carefully about how they would promote the standard and what the effects might be of their appeal to fear.

And in the end, what are the results of your campaign?

It’s hard to say what the results are. No-one seriously expected that ISO would roll over and withdraw the standard. I did think that ISO would make a serious attempt to defend it, and to engage with the arguments of the Stop 29119 campaigners. That hasn’t happened. The result has been that when people search for information about ISO 29119 they can’t fail to find articles by Stop 29119 campaigners. They will find nothing to refute them. I think that damages ISO’s credibility. ISO is now caught in a bind. It can ignore the opposition, and therefore concede the field to its opponents. Or it can try to engage in debate and reveal the lack of credible foundations of the standard.

I think the campaign has been successful in demonstrating that the standard lacks credibility in a very important part of the testing profession and therefore lacks the consensus that a standard should enjoy. I hope that if the campaign keeps going then it will prevent many organizations from forcing the standard onto their testers and thus forcing them to do less effective and efficient testing. Sometimes it feels like Stop 29119 is very negative, but if we can persuade people not to adopt the standard then I think that makes a positive contribution towards more testers doing better testing.

Posted by: James Christie | January 22, 2015

Service Virtualization interview about usability

I was asked to take part in this interview by George Lawton of Service Virtualization. Initially I wasn’t enthusiastic because I didn’t think I would have much to say. However, the questions set me thinking, and I felt they were relevant to my experience so I was happy to take part. It gave me something to do while I was waiting to fly back from EuroSTAR in Dublin!

How does usability relate to the notion of the purpose of a software project?

When I started in IT over 30 years ago I never heard the word usability. It was “user friendliness”, but that was just a nice thing to have. It was nice if your manager was friendly, but that was incidental to whether he was actually good at the job. Likewise, user friendliness was incidental. If everything else was ok then you could worry about that, but no-one was going to spend time or money, or sacrifice any functionality, just to make the application user friendly. And what did “user friendly” mean anyway? “Who knows? Who cares? We’ve got serious work to do. Forget about that touchy-feely stuff.”

The purpose of software development was to save money by automating clerical routines. Any online part of the system was a mildly anomalous relic of the past. It was just a way of getting the data into the system so the real work could be done. Ok, that’s an over-simplification, but I think there’s enough truth in it to illustrate why developers just didn’t much care about the users and their experience. Development moved on from that to changing the business, rather than merely changing the business’s bureaucracy, but it took a long time for these attitudes to shift.

The internet revolution turned everything upside down. Users are no longer employees who have to put up with whatever they’re given. They are more likely to be customers. They are ruthless and rightly so. Is your website confusing? Too slow to load? Your customers have gone to your rivals before you’ve even got anywhere near their credit card number.

The lesson that’s been getting hammered into the heads of software engineers over the last decade or so is that usability isn’t an extra. I hate the way that we traditionally called it a “non-functional requirement”, or one of the “quality criteria”. Usability is so important and integral to every product that telling developers that they’ve got to remember it is like telling drivers they’ve got to remember to use the steering wheel and the brakes. If they’re not doing these things as a matter of course they shouldn’t be allowed out in public. Usability has to be designed in from the very start. It can’t be considered separately.

What are the main problems in specifying for and designing for software usability?

Well, who’s using the application? Where are they? What is the platform? What else are they doing? Why are they using the application? Do they have an alternative to using your application, and if so, how do you keep them with yours? All these things can affect decisions you take that are going to have a massive impact on usability.

It’s payback time for software engineering. In the olden days it would have been easy to answer these questions, but we didn’t care. Now we have to care, and it’s all got horribly difficult.

These questions require serious research plus the experience and nous to make sound judgements with imperfect evidence.

In what ways do organisations lose track of the usability across the software development lifecycle?

I’ve already hinted at a major reason. Treating usability as a non-functional requirement or quality criterion is the wrong approach. That segregates the issue. It’s treated as being like the other quality criteria, the “…ities” like security, maintainability, portability, reliability. It creates the delusion that the core function is of primary importance and the other criteria can be tackled separately, even bolted on afterwards.

Lewis & Rieman came out with a great phrase fully 20 years ago to describe that mindset. They called it the peanut butter theory of usability. You built the application, and then at the end you smeared a nice interface over the top, like a layer of peanut butter (PDF, opens in new tab).

“Usability is seen as a spread that can be smeared over any design, however dreadful, with good results if the spread is thick enough. If the underlying functionality is confusing, then spread a graphical user interface on it. … If the user interface still has some problems, smear some manuals over it. If the manuals are still deficient, smear on some training which you force users to take.”

Of course they were talking specifically about the idea that usability was a matter of getting the interface right, and that it could be developed separately from the main application. However, this was an incredibly damaging fallacy amongst usability specialists in the 80s and 90s. There was a huge effort to try to justify this idea by experts like Hartson & Hix, Edmonds, and Green. Perhaps the arrival of Object Oriented technology contributed towards the confusion. A low level of coupling so that different parts of the system are independent of each other is a good thing. I wonder if that lured usability professionals into believing what they wanted to believe, that they could be independent from the grubby developers.

Usability professionals tried to persuade themselves that they could operate a separate development lifecycle that would liberate them from the constraints and compromises that would be inevitable if they were fully integrated into development projects. The fallacy was flawed conceptually and architecturally. However, it was also a politically disastrous approach. The usability people made themselves even less visible, and were ignored at a time when they really needed to be getting more involved at the heart of the development process.

As I’ve explained, the developers were only too happy to ignore the usability people. They were following methods and lifecycles that couldn’t easily accommodate usability.

How can organisations incorporate the idea of usability engineering into the software development and testing process?

There aren’t any right answers, certainly none that will guarantee success. However, there are plenty of wrong answers. Historically in software development we’ve kidded ourselves into thinking that the next fad, whether Structured Methods, Agile, CMMi or whatever, will transform us into rigorous, respected professionals who can craft high quality applications. Now some (like Structured Methods) suck, while others (like Agile) are far more positive, but the uncomfortable truth is that it’s all hard and the most important thing is our attitude. We have to acknowledge that development is inherently very difficult. Providing good UX is even harder and it’s not going to happen organically as a by-product of some over-arching transformation of the way we develop. We have to consciously work at it.

Whatever the answer is for any particular organisation it has to incorporate UX at the very heart of the process, from the start. Iteration and prototyping are both crucial. One of the few fundamental truths of development is that users can’t know what they want and like till they’ve seen what is possible and what might be provided.

Even before the first build there should have been some attempt to understand the users and how they might be using the proposed product. There should be walkthroughs of the proposed design. It’s important to get UX professionals involved, if at all possible. I think developers have advanced to the point that they are less likely to get it horribly wrong, but actually getting it right, and delivering good UX is asking too much. For that I think you need the professionals.

I do think that Agile is much better suited to producing good UX than traditional methods, but there are still dangers. A big one is that many Agile developers are understandably sceptical about anything that smells of Big Up-Front Analysis and Design. It’s possible to strike a balance and learn about your users and their needs without committing to detailed functional requirements and design.

How can usability relate to the notion of a testable hypothesis that can lead to better software?

Usability and testability go together naturally. They’re also consistent with good development practice. I’ve worked on, or closely observed, many applications where the design had been fixed and the build had been completed before anyone realised that there were serious usability problems, or that it would be extremely difficult to detect and isolate defects, or that there would be serious performance issues arising from the architectural choices that had been made.

We need to learn from work that’s been done with complexity theory and organisation theory. Developing software is mostly a complex activity, in the sense that there are rarely predictable causes and effects. Good outcomes emerge from trialling possible solutions. These possibilities aren’t just guesswork. They’re based on experience, skill, knowledge of the users. But that initial knowledge can’t tell you the solution, because trying different options changes your understanding of the problem. Indeed it changes the problem. The trials give you more knowledge about what will work. So you have to create further opportunities that will allow you to exploit that knowledge. It’s a delusion that you can get it right first time just by running through a sequential process. It would help if people thought of good software as being grown rather than built.

Posted by: James Christie | January 19, 2015

“Fix on failure” – a failure to understand failure

Wikipedia is a source that should always be treated with extreme scepticism and the article on the “Year 2000 problem” is a good example. It is now being widely quoted on the subject, even though it contains some assertions that are either clearly wrong, or implausible, and lacking any supporting evidence.

Since I wrote about “Y2K – why I know it was a real problem” last week I’ve been doing more reading around the subject. I’ve been struck by how often I’ve come across arguments, or rather assertions, that “fix on failure” would have been the best response. Those who argue that Y2K was a big scare and a scam usually offer a rewording of this gem from the Wikipedia article.

“Others have claimed that there were no, or very few, critical problems to begin with, and that correcting the few minor mistakes as they occurred, the “fix on failure” approach, would have been the most efficient and cost-effective way to solve the problem.”

There is nothing to back up these remarkable claims, but Wikipedia now seems to be regarded as an authoritative source on Y2K.

I want to talk about the infantile assertion that “fix on failure” was the right approach. Infantile? Yes, I use the word carefully. It ignores big practical problems that would have been obvious to anyone with experience of developing and supporting large, complicated applications. Perhaps worse, it betrays a dangerously naive understanding of “failure”, a misunderstanding that it shares with powerful people in software testing nowadays. Ok, I’m talking about the standards lobby there.

“Fix on failure” – deliberate negligence

Firstly, “fix on failure” doesn’t allow for the seriousness of the failure. As Larry Burkett wrote:

“It is the same mindset that believes it is better to put an ambulance at the bottom of a cliff rather than a guardrail at the top”.

“Fix on failure” could have been justified only if the problems were few and minor. That is a contentious assumption that has to be justified. However, the only justification on offer is that those problems which occurred would have been suitable for “fix on failure”. It is a circular argument lacking evidence or credibility, and crucially ignores all the serious problems that were prevented.

Once one acknowledges that there were a huge number of problems to be fixed one has to deal with the practical consequences of “fix on failure”. That approach does not allow for the difficulty of managing masses of simultaneous failures. These failures might not have been individually serious, but the accumulation might have been crippling. It would have been impossible to fix them all within acceptable timescales. There would have been insufficient staff to do the work in time.

Release and configuration management would have posed massive problems. If anyone tells you Y2K was a scam ask them how they would have handled configuration and release management when many interfacing applications were experiencing simultaneous problems. If they don’t know what you are talking about then they don’t know what they are talking about.

Of course not all Y2K problems would have occurred on 1st January 2000. Financial applications in particular would have been affected at various points in 1999 and even earlier. That doesn’t affect my point, however. There might have been a range of critical dates across the whole economy, but for any individual organisation there would have been relatively few, each of which would have brought a massive, urgent workload.

Attempting to treat Y2K problems as if they were run of the mill, “business as usual” problems, as advocated by sceptics, betrays appalling ignorance of how a big IT shop works. They are staffed and prepared to cope with a relatively modest level of errors and enhancements in their applications. The developers who support applications aren’t readily inter-changeable. They’re not fungible burger flippers. Supporting a big complicated application requires extensive experience with that application. Staff have to be rotated in and out carefully and piecemeal so that a core of deep experience remains.

IT installations couldn’t have coped with Y2K problems in the normal course of events any more than garages could cope if all cars started to have problems at the same time. The Ford workshops would be overwhelmed when the Fords started breaking down, the Toyota dealers would seize up when the Toyotas suffered.

The idea that “fix on failure” was a generally feasible and responsible approach simply doesn’t withstand scrutiny. Code that wasn’t Y2K-compliant could be spotted at a glance. It was then possible to predict the type of error that might arise, if not always the exact consequences. Why on earth would anyone wait to see if one could detect obscure, but potentially serious distortions? Why would anyone wait to let unfortunate citizens suffer or angry customers complain?

The Y2K sceptics argue that organisations took expensive pre-emptive action because they were scared of being sued. Well, yes, that’s true, and it was responsible. The sceptics were advocating a policy of conscious, deliberate negligence. The legal consequences would quite rightly have been appalling. “Fix on failure” was never a serious contribution to the debate.

“Fix on failure” – a childlike view of failure

The practical objections to a “fix on failure” strategy were all hugely significant. However, I have a deeper, fundamental objection. “Fix on failure” is a wholly misguided notion for anything but simple applications. It is based on a childlike, binary view of failure. We are supposed to believe an application is either right or wrong; it is working or it is broken; that if there is a Y2K problem then the application obligingly falls over. Really? That is not my experience.

With complicated financial applications an honest and constructive answer to the question “is the application correct?” would be some variant on “what do you mean by correct?”, or “I don’t know. It depends”. It might be possible to say the application is definitely not correct if it is producing obvious garbage. But the real difficulty is distinguishing between the seriously inaccurate, but plausible, and the acceptably accurate. Discussion of accuracy requires understanding of critical assumptions, acceptable margins of error, confidence levels, the nature and availability of oracles, and the business context of the application.

I’ve never seen any discussion of Y2K by one of the “sceptical” conspiracy theorists that showed any awareness of these factors. There is just the naïve assumption that a “failed” application is like a patient in a doctor’s surgery, saying “I’m sick, and here are my symptoms”.

Complicated applications have to be nursed and constantly monitored to detect whether some new, extraneous factor, or long hidden bug, is skewing the figures. A failing application might appear to be working as normal, but it would be gradually introducing distortions.

Testing highly complicated applications is not a simple, binary exercise of determining “pass or fail”. Testing has to be a process of learning about the application and offering an informed opinion about what it is, and what it does. That is very different from checking it against our preconceptions, which might have been seriously flawed. Determining accuracy is more a matter of judgement than inspection.

Throughout my career I have seen failures and problems of all types, with many different causes. However, if there is a single common underlying theme then the best candidate would be the illusion that development is like manufacturing, with a predictable end product that can be checked. The whole development and testing process is then distorted to try and fit the illusion.

The advocates of Y2K “fix on failure” had much in common with the ISO 29119 standards lobby. Both shared that “manufacturing” mindset, that unwillingness to recognise the complexity of development, and the difficulty of performing good, effective testing. Both looked for certainty and simplicity where it was not available.

Good testers know that an application is not necessarily “correct” just because it has passed the checks on the test script. Likewise failure is not an absolute concept. Ignoring these truths is ignoring reality, trying to redefine it so we can adopt practices that seem more efficient and effective. I suspect the mantra that “fix on failure would have been more effective and efficient” has its roots with economists, like the Australian Quiggin, who wanted to assume complexity away. See this poor paper (PDF, opens in a new tab).

Doing the wrong thing is never effective. Negligence is rarely efficient. Reality is uncomfortable. We have to understand that and know what we are talking about before coming up with simplistic, snake-oil solutions that assume simplicity where the reality is complexity.

Posted by: James Christie | January 12, 2015

Y2K – why I know it was a real problem

It’s confession time. I was a Y2K test manager for IBM. As far as some people are concerned that means I was party to a huge scam that allowed IT companies to make billions out of spooking poor deluded politicians and the public at large. However, my role in Y2K means I know what I am talking about, so when I saw some recent comment that it was all nothing more than hype I felt the need to set down my first hand experience. At the time, and in the immediate aftermath of Y2K, we were constrained by client confidentiality from explaining what we did, but 15 years on I feel comfortable about speaking out.

Was there a huge amount of hype? Unquestionably.

Was money wasted? Certainly, but show me huge IT programmes where that hasn’t happened.

Would it have been better to do nothing and adopt a “fix on failure” approach? No, emphatically not as a general rule and I will explain why.

There has been a remarkable lack of studies of Y2K and the effectiveness of the actions that were taken to mitigate the problem. The field has been left to those who saw few serious incidents and concluded that this must mean there could have been no serious problem to start with.

The logic runs as follows. Action was taken in an attempt to turn outcome X into outcome Y. The outcome was Y. Therefore X would not have happened anyway and the action was pointless. The fallacy is so obvious it hardly needs pointing out. If the critics claim the action was pointless then they have to demonstrate why the action that was taken had no impact and why outcome Y would have happened regardless. In all the years since 2000 I have seen only unsubstantiated assertion and reference to those countries, industries and sectors where Y2K was not going to be a significant problem anyway. The critics always ignore the sectors where there would have been massive damage.

An academic’s flawed perspective

This quote from Anthony Finkelstein, professor of software systems engineering at University College London, on the BBC website, is typical of the critics’ reasoning.

“The reaction to what happened was that of a tiger repellent salesman in Golders Green High Street,” says Finkelstein. “No-one who bought my tiger repellent has been hurt. Had it not been for my foresight, they would have.”

The analogy is presumably flippant and it is entirely fatuous. There were no tigers roaming the streets of suburban London. There were very significant problems with computer systems. Professor Finkelstein also used the analogy back in 2000 (PDF, opens in new tab).

In that paper he made a point that revealed he had little understanding of how dates were being processed in commercial systems.

”In the period leading up to January 1st those who had made dire predictions of catastrophe proved amazingly unwilling to adjust their views in the face of what was actually happening. A good example of this was September 9th 1999 (9/9/99). On this date data marked “never to expire” (realised as expiry 9999) would be deleted bringing major problems. This was supposed to be a pre-shock that would prepare the way for the disaster of January 1st. Nothing happened. Now, if you regarded the problem as a serious threat in the first place, this should surely have acted as a spur to some serious rethinking. It did not.”

I have never seen a date stored in the way Finkelstein describes, 9th September 1999 being held as 9999. If that were done there would be no way to distinguish 1st December 2014 from 11th February 2014. Both would be 1122014. Dates are held either in the form 090999, with leading zeroes so the dates can be interpreted correctly, or with days, months and years in separate sub-fields for simpler processing. Programmers who flooded date fields with the integer 9 would have created 99/99/99, which could obviously not be interpreted as 9th September 1999.
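To make that ambiguity concrete, here is a tiny illustrative sketch, in Python purely for convenience (the affected systems were of course written in Cobol and similar languages), showing why unpadded day/month/year digits collide and why a date field flooded with nines reads as 99/99/99 rather than 9th September 1999.

```python
# Illustrative only: why unpadded dates are ambiguous, and why a
# nines-filled field cannot mean 9th September 1999.

def unpadded(day: int, month: int, year: int) -> str:
    """Concatenate day, month and year with no leading zeroes."""
    return f"{day}{month}{year}"

def ddmmyy(day: int, month: int, year: int) -> str:
    """Fixed-width, zero-padded, two-digit year: the common legacy layout."""
    return f"{day:02d}{month:02d}{year % 100:02d}"

# 1st December 2014 and 11th February 2014 collide without padding...
assert unpadded(1, 12, 2014) == unpadded(11, 2, 2014) == "1122014"

# ...whereas the fixed-width form keeps them distinct.
assert ddmmyy(1, 12, 2014) == "011214"
assert ddmmyy(11, 2, 2014) == "110214"

# 9th September 1999 in the fixed-width form is 090999, not 9999.
assert ddmmyy(9, 9, 1999) == "090999"

# A six-character date field flooded with nines reads as 99/99/99,
# which is not a date at all, let alone 9/9/99.
assert "9" * 6 == "999999"
```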

Anyway, the main language of affected applications was Cobol, and the convention was for programmers to move “high values”, i.e. the highest possible value the compiler could handle, into the field rather than nines. “High values” doesn’t translate into any date. Why doesn’t Finkelstein know this sort of basic thing if he’s setting himself up as a Y2K expert? I never heard any concern about 9/9/99 at the time, and it certainly never featured in our planning or work. It is a straw man, quite irrelevant to the main issue.

In the same paper from 2000 Finkelstein made another claim that revealed his lack of understanding of what had actually been happening.

“September 9th 1999 is only an example. Similar signs should have been evident on January 1st 1999, the beginning of the financial year 99-00, December 1st, and so on. Indeed assuming, as was frequently stated, poor progress had been made on Y2K compliance programmes we would have anticipated that such early problems would be common and severe. I see no reason to suppose that problems should not have been more frequent (or at any rate as frequent) in the period leading up to December 31st 1999 than afterwards given that transactions started in 1999 may complete in 2000, while after January 1st new transactions start and finish in the new millennium.”

Finkelstein is entirely correct that the problem would not have suddenly manifested itself in January 2000, but he writes as if this is an insight the practitioners lacked at the front line. At General Accident the first critical date that we had to hit was the middle of October 1998, when renewal invitations for the first annual insurance contracts extending past December 1999 would be issued. At various points over the next 18 months until the spring of 2000 all the other applications would hit their trigger dates. Everything of significance had been fixed, tested and re-implemented by September 1999.

We knew that timetable because it was our job to know it. We were in trouble not because time was running out till 31/12/1999, but because we had little time before 15/10/1998. We made sure we did the right work at the right time so that all of the business critical applications were fixed in time. Finkelstein seems unaware of what was happening. A massed army of technical staff were dealing with a succession of large waves sweeping towards them over a long period, rather than a single tsunami at the millennium.

Academics like Finkelstein have a deep understanding of the technology and how it can, and should, be used, but this is a different matter from knowing how it is being applied by practitioners acting under extreme pressure in messy and complex environments. These practitioners aren’t doing a bad job because of difficult conditions, lack of knowledge and insufficient expertise. They are usually doing a good job, despite those difficult conditions, drawing on vast experience and deep technical knowledge.

Comments such as those of Professor Finkelstein betray a lack of respect for practitioners, as if the only worthwhile knowledge is that possessed by academics.

What I did in the great Y2K “scare”

Let me tell you why I was recruited as a Y2K test manager by IBM. I had worked as a computer auditor for General Accident. A vital aspect of that role had been to understand how all the different business critical applications fitted together, so that we could provide an overview to the business. We could advise on the implications and risks of amending applications, or building new ones to interface with the existing applications.

A primary source – my report explaining the problem with a business critical application

Shortly before General Accident’s Y2K programme kicked off I was transferred to IBM under an outsourcing deal. General Accident wanted a review performed of a vital back office insurance claims system. The review had to establish whether the application should be replaced before Y2K, or converted. Senior management asked IBM to have me perform the review because I was considered the person with the deepest understanding of the business and technical issues. The review was extremely urgent, but it was delayed by a month till I had finished my previous project.

I explained in the review exactly why the system was business critical and how it was vital to the company’s reserving, and therefore the production of the company accounts. I explained how the processing was all date dependent, and showed how and when it would fail. If the system was unavailable then the accountants and premium setters would be flying blind, and the external auditors would be unable to sign off the company accounts. The risks involved in trying to replace the application in the available time were unacceptable. The best option was therefore to make the application Y2K compliant. This advice was accepted.

As soon as I’d completed the review IBM moved me into a test management position on Y2K, precisely because I had all the business and technical experience to understand how everything fitted together, and what the implications of Y2K would be. The first thing I did was to write a suite of SAS programs that crawled through the production code libraries, job schedules and job control language libraries to track the relationship between programs, data and schedules. For the first time we had a good understanding of the inventory, and which assets depended on each other. Although I was nominally only the test manager I drew up the conversion strategy and timetable for all the applications within my remit, based on my accumulated experience and the new knowledge we’d derived from the inventory.
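For readers who have never seen a mainframe estate, the following is a rough, purely illustrative sketch of the inventory idea, emphatically not the original SAS suite: scan JCL members for the programs they execute and the datasets they reference, so that the dependencies between jobs, programs and data can be mapped. The file names and directory layout are hypothetical.

```python
# A toy illustration of building a dependency inventory from JCL.
# Paths, member names and extensions here are invented for the example.
import re
from collections import defaultdict
from pathlib import Path

EXEC_PGM = re.compile(r"EXEC\s+PGM=(\w+)")    # programs a job step executes
DD_DSN = re.compile(r"DSN=([A-Z0-9.]+)")      # datasets a job references

def build_inventory(jcl_dir: str) -> dict:
    """Map each JCL member to the programs it runs and the datasets it touches."""
    inventory = defaultdict(lambda: {"programs": set(), "datasets": set()})
    for member in Path(jcl_dir).glob("*.jcl"):
        text = member.read_text(errors="ignore").upper()
        inventory[member.stem]["programs"].update(EXEC_PGM.findall(text))
        inventory[member.stem]["datasets"].update(DD_DSN.findall(text))
    return dict(inventory)

if __name__ == "__main__":
    for job, refs in sorted(build_inventory("./jcl").items()):   # hypothetical path
        print(job, sorted(refs["programs"]), sorted(refs["datasets"]))
```

Even a crude map like this shows which applications share data, and therefore which have to be converted and tested together.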

An insurance company’s processing is heavily date dependent. Premiums are earned on a daily basis, with the appropriate proportion being refunded if a policy is cancelled mid-term. Claims are paid only if the appropriate cover is in place on the date that the incident occurred. Income and expenditure might be paid on a certain date, but then spread over many years. If the date processing doesn’t work then the company can’t take in money, or pay it out. It cannot survive. The processing is so complex that individual errors in production often require lengthy investigation and fixing, and then careful testing. The notion that a “fix on failure” response to Y2K would have worked is risible.
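A simplified sketch may help readers unfamiliar with insurance systems. The figures and rules below are invented for illustration: premium is earned on a daily pro-rata basis, a claim is payable only if cover was in force on the incident date, and a naive two-digit-year comparison breaks as soon as a policy spans the century boundary.

```python
# Simplified illustration of date-dependent insurance processing.
from datetime import date

def earned_premium(annual_premium: float, start: date, end: date,
                   as_of: date) -> float:
    """Premium earned on a daily pro-rata basis up to 'as_of'."""
    term_days = (end - start).days
    earned_days = max(0, min((as_of - start).days, term_days))
    return annual_premium * earned_days / term_days

def claim_payable(cover_start: date, cover_end: date, incident: date) -> bool:
    """A claim is paid only if cover was in force on the incident date."""
    return cover_start <= incident <= cover_end

policy_start, policy_end = date(1999, 7, 1), date(2000, 6, 30)

# Cancelled on 1st January 2000: roughly half the premium is earned,
# the rest must be refunded.
print(round(earned_premium(365.0, policy_start, policy_end, date(2000, 1, 1)), 2))

# With four-digit years the cover check is straightforward.
print(claim_payable(policy_start, policy_end, date(2000, 2, 15)))   # True

# A naive two-digit-year comparison decides that year "00" comes before
# year "99", so cover appears to end before it starts and valid claims
# would be wrongly rejected.
print("00" >= "99")   # False
```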

We fixed the applications, taking a careful, triaged risk-based approach. The most date sensitive programs within the most critical applications received the most attention. Some applications were triaged out of sight. For these, “fix on failure” was appropriate.

We tested the converted applications in simulated runs across the end of 1999, in 2000 and again in 2004. These simulations exposed many more problems, not just with our code, but also with all the utility and housekeeping routines and tools. In these test runs we overrode the mainframe system date.
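The same idea can be sketched in modern terms, though this is only an analogy to overriding the mainframe system date, not a description of the tooling we actually used: make "today" injectable, so date-sensitive logic can be exercised at simulated future dates without touching the real clock. The renewal rule below is invented for the example.

```python
# Minimal sketch: inject the clock so logic can be run "in the future".
from datetime import date

class FixedClock:
    """A clock that always reports the date it was constructed with."""
    def __init__(self, today: date):
        self._today = today

    def today(self) -> date:
        return self._today

def renewal_due(policy_expiry: date, clock: FixedClock) -> bool:
    """A renewal invitation is due once we are within 60 days of expiry."""
    return (policy_expiry - clock.today()).days <= 60

expiry = date(2000, 2, 1)
print(renewal_due(expiry, FixedClock(date(1999, 11, 1))))   # False: too early
print(renewal_due(expiry, FixedClock(date(1999, 12, 15))))  # True
print(renewal_due(expiry, FixedClock(date(2000, 1, 15))))   # True
```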

In the final stage of testing we went a step further. We booted up a mainframe LPAR (logical partition) to run with the future dates. I managed this exercise. We had a corner of the office with a sign saying “you are now entering 2000”, and everything was done with future dates. This exercise flagged up further problems with code that we had been confident would run smoothly.

Y2K was a fascinating time in my career because I was at a point that I now recognise as a sweet spot. I was still sufficiently technically skilled to do anything that my team members could do, even being called on to fix overnight production problems. However, I was sufficiently confident, experienced and senior to be able to give presentations to the most senior managers explaining problems and what the appropriate solutions would be.

December 19th 1999, Mary, her brother Malcolm & I in the snow. Not panicking much about Y2K.

For these reasons I know what I’m talking about when I write that Y2K was a huge problem that had to be tackled. The UK’s financial sector would have suffered a massive blow if we had not fixed the problem. I can’t say how widespread the damage might have been, but I do know it would have been appalling.

My personal millennium experience

What was on my mind on 31st December 1999

When I finished with Y2K in September 1999, at the end of the future mainframe exercise, at the end of a hugely pressurised 30 months, I negotiated seven weeks leave and took off to Peru. IBM could be a great employer at times! My job was done, and I knew that General Accident, or CGU as it had evolved into by then, would be okay. There would inevitably be a few glitches, but then there always are in IT. I was so relaxed about Y2K that on my return from Peru it was the least of my concerns. There was much more interesting stuff going on in my life.

I got engaged in December 1999, and on 31st December Mary and I bought our engagement and wedding rings. That night we were at a wonderful party with our friends, and after midnight we were on Perth’s North Inch to watch the most spectacular fireworks display I’ve ever seen. 1st January 2000? It was a great day that I’ll always remember happily. It was far from being a disaster, and that was thanks to people like me.

PS – I have written a follow up article explaining why “fix on failure” was based on an infantile view of software failure.

Posted by: James Christie | December 30, 2014

2014 in review – WordPress’s report on my blog

This is the standard WordPress annual report for my blog in 2014.

Here's an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 13,000 times in 2014. If it were a concert at Sydney Opera House, it would take about 5 sold-out performances for that many people to see it.

Click here to see the complete report.

Posted by: James Christie | December 5, 2014

Interview about Stop 29119 with Service Virtualization

This is an email interview I gave to Jeff Bounds of Service Virtualization about the Stop 29119 campaign in October 2014. It appeared in two parts, ”James Christie explains resistance to ISO standards” and “ISO 29119 is damaging, so ignore it, advises James Christie”.

The full interview in the original format follows. Jeff’s questions are in red.

What is ISO 29119?

And why is it important to the software testing field?

ISO 29119 is described by the International Organization for Standardization (ISO) as “an internationally agreed set of standards for software testing that can be used within any software development life cycle or organization”.

When ISO are promoting a standard that is intended to cover everything that testers do then that is a big deal for all testers. We cannot afford to ignore it.

What’s wrong with ISO 29119?

Why do you oppose ISO29119?

I think the question is framed the wrong way round. A standard requires consensus and it has to be relevant. It is up to the promoters of the standard to justify it. They’ve not made any serious, credible attempt to do so. Their interpretation of agreement and consensus is restricted to insiders, to those already in the working group developing the standard. Those testers who don’t believe that formal, generic standards are the way ahead have been ignored.

Even before I found out about ISO 29119 I was opposed to it in principle. Standards in general are a good thing, but software testing is an intellectual activity that doesn’t lend itself to standardization.

There is a wide range of evidence from psychology, sociology and management studies to back up the argument that document driven standards like ISO 29119 are counter-productive. They just don’t fit with the way that people think and work in organizations. On the other hand the defenders of standards have never bothered to refute these arguments, or even address them. They simply assert, without evidence, that standardization is good for testing. Typically they make spurious arguments that because standards are a good thing in many contexts then they must be a good thing for testing. It’s logical nonsense.

This is all without considering the detailed content of the standard. It is dated, excessively bureaucratic, prescriptive and badly written.

Sure, the standard does say that it is possible to apply parts of the standard selectively and claim “tailored conformance”. However, the standard requires agreement with stakeholders for each departure from the standard. For any significant project that means documented agreement with many people on all sorts of detailed points.

Dr Stuart Reid has claimed that he wants to see companies and governments mandating the use of ISO 29119 in contracts. Lawyers and procurement managers don’t understand testing, as Dr Reid concedes. He sees that as being a case for providing them with a standard they can mandate.

My perspective is that such people, precisely because they don’t understand testing, will require full compliance. In their eyes, full compliance will seem responsible and professional while tailored compliance will look like cutting corners. That’s the way that people react. It’s no good shrugging that off by saying people don’t have to act that way.

All the evidence supports the opponents because they know how people behave. There is no evidence to support the standards lobby.

Isn’t standardisation good?

An argument in favor of ISO 29119 is that it would bring standardization to a software testing process that historically has seen people using a variety of techniques and methods, rather than one set way of doing things. What’s wrong with that?

Everything. Testing has to fit the problem. It seems crazy to think that everyone should be expected to do the same things. Again, why should that be the case? If everyone is doing the same then most people will be doing the wrong thing.

Are opponents trying to save their jobs?

Some proponents of ISO 29119 could also argue that opponent of the standard are simply trying to save their jobs, when automation and simulation represent a better, faster and cheaper way of doing testing. What are your thoughts about that?

Even if it were the case that opponents were simply concerned about their jobs it would still be a compelling argument against ISO 29119. As the ISO working group has conceded, many opponents are more expert than the average tester. Why should they have to change the way they operate, for the worse, or pass up opportunities for work?

I could actually earn more money by collaborating with the standard and cleaning up the mess it will create. There will be a good market for test consultants to do that. However, I am not interested in that sort of work.

Anyway, opponents are unhappy about the standard, not automation and simulation, which are extremely important and valuable at the right time. The standard isn’t based on the assumption of an automated approach and the test process paper (Part 2) doesn’t even mention simulation. The discussion about automation is a quite separate matter from the debate about ISO 29119.

Can ISO 29119 provide a baseline?

Dr. Stuart Reid recently argued that ISO 29119 would, among other things, help define good practices in testing, along with providing a baseline to compare different test design techniques. What are your thoughts about that? (His full argument is here).

I don’t think the standard deals with good testing practices. It advocates what it sees as good practices in test management, specifically documentation. It is really a documentation standard rather than a testing standard. It is a classic case of confusing the process with the real work. The difference is crucial. It is like the difference between the map and the territory. The map is a guide to the territory, but it is not the real thing.

Dr Reid hasn’t argued the point about a baseline. He has merely asserted it without evidence or explanation. I’m afraid that is typical of ISO’s approach. Even if it is so, I don’t think testers should have to tie themselves in knots for the benefit of others.

What is the alternative?

If you believe ISO 29119 isn’t the solution, then what is the best standard for software testing, and why?

As I’ve said above I don’t think a generic standard is appropriate for testing. A good alternative to doing wasteful and damaging things is to ignore them. There are many sound alternatives to ISO 29119. I don’t think it is up to the opponents of the standard to justify these. They are being applied and they work. Where is the evidence that ISO 29119 works?
