Precertification of low-risk digital products by the FDA

Occasionally I am asked why I use Twitter. “Why do you need to know what people have had for breakfast? Why get involved with all those crazies?”. I always answer that it’s easy to avoid the bores and trolls (all the easier if one is a straight, white male I suspect) and Twitter is a fantastic way of keeping in touch with interesting people, ideas and developments.

A good recent example was this short series of tweets from Griffin Jones.
This was the first I’d heard of the pre-certification program proposed by the Food and Drug Administration (FDA), the USA’s federal body regulating food, drugs and medical devices.

Griffin is worried that IT certification providers will rush to sell their services. My first reaction was to agree, but on consideration I’m cautiously more optimistic.

Precertification would be for organisations, not individuals. The certification controversy in software testing relates to certifying individuals through ISTQB. FDA precertification is aimed at organisations, which would need “an existing track record in developing, testing, and maintaining software products demonstrating a culture of quality and organizational excellence measured and tracked by Key Performance Indicators (KPIs) or other similar measures.” That quote is from the notification for the pilot program for the precertification scheme, so it doesn’t necessarily mean the same criteria would apply to the final scheme. However, the FDA’s own track record of highly demanding standards (no, not like ISO 29119) that are applied with pragmatism provides grounds for optimism.

Sellers of CMMi and TMMi consultancy might hope this would give them a boost, but I’ve not heard much about these models in recent years. It could be a tough sell for consultancies to push them at the FDA when it wants to adopt more lightweight governance for products that pose a relatively low risk to consumers.

The FDA action plan (PDF, opens in new tab) that announced the precertification program did contain a word that jumped out at me. The FDA will precertify companies “who demonstrate a culture of quality and organizational excellence based on objective criteria”.

“Objective” might provide an angle for ISO 29119 proponents to exploit. A standard can provide an apparently objective basis for reviewing testing. If you don’t understand testing you can check for compliance with the standard. In a sense that is objective. Checkers are not bringing their own subjective opinions to the exercise. Or are they? The check is based on the assumption that the standard is relevant, and that the exercise is useful. In the absence of any evidence of efficacy, and there is no such evidence for ISO 29119, using ISO 29119 as the benchmark is a subjective choice. It is used because it makes the job easier; it facilitates checking for compliance, but it has nothing to do with good testing.

“Objective” should mean something different, and more constructive, to the FDA. They expect evidence of testing to be sufficient in quality and quantity so that third parties would have to come to the same conclusion if they review it, without interpretation by the testers. Check out Griffin Jones’ talk about evidence on YouTube.


Incidentally, the FDA’s requirements are strikingly similar to the professional standards of the Institute of Internal Auditors (IIA). In order to form an audit opinion auditors must gather sufficient information that is “factual, adequate, and convincing so that a prudent, informed person would reach the same conclusions as the auditor.” The IIA also has an interesting warning in its Global Technology Audit Guide, “Management of IT Auditing“. It warns IT auditors of the pitfalls of auditing against standards or benchmarks that might be dated or useless just because they want something to “audit against”.

So will ISO, or some large consultancies, try to influence the FDA to endorse ISO 29119 on the grounds that it would provide an objective benchmark against which to assess testing? That wouldn’t surprise me at all. What would surprise me is if the FDA bought into it. I like to think they are too smart for that. I am concerned that some day external political pressure might force adoption of ISO 29119. There was a hint of that in the fallout from the problems with the US’s Healthcare.gov website. Politicians who are keen to see action, any action, in a field they don’t understand always worry me. That’s another subject, however, and I hope it stays that way.

Dave Snowden’s Cynefin masterclass in New York, 2nd May 2017 – part 2

This post is the second of two discussing Dave Snowden’s recent Cynefin masterclass at the Test Leadership Congress in New York. I wrote the series with the support of the Committee on Standards and Professional Practices of the Association for Software Testing. The posts originally appeared on the AST site.

In the first I gave an overview of Cynefin and explained why I think it is important, and how it can helpfully shape the way we look at the world and make sense of the problems we face. In this post I will look at some of the issues raised in Dave’s class and discuss their relevance to development and testing.

The dynamics between domains

Understanding that the boundaries between the different domains are fluid and permeable is crucial to understanding Cynefin. A vital lesson is that we don’t start in one domain and stay there; we can and should move between them. Even if we ignore that lesson reality will drag us from one domain to another. Dave said “all the domains have value – it’s the ability to move between them that is key”.

The Cynefin dynamics are closely tied to the concept of constraints, which are so important to Cynefin that they act as differentiators between the domains. You could say that constraints define the domains.

Constraint is perhaps a slightly misleading word. In Cynefin terms it is not necessarily something that compels or prevents certain behaviour. That does apply to the Obvious domain, where the constraints are fixed and rigid. The constraints in the Complicated domain govern behaviour, and can be agreed by expert consensus. In the Complex domain the constraints enable action, rather than restricting it or compelling it. They are a starting point rather than an end. In Chaos there are no constraints.

Dave Snowden puts it as follows, differentiating rules and heuristics.

“Rules are governing constraints, they set limits to action, they contain all possible instances of action. In contrast heuristics are enabling constraints, they provide measurable guidance which can adapt to the unknowable unknowns.”

If we can change the constraints then we are moving from one domain to another. The most significant dynamic is the cycle between Complex and Complicated.

Cynefin core dynamic - Complex to Complicated

Crucially, we must recognise that if we are attempting something new, something that involves a significant amount of uncertainty, then we start in the Complex domain, exploring and discovering more about the problem. Once we have a better understanding and have found constraints that allow us to achieve repeatable outcomes we have moved the problem to the Complicated domain, where we can manage it more easily and exploit our new knowledge. If our testing reveals that the constraints are not producing repeatable results then it’s important to get back into the Complex domain and carry out some more probing experiments.

This is not a one-off move. We have to keep cycling to ensure the solution remains relevant. The cadence, or natural flow, of the cycle will vary depending on the context. Different industries, sectors or applications will have different cadences. It could be days, or years, or anything in between. If, or rather when, our constraints fail to produce repeatable results we have to get back into the Complex domain.

This cycle between Complex and Complicated is key for software development in particular. Understanding this dynamic is essential in order to understand how Cynefin might be employed.

Setting up developments

As I said earlier the parts of a software development project that will provide value are where we are doing something new, and that is where the risk also lies. Any significant and worthwhile development project will start in the Complex domain. The initial challenge is to learn enough to move it to Complicated. Dave explained it as follows in a talk at Agile India in 2015.

“As things are Complex we see patterns, patterns emerge. We stabilise the patterns. As we stabilise them we can actually shift them into the Complicated domain. So the basic principle of Complexity-based intervention is you start off with multiple, parallel, safe-to-fail experiments, which is why Scrum is not a true Complexity technique; it does one thing in a linear way. We call (these experiments) a pre-Scrum technique. You do smaller experiments faster in parallel… So you’re moving from the centre of the Complex domain into the boundary, once you’re in the boundary you use Scrum to move it across the boundary.”

Such a safe-to-fail experiment might be an XP pair programming team being assigned to knock up a small, quick prototype.

So the challenge in starting the move from Complex to Complicated is to come up with the ideas for safe-to-fail pre-Scrum experiments that would allow us to use Scrum effectively.

Dave outlined the criteria that suitable experiments should meet. There should be some way of knowing whether the experiment is succeeding, and it must be possible to amplify (i.e. reinforce) signs of success. Similarly, there should be some way of knowing whether it is failing and of dampening, or reducing, the damaging impact of a failing experiment. Failure is not bad. In any useful set of safe-to-fail experiments some must fail if we are to learn anything worthwhile. The final criterion is that the experiment must be coherent. This idea of coherence requires more attention.

Dave Snowden explains the tests for coherence here. He isn’t entirely clear about how rigid these tests should be. Perhaps it’s more useful to regard them as heuristics than fixed rules, though the first two are of particular importance.

  • A coherent experiment, the ideas and assumptions behind it, should be compatible with natural science. That might seem like a rather banal statement, till you consider all the massive IT developments and change programmes that were launched in blissful ignorance of the fact that science could have predicted inevitable failure.
  • There should be some evidence from elsewhere to support the proposal. Replicating past cases is no guarantee of success, far from it, but it is a valid way to try and learn about the problem.
  • The proposal should fit where we are. It has to be consistent to some degree with what we have been doing. A leap into the unknown attempting something that is utterly unfamiliar is unlikely to gain any traction.
  • Can the proposal pass a series of “ritual dissent” challenges? These are a formalised way of identifying flaws and refining possible experiments.
  • Does the experiment reflect an unmet, unarticulated need that has been revealed by sense-making, by attempts to make sense of the problem?

The latter two criteria refer explicitly to Cynefin techniques. The final one, identifying unmet needs, assumes the use of Cognitive Edge’s SenseMaker. Remember Fred Brooks’ blunt statement about requirements? Clients do not know what they want. They cannot articulate their needs if they are asked directly. They cannot envisage what is possible. Dave Snowden takes that point further. If users can articulate their needs then you’re dealing with a commoditized product and the solution is unlikely to have great value. Real value lies in meeting needs that users are unaware of and that they cannot articulate. This has always been so, but in days of yore we could often get away with ignoring that problem. Most applications were in-house developments that either automated back-office functions or were built around business rules and clerical processes that served as an effective proxy for true requirements. The inadequacies of the old structured methods and traditional requirements gathering could be masked.

With the arrival of web development, and then especially with mobile technology this gulf between user needs and the ability of developers to grasp them became a problem that could be ignored only through wilful blindness, admittedly a trait that has never been in short supply in corporate life. The problem has been exacerbated by our historic willingness to confuse rigour with a heavily documented, top-down approach to software development. Sense-making entails capturing large numbers of user reports in order to discern patterns that can be exploited. This appears messy, random and unstructured to anyone immured in traditional ways of development. It might appear to lack rigour, but such an approach is in accord with messy, unpredictable reality. That means it offers a more rigorous and effective way of deriving requirements than we can get by pretending that every development belongs in the Obvious domain. A simple lesson I’ve had to learn and relearn over the years is that rigour and structure are not the same as heavy documentation, prescriptive methods and a linear, top-down approach to problem solving.

This all raises big questions for testers. How do we respond? How do we get involved in testing requirements that have been derived this way and indeed the resulting applications? Any response to those questions should take account of another theme that really struck me from Dave’s day in New York. That was the need for resilience.

Resilience

The crucial feature of complex adaptive systems is their unpredictability. Applications operating in such a space will inevitably be subject to problems and threats that we would never have predicted. Even where we can confidently predict the type of threat the magnitude will remain uncertain. Failure is inevitable. What matters is how the application responds.

The need for resilience, with its linked themes of tolerance, diversity and redundancy, was a recurring message in Dave’s class. Resilience is not the same as robustness. The example that Dave gave was that a seawall is robust but a salt marsh is resilient. A seawall is a barrier to large waves and storms. It protects the harbour behind, but if it fails it does so catastrophically. A salt marsh protects inland areas by acting as a buffer, absorbing storm waves rather than repelling them. It might deteriorate over time but it won’t fail suddenly and disastrously.

An increasing challenge for testers will be to look for information about how systems fail, and test for resilience rather than robustness. Tolerance for failure becomes more important than a vain attempt to prevent failure. This tolerance often requires greater redundancy. Stripping out redundancy and maximizing the efficiency of systems has a downside, as I’ve discovered in my career. Greater efficiency can make applications brittle and inflexible. When problems hit they hit hard and recovery can be difficult.

it could be worse - not sure how, but it could be

The six years I spent working as an IT auditor had a huge impact on my thinking. I learned that things would go wrong, that systems would fail, and that they’d do so in ways I couldn’t have envisaged. There is nothing like a spell working as an auditor to imbue one with a gloomy sense of realism about the possibility of perfection, or even adequacy. I ended up like the gloomy old pessimist Eeyore in Winnie the Pooh. When I returned to development work a friend once commented that she could always spot one of my designs. Like Eeyore I couldn’t be certain exactly how things would go wrong, I just knew they would and my experience had taught me where to be wary. I was destined to end up as a tester.

Liz Keogh, in this talk on Safe-to-Fail, makes a similar point.

“Testers are really, really good at spotting failure scenarios… they are awesomely imaginative at calamity… Devs are problem solvers. They spot patterns. Testers spot holes in patterns… I have a theory that other people who are in critical positions, like compliance and governance people are also really good at this”.

Testers should have the creativity to imagine how things might go wrong. In a Complex domain, working with applications that have been developed working with Cynefin, this insight and imagination, the ability to spot potential holes, will be extremely valuable. Testers have to seize that opportunity to remain relevant.

There is an upside to redundancy. If there are different ways of achieving the same ends then that diversity will offer more scope for innovation, for users to learn about the application and how it could be adapted and exploited to do more than the developers had imagined. Again, this is an opportunity for testers. Stakeholders need to know about the application and what it can do. Telling them that the application complied with a set of requirements that might have been of dubious relevance and accuracy just doesn’t cut it.

Conclusion

Conclusion is probably the wrong word. Dave Snowden’s class opened my mind to a wide range of new ideas and avenues to explore. This was just the starting point. These two essays can’t go very far in telling you about Cynefin and how it might apply to software testing. All I can realistically do is make people curious to go and learn more for themselves, to explore in more depth. That is what I will be doing, and as a starter I will be in London at the end of June for the London Tester Gathering. I will be at the workshop “An Introduction to Complexity and Cynefin for Software Testers” being run by Martin Hynie and Ben Kelly, where I hope to discuss Cynefin with fellow testers and explorers.

If you are going to the CAST conference in Nashville in August you will have the chance to hear Dave Snowden giving a keynote speech. He really is worth hearing.

Dave Snowden’s Cynefin masterclass in New York, 2nd May 2017 – part 1

This is part one of a two post series on Cynefin and software testing. I wrote it with the support of the Committee on Standards and Professional Practices of the Association for Software Testing. The posts originally appeared on the AST site.

Introduction

On May 2nd I attended Dave Snowden’s masterclass in New York, “A leader’s framework for decision making: managing complex projects using Cynefin”, at the Test Leadership Congress. For several years I have been following Dave’s work and I was keen to hear him speak in person. Dave is a gifted communicator, but he moves through his material fast, very fast. In a full day class he threw out a huge range of information, insights and arguments. I was writing frantically throughout, capturing key ideas and phrases I could research in detail later.

It was an extremely valuable day. All of it was relevant to software development, and therefore indirectly to testing. However, it would require a small book to do justice to Dave’s ideas. I will restrict myself to two posts in which I will concentrate on a few key themes that struck me as being particularly important to the testing community.

Our worldview matters

We need to understand how the world works or we will fail to understand the problems we face. We won’t recognise what success might look like, nor will we be able to anticipate unacceptable failure till we are beaten over the head, and we will select the wrong techniques to address problems.

“It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.”

Dave used a slide with this quote from Mark Twain. It’s an important point. Software development and testing have been plagued over the years by unquestioned assumptions and beliefs that we were paid well to take for granted, without asking awkward questions, but which just ain’t so. And they’ve got us into endless trouble.

A persistent, damaging feature of software development over the years has been the illusion that it is a neater, more orderly process than it really is. We craved certainty, fondly imagining that if we just put a bit more effort and expertise into the upfront analysis and requirements then good, experienced professionals could predictably develop high quality applications. It hardly ever panned out that way, and the cruel twist was that the people who finally managed to crank out something workable picked up the blame for the lack of perfection.

Fred Brooks made the point superbly in his classic paper, “No Silver Bullet”.

“The truth is, the client does not know what he wants. The client usually does not know what questions must be answered, and he has almost never thought of the problem in the detail necessary for specification. … in planning any software-design activity, it is necessary to allow for an extensive iteration between the client and the designer as part of the system definition.

…… it is really impossible for a client, even working with a software engineer, to specify completely, precisely, and correctly the exact requirements of a modern software product before trying some versions of the product.”

So iteration is required, but that doesn’t mean simply taking a linear process and repeating it. Understanding and applying Cynefin does not mean tackling problems in familiar ways but with a new vocabulary. It means thinking about the world in a different way, drawing on lessons from complexity science, cognitive neuroscience and biological anthropology.

Cynefin and ISO 29119

Cynefin is not based on successful individual cases, or on ideology, or on wishful thinking. Methods that are rooted in successful cases are suspect because of the survivorship bias (how many failed projects did the same thing?), and because people do not remember clearly and accurately what they did after the event; they reinterpret their actions dependent on the outcome. Cynefin is rooted in science and the way things are, the way systems behave, and the way that people behave. Developing software is an activity carried out by humans, for humans, mostly in social organisations. If we follow methods that are not rooted in reality, in science and which don’t allow for the way people behave then we will fail.

Dave Snowden often uses the philosophical phrase “a priori”, usually in the sense of saying that something is wrong a priori. A priori knowledge is based on theoretical deduction, or on mathematics, or the logic of the language in which the proposition is stated. We can say that certain things are true or false a priori, without having to refer to experience. Knowledge based on experience is a posteriori.

The distinction is important in the debate over the software testing standard ISO 29119. The ISO standards lobby has not attempted to defend 29119 on either a priori or a posteriori grounds. The standard has its roots in linear, document-driven development methods that were conspicuously unsuccessful. ISO were unable to cite any evidence or experience to justify their approach.

Defenders of the standard, and some neutrals, have argued that critics must examine the detailed content of the standard, which is extremely expensive to purchase, in order to provide meaningful criticism. However, this defence is misconceived because the standard itself is misconceived. The standard’s stated purpose, “is to define an internationally agreed set of standards for software testing that can be used by any organization when performing any form of software testing”. If ISO believes that a linear, prescriptive standard like ISO 29119 will apply to “any form of software testing” we can refer to Cynefin and say that they are wrong; we can say so confidently knowing that our stance is backed by reputable science and theory. ISO is attempting to introduce a practice that might, sometimes at best, be appropriate for the Obvious domain into the Complicated and Complex domains where it is wildly unsuitable and damaging. ISO is wrong a priori.

What is Cynefin?

The Wikipedia article is worth checking out, not least because Dave Snowden keeps an eye on it. This short video presented by Dave is also helpful.

The Cynefin Framework might look like a quadrant, but it isn’t. It is a collection of five domains that are distinct and clearly defined in principle, but which blur into one another in practice.

In addition to the four domains that look like the cells of a quadrant there is a fifth, in the middle, called Disorder, and this one is crucial to an understanding of the framework and its significance.

Cynefin is not a categorisation model, as would be implied if it were a simple matrix. It is not a matter of dropping data into the framework then cracking on with the work. Cynefin is a framework that is designed to help us make sense of what confronts us, to give us a better understanding of our situation and the approaches that we should take.

The first domain is Obvious, in which there are clear and predictable causes and effects. The second is Complicated, which also has definite causes and effects, but where the connections are not so obvious; expert knowledge and judgement is required.

The third is Complex, where there is no clear cause and effect. We might be able to discern it with hindsight, but that knowledge doesn’t allow us to predict what will happen next; the system adapts continually. Dave Snowden and Mary Boone used a key phrase in their Harvard Business Review article about Cynefin.

“Hindsight does not lead to foresight because the external conditions and systems constantly change.”

The fourth domain is Chaotic. Here, urgent action, rather than reflective analysis, is required. The participants must act, sense feedback and respond. Complex situations might be suited to safe probing, which can teach us more about the problem, but such probing is a luxury in the Chaotic domain.

The appropriate responses in all four of these domains are different. In Obvious, the categories are clearly defined, one simply chooses the right one, and that provides the right route to follow. Best practices are appropriate here.

In the Complicated domain there is no single, right category to choose. There could be several valid options, but an expert can select a good route. There are various good practices, but the idea of a single best practice is misconceived.

In the Complex domain it is essential to probe the problem and learn by trial and error. The practices we might follow will emerge from that learning. In Chaos, as I mentioned, we simply have to start with action, firefighting to stop the situation getting worse. It is helpful to remember that, instead of the everyday definition, chaos in Cynefin terms refers to the concept in physics. Here chaos refers to a system that is so dynamic that minor variations in initial conditions lead to outcomes so dramatically divergent that the system is unpredictable. In some circumstances it makes sense to make a deliberate and temporary move into Chaos to learn new practice. That would require removing constraints and the connections that impose some sort of order.

The fifth domain is that of Disorder, in the middle of the diagram. This is the default position in a sense. It’s where we find ourselves when we don’t know which domain we should really be in. It’s therefore the normal starting point. The great danger is that we don’t choose the appropriate domain, but simply opt for the one that fits our instincts or our training, or that is aligned with the organisation’s traditions and culture, regardless of the reality.

The only stable domains are Obvious, Complicated and Complex. Chaotic and Disorder are transitional. You don’t (can’t) stay there. Chaotic is transitional because constraints will kick in very quickly, almost as a reflex. Disorder is transitional because you are actually in one of the other domains, but you just don’t know it.

The different domains have blurred edges. In any context there might be elements that fit into different domains if they are looked at independently. That isn’t a flaw with Cynefin. It merely reflects reality. As I said, Cynefin is not a neat categorisation model. It is intended to help us make sense of what we face. If reality is messy and blurred then there’s no point trying to force it into a straitjacket.

Many projects will have elements that are Obvious, that deal with a problem that is well understood, that we have dealt with before and whose solution is familiar and predictable. However, these are not the parts of a project that should shape the approach we take. The parts where the potential value, and the risk, lie are where we are dealing with something we have not done before. Liz Keogh has given many talks and written some very good blogs and articles about applying Cynefin to software development. Check out her work. This video is a good starter.

The boundaries between the domains are therefore fuzzy, but there is one boundary that is fundamentally different from the others; the border between Obvious and Chaotic. This is not really a boundary at all. It is more of a cliff. If you move from Obvious to Chaotic you don’t glide smoothly into a subtly changing landscape. You fall off the cliff.

Within the Obvious domain the area approaching the cliff is the complacent zone. Here, we think we are working in a neat, ordered environment and “we believe our own myths” as Snowden puts it in the video above. The reality is quite different and we are caught totally unaware when we hit a crisis and plunge off the cliff into chaos.

That was a quick skim through Cynefin. However, you shouldn’t think of it as being a static framework. If you are going to apply it usefully you have to understand the dynamics of the framework, and I will return to that in part two.

Auditors and testing – a rant justified by experience

A couple of weeks ago I was drawn into a discussion on Twitter about auditors and testing. At the time I was on holiday in a delightfully faraway part of Galloway, in south west Scotland.

One of the attractions of the cottage where we were staying was that it lacked a mobile (cell) phone signal, never mind internet access. Only when we happened to be in a pub or restaurant could I sneak onto wifi discreetly, without incurring a disapproving look from my wife.

Having worked as both an IT auditor and a tester, and having both strong opinions and an argumentative nature, I had plenty to say on the subject. That had to wait till I returned (via New York, but that’s another subject) when I unleashed a rant on Twitter. Here is that thread, in a more readable format. It might be a rant, but it is based on extensive experience.

Auditors looking for items they can check that MUST be called test cases? That’s a big, flashing, warning sign they have a lousy conceptual grasp of auditing. It’s true, but missing the point, to say that’s old fashioned. It’s like saying the problem with ISO 29119 is it’s old fashioned.

The crucial point is it’s bad, unprofessional auditing. The company that taught me to audit was promoting good auditing 30 years ago. If anyone had remained ignorant of the transformation in software development in the last 30 years you’d call them idiots, not old-fashioned.

A test case is just a name for a receptacle. It’s a bucket of ideas. Who cares about the bucket? Ideas and evidence really matter to auditors, who live and die by evidence; they expect compelling evidence that the auditees have been thinking about what they are doing. A lack of useful evidence showing what testing has been performed, or a lack of thought about how to test should be certain ways to attract criticism from auditors. The IT auditors’ governance model COBIT5 mentions “test cases” once (in passing). It mentions “ideas” 32 times & “evidence” 16 times.

COBIT5 isn’t just about testing, of course. Its principles apply across the whole range of IT, and testing is no exception. Auditors should expect testers to have:

  • a clear vision or strategy of how testing should be performed in their organisation,
  • a clear (but not necessarily detailed) plan for testing the product,
  • relevant, contemporary evidence that justifies and leads inescapably to the conclusions, lessons and insights that the testers derived and reported from their testing.

That’s what auditors should expect. Some (or many?) organisations are locked into a pattern of low quality and low value auditing. They define auditing as brainless compliance checking that is performed by low quality staff who don’t understand what they’re doing. Their work is worthless. As a result audit is held in low esteem in the organisation. Smart people don’t want to work there. Therefore audit must be defined in such a way that low quality staff are able to carry it out.

This is inexcusable. At best it is negligence. Maintaining that model of auditing requires wilful ignorance of what the audit profession stipulates. It is damaging and contributes towards the creation of a dysfunctional culture. Nevertheless it is cheap and ensures there are no good auditors who might pose uncomfortable, challenging questions to senior managers.

However, this doesn’t mean there are never times when auditors do need to see test cases. If a contract has been stupidly written so that test cases must be produced and visible then there’s no wriggle room. It’s just the same (and just as stupid) as if the contract says testers must wear pink shirts. It might be stupid but it is a contractual deliverable; auditors will want to see proof of compliance. As Griffin Jones pointed out on seeing my tweet, “often (the contract) is stupidly written – thus the need to get involved with the contracting organization. The problem is bigger than test or SW dev”.

I fully agree with Griffin. Testers should get involved in contractual discussions that will influence their work, in order to anticipate and head off unhelpful contractual terms.

I would add that testers should ask to see the original contract. Contractual terms are sometimes misinterpreted as they are passed through the organisation to the testers. It might be possible to produce the required evidence by smarter means.

Apart from such tiresome contractual requirements, demanding to see “test cases” is a classic case of confusing form and content. It’s unprofessional. That’s not just my opinion; it’s not novel or radical. It’s simply orthodox, professional opinion. Anyone who says otherwise is clueless or bullshitting. Either way they must be resisted. Clueless bullshitters can enjoy good, lucrative careers, but do huge damage. I’ve no respect for them.

The US Food and Drug Administration’s “General Principles of Software Validation” do pose a problem. They date back to 1997, updated in 2002. They are creakily old. They mention test cases many times, but they were written when it was assumed that testing meant writing test cases. The term seems to be used as jargon for tests. If testing satisfies FDA criteria then there’s no obvious reason why you can’t just call planned tests “test cases”.

There’s no requirement to produce test scripts as well as test cases, but expected results with objective pass/fail criteria are required. That doesn’t, and mustn’t, mean testers should be looking only for the expected results. The underlying principle is that compliance should follow the “least burdensome” approach, and the FDA say they are open to considering alternative, less burdensome ways of complying with the requirements.
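To make that concrete, here is a minimal sketch of what a planned check with an objective pass/fail criterion might look like. It is purely illustrative; the product, function names and tolerance are invented, and nothing here is an FDA-prescribed format. The point is simply that the expected result and the pass/fail rule are stated up front, so anyone reviewing the evidence would reach the same conclusion as the tester.

```python
# Hypothetical example: a planned check (a "test case" in FDA jargon) for a
# dose calculator. All names and values are invented for illustration.

def recommended_dose(weight_kg: float, units_per_kg: float) -> float:
    """Stand-in for the function under test."""
    return round(weight_kg * units_per_kg, 1)

def test_dose_for_70kg_patient() -> None:
    # The expected result and tolerance are recorded before the test runs,
    # so pass/fail is objective and reviewable by a third party.
    expected = 35.0
    actual = recommended_dose(weight_kg=70, units_per_kg=0.5)
    assert abs(actual - expected) <= 0.05, f"expected {expected}, got {actual}"

if __name__ == "__main__":
    test_dose_for_70kg_patient()
    print("objective pass/fail check passed")
```

A check like this can sit alongside exploratory testing; the FDA’s concern is that whatever testing is done leaves evidence a reviewer can follow.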

Further, the FDA does not have a problem with Agile development (PDF, opens in new tab), and they also do approve of exploratory testing, as explained by James Bach.

Quantum computing: a whole new field of bewilderment

At primary school in London I was taught about different number bases, concentrating on binary arithmetic. This was part of an attempt in the 1960s to bring mathematics up to date. The introduction of binary to the curriculum was prompted by the realisation that computers would become hugely more significant. It was fascinating to realise that there was nothing inevitable or sacrosanct about using base 10 and I enjoyed it all.

When I started working in IT I quickly picked up the technical side. I understood binary and with a bit of work, keeping a clear head, everything made sense. It was mostly based on Boolean logic. Everything was true or false, on or off, pass or fail, 1 or 0.

Now this simple state of affairs applied only to the technology. When you started to deal with humans, with organisations, with messy social reality it was important to step back from a simple binary worldview. You can’t develop worthwhile applications, or test them, if you assume that the job simply requires stepping through a process that is akin to a computer program, with every decision being a straightforward binary, yes/no, pass/fail. I’ve talked about that at length elsewhere, eg here “binary opinions – yes or no?“.

Recently I’ve been reading about quantum computing. I’d vaguely known about the subject before. I knew it was a radically different approach to computing, but I hadn’t thought through just how radical the differences are, or the implications for testing. It’s not just testing of course. When a completely new form of computing comes on the scene, one that leaves all our expectations, assumptions and beliefs in tatters, there will be huge ramifications throughout IT. Traditional computers won’t vanish; there will still be plenty of jobs working with them. People will continue to specialise in specific areas of computing. The problem for testers will be that quantum computers are going to be used for new applications, to do things that were previously impossible, and traditional testing techniques won’t necessarily be readily transferable.

What’s so different about quantum computing?

I’m not going to try and provide an introduction to quantum computers. If you think you understand them and you haven’t done some serious study of quantum physics then you’re kidding yourself. And as Richard Feynman said;

“If you think you understand quantum physics, you don’t understand quantum physics.”

If a primary school introduction to binary arithmetic allowed me to get to grips with traditional, digital computers, then the equivalent for quantum computers would be at least an undergraduate degree in physics. Here is a good overview.

I’m just going to run through three features of quantum computers that make my head spin, that have persuaded me that everything I knew about computers will be useless and I will be as well prepared as a peasant trying to get to grips with Excel after stumbling through a time portal from the Middle Ages.

First, instead of bits we have qubits. Bits can be 0 or 1. Once set to a value they don’t change until we perform some operation. The possible states of a qubit are excited or relaxed, which could be considered equivalent to 0 or 1. The fundamental difference is that a qubit can be both at the same time, or rather a mixture of the two. This property is called superposition. However, when the qubit is measured it will be frozen as one or the other.
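For anyone who wants to see that written down, the standard textbook notation (just the usual physics convention, nothing specific to this post) expresses a qubit’s state as a weighted combination of the two basis states. Measuring it yields 0 with probability |α|² and 1 with probability |β|², and the superposition is then lost:

```latex
\[
  |\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle ,
  \qquad |\alpha|^2 + |\beta|^2 = 1
\]
```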

The second weird feature of qubits is that different qubits can influence each other. In traditional computing if we operate on a bit then no other bits will change. If they do it’s because the logic of the program controls these changes. In quantum computing that doesn’t apply. Operating on one qubit can change the state of others, because qubits can be linked so that they are no longer independent of each other. This is called entanglement.

The third weird thing about quantum computing is that algorithms are probabilistic, not deterministic. A classical digital computer will run through an algorithm and produce the right answer, or rather it will produce a predictable answer that is consistent with the algorithm’s logic. A quantum computer won’t always give the right answer straight away. It might have to send the same input through the same process many times and the results should converge on the right answer, probably.
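A toy classical simulation (there is nothing quantum about it, and the probabilities are invented purely for illustration) shows why repeating a probabilistic computation and taking the most common answer converges on the right result:

```python
import random
from collections import Counter

CORRECT_ANSWER = 1

def noisy_run(p_correct: float = 0.7) -> int:
    """One run of an imaginary probabilistic algorithm: it returns the
    correct answer only with probability p_correct, otherwise a wrong one."""
    if random.random() < p_correct:
        return CORRECT_ANSWER
    return random.choice([0, 2, 3])

def majority_answer(repeats: int) -> int:
    """Repeat the computation and take the most common result."""
    counts = Counter(noisy_run() for _ in range(repeats))
    return counts.most_common(1)[0][0]

if __name__ == "__main__":
    for repeats in (1, 10, 100, 1000):
        print(f"{repeats:>4} runs -> answer {majority_answer(repeats)}")
    # With more repeats the majority answer is almost always the correct one.
```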

I’m telling you this not because I think it will give you any sort of useful understanding of quantum computing. A spot of humility is in order. I pass on this information in the hope you realise you have as little idea as I do.

In summary, quantum computing is moving way beyond dear old familiar Boolean logic. Boolean algebra is still valid, but it is only relevant to one part of a wider reality and quantum computers will allow us to explore beyond the Boolean boundaries.

I have done my share of white box testing, delving into the guts of applications to understand what is going on. I know I will never be in a position to do that with quantum computers, even assuming that they are in widespread use before I retire. I doubt if there are many, if any, testers who will be better placed than I am.

What will quantum computers be used for?

If white box testing poses a tough challenge, what about black box testing? As I said earlier, quantum computers will be used for problems that traditional computers can’t help with. One application will be the transformation of cryptography. Quantum computing will probably render existing cryptographic techniques obsolete. Other likely uses will be new ways to search and interrogate massive databases, and financial modelling that requires fiendishly intricate calculations on huge volumes of data. These are the applications that caught my eye, because of my background in financial services. A big problem for testers will be: how do you evaluate the output if the computer is doing something that can’t be replicated by other means? This isn’t a new problem, but quantum computers will massively increase the difficulty.

Blind computing guiding the blind?

I wrote about the problems I’d encountered working with big data a few years ago. Our approach could be summed up simply as:

  • build a deep understanding of the domain and the data, and the essential relationships or rules,
  • get the business or domain experts heavily involved and pick their brains,
  • ensure you influence the design so that it is testable, ideally so that the solution enables testing of separate segments which can be isolated so you can bring your acquired knowledge and understanding to bear.

The first two points will always be relevant regardless of the technology. The third one has been picked up by quantum computing scientists who have developed a technique called blind quantum computing.

The blind technique involves running a separate, simpler(!) quantum computer called a verifier alongside the server running the test application. The verifier feeds the server with small, discrete calculations to perform. It does this in a way that means only the verifier, and the testers, can know what the answer should be, while the server knows only what operations it should perform. This is the simplest explanation of blind quantum computing I’ve been able to find.

An obvious observation is that this is hardly black box testing as we have known it. Who programs and controls the verifier? It is a quantum computer, and I doubt if ISTQB accreditation will ever get you very far here.

Fields of bewilderment

I don’t have any answers here, and I strongly suspect easy answers will never be available to testers who work with quantum computers. That, perhaps, is the point to take away from my article. Writing this has felt less like trying to spread some understanding of an immensely difficult subject, and more like an account of a bewildered ramble through my freshly expanded fields of ignorance.

I can’t state with confidence that software testers will ever move into quantum computing. Perhaps the serious testing will be done by quantum computing scientists. That raises its own dangers. Will they have the right instincts, an appropriate sceptical and questioning approach? There is already talk of quantum computers permitting the development of perfect software. Apparently they will be able to work their way through every conceivable permutation of data to establish that the application was built to its specification. Of course there is a huge, familiar old question being begged there; how does anyone know the spec was right?
Opening of the poem 'Youth and Hope' from which the quote 'perplexity is the beginning of knowledge' comes

I worry that good testers may be frozen out of working with quantum computers, and that the skills and questions they do have, and that will remain relevant, will be lost. Whatever happens I am confident that testers imbued with the principles of Context-Driven Testing will be better prepared than those who know only what they learned to pass ISTQB exams, or who are accustomed to following prescriptive standards. Whatever the future holds for us it will be interesting! After all, perplexity is the beginning of knowledge.

2017 Test Leadership Congress in New York

From May 1st to 4th I’ll be in New York at the Test Leadership Congress.


Why go all that way to New York? Well, there’s an obvious answer; it’s in New York for goodness sake, and there are interesting looking talks by some cool people. But for me the clincher is a one day masterclass that could hardly be better designed to lure me over the Atlantic.

Dave Snowden is presenting “A leader’s framework for decision making: managing complex projects using Cynefin“. That would be hugely interesting in any context, but when he’s addressing an audience of testers, with the added inducement that “a new collaborative project on software testing will be introduced to delegates”, well I’m checking out flights already.

For a few years now I’ve been fascinated by the understanding that Cynefin can help us gain of the environment we are working in and the problems with which we must grapple. I’ve written about this, providing an introduction to Cynefin, but that was aimed at the highest level of testing strategy, the worldview that shapes our approach to testing.

If you think about the world in the wrong way, if you make flawed and unchallenged assumptions about the nature of the problems you face then you won’t do much that is useful, and even less that is valuable. You can close your mind to all this and get on with working the process but the chances are you’ll be “faking the testing”, a phrase of James Bach’s that keeps coming back to me over the years.

Understanding Cynefin will help testers to choose an approach that will enable more effective testing; your approach will be aligned both to the way that software is developed, and also to the product or application under test. The nature of development means that we are mostly dealing with complicated and complex problems, with the most challenging problems taking us into the complex domain.

The essential feature of complex systems is that they are not predictable. There is no clear cause and effect. We might be able to discern them with hindsight, but that knowledge doesn’t allow us to predict what will happen next; the system adapts continually. In order to understand the system we need to probe it, make sense of what we discover, then respond. Cynefin is therefore likely to guide us towards more flexible, iterative development and exploratory testing. But exactly how should we use that understanding to conduct testing?

In particular I am keen to hear Dave’s ideas about how “safe to fail” experiments with complex systems can be applied in testing. Dave describes safe-fail experiments as follows.

“I can’t get it right in advance, so I need to experiment with different ways of approaching the journey that are safe-fail in nature. That is to say I can afford them to fail and critically, I plan them so that through that failure I learn more about the terrain through which I wish to travel.”

This cycle of probing and learning is obviously relevant to testing. You could say it is testing. I can’t wait for this class.

An added attraction that the Congress has for me is the class on the following day, led by Fiona Charles and Anna Royzman, “Strategic Leadership for Testers”. Test management is changing. Treating the job as a matter of managing the testing process simply won’t cut it in future. Both in testing and software development we’ve been poor at thinking strategically, and that is one of Fiona’s strengths. That, combined with her incisive style and wit, makes this a class I really don’t want to miss while I’m in New York. If you’re going to be there let me know. I look forward to seeing you.

Why ISO 29119 is a flawed quality standard


This article originally appeared in the Fall 2015 edition of Better Software magazine.

In August 2014, I gave a talk attacking ISO 29119 at the Association for Software Testing’s conference in New York. That gave me the reputation for being opposed to standards in general — and testing standards in particular. I do approve of standards, and I believe it’s possible that we might have a worthwhile standard for testing. However, it won’t be the fundamentally flawed ISO 29119.

Technical standards that make life easier for companies and consumers are a great idea. The benefit of standards is that they offer protection to vulnerable consumers or help practitioners behave well and achieve better outcomes. The trouble is that even if ISO 29119 aspires to do these things, it doesn’t.

Principles, standards, and rules

The International Organization for Standardization (ISO) defines a standard as “a document that provides requirements, specifications, guidelines or characteristics that can be used consistently to ensure that materials, products, processes and services are fit for their purpose.”

It might be possible to derive a useful software standard that fits this definition, but only if it focuses on guidelines, rather than requirements, specifications, or characteristics. According to ISO’s definition, a standard doesn’t have to be all those things. A testing standard that is instead framed as high level guidelines would be consistent with the widespread view among regulatory theorists that standards are conceptually like high-level principles. Rules, in contrast, are detailed and specific (see Frederick Schauer’s “The Convergence of Rules and Standards”: PDF opens in new tab). One of ISO 29119’s fundamental problems is that it is pitched at a level consistent with rules, which will undoubtedly tempt people to treat them as fixed rules.

Principles focus on outcomes rather than detailed processes or specific rules. This is how many professional bodies have defined standards. They often use the words principles and standards interchangeably. Others favor a more rules-based approach. If you adopt a detailed, rules-based approach, there is a danger of painting yourself into a corner; you have to try to specify exactly what is compliant and noncompliant. This creates huge opportunities for people to game the system, demonstrating creative compliance as they observe the letter of the law while trashing underlying quality principles, (see John Braithwaite’s “Rules and Principles: A Theory of Legal Certainty”). Whether one follows a principles-based or a rules-based approach, regulators, lawyers, auditors, and investigators are likely to assume standards define what is acceptable.

As a result, there is a real danger that ISO 29119 could be viewed as the default set of rules for responsible software testing. People without direct experience in development or testing look for some form of reassurance about what constitutes responsible practice. They are likely to take ISO 29119 at face value as a definitive testing standard. The investigation into the HealthCare.gov website problems showed what can happen.

In its March 2015 report (PDF, opens in new tab) on the website’s problems, the US Government Accountability Office checked the HealthCare.gov project for compliance with the IEEE 829 test documentation standard. The agency didn’t know anything about testing. They just wanted a benchmark. IEEE 829 was last revised in 2008; it said that the content of standards more than five years old “do not wholly reflect the present state of the art”. Few testers would disagree that IEEE 829 is now hopelessly out of date.

IEEE 829’s obsolescence threshold: “when a document is more than five years old”


The obsolescence threshold for ISO 29119 has increased from five to ten years, presumably reflecting the lengthy process of creating and updating such cumbersome documents rather than the realities of testing. We surely don’t want regulators checking testing for compliance against a detailed, outdated standard they don’t understand.

Scary lessons from the social sciences

If we step away from ISO 29119, and from software development, we can learn some thought-provoking lessons from the social sciences.

Prescriptive standards don’t recognize how people apply knowledge in demanding jobs like testing. Scientist Michael Polanyi and sociologist Harry Collins have offered valuable insights into tacit knowledge, which is knowledge we possess and use but cannot articulate. Polanyi first introduced the concept, and Collins developed the idea, arguing that much valuable knowledge is cultural and will vary between different contexts and countries. Defining a detailed process as a standard for all testing excludes vital knowledge; people will respond by concentrating on the means, not the ends.

Donald Schön, a noted expert on how professionals learn and work, offered a related argument with “reflection in action” (see Willemien Visser’s article: PDF opens in new tab). Schön argued that creative professionals, such as software designers or architects, have an iterative approach to developing ideas—much of their knowledge is understood without being expressed. In other words, they can’t turn all their knowledge into an explicit, written process. Instead, to gain access to what they know, they have to perform the creative act so that they can learn, reflect on what they’ve learned, and then apply this new knowledge. Following a detailed, prescriptive process stifles learning and innovation. This applies to all software development—both agile and traditional methods.

In 1914, Thorstein Veblen identified the problem of trained incapacity. People who are trained in specific skills can lack the ability to adapt. Their response worked in the past, so they apply it regardless thereafter.


Young woman or old woman? Means or ends? We can focus on only one at a time.

Kenneth Burke built upon Veblen’s work, arguing that trained incapacity means one’s abilities become blindnesses. People can focus on the means or the ends, not both; their specific training makes them focus on the means. They don’t even see what they’re missing. As Burke put it, “a way of seeing is also a way of not seeing; a focus upon object A involves a neglect of object B”. This leads to goal displacement, and the dangers for software testing are obvious.

The problem of goal displacement was recognized before software development was even in its infancy. When humans specialize in organizations, they have a predictable tendency to see their particular skill as a hammer and every problem as a nail. Worse, they see their role as hitting the nail rather than building a product. Give test managers a detailed standard, and they’ll start to see the job as following the standard, not testing.

In the 1990s, British academic David Wastell studied software development shops that used structured methods, the dominant development technique at the time. Wastell found that developers used these highly detailed and prescriptive methods in exactly the same way that infants use teddy bears and security blankets: to give them a sense of comfort and help them deal with stress. In other words, a developer’s mindset betrayed that the method wasn’t a way to build better software but rather a defense mechanism to alleviate stress and anxiety.

Wastell could find no empirical evidence, either from his own research at these companies or from a survey of the findings of other experts, that structured methods worked. In fact, the resulting systems were no better than the old ones, and they took much more time and money to develop. Managers became hooked on the technique (the standard) while losing sight of the true goal. Wastell concluded the following:

Methodology becomes a fetish, a procedure used with pathological rigidity for its own sake, not as a means to an end. Used in this way, methodology provides a relief against anxiety; it insulates the practitioner from the risks and uncertainties of real engagement with people and problems.

Developers were delivering poorer results but defining that as the professional standard. Techniques that help managers cope with stress and anxiety but give an illusory, reassuring sense of control harm the end product. Developers and testers cope by focusing on technique, mastery of tools, or compliance with standards. In doing so they can feel that they are doing a good job, so long as they don’t think about whether they are really working toward the true ends of the organization or the needs of the customer.

Standards must be fit for their purpose

Is all this relevant to ISO 29119? We’re still trying to do a difficult, stressful job, and in my experience, people will cling to prescriptive processes and standards that give the illusion of being in control. Standards have credibility and huge influence simply from their status as standards. If we must have standards, they should be relevant, credible, and framed in a way that is helpful to practitioners. Crucially, they must not mislead stakeholders and regulators who don’t understand testing but who wield great influence and power.

The level of detail in ISO 29119 is a real concern. Any testing standard should be in the style favored by organizations like the Institute of Internal Auditors (IIA), whose principles-based professional standards cover the entire range of internal auditing but are only one-tenth as long as the three completed parts of ISO 29119. The IIA’s standards are light on detail but far more demanding in the outcomes required.

Standards must be clear about the purpose they serve if we are to ensure testing is fit for its purpose, to hark back to ISO’s definition of a standard. In my opinion, this is where ISO 29119 falls down. The standard does not clarify the purpose of testing, only the mechanism—and that mechanism focuses on documentation, not true testing. It is this lack of purpose, the why, that leads to teams concentrating on standards compliance rather than delivering valuable information to stakeholders. This is a costly mistake. Standards should be clear about the outcomes and leave the means to the judgment of practitioners.

A good example of this problem is ISO 29119’s test completion report, which is defined simply as a summary of the testing that was performed. The standard offers examples for traditional and agile projects. Both focus on the format, not the substance of the report. The examples give some metrics without context or explanation and provide no information or insight that would help stakeholders understand the product and the risk and make better decisions. Testers could comply with the standard without doing anything useful. In contrast, the IIA’s standards say audit reports must be “accurate, objective, clear, concise, constructive, complete, and timely.” Each of these criteria is defined briefly in a way that makes the standard far more demanding and useful than ISO 29119, in far less space.

It’s no good saying that ISO 29119 can be used sensibly and doesn’t have to be abused. People are fallible and will misuse the standard. If we deny that fallibility, we deny the experience of software development, testing, and, indeed, human nature. As Jerry Weinberg said (in “The Secrets of Consulting”), “no matter how it looks at first, it’s always a people problem”. Any prescriptive standard that focuses on compliance with highly detailed processes is doomed. Maybe you can buck the system, but you can’t buck human nature.