Testers and coders are both developers (2009)

This article appeared in September 2009 as an opinion piece on the front page of TEST magazine’s website. I’m moving the article onto my blog from my website, which will be decommissioned soon. It might be an old article, but it remains valid.

The article

When I was a boy I played football non-stop; in organised matches, in playgrounds or in the park, even kicking coal around the street!

There was a strict hierarchy. The good players, the cool kids; they were forwards. The ones who couldn’t play were defenders. If you were really hopeless you were a goalkeeper. Defending was boring. Football was about fun, attacking and scoring goals.

When I moved into IT I found a similar hierarchy. I had passed the programming aptitude test. I was one of the elite. I had a big head, to put it mildly! The operators were the defenders, the ones who couldn’t do the fun stuff. We were vaguely aware they thought the coding kids were irresponsible cowboys, but who cared?

As for testers, well, they were the goalkeepers. Frankly, they were lucky to be allowed to play at all. They did what they were told. Independence? You’re joking. But if they were good they were allowed to climb the ladder and become programmers.

Gradually things changed. Testers became more clearly identified with the users. They weren’t just menial team members. A clear career path opened up for testing professionals.

However, that didn’t earn them respect from programmers. Now testing is changing again. Agile gives testers the chance to learn and apply interesting coding skills. Testers can be just as technical as coders. They might code in different ways, for different reasons, but they can be coders too.

That’s great isn’t it? Well, up to a point. It’s fantastic that testers have these exciting opportunities. But I worry that programmers might start respecting the more technical testers for the wrong reason, and that testers who don’t code will still be looked down on. Testers shouldn’t try to impress programmers with their coding skills. That’s just a means to an end.

We’ll still need testers who don’t code, and if testers are to achieve the respect they deserve they must be valued for all the skills they bring to the team, not just the skills they share with programmers. For a start, we should stop talking about “developers and testers” as if they were separate groups. Testers always were part of the development process. In Agile teams they quite definitely are developers. It’s time everyone acknowledged that. Development is a team game.

Football teams who played the way we used to as kids got thrashed if they didn’t grow up and play as a team. Development teams who don’t ditch similar attitudes will be equally ineffective.

Bridging the gap – Agile and the troubled relationship between UX and software engineering (2009)

This article appeared as the cover story for the September 2009 edition of TEST magazine.

I’m moving the article onto my blog from my website, which will be decommissioned soon. If you choose to read the article please bear in mind that it was written in August 2009 and is therefore inevitably dated in some respects, though I think that much of it is still valid, especially where I am discussing the historical problems.

The article

For most of its history software engineering has had great difficulty building applications that users found enjoyable. Far too many applications were frustrating and wasted users’ time.

That has slowly changed with the arrival of web developments, and I expect the spread of Agile development to improve matters further.

I’m going to explain why I think developers have had difficulty building usable applications and why user interaction designers have struggled to get their ideas across. For simplicity I’ll just refer to these designers and psychologists as “UX”. There are a host of labels and acronyms I could have used, and it really wouldn’t have helped in a short article.

Why software engineering didn’t get UX

Software engineering’s problems with usability go back to its roots, when geeks produced applications for fellow geeks. Gradually applications spread from the labs to commerce and government, but crucially users were wage slaves who had no say in whether they would use the application. If they hated it they could always find another job!

Gradually applications spread into the general population until the arrival of the internet meant that anyone might be a user. Now it really mattered if users hated your application. They would be gone for good, into the arms of the competition.

Software engineering had great difficulty coming to terms with this. The methods it had traditionally used were poison to usability. The Waterfall lifecycle was particularly damaging.

The Waterfall had two massive flaws. At its root was the implicit assumption that you can tackle the requirements and design up front, before the build starts. This led to the second problem: iteration was effectively discouraged.

Users cannot know what they want till they’ve seen what is possible and what can work. In particular, UX needs iteration to let analysts and users build on their understanding of what is required.

The Waterfall meant that users could not see and feel what an application was like until acceptance testing at the end, when it was too late to correct defects that could be dismissed as cosmetic.

The Waterfall was a project management approach to development, a means of keeping control, not building good products. This made perfect sense to organisations who wanted tight contracts and low costs.

The desire to keep control and make development a more predictable process explained the damaging attempt to turn software engineering into a profession akin to civil engineering.

So developers were sentenced to 20 years hard labour with structured methodologies, painstakingly creating an avalanche of documentation; moving from the current physical system, through the current logical system to a future logical system and finally a future physical system.

However, the guilty secret of software engineering was that translating requirements into a design wasn’t just a difficult task that required a methodical approach; it was a fundamental problem for developers. It wasn’t a problem specific to usability requirements, and it was never resolved in traditional techniques.

The mass of documentation obscured the fact that crucial design decisions weren’t flowing predictably and objectively from the requirements, but were made intuitively by the developers – people who by temperament and training were polar opposites of typical users.

Why UX didn’t get software development

Not surprisingly, given the massive documentation overhead of traditional techniques, and developers’ propensity to pragmatically tailor and trim formal methods, the full process was seldom followed. What actually happened was more informal and opaque to outsiders.

The UX profession understandably had great difficulty working out what was happening. Sadly they didn’t even realise that they didn’t get it. They were hampered by their naivety, their misplaced sense of the importance of UX and their counter-productive instinct to retain their independence from developers.

If developers had traditionally viewed functionality as a matter of what the organisation required, UX went to the other extreme and saw functionality as being about the individual user. Developers ignored the human aspect, but UX ignored commercial realities – always a fast track to irrelevance.

UX took software engineering at face value, tailoring techniques to fit what they thought should be happening rather than the reality. They blithely accepted the damaging concept that usability was all about the interface; that the interface was separate from the application.

This separability concept was flawed on three grounds. Conceptually it was wrong. It ignored the fact that the user experience depends on how the whole application works, not just the interface.

Architecturally it was wrong. Detailed technical design decisions can have a huge impact on the users. Finally separability was wrong organisationally. It left the UX profession stranded on the margins, in a ghetto, available for consultation at the end of the project, and then ignored when their findings were inconvenient.

An astonishing amount of research and effort went into justifying this fallacy, but the real reason UX bought the idea was that it seemed to liberate them from software engineering. Developers could work on the boring guts of the application while UX designed a nice interface that could be bolted on at the end, ready for testing. However, this illusory freedom actually meant isolation and impotence.

The fallacy of separability encouraged reliance on usability testing at the end of the project, on summative testing to reveal defects that wouldn’t be fixed, rather than formative testing to stop these defects being built in the first place.

There’s an argument that there’s no such thing as effective usability testing. If it takes place at the end it’s too late to be effective. If it’s effective then it’s done iteratively during design, and it’s just good design rather than testing.

So UX must be hooked into the development process. It must take place early enough to allow alternative designs to be evaluated. Users must therefore be involved early and often. Many people in UX accept this completely, though the separability fallacy is still alive and well. However, its days are surely numbered. I believe, and hope, that the Agile movement will finally kill it.

Agile and UX – the perfect marriage?

The mutual attraction of Agile and UX isn’t simply a case of “my enemy’s enemy is my friend”. Certainly they do have a common enemy in the Waterfall, but each discipline really does need the other.

Agile gives UX the chance to hook into development, at the points where it needs to be involved to be effective. Sure, with the Waterfall it is possible to smuggle effective UX techniques into a project, but they go against the grain. It takes strong, clear-sighted project managers and test managers to make them work. The schedule and political pressures on these managers to stop wasting time iterating and to sign off each stage are huge.

If UX needs Agile, then equally Agile needs UX if it is to deliver high quality applications. The opening principle of the Agile Manifesto states that “our highest priority is to satisfy the customer through early and continuous delivery of valuable software”.

There is nothing in Agile that necessarily guarantees better usability, but if practitioners believe in that principle then they have to take UX seriously and use UX professionals to interpret users’ real needs and desires. This is particularly important with web applications when developers don’t have direct access to the end users.

There was considerable mutual suspicion between the two communities when Agile first appeared. Agile was wary of UX’s detailed analyses of the users, and suspected that this was a Waterfall-style big, up-front requirements phase.

UX meanwhile saw Agile as a technical solution to a social, organisational problem and was rightly sceptical of claims that manual testing would be a thing of the past. Automated testing of the human interaction with the application is a contradiction.

Both sides have taken note of these criticisms. Many in UX now see the value in speeding up the user analysis and focussing on the most important user groups, and Agile has recognised the value of up-front analysis to help them understand the users.

The Agile community is also taking a more rounded view of testing, and how UX can fit into the development process. In particular check out Brian Marick’s four Agile testing quadrants.

UX straddles quadrants two and three. Q2 contains the up-front analysis to shape the product, and Q3 contains the evaluation of the results. Both require manual testing, and use such classic UX tools as personas (fictional representative users) and wireframes (sketches of webpages).

Other people who’re working on the boundary of Agile and UX are Jeff Patton, Jared Spool and Anders Ramsay. They’ve come up with some great ideas, and it’s well worth checking them out. Microsoft have also developed an interesting technique called the RITE method.

This is an exciting field. Agile changes the role of testers and requires them to learn new skills. The integration of UX into Agile will make testing even more varied and interesting.

There will still be tension between UX and software professionals who’ve been used to working remotely from the users. However, Agile should mean that this is a creative tension, with each group supporting and restraining the others.

This is a great opportunity. Testers will get the chance to help create great applications, rather than great documentation!

The dragons of the unknown; part 9 – learning to live with the unknowable

Introduction

This is the ninth and final post in a series about problems that fascinate me, that I think are important and interesting. The series draws on important work from the fields of safety critical systems and from the study of complexity, specifically complex socio-technical systems. This was the theme of my keynote at EuroSTAR in The Hague (November 12th-15th 2018).

The first post was a reflection, based on personal experience, on the corporate preference for building bureaucracy rather than dealing with complex reality, “facing the dragons part 1 – corporate bureaucracies”. Part 2 was about the nature of complex systems. The third followed on from part 2, and talked about the impossibility of knowing exactly how complex socio-technical systems will behave with the result that it is impossible to specify them precisely, “I don’t know what’s going on”.

Part 4 “a brief history of accident models”, looked at accident models, i.e. the way that safety experts mentally frame accidents when they try to work out what caused them.

The fifth post, “accident investigations and treating people fairly”, looked at weaknesses in the way that we have traditionally investigated accidents and failures, assuming neat linearity with clear cause and effect. In particular, our use of root cause analysis, and willingness to blame people for accidents is hard to justify.

Part six “Safety II, a new way of looking at safety” looks at the response of the safety critical community to such problems and the necessary trade offs that a practical response requires. The result, Safety II, is intriguing and has important lessons for software testers.

The seventh post “Resilience requires people” is about the importance of system resilience and the vital role that people play in keeping systems going.

The eighth post “How we look at complex systems” is about the way we choose to look at complex systems, the mental models that we build to try and understand them, and the relevance of Devops.

This final post will try to draw all these strands together and present some thoughts about the future of testing as we are increasingly confronted with complex systems that are beyond our ability to comprehend.

Computing will become more complex

Even if we choose to focus on the simpler problems, rather than help users understand complexity, the reality is that computing is only going to get more complex. The problems that users of complex socio-technical systems have to grapple with will inevitably get more difficult and more intractable. The choice is whether we want to remain relevant, but uncomfortable, or go for comfortable bullshit that we feel we can handle. Remember Zadeh’s Law of Incompatibility (see part 7 – resilience requires people). “As complexity increases, precise statements lose their meaning, and meaningful statements lose precision”. Quantum computing, artificial intelligence and adaptive algorithms are just three of the areas of increasing importance whose inherent complexity will make it impossible for testers to offer opinions that are both precise and meaningful.

Quantum computing, in particular, is fascinating. By its very nature it is probabilistic, not deterministic. The idea that well designed and written programs should always produce the same output from the same data is relevant only to digital computers (and even then the maxim has to be heavily qualified in practice); it never holds true at any level for quantum computers. I wrote about this in “Quantum computing; a whole new field of bewilderment”.
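
To make that contrast concrete, here is a minimal, purely classical Python sketch (my own illustration, not real quantum code or any particular framework) that simulates measuring a qubit prepared in an equal superposition. The same “program”, run repeatedly on the same input, yields a distribution of answers rather than a single guaranteed result.

```python
import random
from collections import Counter

def measure_equal_superposition():
    # A qubit prepared in an equal superposition collapses to 0 or 1,
    # each with probability 0.5, when it is measured.
    return 0 if random.random() < 0.5 else 1

# Running the "same program on the same data" a thousand times gives a
# distribution of results, not one repeatable answer.
results = Counter(measure_equal_superposition() for _ in range(1000))
print(results)  # e.g. Counter({1: 512, 0: 488}); the split varies from run to run
```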

The final quote from that article, “perplexity is the beginning of knowledge”, applies not only to quantum computing but also to artificial intelligence and the fiendish complexity of algorithms processing big data. One of the features of quantum computing is the way that changing a single qubit, the quantum counterpart of the digital bit, will trigger changes in other qubits. This is entanglement, but the same word is now being used to describe the incomprehensible complexity of modern digital systems. Various writers have talked about this being the Age of Entanglement, e.g. Samuel Arbesman, in his book “Overcomplicated: Technology at the Limits of Comprehension”, Emmet Connolly, in an article “Design in the Age of Entanglement” and Danny Hillis, in an article “The Enlightenment is Dead, Long Live the Entanglement”.

The purist in me disapproves of recycling a precise term from quantum science to describe loosely a phenomenon in digital computing. However, it does serve a useful role. It is a harsh and necessary reminder and warning that modern systems have developed beyond our ability to understand them. They are no more comprehensible than quantum systems, and as Richard Feynman is popularly, though possibly apocryphally, supposed to have said; “If you think you understand quantum physics, you don’t understand quantum physics.”

So the choice for testers will increasingly be to decide how we respond to Zadeh’s Law. Do we offer answers that are clear, accurate, precise and largely useless to the people who lose sleep at night worrying about risks? Or do we say “I don’t know for sure, and I can’t know, but this is what I’ve learned about the dangers lurking in the unknown, and what I’ve learned about how people will try to stay clear of these dangers, and how we can help them”?

If we go for the easy options and restrict our role to problems which allow definite answers then we will become irrelevant. We might survive as process drones, holders of a “bullshit job” that fits neatly into the corporate bureaucracy but offers little of value. That will be tempting in the short to medium term. Large organisations often value protocol and compliance more highly than they value technical expertise. That’s a tough problem to deal with. We have to confront that and communicate why that worldview isn’t just dated, it’s wrong. It’s not merely a matter of fashion.

If we’re not offering anything of real value then there are two possible dangers. We will be replaced by people prepared to do poor work cheaper; if you’re doing nothing useful then there is always someone who can undercut you. Or we will be increasingly replaced by automation because we have opted to stay rooted in the territory where machines can be more effective, or at least efficient.

If we fail to deal with complexity the danger is that mainstream testing will be restricted to “easy” jobs – the dull, boring jobs. When I moved into internal audit I learned to appreciate the significance of all the important systems being inter-related. It was where systems interfaced, and where people were involved, that they got interesting. The finance systems I worked with may have been almost entirely batch based, but they performed a valuable role for the people with whom we were constantly discussing their behaviour. Anything standalone was neither important nor particularly interesting. Anything that didn’t leave smart people scratching their heads and worrying was likely to be boring. Inter-connectedness and complexity will only increase, and however difficult testing becomes it won’t be boring – so long as we are doing a useful job.

If we want to work with the important, interesting systems then we have to deal with complexity and the sort of problems the safety people have been wrestling with. There will always be a need for people to learn and inform others about complex systems. The American economist Tyler Cowen in his book “Average is Over” states the challenge clearly. We will need interpreters of complex systems.

“They will become clearing houses for and evaluators of the work of others… They will hone their skills of seeking out, absorbing and evaluating information… They will be translators of the truths coming out of our network of machines… At least for a while they will be the only people who will have a clear notion of what is going on.”

I’m not comfortable with the idea of truths coming out of machines, and we should resist the idea that we can ever be entirely clear about what is going on. But the need for experts who can interpret complex systems is clear. Society will look for them. Testers should aspire to be among those valuable specialists.

The complexity of these systems will be beyond the ability of any one person to comprehend, but perhaps these interpreters, in addition to deploying their own skills, will be able to act like a conductor of an orchestra, to return to the analogy I used in part seven (Resilience requires people). Conductors are talented musicians in their own right, but they call on the expertise of different specialists, blending their contribution to produce something of value to the audience. Instead of a piece of music the interpreter tester would produce a story that sheds light on the system, guiding the people who need to know.

Testers in the future will have to be confident and assertive when they try to educate others about complexity, the inexplicable and the unknowable. Too often in corporate life a lack of certainty has been seen as a weakness. We have to stand our ground and insist on our right to be heard and taken seriously when we argue that certainty cannot be available if we want to talk about the risks and problems that matter. My training and background meant I couldn’t keep quiet when I saw problems that were being ignored because no-one knew how to deal with them. As Better Software said about me, I’m never afraid to voice my opinion.

Never be afraid to speak out, to explain why your experience and expertise make your opinions valuable, however uncomfortable these may be for others. That’s what you’re paid for, not to provide comforting answers. The metaphor of facing the dragons of the unknown is extremely important. People will have to face these dragons. Testers have a responsibility to try and shed as much light as possible on those dragons lurking in the darkness beyond what we can see and understand easily. If we concentrate only on what we can know and say with certainty it means we walk away from offering valuable, heavily qualified advice about the risks, threats & opportunities that matter to people. Our job should entail trying to help and protect people. As Jerry Weinberg said in “Secrets of Consulting”;

“No matter what they tell you, it’s always a people problem.”

The dragons of the unknown; part 8 – how we look at complex systems

Introduction

This is the eighth post in a series about problems that fascinate me, that I think are important and interesting. The series draws on important work from the fields of safety critical systems and from the study of complexity, specifically complex socio-technical systems. This was the theme of my keynote at EuroSTAR in The Hague (November 12th-15th 2018).

The first post was a reflection, based on personal experience, on the corporate preference for building bureaucracy rather than dealing with complex reality, “facing the dragons part 1 – corporate bureaucracies”. Part 2 was about the nature of complex systems. The third followed on from part 2, and talked about the impossibility of knowing exactly how complex socio-technical systems will behave with the result that it is impossible to specify them precisely, “I don’t know what’s going on”.

Part 4 “a brief history of accident models”, looked at accident models, i.e. the way that safety experts mentally frame accidents when they try to work out what caused them.

The fifth post, “accident investigations and treating people fairly”, looked at weaknesses in the way that we have traditionally investigated accidents and failures, assuming neat linearity with clear cause and effect. In particular, our use of root cause analysis, and willingness to blame people for accidents is hard to justify.

Part six “Safety II, a new way of looking at safety” looks at the response of the safety critical community to such problems and the necessary trade offs that a practical response requires. The result, Safety II, is intriguing and has important lessons for software testers.

The seventh post “Resilience requires people” is about the importance of system resilience and the vital role that people play in keeping systems going.

This eighth post is about the way we choose to look at complex systems, the mental models that we build to try and understand them, and the relevance of Devops.

Choosing what we look at

The ideas I’ve been writing about resonated strongly with me when I first read about the safety and resilience engineering communities. What unites them is a serious, mature awareness of the importance of their work. Compared to these communities I sometimes feel as if normal software developers and testers are like children playing with cool toys while the safety critical engineers are the adults worrying about the real world.

The complex insurance finance systems I worked with were part of a wider system with correspondingly more baffling complexity. Remember the comments of Professor Michael McIntyre (in part six, “Safety II, a new way of looking at safety”).

“If we want to understand things in depth we usually need to think of them both as objects and as dynamic processes and see how it all fits together. Understanding means being able to see something from more than one viewpoint.”

If we zoom out for a wider perspective in both space and time we can see that objects which looked simple and static are part of a fluid, dynamic process. We can choose where we place the boundaries of the systems we want to learn about. We should make that decision based on where we can offer most value, not where the answers are easiest. We should not be restricting ourselves to problems that allow us to make definite, precise statements. We shouldn’t be looking only where the light is good, but also in the darkness. We should be peering out into the unknown where there may be dragons and dangers lurking.
(Cartoon: the drunkard looking for his keys under the streetlight.)
Taking the wider perspective, the insurance finance systems for which I was responsible were essentially control mechanisms to allow statisticians to continually monitor and fine tune the rates, the price at which we sold insurance. They were continually searching for patterns, trying to understand how the different variables played off each other. We made constant small adjustments to keep the systems running effectively. We had to react to business problems that the systems revealed to our users, and to technical problems caused by all the upstream feeding applications. Nobody could predict the exact result of adjustments, but we learned to predict confidently the direction; good or bad.

The idea of testing these systems with a set of test cases, with precisely calculated expected results, was laughably naïve. These systems weren’t precise or accurate in a simple book-keeping sense, but they were extremely valuable. If we as developers and testers were to do a worthwhile job for our users we couldn’t restrict ourselves to focusing on whether the outputs from individual programs matched our expectations, which were no more likely to be “correct” (whatever that might mean in context) than the output.

Remember, these systems were performing massively complex calculations on huge volumes of data and thus producing answers that were not available any other way. We could predict how an individual record would be processed, but putting small numbers of records through the systems would tell us nothing worthwhile. Rounding errors would swamp any useful information. A change to a program that introduced a serious bug would probably produce a result that was indistinguishable from correct output with a small sample of data, but introduce serious and unacceptable error when we were dealing with the usual millions of records.

We couldn’t spot patterns from a hundred records using programs designed to tease out patterns from datasets with millions of records. We couldn’t specify expected outputs from systems that were intended to help us find out about unknown unknowns.
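
As a rough illustration of the kind of check that did make sense at that scale (a hypothetical sketch in Python; the function, the tolerance and the figures are invented, not taken from the systems described here), the comparison happens at aggregate level: control totals from a full-volume run of a changed system are reconciled against a trusted baseline run, within a relative tolerance agreed with the users, instead of asserting exact per-record expected results.

```python
import math

def compare_control_totals(baseline_totals, candidate_totals, rel_tol=0.001):
    """Reconcile aggregate totals from a full-volume run of a changed system
    against a trusted baseline run, within a relative tolerance, instead of
    asserting exact per-record expected results."""
    drifts = {}
    for key, baseline in baseline_totals.items():
        candidate = candidate_totals.get(key)
        if candidate is None or not math.isclose(candidate, baseline, rel_tol=rel_tol):
            drifts[key] = (baseline, candidate)
    return drifts  # an empty dict means the totals agree within tolerance

# Hypothetical premium totals by class of business from two full runs.
baseline = {"motor": 1_204_331.27, "household": 873_992.55}
candidate = {"motor": 1_204_398.91, "household": 873_990.12}
print(compare_control_totals(baseline, candidate))  # {} here: any drift is well within 0.1%
```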

The only way to generate predictable output was to make unrealistic assumptions about the input data, to distort it so it would fit what we thought we knew. We might do that in unit testing but it was pointless in more rigorous later testing. We had to lift our eyes and understand the wider context, the commercial need to compete in the insurance marketplace with rates that were set with reasonable confidence in the accuracy of the pricing of the risks, rather than being set by guesswork, as had traditionally been the case. We were totally reliant on the expertise of our users, who in turn were totally reliant on our technical skills and experience.

I had one difficult, but enlightening, discussion with a very experienced and thoughtful user. I asked her to set acceptance criteria for a new system. She refused and continued to refuse even when I persisted. I eventually grasped why she wouldn’t set firm criteria. She couldn’t possibly articulate a full set of criteria. Inevitably she would only be able to state a small set of what might be relevant. It was misleading to think only in terms of a list of things that should be acceptable. She also had to think about the relationships between different factors. The permutations were endless, and spotting what might be wrong was a matter of experience and deep tacit knowledge.

This user was also concerned that setting a formal set of acceptance criteria would lead me and my team to focus on that list, which would mean treating the limited set of knowledge that was explicit as if it were the whole. We would be looking for confirmation of what we expected, rather than trying to help her shed light on the unknown.

Dealing with the wider context and becoming comfortable with the reality that we were dealing with the unknown was intellectually demanding and also rewarding. We had to work very closely with our users and we built strong, respectful and trusting relationships that ran deep and lasted long. When we hit serious problems, those good relations were vital. We could all work together, confident in each other’s abilities. These relationships have lasted many years, even though none of us still work for the same company.

We had to learn as much as possible from the users. This learning process was never ending. We were all learning, both users and developers, all the time. The more we learned about our systems the better we could understand the marketplace. The more we learned about how the business was working in the outside world the better we could fine tune the systems.

Devops – a future reminiscent of my past?

With these complex insurance finance systems the need for constant learning dominated the whole development lifecycle to such an extent that we barely thought in terms of a testing phase. Some of our automated tests were built into the production system to monitor how it was running. We never talked of “testing in production”. That was a taboo phrase. Constant monitoring? Learning in production? These were far more diplomatic ways of putting it. However, the frontier between development and production was so blurred and arbitrary that we once, under extreme pressure of time, went to the lengths of using what were officially test runs to feed the annual high level business planning. This was possible only because of a degree of respect and trust between users, developers and operations staff that I’ve never seen before or since.
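
As a rough sketch of the kind of automated check that can live inside a production batch stream (hypothetical Python, not the actual monitoring we built), a simple control-total reconciliation makes sure that every record read is accounted for as either written or explicitly rejected, and flags anything unaccounted for so that someone investigates.

```python
def reconcile_batch(input_count, output_count, rejected_count, tolerance=0):
    """A control-total check run as part of the production batch stream:
    every record read should end up either written or explicitly rejected."""
    unaccounted = input_count - (output_count + rejected_count)
    if abs(unaccounted) > tolerance:
        # In a real system this would raise an alert for the support and
        # business teams rather than just printing a warning.
        print(f"WARNING: {unaccounted} records unaccounted for in last night's run")
        return False
    return True

# Hypothetical counts from one night's run; here everything reconciles.
reconcile_batch(input_count=4_312_887, output_count=4_311_002, rejected_count=1_885)
```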

That close working relationship didn’t happen by chance. Our development team was pulled out of Information Services, the computing function, and assigned to the business, working side by side with the insurance statisticians. Our contact in operations wasn’t similarly seconded, but he was permanently available and was effectively part of the team.

The normal development standards and methods did not apply to our work. The company recognised that they were not appropriate and we were allowed to feel our way and come up with methods that would work for us. I wrote more about this a few years ago in “Adventures with Big Data”.

When Devops broke onto the scene I was fascinated. It is a response not only to the need for continuous delivery, but also to the problems posed by working with increasingly complex and intractable systems. I could identify with so much; the constant monitoring, learning about the system in production, breaking down traditional structures and barriers, different disciplines working more closely together. None of that seemed new to me. These had felt like a natural way to develop the deeply complicated insurance finance systems that would inevitably evolve into creatures as complex as the business environment in which they helped us to survive.

I’ve found Noah Sussman’s work very helpful. He has explicitly linked Devops with the ideas I have been discussing (in this whole series) that have emerged from the resilience engineering and safety critical communities. In particular, Sussman has picked up on an argument that Sidney Dekker has been making, notably in his book “Safety Differently”, that nobody can have a clear idea of how complex sociotechnical systems are working. There cannot be a single, definitive and authoritative (ie canonical) description of the system. The view that each expert has, as they try to make the system work, is valid but it is inevitably incomplete. Sussman put it as follows in his blog series “Software as Narrative”.

“At the heart of Devops is the admission that no single actor can ever obtain a ‘canonical view’ of an incident that took place during operations within an intractably complex sociotechnical system such as a software organization, hospital, airport or oil refinery (for instance).”

Dekker describes this as ontological relativism. The terminology from philosophy might seem intimidating, but anyone who has puzzled their way through a production problem in a complex system in the middle of the night should be able to identify with it. Brian Fay (in “Contemporary Philosophy of Social Science”) defines ontological relativism as meaning “reality itself is thought to be determined by the particular conceptual scheme of those living within it”.

If you’ve ever been alone in the deep of the night, trying to make sense of an intractable problem that has brought a complex system down, you’ll know what it feels like to be truly alone, to be dependent on your own skills and experience. The system documentation is of limited help. The insights of other people aren’t much use. They aren’t there, and the commentary they’ve offered in the past reflected their own understanding that they have constructed of how the system works. The reality that you have to deal with is what you are able to make sense of. What matters is your understanding, your own mental model.

I was introduced to this idea that we use mental models to help us gain traction with intractable systems by David Woods’ work. He (along with co-authors Paul Feltovich, Robert Hoffman and Axel Roesler) introduced me to the “envisaged worlds” that I mentioned in part one of this series. Woods expanded on this in “Behind Human Error” (chapter six), co-written with Sidney Dekker, Richard Cook, Leila Johannesen and Nadine Sarter.

These mental models are potentially dangerous, as I explained in part one. They are invariably oversimplified. They are partial, flawed and (to use the word favoured by Woods et al) they are “buggy”. But it is an oversimplification to dismiss them as useless because they are oversimplified; they are vitally important. We have to be aware of their limitations, and our own instinctive desire to make them too simple, but we need them to get anywhere when we work with complex systems.

Without these mental models we would be left bemused and helpless when confronted with deep complexity. Rather than thinking of these models as attempts to form precise representations of systems it is far more useful to treat them as heuristics, which are (as defined by James Bach, I think), a useful but fallible way to solve a problem or make a decision.

David Woods is a member of Snafucatchers, which describes itself as “a consortium of industry leaders and researchers united in the common cause of understanding and coping with the immense levels of complexity involved in the operation of critical digital services.”

Snafucatchers produced an important report in 2017, “STELLA – Report from the SNAFUcatchers Workshop on Coping With Complexity”. The workshop and report looked at how experts respond to anomalies and failures with complex systems. It’s well worth reading and reflecting on. The report discusses mental models and adds an interesting refinement, the line of representation.
Above the line of representation we have the parts of the overall system that are visible; the people, their actions and interactions. The line itself has the facilities and tools that allow us to monitor and manage what is going on below the line. We build our mental models of how the system is working and use the information from the screens we see, and the controls available to us to operate the system. However, what we see and manipulate is not the system itself.

There is a mass of artifacts under the line that we can never directly see working. We see only the representation that is available to us at the level of the line. Everything else is out of sight and the representations that are available to us offer us only the chance to peer through a keyhole as we try to make sense of the system below. There has always been a large and invisible substructure in complex IT systems that was barely visible or understood. With internet systems this has grown enormously.

The green line is the line of representation. It is composed of terminal display screens, keyboards, mice, trackpads, and other interfaces. The software and hardware (collectively, the technical artifacts) running below the line cannot be seen or controlled directly. Instead, every interaction crossing the line is mediated by a representation. This is true as well for people in the using world who interact via representations on their computer screens and send keystrokes and mouse movements.

A somewhat startling consequence of this is that what is below the line is inferred from people’s mental models of The System.

And those models of the system are based on the partial representation that is visible to us above the line.

An important consequence of this is that people interacting with the system are critically dependent on their mental models of that system – models that are sure to be incomplete, buggy (see Woods et al above, “Behind Human Error”), and quickly become stale. When a technical system surprises us, it is most often because our mental models of that system are flawed.

This has important implications for teams working with complex systems. The system is constantly adapting and evolving. The mental models that people use must also constantly be revised and refined if they are to remain useful. Each of these individual models represents the reality that each operator understands. All the models are different, but all are equally valid, as ontological relativism tells us. As each team member has a different, valid model it is important that they work together closely, sharing their models so they can co-operate effectively.

This is a world in which traditional corporate bureaucracy with clear, fixed lines of command and control, with detailed and prescriptive processes, is redundant. It offers little of value – only an illusion of control for those at the top, and it hinders the people who are doing the most valuable work (see “part 1 – corporate bureaucracies”).

For those who work with complex, sociotechnical systems the flexibility, the co-operative teamwork, the swifter movement and, above all, the realism of Devops offer greater promise. My experience with deeply complex systems has persuaded me that such an approach is worthwhile. But just as these complex systems will constantly change so must the way we respond. There is no magic, definitive solution that will always work for us. We will always have to adapt, to learn and change if we are to remain relevant.

It is important that developers and testers pay close attention to the work of people like the Snafucatchers. They are offering the insights, the evidence and the arguments that will help us to adapt in a world that will never stop adapting.

In the final part of this series, part 9 “Learning to live with the unknowable” I will try to draw all these strands together and present some thoughts about the future of testing as we are increasingly confronted with complex systems that are beyond our ability to comprehend.

Dave Snowden’s Cynefin masterclass in New York, 2nd May 2017 – part 2

This post is the second of two discussing Dave Snowden’s recent Cynefin masterclass at the Test Leadership Congress in New York. I wrote the series with the support of the Committee on Standards and Professional Practices of the Association for Software Testing. The posts originally appeared on the AST site.

In the first I gave an overview of Cynefin and explained why I think it is important, and how it can helpfully shape the way we look at the world and make sense of the problems we face. In this post I will look at some of the issues raised in Dave’s class and discuss their relevance to development and testing.

The dynamics between domains

Understanding that the boundaries between the different domains are fluid and permeable is crucial to understanding Cynefin. A vital lesson is that we don’t start in one domain and stay there; we can and should move between them. Even if we ignore that lesson reality will drag us from one domain to another. Dave said “all the domains have value – it’s the ability to move between them that is key”.

The Cynefin dynamics are closely tied to the concept of constraints, which are so important to Cynefin that they act as differentiators between the domains. You could say that constraints define the domains.

Constraint is perhaps a slightly misleading word. In Cynefin terms it is not necessarily something that compels or prevents certain behaviour. That does apply to the Obvious domain, where the constraints are fixed and rigid. The constraints in the Complicated domain govern behaviour, and can be agreed by expert consensus. In the Complex domain the constraints enable action, rather than restricting it or compelling it. They are a starting point rather than an end. In Chaos there are no constraints.

Dave Snowden puts it as follows, differentiating rules and heuristics.

“Rules are governing constraints, they set limits to action, they contain all possible instances of action. In contrast heuristics are enabling constraints, they provide measurable guidance which can adapt to the unknowable unknowns.”

If we can change the constraints then we are moving from one domain to another. The most significant dynamic is the cycle between Complex and Complicated.

Crucially, we must recognise that if we are attempting something new that involves a significant amount of uncertainty, then we start in the Complex domain, exploring and discovering more about the problem. Once we have a better understanding and have found constraints that allow us to achieve repeatable outcomes we have moved the problem to the Complicated domain, where we can manage it more easily and exploit our new knowledge. If our testing reveals that the constraints are not producing repeatable results then it’s important to get back into the Complex domain and carry out some more probing experiments.

This is not a one off move. We have to keep cycling to ensure the solution remains relevant. The cadence, or natural flow of the cycle will vary depending on the context. Different industries, or sectors, or applications will have different cadences. It could be days, or years, or anything in between. If, or rather when, our constraints fail to produce repeatable results we have to get back into the Complex domain.

This cycle between Complex and Complicated is key for software development in particular. Understanding this dynamic is essential in order to understand how Cynefin might be employed.

Setting up developments

As I said earlier the parts of a software development project that will provide value are where we are doing something new, and that is where the risk also lies. Any significant and worthwhile development project will start in the Complex domain. The initial challenge is to learn enough to move it to Complicated. Dave explained it as follows in a talk at Agile India in 2015.

“As things are Complex we see patterns, patterns emerge. We stabilise the patterns. As we stabilise them we can actually shift them into the Complicated domain. So the basic principle of Complexity-based intervention is you start off with multiple, parallel, safe-to-fail experiments, which is why Scrum is not a true Complexity technique; it does one thing in a linear way. We call (these experiments) a pre-Scrum technique. You do smaller experiments faster in parallel… So you’re moving from the centre of the Complex domain into the boundary, once you’re in the boundary you use Scrum to move it across the boundary.”

Such a safe-to-fail experiment might be an XP pair programming team being assigned to knock up a small, quick prototype.

So the challenge in starting the move from Complex to Complicated is to come up with the ideas for safe-to-fail pre-Scrum experiments that would allow us to use Scrum effectively.

Dave outlined the criteria that suitable experiments should meet. There should be some way of knowing whether the experiment is succeeding, and it must be possible to amplify (i.e. reinforce) signs of success. Similarly, there should be some way of knowing whether it is failing, and of dampening, or reducing, the damaging impact of a failing experiment. Failure is not bad. In any useful set of safe-to-fail experiments some must fail if we are to learn anything worthwhile. The final criterion is that the experiment must be coherent. This idea of coherence requires more attention.

Dave Snowden explains the tests for coherence here. He isn’t entirely clear about how rigid these tests should be. Perhaps it’s more useful to regard them as heuristics than fixed rules, though the first two are of particular importance.

  • A coherent experiment, the ideas and assumptions behind it, should be compatible with natural science. That might seem like a rather banal statement, till you consider all the massive IT developments and change programmes that were launched in blissful ignorance of the fact that science could have predicted inevitable failure.
  • There should be some evidence from elsewhere to support the proposal. Replicating past cases is no guarantee of success, far from it, but it is a valid way to try and learn about the problem.
  • The proposal should fit where we are. It has to be consistent to some degree with what we have been doing. A leap into the unknown attempting something that is utterly unfamiliar is unlikely to gain any traction.
  • Can the proposal pass a series of “ritual dissent” challenges? These are a formalised way of identifying flaws and refining possible experiments.
  • Does the experiment reflect an unmet, unarticulated need that has been revealed by sense-making, by attempts to make sense of the problem?

The two latter criteria refer explicitly to Cynefin techniques. The final one, identifying unmet needs, assumes the use of Cognitive Edge’s SenseMaker. Remember Fred Brooks’ blunt statement about requirements? Clients do not know what they want. They cannot articulate their needs if they are asked directly. They cannot envisage what is possible. Dave Snowden takes that point further. If users can articulate their needs then you’re dealing with a commoditized product and the solution is unlikely to have great value. Real value lies in meeting needs that users are unaware of and that they cannot articulate. This has always been so, but in days of yore we could often get away with ignoring that problem. Most applications were in-house developments that either automated back-office functions or were built around business rules and clerical processes that served as an effective proxy for true requirements. The inadequacies of the old structured methods and traditional requirements gathering could be masked.

With the arrival of web development, and then especially with mobile technology this gulf between user needs and the ability of developers to grasp them became a problem that could be ignored only through wilful blindness, admittedly a trait that has never been in short supply in corporate life. The problem has been exacerbated by our historic willingness to confuse rigour with a heavily documented, top-down approach to software development. Sense-making entails capturing large numbers of user reports in order to discern patterns that can be exploited. This appears messy, random and unstructured to anyone immured in traditional ways of development. It might appear to lack rigour, but such an approach is in accord with messy, unpredictable reality. That means it offers a more rigorous and effective way of deriving requirements than we can get by pretending that every development belongs in the Obvious domain. A simple lesson I’ve had to learn and relearn over the years is that rigour and structure are not the same as heavy documentation, prescriptive methods and a linear, top-down approach to problem solving.

This all raises big questions for testers. How do we respond? How do we get involved in testing requirements that have been derived this way and indeed the resulting applications? Any response to those questions should take account of another theme that really struck me from Dave’s day in New York. That was the need for resilience.

Resilience

The crucial feature of complex adaptive systems is their unpredictability. Applications operating in such a space will inevitably be subject to problems and threats that we would never have predicted. Even where we can confidently predict the type of threat the magnitude will remain uncertain. Failure is inevitable. What matters is how the application responds.

The need for resilience, with its linked themes of tolerance, diversity and redundancy, was a recurring message in Dave’s class. Resilience is not the same as robustness. The example that Dave gave was that a seawall is robust but a salt marsh is resilient. A seawall is a barrier to large waves and storms. It protects the harbour behind, but if it fails it does so catastrophically. A salt marsh protects inland areas by acting as a buffer, absorbing storm waves rather than repelling them. It might deteriorate over time but it won’t fail suddenly and disastrously.

An increasing challenge for testers will be to look for information about how systems fail, and test for resilience rather than robustness. Tolerance for failure becomes more important than a vain attempt to prevent failure. This tolerance often requires greater redundancy. Stripping out redundancy and maximizing the efficiency of systems has a downside, as I’ve discovered in my career. Greater efficiency can make applications brittle and inflexible. When problems hit they hit hard and recovery can be difficult.
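
To make the distinction concrete, here is a minimal, entirely hypothetical Python sketch of a resilience-style test (the service, the fallback and all the names are invented for illustration): rather than asserting that a dependency never fails, the test deliberately injects a failure and checks that the system tolerates it and degrades gracefully.

```python
class QuoteService:
    """A toy service that degrades gracefully: if the live pricing
    dependency fails it falls back to a cached price."""
    def __init__(self, pricing_api, cache):
        self.pricing_api = pricing_api
        self.cache = cache

    def quote(self, risk_id):
        try:
            price = self.pricing_api.price(risk_id)
            self.cache[risk_id] = price
            return price, "live"
        except ConnectionError:
            # Tolerate the failure rather than passing it straight to the user.
            return self.cache[risk_id], "cached"


def test_quote_survives_pricing_outage():
    class BrokenApi:
        def price(self, risk_id):
            raise ConnectionError("pricing service down")

    service = QuoteService(BrokenApi(), cache={"R42": 99.50})
    price, source = service.quote("R42")
    # The assertion is about graceful degradation, not the absence of failure.
    assert (price, source) == (99.50, "cached")


test_quote_survives_pricing_outage()
```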

The six years I spent working as an IT auditor had a huge impact on my thinking. I learned that things would go wrong, that systems would fail, and that they’d do so in ways I couldn’t have envisaged. There is nothing like a spell working as an auditor to imbue one with a dark sense of realism about the possibility of perfection, or even adequacy. I ended up like the gloomy old pessimist Eeyore in Winnie the Pooh. When I returned to development work a friend once commented that she could always spot one of my designs. Like Eeyore I couldn’t be certain exactly how things would go wrong, I just knew they would and my experience had taught me where to be wary. I was destined to end up as a tester.

Liz Keogh, in this talk on Safe-to-Fail, makes a similar point.

“Testers are really, really good at spotting failure scenarios… they are awesomely imaginative at calamity… Devs are problem solvers. They spot patterns. Testers spot holes in patterns… I have a theory that other people who are in critical positions, like compliance and governance people are also really good at this”.

Testers should have the creativity to imagine how things might go wrong. In a Complex domain, working with applications that have been developed working with Cynefin, this insight and imagination, the ability to spot potential holes, will be extremely valuable. Testers have to seize that opportunity to remain relevant.

There is an upside to redundancy. If there are different ways of achieving the same ends then that diversity will offer more scope for innovation, for users to learn about the application and how it could be adapted and exploited to do more than the developers had imagined. Again, this is an opportunity for testers. Stakeholders need to know about the application and what it can do. Telling them that the application complied with a set of requirements that might have been of dubious relevance and accuracy just doesn’t cut it.

Conclusion

Conclusion is probably the wrong word. Dave Snowden’s class opened my mind to a wide range of new ideas and avenues to explore. This was just the starting point. These two essays can’t go very far in telling you about Cynefin and how it might apply to software testing. All I can realistically do is make people curious to go and learn more for themselves, to explore in more depth. That is what I will be doing, and as a starter I will be in London at the end of June for the London Tester Gathering. I will be at the workshop “An Introduction to Complexity and Cynefin for Software Testers” being run by Martin Hynie and Ben Kelly, where I hope to discuss Cynefin with fellow testers and explorers.

If you are going to the CAST conference in Nashville in August you will have the chance to hear Dave Snowden giving a keynote speech. He really is worth hearing.

Dave Snowden’s Cynefin masterclass in New York, 2nd May 2017 – part 1

This is part one of a two post series on Cynefin and software testing. I wrote it with the support of the Committee on Standards and Professional Practices of the Association for Software Testing. The posts originally appeared on the AST site.

Introduction

On May 2nd I attended Dave Snowden’s masterclass in New York, “A leader’s framework for decision making: managing complex projects using Cynefin”, at the Test Leadership Congress. For several years I have been following Dave’s work and I was keen to hear him speak in person. Dave is a gifted communicator, but he moves through his material fast, very fast. In a full day class he threw out a huge range of information, insights and arguments. I was writing frantically throughout, capturing key ideas and phrases I could research in detail later.

It was an extremely valuable day. All of it was relevant to software development, and therefore indirectly to testing. However, it would require a small book to do justice to Dave’s ideas. I will restrict myself to two posts in which I will concentrate on a few key themes that struck me as being particularly important to the testing community.

Our worldview matters

We need to understand how the world works or we will fail to understand the problems we face. We won’t recognise what success might look like, nor will we be able to anticipate unacceptable failure till we are beaten over the head, and we will select the wrong techniques to address problems.

“It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.”

Dave used a slide with this quote from Mark Twain. It’s an important point. Software development and testing have been plagued over the years by unquestioned assumptions and beliefs that we were paid well to take for granted, without asking awkward questions, but which just ain’t so. And they’ve got us into endless trouble.

A persistent, damaging feature of software development over the years has been the illusion that it is a neater, more orderly process than it really is. We craved certainty, fondly imagining that if we just put a bit more effort and expertise into the upfront analysis and requirements then good, experienced professionals could predictably develop high quality applications. It hardly ever panned out that way, and the cruel twist was that the people who finally managed to crank out something workable picked up the blame for the lack of perfection.

Fred Brooks made the point superbly in his classic paper, “No Silver Bullet”.

“The truth is, the client does not know what he wants. The client usually does not know what questions must be answered, and he has almost never thought of the problem in the detail necessary for specification. … in planning any software-design activity, it is necessary to allow for an extensive iteration between the client and the designer as part of the system definition.

…… it is really impossible for a client, even working with a software engineer, to specify completely, precisely, and correctly the exact requirements of a modern software product before trying some versions of the product.”

So iteration is required, but that doesn’t mean simply taking a linear process and repeating it. Understanding and applying Cynefin does not mean tackling problems in familiar ways but with a new vocabulary. It means thinking about the world in a different way, drawing on lessons from complexity science, cognitive neuroscience and biological anthropology.

Cynefin and ISO 29119

Cynefin is not based on successful individual cases, or on ideology, or on wishful thinking. Methods that are rooted in successful cases are suspect because of survivorship bias (how many failed projects did the same thing?), and because people do not remember clearly and accurately what they did after the event; they reinterpret their actions depending on the outcome. Cynefin is rooted in science and the way things are, the way systems behave, and the way that people behave. Developing software is an activity carried out by humans, for humans, mostly in social organisations. If we follow methods that are not rooted in reality and science, and which don’t allow for the way people behave, then we will fail.

Dave Snowden often uses the philosophical phrase “a priori”, usually in the sense of saying that something is wrong a priori. A priori knowledge is based on theoretical deduction, or on mathematics, or the logic of the language in which the proposition is stated. We can say that certain things are true or false a priori, without having to refer to experience. Knowledge based on experience is a posteriori.

The distinction is important in the debate over the software testing standard ISO 29119. The ISO standards lobby has not attempted to defend 29119 on either a priori or a posteriori grounds. The standard has its roots in linear, document-driven development methods that were conspicuously unsuccessful. ISO were unable to cite any evidence or experience to justify their approach.

Defenders of the standard, and some neutrals, have argued that critics must examine the detailed content of the standard, which is extremely expensive to purchase, in order to provide meaningful criticism. However, this defence is misconceived because the standard itself is misconceived. The standard’s stated purpose “is to define an internationally agreed set of standards for software testing that can be used by any organization when performing any form of software testing”. If ISO believes that a linear, prescriptive standard like ISO 29119 will apply to “any form of software testing” we can refer to Cynefin and say that they are wrong; we can say so confidently, knowing that our stance is backed by reputable science and theory. ISO is attempting to introduce a practice that might, sometimes at best, be appropriate for the Obvious domain into the Complicated and Complex domains, where it is wildly unsuitable and damaging. ISO is wrong a priori.

What is Cynefin?

The Wikipedia article is worth checking out, not least because Dave Snowden keeps an eye on it. This short video presented by Dave is also helpful.

The Cynefin Framework might look like a quadrant, but it isn’t. It is a collection of five domains that are distinct and clearly defined in principle, but which blur into one another in practice.

In addition to the four domains that look like the cells of a quadrant there is a fifth, in the middle, called Disorder, and this one is crucial to an understanding of the framework and its significance.

Cynefin is not a categorisation model, as would be implied if it were a simple matrix. It is not a matter of dropping data into the framework then cracking on with the work. Cynefin is a framework that is designed to help us make sense of what confronts us, to give us a better understanding of our situation and the approaches that we should take.

The first domain is Obvious, in which there are clear and predictable causes and effects. The second is Complicated, which also has definite causes and effects, but where the connections are not so obvious; expert knowledge and judgement are required.

The third is Complex, where there is no clear cause and effect. We might be able to discern it with hindsight, but that knowledge doesn’t allow us to predict what will happen next; the system adapts continually. Dave Snowden and Mary Boone used a key phrase in their Harvard Business Review article about Cynefin.

”Hindsight does not lead to foresight because the external conditions and systems constantly change.”

The fourth domain is Chaotic. Here, urgent action, rather than reflective analysis, is required. The participants must act, sense feedback and respond. Complex situations might be suited to safe probing, which can teach us more about the problem, but such probing is a luxury in the Chaotic domain.

The appropriate responses in all four of these domains are different. In Obvious, the categories are clearly defined, one simply chooses the right one, and that provides the right route to follow. Best practices are appropriate here.

In the Complicated domain there is no single, right category to choose. There could be several valid options, but an expert can select a good route. There are various good practices, but the idea of a single best practice is misconceived.

In the Complex domain it is essential to probe the problem and learn by trial and error. The practices we might follow will emerge from that learning. In Chaos, as I mentioned, we simply have to start with action, firefighting to stop the situation getting worse. It is helpful to remember that, instead of the everyday definition, chaos in Cynefin terms refers to the concept in physics. Here chaos refers to a system that is so dynamic that minor variations in initial conditions lead to outcomes so dramatically divergent that the system is unpredictable. In some circumstances it makes sense to make a deliberate and temporary move into Chaos to learn new practice. That would require removing constraints and the connections that impose some sort of order.
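
To make the physics sense of “chaos” concrete, here is a short Python sketch, not something from Dave’s class or the Cynefin material, using the logistic map, a standard textbook example of a chaotic system. Two starting values that differ by only one part in a billion soon produce trajectories that bear no resemblance to each other.

    # Minimal illustration of sensitivity to initial conditions, using the
    # logistic map x -> r * x * (1 - x) with r = 4.0, a standard textbook
    # example of chaotic behaviour. Purely illustrative.
    def logistic_trajectory(x0, r=4.0, steps=50):
        xs = [x0]
        for _ in range(steps):
            xs.append(r * xs[-1] * (1 - xs[-1]))
        return xs

    a = logistic_trajectory(0.400000000)
    b = logistic_trajectory(0.400000001)  # differs by one part in a billion

    for step in (0, 10, 20, 30, 40, 50):
        print(f"step {step:2d}: {a[step]:.6f} vs {b[step]:.6f}")
    # After a few dozen iterations the two trajectories have diverged completely,
    # even though their starting points were almost identical.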

The fifth domain is that of Disorder, in the middle of the diagram. This is the default position in a sense. It’s where we find ourselves when we don’t know which domain we should really be in. It’s therefore the normal starting point. The great danger is that we don’t choose the appropriate domain, but simply opt for the one that fits our instincts or our training, or that is aligned with the organisation’s traditions and culture, regardless of the reality.

The only stable domains are Obvious, Complicated and Complex. Chaotic and Disorder are transitional. You don’t (can’t) stay there. Chaotic is transitional because constraints will kick in very quickly, almost as a reflex. Disorder is transitional because you are actually in one of the other domains, but you just don’t know it.

The different domains have blurred edges. In any context there might be elements that fit into different domains if they are looked at independently. That isn’t a flaw with Cynefin. It merely reflects reality. As I said, Cynefin is not a neat categorisation model. It is intended to help us make sense of what we face. If reality is messy and blurred then there’s no point trying to force it into a straitjacket.

Many projects will have elements that are Obvious, that deal with a problem that is well understood, that we have dealt with before and whose solution is familiar and predictable. However, these are not the parts of a project that should shape the approach we take. The parts where the potential value, and the risk, lie are where we are dealing with something we have not done before. Liz Keogh has given many talks and written some very good blogs and articles about applying Cynefin to software development. Check out her work. This video is a good starter.

The boundaries between the domains are therefore fuzzy, but there is one boundary that is fundamentally different from the others; the border between Obvious and Chaotic. This is not really a boundary at all. It is more of a cliff. If you move from Obvious to Chaotic you don’t glide smoothly into a subtly changing landscape. You fall off the cliff.

Within the Obvious domain the area approaching the cliff is the complacent zone. Here, we think we are working in a neat, ordered environment and “we believe our own myths” as Snowden puts it in the video above. The reality is quite different and we are caught totally unaware when we hit a crisis and plunge off the cliff into chaos.

That was a quick skim through Cynefin. However, you shouldn’t think of it as being a static framework. If you are going to apply it usefully you have to understand the dynamics of the framework, and I will return to that in part two.

Auditors and testing – a rant justified by experience

A couple of weeks ago I was drawn into a discussion on Twitter about auditors and testing. At the time I was on holiday in a delightfully faraway part of Galloway, in south west Scotland.

One of the attractions of the cottage where we were staying was that it lacked a mobile (cell) phone signal, never mind internet access. Only when we happened to be in a pub or restaurant could I sneak onto wifi discreetly, without incurring a disapproving look from my wife.

Having worked as both an IT auditor and a tester, and having both strong opinions and an argumentative nature, I had plenty to say on the subject. That had to wait till I returned (via New York, but that’s another subject) when I unleashed a rant on Twitter. Here is that thread, in a more readable format. It might be a rant, but it is based on extensive experience.

Auditors looking for items they can check that MUST be called test cases? That’s a big, flashing warning sign that they have a lousy conceptual grasp of auditing. It’s true, but missing the point, to say that’s old-fashioned. It’s like saying the problem with ISO 29119 is that it’s old-fashioned.

The crucial point is it’s bad, unprofessional auditing. The company that taught me to audit was promoting good auditing 30 years ago. If anyone had remained ignorant of the transformation in software development in the last 30 years you’d call them idiots, not old-fashioned.

A test case is just a name for a receptacle. It’s a bucket of ideas. Who cares about the bucket? Ideas and evidence really matter to auditors, who live and die by evidence; they expect compelling evidence that the auditees have been thinking about what they are doing. A lack of useful evidence showing what testing has been performed, or a lack of thought about how to test, should be certain ways to attract criticism from auditors. The IT auditors’ governance model COBIT5 mentions “test cases” once (in passing). It mentions “ideas” 32 times and “evidence” 16 times.

COBIT5 isn’t just about testing of course. Its principles apply across the whole range of IT, and testing is no exception. Auditors should expect testers to have:

  • a clear vision or strategy of how testing should be performed in their organisation,
  • a clear (but not necessarily detailed) plan for testing the product,
  • relevant, contemporary evidence that justifies and leads inescapably to the conclusions, lessons and insights that the testers derived and reported from their testing.

That’s what auditors should expect. Some (or many?) organisations are locked into a pattern of low quality and low value auditing. They define auditing as brainless compliance checking that is performed by low quality staff who don’t understand what they’re doing. Their work is worthless. As a result audit is held in low esteem in the organisation. Smart people don’t want to work there. Therefore audit must be defined in such a way that low quality staff are able to carry it out.

This is inexcusable. At best it is negligence. Maintaining that model of auditing requires willful ignorance of what the audit profession stipulates. It is damaging and contributes towards the creation of a dysfunctional culture. Nevertheless it is cheap and ensures there are no good auditors who might pose uncomfortable, challenging questions to senior managers.

However, this doesn’t mean there are never times when auditors do need to see test cases. If a contract has been stupidly written so that test cases must be produced and visible then there’s no wriggle room. It’s just the same (and just as stupid) as if the contract says testers must wear pink shirts. It might be stupid but it is a contractual deliverable; auditors will want to see proof of compliance. As Griffin Jones pointed out on seeing my tweet, “often (the contract) is stupidly written – thus the need to get involved with the contracting organization. The problem is bigger than test or SW dev”.

I fully agree with Griffin. Testers should get involved in contractual discussions that will influence their work, in order to anticipate and head off unhelpful contractual terms.

I would add that testers should ask to see the original contract. Contractual terms are sometimes misinterpreted as they are passed through the organisation to the testers. It might be possible to produce the required evidence by smarter means.

Apart from such tiresome contractual requirements, demanding to see “test cases” is a classic case of confusing form and content. It’s unprofessional. That’s not just my opinion; it’s not novel or radical. It’s simply orthodox, professional opinion. Anyone who says otherwise is clueless or bullshitting. Either way they must be resisted. Clueless bullshitters can enjoy good, lucrative careers, but do huge damage. I’ve no respect for them.

The US Food and Drug Administration’s “General Principles of Software Validation” do pose a problem. They date back to 1997, updated in 2002. They are creakily old. They mention test cases many times, but they were written when it was assumed that testing meant writing test cases. The term seems to be used as jargon for tests. If testing satisfies FDA criteria then there’s no obvious reason why you can’t just call planned tests “test cases”.

There’s no requirement to produce test scripts as well as test cases, but expected results with objective pass/fail criteria are required. That doesn’t, and mustn’t, mean testers should be looking only for the expected results. The underlying principle is that compliance should follow the “least burdensome” approach and the FDA do say that they are open to considering alternative approaches to comply with the requirements in a way that is less burdensome.
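
If an organisation does need to show planned tests with expected results and objective pass/fail criteria, one lightweight way to do that, sketched below in Python with pytest purely as an illustration and emphatically not an FDA-endorsed format, is to let automated checks carry the test idea, the expected result and the pass/fail criterion themselves. The calculate_bolus function and the values are invented for the example.

    # A hypothetical sketch using pytest: the "test case" is only a container;
    # the idea, the expected result and the objective pass/fail criterion are
    # what matter. calculate_bolus is an invented example function.
    import pytest

    def calculate_bolus(carbs_grams, carb_ratio):
        """Hypothetical insulin bolus calculation, used only for illustration."""
        return round(carbs_grams / carb_ratio, 1)

    @pytest.mark.parametrize(
        "carbs, ratio, expected",
        [
            (60, 10, 6.0),  # routine meal: expected result stated up front
            (0, 10, 0.0),   # boundary: no carbohydrate means no bolus
        ],
    )
    def test_bolus_matches_expected_dose(carbs, ratio, expected):
        # Objective pass/fail criterion: the assertion either holds or it does not,
        # and the parameters record the test idea alongside the expected result.
        assert calculate_bolus(carbs, ratio) == expected

The point is not the format; any record that captures the idea, the expected result and the outcome, and that can be produced as evidence, serves the same purpose.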

Further, the FDA does not have a problem with Agile development (PDF, opens in new tab), and they also do approve of exploratory testing, as explained by Michael Bolton and James Bach.

Frozen in time – grammar and testing standards

This recent tweet by Tyler Hayes caught my eye. “If you build software you’re an anthropologist whether you like it or not.”

It’s an interesting point, and it’s relevant on more than one level. By and large software is developed by people and for people. That is a statement of the obvious, but developers and testers have generally been reluctant to take on board the full implications. This isn’t a simple point about usability. The software we build is shaped by many assumptions about the users, and how they live and work. In turn, the software can reinforce existing structures and practices. Testers should think about these issues if they’re to provide useful findings to the people who matter. You can’t learn everything you need to know from a requirements specification. This takes us deep into anthropological territory.

What is anthropology?

Social anthropology is defined by University College London as follows.

Social Anthropology is the comparative study of the ways in which people live in different social and cultural settings across the globe. Societies vary enormously in how they organise themselves, the cultural practices in which they engage, as well as their religious, political and economic arrangements.

We build software in a social, economic and cultural context that is shaped by myriad factors, which aren’t necessarily conducive to good software, or a happy experience for the developers and testers, never mind the users. I’ve touched on this before in “Teddy Bear Methods”.

There is much that we can learn from anthropology, and not just to help us understand what we see when we look out at the users and the wider world. I’ve long thought that the software development and testing community would make a fascinating subject for anthropologists.

Bureaucracy, grammar and deference to authority

I recently read “The Utopia of Rules – On Technology, Stupidity, and the Secret Joys of Bureaucracy” by the anthropologist David Graeber. Graeber has many fascinating insights and arguments about how organisations work, and why people are drawn to bureaucracy. One of his arguments is that regulation is imposed and formalised to try and remove arbitrary, random behaviour in organisations. That’s a huge simplification, but there’s not room here to do Graeber’s argument justice. One passage in particular caught my eye.

People do not invent languages by writing grammars, they write grammars — at least, the first grammars to be written for any given language — by observing the tacit, largely unconscious, rules that people seem to be applying when they speak. Yet once a book exists, and especially once it is employed in schoolrooms, people feel that the rules are not just descriptions of how people do talk, but prescriptions for how they should talk.

It’s easy to observe this phenomenon in places where grammars were only written recently. In many places in the world, the first grammars and dictionaries were created by Christian missionaries in the nineteenth or even twentieth century, intent on translating the Bible and other sacred texts into what had been unwritten languages. For instance, the first grammar for Malagasy, the language spoken in Madagascar, was written in the 1810s and ’20s. Of course, language is changing all the time, so the Malagasy spoken language — even its grammar — is in many ways quite different than it was two hundred years ago. However, since everyone learns the grammar in school, if you point this out, people will automatically say that speakers nowadays are simply making mistakes, not following the rules correctly. It never seems to occur to anyone — until you point it out — that had the missionaries come and written their books two hundred years later, current usages would be considered the only correct ones, and anyone speaking as they had two hundred years ago would themselves be assumed to be in error.

In fact, I found this attitude made it extremely difficult to learn how to speak colloquial Malagasy. Even when I hired native speakers, say, students at the university, to give me lessons, they would teach me how to speak nineteenth-century Malagasy as it was taught in school. As my proficiency improved, I began noticing that the way they talked to each other was nothing like the way they were teaching me to speak. But when I asked them about grammatical forms they used that weren’t in the books, they’d just shrug them off, and say, “Oh, that’s just slang, don’t say that.”

…The Malagasy attitudes towards rules of grammar clearly have… everything to do with a distaste for arbitrariness itself — a distaste which leads to an unthinking acceptance of authority in its most formal, institutional form.

Searching for the “correct” way to develop software

Graeber’s phrase “distaste for arbitrariness itself” reminded me of the history of software development. In the 1960s and 70s academics and theorists agonised over the nature of development, trying to discover and articulate what it should be. Their approach was fundamentally mistaken. There are dreadful ways, and there are better ways, to develop software, but there is no natural, correct way that results in perfect software. The researchers assumed that there was and went hunting for it. Instead of seeking understanding they carried their assumptions about what the answer might be into their studies and went looking for confirmation.

They were trying to understand how the organisational machine worked and looked for mechanical processes. I use the word “machine” carefully, not as a casual metaphor. There really was an assumption that organisations were, in effect, machines. They were regarded as first-order cybernetic entities whose behaviour would not vary depending on whether they were being observed. To a former auditor like myself this is a ludicrous assumption. The act of auditing an organisation changes the way that people behave. Even the knowledge that an audit may occur will shape behaviour, and not necessarily for the better (see my article “Cynefin, testing and auditing”). You cannot do the job well without understanding that. Second-order cybernetics does recognise this crucial problem and treats observers as participants in the system.

So linear, sequential development made sense. The different phases passing outputs along the production line fitted their conception of the organisation as a machine. Iterative, incremental development looked messy and immature; it was just wrong as far as the researchers were concerned. Feeling one’s way to a solution seemed random, unsystematic – arbitrary.

Development is a difficult and complex job; people will tend to follow methods that make the job feel easier. If managers are struggling with the complexities of managing large projects they are more likely to choose linear, sequential methods that make the job of management easier, or at least less stressful. So when researchers saw development being carried out that way they were observing human behaviour, not a machine operating.

Doubts about this approach were quashed by pointing out that if organisations weren’t quite the neat machines they should be, this would be solved by the rapid advance in the use of computers. This argument looks suspiciously circular, because the conclusion that in future organisations would be fully machine-like rests on the unproven premise that software development is a mechanical process which is not subject to human variability when performed properly.

Eliminating “arbitrariness” and ignoring the human element

This might all have been no more than an interesting academic sideline, but it fed back into software development. By the 1970s, when these studies into the nature of development were being carried out, organisations were moving towards increasingly formalised development methods. There was increasing pressure to adopt such methods. Not only were they attractive to managers, but the use of more formal methods also provided a competitive advantage. ISO certification and CMMI accreditation were increasingly seen as a way to demonstrate that organisations produced high quality software. The evidence may have been weak, but it seemed a plausible claim. These initiatives required formal processes. The sellers of formal methods were happy to look for and cite any intellectual justification for their products. So formal linear methods were underpinned by academic work that assumed that formal linear methods were correct. This was the way that responsible, professional software development was performed. ISO standards were built on this assumption.

If you are trying to define the nature of development you must acknowledge that it is a human activity, carried out by and for humans. These studies about the nature of development were essentially anthropological exercises, but the researchers assumed they were observing and taking apart a machine.

As with the missionaries who were codifying grammar, the point in time when these researchers were working shaped the result. If they had carried out their studies earlier in the history of software development they might have struggled to find credible examples of formalised, linear development. In the 1950s software development was an esoteric activity in which the developers could call the shots. Twenty years later it was part of the corporate bureaucracy and iterative, incremental development was sidelined. If the studies had been carried out a few decades further on then it would have been impossible to ignore Agile.

As it transpired, formal methods, CMM/CMMI and the first ISO standards concerning development and testing were all creatures of that era when organisations and their activities were seriously regarded as mechanical. Like the early Malagasy grammar books they codified and fossilised a particular, flawed approach at a particular time for an activity that was changing rapidly. ISO 29119 is merely an updated version of that dated approach to testing. It is rooted in a yearning for bureaucratic certainty, a reluctance to accept that ultimately good testing is dependent not on documentation, but on that most irrational, variable and unpredictable of creatures – the human who is working in a culture shaped by humans. Anthropology has much to teach us.

Further reading

That is the end of the essay, but there is a vast amount of material you could read about attempts to understand and define the nature of software development and of organisations. Here is a small selection.

Brian Fitzgerald has written some very interesting articles about the history of development. I recommend in particular “The systems development dilemma: whether to adopt formalised systems development methodologies or not?” (PDF, opens in new tab).

Agneta Olerup wrote this rather heavyweight study of what she calls the Langeforsian approach to information systems design. Börje Langefors was a highly influential advocate of the mechanical, scientific approach to software development. Langefors’ Wikipedia entry describes him as “one of those who made systems development a science”.

This paper, by Francis Heylighen and Cliff Joslyn, gives a good introduction to first and second order cybernetics (PDF, opens in new tab), including a useful warning about the distinction between models and the entities that they attempt to represent.

All our knowledge of systems is mediated by our simplified representations—or models—of them, which necessarily ignore those aspects of the system which are irrelevant to the purposes for which the model is constructed. Thus the properties of the systems themselves must be distinguished from those of their models, which depend on us as their creators. An engineer working with a mechanical system, on the other hand, almost always knows its internal structure and behavior to a high degree of accuracy, and therefore tends to de-emphasize the system/model distinction, acting as if the model is the system.

Moreover, such an engineer, scientist, or “first-order” cyberneticist, will study a system as if it were a passive, objectively given “thing”, that can be freely observed, manipulated, and taken apart. A second-order cyberneticist working with an organism or social system, on the other hand, recognizes that system as an agent in its own right, interacting with another agent, the observer.

Finally, I recommend a fascinating article in the IEEE’s Computer magazine by Craig Larman and Victor Basili, “Iterative and incremental development: a brief history” (PDF, opens in new tab). Larman and Basili argue that iterative and incremental development is not a modern practice, but has been carried out since the 1950s, though they do acknowledge that it was subordinate to the linear Waterfall in the 1970s and 80s. There is a particularly interesting contribution from Gerald Weinberg, a personal communication to the authors, in which he describes how he and his colleagues developed software in the 1950s. The techniques they followed were “indistinguishable from XP”.

Standards – a charming illusion of action

The other day I posted an article I’d written that appeared on the uTest blog a few weeks ago. It was a follow up to an article I wrote last year about ISO 29119. Pmhut (the Project Management Hut website) provided an interesting comment.

”…are you sure that the ISO standards will be really enforced on testing – notably if they don’t really work? After all, lawyers want to get paid and clients want their projects done (regardless of how big the clients are).”

Well, as I answered, whether or not ISO 29119 works is, in a sense, irrelevant. Whether or not it is adopted and enforced will not depend on its value or efficacy. ISO 29119 might go against the grain of good software development and testing, but it is very much aligned with a hugely pervasive trend in bureaucratic, corporate life.

I pointed the commenter to an article I wrote on “Teddy Bear Methods”. People cling to methods not because they work, but because they gain comfort from doing so. That is the only way they can deal with difficult, stressful jobs in messy and complex environments. I could also have pointed to this article “Why do we think we’re different?”, in which I talk about goal displacement, our tendency to focus on what we can manage while losing sight of what we’re supposed to be managing.

A lesson from Afghanistan

I was mulling over this when I started to read a fascinating-looking book I was given at Christmas: “Heirs to Forgotten Kingdoms” by Gerard Russell, a deep specialist in the Middle East and a fluent Arabic and Farsi speaker.

The book is about minority religions in the Middle East. Russell is a former diplomat in the British Foreign Office. The foreword was by Rory Stewart, the British Conservative MP. Stewart was writing about his lack of surprise that Russell, a man deeply immersed in the culture of the region, had left the diplomatic service, then added:

”Foreign services and policy makers now want ‘management competency’ – slick and articulate plans, not nuance, deep knowledge, and complexity.”

That sentence resonated with me, and reminded me of a blistering passage from Stewart’s great book “The Places in Between”, his account of walking through the mountains of Afghanistan in early 2002 in the immediate aftermath of the expulsion of the Taliban and the NATO intervention.

Rory Stewart is a fascinating character, far removed from the modern identikit politician. The book is almost entirely a dispassionate account of his adventures and the people whom he met and who provided him with hospitality. Towards the end he lets rip, giving his brutally honest and well-informed perspective of the inadequacies of the western, bureaucratic, managerial approach to building a democratic state where none had previously existed.

It’s worth quoting at some length.

“I now had half a dozen friends working in embassies, thinktanks, international development agencies, the UN and the Afghan government, controlling projects worth millions of dollars. A year before they had been in Kosovo or East Timor and in a year’s time they would have been moved to Iraq or Washington or New York.

Their objective was (to quote the United Nations Assistance Mission for Afghanistan) ‘The creation of a centralised, broad-based, multi-ethnic government committed to democracy, human rights and the rule of law’. They worked twelve- or fourteen- hour days, drafting documents for heavily-funded initiatives on ‘democratisation’, ‘enhancing capacity’, ‘gender’, ‘sustainable development,’ ‘skills training’ or ‘protection issues’. They were mostly in their late twenties or early thirties, with at least two degrees – often in international law, economics or development. They came from middle class backgrounds in Western countries and in the evenings they dined with each other and swapped anecdotes about corruption in the Government and the incompetence of the United Nations. They rarely drove their 4WDs outside Kabul because they were forbidden to do so by their security advisers. There were people who were experienced and well informed about conditions in rural areas of Afghanistan. But such people were barely fifty individuals out of many thousands. Most of the policy makers knew next to nothing about the villages where 90% of the population of Afghanistan lived…

Their policy makers did not have the time, structures or resources for a serious study of an alien culture. They justified their lack of knowledge and experience by focusing on poverty and implying that dramatic cultural differences did not exist. They acted as though villagers were interested in all the priorities of international organisations, even when they were mutually contradictory…

Critics have accused this new breed of administrators of neo-colonialism. But in fact their approach is not that of a nineteenth-century colonial officer. Colonial administrations may have been racist and exploitative but they did at least work seriously at the business of understanding the people they were governing. They recruited people prepared to spend their entire careers in dangerous provinces of a single alien nation. They invested in teaching administrators and military officers the local language…

Post-conflict experts have got the prestige without the effort or stigma of imperialism. Their implicit denial of the difference between cultures is the new mass brand of international intervention. Their policy fails but no one notices. There are no credible monitoring bodies and there is no one to take formal responsibility. Individual officers are never in any one place and rarely in any one organisation long enough to be adequately assessed. The colonial enterprise could be judged by the security or revenue it delivered, but neo-colonialists have no such performance criteria. In fact their very uselessness benefits them. By avoiding any serious action or judgement they, unlike their colonial predecessors, are able to escape accusations of racism, exploitation and oppression.

Perhaps it is because no one requires more than a charming illusion of action in the developing world. If the policy makers know little about the Afghans, the public knows even less, and few care about policy failure when the effects are felt only in Afghanistan.”

Stewart’s experience and insight, backed up by the recent history of Afghanistan, allow him to present an irrefutable case. Yet, in the eyes of pretty much everyone who matters he is wrong. Governments and the military are prepared to ignore the evidence and place their trust in irrelevant and failed techniques rather than confront the awful truth; they don’t know what they’re doing and they can’t know the answers.

Vast sums of money, and millions of lives are at stake. Yet very smart and experienced people will cling on to things that don’t work, and will repeat their mistakes in the future. Stewart, meanwhile, is very unlikely to be allowed anywhere near the levers of power in the United Kingdom. Being right isn’t necessarily a great career move.

Deep knowledge, nuance and complexity

I’m conscious that I’m mixing up quite different subjects here. Software development and testing are very different activities from state building. However, both are complex and difficult. Governments fail repeatedly at something as important and high-profile as constructing new, democratic states, and do so without feeling the need to reconsider their approach. If that can happen in the glare of publicity, is it likely that corporations will refrain from adopting and enforcing standards just because they don’t work? Whether or not they work barely matters. Such approaches fit the mindset and culture of many organisations, especially large bureaucracies, and once such approaches are adopted it is very difficult to persuade organisations to abandon them.

Any approach to testing that is based on standardisation is doomed to fail unless you define success in a way that is consistent with the flawed assumptions of the standardisation. What’s the answer? Not adopting standards that don’t work is an obvious start, but that doesn’t take you very far. You’ve got to acknowledge those things that Stewart referred to in his foreword to Gerard Russell’s book; answers aren’t easy, they require deep knowledge, an understanding of nuance and an acceptance of complexity.

A video worth watching

Finally, I’d strongly recommend this video of Rory Stewart being interviewed by Harry Kreisler of the University of California about his experiences and the problems I’ve been discussing. I’ve marked the parts I found most interesting.

34 minutes; Stewart is asked about applying abstract ideas in practice.

40:20; Stewart talks about a modernist approach of applying measurement, metrics and standardisation in contexts where they are irrelevant.

47:05; Harry Kreisler and then Stewart talk about participants failing to spot the obvious, that their efforts are futile.

49:33; Stewart says his Harvard students regarded him as a colourful contrarian. They believed that all Afghanistan needed was a new plan and new resources.

Service Virtualization interview about usability

This interview with Service Virtualization appeared in January 2015. Initially when George Lawton approached me I wasn’t enthusiastic. I didn’t think I would have much to say. However, the questions set me thinking, and I felt they were relevant to my experience so I was happy to take part. It gave me something to do while I was waiting to fly back from EuroSTAR in Dublin!

How does usability relate to the notion of the purpose of a software project?

When I started in IT over 30 years ago I never heard the word usability. It was “user friendliness”, but that was just a nice thing to have. It was nice if your manager was friendly, but that was incidental to whether he was actually good at the job. Likewise, user friendliness was incidental. If everything else was ok then you could worry about that, but no-one was going to spend time or money, or sacrifice any functionality just to make the application user friendly. And what did “user friendly” mean anyway? “Who knows? Who cares? We’ve got serious work to do. Forget about that touchy-feely stuff.”

The purpose of software development was to save money by automating clerical routines. Any online part of the system was a mildly anomalous relic of the past. It was just a way of getting the data into the system so the real work could be done. Ok, that’s an over-simplification, but I think there’s enough truth in it to illustrate why developers just didn’t much care about the users and their experience. Development moved on from that to changing the business, rather than merely changing the business’s bureaucracy, but it took a long time for these attitudes to shift.

The internet revolution turned everything upside down. Users are no longer employees who have to put up with whatever they’re given. They are more likely to be customers. They are ruthless and rightly so. Is your website confusing? Too slow to load? Your customers have gone to your rivals before you’ve even got anywhere near their credit card number.

The lesson that’s been getting hammered into the heads of software engineers over the last decade or so is that usability isn’t an extra. I hate the way that we traditionally called it a “non-functional requirement”, or one of the “quality criteria”. Usability is so important and integral to every product that telling developers that they’ve got to remember it is like telling drivers they’ve got to remember to use the steering wheel and the brakes. If they’re not doing these things as a matter of course they shouldn’t be allowed out in public. Usability has to be designed in from the very start. It can’t be considered separately.

What are the main problems in specifying for and designing for software usability?

Well, who’s using the application? Where are they? What is the platform? What else are they doing? Why are they using the application? Do they have an alternative to using your application, and if so, how do you keep them with yours? All these things can affect decisions you take that are going to have a massive impact on usability.

It’s payback time for software engineering. In the olden days it would have been easy to answer these questions, but we didn’t care. Now we have to care, and it’s all got horribly difficult.

These questions require serious research plus the experience and nous to make sound judgements with imperfect evidence.

In what ways do organisations lose track of the usability across the software development lifecycle?

I’ve already hinted at a major reason. Treating usability as a non-functional requirement or quality criterion is the wrong approach. That segregates the issue. It’s treated as being like the other quality criteria, the “…ities” like security, maintainability, portability, reliability. It creates the delusion that the core function is of primary importance and the other criteria can be tackled separately, even bolted on afterwards.

Lewis & Rieman came out with a great phrase fully 20 years ago to describe that mindset. They called it the peanut butter theory of usability. You built the application, and then at the end you smeared a nice interface over the top, like a layer of peanut butter (PDF, opens in new tab).

“Usability is seen as a spread that can be smeared over any design, however dreadful, with good results if the spread is thick enough. If the underlying functionality is confusing, then spread a graphical user interface on it. … If the user interface still has some problems, smear some manuals over it. If the manuals are still deficient, smear on some training which you force users to take.”

Of course they were talking specifically about the idea that usability was a matter of getting the interface right, and that it could be developed separately from the main application. However, this was an incredibly damaging fallacy amongst usability specialists in the 80s and 90s. There was a huge effort by experts like Hartson & Hix, Edmonds, and Green to try to justify this idea. Perhaps the arrival of Object Oriented technology contributed towards the confusion. A low level of coupling, so that different parts of the system are independent of each other, is a good thing. I wonder if that lured usability professionals into believing what they wanted to believe, that they could be independent from the grubby developers.

Usability professionals tried to persuade themselves that they could operate a separate development lifecycle that would liberate them from the constraints and compromises that would be inevitable if they were fully integrated into development projects. The fallacy was flawed conceptually and architecturally. However, it was also a politically disastrous approach. The usability people made themselves even less visible, and were ignored at a time when they really needed to be getting more involved at the heart of the development process.

As I’ve explained, the developers were only too happy to ignore the usability people. They were following methods and lifecycles that couldn’t easily accommodate usability.

How can organisations incorporate the idea of usability engineering into the software development and testing process?

There aren’t any right answers, certainly none that will guarantee success. However, there are plenty of wrong answers. Historically in software development we’ve kidded ourselves thinking that the next fad, whether Structured Methods, Agile, CMMi or whatever, will transform us into rigorous, respected professionals who can craft high quality applications. Now some (like Structured Methods) suck, while others (like Agile) are far more positive, but the uncomfortable truth is that it’s all hard and the most important thing is our attitude. We have to acknowledge that development is inherently very difficult. Providing good UX is even harder and it’s not going to happen organically as a by-product of some over-arching transformation of the way we develop. We have to consciously work at it.

Whatever the answer is for any particular organisation it has to incorporate UX at the very heart of the process, from the start. Iteration and prototyping are both crucial. One of the few fundamental truths of development is that users can’t know what they want and like till they’ve seen what is possible and what might be provided.

Even before the first build there should have been some attempt to understand the users and how they might be using the proposed product. There should be walkthroughs of the proposed design. It’s important to get UX professionals involved, if at all possible. I think developers have advanced to the point that they are less likely to get it horribly wrong, but actually getting it right, and delivering good UX is asking too much. For that I think you need the professionals.

I do think that Agile is much better suited to producing good UX than traditional methods, but there are still dangers. A big one is that many Agile developers are understandably sceptical about anything that smells of Big Up-Front Analysis and Design. It’s possible to strike a balance and learn about your users and their needs without committing to detailed functional requirements and design.

How can usability relate to the notion of testable hypothesis that can lead to better software?

Usability and testability go together naturally. They’re also consistent with good development practice. I’ve worked on, or closely observed, many applications where the design had been fixed and the build had been completed before anyone realised that there were serious usability problems, or that it would be extremely difficult to detect and isolate defects, or that there would be serious performance issues arising from the architectural choices that had been made.

We need to learn from work that’s been done with complexity theory and organisation theory. Developing software is mostly a complex activity, in the sense that there are rarely predictable causes and effects. Good outcomes emerge from trialling possible solutions. These possibilities aren’t just guesswork. They’re based on experience, skill, knowledge of the users. But that initial knowledge can’t tell you the solution, because trying different options changes your understanding of the problem. Indeed it changes the problem. The trials give you more knowledge about what will work. So you have to create further opportunities that will allow you to exploit that knowledge. It’s a delusion that you can get it right first time just by running through a sequential process. It would help if people thought of good software as being grown rather than built.