Do standards keep testers in the kindergarten? (2009)

Testing ExperienceThis article appeared in the December 2009 edition of Testing Experience magazine, which is no longer published. I’m moving the article onto my blog from my website, which will be decommissioned soon.

Normally when I re-post old articles I provide a warning about them being dated. This one was written in November 2009 but I think that its arguments are still valid. It is only dated in the sense that it doesn’t mention ISO 29119, the current ISO software testing standard, which was released in 2013. This article shows why I was dismayed when ISO 29119 arrived on the scene. I thought that prescriptive testing standards, such as IEEE 829, had had their day. They had failed and we had moved on.

The references in the article were all structured for a paper magazine. There are no hyperlinks and I have not tried to recreate them and check out whether they still work.
kindergarten

The article

Discussion of standards usually starts from the premise that they are intrinsically a good thing, and the debate then moves on to consider what form they should take and how detailed they should be.

Too often sceptics are marginalised. The presumption is that standards are good and beneficial. Those who are opposed to them appear suspect, even unprofessional.

I believe that although the content of standards for software development and testing can be valuable, especially within individual organisations, I do not believe that they should be regarded as generic “standards” for the whole profession. Turning useful guidelines into standards suggests that they should be mandatory.

My particular concern is that the IEEE 829 “Standard for Software and System Test Documentation”, and the many document templates derived from it, encourage a safety first approach to documentation, with testers documenting plans and scripts in slavish detail.

They do so not because the project genuinely requires it, but because they have been encouraged to equate documentation with quality, and they fear that they will look unprofessional and irresponsible in a subsequent review or audit. I think these fears are ungrounded and I will explain why.

A sensible debate about the value of standards must start with a look at what standards are, and the benefits that they bring in general, and specifically to testing.

Often discussion becomes confused because justification for applying standards in one context is transferred to a quite different context without any acknowledgement that the standards and the justification may no longer be relevant in the new context.

Standards can be internal to a particular organisation or they can be external standards attempting to introduce consistency across an industry, country or throughout the world.

I’m not going to discuss legal requirements enforcing minimum standards of safety, such as Health and Safety legislation, or the requirements of the US Food & Drug Administration. That’s the the law, and it’s not negotiable.

The justification for technical and product standards is clear. Technical standards introduce consistency, common protocols and terminology. They allow people, services and technology to be connected. Product standards protect consumers and make it easier for them to distinguish cheap, poor quality goods from more expensive but better quality competition.

Standards therefore bring information and mobility to the market and thus have huge economic benefits.

It is difficult to see where standards for software development or testing fit into this. To a limited extent they are technical standards, but only so far as they define the terminology, and that is a somewhat incidental role.

They appear superficially similar to product standards, but software development is not a manufacturing process, and buyers of applications are not in the same position as consumers choosing between rival, inter-changeable products.

Are software development standards more like the standards issued by professional bodies? Again, there’s a superficial resemblance. However, standards such as Generally Accepted Accounting Principles (Generally Accepted Accounting Practice in the UK) are backed up by company law and have a force no-one could dream of applying to software development.

Similarly, standards of professional practice and competence in the professions are strictly enforced and failure to meet these standards is punished.

Where does that leave software development standards? I do believe that they are valuable, but not as standards.

Susan Land gave a good definition and justification for standards in the context of software engineering in her book “Jumpstart CMM-CMMI Software Process Improvements – using IEEE software engineering standards”. [1]

“Standards are consensus-based documents that codify best practice. Consensus-based standards have seven essential attributes that aid in process engineering. They;

  1. Represent the collected experience of others who have been down the same road.
  2. Tell in detail what it means to perform a certain activity.
  3. Help to assure that two parties attach the same meaning to an engineering activity.
  4. Can be attached to or referenced by contracts.
  5. Improve the product.
  6. Protect the business and the buyer.
  7. Increase professional discipline.” (List sequence re-ordered from original).

The first four justifications are for standards in a descriptive form, to aid communication. Standards of this type would have a broader remit than the technical standards I referred to, and they would be guidelines rather than prescriptive. These justifications are not controversial, although the fourth has interesting implications that I will return to later.

The last three justifications hint at compulsion. These are valid justifications, but they are for standards in a prescriptive form and I believe that these justifications should be heavily qualified in the context of testing.

I believe that where testing standards have value they should be advisory, and that the word “standard” is unhelpful. “Standards” implies that they should be mandatory, or that they should at least be considered a level of best practice to which all practitioners should aspire.

Is the idea of “best practice” useful?

I don’t believe that software development standards, specifically the IEEE series, should be mandatory, or that they can be considered best practice. Their value is as guidelines, which would be a far more accurate and constructive term for them.

I do believe that there is a role for mandatory standards in software development. The time-wasting shambles that is created if people don’t follow file naming conventions is just one example. Secure coding standards that tell programmers about security flaws that they must not introduce into their programs are also a good example of standards that should be mandatory.

However, these are local, site-specific standards. They are about consistency, security and good housekeeping, rather than attempting to define an over-arching vision of “best practice”.

Testing standards should be treated as guidelines, practices that experienced practitioners would regard as generally sound and which should be understood and regarded as the default approach by inexperienced staff.

Making these practices mandatory “standards”, as if they were akin to technical or product standards and the best approach in any situation, will never ensure that experienced staff do a better job, and will often ensure they do a worse job than if they’d been allowed to use their own judgement.

Testing consultant Ben Simo, has clear views on the notion of best practice. He told me;

“‘Best’ only has meaning in context. And even in a narrow context, what we think is best now may not really be the best.

In practice, ‘best practice’ often seems to be either something that once worked somewhere else, or a technical process required to make a computer system do a task. I like for words to mean something. If it isn’t really best, let’s not call it best.

In my experience, things called best practices are falsifiable as not being best, or even good, in common contexts. I like guidelines that help people do their work. The word ‘guideline’ doesn’t imply a command. Guidelines can help set some parameters around what and how to do work and still give the worker the freedom to deviate from the guidelines when it makes sense.”

Rather than tie people’s hands and minds with standards and best practices, I like to use guidelines that help people think and communicate lessons learned – allowing the more experienced to share some of their wisdom with the novices.”

Such views cannot be dismissed as the musings of maverick testers who can’t abide the discipline and order that professional software development and testing require.

Ben is the President of the Association of Software Testing. His comments will be supported by many testers who see how it matches their own experience. Also, there has been some interesting academic work that justify such scepticism about standards. Interestingly, it has not come from orthodox IT academics.

Lloyd Roden drew on the work of the Dreyfus brothers as he presented a powerful argument against the idea of “best practice” at Starwest 2009 and the TestNet Najaarsevent. Hubert Dreyfus is a philosopher and psychologist and Stuart Dreyfus works in the fields of industrial engineering and artificial intelligence.

In 1980 they wrote an influential paper that described how people pass through five levels of competence as they move from novice to expert status, and analysed how rules and guidelines helped them along the way. The five level of the Dreyfus Model of Skills Acquisition can be summarised as follows.

  1. Novices require rules that can be applied in narrowly defined situations, free of the wider context.
  2. Advanced beginners can work with guidelines that are less rigid than the rules that novices require.
  3. Competent practitioners understand the plan and goals, and can evaluate alternative ways to reach the goal.
  4. Proficient practitioners have sufficient experience to foresee the likely result of different approaches and can predict what is likely to be the best outcome.
  5. Experts can intuitively see the best approach. Their vast experience and skill mean that rules and guidelines have no practical value.

For novices the context of the problem presents potentially confusing complications. Rules provide clarity. For experts, understanding the context is crucial and rules are at best an irrelevant hindrance.

Roden argued that we should challenge any references to “best practices”. We should talk about good practices instead, and know when and when not to apply them. He argued that imposing “best practice” on experienced professionals stifles creativity, frustrates the best people and can prompt them to leave.

However, the problem is not simply a matter of “rules for beginners, no rules for experts”. Rules can have unintended consequences, even for beginners.

Chris Atherton, a senior lecturer in psychology at the University of Central Lancashire, made an interesting point in a general, anecdotal discussion about the ways in which learners relate to rules.

“The trouble with rules is that people cling to them for reassurance, and what was originally intended as a guideline quickly becomes a noose.

The issue of rules being constrictive or restrictive to experienced professionals is a really interesting one, because I also see it at the opposite end of the scale, among beginners.”

Obviously the key difference is that beginners do need some kind of structural scaffold or support; but I think we often fail to acknowledge that the nature of that early support can seriously constrain the possibilities apparent to a beginner, and restrict their later development.”

The issue of whether rules can hinder the development of beginners has significant implications for the way our profession structures its processes. Looking back at work I did at the turn of the decade improving testing processes for an organisation that was aiming for CMMI level 3, I worry about the effect it had.

Independent professional testing was a novelty for this client and the testers were very inexperienced. We did the job to the best of our ability at the time, and our processes were certainly considered best practice by my employers and the client.

The trouble is that people can learn, change and grow faster than strict processes adapt. A year later and I’d have done it better. Two years later, it would have been different and better, and so on.

Meanwhile, the testers would have been gaining in experience and confidence, but the processes I left behind were set in tablets of stone.

As Ben Simo put it; “if an organisation is at a level less than the intent of level 5, CMM seems to often lock in ignorance that existed when the process was created”.

CMMI has its merits but also has dangers. Continuous process improvement is at its heart, but these are incremental advances and refinements in response to analysis of metrics.

Step changes or significant changes in response to a new problem don’t fit comfortably with that approach. Beginners advance from the first stage of the Dreyfus Model, but the context they come to know and accept is one of rigid processes and rules.

Rules, mandatory standards and inflexible processes can hinder the development of beginners. Rigid standards don’t promote quality. They can have the opposite effect if they keep testers in the kindergarten.

IEEE829 and the motivation behind documentation

One could argue that standards do not have to be mandatory. Software developers are pragmatic, and understand when standards should be mandatory and when they should be discretionary. That is true, but the problem is that the word “standards” strongly implies compulsion. That is the interpretation that most outsiders would place on the word.

People do act on the assumption that the standard should be mandatory, and then regard non-compliance as a failure, deviation or problem. These people include accountants and lawyers, and perhaps most significantly, auditors.

My particular concern is the effect of IEEE 829 testing documentation standard. I wonder if much more than 1% of testers have ever seen a copy of the standard. However, much of its content is very familiar, and its influence is pervasive.

IEEE 829 is a good document with much valuable material in it. It has excellent templates, which provide great examples of how to meticulously document a project.

Or at least they’re great examples of meticulous documentation if that is the right approach for the project. That of course is the question that has to be asked. What is the right approach? Too often the existence of a detailed documentation standard is taken as sufficient justification for detailed documentation.

I’m going to run through two objections to detailed documentation. They are related, but one refers to design and the other to testing. It could be argued that both have their roots in psychology as much as IT.

I believe that the fixation of many projects on documentation, and the highly dubious assumption that quality and planning are synonymous with detailed documentation, have their roots in the structured methods that dominated software development for so long.

These methods were built on the assumption that software development was an engineering discipline, rather than a creative process, and that greater quality and certainty in the development process could be achieved only through engineering style rigour and structure.

Paul Ward, one of the leading developers of structured methods, wrote a series of articles [2] on the history of structured methods, which admitted that they were neither based on empirical research nor subjected to peer-review.

Two other proponents of structured methods, Larry Constantine and Ed Yourdon, admitted that the early investigations were no more than informal “noon-hour” critiques” [3].

Fitzgerald, Russo and Stolterman gave a brief history of structured methods in their book “Information Systems Development – Methods in Action” [4] and concluded that “the authors relied on intuition rather than real-world experience that the techniques would work”.

One of the main problem areas for structured methods was the leap from the requirements to the design. Fitzgerald et al wrote that “the creation of hierarchical structure charts from data flow diagrams is poorly defined, thus causing the design to be loosely coupled to the results of the analysis. Coad & Yourdon [5] label this shift as a ‘Grand Canyon’ due to its fundamental discontinuity.”

The solution to this discontinuity, according to the advocates of structured methods, was an avalanche of documentation to help analysts to crawl carefully from the current physical system, through the current logical system to a future logical system and finally a future physical system.

Not surprisingly, given the massive documentation overhead, and developers’ propensity to pragmatically tailor and trim formal methods, this full process was seldom followed. What was actually done was more informal, intuitive, and opaque to outsiders.

An interesting strand of research was pursued by Human Computer Interface academics such as Curtis, Iscoe and Krasner [6], and Robbins, Hilbert and Redmiles [7].

They attempted to identify the mental processes followed by successful software designers when building designs. Their conclusion was that they did so using a high-speed, iterative process; repeatedly building, proving and refining mental simulations of how the system might work.

Unsuccessful designers couldn’t conceive working simulations, and fixed on designs whose effectiveness they couldn’t test till they’d been built.

Curtis et al wrote;

Exceptional designers were extremely familiar with the application domain. Their crucial contribution was their ability to map between the behavior required of the application system and the computational structures that implemented this behavior.

In particular, they envisioned how the design would generate the system behavior customers expected, even under exceptional circumstances.”

Robbins et al stressed the importance of iteration;

“The cognitive theory of reflection-in-action observes that designers of complex systems do not conceive a design fully-formed. Instead, they must construct a partial design, evaluate, reflect on, and revise it, until they are ready to extend it further.”

The eminent US software pioneer Robert Glass discussed these studies in his book “Software Conflict 2.0” [8] and observed that;

“people who are not very good at design … tend to build representations of a design rather than models; they are then unable to perform simulation runs; and the result is they invent and are stuck with inadequate design solutions.”

These studies fatally undermine the argument that linear and documentation driven processes are necessary for a quality product and that more flexible, light-weight documentation approaches are irresponsible.

Flexibility and intuition are vital to developers. Heavyweight documentation can waste time and suffocate staff if used when there is no need.

Ironically, it was the heavyweight approach that was founded on guesswork and intuition, and the lightweight approach that has sound conceptual underpinnings.

The lessons of the HCI academics have obvious implications for exploratory testing, which again is rooted in psychology as much as in IT. In particular, the finding by Curtis et al that “exceptional designers were extremely familiar with the application domain” takes us to the heart of exploratory testing.

What matters is not extensive documentation of test plans and scripts, but deep knowledge of the application. These need not be mutually exclusive, but on high-pressure, time-constrained projects it can be hard to do both.

Itkonen, Mäntylä and Lassenius conducted a fascinating experiment at the University of Helsinki in 2007 in which they tried to compare the effectiveness of exploratory testing and test case based testing. [9]

Their findings were that test case testing was no more effective in finding defects. The defects were a mixture of native defects in the application and defects seeded by the researchers. Defects were categorised according to the ease with which they could be found. Defects were also assigned to one of eight defect types (performance, usability etc.).

Exploratory testing scored better for defects at all four levels of “ease of detection”, and in 6 out of the 8 defect type categories. The differences were not considered statistically significant, but it is interesting that exploratory testing had the slight edge given that conventional wisdom for many years was that heavily documented scripting was essential for effective testing.

However, the really significant finding, which the researchers surprisingly did not make great play of, was that the exploratory testing results were achieved with 18% of the effort of the test case testing.

The exploratory testing required 1.5 hours per tester, and the test case testing required an average of 8.5 hours (7 hours preparation and 1.5 hours testing).

It is possible to criticise the methods of the researchers, particularly their use of students taking a course in software testing, rather than professionals experienced in applying the techniques they were using.

However, exploratory testing has often been presumed to be suitable only for experienced testers, with scripted, test case based testing being more appropriate for the less experienced.

The methods followed by the Helsinki researchers might have been expected to bias the results in favour of test case testing. Therefore, the finding that exploratory testing is at least as effective as test case testing with a fraction of the effort should make proponents of heavily documented test planning pause to reconsider whether it is always appropriate.

Documentation per se does not produce quality. Quality is not necessarily dependent on documentation. Sometimes they can be in conflict.

Firstly, the emphasis on producing the documentation can be a major distraction for test managers. Most of their effort goes into producing, refining and updating plans that often bear little relation to reality.

Meanwhile the team are working hard firming up detailed test cases based on an imperfect and possibly outdated understanding of the application. While the application is undergoing the early stages of testing, with consequent fixes and changes, detailed test plans for the later stages are being built on shifting sand.

You may think that is being too cynical and negative, and that testers will be able to produce useful test cases based on a correct understanding of the system as it is supposed to be delivered to the testing stage in question. However, even if that is so, the Helsinki study shows that this is not a necessary condition for effective testing.

Further, if similar results can be achieved with less than 20% of the effort, how much more could be achieved if the testers were freed from the documentation drudgery in order to carry out more imaginative and proactive testing during the earlier stages of development?

Susan Land’s fourth justification for standards (see start of article) has interesting implications.

Standards “can be attached to or referenced by contracts”. That is certainly true. However, the danger of detailed templates in the form of a standard is that organisations tailor their development practices to the templates rather than the other way round.

If the lawyers fasten onto the standard and write its content into the contract then documentation can become an end and not just a means to an end.

Documentation becomes a “deliverable”. The dreaded phrase “work product” is used, as if the documentation output is a product of similar value to the software.

In truth, sometimes it is more valuable if the payments are staged under the terms of the contract, and dependent on the production of satisfactory documentation.

I have seen triumphant announcements of “success” following approval of “work products” with the consequent release of payment to the supplier when I have known the underlying project to be in a state of chaos.

Formal, traditional methods attempt to represent a highly complex, even chaotic, process in a defined, repeatable model. These methods often bear only vague similarities to what developers have to do to craft applications.

The end product is usually poor quality, late and over budget. Any review of the development will find constant deviations from the mandated method.

The suppliers, and defenders, of the method can then breathe a sigh of relief. The sacred method was not followed. It was the team’s fault. If only they’d done it by the book! The possibility that the developers’ and testers’ apparent sins were the only reason anything was produced at all is never considered.

What about the auditors?

Adopting standards like IEEE 829 without sufficient thought causes real problems. If the standard doesn’t reflect what really has to be done to bring the project to a successful conclusion then mandated tasks or documents may be ignored or skimped on, with the result that a subsequent review or audit reports on a failure to comply.

An alternative danger is that testers do comply when there is no need, and put too much effort into the wrong things. Often testers arrive late on the project. Sometimes the emphasis is on catching up with plans and documentation that are of dubious value, and are not an effective use of the limited resources and time.

However, if the contract requires it, or if there is a fear of the consequences of an audit, then it could be rational to assign valuable staff to unproductive tasks.

Sadly, auditors are often portrayed as corporate bogey-men. It is assumed that they will conduct audits by following ticklists, with simplistic questions that require yes/no answers. “Have you done x to y, yes or no”.

If the auditees start answering “No, but …” they would be cut off with “So, it’s no”.

I have seen that style of auditing. It is unprofessional and organisations that tolerate it have deeper problems than unskilled, poorly trained auditors. It is senior management that creates the environment in which the ticklist approach thrives. However, I don’t believe it is common. Unfortunately people often assume that this style of auditing is the norm.

IT audit is an interesting example of a job that looks extremely easy at first sight, but is actually very difficult when you get into it.

It is very easy for an inexperienced auditor to do what appears to be a decent job. At least it looks competent to everyone except experienced auditors and those who really understand the area under review.

If auditors are to add value they have to be able to use their judgement, and that has to be based on their own skills and experience as well as formal standards.

They have to be able to analyse a situation and evaluate whether the risks have been identified and whether the controls are appropriate to the level of risk.

It is very difficult to find the right line and you need good experienced auditors to do that. I believe that ideally IT auditors should come from an IT background so that they do understand what is going on; poachers turned gamekeepers if you like.

Too often testers assume that they know what auditors expect, and they do not speak directly to the auditors or check exactly what professional auditing consists of.

They assume that auditors expect to see detailed documentation of every stage, without consideration of whether it truly adds value, promotes quality or helps to manage the risk.

Professional auditors take a constructive and pragmatic approach and can help testers. I want to help testers understand that. I used to find it frustrating when I worked as an IT auditor when I found that people had wasted time on unnecessary and unhelpful actions on the assumption that “the auditors require it”.

Kanwal Mookhey, an IT auditor and founder of NII consulting, wrote an interesting article for the Internal Auditor magazine of May 2008 [10] about auditing IT project management.

He described the checking that auditors should carry out at each stage of a project. He made no mention of the need to see documentation of detailed test plans and scripts whereas he did emphasize the need for early testing.

Kanwal told me.

“I would agree that auditors are – or should be – more inclined to see comprehensive testing, rather than comprehensive test documentation.

Documentation of test results is another matter of course. As an auditor, I would be more keen to know that a broad-based testing manual exists, and that for the system in question, key risks and controls identified during the design phase have been tested for. The test results would provide a higher degree of assurance than exhaustive test plans.”

One of the most significant developments in the field of IT governance in the last few decades has been the US 2002 Sarbanes-Oxley Act, which imposed new standards of reporting, auditing and control for US companies. It has had massive worldwide influence because it applies to the foreign subsidiaries of US companies, and foreign companies that are listed on the US stock exchanges.

The act attracted considerable criticism for the additional overheads it imposed on companies, duplicating existing controls and imposing new ones of dubious value.

Unfortunately, the response to Sarbanes-Oxley verged on the hysterical, with companies, and unfortunately some auditors, reading more into the legislation than a calmer reading could justify. The assumption was that every process and activity should be tied down and documented in great detail.

However, not even Sarbanes-Oxley, supposedly the sacred text of extreme documentation, requires detailed documentation of test plans or scripts. That may be how some people misinterpret the act. It is neither mandated by the act nor recommended in the guidance documents issued by the Institute of Internal Auditors [11] and the Information Systems Audit & Control Association [12].

If anyone tries to justify extensive documentation by telling you that “the auditors will expect it”, call their bluff. Go and speak to the auditors. Explain that what you are doing is planned, responsible and will have sufficient documentation of the test results.

Documentation is never required “for the auditors”. If it is required it is because it is needed to manage the project, or it is a requirement of the project that has to be justified like any other requirement. That is certainly true of safety critical applications, or applications related to pharmaceutical development and manufacture. It is not true in all cases.

IEEE 829 and other standards do have value, but in my opinion their value is not as standards! They do contain some good advice and the fruits of vast experience. However, they should be guidelines to help the inexperienced, and memory joggers for the more experienced.

I hope this article has made people think about whether mandatory standards are appropriate for software development and testing, and whether detailed documentation in the style of IEEE 829 is always needed. I hope that I have provided some arguments and evidence that will help testers persuade others of the need to give testers the freedom to leave the kindergarten and grow as professionals.

References

[1] Land, S. (2005). “Jumpstart CMM-CMMI Software Process Improvements – using IEEE software engineering standards”, Wiley.

[2a] Ward, P. (1991). “The evolution of structured analysis: Part 1 – the early years”. American Programmer, vol 4, issue 11, 1991. pp4-16.

[2b] Ward, P. (1992). “The evolution of structured analysis: Part 2 – maturity and its problems”. American Programmer, vol 5, issue 4, 1992. pp18-29.

[2c] Ward, P. (1992). “The evolution of structured analysis: Part 3 – spin offs, mergers and acquisitions”. American Programmer, vol 5, issue 9, 1992. pp41-53.

[3] Yourdon, E., Constantine, L. (1977) “Structured Design”. Yourdon Press, New York.

[4] Fitzgerald B., Russo N., Stolterman, E. (2002). “Information Systems Development – Methods in Action”, McGraw Hill.

[5] Coad, P., Yourdon, E. (1991). “Object-Oriented Analysis”, 2nd edition. Yourdon Press.

[6] Curtis, B., Iscoe, N., Krasner, H. (1988). “A field study of the software design process for large systems” (NB PDF download). Communications of the ACM, Volume 31, Issue 11 (November 1988), pp1268-1287.

[7] Robbins, J., Hilbert, D., Redmiles, D. (1998). “Extending Design Environments to Software Architecture Design” (NB PDF download). Automated Software Engineering, Vol. 5, No. 3, July 1998, pp261-290.

[8] Glass, R. (2006). “Software Conflict 2.0: The Art and Science of Software Engineering” Developer Dot Star Books.

[9a] Itkonen, J., Mäntylä, M., Lassenius C., (2007). “Defect detection efficiency – test case based vs exploratory testing”. First International Symposium on Empirical Software Engineering and Measurement. (Payment required).

[9b] Itkonen, J. (2008). “Do test cases really matter? An experiment comparing test case based and exploratory testing”.

[10] Mookhey, K. (2008). “Auditing IT Project Management”. Internal Auditor, May 2008, the Institute of Internal Auditors.

[11] The Institute of Internal Auditors (2008). “Sarbanes-Oxley Section 404: A Guide for Management by Internal Controls Practitioners”.

[12] Information Systems Audit and Control Association (2006). “IT Control Objectives for Sarbanes-Oxley 2nd Edition”.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.