But how many test cases?

One of the aspects of traditional test management that has always troubled me is the obsession, in some places, with counting test cases.

If you don’t understand what’s going on, if you don’t really understand what software testing is all about, then start counting, and you’ve immediately got some sort of grip on, well, er, something.

100 is bigger than 10. 10,000 is pretty impressive, and 100,000 is satisfyingly humongous. You might not really understand what’s happening, but when you face up to senior management and tell them that you’re managing thousands of things, well, they’ve got to be impressed.

But counting test cases? Is that useful? Well, it is if it’s useful for managing the problem, but it’s not an end in itself. It’s nonsense to expect all testing to be measurable by the number of test cases. It can even be a damaging distraction.

I was once interviewed for a test management role. I was asked about my most challenging testing problem. I replied that it was working as a Y2K test manager. It seemed like a good answer. It was a huge task. There was nowhere near enough time to do all the testing we’d have liked. The dates couldn’t be pushed back, and we were starting too late. We had to take a ruthless risk-based approach, triaging some applications out of sight. They’d have to run over the millennium and we’d wait and see what would happen. The cost of testing, and the limited damage if the applications failed, meant we had to forget about them and put our effort where it would count.

What seemed like a good answer was really a big mistake! “How many test cases did you have?”

I was surprised. The question made no sense. The applications were insurance management information systems. There was an on-line front end, but technically that was pretty simple. My responsibility was the huge and fearsomely complex back end processing. The strategy was to get all the code fixed, then hit the most complex and date sensitive areas hard in testing.

We were looking at batch files. We had a perfect test oracle. We ran the batch suites with 1996 data, and the non-Y2K compliant code. We then date shifted the input data forward to 2000 (the next leap year) and ran the Y2K compliant code. The 2000 output files should be identical to the 1996 output files in everything except the years. It was much more complex than that, but in principle it was pretty simple.
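
In outline, the heart of that check could be sketched like this. This is purely illustrative – the real runs were mainframe batch jobs and file comparison utilities, not Python, and the record layout and year-field offsets here are invented:

```python
# Illustrative sketch of the date-shift oracle idea, not the real mainframe jobs.
# Assumes fixed-width records in which four-digit years sit at known offsets,
# and that the baseline and shifted files hold their records in the same order.

YEAR_FIELDS = [(10, 14), (42, 46)]  # (start, end) offsets of year fields; invented layout

def mask_years(record: str) -> str:
    """Blank out the year fields so 1996 and 2000 records can be compared directly."""
    chars = list(record)
    for start, end in YEAR_FIELDS:
        chars[start:end] = "?" * (end - start)
    return "".join(chars)

def find_discrepancies(baseline_path: str, shifted_path: str) -> list[tuple[int, str, str]]:
    """Compare the 1996 baseline output with the date-shifted 2000 output.

    Everything except the year fields should be identical; any other
    difference is a discrepancy to be investigated.
    """
    found = []
    with open(baseline_path) as base, open(shifted_path) as shifted:
        for line_no, (b, s) in enumerate(zip(base, shifted), start=1):
            if mask_years(b.rstrip("\n")) != mask_years(s.rstrip("\n")):
                found.append((line_no, b.rstrip("\n"), s.rstrip("\n")))
    return found
```

The crucial point is that the expected results came free from the baseline run; nobody had to specify them case by case.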

There was a huge amount of preparation, ensuring all the reference tables were set up correctly, allowing for any hard coded dates in the programs, setting up the data and test jobs. The investigation of the inevitable discrepancies was painstaking and time-consuming, but it worked. We got through the planned high-priority testing on time, and there were no serious incidents over the millennium. I explained all this. We didn’t have “test cases”.

My interviewers asked me to try and put a figure on how many test cases we would have had, as if it were merely a matter of terminology. Even if you multiplied all the files we were checking by each time frame in which we were running tests, the total number would have been under a hundred. You could have called them “test cases” but it wouldn’t have meant anything. I explained this.

“So, under a hundred”. Pens scribbled furiously, and I seethed as I saw all our challenging, sometimes quite imaginative, and ultimately successful testing being reduced to “under a hundred” by people who hadn’t a clue.

I wasn’t hired. It didn’t bother me. I’d lost interest. I could have just lied and said, “oh, about 10,000 – yeah it was pretty big”, but I wouldn’t have wanted to work there.

I’ve seen the obsession with counting test cases taken to extremes. The need to count test cases created huge pressure on one project to execute testing in a way that facilitated reporting, not testing.

It was another big batch financial system, and there were some strong similarities with Y2K. However, this time we had to have “test cases” we could count, and report progress on. We had to “manage the stakeholders”. It was important that test execution should show that we’d got through 10% of the test cases in only 9% of the testing window, and that everything was just dandy.

Sadly, reports like that meant absolutely nothing. Easy test cases got passed quickly, complex ones took far longer and we were running badly late – according to the progress reports and the metrics. The trouble was that the easy test cases were insignificant. The complex ones were what counted and there were many inter-dependencies between them. The testers were finding out more and more about the application and the data, and if all went satisfactorily there would be a rush of test cases getting cleared at the end, as eventually happened.

The reaction of senior management to the daily progress reports was frantic concern. We weren’t getting through the test cases. That was all that mattered. No amount of explanation made a difference. I was spending most of my time on reports, explanations and meetings; very little with the testers. Management thought that we were politically naïve, and didn’t understand the need to keep the stakeholders happy. Bizarrely, the people who knew almost nothing about the work, or its complexities, thought that we were out of touch with reality.

Reality for them was managing the process, counting the test cases, playing the organisational game. “The quality of the application? Whoa, guys! You testers are missing the point. How many test cases have you got?”

40 thoughts on “But how many test cases?”

  1. Good post, James. It worries me that there is still an obsession with quantifying things that are really unquantifiable. “Test cases”, “Lines of Code”, “Defects per Module”, etc… can be presented as a list of numbers – sure – but without context they are meaningless.

    Your example of the Y2K project is a great one in this regard because although you only describe a few steps that you went through, each one is highly complex and significant to the overall success of the project. If that were understood I don’t think we would be spending as much time gathering ‘statistics’ (I use the term loosely). Sadly, in my limited experience, the majority of the people asking for this information don’t even try to understand.

    Thanks for another good and thought-provoking post.

    Stephen

    • Thanks again Stephen. The Y2K project was unusual in many respects, but then an important aspect of test management is that you apply a testing approach that is relevant to the problem. We were just doing what seemed appropriate. A highly unusual aspect of the exercise I described is that I, as the test manager, was the only professional tester. The test team consisted of seconded developers. That made sense because each time a discrepancy was detected the testers would investigate it themselves, which entailed trawling through code, data, JCL, tool settings, database and reference tables to isolate the reason, and then re-running individual modules to confirm their finding. It was highly technical. Fortunately I was able to do that too. Being involved right from agreeing the strategy with client management through to dissecting the code etc. was great.

      It wasn’t conventional ISEB Foundation style testing, but it was real testing. I was simultaneously running a more conventional team that was Y2K testing a couple of on-line applications. We had normal testers for that, but even there we weren’t bothering with test cases. Again, we had an excellent test oracle because we ran through a whole load of scenarios in the current timeframe, date shifted the data, ran through the same scenarios in the future, and compared the results. Counting test cases for these applications would have simply given a silly total of a couple of dozen test cases per application. Although I was responsible for that testing I delegated the active involvement to my deputy and I concentrated on the really tricky and interesting testing I described above. It was a busy, busy time!

  2. Hi James, really interesting post. I identified with it very quickly (probably like lots of others).

    I started writing a comment – but it turned into a whole post in itself, here, which I’m going to be building on later.

    In summary, yes the manager is looking for something “easy” to get a grip on – what could be easier than numbers? The problem is when they become detached from context of course – their meaning – What the numbers are saying and what they’re not saying – the silent evidence!

  3. Hi James,

    Awesome blog post. I’ve seen the very same thing in my time also. It’s scary to think that big commercial decisions are being based around a metric that makes no sense.

    I’ve always struggled with measuring test cases because of many of the reasons you mention here. But I also take great care to state that the person running the test is, in fact, the test. They are the ones responsible for looking around, observing and forming new ideas and tests from the actual test case. If all we were to do is run X test cases we would fail. The X test case value will always fluctuate and it doesn’t take into account any exploration or planned exploratory testing.

    We ran X test cases. Who ran them? Why did they run them? And did we observe anything else off the back of it?

    Brilliant post
    Rob

    • Thanks Rob. The relationship between the reality and the way that we choose to measure it is complex. If you’re talking about cans of beans there’s no problem. But in testing, or anything that is complex and abstract, the measure is maybe just an approximation to help us get a better understanding.

      The measure can be like a layer spread over the substance. Imagine trying to measure the flowers in a garden by draping a patchwork quilt over it all. Then count the flowers by counting the patches, and saying you’ve got 50 patches of flowers. The measure’s hindering our understanding, rather than aiding it.

      Sure, you can always describe the testing in terms of the number of test cases, or some other arbitrary metric people feel comfortable with. But it just renders the real testing invisible and creates the dangerous assumption that people know about the flowers because they can see the quilt. Even worse, there’s a danger that good creative testing will be ditched, and testing will be conducted in a way that makes it easier to count as “test cases”.

      Um, hope that makes some sort of sense. I didn’t realise this was going to get philosophical. I thought it was going to be just a rant!

  4. Yeah, it’s the need for reporting that creates these situations. Unfortunately, testing ends up being shaped so that it can be measured, with no relevance:
    Number of test cases, runs, scripts, reports.
    Testing is too closely associated with development. I would associate testing more with marketing than with programming, for example.

    Sebi

    • Thank you Sebi. Yes, reporting can create the problem, though you can go further back and look at the organisational structure, the management style and culture.

      In the first project I referred to there was a healthy relationship between us, the suppliers, and the client. I knew my customers very well as people. We got on well, and socialised together. I knew their business and their applications. That meant reporting was far more informal and personal. I was able to say things like “it’s ok – trust me on this”, and “it’s not going to work – you’ll have to reconsider” and the client would believe me. That meant there was no pressure to reduce everything to dubious metrics for formal progress reports.

  5. Ironically, the interviewers could also have said: “Such a difficult test project and you completed it with less than a hundred test cases? Impressive, very impressive!” and they’d be just as wrong. 🙂

    Joep

    p.s.: Great ‘quote’ at the end!

  6. This obsession with test case counts definitely needs to be fought against, tooth and nail!

    I wouldn’t have wanted to work there, either.

    I made a similar point on my own blog not too long ago. There’s a concept originally from the economic sphere called “Goodhart’s Law”: When a measure becomes a target it ceases to be a good measure.

  7. Pingback: Clarity, the key for successful communication :: Software Testing and more

  8. Hello James,
    A great article you wrote. A valuable example to me of the need for, and use of, figures.

    It triggered me to respond to your article.

    With that posting I tried to express my thoughts on how numbers can distract from the actual meaning. I used this blog as an example of how the real meaning is lost when thinking in numbers.

    Thanks for sharing your thoughts with us.
    regards,
    Jeroen

  9. Oh, and on coverage (all links are to PDFs):

    Got You Covered: Excellent testing starts by questioning the mission. So, the first step when we are seeking to evaluate or enhance the quality of our test coverage is to determine for whom we’re determining coverage, and why.

    Cover or Discover: Excellent testing isn’t just about covering the “map”—it’s also about exploring the territory, which is the process by which we discover things that the map doesn’t cover.

    A Map By Any Other Name: A mapping illustrates a relationship between two things. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. We can use any of these to help us think about test coverage.

    —Michael B.

    • Michael – thanks for the four links. I particularly liked the mapping one. There’s a lot to think about in there. There are some interesting ideas that I’ll have to think through and maybe return to.

  10. Interesting post, James. Our management had a problem justifying our existence as they realised that the sheer number of test cases was irrelevant. They tried to generate metrics from the number of fault reports but found that this wasn’t helpful either (we were pretty much at the end of the SDLC, providing validation and exploratory testing).

    I don’t think it’s easy for management to explain our worth to the bean counters without the use of metrics, as useless as they can be if they’re the wrong ones.

    • That’s a rather depressing comment on the “bean counters”. An intelligent response to a realisation that they’d been judging testing by flawed metrics should be to look for better ways to understand testing’s value.

      It’s surely all about risk, and spending money to buy more information about the product and the risk that launching it will entail.

      To take a simple, crude analogy, no-one would think of the process of test flying a new plane in terms of the test cases, and the number of flights.

      I think most people would understand that the test flight process was all about getting a better knowledge of the plane and understanding of the risk.

      In principle it’s not really any different for any commercial application. Sure, the consequences of screwing up a bog-standard web application are lower than sending people up in a dangerous plane, but then the costs of testing are also lower.

      If it’s worth investing money in a new application, then it’s worth finding out what you’ve got for your money before your customers start beating you up over it. Alternatively, you can just go out of business! 😉

  11. Pingback: Beta Testing  :  Repsonse: on How many test cases by James Christie : Clevertester

  12. This has been one of the biggest pain points for me too. No matter how many explanations I give, I am still expected to pull a number out of my hat for test cases. And I am OK with pulling this number just to get them off my back, but really, isn’t it weird that they don’t want to know how much time each test might take, how complex the test is, or, like you said, who is running the test? And if I have 10 test cases, and test 1 and test 2 are in no way the same, then how can we combine them under the same list and say we have 10 tests, with test 1 passed while test 2 failed? OK, so… I mean, really, so what? I wish there was a good way to show testing status that includes numbers but also more information that gives a true testing picture and helps with the storytelling.

  13. Thanks Shilpa. I see you are interested in metrics. I share your interest in a way of showing testing status that’s both meaningful and objective. Too often, when there’s a conflict between the two, people go for the false security of the objective but irrelevant.

    • Yes, I am into data and how to present it best. I would love to have some discussions someday. I am also a new blogger. Hopefully with time I can put everything I want online.

    • I don’t understand the implied claim that numbers or quantitative measures are “objective”. Can you give me an example of an “objective” measurement in software testing?

      —Michael B.

      • Um, I left myself wide open with that hasty reply. Numbers are “objective” only in some narrow, shallow sense that would be laughed out of court by any philosophy undergraduate. Yes, you can count test cases or bugs or whatever, and claim that it’s an objective truth that there are say 100 of the creatures. But we know that’s meaningless because we’ve just assigned things into 100 arbitrary buckets, and these buckets are fabrications based on our assumptions, prejudices and fantasies. 100 doesn’t mean anything without any understanding of the basis for the figure, which is unlikely to be anything objective.

        I should just have said numerical, rather than objective. Lurking at the back of my mind was the repeated experience of management asking for quantified measures in status reports, in the fond belief that numbers made them objective.

        I stand corrected. I think my point would have been defensible if I’d written “I share your interest in a way of showing testing status that’s both meaningful and quantifiable. Too often, when there’s a choice to be made, people go for the false security of the quantifiable but irrelevant.”

        Thanks for sharing the link to James Bach’s dashboard. I hadn’t seen it before, and I’m very glad I’ve seen it now!

  14. Pingback: Tales from the trenches: Lean test case design

  15. About the beans… It sounds so familiar.
    The problem is that projects are often managed by… bean counters. They need the beans to get “insight”.

    I remember one of my projects where we were under huge pressure. The project manager was asking question after question about the status, how many test cases we had executed, and when we would be ready.

    They asked so often that I didn’t even have enough time to test!
    In fact, at some point I realized that they didn’t even have a clear idea of what they were asking…

    Well, I learned a lot from it.
    Numbers can help, but you have to know the ‘audience’ and understand what they want (not always what they ask for) and what they need to know.

    Thankfully, I was never evaluated on the number of findings/issues logged…

  16. Pingback: What testing can learn from social science – Part 5 | Magnifiant: exploring software testing

  17. I have one question: how can I judge that X number of test cases is enough to test that my application is 100% working fine? There may be some test cases I miss… How do I treat test cases as test coverage (not code coverage)?

    • I don’t think that the number of test cases gives any indication of whether the amount of testing that you plan is appropriate for the circumstances. The number of test cases that you execute certainly can’t give any indication that the application is “working fine”.

      If you want to demonstrate that your test coverage is appropriate and you think that test cases help you then you should read this article by Michael Bolton. Be sure to follow the links to other articles that Michael has written.

  18. Technically, if you were comparing the output for every piece of data, you had as many test cases as you had data points, so probably well over 10K.

    • Yes, I could have argued that, but it would have felt like giving in. However, it would have been a total waste of time documenting them as test cases in advance. Pretending that individual fields were meaningful individual tests would have been a joke. Each field could potentially be wrong and therefore lead to a defect in each time frame, but the initial checks were done with file comparison utilities that spat out the discrepancies from the oracle file we’d created. If a field matched the predicted value then we didn’t even look at it. We had enough to do chasing down the defects. The testers were all actually experienced programmers because it was such technical work.
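
      To put that in concrete terms, the effect of those comparison utilities can be sketched roughly like this (a purely hypothetical illustration – we were using mainframe file comparison tools, not Python, and the field names are invented):

      ```python
      def field_discrepancies(oracle_record: dict, actual_record: dict) -> dict:
          """Report only the fields that differ from the oracle; matching fields never get looked at."""
          return {
              field: (expected, actual_record.get(field))
              for field, expected in oracle_record.items()
              if actual_record.get(field) != expected
          }

      # Invented example: only the mismatched field surfaces for investigation.
      oracle = {"policy_no": "A123", "premium": "250.00", "renewal_year": "2000"}
      actual = {"policy_no": "A123", "premium": "250.10", "renewal_year": "2000"}
      print(field_discrepancies(oracle, actual))  # {'premium': ('250.00', '250.10')}
      ```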

  19. Pingback: How Many Test Cases?

  20. Pingback: Five Blogs – 18 December 2013 | 5blogs

  21. Pingback: Quality Index : Fooled by Measurement – The Pragmatic Testing
