It always takes longer! (part 1)

Assume you’re going on holiday. Your flight is at 9:30 on a Friday morning. You live 60 km from the airport, and you have to check in no later than 8:45.

You’ve taken many flights in the past, often at the same time. You know it will probably take about 75 minutes to cover the distance at the busiest time of the day. You know how long it will take to get parked and into the terminal. You allow a bit of padding for safety, decide when you’re going to leave home, and you catch your flight.

What you never do is say, “well, if there are no problems and I’m not held up I can do it in 50 minutes, so I’ll set off at 7:55”.

I’m also very confident that you never say, “I can do the 50 km to the end of the motorway at 120 kph, so that’s 25 minutes, then the last 10 km at 60 kph, so that’s another 10 minutes, and if I park in the middle of the car park I will be 150 metres from the check-in desk, a distance I can cover in 2 minutes, so if I allow 10% contingency I can leave home at 8:04 and I will be fine.”

That would be insane. You use your experience, you know where things can go wrong and you catch your flight. You always catch your flight! That’s the priority, not impressing people by shaving time off your estimated journey to the airport, or demonstrating your mastery of the trip by building up a pointlessly detailed estimate.
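Just to show how stark the difference is, here’s a minimal Python sketch of the two plans, using only the figures from the story above; the 20-minute margin on the experience-based plan is an assumption of mine, added purely for illustration.

    from datetime import datetime, timedelta

    # Check-in closes at 8:45; the date is arbitrary, only the time matters.
    check_in_deadline = datetime(2024, 1, 1, 8, 45)

    # The "insane" bottom-up estimate, built from the figures in the quote above.
    motorway = 50 / 120 * 60     # 50 km at 120 kph -> 25 minutes
    local_roads = 10 / 60 * 60   # 10 km at 60 kph  -> 10 minutes
    walk = 2                     # 150 metres across the car park -> 2 minutes
    bottom_up = (motorway + local_roads + walk) * 1.10   # plus 10% contingency

    # The experience-based plan: about 75 minutes door to terminal at the busiest
    # time, plus a generous margin (the 20 minutes is an assumed, illustrative figure).
    experience_based = 75 + 20

    for label, minutes in (("bottom-up", bottom_up), ("experience", experience_based)):
        leave_by = check_in_deadline - timedelta(minutes=minutes)
        print(f"{label:>10}: leave home by {leave_by:%H:%M}")

The bottom-up plan gives you the 8:04 departure from the quote; the experience-based plan has you leaving nearly an hour earlier, because the only thing that matters is catching the flight.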

OK, in the airport case you knew the route and the problems, so you’d obviously go with your direct experience. But would you try doing a detailed estimate for the separate components of the return journey from your hotel to the airport? Would you plan optimistically?

Of course not. You’d rely on your past experience of getting to airports in general and you’d plan the trip cautiously because you’d want to be extremely confident that you’d catch your flight.

So why, when we size IT work, do we habitually go for estimates based on best-case scenarios rather than bitter experience, often constructing detailed estimates full of spurious, meaningless precision?

We pile up optimistic guesswork on top of deluded nonsense, then pretend that it has some validity just because it consists of an impressive amount of detail. If it ever works it’s pure luck.

We kid ourselves that we have learnt from past mistakes, that we won’t repeat them, and that any unavoidable problems we hit were bad luck, someone else’s fault, or simply unique to that particular project.

For some reason we find it very difficult to apply the results of past experience to estimating future work. There are various related reasons for this.

We value the specific over the general

Firstly, we value detailed knowledge of the specific task we’re looking at over experience that we, and others, have had with similar work. Logically that’s not an explanation. All I have done is rephrase the problem, but I’ve done so in a way that helps explain why we kid ourselves.

Applying our detailed knowledge maximises the relevant information we are feeding into the equation. Excluding the results of past experience is rational only if we make completely unrealistic assumptions about risk, probability and our ability to dodge the problems that always hit projects.

Kahneman and Tversky illustrated this point by discussing how we would predict the life expectancy of an individual. We can call on two sorts of information: singular and distributional.

Singular information applies specifically to that person. Distributional information ignores the individual and is concerned with the distribution of outcomes: how long human beings in general live.

Most people would accept that the most accurate prediction would require both sorts of information. However, we have a tendency to shy away from even the most obviously relevant distributional data.

Kahneman and Tversky found that the more familiar we are with a subject the more likely we are to focus on the singular and to ignore distributional data. The result is that the distribution of predicted outcomes is significantly different from the distribution of real outcomes. Worse, the variation is firmly in one direction and amounts to a bias.

Obviously that bias is towards optimism. We might know that things tend to go wrong, but we don’t know exactly what form the problems will take. We only know for sure how long things ought to take, and we fail lamentably to build in adequate contingency or to acknowledge the probability that life will not be perfect.
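To make the contrast between singular and distributional information concrete, here’s a small, purely illustrative Python simulation; the task figures and the lognormal distribution are assumptions of mine, not data from Kahneman and Tversky. It adds up singular, best-case task estimates, then simulates actual outcomes in which delays are skewed towards the high side.

    import random

    random.seed(1)

    # Purely illustrative: assume each task's actual duration is its best-case
    # (singular) estimate multiplied by a lognormal factor, so "everything goes
    # smoothly" sits near the bottom of the real distribution of outcomes.
    def actual_duration(best_case_days):
        return best_case_days * random.lognormvariate(0.4, 0.5)

    best_case_plan = [3, 5, 2, 8, 4]   # singular, task-by-task estimates in days

    totals = sorted(
        sum(actual_duration(days) for days in best_case_plan)
        for _ in range(10_000)
    )

    print("bottom-up best-case total:", sum(best_case_plan), "days")
    print("median simulated outcome: ", round(totals[len(totals) // 2], 1), "days")
    print("90th percentile outcome:  ", round(totals[int(len(totals) * 0.9)], 1), "days")

The bottom-up, best-case total is the only number the singular view gives you; the distributional view tells you where the median and the bad-but-plausible outcomes really sit, and that is where the contingency should come from.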

We are incorrigibly optimistic

This is the second reason for our inability to use past experience wisely. Quite apart from our preference for singular over distributional information, we are inherently optimistic, and we wildly overestimate the probability that our estimates will be accurate.

Or rather, we are hopeless at estimating accurately how long we will take to do something. We are far better at estimating how long someone else will take. We are more likely to look dispassionately at their past performance and not kid ourselves about how long the new task could take in ideal circumstances.

We revere confidence

This ability to apply careful judgement to other people’s work should be valuable in estimating software development projects, and it should help compensate for our preference for the singular over the general. However, any benefit is largely swamped by the effects of the third reason for our failure to learn from experience.

I believe this third factor is the most pernicious of all. It is simply that we value confidence over realism.

I remember a senior IT manager telling me that it was far better to plunge confidently into a project with an impossible deadline, and then emerge sweaty, exhausted and late than it was to tell users at the start how long it would really take. The users just wouldn’t accept realism.

The first time I had to estimate a development project I was concerned that we were using an unproven mixture of tools and technology, and we were applying them to an application for which they’d never been used at that site. I estimated cautiously, and was briskly told that my estimate was far too high.

“You can only estimate for what you know will happen, plus some contingency. You can’t expect to get approval for work to cover problems you don’t know you’re going to have”.

The estimate was halved, and I brought the project in 100% over budget. By a delicious coincidence my initial guess was accurate. I received plenty of praise for my achievement. There was no criticism for being wildly over budget, and that was certainly not because anyone remembered the original estimate!

In such an environment managers who do estimate accurately are likely to get a reputation for “padding their estimates”, not for being right!

The lesson I learned was that an estimate was the highest figure that was politically acceptable. Not commercially acceptable, but the highest figure you could get away with, almost invariably less than the real one.

I’ve got extensive personal experience of this phenomenon, but the evidence isn’t just anecdotal.

In a fascinating study by Jørgensen et al, 37 software developers were asked to assess the competence of two developers based on their ability to estimate and deliver work.

Both produced identical estimates and results. However, one was much more confident and predicted narrower margins of error. The other was more realistic about the level of uncertainty and predicted wider margins.

The subjects in the study rated the confident developer as being more competent than the realistic one. In effect, confidence was regarded as a sign of competence. Even when the actual result was outside the confident developer’s margins of error and within the realistic one’s margins it was still the confident developer who was rated higher.

A comment from one of the subjects was revealing.

“I feel that if I estimate very wide (margins of error), this will be interpreted as a total lack of competence and has no informative value to the project manager. I’d rather have fewer actual values inside the minimum-maximum interval, than providing meaningless, wide (margins)”.

This study looked specifically at confidence, rather than optimism, but the constant danger in software development is, of course, over-confident optimism rather than pessimism.

Catch 22? Or catching that flight?

These are consistent biases. The problem is not simply that we get estimates wrong, but that we approach them in a way that ensures we consistently err on the side of optimism.

I wonder if there’s a Catch 22 that keeps us trapped in a cycle of poor, over-optimistic estimating.

The less we know about a problem area, the less we will know about the complexities and what could go wrong. So we are likely to be over-confident about our estimates.

But once we have learned more about the problem area, we will be more tempted to focus on the specifics of the task and ignore the distribution of outcomes, i.e. to ignore the lessons of experience. So we are still likely to be over-confident about our estimates.

There’s a huge exception, however. Remember the analogy I was using at the start of this piece? About getting to the airport to go on holiday.

We always catch our flight. It’s not simply that estimating accurately is easier. It really matters to us. I mean really matters.

Does it really matter as much in software development? I don’t think it does. Just consider the research suggesting we prefer to seem confident rather than right. Perhaps we sometimes need to ask ourselves a simple question.

Do we really want to catch that flight? Or do we just want to seem confident?

I’ll expand on possible remedies in my next post. In the meantime if you want to delve further into this area here are a couple of interesting links.

Tversky and Kahneman look at how people have difficulty making sound judgements about probability based on their direct experience.

Buehler et al’s study of our tendency to underestimate task completion times.