Software Estimation on a Product Team: Part 3
Wednesday November 25, 2020
A Primer on Story Point Estimation
Intuition in software engineering only gets us so far. We can develop reliable intuitions around software architecture design, root causes of bugs, and even user experience issues. However, estimates often betray our intuition. Under-estimation is more common than over-estimation. Even backlog items with the most meticulously researched requirements are vulnerable to delays once coding begins. What happens, then, if we go against our intuitions when developing our software estimation practices?
When asked for an estimate, our first intuition is to respond with a time duration. This article explores an alternative: story point estimation. I will shed light on its origins and how the practice has evolved over two decades. I will also cover the key characteristics that make it an effective and reliable method for estimating work items on software product teams.
Origin of Story Points
A response to the confusion of duration-based estimates
Software estimation using story points, or simply “story-pointing”, is by far the most common estimation approach that I’ve observed and used. The term “story points” has stuck, even though it’s used for estimating far more than just “user stories”. Story-pointing defects, technical debt, and many other backlog items is a common practice.
Ron Jeffries, who coined the phrase “story point”, recalls:
Stories were originally estimated in time: the time it would take to implement the story. We quickly went to what we called “Ideal Days”, which was informally described as how long it would take a pair to do it if the bastards would just leave you alone. We multiplied Ideal Days by a “load factor” to convert to actual implementation time. Load factor tended to be about three: three real days to get an Ideal Day’s work done.
We spoke of our estimates in days, usually leaving “ideal” out. The result was that our stakeholders were often confused by how it could keep taking three days to get a day’s work done, or, looking at the other side of the coin, why we couldn’t do 50 “days” of work in three weeks.
So, as I recall it, we started calling our “ideal days” just “points”. So a story would be estimated at three points, which meant it would take about nine days to complete. And we really only used the points to decide how much work to take into an iteration anyway, so if we said it was about 20 points, no one really objected.
In the case of Ron Jeffries and his team, story points served to obfuscate the internal details of how the product team operated. More generally, this obfuscation is a means of discouraging stakeholders who are not members of the product team from co-opting the estimation for other purposes, such as comparing product teams against one another.
These tactics, while dubious at first glance, redeem themselves when paired with iterative and reliable value delivery by the product team. As long as the value delivered is communicated effectively, the stakeholders eventually lose interest in the product team’s estimation strategy. Taking this to the extreme, Woody Zuill describes how regular communication of delivered value, when done frequently enough, can eliminate the need to estimate altogether.
While compelling, Woody Zuill’s #NoEstimates approach is simply not a fit for all product teams. Fortunately, story-pointing has evolved so that it now engenders other benefits, particularly when product teams estimate in story points directly (as opposed to the original approach of multiplying a time duration estimate by a load factor).
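The original conversion that Jeffries describes is simple arithmetic, and a small sketch makes the stakeholder confusion concrete. The function names are illustrative only. The load factor of three is the rough figure from the quote rather than a fixed constant, and the five-day working week is an assumption.

```python
LOAD_FACTOR = 3  # "about three" real days per ideal day, per the quote above


def ideal_to_real_days(ideal_days: float, load_factor: float = LOAD_FACTOR) -> float:
    """Convert an ideal-days estimate into expected real working days."""
    return ideal_days * load_factor


def ideal_capacity(real_days: float, load_factor: float = LOAD_FACTOR) -> float:
    """Ideal days of work that one pair can fit into a span of real working days."""
    return real_days / load_factor


# A story estimated at three ideal days ("three points") takes about nine real days.
print(ideal_to_real_days(3))  # 9

# Three weeks at an assumed five working days per week gives one pair only
# five ideal days of capacity, far short of 50 "days" of estimated work.
print(ideal_capacity(3 * 5))  # 5.0
```

Dropping the word "days" from the estimate removes the temptation to read the first number as calendar time.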
Why Story Points Stuck
An abstraction that can consider multiple factors
Software engineers understand the power of abstractions, and consequently I have found they become early advocates for adoption of story points. By not thinking in duration, the team is not bogged down with specifics.
For example, there may be some doubt as to how a certain backlog item is to be implemented, and consequently the item will require some time spent on preliminary investigation.
While aligning on the time duration estimate, the team can easily get mired in a debate on how exactly this investigation should be carried out—who needs to be involved, what their schedules look like, etc. Instead of getting stuck on these details, the team adds a few “points” to the estimate to reflect this doubt on the implementation details. Any preliminary discussion that does occur gets noted on the backlog item itself. How many points to add is determined by comparison to other backlog items—in our specific example, if the amount of investigation is not significantly more than what is typical for any backlog item, no points would be added.
Doubt (also referred to as risk or uncertainty) is not the only factor that affects the story points estimate. The other two are Effort and Complexity. In this context, Effort is defined as the amount of code changes needed; high-effort backlog items involve changes to many lines of code. Complexity, on the other hand, is high when a change introduces or modifies algorithmic code. Another example of high Complexity is when the implementation must consider a large number of use cases (and thus involves a large amount of branching logic).
To recap: Doubt, Effort, and Complexity are the three main factors that can affect an estimate. If these factors are considered when constructing a time duration estimate, it’s easy to get hung up on details and end up with an inefficient estimation process. On the other hand, using story points encourages the team to consider the factors in a more abstract manner, and emphasizes focus on their impact by comparison against other backlog items.
Mike Cohn explores these three factors further in his article, “What Are Story Points?”.
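To make the comparison-based reasoning concrete, here is a minimal sketch of sizing an item relative to a typical backlog item. In practice this happens through team discussion, not a formula. The `FactorComparison` type, the 0/1/2 rating convention, and the one-point-per-step arithmetic are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactorComparison:
    """How an item compares with a typical backlog item on each factor.

    0 means "about typical"; positive values mean noticeably more.
    The rating convention is hypothetical, chosen only for this sketch.
    """
    doubt: int = 0
    effort: int = 0
    complexity: int = 0


def estimate_points(typical_points: int, comparison: FactorComparison) -> int:
    """Start from a typical item's points and add points for each excess factor."""
    return typical_points + comparison.doubt + comparison.effort + comparison.complexity


# An item needing no more investigation than usual gets no extra points.
print(estimate_points(3, FactorComparison()))  # 3

# An item with unusually uncertain implementation details gets a few more.
print(estimate_points(3, FactorComparison(doubt=2)))  # 5
```

The sketch never asks who will investigate or when. It only asks how the item compares with work the team already understands.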
Evolving Story Points
Leveraging relative sizing by using the Fibonacci sequence
Collaborative estimation discussions have a tendency to veer into implementation specifics. The temptation to specify and plan every last detail of implementation is an incessant one; many of the process details described here act as countermeasures that uphold the fourth Agile value: responding to change over following a plan. A lightweight estimation process is both quick and sparse on details. This results in a lower sunk cost, and a higher flexibility for redefining how the backlog item should be implemented.
As teams adopt story-pointing, they realize the importance of arriving at values collaboratively. Collaboration slows down decision making, however, and teams often struggle with reaching consensus on larger-sized items.
Weber’s Law helps explain this difficulty. Mike Cohn describes it succinctly using a simple analogy:
Imagine being handed two weights—one is one kilogram (2.2 pounds) and the other is two kilograms (4.4 pounds). With one in each hand but not able to see which is which, you can probably distinguish them. The two kg weight will feel noticeably heavier.
Imagine instead being handed a 20kg weight and a 21kg weight. They are the same one kg difference as the one and two kg weights. But you would have a much harder time identifying the heavier of the two weights.
This is due to Weber’s Law. Weber’s Law states that the difference we can identify between objects is given by a percentage.
Given Weber’s Law, it now becomes a matter of weighing cost against benefit. Considering the cost of everyone’s time, what’s the benefit of getting precise over an estimate we already know is very large? It may in fact be so large that the item cannot be worked on in its current state, and instead must first be “sliced” into multiple items.
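The weights in Cohn's analogy can be compared numerically. The sketch below expresses each difference as a percentage of the lighter weight. Using the smaller value as the baseline is an assumption of this example, and the function name is illustrative.

```python
def relative_difference(smaller: float, larger: float) -> float:
    """Difference between two magnitudes as a fraction of the smaller one."""
    return (larger - smaller) / smaller


# One kilogram versus two: the heavier weight is 100% heavier.
print(f"{relative_difference(1, 2):.0%}")    # 100%

# Twenty kilograms versus twenty-one: the same 1 kg gap is only 5%.
print(f"{relative_difference(20, 21):.0%}")  # 5%
```

The same arithmetic applies to estimates: telling a 1-point item from a 2-point item is easy, while arguing over 20 versus 21 points means debating a 5% difference.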
Mike Cohn sidestepped the potential time sink resulting from Weber’s Law by introducing the Fibonacci sequence as the estimation scale. This means that estimators are constrained to selecting a value from the sequence when formulating their estimate (as opposed to using a linear scale).
This approach results in consensus being reached much more easily and quickly, as large items are not estimated to a high degree of precision. Estimators who would otherwise have slightly-differing estimates are now constrained into the same numerical estimate as a result of this sequence.
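A short sketch shows how the constrained scale absorbs small disagreements. In practice estimators pick a value from the scale directly. Snapping a raw number to the nearest value, as done here, is only an illustrative device, and some teams might prefer to round up instead.

```python
from typing import Sequence

FIBONACCI_SCALE = (1, 2, 3, 5, 8, 13, 21, 34)


def snap_to_scale(raw_estimate: float, scale: Sequence[int] = FIBONACCI_SCALE) -> int:
    """Return the scale value closest to a raw estimate (ties go to the lower value)."""
    return min(scale, key=lambda value: abs(value - raw_estimate))


# Three estimators with slightly different opinions about a large item
# all land on the same value.
print([snap_to_scale(raw) for raw in (19, 20, 23)])  # [21, 21, 21]

# Small items keep their distinctions, because the scale is dense at the low end.
print([snap_to_scale(raw) for raw in (1, 2, 3)])  # [1, 2, 3]
```

Each step on the scale is large relative to the previous value, so any two options remain easy to tell apart even for large items.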
Lastly, the Fibonacci sequence was modified to further underscore the low precision of higher estimates. Specifically, numbers over 13 were replaced with round numbers (20, 40, and 100 in the commonly used planning poker deck, rather than 21, 34, and beyond), and 0.5 was introduced to provide estimators with a greater opportunity for precision on the lower end of the scale.
What does story-pointing collaboration look like in practice? How is estimation consensus reached reliably and efficiently? These questions will be answered in a future article, where various estimation techniques are explored.
I believe a practice is easier to grok once you appreciate its origin and evolution. This article is written to shed light on how story point estimation came about, how it evolved, and what characteristics make it so effective as an estimation approach.
Is story-pointing a panacea for the woes of software estimation? Definitely not. Should the approach be dogmatically followed or zealously avoided? No to both. To quote the Tao Te Ching:
Those who cling to their work will create nothing that endures.
Use story points if they’re useful, otherwise adapt or abandon them. The point is not to blindly stick to a process, but to build better products.
Addendum: After reviewing this article following its publication, Ron Jeffries offered clarification via Twitter on his team’s use of story points:
purpose was absolutely not to obfuscate, it made things /more/ clear by removing the confusion between “days” and “ideal days”. everyone knew how many points we chose to undertake, and how many we completed. our progress was entirely visible to all.