Friday, December 6, 2013

Using Puzzles to Estimate Software Time and Effort

I've worked on several projects so far in my career, and on every one I've been asked a seemingly simple question:

"How long will this take?"

Anyone who has answered that question knows how hard it is to estimate software development time. Many factors feed into it: not knowing how fast other developers are, not understanding the scope of the problem, and so on. Until recently, every estimate I've made has been off by 25% to 50%. Some were under, but most were over. Sometimes there were bad assumptions. Other times there was bad luck. Still other times the hardware wasn't available to start development immediately.

So how can developers predict how long code takes to write? First we need a way to measure code development. Many attempts have been made to quantify code production and productivity: the man-month, lines of code (LOC), defects per LOC, number of files, number of check-ins, number of bugs fixed, even requirements tracing. Nothing seems to work. The best metric, though it leaves much to be desired, is lines of code. Unfortunately, what a line of code means differs in every language. Paradoxically, the number of bugs seems to be tied to LOC: Microsoft has clocked roughly 10-20 *released* defects per 1000 lines of code. So should we write everything as a Perl one-liner? Obviously not, but there is still a correlation, and the implication is clear: smaller is better.
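
To make the numbers concrete, here is a rough back-of-the-envelope sketch in Python (the rate range is the same 10-20 defects per 1000 LOC figure just quoted; the function name and project sizes are my own illustration):

    # Back-of-the-envelope estimate of released defects from lines of code.
    # The 0.010-0.020 defects/LOC range is the figure quoted above; treat it
    # as a rough prior, not a measured constant for your own codebase.

    def estimate_released_defects(loc, low_rate=0.010, high_rate=0.020):
        """Return a (low, high) range of expected released defects for loc lines."""
        return loc * low_rate, loc * high_rate

    if __name__ == "__main__":
        for loc in (1000, 10000, 50000):
            low, high = estimate_released_defects(loc)
            print("%6d LOC -> roughly %.0f to %.0f released defects" % (loc, low, high))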

I'll take this even further: comments count as lines of code. They must be maintained. They must be correct. If they are wrong, serious errors occur when someone tries to change the related code. In fact, if you read a book carefully enough, I would guess you'd find 10-20 errors (typos, grammar problems, typographical blunders, etc.) per 1000 lines of text. What makes one author have half the errors of another? Experience, but usually their prose is also simpler. Yes, Strunk and White's The Elements of Style is a programming manual. Sort of. Grammar is not an issue with code because, the vast majority of the time, typos are caught by a perfect proofreader: the computer.

Anyway, as the comedian says, I told you that so I can tell you this: these problems are related. What's more, they are the same problem. Why can't we just type out the same old code and learn to do it perfectly? The reason is simple: because we are doing something new. What we are really measuring is the number of new things a programmer has to do per line of code! This also explains why some programmers are faster than others, and why the defect rate per LOC stays roughly constant. What a programmer understands, he can do well. Typos are not usually an issue unless there are many similar names, which is almost a bug in itself.

So what is one of these "new things"? That's a terrible name, so I'm going to call them puzzles. Puzzles are things that take time to solve and carry a chance of failure. A puzzle can be solved in one of two ways: the developer either discovers the solution or researches it. Solutions have components of their own: an amount of code, development time, a failure rate, and head-space. Head-space is everything the developer has to keep track of during development. When you ask him something about the software and he lists caveats on its use, each of those caveats takes energy to maintain. Looking at code through this lens hints that the variation in bugs per 1000 LOC between developers and languages comes from bypassing puzzles, either through the developer or through the language. One is design and experience; the other is the power of the language.
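
If it helps to see the shape of it, here is a minimal sketch of what a puzzle record might look like. The field names and the roll-up math are my own illustration, not a fixed formula:

    from dataclasses import dataclass

    @dataclass
    class Puzzle:
        """One 'new thing' the developer has to solve."""
        name: str
        loc: int             # expected amount of code
        dev_hours: float     # expected development time
        failure_rate: float  # chance the first solution is wrong (0.0 - 1.0)
        head_space: int      # caveats the developer has to keep in mind

    def summarize(puzzles):
        """Roll per-puzzle estimates up into a rough whole-feature picture."""
        total_loc = sum(p.loc for p in puzzles)
        total_hours = sum(p.dev_hours for p in puzzles)
        # Chance that at least one puzzle's first solution turns out wrong.
        p_all_ok = 1.0
        for p in puzzles:
            p_all_ok *= (1.0 - p.failure_rate)
        return total_loc, total_hours, 1.0 - p_all_ok

    if __name__ == "__main__":
        feature = [
            Puzzle("parse vendor file", loc=80, dev_hours=6, failure_rate=0.3, head_space=2),
            Puzzle("retry on timeout", loc=30, dev_hours=4, failure_rate=0.5, head_space=3),
        ]
        loc, hours, risk = summarize(feature)
        print("~%d LOC, ~%.0f hours, %.0f%% chance at least one solution is wrong"
              % (loc, hours, risk * 100))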

So far this has all been idle supposition, but here it becomes practical. We can use this to find "power points" in sections of code. Every programmer should intuitively understand what these are. If you have ever seen a comment saying "This is where the real work is done" or "This is magic", the next few lines are likely a power point. The general flow of code is usually getting some data, modifying it, and then sending it somewhere. Power points can happen in any of these stages, but I find most of them in the modifying or sending portions. When a power point is in the sending phase, it usually means the code is writing to a tricky interface. Most, however, happen when modifying data. With this information we can identify the difficult parts of the code and implement strategies for verifying them. So we've identified the tough bits of code; so what? We could have asked the developer and he would have told us the same thing. But this is also a tool for time and defect estimation, and a metric for how good the design is.
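
One crude way to hunt for candidate power points is to scan for those tell-tale comments. The marker phrases below are just guesses at common wordings; swap in whatever your codebase actually says:

    import re
    import sys

    # Comment phrases that often sit right above a power point. These are
    # guesses at common wordings; extend with whatever your codebase uses.
    MARKERS = re.compile(r"real work|magic|here be dragons|tricky|careful", re.IGNORECASE)

    def find_power_points(path):
        """Yield (line_number, comment) pairs whose comments hint at a power point."""
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                stripped = line.strip()
                if stripped.startswith(("#", "//", "/*", "*")) and MARKERS.search(stripped):
                    yield lineno, stripped

    if __name__ == "__main__":
        for path in sys.argv[1:]:
            for lineno, comment in find_power_points(path):
                print("%s:%d: %s" % (path, lineno, comment))

Run it over a handful of source files and you get a rough map of where the verification effort should go.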

First we can estimate how frequent power points are by using 0.01-0.02 defects/LOC. That means there is a defect, on average, roughly every 50 to 100 lines of code! That also means there is likely a power point *every 50 to 100 lines of code!* And that means every single solution is wrong. That seems absurdly high, but defects congregate around specific points. Also, many bugs are really a failure of the design to handle a specific scenario, so they get fixed by restructuring rather than by patching code. Let's say half of the end-product defects are code/puzzle errors. I've yet to do the probability curves, but I expect there are 2 to 4 puzzles per 1000 LOC with one error each. There is a lot of speculation here, but the end result is simple: *programmers never solve a puzzle and implement it correctly for all use cases.*
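
The arithmetic behind that spacing is simple enough to show directly. The rates are the same 0.01-0.02 defects/LOC quoted above, and treating each released defect as marking one power point is this post's working assumption, not a measured fact:

    def lines_per_defect(rate_per_loc):
        """Average lines of code between released defects at a given rate."""
        return 1.0 / rate_per_loc

    # The same 0.01-0.02 defects/LOC range quoted above.
    for rate in (0.01, 0.02):
        print("At %.2f defects/LOC: one defect (and likely one power point) "
              "roughly every %.0f lines" % (rate, lines_per_defect(rate)))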

This seems bleak. How can software run at all? Simple: the program works most of the time. It fails only in special places, usually where the developer didn't think through a minute consequence or could not imagine every possible use of his solution. Where does that leave software development? We can't estimate effort or time. We can't estimate design complexity, and we can't even implement the product without many, many errors. Is there any hope of creating reliable software on time and on budget? Is there any way to know how many developers we need before we need them? Is there a way to design software sensibly yet still allow for changing requirements and design blunders? Well, yes, actually. But we must embrace the uncertainty.

What is estimating? It's taking input from various sources (requirements, employees, vendors, past projects...), modifying it using any number of ideas, algorithms, formulas, and voodoo, and finally formatting it and sending it out to be considered. That sounds familiar: it's the typical flow of code! Furthermore, managing the project and designing the product are similar tasks. One way to look at this is that missing the schedule is a bug in the planning, or that a design defect is a bug in the designing. This is confusing, but it's the best possible outcome, because it means we can use the same methodology to plan and manage a project that we use to implement it. Programming is all about finding patterns, right? Well, here it is: you can write a project like you write a program. The downside is that the same rules apply. My rule is that if there is an anti-pattern for implementing software, there is a corresponding (if not the same) anti-pattern for estimating, planning, designing, and maintaining projects. A "ball of mud" can happen in project planning as well as in implementation. In fact, projects are fractal in nature: an anti-pattern can appear between any interacting subsystems, from the company's program level (the project of projects) down to the interpersonal level.
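
To push the analogy one step further, here is a toy estimate written the way code is described above: get some data, modify it, send it somewhere. The task names, hours, and flat 1.5x padding factor are made up purely for illustration:

    def gather_inputs():
        """Collect raw estimates from requirements, developers, past projects..."""
        return {"parser rewrite": 40, "vendor interface": 25, "reporting": 15}  # hours

    def transform(raw_hours, padding=1.5):
        """Apply whatever formula or voodoo you trust; here, a flat padding factor."""
        return {task: hours * padding for task, hours in raw_hours.items()}

    def format_and_send(estimates):
        """Format the result for whoever has to consider it."""
        lines = ["%s: %.0fh" % (task, hours) for task, hours in estimates.items()]
        lines.append("Total: %.0fh" % sum(estimates.values()))
        return "\n".join(lines)

    if __name__ == "__main__":
        print(format_and_send(transform(gather_inputs())))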

This means we can use the same methodology to solve many of our problems, not just our software problems. In fact, since it is the same process and simple enough for everyone to understand, it can be the basis for all your processes. I have a lot more to cover, but this post has become very long and you likely need a break. I'll cover the actual process in the next post.
