Long-time readers will know that I am very wary of metrics for software engineering. Oh, there are the usual problems:
1) Generally, software engineering metrics are proxy metrics. You really want to measure productivity, but you can’t, so instead you measure lines of code. Or you really want to measure quality, but you can’t, so you measure defects.
2) If you measure something, you’re likely to get it – but people will take short-cuts in order to get that thing. Generally, this involves exploiting the difference between what you want and what is actually measured – for example, a developer measured by bug count will argue “that’s not a bug,” while a tester measured by bugs found may search the documentation and file a bug for every single typo. DeMarco and Lister refer to this as “dysfunction” in their book Peopleware.
3) Likewise, software engineering metrics (often) measure things that are different and put them all in one box. Instead of measuring dollars or widgets, which are interchangeable, we measure tests, bugs, or lines of code. These are /not/ interchangeable – some take much more time to find or create than others – yet putting them in the same box means they are all treated the same. The result? You’ll get a /lot/ of very small things. Try this with projects – watch the size of your portfolio shoot up … but each project gets smaller. Isn’t it strange how that happens?
4) Even if you can measure well, there are probably some things you have not measured – and to achieve the metrics, the team will usually trade those “intangibles” away. The classic example of this is hitting a date by allowing quality – which is hard to measure – to suffer. In the 21st century, with more advanced techniques, we are getting better at assessing product quality, so the next thing to be traded away is usually technical debt.
5) The classic answer to this is to have a balanced scorecard – to measure several things, such that a tradeoff to increase one thing will cause a visible decrease in another. But consider how hard it is to measure technical debt – or strength of relationships – and consider how expensive it is to try to create and maintain an exhaustive metrics system. By the time the metrics system is in place, you could have shipped a whole new product. Can we really call that improvement?
Getting metrics right is /hard/. Consider McDonald’s, a multi-billion-dollar corporation, which measures the price of its food, its sales, and its repeat customers. What does it not measure? I suspect McDonald’s does not measure the waistlines of its best customers, the treatment of animals in its food pipeline, or, until recently, the effect of its waste on the environment.
When I explain the challenges with software engineering metrics to folks, I usually get one of two reactions: either strong agreement – “I always felt that way but didn’t have the words” – or no response at all. It’s not that the people who don’t respond don’t care – they are usually strong proponents of metrics in a software group. They simply hold an opposing viewpoint and yet have no answer to the dysfunction caused by metrics.
To which I will add one more thought experiment:
I belong to Twitter, which counts the number of people who follow me. This is a simple, concrete, hard measure of my popularity. I can use my Twitter follower count as an objective number to argue my case before a book company, a magazine publisher, or a conference – in some cases, this could directly result in more bookings and higher revenue for the still-exists-but-tiny Excelon Development.
Yet if /all/ I cared about was that one metric on Twitter, I would adjust what I write to appeal to everyone working in testing. Then to anyone doing any kind of software work. Then I’d generalize to knowledge work. Then I’d go mass-market and try to talk about technology in the 21st century. And the message would get weaker and weaker and weaker and …
So, to be true to myself, I need to ignore my Twitter ranking, ignore my Technorati ranking, and try to build real relationships and create content that’s actually worth reading.
It’s funny how that works, eh?