I frequently hear people claim to be debating code quality when in fact the only topics discussed are code metrics. Quotes along the lines of “The code quality on our project is excellent, we have 70% test coverage” or the opposite, “We’re slipping on code quality, unit test coverage for this new module is only 30%”, are common. Test coverage is not the only code metric used in this manner (line counts, cyclomatic complexity and static analysis heuristics are others), but it is likely the least understood and most often abused indicator of code quality.
Compare this with body mass index (BMI). It is true that for most people, having a BMI of 30 means that you could drop a few kilos, but ordering a body builder to lose some fat is most likely not the right course of action. Similarly, any code metric used without proper interpretation and context may indicate the opposite of what the naive reading would suggest.
Before looking at some specific types of code metrics, let’s have another look at the real subject at hand: Code Quality. While difficult to achieve, the essence of high code quality boils down to two simple questions:
- Does it do what we want it to do? (Is it correct?)
- Is it easy to change it to do something else? (Is it maintainable?)
Note that neither question directly relates to test coverage, line counts, complexity metrics or other often-used numbers. Each metric is an attempt to approximate the answer to one or both of the above questions, but it is not the answer itself.
High unit test coverage is meant to indicate that a code base is both correct and maintainable, but this is not always the case. A test-first approach, where the intent of the unit is clearly documented via meaningful example use cases, will increase confidence in the correctness of the code, both now and when changes are made. Readable tests also help new developers understand the intent of the existing code, making it easier to adjust it to match new requirements, optimise performance etc.
On the other hand, 70% unit test coverage achieved by force rather than by understanding of principles is likely to have the opposite effect. Instead of proving correctness, such tests give a false sense of security, as they end up covering whatever happened to be the implementation rather than the intent of the unit. Often they also serve as a drag on maintainability: the unit tests themselves exhibit poor code quality, and fragile tests that are too tightly coupled to the implementation fail whenever a change is made.
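The difference between a test that documents intent and one that merely mirrors the implementation can be sketched in a few lines. Everything here is hypothetical and invented for illustration: the `PriceCalculator` class, its `_subtotal` helper and the discount rule are assumptions, not code from any real project.

```python
from unittest.mock import patch


class PriceCalculator:
    """Hypothetical unit under test: totals a basket and applies a discount."""

    def total(self, prices, discount_percent):
        subtotal = self._subtotal(prices)
        return round(subtotal * (1 - discount_percent / 100), 2)

    def _subtotal(self, prices):
        return sum(prices)


def test_total_documents_intent():
    # States the requirement as an example: 10% off a 200.00 basket is 180.00.
    # It keeps passing however total() is restructured internally.
    assert PriceCalculator().total([150.0, 50.0], 10) == 180.0


def test_total_mirrors_implementation():
    # Only checks that total() delegates to a private helper. It never looks
    # at the result, and it breaks as soon as _subtotal is renamed or inlined.
    calc = PriceCalculator()
    with patch.object(PriceCalculator, "_subtotal", return_value=200.0) as subtotal:
        calc.total([150.0, 50.0], 10)
    subtotal.assert_called_once_with([150.0, 50.0])


if __name__ == "__main__":
    test_total_documents_intent()
    test_total_mirrors_implementation()
```

Both tests execute the lines of `total`, so a coverage report counts them much the same. Only the first one would notice a wrong discount calculation, and only the second one fails on a harmless refactoring.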
In one of my past projects, we were working on a large code base with complex logic scattered throughout the application. While the system at the time had unit test coverage above 70%, it did not exhibit much correctness and had a high bug count. Maintainability was also low, with little consistency or reuse of code, and the test suite acted as a drag rather than a safety net on any attempt to refactor the code base.
After significant effort was spent on addressing the situation, we ended up with a much more maintainable codebase, with common patterns extracted and reused. Bug counts were reduced as a result of establishing patterns, ensuring that a bug fixed in one place of the application was fixed in all. However, at the end of the exercise, the test coverage had gone down to 50% overall. The reason? Many of the classes had previously each contained their own version of common complex logic and needed unit tests for there to be any confidence of correctness. They had now been replaced with code acting as little more than type-safe configuration, making unit testing a wasted effort.
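The arithmetic behind such a drop is worth spelling out. The sketch below uses invented line counts chosen only to reproduce the 70% and 50% figures; they are assumptions for illustration and not measurements from the project described above.

```python
def coverage(covered_lines: int, total_lines: int) -> float:
    """Line coverage as a percentage."""
    return 100 * covered_lines / total_lines


# Before (hypothetical): 10 classes, each holding its own 100-line copy of
# the complex logic, each with 70 of those lines covered by tests.
before_total = 10 * 100
before_covered = 10 * 70

# After (hypothetical): the logic lives once in a fully tested 100-line
# shared module, and the 10 classes shrink to 10 lines of untested,
# configuration-like code each.
after_total = 100 + 10 * 10
after_covered = 100 + 0

before = coverage(before_covered, before_total)
after = coverage(after_covered, after_total)

# Lines that are not exercised by any test, before and after.
untested_before = before_total - before_covered
untested_after = after_total - after_covered

print(f"before: {before:.0f}% coverage, {untested_before} untested lines")
print(f"after:  {after:.0f}% coverage, {untested_after} untested lines")
```

Under these assumptions the percentage falls from 70 to 50, while the number of untested lines drops from 300 to 100 and none of the remaining untested lines contain complex logic. The ratio got worse because the denominator shrank, not because the code did.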
In short: Be sure to check if a person is a body builder or a couch potato before telling them to lose some weight.