Test Coverage Demystified: A Complete Introductory Guide

People often confuse test coverage with code coverage, a related but different metric. Some people also think tracking any kind of coverage is worthless at best and detrimental at worst. Still others might think test coverage is valuable but don’t know enough about it, let alone how to track it.

Regardless of which of the three categories you belong to, this post is for you. As its title suggests, it’s going to be a complete introduction to the topic. I’ll keep it comprehensive yet approachable. Among other things, you’ll learn what test coverage is, how it relates to other types of coverage—code coverage included—and how to measure it. Let’s get started.

What Is Test Coverage?

Searching online for a good definition of test coverage is a challenge. This is because test coverage might mean different things to different people. For instance, many people treat test coverage and code coverage as synonyms. I personally don’t subscribe to this view, as you’ll soon see.

Here’s my definition of test coverage:

Test coverage refers to the portion of an app’s functionality covered by at least one type of automated test that can catch regressions should they ever appear.

As you can see, my definition is primarily concerned with functionality rather than code. That’s because features are what matters from the perspective of your users.

Also, notice I don’t say anything about unit testing. Yes, unit tests are immensely valuable. That doesn’t mean they’re the only kinds of tests you should worry about.

Later in the post, I’ll cover unit tests and other types of software tests, along with the role each plays in an overall testing strategy.

What About Code Coverage?

Let’s now talk a little bit about code coverage, investigating its relationship with test coverage.

Code Coverage Subtypes

Code coverage refers to the ratio of source code covered by automated tests. And though you can measure code coverage for other types of tests, this metric is typically used in the context of unit testing.

Code coverage has its subtypes:

Line coverage indicates the percentage of lines covered by at least one test case.
Statement coverage indicates the ratio of statements covered by tests.
Branch coverage indicates the percentage of branches—i.e., logical ramifications within the code—covered by tests.

Out of these three, branch coverage is the most valuable metric. It indicates how comprehensively the actual execution paths inside methods are covered and thus offers more confidence. Branch coverage is also interesting because it’s closely related to cyclomatic complexity, a measure of the number of independent paths through a given function.
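To see why branch coverage tells you more than line coverage, consider this minimal Python sketch. The function and test names are hypothetical and only serve as an illustration. The single test executes every line of apply_discount, so line coverage would report 100%, yet the path where is_member is false never runs, so only one of the two branches is exercised.

```python
def apply_discount(price_cents: int, is_member: bool) -> int:
    """Return the price in cents, reduced by 10% for members."""
    if is_member:
        price_cents = price_cents - price_cents // 10
    return price_cents


def test_member_gets_discount() -> None:
    # Runs every line of apply_discount, but only the "is_member is True" branch.
    # The case where the if body is skipped is never exercised.
    assert apply_discount(1000, True) == 900
```

A coverage tool with branch measurement enabled (for example, coverage.py run with `coverage run --branch`) is designed to surface this kind of gap, which a line-only report would hide.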

Code Coverage vs. Test Coverage

So, how does code coverage compare with test coverage? There are two main differences:

Code coverage is about source code. It’s in the name, after all.
Code coverage is typically about unit testing.

On the other hand, test coverage is less focused on the source code itself and more on actual user-facing functionality. Also, it considers other types of software testing since they all contribute to the overall quality of the application.

How to Measure/Calculate Test Coverage

Measuring code coverage is easy enough: Some tools do it for you.

Determining test coverage, on the other hand, can be trickier. If you Google for it, you’ll come across formulas like this:

Test Coverage = (Number of lines covered by tests / Number of lines in the application) * 100

I don’t like this for several reasons. First, such formulas sound more like code coverage than test coverage to me. Worse still, they’re about line coverage, which is the least useful type of code coverage.

Also, these formulas ignore complications, such as generated boilerplate code, the difficulty of determining the number of lines exercised by visual end-to-end testing, and so on.
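Here is a small sketch of the line-based formula, mainly to show how sensitive it is to what you count. The function name and all the numbers are made up for illustration. With 200 untested lines of generated boilerplate included in the total, the same test suite scores 80%; leave those lines out and it scores 100%.

```python
def line_coverage_percent(covered_lines: int, total_lines: int) -> float:
    """Line coverage as a percentage, following the formula quoted above."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return covered_lines * 100 / total_lines


# Hypothetical project: 800 hand-written lines (all covered)
# plus 200 generated lines that no test touches.
with_generated = line_coverage_percent(800, 1000)    # 80.0
without_generated = line_coverage_percent(800, 800)  # 100.0
```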

Finally, I prefer to think of test coverage in terms of functionalities. That’s what users care about, after all. So, a possible approach to track test coverage is to use your preferred project management solution—e.g., Jira—to track whether a given story or ticket has been tested.

Additionally, you might want to attribute a certain weight to each type of software test in the test pyramid. For example, unit tests might weigh 50%, integration tests 30%, and E2E tests 20%. So, a given feature covered by unit and integration tests but no E2E ones will have 80% test coverage. Expand that to all features to find the complete coverage for the whole application.
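The weighted approach can be sketched in a few lines of Python. The weights come from the example above; the feature names are hypothetical, and averaging the per-feature scores to get an application-wide number is my own assumption about how to "expand that to all features," not the only reasonable choice.

```python
# Weights in percentage points, taken from the example above.
WEIGHTS = {"unit": 50, "integration": 30, "e2e": 20}


def feature_coverage(test_types: set[str]) -> int:
    """Sum the weights of the test types that cover a feature."""
    return sum(weight for kind, weight in WEIGHTS.items() if kind in test_types)


def application_coverage(features: dict[str, set[str]]) -> float:
    """Average the per-feature scores (an assumed aggregation rule)."""
    if not features:
        return 0.0
    total = sum(feature_coverage(kinds) for kinds in features.values())
    return total / len(features)


features = {
    "checkout": {"unit", "integration"},      # 80
    "login": {"unit", "integration", "e2e"},  # 100
    "export": set(),                          # 0
}
overall = application_coverage(features)  # 60.0
```

If some features matter more to your users than others, you could replace the plain average with a weighted one.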

Test Coverage and the Test Pyramid

The test pyramid, also called the test automation pyramid, is a concept I learned from Martin Fowler’s website.

It is a framework that helps you answer this question: Given the many different types of automated software testing, how much should I use of each type?

The test pyramid gives you an economical answer, helping you focus on the cost and benefit of each type of test. It stacks unit tests at the base, integration tests in the middle, and E2E tests at the top.

If you’ve read Fowler’s article, you’ve seen he uses different names for the top two layers of the pyramid. I like to use “integration” rather than “service” and “E2E” (end-to-end) rather than “UI” because they better reflect my understanding of those kinds of tests.

So, what is this pyramid thing about?

Some Tests Are Slow and Some Are Fast

Tests differ in execution speed. Unit tests tend to be really fast since they don’t rely on external dependencies and test small portions of the application in isolation. Integration tests are slower since they do use dependencies and rely on other parts of the code.

End-to-end tests are the slowest of the bunch since they exercise the application the same way a user would, starting at one end—the user interface—and going all the way to the other end—the database—interacting with all the layers in between.

Some Tests Are Cheap and Others Are Expensive

Unit tests are cheaper than other types of tests in two ways:

They’re easier and faster to write. Since they don’t require elaborate setup or special configurations, writing unit tests is typically easy.
They’re literally cheaper to run. They don’t require a specialized and pricey environment, nor do they require loads of test data, which come with their own can of worms, such as the need for data masking, compliance, test data management, and so on.

The higher you go on the pyramid, the more expensive tests become.

Some Tests Are Precise in Their Feedback, Others Less So

Since unit tests verify units in complete isolation, they offer feedback with surgical precision. Unit tests are deterministic; a test won’t start failing out of the blue unless the code under test or the test method itself changes (or at least, it shouldn’t).
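One common way to get that determinism is to pass external inputs, such as the current time, into the unit instead of letting it read them directly. The sketch below uses hypothetical names and is only meant to illustrate the idea: because the test supplies the hour itself, it gives the same result no matter when it runs.

```python
from typing import Callable


def greeting(current_hour: Callable[[], int]) -> str:
    """Pick a greeting based on the hour supplied by the caller."""
    return "Good morning" if current_hour() < 12 else "Good afternoon"


def test_greeting_before_noon() -> None:
    # The test controls the "clock", so the result never depends on the real time.
    assert greeting(lambda: 9) == "Good morning"


def test_greeting_after_noon() -> None:
    assert greeting(lambda: 15) == "Good afternoon"
```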

On the other hand, an E2E test can fail for many reasons, related or not to the code under test. So, diagnosing what went wrong might feel like trying to find a needle in a haystack.

Slow, More Expensive, and Less Precise Tests Are Also Valuable

What I’ve written so far might give you the impression that only unit tests are good. That’s far from the truth.

The thing is: despite their many advantages, unit tests aren’t enough to ensure your app’s quality. That’s because there are many types of defects they can’t catch. A lot can, and does, go wrong in glue code. That’s code that integrates different modules of an application, or the application with external concerns such as the database, the filesystem, HTTP services, etc.
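As a minimal illustration of testing glue code, here is a sketch of an integration test that touches the real filesystem rather than a stand-in. The functions and the settings file are hypothetical. A unit test with a fake writer could pass even if the encoding, path handling, or serialization were wrong; writing to a temporary directory and reading the file back exercises those pieces together.

```python
import json
import tempfile
from pathlib import Path


def save_settings(path: Path, settings: dict) -> None:
    """Serialize settings as JSON and write them to disk."""
    path.write_text(json.dumps(settings), encoding="utf-8")


def load_settings(path: Path) -> dict:
    """Read settings back from a JSON file on disk."""
    return json.loads(path.read_text(encoding="utf-8"))


def test_settings_round_trip_on_real_filesystem() -> None:
    # Uses a real temporary directory, so the file I/O is actually exercised.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "settings.json"
        save_settings(path, {"theme": "dark"})
        assert load_settings(path) == {"theme": "dark"}
```

A test like this is slower and has more ways to fail than a pure unit test, which is exactly the trade-off the pyramid describes.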

The pyramid tells you that you should have all different types of tests because all of them offer something valuable. However, don’t distribute them uniformly. Instead, have more unit tests, fewer integration tests, and fewer still end-to-end tests.

What About Test Coverage?

The test pyramid is a valuable concept to have in mind when thinking about test coverage. The reason is simple: You can’t achieve high coverage without leveraging different types of testing. Consider:

Unit tests won’t help you verify that the application integrates correctly with external systems.
E2E tests don’t lend themselves easily to verifying that each method of the codebase works correctly in all its possible paths.
Integration tests don’t validate user experience, performance, or other non-functional requirements.

Keep in mind that the labels you see on the pyramid are just broad categorizations. There are still many types of testing that aren’t featured but are equally important.

Here’s the main takeaway: To ensure great coverage, leverage all testing types relevant to your application type according to their costs and benefits.

Back to You

Test coverage is a valuable yet often overlooked software engineering metric. It’s often confused with other metrics or even forgotten entirely.

In this post, you’ve learned about test coverage, including its relationship to other types of coverage. You’ve learned two possible approaches for calculating it and even how the test pyramid concept fits into all of this.

I’ve also quickly mentioned Jira in the context of determining which features are covered. If you’re a Jira user, you might be interested in extracting useful analytics from it, which, along with Git statistics, can give you valuable insights into your project. If that sounds interesting, schedule a LinearB demo today.
