You don't write enough tests

A challenge

Before I get to the meat of this post, I have a challenge for you. I highly recommend trying it before reading the rest of this post. The challenge is meant for a pair, but you can do it by yourself by assuming both roles, though it will not be as powerful a lesson.

Pick a simple problem. Something like Conway's Game of Life works well, but any problem of similar complexity will do. In your pair, pick one person who will write the tests; the other person will write the implementation. The process is as follows:

The tester writes a small test that makes the code go red.
The implementer writes the easiest possible code to make the code go green again.

Note that the easiest path to green code isn't necessarily the correct implementation. Let me give an example using a hypothetical fixed-size list class:

[Fact]
public void Count_ReturnsCorrectSize() {
    var list = new FixedSizeList(10); // make a list with size 10

    Assert.Equal(10, list.Count); // xUnit expects (expected, actual)
}

A very straightforward test. There is also a very straightforward implementation:

class FixedSizeList {
    public int Count = 10;

    public FixedSizeList(int _) {}
}

Is this bad code? Yeah. Does it pass the tests? Yeah, and that's really what matters in this exercise.

Now it is up to the tester to write tests in such a way that the easiest possible way to make the code green is to actually implement the correct solution. Repeat until you've had enough, or until the correct implementation exists. For an added challenge, the tester could choose not to look at the implementation at all, and rely on the tests only to create a correct solution.
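As a rough sketch of how the next round might go: the tester adds a test that checks several sizes, and at that point storing the constructor argument is less work than special-casing every value. The [Theory] test, its sizes, and the property-based Count below are my own illustration (assuming an xUnit test project), not a prescribed solution.

```csharp
using Xunit;

public class FixedSizeList
{
    // Remembering the requested size is now easier than hard-coding each tested value.
    public int Count { get; }

    public FixedSizeList(int size)
    {
        Count = size;
    }
}

public class FixedSizeListTests
{
    // Several different sizes make the hard-coded "Count = 10" go red.
    [Theory]
    [InlineData(0)]
    [InlineData(1)]
    [InlineData(10)]
    [InlineData(1234)]
    public void Count_ReturnsRequestedSize(int size)
    {
        var list = new FixedSizeList(size);

        Assert.Equal(size, list.Count);
    }
}
```

A determined implementer could still special-case these four values, which is exactly the pressure that pushes the tester towards the strategies discussed later in this post.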

Once you're ready to see what you can learn from this challenge, read on!

The lazy evil programmer

The exercise above is one of the exercises I do with the participants of code retreats that I host, and I tend to call the session "lazy evil programmer". Almost without fail, people have a lot of fun with this exercise. For me, doing this for the first time changed my perception of tests forever.

At the end of the session, look at the ratio of test code to production code. Most likely, your test code will be anywhere from three to five times as big as the production code, if not more. This is often much more than people write in their usual setups. Why is that?

Under normal circumstances, we believe we'll do a pretty good job at implementing the correct solution. Going back to the example above: if we test that our FixedSizeList works for size 10, we can reasonably assume it works for other numbers as well, right? Sure, we'll maybe cover 0, a negative number, and if we're feeling particularly thorough, int.MaxValue as well, but after that we move on. Can we really be sure our code works with all inputs, though?

In the exercise outlined above, the implementing programmer is actually an antagonist, and we can't trust them to do the right thing. If I had written the code myself, in an attempt to make the correct implementation, it would be pretty easy to take the tests as enough evidence of the correctness of my code. This is where the big assumption comes in: I trust my own capabilities in writing the correct code.

Most of the time, programmers write the correct code. However, to err is human, and mistakes will slip in from time to time. Many people agree that if we cannot prove the code is correct through testing, it may as well not exist (something that the language Vigil tries to solve). Tests often only sample some parts of our code. This comic summarises it pretty well. So why are we satisfied with only some tests, covering the rest by ~~magic~~ trust in ourselves, which is often misplaced? As programmers, we are not actively trying to be lazy or evil, but I still believe tests should be written with that assumption. Unintentionally, we may be making mistakes or taking shortcuts, and tests catch those.

What tests to write

This section discusses some strategies for testing, so if you want to try the exercise at the start of this blog post for yourself, this is your final warning to avoid some spoilers.

We can't write a unit test for each possible input. If you tried the challenge above, you may have found that simple unit tests tend to not quite cut it after some time. Especially when the methods become more complex, trying to find a set of covering cases is extremely difficult. This is where we have to turn towards other testing methods.

Unit tests tend to have a fairly standard format: arrange the input, act on the test subject, assert that the actual output matches the expected output. To make sure that we don't build code that works only for the tested inputs, but for all possible inputs, we can sample the input space by generating random inputs. The challenge many people run into here is that once the input is randomised, the test no longer knows what the expected outcome is. A common anti-pattern here is to implement the entire solution again in the tests and compare the outputs, but who's to say the test implementation doesn't contain the same bug as the actual implementation (especially since they are usually written by the same person)? What works better in these cases is to verify that certain properties of the output hold.

As an example, let's assume our hypothetical FixedSizeList can be sorted. If we generate a random set of numbers to put in the list, we wouldn't want to compare the output to an exact other list, since we don't have a way to generate that (putting aside our trust in Array.Sort). However, a sorted list has one property that is very easy to check: each number is greater than or equal to the previous one.

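[Fact]
public void Sorted_ReturnsSortedList()
{
    var random = new Random();
    // Keep the size modest: a list of up to int.MaxValue elements would not fit in memory.
    var size = random.Next(0, 1000);
    var list = new FixedSizeList(size);
    for (var i = 0; i < size; i++) {
        list.Add(random.Next());
    }

    var sortedList = list.Sorted();

    // Each element must be greater than or equal to the one before it.
    for (var i = 0; i < size - 1; i++) {
        Assert.True(sortedList[i] <= sortedList[i + 1]);
    }
}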

The test is relatively simple to write, and all of a sudden it is much less likely that a wrongly implemented Sorted function slips through.
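One caveat worth sketching: a lazy evil implementer could satisfy the ordering property alone by returning a list of all zeros. A second property, that the output contains exactly the same values as the input, closes that gap without reimplementing the sort. The helper names below (SortProperties, SortUnderTest) are hypothetical, and the sketch works on plain arrays rather than FixedSizeList so that it stays self-contained; it assumes an xUnit test project.

```csharp
using System;
using System.Linq;
using Xunit;

public static class SortProperties
{
    // True when each element is greater than or equal to the one before it.
    public static bool IsOrdered(int[] values)
    {
        for (var i = 0; i < values.Length - 1; i++)
        {
            if (values[i] > values[i + 1]) return false;
        }
        return true;
    }

    // True when both arrays hold the same values, the same number of times each.
    public static bool IsPermutation(int[] a, int[] b)
    {
        if (a.Length != b.Length) return false;
        var counts = a.GroupBy(x => x).ToDictionary(g => g.Key, g => g.Count());
        foreach (var x in b)
        {
            if (!counts.TryGetValue(x, out var remaining) || remaining == 0) return false;
            counts[x] = remaining - 1;
        }
        return true;
    }
}

public class SortedPropertyTests
{
    // Hypothetical stand-in for the code under test, e.g. FixedSizeList.Sorted().
    private static int[] SortUnderTest(int[] input) => input.OrderBy(x => x).ToArray();

    [Fact]
    public void Sorted_IsOrderedPermutationOfInput()
    {
        var random = new Random();
        for (var run = 0; run < 100; run++)
        {
            // Small value range on purpose, so duplicates show up often.
            var input = Enumerable.Range(0, random.Next(0, 200))
                .Select(_ => random.Next(-50, 50))
                .ToArray();

            var output = SortUnderTest(input);

            Assert.True(SortProperties.IsOrdered(output));
            Assert.True(SortProperties.IsPermutation(input, output));
        }
    }
}
```

Together the two properties pin the output down completely: an ordered permutation of the input is the sorted input.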

A second way to test beyond unit tests is to focus on system tests. In Conway's Game of Life, for example, there is a repeating pattern called a glider. This is a set of alive cells that repeats itself after four iterations, but moved one tile to the right and one tile down. This is a very fragile pattern, and any bug in your Game of Life implementation will most likely break it. Yet the outcome is incredibly predictable, so it would be easy to write a test that takes the glider as input, does 400 iterations, and matches the expected output (which is the same glider, just moved 100 tiles). It is likely to cover all your code branches.
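Such a glider test could look roughly like the sketch below. The Life.Step function and its representation of the board (a set of alive coordinates on an unbounded grid, with y growing downwards) are assumptions of mine to keep the example self-contained; your own implementation will have a different shape. It again assumes an xUnit test project.

```csharp
using System.Collections.Generic;
using System.Linq;
using Xunit;

public static class Life
{
    // Computes one generation; the set holds the (x, y) of every alive cell.
    public static HashSet<(int, int)> Step(HashSet<(int, int)> alive)
    {
        // Count alive neighbours for every cell adjacent to at least one alive cell.
        var neighbourCounts = new Dictionary<(int, int), int>();
        foreach (var (x, y) in alive)
        {
            for (var dx = -1; dx <= 1; dx++)
            {
                for (var dy = -1; dy <= 1; dy++)
                {
                    if (dx == 0 && dy == 0) continue;
                    var cell = (x + dx, y + dy);
                    neighbourCounts.TryGetValue(cell, out var count);
                    neighbourCounts[cell] = count + 1;
                }
            }
        }

        // Birth on exactly 3 neighbours; survival on 2 or 3.
        return new HashSet<(int, int)>(neighbourCounts
            .Where(kv => kv.Value == 3 || (kv.Value == 2 && alive.Contains(kv.Key)))
            .Select(kv => kv.Key));
    }
}

public class GliderTests
{
    [Fact]
    public void Glider_Moves100TilesDiagonallyAfter400Generations()
    {
        // . # .
        // . . #
        // # # #
        var glider = new HashSet<(int, int)> { (1, 0), (2, 1), (0, 2), (1, 2), (2, 2) };

        var cells = glider;
        for (var i = 0; i < 400; i++)
        {
            cells = Life.Step(cells);
        }

        // Same shape, moved 100 tiles right and 100 tiles down.
        var expected = glider.Select(c => (c.Item1 + 100, c.Item2 + 100));
        Assert.True(cells.SetEquals(expected));
    }
}
```

On a bounded board the test would need a grid large enough for the glider to travel the full distance, or a smaller number of iterations.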

Finally, the challenge posed above itself can provide a good solution to the problem as well: don't write your own tests. If you forget a boundary condition in your code, you are likely to also forget to test for that boundary condition. Because you know what you are testing for, you know exactly what code you do and don't have to write, or vice versa if you are writing the implementation before tests. By having a different person write the tests, you lose all preconceptions about what is and isn't correct about the code, and have a larger chance of catching actual bugs.

What about coverage tools?

One final note before we move on to the conclusion. One tool that has been created to give us programmers more trust in tests is test coverage. Test coverage is represented as the percentage of lines of code that are executed as part of your tests. While this is an incredibly powerful - and generally under-used - tool, like any tool, we need to be careful about how we use it. A test running a certain line of code doesn't mean that the correctness of that line is tested as well. This is the danger of coverage tools: if we blindly trust them, we may stop thinking critically, and we're back at being lazy and evil.
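To make the difference between "executed" and "tested" concrete, here is a small sketch. Pricing.ApplyDiscount is a made-up function with a deliberate bug; both tests execute its only line, so a line-coverage tool would report it as fully covered either way, but only the second test notices the bug. It assumes an xUnit test project.

```csharp
using Xunit;

public static class Pricing
{
    // Hypothetical function with a deliberate bug: the discount is added instead of subtracted.
    public static decimal ApplyDiscount(decimal price, decimal discount)
    {
        return price + discount;
    }
}

public class PricingTests
{
    // Executes every line of ApplyDiscount, yet passes despite the bug,
    // because nothing checks the result.
    [Fact]
    public void ApplyDiscount_RunsWithoutThrowing()
    {
        Pricing.ApplyDiscount(100m, 20m);
    }

    // Same coverage, but this test stays red until the bug is fixed.
    [Fact]
    public void ApplyDiscount_SubtractsDiscountFromPrice()
    {
        Assert.Equal(80m, Pricing.ApplyDiscount(100m, 20m));
    }
}
```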

Conclusion

Once we stop being able to trust ourselves to write the right code, we realise that we actually need to write many more tests than we usually do to guarantee the correctness of our implementation. This shows that a large part of the evidence of our code's correctness is missing. The only way to truly be certain our code works is to assume the worst, and put maximum effort into making sure that every single bit of code is covered by tests. Randomized testing using property tests, system testing, and having a different person write your tests are among the solutions that can help catch more bugs in your code. In the end, code that isn't tested can't be trusted, whether it be written by a lazy evil programmer, or just a lazy one.