I came across a lot of ideas in graduate school. Some, like Test-Driven Development, Extreme Programming, and Scrum, propelled my career forward.
Others were models – a way of seeing the world – that were on the way out.
I am particularly struck by how we thought about the test “phase” in my introduction to software management course. The idea was that testing would have a defined test cycle where you do “all” the testing. No, we never did get into much depth about what “all” the testing was. Once the test cycle was complete, you’d look at your bugs, fix the important ones, then run all the tests again. Because every build had changes in it, it was important to re-run all the tests; anything could be broken.
In round one of testing we find 56 defects and fix 35? Re-run all the testing. Round two we find 20 defects and fix 19? Re-run all the testing. Again and again, until we are ready to ship, and after. Find a single bug worth fixing two days after the website goes live? Re-run all the testing.
Except that we did not actually behave as if that were true – especially in maintenance mode. We would be “too busy” to do the right thing. As the deadline approached, testing would spiral inward, and we would do less and less. This made sense, at least to us, at least at the time, because we were changing less and less, “just” fixing one thing or another.
There was plenty of oddity in graduate school, but one I still hear today is this idea of “all the tests”, as if testing were a binary, done-or-not-done activity.
Welcome to the Minefield (Or: There is No “All The Tests”)
The number of possible tests is infinite. Infinite. There is no such thing as all the tests; a simple calculator has an unlimited combination of button-pushes, any number of which could lead to a defect.
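To put a rough number on the calculator example, here is a minimal sketch that counts button-press sequences. The 20-button calculator is an assumption made up for illustration, as are the function names; the point is only how fast the count grows with sequence length.

```python
# Hypothetical calculator: the button count is an assumption for illustration
# (ten digits plus operators, decimal point, equals, clear, and so on).
BUTTONS = 20


def sequences_of_length(n: int, buttons: int = BUTTONS) -> int:
    """Number of distinct button-press sequences of exactly n presses."""
    return buttons ** n


def sequences_up_to(n: int, buttons: int = BUTTONS) -> int:
    """Number of distinct sequences that are between 1 and n presses long."""
    return sum(buttons ** k for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(f"{n:>2} presses: {sequences_of_length(n):,} possible sequences")
```

Ten presses on this imaginary calculator already allow more than ten trillion distinct sequences, and nothing stops a user from pressing an eleventh button.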
When I talk about testing, I tend to frame it as investing some amount of time and effort to find problems, risks, and opportunities to improve the software. How much time, and how we spend it, is a question of value. You could think of software testing, then, as a minefield, and the bugs as mines. James Bach explains it this way, in his article Reasons To Repeat a Test:
Testing to find bugs is like searching a minefield for mines. If you just travel the same path through the field again and again, you won’t find a lot of mines. Actually, that’s a great way to avoid mines. The space represented by a modern software product is hugely more complex than a minefield, so it’s even more of a problem to assume that some small number of “paths”, say, a hundred, thousand, or million, when endlessly repeated, will find every important bug. As many tests as a team of testers can physically perform in a few weeks or months is still not that many tests compared to all the things that can happen to a product in the field.
The minefield analogy is really just another way of saying that testing is a sampling process, and we probably want a larger sample, rather than a tiny sample repeated over and over again. Hence the minefield heuristic is do different tests instead of repeating the same tests.
But what do I mean by repeat the same test? It’s easy to see that no test can be repeated exactly, any more than you can exactly retrace your footsteps. You can get close, but you will always be a tiny bit off. Does repeating a test mean that the second time you run the test you have to make sure that sunlight is shining at the same angle onto your mousepad? Maybe. Don’t laugh. I did experience a bug, once, that was triggered by sunlight hitting an optical sensor inside a mouse. You just can’t say for sure what factors are going to affect a test. However, when you test you have a certain goal and a certain theory of the system. You may very well be able to repeat a test with respect to that goal and theory in every respect that A) you know about and B) you care about and C) isn’t too expensive to repeat. Nothing is necessarily intractable about that.
Therefore, by a repeated test, I mean a test that includes elements already known to be covered in other tests. To repeat a test is to repeat some aspect of a previous test. The minefield heuristic is saying that it’s better to try to do something you haven’t yet done, than to do something you already have done.
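The sampling idea in the quote can be sketched as a toy simulation. Everything here is an assumption for illustration: the field is a row of numbered cells, a "path" is a list of cells stepped on, and the sizes and names are invented. It is not a model of any real product, only of the difference between repeating one sample and taking many different ones.

```python
import random


def find_mines(mines, paths):
    """Return the set of mines hit by walking every path (each a list of cells)."""
    found = set()
    for path in paths:
        found |= mines.intersection(path)
    return found


def random_path(rng, field_size, length):
    """A hypothetical test run: `length` cells chosen at random from the field."""
    return [rng.randrange(field_size) for _ in range(length)]


def compare_strategies(seed=0, field_size=10_000, mine_count=200,
                       path_length=50, runs=100):
    """Return (mines found repeating one path, mines found with varied paths)."""
    rng = random.Random(seed)
    mines = set(rng.sample(range(field_size), mine_count))

    same_path = random_path(rng, field_size, path_length)
    repeated = [same_path] * runs  # the same footsteps, over and over
    varied = [random_path(rng, field_size, path_length) for _ in range(runs)]

    return len(find_mines(mines, repeated)), len(find_mines(mines, varied))


if __name__ == "__main__":
    repeated_found, varied_found = compare_strategies()
    print(f"same path x100: {repeated_found} mines; 100 varied paths: {varied_found} mines")
```

By construction, walking the same path a hundred times can never find more mines than walking it once, while each new path has a fresh chance of stepping somewhere nobody has stepped before.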
I’d like to add one small bit to James’s minefield analogy, a tiny bit that explains what we were actually doing all those years ago.
Say you have exceptional coverage. You could always test more, but in this case it makes sense to release right now. Still, there is that one niggling request to add a parameter to the report, so people can get results for a specific username. You could add the parameter, or column, or change, and it doesn’t entirely invalidate your testing – not really. Most of the path you trod really hasn’t changed.
It is more like walking in a minefield … in the middle of a sandstorm.
A Couple of Real Projects
Over the past few years I have had a chance to consult at companies that broke the work down into a set of either services or data flows. Each component of the system had defined inputs and outputs, and we could check and verify the components in isolation. If all that changed was the internals of a component, and we had full confidence in it, we could run all the automated checks and release to production. Done.
And I worked at other companies, companies where a change to one piece of functionality could break something else with almost no rhyme or reason, companies with legacy systems, with functions that were thousands of lines of code that no one could track in their head, that combined business logic and the GUI and data layer.
In the first group, the wind was light. We often had systems in place to notice problems early and push to production quickly. In the second group, the windstorm was at its worst. The testing task was to re-explore every new build, looking not only at what had changed but also at any other critical business function that needed to keep working. (In the second group, you could argue there were plenty of quality and code-craft style tasks, as well, but that is a different post.)
My point here is that contexts are different; when testing one creaky, old, combined application where a change could inject bugs anywhere, the old graduate school strategy might have been just fine.
Footsteps in the Sand
Change a module, and some of the tests you ran in that module will be invalidated – but likely not all. The tests outside that module may be impacted – but you are not sure how much. Over time, if the system changes all over, the value of your test effort disappears, and everything does need to be rerun.
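One way to picture this is to label earlier test results after a change. The sketch below is a thought experiment, not a tool: the module names, the dependency map, and the three labels are all hypothetical, and a real dependency map is rarely complete enough to trust.

```python
def classify_results(tests, changed, depends_on):
    """Label earlier test results after a change.

    tests:      dict of test name -> set of modules the test exercises
    changed:    set of modules that were modified
    depends_on: dict of module -> set of modules it uses directly
    """
    # Modules that directly use something that changed, but did not change themselves.
    neighbours = {m for m, deps in depends_on.items() if deps & changed} - changed

    labels = {}
    for name, modules in tests.items():
        if modules & changed:
            labels[name] = "rerun"                 # footprints blown away
        elif modules & neighbours:
            labels[name] = "uncertain"             # may be impacted, unclear how much
        else:
            labels[name] = "probably still valid"  # as far as the map can tell
    return labels


if __name__ == "__main__":
    # Hypothetical system: all names below are made up for illustration.
    tests = {
        "report_totals": {"reports"},
        "export_csv": {"export"},
        "login": {"auth"},
    }
    depends_on = {"export": {"reports"}, "reports": {"db"}, "auth": {"db"}}
    for name, label in classify_results(tests, {"reports"}, depends_on).items():
        print(f"{name}: {label}")
```

The "probably" in the last label is doing a lot of work. The map only knows about dependencies someone wrote down, so the labels are a starting point for a conversation about what to explore next, not a verdict.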
In the meantime, there are some paths that are still valid – it is just not entirely clear which ones. By constant communication with the programmers, the customer, and other stakeholders, it is generally possible to agree on the things worth exploring next – and certainly someone with skill who knows the product can find paths that will yield better results for the time invested, certainly better than running “all the tests” over again.
In the meantime, we are in a bit of a fog. Or a sandstorm. We can leave markers, but they tend to become invalid. Knowing what to do next is a challenge.
Yes, we are searching for bugs in a minefield, covered in sand, while the winds are blowing.
The problem with this metaphor is that sometimes, your tiny little change that couldn’t possibly break anything else does. You add a single parameter to a report, then find out that an entire other team relied on that report, the parameter was required, and they did not update their code to add the parameter. Boom. These sorts of things happen all the time. The windstorm is worse than we thought, and sometimes what we thought was a track is just a rabbit’s trail that doesn’t really cut the surface.
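The report example can be sketched in a few lines. The function names, the date strings, and the other team's caller are all hypothetical stand-ins; the sketch only shows how a new required parameter breaks an existing caller, while an optional one with a default does not.

```python
def run_report_v1(start_date, end_date):
    """The original report: results for all users."""
    return f"report {start_date}..{end_date} for all users"


def run_report_v2_breaking(start_date, end_date, username):
    """The 'tiny' change: username is now required."""
    return f"report {start_date}..{end_date} for {username}"


def run_report_v2_compatible(start_date, end_date, username=None):
    """Same feature, but existing callers keep working because of the default."""
    who = username if username is not None else "all users"
    return f"report {start_date}..{end_date} for {who}"


def other_teams_caller(run_report):
    """Stand-in for code owned by another team that nobody remembered to update."""
    return run_report("2016-01-01", "2016-01-31")


if __name__ == "__main__":
    print(other_teams_caller(run_report_v1))
    print(other_teams_caller(run_report_v2_compatible))
    try:
        other_teams_caller(run_report_v2_breaking)
    except TypeError as exc:
        print(f"Boom: {exc}")
```

A default value would have avoided this particular mine, but only if someone knew the other caller existed, which is exactly the knowledge the sandstorm tends to bury.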
I took that class in fall 2001. I remember because we had class on September 11th, and we tried to talk about software. It was hard.
It has been fifteen years.
I think we can do better now.