First off, I’ve revised the title of the series. I’m all for automating work that can be described and /precisely/ evaluated.
For example, let’s say you have a PowerOf function. To test it, you could write a harness that takes input from the keyboard and prints the results, or you could write something like this:
is(PowerOf(2,1),2, "Two to the first is two");
is(PowerOf(2,2),4, "Two to the second is four");
is(PowerOf(3,3),27, "Three to the third is twenty-seven");
is(PowerOf(2,-1),undef,"PowerOf doesn't handle negative exponents yet");
is(PowerOf(2,2.5),undef,"PowerOf doesn't handle fractional exponents yet");
And so on.
When you add fractional or negative exponents, you can add new tests and re-run all the old tests, in order.
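As a minimal sketch of that step, assume the tests use Perl's Test::More (which the is(...) calls resemble) and that PowerOf has just gained negative-exponent support. The PowerOf body here is a hypothetical stand-in written only for illustration; the point is that the old tests stay put and run in the same order, while the one "not yet" test is swapped for a real expectation.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical implementation, for illustration only: integer exponents
# (including negative ones) are supported; fractional ones return undef.
sub PowerOf {
    my ($base, $exponent) = @_;
    return undef if $exponent != int($exponent);    # fractional: not yet
    my $result = 1;
    $result *= $base for 1 .. abs($exponent);
    return $exponent < 0 ? 1 / $result : $result;
}

# The old tests, unchanged and still run in the same order.
is(PowerOf(2, 1), 2,  "Two to the first is two");
is(PowerOf(2, 2), 4,  "Two to the second is four");
is(PowerOf(3, 3), 27, "Three to the third is twenty-seven");

# Replaces the old "returns undef" test now that negative exponents work.
is(PowerOf(2, -1), 0.5, "Two to the negative first is one half");

# Still unsupported, so the old expectation stays.
is(PowerOf(2, 2.5), undef, "PowerOf doesn't handle fractional exponents yet");

done_testing();
```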
That is to say, this test can now run unattended, and it will be very similar to what you would do manually. Not completely: if the PowerOf function takes 30 seconds to calculate the answer, which is unacceptable, it will still eventually “Green Bar”, but hopefully, when you run it by hand, you notice this problem. (And if you are concerned about speed, you could wrap the tests in timer-based tests.)
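One way a timer-based wrapper might look, again assuming Test::More plus the core Time::HiRes module. The helper names timed and is_within, the stand-in PowerOf, and the one-second budget are all assumptions made for this sketch, not part of any existing framework.

```perl
use strict;
use warnings;
use Test::More;
use Time::HiRes qw(time);

# Hypothetical stand-in for the function under test.
sub PowerOf {
    my ($base, $exponent) = @_;
    return $base ** $exponent;
}

# Hypothetical helper: run a code reference once and return its result
# together with the elapsed wall-clock time in seconds.
sub timed {
    my ($code) = @_;
    my $start  = time();
    my $result = $code->();
    return ($result, time() - $start);
}

# Hypothetical helper: one correctness check plus one speed check.
sub is_within {
    my ($code, $expected, $max_seconds, $name) = @_;
    my ($got, $elapsed) = timed($code);
    is($got, $expected, $name);
    cmp_ok($elapsed, '<=', $max_seconds, "$name (within $max_seconds s)");
    return;
}

is_within(sub { PowerOf(2, 2) }, 4,  1, "Two to the second is four");
is_within(sub { PowerOf(3, 3) }, 27, 1, "Three to the third is twenty-seven");

done_testing();
```

Note that this only reports a slow call after it has finished; it does not interrupt one that hangs.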
Enter The GUI
As soon as we start talking about interactive screens, the number of things the human eye evaluates goes up. Wayyy up. Which brings us back to the keyword or screen capture problem – either the software will only look for problems I specify, or it will look for everything.
Let’s talk about a real bug in the field
The environment: a software-as-a-service, web-based application that supports IE6, IE7, Firefox 2, Firefox 3, and Safari. To find examples, I searched Bugzilla for “IE6 transparent”, a category where we’ve had a few bugs recently. (I do not mean to pick on IE; I could have searched for Safari or FF and gotten a similar list.) That does bring up an interesting problem: most of the bugs below looked just fine in other browsers.
Here are some snippets from actual bug reports.
1) To reproduce, just go to IE6 and resize your browser window to take up about half your screen. Then log into dashboard, and see “(redacted element name)” appear too low and extra whitespace in some widget frames.
2) Page includes shows missing image in place of “Edit” button in IE6 and IE7
3) In IE6 only, upload light box shows up partly hidden when browser is not maximized.
4) In IE6 and IE7, comment’s editor has long vertical and horizontal scroll bar.
5) In IE6 at editor UI, there is a thick blue spaces between the buttons and rest of the editor tools
6) To reproduce, in IE6, create some (redacted), then check out the left-most tab of (redacted 2). The icons for type of even are not the same background color as the widget itself. (see attachment)
All of these bugs were caught by actual testers prior to ship. I do not think it is reasonable to expect these tests to be automated unless you were doing record/playback testing. Now, if you were doing record/playback testing, you’d have to run the tests manually first, in every browser combination, and they’d fail, so you’d have to run them again and again until the entire sub-section of the application passed. Then you’d have a very brittle test that worked under one browser and one operating system.
That leaves writing the test after the fact, and, again, you’ll get no help from keyword-driven frameworks like Selenium: “Whitespace is less than half an inch between elements X and Y” simply isn’t built into the tool, and the effort to add it would be prohibitive. If you wanted to write automated tests after the bugs were found, you’d have to use a traditional record/playback tool, and you would now have two sets of tests.
That brings up a third option – slideshow tests that are watched by a human being, or that record periodic screen captures that a human can compare, side-by-side, with yesterday’s run. We do this every iteration at Socialtext to good effect, but those tests aren’t run /unattended/. Thus I change the name of this series.
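For the screen-capture variant, a small script can at least narrow down which captures a human needs to look at. The sketch below is not the tooling described above; it is a hypothetical helper that assumes each run saves its captures as files and that you can build a name-to-path map for yesterday's run and today's. It flags any capture that is new or whose bytes changed, using the core Digest::MD5 module.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: hex digest of one file's raw bytes.
sub file_digest {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# Hypothetical helper: names of captures that are new, or whose bytes
# changed since the previous run. Both arguments are hash references
# mapping a capture name to a file path.
sub captures_to_review {
    my ($yesterday, $today) = @_;
    my @review;
    for my $name (sort keys %$today) {
        if (!exists $yesterday->{$name}
            || file_digest($yesterday->{$name}) ne file_digest($today->{$name})) {
            push @review, $name;
        }
    }
    return @review;
}
```

A byte-for-byte comparison is deliberately crude: a timestamp or a one-pixel rendering difference will flag a capture, so the output is a reading list for a person, not a pass/fail verdict.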
I should also add that problems like “too much whitespace” or “a button is missing but there is a big red X you can push” are fundamentally different from a crash or timeout. So if you have a big application to test, it might be a perfectly reasonable strategy to make hundreds of thousands of keyword-driven tests that make sure the basic happy path of the application returns correct results (at least for the results you can think of when you write the tests).
We have discussed unit and developer-facing test automation along with three different GUI-test driving strategies. We found that the GUI-driving, unattended strategies are really only good for regression – making sure what worked yesterday still works today. I’ve covered some pros and cons for each, and found a half-dozen real bugs from the field that we wouldn’t reasonably expect these tests to cover.
This brings up a question: What percentage of bugs are in this category, and how bad are they, and how often do we have regressions, anyway?
More to come.