Many of the challenges my team faced while implementing, automating, and reporting performance test results stemmed from the differences between performance tests and the kind of test the rest of the organization was used to: the standard “Pass/Fail” test. This is another entry in my series of Some thoughts on Performance Testing, where I’ll examine those differences.
Pass/Fail tests are absolute; performance tests have nuance. The product management, project/program management, and engineering teams love the absolute nature of Pass/Fail tests. Even when a Pass/Fail test is broken, everyone knows what the desired result would have been had it been working: Pass. Pass/Fail tests can still cause problems, especially automated tests in a system that hasn’t been built to consider that results should really fall into one of three main categories: Error (of various types), Test Completed with a Pass, or Test Completed with a Fail. That adds some complexity to handle ‘failure to run’, such as a required media or network connection failing and preventing the test from running to completion. But even in an imaginary world where every test always runs to completion, performance tests don’t give such simple pass/fail answers. And the answers they do give can change over time and across configurations.
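As a minimal sketch of what that three-category model might look like (the names here are mine, not from any particular framework):

```python
from enum import Enum, auto

class TestOutcome(Enum):
    """Distinguish 'could not run' from an actual verdict."""
    ERROR = auto()  # test could not run to completion (missing media, network down, ...)
    PASS = auto()   # test completed and met its criteria
    FAIL = auto()   # test completed and did not meet its criteria
```

The point of the separate Error state is simply that a test which never ran should not masquerade as a Fail, or worse, a Pass.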
A standard Error/Pass/Fail test returns a simple value (or an error), and suites of such tests roll up easily: in most cases all 100 tests in suite X will Pass, so the suite itself can simply say “Pass” while still perfectly recording the data for the underlying tests. A performance test may report multiple results for a single test: for example, both time elapsed and memory used. You’ll also need to know what units the test is reporting so the values have meaning. For some applications you may need even more information about the data being recorded, such as the amount of time a measurement spans. Some of that information falls more under “know what your test is testing” than data that needs to be stored with every measurement taken, which bleeds into the distinction between information about tests and information about data.
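Here is a rough sketch of what a single performance result might carry; the field names are my own choices, not a prescription:

```python
from dataclasses import dataclass, field

@dataclass
class Measurement:
    """One reading from a performance test; a single test may emit several."""
    metric: str          # e.g. "elapsed_time", "peak_memory"
    value: float
    unit: str            # e.g. "ms", "MB" -- values mean little without units
    span_seconds: float | None = None  # how long the measurement covers, if relevant

@dataclass
class PerformanceResult:
    test_name: str
    measurements: list[Measurement] = field(default_factory=list)

# A single test reporting multiple readings:
result = PerformanceResult(
    test_name="bouncing_balls",
    measurements=[
        Measurement("elapsed_time", 412.0, "ms"),
        Measurement("peak_memory", 96.5, "MB"),
    ],
)
```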
All of the readings taken require further information to interpret. Tests that consume a resource (memory, time, battery, CPU) generally aim for less, but for other tests (connections, maximums, or X per time period) better performance means a higher number. Frames per second, operations per second, and number of connections are all examples where bigger is better. A boolean “BiggerIsBetter” may capture that sort of directional (vector) information about the test results. Other tests may require falling within a certain range, so a simple direction isn’t enough. If you were using a Control Chart, comparing results to a target with upper and lower limits, you may have a good deal of interpretive information to consider.
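One way this interpretive information could be attached to a metric, sketched under my own naming assumptions:

```python
from dataclasses import dataclass

@dataclass
class MetricSpec:
    """How to interpret a reading: direction and/or acceptable range."""
    bigger_is_better: bool = False     # True for FPS, ops/sec, connections
    lower_limit: float | None = None   # control-chart style bounds, if any
    upper_limit: float | None = None

def within_goal(value: float, goal: float, spec: MetricSpec) -> bool:
    """Compare a reading against a goal, respecting direction and limits."""
    if spec.lower_limit is not None and value < spec.lower_limit:
        return False
    if spec.upper_limit is not None and value > spec.upper_limit:
        return False
    return value >= goal if spec.bigger_is_better else value <= goal

# Frame rate (higher is better) vs. memory use (lower is better):
within_goal(58.0, 60.0, MetricSpec(bigger_is_better=True))    # False: below the 60 FPS goal
within_goal(91.0, 100.0, MetricSpec(bigger_is_better=False))  # True: under the 100 MB goal
```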
You may also have one or more complete sets of ‘target’ results that you wish to achieve or better. I’ll cover that later in my discussion of Benchmarks.
Each platform, configuration, or network design may require different benchmarks, test time frames, and limits, and will affect the interpretation of results. A simple ‘bouncing balls’ animation test may run so much faster on a new desktop than on an older smartphone that the test itself must behave differently so it doesn’t run too slowly or too quickly to provide decent results.
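In practice that can look like a per-platform set of test parameters; the profiles and numbers below are purely illustrative, not measured values:

```python
# Hypothetical per-platform tuning for the same 'bouncing balls' test.
PLATFORM_PROFILES = {
    "desktop_2024": {"ball_count": 5000, "duration_s": 10, "target_fps": 60},
    "older_phone":  {"ball_count": 200,  "duration_s": 30, "target_fps": 30},
}

def configure_test(platform: str) -> dict:
    """Pick the benchmark parameters appropriate for the platform under test."""
    return PLATFORM_PROFILES[platform]
```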
If a performance test does Fail upon comparison with a goal, there is still a great deal more to learn. Did performance improve a little, a lot, not at all, or did we go backwards? Are we almost at the goal, where a few tweaks may be all we need, or are we far away and really need to reconsider the approach? Does the history of results for this test reveal anything? Did we make one little tweak to improve one test and affect others (negatively or positively)? Even if the result of a performance test technically Passes, you will likely learn a lot more by viewing it in the context of previous results and benchmarks. Did we go from a solid Pass to just barely passing? Are we trending down on this test as we implement other systems?
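A toy interpretation step along those lines might look like this, assuming the history is ordered oldest to newest and holds at least two readings:

```python
def interpret(history: list[float], goal: float, bigger_is_better: bool = False) -> str:
    """Classify the latest reading against the goal and the previous reading."""
    latest, previous = history[-1], history[-2]
    sign = 1 if bigger_is_better else -1
    verdict = "Pass" if sign * (latest - goal) >= 0 else "Fail"
    if sign * (latest - previous) > 0:
        trend = "improving"
    elif latest == previous:
        trend = "flat"
    else:
        trend = "regressing"
    gap = abs(latest - goal)
    return f"{verdict}, {trend}, {gap:g} from goal"

# e.g. a memory test (lower is better) with a goal of 100 MB:
interpret([120.0, 104.0, 98.0], goal=100.0)  # 'Pass, improving, 2 from goal'
interpret([98.0, 99.0, 103.0], goal=100.0)   # 'Fail, regressing, 3 from goal'
```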
Recognizing the need for context and additional information when interpreting results is the key to understanding how performance tests differ from standard Pass/Fail/Error tests.
While there may be cases where a simple, uncomplicated performance test can provide a “Pass”, I prefer that all such tests record their results as measured and leave the interpretation until after completion. If you aren’t graphing your results over time, you should be.
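Even a bare-bones plot of raw readings against the goal goes a long way; here is a minimal sketch with made-up data standing in for whatever your results store actually holds:

```python
import matplotlib.pyplot as plt

# Illustrative data: as-measured elapsed times per run, plotted against the goal.
runs = ["2024-01-05", "2024-01-12", "2024-01-19", "2024-01-26"]
elapsed_ms = [520.0, 488.0, 471.0, 502.0]
goal_ms = 480.0

plt.plot(runs, elapsed_ms, marker="o", label="elapsed time (ms)")
plt.axhline(goal_ms, linestyle="--", label="goal")
plt.xlabel("test run")
plt.ylabel("milliseconds")
plt.title("bouncing_balls: elapsed time per run")
plt.legend()
plt.show()
```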