Writing Unit Tests

From Qt Wiki
Revision as of 11:14, 15 September 2016 by EdwardWelbourne (talk | contribs) (Added a preamble. Shortened first heading. Expanded on the evils of Q_ASSERT in tests.)


Whenever you fix a bug, please add a regression test for it: a test (ideally automatic) that fails before the fix, exhibiting the bug, and passes after the fix. Whenever you implement a new feature, please add tests that verify that the new feature works as intended. Once you've written and committed your tests (along with your fix or new feature), you can verify that the tests fail without your change: check out the branch on which your work is based, then check out only the files for your new tests from your own branch into that working tree and run them.

General principles

Use initTestCase and cleanupTestCase for setup and teardown of test harness

Tests that require preparations should use the global initTestCase for that purpose. In the end, every test should leave the system in a usable state, so it can be run repeatedly. Cleanup operations should be handled in cleanupTestCase, so they get run even if the test fails and exits early.

Another option is to use RAII, with cleanup operations called in destructors, to ensure they happen when the test function returns and the object goes out of scope.
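
A minimal sketch of the RAII approach, in plain C++ with no Qt dependency. ScopeGuard and runTestBody are hypothetical names invented for this illustration; in a real test the guard would wrap whatever teardown the test needs (removing a temporary file, restoring a setting, and so on).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: runs the stored callable when it goes out of scope.
class ScopeGuard
{
public:
    explicit ScopeGuard(std::function<void()> cleanup)
        : m_cleanup(std::move(cleanup)) {}
    ~ScopeGuard() { if (m_cleanup) m_cleanup(); }

    ScopeGuard(const ScopeGuard &) = delete;
    ScopeGuard &operator=(const ScopeGuard &) = delete;

private:
    std::function<void()> m_cleanup;
};

// Stand-in for a test function body. A failing QVERIFY or QCOMPARE returns
// from the test function early; 'failEarly' simulates that.
bool runTestBody(bool failEarly, std::vector<std::string> &log)
{
    log.push_back("setup");
    ScopeGuard guard([&log] { log.push_back("cleanup"); });

    if (failEarly)
        return false;          // early exit, the guard still fires

    log.push_back("checks passed");
    return true;
}
```

Whether the body returns early or runs to completion, the last entry in the log is "cleanup", because the guard's destructor runs as the function returns.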

Test functions should be self-contained

Within a test program, test functions should be independent of each other and not rely upon previous test functions having been run. You can check this by running the test function on its own with "tst_foo testname".

Test the full stack

If an API is implemented in terms of pluggable/platform-specific backends that do the heavy lifting, make sure to write tests that cover the code-paths all the way down into the backends. Testing the upper layer API parts using a mock backend is a nice way to isolate errors in the API layer from the backends, but is complementary to tests that exercise the actual implementation with real-world data.

Tests should complete quickly

Tests should not waste time by being unnecessarily repetitious, using inappropriately large volumes of test data, or by introducing needless idle time.

This is particularly true for unit testing, where every second of extra unit test execution time makes CI testing of a branch across multiple targets take longer. Remember that unit testing is separate from load and reliability testing, where larger volumes of test data and lengthier test runs are expected.

Benchmark tests, which typically execute the same test multiple times, should be in the separate tests/benchmarks directory and not mixed with functional unit tests.

Use data-driven testing as much as possible

Data-driven tests make it easier to add new tests for boundary conditions found in later bug reports.

Using a data-driven test rather than testing several items in sequence in a test function also prevents an earlier QVERIFY/QCOMPARE failure from blocking the reporting of later checks.
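
In Qt Test, a data-driven test pairs a test function with a companion _data() function that fills a table via QTest::addColumn() and QTest::newRow(), and the test function reads each row with QFETCH. The plain C++ sketch below does not use that API; it only illustrates the underlying idea that each row is checked and reported independently. Row, toUpper and runRows are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Code under test (hypothetical): upper-cases ASCII letters.
std::string toUpper(std::string s)
{
    for (char &c : s)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// One row of test data: a tag for reporting, an input and the expected output.
struct Row
{
    std::string tag;
    std::string input;
    std::string expected;
};

// Checks every row and returns the tags of the rows that failed.
// A failure in one row does not stop later rows from being checked.
std::vector<std::string> runRows(const std::vector<Row> &rows)
{
    std::vector<std::string> failedTags;
    for (const Row &row : rows) {
        if (toUpper(row.input) != row.expected)
            failedTags.push_back(row.tag);
    }
    return failedTags;
}
```

Covering a boundary condition from a later bug report is then a matter of adding one more row to the table, without touching the checking logic.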

Always respect QCOMPARE parameter semantics

The first parameter to QCOMPARE should always be the actual value produced by the code-under-test, while the second parameter should always be the expected value. When the values don't match, QCOMPARE prints them with the labels "Actual" and "Expected". If the parameter order is swapped, debugging a failing test can be confusing.

Use QSignalSpy to verify signal emissions

The QSignalSpy class provides an elegant mechanism for capturing the list of signals emitted by an object.

Verify the validity of a QSignalSpy after construction

QSignalSpy's constructor does a number of sanity checks, such as verifying that the signal to be spied upon actually exists. To make diagnosis of test failures easier, the result of these checks should be checked by calling "QVERIFY (spy.isValid());" before proceeding further with a test.

Use coverage tools to direct testing effort

Use a coverage tool such as Squish Coco or gcov to help write tests that cover as many statements/branches/conditions as possible in the function/class being tested. The earlier this is done in the development cycle for a new feature, the easier it will be to catch regressions later as the code is refactored.

Naming of test functions is important

Test functions should be named to make it obvious what the function is trying to test.

Naming test functions using the bug-tracking identifier is to be avoided, as these identifiers soon become obsolete if the bug-tracker is replaced and some bug-trackers may not be accessible to external users (e.g. the internal Jira server that stores Company Confidential data). To track which bug-tracking task a test function relates to, it should be sufficient to include the bug identifier in the commit message. Source management tools such as "git blame" can then be used to retrieve this information.

Use appropriate mechanisms to exclude inapplicable tests

QSKIP should be used to handle cases where a test function is found at run-time to be inapplicable in the current test environment. For example, a test of font rendering may call QSKIP if needed fonts are not present on the test system.

Beware of moc's limitations when excluding tests. The moc preprocessor does not have access to all of the built-in macros that the compiler defines, and these macros are often used for compiler feature detection. As a result, moc may evaluate a preprocessor condition differently from the compiler and fail to see the test, in which case the test would never be called.

If an entire test program is inapplicable for a specific platform, the best approach is to use the parent directory's .pro file to avoid building the test. For example, if the tests/auto/gui/someclass test is not valid for Mac OS X, add the following to tests/auto/gui.pro:

mac*:SUBDIRS -= someclass

Prefer QEXPECT_FAIL to QSKIP for known bugs

If a test exposes a known bug that will not be fixed immediately, use the QEXPECT_FAIL macro to document the failure and reference the bug tracking identifier for the known issue. When the test is run, expected failures will be marked as XFAIL in the test output and will not be counted as failures when setting the test program's return code. If an expected failure does not occur, the XPASS (unexpected pass) will be reported in the test output and will be counted as a test failure.

For known bugs, QEXPECT_FAIL is better than QSKIP because a developer cannot fix the bug without an XPASS result reminding them that the test needs to be updated too. If QSKIP is used, there is no reminder to revise/re-enable the test, without which subsequent regressions would not be reported.

Avoid Q_ASSERT in tests

The Q_ASSERT macro causes a program to abort whenever the asserted condition is false, but only if the software was built in debug mode. In both release and debug+release builds, Q_ASSERT does nothing.

Q_ASSERT should be avoided because it makes tests behave differently depending on whether a debug build is being tested, and because it causes a test to abort immediately, skipping all remaining test functions and returning incomplete and/or malformed test results. It also skips any tear-down or tidy-up that was supposed to happen at the end of the test, so can leave the workspace in an untidy state (which may cause complications for other tests).

Instead of Q_ASSERT, the QCOMPARE or QVERIFY-style macros should be used. These cause the current test to report a failure and terminate, but allow the remaining test functions to be executed and the entire test program to terminate normally. QVERIFY2 even allows a descriptive error message to be output into the test log.

Hints on writing reliable tests

Avoid side-effects in verification steps

When performing verification steps in an autotest using QCOMPARE, QVERIFY and friends, side-effects should be avoided. Side-effects in verification steps can make a test difficult to understand and can easily break a test in difficult to diagnose ways when the test is changed to use QTRY_VERIFY, QTRY_COMPARE or QBENCHMARK, all of which can execute the passed expression multiple times, thus repeating any side-effects.
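
The plain C++ sketch below shows why this matters. tryVerifySketch is a hypothetical, heavily simplified stand-in for a retrying macro such as QTRY_VERIFY (the real macro also waits and processes events between attempts); it simply evaluates its condition up to a fixed number of times.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-in for a retrying check: evaluates 'condition' up to
// 'maxAttempts' times and reports whether it ever returned true.
template <typename Condition>
bool tryVerifySketch(Condition condition, int maxAttempts)
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (condition())
            return true;
    }
    return false;
}

// Bad: the condition pops from the queue, so each retry consumes an element.
bool checkWithSideEffect(std::deque<int> &queue)
{
    return tryVerifySketch([&queue] {
        int front = queue.front();
        queue.pop_front();          // side effect repeated on every attempt
        return front == 3;
    }, 3);
}

// Good: the condition only inspects state, so retries are harmless.
bool checkWithoutSideEffect(const std::deque<int> &queue)
{
    return tryVerifySketch([&queue] { return queue.front() == 1; }, 3);
}
```

With a queue holding 1, 2, 3, 4, the side-effect-free check passes and leaves the queue untouched. The check with the side effect also passes, but only on its third attempt, after silently consuming three elements, so any later step that relies on the queue's contents sees different data than the test author intended.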

Avoid fixed wait times for asynchronous behaviour

In the past, many unit tests were written to use qWait() to delay for a fixed period between performing some action and waiting for some asynchronous behaviour triggered by that action to be completed. For example, changing the state of a widget and then waiting for the widget to be repainted.

Such timeouts would often cause failures when a test written on a workstation was executed on a device, where the expected behaviour might take longer to complete. The natural response to this kind of failure was to increase the fixed timeout to a value several times larger than needed on the slowest test platform. This approach slows down the test run on all platforms, particularly for table-driven tests.

If the code under test emits Qt signals on completion of the asynchronous behaviour, a better approach is to use the QSignalSpy class (part of qtestlib) to notify the test function that the verification step can now be performed.

If there are no Qt signals, use the QTRY_COMPARE and QTRY_VERIFY macros, which periodically test a specified condition until it becomes true or some maximum timeout is reached. These macros prevent the test from taking longer than necessary, while avoiding breakages when tests are written on workstations and later executed on embedded platforms.
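
The polling idea behind these macros can be sketched in plain C++ as follows. This is not how Qt implements QTRY_VERIFY or QTRY_COMPARE (the real macros also process pending events while waiting); waitFor, finishedAfter and their parameters are hypothetical names chosen for the illustration.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: polls 'condition' every 'step' until it returns true
// or 'timeout' has elapsed. Returns the final result of the condition.
template <typename Condition>
bool waitFor(Condition condition,
             std::chrono::milliseconds timeout,
             std::chrono::milliseconds step = std::chrono::milliseconds(5))
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline) {
        if (condition())
            return true;            // stop as soon as the condition holds
        std::this_thread::sleep_for(step);
    }
    return condition();             // one last check at the deadline
}

// Simulated asynchronous operation that completes 'delay' after 'start'.
bool finishedAfter(std::chrono::steady_clock::time_point start,
                   std::chrono::milliseconds delay)
{
    return std::chrono::steady_clock::now() - start >= delay;
}
```

The timeout can be generous enough for the slowest platform without slowing down fast ones, because the wait ends as soon as the condition becomes true.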

If there are no Qt signals, and you are writing the test as part of developing a new API, consider whether the API could benefit from the addition of a signal that reports the completion of the asynchronous behaviour.

Beware of timing-dependent behaviour

Some test strategies are vulnerable to timing-dependent behaviour of certain classes, which can lead to tests that fail only on certain platforms or that do not return consistent results.

One example of this is text-entry widgets, which often have a blinking cursor that can make comparisons of captured bitmaps succeed or fail depending on the state of the cursor when the bitmap is captured, which may in turn depend on the speed of the machine executing the test.

When testing classes that change their state based on timer events, the timer-based behaviour needs to be taken into account when performing verification steps. Due to the variety of timing-dependent behaviour, there is no single generic solution to this testing problem.

In the example, potential solutions include disabling the cursor blinking behaviour (if the API provides that feature), waiting for the cursor to be in a known state before capturing a bitmap (e.g. by subscribing to an appropriate signal if the API provides one), or excluding the area containing the cursor from the bitmap comparison.

Prefer programmatic verification methods to bitmap capture and comparison

While necessary in many situations, verifying test results by capturing and comparing bitmaps can be quite fragile and labour intensive.

For example, a particular widget may have different appearance on different platforms or with different widget styles, so reference bitmaps may need to be created multiple times and then maintained in the future as Qt's set of supported platforms evolves. Bitmap comparisons can also be influenced by factors such as the test machine's screen resolution, bit depth, active theme, colour scheme, widget style, active locale (currency symbols, text direction, etc), font size, transparency effects, and choice of window manager.

Where possible, verification by programmatic means (that is, by verifying properties of objects and variables) is preferable to using bitmaps.

Hints on producing readable and helpful test output

Explicitly ignore expected warning messages

If a test is expected to cause Qt to output a warning or debug message on the console, the test should call QTest::ignoreMessage() to filter that message out of the test output (and to fail the test if the message is not output).

If such a message is only output when Qt is built in debug mode, use QLibraryInfo::isDebugBuild() to determine whether the Qt libraries were built in debug mode. (Using "#ifdef QT_DEBUG" in this case is insufficient, as it will only tell you whether the test was built in debug mode, and that is not a guarantee that the Qt libraries were also built in debug mode.)

Avoid printing debug messages from autotests

Autotests should not produce any unhandled warning or debug messages. This will allow the CI Gate to treat new warning or debug messages as test failures.

Adding debug messages during development is fine, but these should either be disabled or removed before a test is checked in.

Prefer well-structured diagnostic code to quick-and-dirty debug code

Any diagnostic output that would be useful if a test fails should be part of the regular test output rather than being commented-out, disabled by preprocessor directives or being enabled only in debug builds. If a test fails during Continuous integration, having all of the relevant diagnostic output in the CI logs could save you a lot of time compared to enabling the diagnostic code and testing again, especially if the failure was on a platform that you don't have on your desktop.

Diagnostic messages in tests should use Qt's output mechanisms (e.g. qDebug, qWarning, etc) rather than stdio.h or iostream output mechanisms. The latter bypass Qt's message handling and prevent testlib's -silent command-line option from suppressing the diagnostic messages, which could result in important failure messages being hidden in a large volume of debugging output.

Prefer QCOMPARE over QVERIFY for value comparisons

QVERIFY should be used for verifying boolean expressions, except where the expression directly compares two values.

QVERIFY (x == y) and QCOMPARE (x, y) check the same condition; however, QCOMPARE is more informative, as it outputs both expected and actual values when the comparison fails.

Use QVERIFY2 for extra failure details

QVERIFY2 should be used when it is practical and valuable to put additional information into the test failure report.

For example, if you have a file object named file and you are testing its open() function, you might write a test with a statement like:

bool opened = file.open(QIODevice::WriteOnly);
QVERIFY (opened);

If this test fails, it will give no clue as to why the file failed to open:

FAIL! : tst_QFile::open_write() 'opened' returned FALSE. ()

Your object should have a function to retrieve some detail about the last error (if it doesn't, then the API is arguably broken). So, why not use it?

QVERIFY2(opened, qPrintable(QString("open %1: %2").arg(file.fileName()).arg(file.errorString())));
FAIL! : tst_QFile::open_write() 'opened' returned FALSE. (open /tmp/qt.a3B42Cd: No space left on device)

Much better! And, if this branch is being tested in the Qt CI system, the above detailed failure message will go straight into any emailed reports.

Hints on performance/benchmark testing

Verify occurrences of QBENCHMARK and QTest::setBenchmarkResult()

A performance test should contain either a single QBENCHMARK macro or a single call to QTest::setBenchmarkResult(). Multiple occurrences of QBENCHMARK or QTest::setBenchmarkResult() in the same test function make no sense. At most one performance result can be reported per test function, or per data tag in a data-driven setup.

Avoid changing a performance test

Avoid changing the test code that forms (or influences) the body of a QBENCHMARK macro, or the test code that computes the value passed to QTest::setBenchmarkResult(). Differences in successive performance results should ideally be caused only by changes to the product we are testing (i.e. the Qt library). Changes to the test code can potentially result in a false positive.

Verify a performance test if possible

In a test function that measures performance, the QBENCHMARK or QTest::setBenchmarkResult() should if possible be followed by a verification step using QCOMPARE, QVERIFY and friends. This can be used to flag a performance result as invalid if we know that we measured a different code path than the intended one. A performance analysis tool can use this information to filter out invalid results.

For example, an unexpected error condition will typically cause the program to bail out prematurely from the normal program execution, and thus falsely show a dramatic performance increase.

Writing testable code

Break dependencies

The idea of unit testing is to exercise every class in isolation. Since many classes instantiate other classes, it is often not possible to instantiate one class on its own. A technique called dependency injection addresses this by separating object creation from object use: a factory is responsible for building object trees, and other objects manipulate these objects through abstract interfaces. This technique works well for data-driven applications. For GUI applications the approach can be difficult, as there are plenty of object creations and destructions going on. To verify the correct behaviour of classes that depend on abstract interfaces, mocks can be used. There are tools to support the generation of mocks.
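
A minimal sketch of dependency injection with a hand-written mock, in plain C++. MessageSink, Greeter and MockSink are hypothetical names made up for the example; a real project would substitute its own interfaces and might generate the mock with a tool instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Abstract interface the class under test depends on (hypothetical).
class MessageSink
{
public:
    virtual ~MessageSink() = default;
    virtual void send(const std::string &message) = 0;
};

// Class under test: receives its dependency instead of creating it.
class Greeter
{
public:
    explicit Greeter(MessageSink &sink) : m_sink(sink) {}
    void greet(const std::string &name) { m_sink.send("Hello, " + name + "!"); }

private:
    MessageSink &m_sink;
};

// Hand-written mock that records what was sent, for use in tests only.
class MockSink : public MessageSink
{
public:
    void send(const std::string &message) override { sent.push_back(message); }
    std::vector<std::string> sent;
};
```

A test constructs a MockSink, passes it to a Greeter, calls greet() and then inspects the recorded messages. Because Greeter only sees the MessageSink interface, production code can pass in a real implementation without any change to Greeter itself.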

Compile all classes in a static library

In small to medium sized projects there is typically a build script which lists all source files and then compiles the executable in one go. The build scripts for the tests must then list the needed source files again. It is easier to list the source files and headers only once, in a script that builds a static library. main() is then linked against the static library to build the executable, and the tests are linked against the same static library.

Building a test suite with qtestlib

qtestlib provides the tools to build an executable which contains one test class which typically tests one class of production code. In a real world project many classes shall be tested by running one command. This collection of tests is also called a test suite.

Using qmake

In Qt 4.7 or later, place CONFIG+=testcase in each test program's .pro file. Then, from a parent subdirs project, you may run make check to run all testcases.

The behavior may be customized in a few different ways. Examples:

make -j1 check -> run one test at a time
make -j4 check -> run four tests at a time (make sure the tests are written to cope with this!)
make check TESTRUNNER=path/to/testrunner -> run autotests through a custom test runner script (which may e.g. handle crashes and failures)
make check TESTARGS=-xml -> run autotests in XML logging mode

Using CMake and CTest

The KDE project uses CMake and CTest to build a test suite. With CMake it is possible to label build targets as tests. All labelled targets are run when make test is called on the command line. There are several other advantages to CMake: for example, the result of a test run can be published on a web server using CDash with virtually no effort. See the CMake manual for more about CMake and automatic moc invocation.

Common problems with test machine setup

Screen savers

Screen savers can interfere with some of the tests for GUI classes, causing unreliable test results. Screen savers should be disabled to ensure that test results are consistent and reliable.

System dialogs

Dialogs displayed unexpectedly by the operating system or other running applications can steal input focus from widgets involved in an autotest, causing unreproducible failures.

Examples encountered in the past include online update notification dialogs on Mac OS X, false alarms from virus scanners, scheduled tasks such as virus signature updates, software updates pushed out to workstations by IT, and chat programs popping up windows on top of the stack.

Display usage

Some tests use the test machine's display, mouse and keyboard and can thus fail if the machine is being used for something else at the same time (e.g. as somebody's workstation), or if multiple tests are run in parallel.

The CI system uses dedicated test machines to avoid this problem, but if you don't have a dedicated test machine, you may be able to solve this problem by running the tests on a second display.

On Unix, one can also run the tests on a nested or virtual X-server, such as Xephyr. For example, to run the entire set of tests under Xephyr, execute the following commands:

Xephyr :1 -ac -screen 1920x1200 >/dev/null 2>&1 &
sleep 5
DISPLAY=:1 icewm >/dev/null 2>&1 &
cd tests/auto
make
DISPLAY=:1 make -k -j1 check

Qt 5 offers a nice alternative, the offscreen platform plugin, which you can use like this:

TESTARGS="-platform offscreen" make check -k -j1

Window managers

On Unix, at least two autotests (tst_examples and tst_gestures) require a window manager to be running. Therefore if running these tests under a nested X-server, you must also run a window manager in that X-server.

Your window manager must be configured to position all windows on the display automatically. Some window managers (e.g. twm) have a mode where the user must manually position new windows, and this prevents the test suite from running without user interaction.

Note that the twm window manager has been found to be unsuitable for running the full suite of Qt autotests, as the tst_gestures autotest causes twm to forget its configuration and revert to manual window placement.

Miscellaneous topics

QSignalSpy and QVariant parameters

With Qt 4, QVariant parameters recorded by QSignalSpy have to be cast to QVariant (e.g., by qvariant_cast<QVariant>()) to get the actual value; this is because QSignalSpy wraps the value inside another QVariant (of type QMetaType::QVariant). The following snippet is from the tst_qpropertyanimation autotest:

QSignalSpy spy(&anim, SIGNAL (valueChanged(QVariant)));

...

QCOMPARE (spy.count(), 6); //we should have got everything from 0 to 5
for (int i = 0; i < spy.count(); ++i) {
 QCOMPARE (qvariant_cast<QVariant>(spy.at(i).first()).toInt(), i);
}

With Qt 5, casting the QVariant parameter to a QVariant is no longer required; QSignalSpy contains a direct (not wrapped) copy of the QVariant value. The above for loop can be written as follows:

for (int i = 0; i < spy.count(); ++i) {
 QCOMPARE (spy.at(i).first().toInt(), i);
}