Writing Unit Tests

From Qt Wiki
Revision as of 18:11, 14 January 2015 by Maintenance script (talk | contribs)

General principles for writing unit tests

Use initTestCase and cleanupTestCase for setup and teardown of test harness

Tests that require preparations should use the global initTestCase for that purpose. In the end, every test should leave the system in a usable state, so it can be run repeatedly. Cleanup operations should be handled in cleanupTestCase, so they get run even if the test fails and exits early.

Another option is to use RAII [en.wikipedia.org], with cleanup operations called in destructors, to ensure they happen when the test function returns and the object goes out of scope.
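
A minimal, Qt-free sketch of the RAII approach follows. ScopedCleanup, fileExists and testBody are hypothetical names made up for this illustration. The guard's destructor removes a scratch file even when the function returns early, which is what happens when a QVERIFY or QCOMPARE check fails.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper: runs the stored callable when it goes out of scope.
class ScopedCleanup
{
public:
    explicit ScopedCleanup(std::function<void()> cleanup)
        : m_cleanup(std::move(cleanup)) {}
    ~ScopedCleanup() { if (m_cleanup) m_cleanup(); }

    ScopedCleanup(const ScopedCleanup &) = delete;
    ScopedCleanup &operator=(const ScopedCleanup &) = delete;

private:
    std::function<void()> m_cleanup;
};

// Returns true if a file at `path` can be opened for reading.
bool fileExists(const std::string &path)
{
    return std::ifstream(path).good();
}

// Stands in for a test function. `checkPasses` simulates the outcome of a
// verification step; on failure the function returns early.
void testBody(const std::string &path, bool checkPasses)
{
    { std::ofstream out(path); out << "scratch data\n"; }
    assert(fileExists(path)); // the scratch file now exists

    ScopedCleanup guard([path] { std::remove(path.c_str()); });

    if (!checkPasses)
        return; // guard's destructor still removes the file

    // ... further test steps would go here ...
}
```

Whichever way testBody leaves, the scratch file is gone afterwards, so the next run starts from a clean state.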

Test functions should be self-contained

Within a test program, test functions should be independent of each other and not rely upon previous test functions having been run. You can check this by running the test function on its own with “tst_foo testname”.

Test the full stack

If an API is implemented in terms of pluggable or platform-specific backends that do the heavy lifting, make sure to write tests that cover the code paths all the way down into the backends. Testing the upper-layer API parts using a mock backend is a good way to isolate errors in the API layer from the backends, but it complements, rather than replaces, tests that exercise the actual implementation with real-world data.

Tests should complete quickly

Tests should not waste time by being unnecessarily repetitious, using inappropriately large volumes of test data, or by introducing needless idle time.

This is particularly true for unit testing, where every second of extra unit test execution time makes CI testing of a branch across multiple targets take longer. Remember that unit testing is separate from load and reliability testing, where larger volumes of test data and lengthier test runs are expected.

Benchmark tests, which typically execute the same test multiple times, should be in the separate tests/benchmarks directory and not mixed with functional unit tests.

Use data-driven testing as much as possible

Data-driven tests make it easier to add new tests for boundary conditions found in later bug reports.

Using a data-driven test rather than testing several items in sequence in a test function also prevents an earlier QVERIFY/QCOMPARE failure from blocking the reporting of later checks.
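
The reporting benefit can be modelled without Qt. The sketch below is an illustration under stated assumptions, not Qt Test's implementation: the Row struct, the clampToByte function under test, runRows and boundaryRows are all hypothetical names. Each row is checked and recorded independently, so a failure in one row does not hide the rows after it. In Qt Test the same table would live in a _data() function populated with QTest::addColumn() and QTest::newRow(), and be read with QFETCH.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical function under test: clamps a value into [0, 255].
int clampToByte(int value)
{
    if (value < 0) return 0;
    if (value > 255) return 255;
    return value;
}

// One row of test data: a tag for the report, an input, an expected output.
struct Row
{
    std::string tag;
    int input;
    int expected;
};

// Runs every row and returns the tags of the rows that failed.
std::vector<std::string> runRows(const std::vector<Row> &rows)
{
    std::vector<std::string> failedTags;
    for (const Row &row : rows) {
        if (clampToByte(row.input) != row.expected)
            failedTags.push_back(row.tag); // record the failure and keep going
    }
    return failedTags;
}

// Boundary cases; a later bug report usually adds a single row here.
std::vector<Row> boundaryRows()
{
    return {
        {"negative", -1, 0},
        {"zero", 0, 0},
        {"max", 255, 255},
        {"overflow", 256, 255},
    };
}
```

A check such as assert(runRows(boundaryRows()).empty()) passes for this table. Adding a row with a wrong expectation reports only that row's tag, and the remaining rows are still evaluated.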

Always respect QCOMPARE parameter semantics

The first parameter to QCOMPARE should always be the actual value produced by the code-under-test, while the second parameter should always be the expected value. When the values don’t match, QCOMPARE prints them with the labels “Actual” and “Expected”. If the parameter order is swapped, debugging a failing test can be confusing.

Use QSignalSpy to verify signal emissions

The QSignalSpy class provides an elegant mechanism for capturing the list of signals emitted by an object.

Verify the validity of a QSignalSpy after construction

QSignalSpy’s constructor does a number of sanity checks, such as verifying that the signal to be spied upon actually exists. To make diagnosis of test failures easier, the result of these checks should be verified by calling “QVERIFY(spy.isValid());” before proceeding further with a test.

Use coverage tools to direct testing effort

Use a coverage tool such as Squish Coco [froglogic.com] or gcov [gcc.gnu.org] to help write tests that cover as many statements/branches/conditions as possible in the function/class being tested. The earlier this is done in the development cycle for a new feature, the easier it will be to catch regressions later as the code is refactored.

Naming of test functions is important

Test functions should be named to make it obvious what the function is trying to test.

Naming test functions using the bug-tracking identifier is to be avoided, as these identifiers soon become obsolete if the bug-tracker is replaced, and some bug-trackers may not be accessible to external users (e.g. the internal Jira server that stores Company Confidential data). To track which bug-tracking task a test function relates to, it should be sufficient to include the bug identifier in the commit message. Source management tools such as “git blame” can then be used to retrieve this information.

Use appropriate mechanisms to exclude inapplicable tests

QSKIP should be used to handle cases where a test function is found at run-time to be inapplicable in the current test environment. For example, a test of font rendering may call QSKIP if needed fonts are not present on the test system.

Beware of a moc limitation when excluding tests. The moc preprocessor does not have access to all of the built-in macros that the compiler defines, and those macros are often used for compiler feature detection. It may therefore happen that moc does not ‘see’ a test function, and as a result the test is never called.

If an entire test program is inapplicable for a specific platform, the best approach is to use the parent directory’s .pro file to avoid building the test. For example, if the tests/auto/gui/someclass test is not valid for Mac OS X, remove the someclass subdirectory from SUBDIRS for that platform in tests/auto/gui.pro, for instance with a scoped line such as mac:SUBDIRS -= someclass.

Prefer QEXPECT_FAIL to QSKIP for known bugs

If a test exposes a known bug that will not be fixed immediately, use the QEXPECT_FAIL macro to document the failure and reference the bug tracking identifier for the known issue. When the test is run, expected failures will be marked as XFAIL in the test output and will not be counted as failures when setting the test program’s return code. If an expected failure does not occur, the XPASS (unexpected pass) will be reported in the test output and will be counted as a test failure. For known bugs, QEXPECT_FAIL is better than QSKIP because a developer cannot fix the bug without an XPASS result reminding them that the test needs to be updated too. If QSKIP is used, there is no reminder to revise/re-enable the test, without which subsequent regressions would not be reported.

Avoid Q_ASSERT in tests

The Q_ASSERT macro causes a program to abort whenever the asserted condition is false, but only if the software was built in debug mode. In both release and debug+release builds, Q_ASSERT does nothing.

Q_ASSERT should be avoided because it makes tests behave differently depending on whether a debug build is being tested, and because it causes a test to abort immediately, skipping all remaining test functions and returning incomplete and/or malformed test results. Instead of Q_ASSERT, the QCOMPARE or QVERIFY-style macros should be used. These cause the current test to report a failure and terminate, but allow the remaining test functions to be executed and the entire test program to terminate normally. QVERIFY2 even allows a descriptive error message to be output into the test log.
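
The difference between aborting the program and failing only the current test function can be modelled in plain C++. SKETCH_VERIFY, g_failures, g_functionsRun and runAll are hypothetical names for this illustration; this is a simplified model, not how Qt Test is implemented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical failure log shared by the checks below.
static std::vector<std::string> g_failures;
static int g_functionsRun = 0;

// Hypothetical check macro: records the failure and leaves only the current
// test function, instead of aborting the whole program like an assertion.
#define SKETCH_VERIFY(condition)                      \
    do {                                              \
        if (!(condition)) {                           \
            g_failures.push_back(#condition);         \
            return;                                   \
        }                                             \
    } while (false)

void testFirst()
{
    ++g_functionsRun;
    SKETCH_VERIFY(1 + 1 == 3); // fails: recorded, function returns here
    SKETCH_VERIFY(false);      // never reached
}

void testSecond()
{
    ++g_functionsRun;
    SKETCH_VERIFY(2 * 2 == 4); // passes
}

// Runs every test function; returns the number of recorded failures.
int runAll()
{
    g_failures.clear();
    g_functionsRun = 0;
    testFirst();
    testSecond(); // still runs although testFirst failed
    return static_cast<int>(g_failures.size());
}
```

Had testFirst used a plain assert instead, a debug build would have stopped before testSecond ran and a release build would have checked nothing at all.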

Hints on writing reliable tests

Avoid side-effects in verification steps

When performing verification steps in an autotest using QCOMPARE, QVERIFY and friends, side-effects should be avoided. Side-effects in verification steps can make a test difficult to understand and can easily break a test in difficult to diagnose ways when the test is changed to use QTRY_VERIFY, QTRY_COMPARE or QBENCHMARK, all of which can execute the passed expression multiple times, thus repeating any side-effects.
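
Why repeated evaluation matters can be shown with a small Qt-free model. RETRY_VERIFY below is a hypothetical macro, not Qt's QTRY_VERIFY; it only mimics the property that the expression may be evaluated several times. Queue and the two functions are likewise made up for the illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for a retrying check: evaluates `expr` up to
// `attempts` times and stores in `result` whether any evaluation was true.
#define RETRY_VERIFY(expr, attempts, result)            \
    do {                                                \
        (result) = false;                               \
        for (int i_ = 0; i_ < (attempts); ++i_) {       \
            if ((expr)) { (result) = true; break; }     \
        }                                               \
    } while (false)

// Hypothetical object under test: consumes one queued item per call and
// reports whether the queue is now empty.
struct Queue
{
    int remaining = 3;
    bool takeNext() { --remaining; return remaining == 0; }
};

// Bad: the verified expression itself mutates the queue, so the number of
// items consumed depends on how often the macro re-evaluates it.
int itemsLeftAfterSideEffectCheck()
{
    Queue q;
    bool ok = false;
    RETRY_VERIFY(q.takeNext(), 5, ok);
    assert(ok);
    return q.remaining;
}

// Good: perform the action once, then verify a side-effect-free expression.
int itemsLeftAfterPureCheck()
{
    Queue q;
    q.takeNext();
    bool ok = false;
    RETRY_VERIFY(q.remaining == 2, 5, ok);
    assert(ok);
    return q.remaining;
}
```

The first function drains the whole queue although the test author asked for a single takeNext() call; the second leaves two items, as intended.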

Avoid fixed wait times for asynchronous behaviour

In the past, many unit tests were written to use qWait() to delay for a fixed period between performing some action and waiting for some asynchronous behaviour triggered by that action to be completed. For example, changing the state of a widget and then waiting for the widget to be repainted.

Such timeouts would often cause failures when a test written on a workstation was executed on a device, where the expected behaviour might take longer to complete. The natural response to this kind of failure was to increase the fixed timeout to a value several times larger than needed on the slowest test platform. This approach slows down the test run on all platforms, particularly for table-driven tests.

If the code under test issues Qt signals on completion of the asynchronous behaviour, a better approach is to use the QSignalSpy class (part of qtestlib) to notify the test function that the verification step can now be performed. If there are no Qt signals, use the QTRY_COMPARE and QTRY_VERIFY macros, which periodically test a specified condition until it becomes true or some maximum timeout is reached. These macros prevent the test from taking longer than necessary, while avoiding breakages when tests are written on workstations and later executed on embedded platforms.
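
The polling idea behind such macros can be sketched in plain C++. This is a simplified model under stated assumptions, not Qt's implementation: waitUntil and runPollingDemo are hypothetical names, std::this_thread::sleep_for stands in for Qt's event processing, and the asynchronous work is modelled as a condition that becomes true at a later point in time.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: re-checks `condition` every `interval` until it holds
// or `timeout` has elapsed. Returns true if the condition was observed.
bool waitUntil(const std::function<bool()> &condition,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds interval = std::chrono::milliseconds(10))
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!condition()) {
        if (std::chrono::steady_clock::now() >= deadline)
            return false; // gave up: the ceiling was reached
        std::this_thread::sleep_for(interval);
    }
    return true; // returned as soon as the condition held
}

// Models work that finishes `workTime` from now, then polls for it with a
// generous 5 s ceiling instead of sleeping for a fixed period.
bool runPollingDemo(std::chrono::milliseconds workTime)
{
    const auto readyAt = std::chrono::steady_clock::now() + workTime;
    auto workFinished = [readyAt] {
        return std::chrono::steady_clock::now() >= readyAt;
    };
    return waitUntil(workFinished, std::chrono::seconds(5));
}
```

The ceiling only costs time when something is actually wrong. On a fast machine the wait ends shortly after the work does, and on a slow device the same test still passes.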

If there are no Qt signals, and you are writing the test as part of developing a new API, consider whether the API could benefit from the addition of a signal that reports the completion of the asynchronous behaviour.

Beware of timing-dependent behaviour

Some test strategies are vulnerable to timing-dependent behaviour of certain classes, which can lead to tests that fail only on certain platforms or that do not return consistent results.

One example of this is text-entry widgets, which often have a blinking cursor that can make comparisons of captured bitmaps succeed or fail depending on the state of the cursor when the bitmap is captured, which may in turn depend on the speed of the machine executing the test.

When testing classes that change their state based on timer events, the timer-based behaviour needs to be taken into account when performing verification steps. Due to the variety of timing-dependent behaviour, there is no single generic solution to this testing problem.

In the example, potential solutions include disabling the cursor blinking behaviour (if the API provides that feature), waiting for the cursor to be in a known state before capturing a bitmap (e.g. by subscribing to an appropriate signal if the API provides one), or excluding the area containing the cursor from the bitmap comparison.

Prefer programmatic verification methods to bitmap capture and comparison

While necessary in many situations, verifying test results by capturing and comparing bitmaps can be quite fragile and labour intensive.

For example, a particular widget may have different appearance on different platforms or with different widget styles, so reference bitmaps may need to be created multiple times and then maintained in the future as Qt’s set of supported platforms evolves. Bitmap comparisons can also be influenced by factors such as the test machine’s screen resolution, bit depth, active theme, colour scheme, widget style, active locale (currency symbols, text direction, etc), font size, transparency effects, and choice of window manager.

Where possible, verification by programmatic means (that is, by verifying properties of objects and variables) is preferable to using bitmaps.

Hints on producing readable and helpful test output

Explicitly ignore expected warning messages

If a test is expected to cause Qt to output a warning or debug message on the console, the test should call QTest::ignoreMessage() to filter that message out of the test output (and to fail the test if the message is not output).

If such a message is only output when Qt is built in debug mode, use QLibraryInfo::isDebugBuild() to determine whether the Qt libraries were built in debug mode. (Using “#ifdef QT_DEBUG” in this case is insufficient, as it will only tell you whether the test was built in debug mode, and that is not a guarantee that the Qt libraries were also built in debug mode.)

Avoid printing debug messages from autotests

Autotests should not produce any unhandled warning or debug messages. This will allow the CI Gate to treat new warning or debug messages as test failures.

Adding debug messages during development is fine, but these should either be disabled or removed before a test is checked in.

Prefer well-structured diagnostic code to quick-and-dirty debug code

Any diagnostic output that would be useful if a test fails should be part of the regular test output rather than being commented-out, disabled by preprocessor directives or being enabled only in debug builds. If a test fails during continuous integration, having all of the relevant diagnostic output in the CI logs could save you a lot of time compared to enabling the diagnostic code and testing again, especially if the failure was on a platform that you don’t have on your desktop.

Diagnostic messages in tests should use Qt’s output mechanisms (e.g. qDebug, qWarning, etc) rather than stdio.h or iostream output mechanisms. The latter bypass Qt’s message handling and prevent testlib’s -silent command-line option from suppressing the diagnostic messages, which could result in important failure messages being hidden in a large volume of debugging output.

Prefer QCOMPARE over QVERIFY for value comparisons

QVERIFY should be used for verifying boolean expressions, except where the expression directly compares two values.

For such a comparison, QVERIFY(x == y) and QCOMPARE(x, y) check the same condition; however, QCOMPARE is more verbose and outputs both expected and actual values when the comparison fails.

Use QVERIFY2 for extra failure details

QVERIFY2 should be used when it is practical and valuable to put additional information into the test failure report. For example, if you have an object named file and you are testing its open() function, you might store the result of open() in a boolean named opened and check it with QVERIFY(opened).

If this test fails, the report states only that the condition was false and gives no clue as to why the file failed to open.

Your object should have a function to retrieve some detail about the last error (if it doesn’t, then the API is arguably broken). So, why not use it? Passing that detail as the message argument, for example QVERIFY2(opened, qPrintable(file.errorString())) if the object offers an errorString() function as QFile does, puts the reason for the failure into the report.

Much better! And, if this branch is being tested in the Qt CI system, the detailed failure message will go straight into any emailed reports.

Hints on performance/benchmark testing

Verify occurrences of QBENCHMARK and QTest::setBenchmarkResult()

A performance test should contain either a single QBENCHMARK macro or a single call to QTest::setBenchmarkResult(). Multiple occurrences of QBENCHMARK or QTest::setBenchmarkResult() in the same test function make no sense. At most one performance result can be reported per test function, or per data tag in a data-driven setup.

Avoid changing a performance test

Avoid changing the test code that forms (or influences) the body of a QBENCHMARK macro, or the test code that computes the value passed to QTest::setBenchmarkResult(). Differences in successive performance results should ideally be caused only by changes to the product we are testing (i.e. the Qt library). Changes to the test code can potentially result in a false positive.

Verify a performance test if possible

In a test function that measures performance, the QBENCHMARK or QTest::setBenchmarkResult() should if possible be followed by a verification step using QCOMPARE, QVERIFY and friends. This can be used to flag a performance result as invalid if we know that we measured a different code path than the intended one. A performance analysis tool can use this information to filter out invalid results.

For example, an unexpected error condition will typically cause the program to bail out prematurely from the normal program execution, and thus falsely show a dramatic performance increase.
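
A Qt-free sketch of this idea is shown below. parsePositive, Measurement and measureParse are hypothetical names, and std::chrono timing stands in for QBENCHMARK. The measurement is followed by a check of the result, so a run that took the cheap error path is flagged as invalid rather than reported as a speed-up.

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Hypothetical code under test: returns -1 as soon as it meets malformed
// input, which is much cheaper than the normal parsing path.
long parsePositive(const std::string &text)
{
    if (text.empty())
        return -1; // error path bails out early
    long value = 0;
    for (char c : text) {
        if (c < '0' || c > '9')
            return -1; // error path bails out early
        value = value * 10 + (c - '0');
    }
    return value;
}

struct Measurement
{
    std::chrono::nanoseconds elapsed;
    bool valid; // false if the timed code took an unintended path
};

// Times one call, then verifies the result so that a fast-but-wrong run
// can be filtered out by whoever analyses the numbers.
Measurement measureParse(const std::string &text, long expected)
{
    const auto start = std::chrono::steady_clock::now();
    const long result = parsePositive(text);
    const auto stop = std::chrono::steady_clock::now();

    return { std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start),
             result == expected };
}
```

In a Qt benchmark the equivalent would be a QCOMPARE or QVERIFY on the result placed after the measured block.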

Writing testable code

Break dependencies

The idea of unit testing is to exercise every class in isolation. Since many classes instantiate other classes, it is often not possible to instantiate one class separately. A technique called dependency injection addresses this by separating object creation from object use: a factory is responsible for building object trees, and other objects manipulate these objects through abstract interfaces. This technique works well for data-driven applications. For GUI applications the approach can be difficult, as there are plenty of object creations and destructions going on. To verify the correct behaviour of classes that depend on abstract interfaces, mocks can be used. There are tools to support the generation of mocks [code.google.com].
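
A minimal sketch of dependency injection with a hand-written mock follows. Transport, Notifier and MockTransport are hypothetical names chosen for this illustration; no mock-generation tool is used.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Abstract interface the class under test depends on.
class Transport
{
public:
    virtual ~Transport() = default;
    virtual bool send(const std::string &payload) = 0;
};

// Class under test: receives its dependency instead of creating it.
class Notifier
{
public:
    explicit Notifier(Transport &transport) : m_transport(transport) {}

    bool notify(const std::string &user)
    {
        if (user.empty())
            return false; // nothing is sent for an empty user name
        return m_transport.send("hello " + user);
    }

private:
    Transport &m_transport;
};

// Hand-written mock: records what was sent and returns a scripted result.
class MockTransport : public Transport
{
public:
    bool send(const std::string &payload) override
    {
        sent.push_back(payload);
        return result;
    }

    std::vector<std::string> sent;
    bool result = true;
};
```

A test constructs a MockTransport, passes it to Notifier, and then inspects mock.sent to verify what the class under test did, without any real transport being involved. Production code passes a real Transport implementation through the same constructor.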

Compile all classes in a static library

In small to medium sized projects there is typically a build script which lists all source files and then compiles the executable in one go. The build scripts for the tests must list the needed source files again. It is easier to list the source files and the headers only once in a script to build a static library. Then main() will be linked against the static library to build the executable, and the tests will be linked against the static library as well.

Building a test suite with qtestlib

qtestlib provides the tools to build an executable which contains one test class, which typically tests one class of production code. In a real-world project, many classes should be tested by running one command. This collection of tests is also called a test suite.

using qmake

In Qt 4.7 or later, place CONFIG+=testcase in each test program’s .pro file. Then, from a parent subdirs project, you may run make check to run all testcases.

The behaviour may be customized in a few different ways. For example, make -j4 check runs up to four tests at a time (make sure the tests are written to cope with this), and make check TESTARGS=-xml passes the given arguments to each test program.

using CMake and CTest

The KDE project uses CMake and CTest to make a test suite. With CMake it is possible to label build targets as tests. All labeled targets will be run when make test is called on the command line. There are several other advantages with CMake. The result of a test run can be published on a webserver using CDash with virtually no effort. See the CMake manual for more about CMake and automatic moc invocation.

Common problems with test machine setup

Screen savers

Screen savers can interfere with some of the tests for GUI classes, causing unreliable test results. Screen savers should be disabled to ensure that test results are consistent and reliable.

System dialogs

Dialogs displayed unexpectedly by the operating system or other running applications can steal input focus from widgets involved in an autotest, causing unreproducible failures.

Examples encountered in the past include online update notification dialogs on Mac OS X, false alarms from virus scanners, scheduled tasks such as virus signature updates, software updates pushed out to workstations by IT, and chat programs popping up windows on top of the stack.

Display usage

Some tests use the test machine’s display, mouse and keyboard and can thus fail if the machine is being used for something else at the same time (e.g. as somebody’s workstation), or if multiple tests are run in parallel.

The CI system uses dedicated test machines to avoid this problem, but if you don’t have a dedicated test machine, you may be able to solve this problem by running the tests on a second display.

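On Unix, one can also run the tests on a nested or virtual X-server, such as Xephyr. To run the entire set of tests under Xephyr, start Xephyr on a spare display number, start a window manager on that display, and then run the tests with the DISPLAY environment variable pointing at it.

In Qt 5 there is a convenient alternative, the offscreen platform plugin, which is selected by passing -platform offscreen to the test program.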

Window managers

On Unix, at least two autotests (tst_examples and tst_gestures) require a window manager to be running. Therefore, if running these tests under a nested X-server, you must also run a window manager in that X-server.

Your window manager must be configured to position all windows on the display automatically. Some window managers (e.g. twm) have a mode where the user must manually position new windows, and this prevents the test suite running without user interaction.

Note that the twm window manager has been found to be unsuitable for running the full suite of Qt autotests, as the tst_gestures autotest causes twm to forget its configuration and revert to manual window placement.

Miscellaneous topics

QSignalSpy and QVariant parameters

With Qt 4, QVariant parameters recorded by QSignalSpy have to be cast to QVariant (e.g., by qvariant_cast<QVariant>()) to get the actual value; this is because QSignalSpy wraps the value inside another QVariant (of type QMetaType::QVariant). The tst_qpropertyanimation autotest does this in a for loop over the recorded emissions, applying the cast to the first argument of each emission before comparing it with the expected value.

With Qt 5, casting the QVariant parameter to a QVariant is no longer required; QSignalSpy contains a direct (not wrapped) copy of the QVariant value, so the same for loop can use the first argument of each recorded emission as it is.

If your test needs to run on both Qt 4 and Qt 5, use the casting approach.
