Writing Unit Tests

[[Category:Developing Qt::QA]]
[[Category:HowTo]]


Whenever you fix a bug, please add a regression test for it: this is a test (ideally automatic) that fails before the fix, exhibiting the bug, and passes after the fix. Whenever you implement a new feature, please add tests that verify that the new feature works as intended. Once you've written and committed your tests (along with your fix or new feature), you can check out the branch on which your work is based, then check out just the test files for your new tests into that working tree; this lets you verify that the tests do fail on the prior branch.


== General principles ==
See the [https://doc.qt.io/qt-6/qttest-index.html Qt Test documentation] and [https://doc.qt.io/qt-6/qttest-best-practices-qdoc.html Qt Test Best Practices] for details of how to write tests; and [[Writing good tests]] for further hints on this wiki.
 
=== Use <tt>initTestCase</tt> and <tt>cleanupTestCase</tt> for setup and teardown of test harness ===
 
Tests that require preparations should use the global <tt>initTestCase</tt> for that purpose. In the end, every test should leave the system in a usable state, so it can be run repeatedly. Cleanup operations should be handled in <tt>cleanupTestCase</tt>, so they get run even if the test fails and exits early.
 
Another option is to use [http://en.wikipedia.org/wiki/RAII RAII], with cleanup operations called in destructors, to ensure they happen when the test function returns and the object goes out of scope.
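
As a minimal sketch of the RAII approach in plain C++ (no Qt required), the hypothetical <tt>TempFileGuard</tt> class below removes a file in its destructor, so the cleanup runs however the enclosing function returns. The names <tt>TempFileGuard</tt>, <tt>fileExists</tt> and <tt>writeAndCheck</tt> are illustrative assumptions and are not part of Qt Test.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>
#include <utility>

// Hypothetical guard: deletes the named file when it goes out of scope.
class TempFileGuard
{
public:
    explicit TempFileGuard(std::string path) : m_path(std::move(path))
    {
        assert(!m_path.empty());
    }
    ~TempFileGuard() { std::remove(m_path.c_str()); }

    // Non-copyable, so the file is removed exactly once.
    TempFileGuard(const TempFileGuard &) = delete;
    TempFileGuard &operator=(const TempFileGuard &) = delete;

private:
    std::string m_path;
};

// Returns true if the file can be opened for reading.
bool fileExists(const std::string &path)
{
    std::ifstream in(path);
    return in.good();
}

// Simulates a test function that creates a file and may return early.
bool writeAndCheck(const std::string &path, bool failEarly)
{
    TempFileGuard guard(path);
    {
        std::ofstream out(path);
        out << "test data";
    }
    if (failEarly)
        return false;   // early exit: the guard still removes the file
    return fileExists(path);
}
```

Whether <tt>writeAndCheck</tt> returns early or runs to completion, the file is gone once the function has returned, so the next test run starts from a clean state.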
 
=== Test functions should be self-contained ===
 
Within a test program, test functions should be independent of each other and not rely upon previous test functions having been run. You can check this by running the test function on its own with "<tt>tst_foo testname</tt>".
 
=== Test the full stack ===
 
If an API is implemented in terms of pluggable/platform-specific backends that do the heavy lifting, make sure to write tests that cover the code paths all the way down into the backends. Testing the upper layer API parts using a mock backend is a nice way to isolate errors in the API layer from the backends, but is complementary to tests that exercise the actual implementation with real-world data.
 
=== Tests should complete quickly ===
 
Tests should not waste time by being unnecessarily repetitious, using inappropriately large volumes of test data, or by introducing needless idle time.
 
This is particularly true for unit testing, where every second of extra unit test execution time makes CI testing of a branch across multiple targets take longer. Remember that unit testing is separate from load and reliability testing, where larger volumes of test data and lengthier test runs are expected.
 
Benchmark tests, which typically execute the same test multiple times, should be in the separate <tt>tests/benchmarks</tt> directory and not mixed with functional unit tests.
 
=== Use data-driven testing as much as possible ===
 
Data-driven tests make it easier to add new tests for boundary conditions found in later bug reports.
 
Using a data-driven test rather than testing several items in sequence in a test function also prevents an earlier <tt>QVERIFY</tt>/<tt>QCOMPARE</tt> failure from blocking the reporting of later checks.
 
=== Always respect <tt>QCOMPARE</tt> parameter semantics ===
 
The first parameter to <tt>QCOMPARE</tt> should always be the actual value produced by the code-under-test, while the second parameter should always be the expected value. When the values don't match, <tt>QCOMPARE</tt> prints them with the labels "Actual" and "Expected". If the parameter order is swapped, debugging a failing test can be confusing.
 
=== Use <tt>QSignalSpy</tt> to verify signal emissions ===
 
The <tt>QSignalSpy</tt> class provides an elegant mechanism for capturing the list of signals emitted by an object.
 
=== Verify the validity of a QSignalSpy after construction ===
 
QSignalSpy's constructor does a number of sanity checks, such as verifying that the signal to be spied upon actually exists. To make diagnosis of test failures easier, the result of these checks should be verified by calling <tt>QVERIFY (spy.isValid());</tt> before proceeding further with a test.
 
=== Use coverage tools to direct testing effort ===
 
Use a coverage tool such as [http://www.froglogic.com/squish/coco/ Squish Coco] or [http://gcc.gnu.org/onlinedocs/gcc/Gcov.html gcov] to help write tests that cover as many statements/branches/conditions as possible in the function/class being tested. The earlier this is done in the development cycle for a new feature, the easier it will be to catch regressions later as the code is refactored.
 
=== Naming of test functions is important ===
 
Test functions should be named to make it obvious what the function is trying to test.
 
Naming test functions using the bug-tracking identifier is to be avoided, as these identifiers soon become obsolete if the bug-tracker is replaced and some bug-trackers may not be accessible to external users (e.g. the internal Jira server that stores Company Confidential data). To track which bug-tracking task a test function relates to, it should be sufficient to include the bug identifier in the commit message. Source management tools such as "<tt>git blame</tt>" can then be used to retrieve this information.
 
=== Use appropriate mechanisms to exclude inapplicable tests ===
 
<tt>QSKIP</tt> should be used to handle cases where a test function is found at run-time to be inapplicable in the current test environment. For example, a test of font rendering may call <tt>QSKIP</tt> if needed fonts are not present on the test system.
 
Beware of moc limitations when excluding tests. The moc preprocessor does not have access to all the built-in defines that the compiler has, and those macros are often used for feature detection of the compiler. It may then happen that moc does not 'see' the test and, as a result, the test is never called.
 
If an entire test program is inapplicable for a specific platform, the best approach is to use the parent directory's <tt>.pro</tt> file to avoid building the test. For example, if the <tt>tests/auto/gui/someclass</tt> test is not valid for Mac OS X, add the following to <tt>tests/auto/gui.pro</tt>:
 
<code>
mac*:SUBDIRS -= someclass
</code>
 
=== Prefer <tt>QEXPECT_FAIL</tt> to <tt>QSKIP</tt> for known bugs ===
 
If a test exposes a known bug that will not be fixed immediately, use the <tt>QEXPECT_FAIL</tt> macro to document the failure and reference the bug tracking identifier for the known issue. When the test is run, expected failures will be marked as <tt>XFAIL</tt> in the test output and will not be counted as failures when setting the test program's return code. If an expected failure does not occur, an <tt>XPASS</tt> (unexpected pass) will be reported in the test output and will be counted as a test failure.
 
For known bugs, <tt>QEXPECT_FAIL</tt> is better than <tt>QSKIP</tt> because a developer cannot fix the bug without an <tt>XPASS</tt> result reminding them that the test needs to be updated too. If <tt>QSKIP</tt> is used, there is no reminder to revise/re-enable the test, without which subsequent regressions would not be reported.
 
=== Avoid <tt>Q_ASSERT</tt> in tests ===
 
The <tt>Q_ASSERT</tt> macro causes a program to abort whenever the asserted condition is false, but only if the software was built in debug mode. In both release and debug+release builds, <tt>Q_ASSERT</tt> does nothing.
 
<tt>Q_ASSERT</tt> should be avoided because it makes tests behave differently depending on whether a debug build is being tested, and because it causes a test to abort immediately, skipping all remaining test functions and returning incomplete and/or malformed test results.
It also skips any tear-down or tidy-up that was supposed to happen at the end of the test, so can leave the workspace in an untidy state (which may cause complications for other tests).
 
Instead of <tt>Q_ASSERT</tt>, the <tt>QCOMPARE</tt> or <tt>QVERIFY</tt>-style macros should be used. These cause the current test to report a failure and terminate, but allow the remaining test functions to be executed and the entire test program to terminate normally. <tt>QVERIFY2</tt> even allows a descriptive error message to be output into the test log.
 
== Hints on writing reliable tests ==
 
=== Avoid side-effects in verification steps ===
 
When performing verification steps in an autotest using <tt>QCOMPARE</tt>, <tt>QVERIFY</tt> and friends, side-effects should be avoided. Side-effects in verification steps can make a test difficult to understand and can easily break a test in difficult to diagnose ways when the test is changed to use <tt>QTRY_VERIFY</tt>, <tt>QTRY_COMPARE</tt> or <tt>QBENCHMARK</tt>, all of which can execute the passed expression multiple times, thus repeating any side-effects.
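
The effect can be sketched in plain C++ with a simplified, hypothetical stand-in for a retrying check. The macro <tt>RETRY_UNTIL_TRUE</tt> and the <tt>FakeService</tt> type below are assumptions made for illustration; they are not how <tt>QTRY_VERIFY</tt> is implemented, but they share the relevant property that the checked expression may be evaluated more than once.

```cpp
#include <cassert>

// Hypothetical stand-in for a retrying check: evaluates `expr` up to
// `attempts` times and stops at the first attempt that yields true.
#define RETRY_UNTIL_TRUE(expr, attempts, result)        \
    do {                                                \
        (result) = false;                               \
        for (int i_ = 0; i_ < (attempts); ++i_) {       \
            if (expr) { (result) = true; break; }       \
        }                                               \
    } while (false)

// Fake object whose reply never arrives, so every check fails.
struct FakeService
{
    int requestsSent = 0;
    bool replyReady = false;

    // Action with a side effect: counts each request that is sent.
    bool sendRequest() { ++requestsSent; return replyReady; }
    // Pure query: safe to evaluate any number of times.
    bool hasReply() const { return replyReady; }
};

// Bad: the action is inside the checked expression, so it is repeated.
int requestsSentWithSideEffectInCheck()
{
    FakeService service;
    bool ok = false;
    RETRY_UNTIL_TRUE(service.sendRequest(), 3, ok);
    assert(!ok);
    return service.requestsSent;
}

// Good: perform the action once, then check a side-effect-free expression.
int requestsSentWithPureCheck()
{
    FakeService service;
    service.sendRequest();
    bool ok = false;
    RETRY_UNTIL_TRUE(service.hasReply(), 3, ok);
    assert(!ok);
    return service.requestsSent;
}
```

With three attempts, the first function sends three requests while the second sends exactly one, although both look like a single verification step in the test source.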
 
=== Avoid fixed wait times for asynchronous behaviour ===
 
In the past, many unit tests were written to use <tt>qWait()</tt> to delay for a fixed period between performing some action and waiting for some asynchronous behaviour triggered by that action to be completed. For example, changing the state of a widget and then waiting for the widget to be repainted.
 
Such timeouts would often cause failures when a test written on a workstation was executed on a device, where the expected behaviour might take longer to complete. The natural response to this kind of failure was to increase the fixed timeout to a value several times larger than needed on the slowest test platform. This approach slows down the test run on all platforms, particularly for table-driven tests.
 
If the code under test issues Qt signals on completion of the asynchronous behaviour, a better approach is to use the <tt>QSignalSpy</tt> class (part of qtestlib) to notify the test function that the verification step can now be performed.
 
If there are no Qt signals, use the <tt>QTRY_COMPARE</tt> and <tt>QTRY_VERIFY</tt> macros, which periodically test a specified condition until it becomes true or some maximum timeout is reached. These macros prevent the test from taking longer than necessary, while avoiding breakages when tests are written on workstations and later executed on embedded platforms.
 
If there are no Qt signals, and you are writing the test as part of developing a new API, consider whether the API could benefit from the addition of a signal that reports the completion of the asynchronous behaviour.
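
The polling idea behind such macros can be sketched in plain C++ as follows. The helper <tt>waitUntil</tt> is a hypothetical name introduced for illustration, not a Qt Test function; in a real Qt test, <tt>QTRY_VERIFY</tt> and <tt>QTRY_COMPARE</tt> should be used instead, as they also keep the event loop running.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: polls `condition` every `interval` until it holds or
// `timeout` elapses. Returns whether the condition was eventually met.
bool waitUntil(const std::function<bool()> &condition,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds interval = std::chrono::milliseconds(10))
{
    assert(interval.count() > 0);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        if (condition())
            return true;    // stop as soon as the condition holds
        if (std::chrono::steady_clock::now() >= deadline)
            return false;   // give up once the maximum timeout is reached
        std::this_thread::sleep_for(interval);
    }
}
```

Unlike a fixed delay, the generous timeout is only paid in full when the condition never becomes true; on a fast machine the call returns shortly after the condition is met.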
 
=== Beware of timing-dependent behaviour ===
 
Some test strategies are vulnerable to timing-dependent behaviour of certain classes, which can lead to tests that fail only on certain platforms or that do not return consistent results.
 
One example of this is text-entry widgets, which often have a blinking cursor that can make comparisons of captured bitmaps succeed or fail depending on the state of the cursor when the bitmap is captured, which may in turn depend on the speed of the machine executing the test.
 
When testing classes that change their state based on timer events, the timer-based behaviour needs to be taken into account when performing verification steps. Due to the variety of timing-dependent behaviour, there is no single generic solution to this testing problem.
 
In the example, potential solutions include disabling the cursor blinking behaviour (if the API provides that feature), waiting for the cursor to be in a known state before capturing a bitmap (e.g. by subscribing to an appropriate signal if the API provides one), or excluding the area containing the cursor from the bitmap comparison.
 
=== Prefer programmatic verification methods to bitmap capture and comparison ===
 
While necessary in many situations, verifying test results by capturing and comparing bitmaps can be quite fragile and labour intensive.
 
For example, a particular widget may have different appearance on different platforms or with different widget styles, so reference bitmaps may need to be created multiple times and then maintained in the future as Qt's set of supported platforms evolves. Bitmap comparisons can also be influenced by factors such as the test machine's screen resolution, bit depth, active theme, colour scheme, widget style, active locale (currency symbols, text direction, etc), font size, transparency effects, and choice of window manager.
 
Where possible, verification by programmatic means (that is, by verifying properties of objects and variables) is preferable to using bitmaps.
 
== Hints on producing readable and helpful test output ==
 
=== Explicitly ignore expected warning messages ===
 
If a test is expected to cause Qt to output a warning or debug message on the console, the test should call <tt>QTest::ignoreMessage()</tt> to filter that message out of the test output (and to fail the test if the message is not output).
 
If such a message is only output when Qt is built in debug mode, use QLibraryInfo::isDebugBuild() to determine whether the Qt libraries were built in debug mode. (Using "#ifdef QT_DEBUG" in this case is insufficient, as it will only tell you whether the test was built in debug mode, and that is not a guarantee that the Qt libraries were also built in debug mode.)
 
=== Avoid printing debug messages from autotests ===
 
Autotests should not produce any unhandled warning or debug messages. This will allow the CI Gate to treat new warning or debug messages as test failures.
 
Adding debug messages during development is fine, but these should either be disabled or removed before a test is checked in.
 
=== Prefer well-structured diagnostic code to quick-and-dirty debug code ===
 
Any diagnostic output that would be useful if a test fails should be part of the regular test output rather than being commented-out, disabled by preprocessor directives or being enabled only in debug builds. If a test fails during Continuous integration, having all of the relevant diagnostic output in the CI logs could save you a lot of time compared to enabling the diagnostic code and testing again, especially if the failure was on a platform that you don't have on your desktop.
 
Diagnostic messages in tests should use Qt's output mechanisms (e.g. qDebug, qWarning, etc) rather than stdio.h or iostream output mechanisms. The latter bypass Qt's message handling and prevent testlib's -silent command-line option from suppressing the diagnostic messages, which could result in important failure messages being hidden in a large volume of debugging output.
 
=== Prefer <tt>QCOMPARE</tt> over <tt>QVERIFY</tt> for value comparisons ===
 
<tt>QVERIFY</tt> should be used for verifying boolean expressions, except where the expression directly compares two values.
 
<tt>QVERIFY (x == y)</tt> and <tt>QCOMPARE (x, y)</tt> are equivalent; however, <tt>QCOMPARE</tt> is more verbose and outputs both expected and actual values when the comparison fails.
 
=== Use <tt>QVERIFY2</tt> for extra failure details ===
 
<tt>QVERIFY2</tt> should be used when it is practical and valuable to put additional information into the test failure report.
 
For example, if you have a file object and you are testing its <tt>open()</tt> function, you may write a test with a statement like:
 
<code>
bool opened = file.open(QIODevice::WriteOnly);
QVERIFY (opened);
</code>
 
If this test fails, it will give no clue as to why the file failed to open:
 
<code>
FAIL! : tst_QFile::open_write() 'opened' returned FALSE. ()
</code>
 
Your object should have a function to retrieve some detail about the last error (if it doesn't, then the API is arguably broken). So, why not use it?
 
<code>
QVERIFY2(opened, qPrintable(QString("open %1: %2").arg(file.fileName()).arg(file.errorString())));
FAIL! : tst_QFile::open_write() 'opened' returned FALSE. (open /tmp/qt.a3B42Cd: No space left on device)
</code>
 
Much better! And, if this branch is being tested in the Qt CI system, the above detailed failure message will go straight into any emailed reports.
 
== Hints on performance/benchmark testing ==
 
=== Verify occurrences of <tt>QBENCHMARK</tt> and <tt>QTest::setBenchmarkResult()</tt> ===
 
A performance test should contain either a single <tt>QBENCHMARK</tt> macro or a single call to <tt>QTest::setBenchmarkResult()</tt>.
Multiple occurrences of <tt>QBENCHMARK</tt> or <tt>QTest::setBenchmarkResult()</tt> in the same test function make no sense. At most one performance result can be reported per test function, or per data tag in a data-driven setup.
 
=== Avoid changing a performance test ===
 
Avoid changing the test code that forms (or influences) the body of a <tt>QBENCHMARK</tt> macro, or the test code that computes the value passed to <tt>QTest::setBenchmarkResult()</tt>. Differences in successive performance results should ideally be caused only by changes to the product we are testing (i.e. the Qt library). Changes to the test code can potentially result in a [http://en.wikipedia.org/wiki/False_positive false positive].
 
=== Verify a performance test if possible ===
 
In a test function that measures performance, the <tt>QBENCHMARK</tt> or <tt>QTest::setBenchmarkResult()</tt> should if possible be followed by a verification step using <tt>QCOMPARE</tt>, <tt>QVERIFY</tt> and friends. This can be used to flag a performance result as invalid if we know that we measured a different code path than the intended one. A performance analysis tool can use this information to filter out invalid results.
 
For example, an unexpected error condition will typically cause the program to bail out prematurely from the normal program execution, and thus falsely show a dramatic performance increase.
 
== Writing testable code ==
 
=== Break dependencies ===
 
The idea of unit testing is to use every class in isolation. Since many classes instantiate other classes, it is not possible to instantiate one class separately. Therefore, a technique called dependency injection should be used. With dependency injection, object creation is separated from object use. A factory is responsible for building object trees. Other objects manipulate these objects through abstract interfaces. This technique works well for data-driven applications. For GUI applications this approach can be difficult, as there are plenty of object creations and destructions going on. To verify the correct behaviour of classes that depend on abstract interfaces, mocks can be used. There are [http://code.google.com/p/googlemock/ tools to support the generation of mocks].
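
A minimal sketch of dependency injection with a hand-written mock, in plain C++, might look like this. The <tt>Logger</tt>, <tt>Account</tt> and <tt>MockLogger</tt> classes are hypothetical examples invented for illustration; a mocking framework could generate the mock instead of writing it by hand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Abstract interface that the class under test depends on.
class Logger
{
public:
    virtual ~Logger() = default;
    virtual void log(const std::string &message) = 0;
};

// Class under test: receives its dependency instead of creating it.
class Account
{
public:
    explicit Account(Logger &logger) : m_logger(logger) {}

    void deposit(int amount)
    {
        assert(amount > 0);
        m_balance += amount;
    }

    // Rejects and logs withdrawals that are invalid or exceed the balance.
    bool withdraw(int amount)
    {
        if (amount <= 0 || amount > m_balance) {
            m_logger.log("withdrawal rejected");
            return false;
        }
        m_balance -= amount;
        return true;
    }

    int balance() const { return m_balance; }

private:
    Logger &m_logger;
    int m_balance = 0;
};

// Hand-written mock: records calls so that a test can inspect them.
class MockLogger : public Logger
{
public:
    void log(const std::string &message) override { messages.push_back(message); }
    std::vector<std::string> messages;
};
```

A test can construct an <tt>Account</tt> with a <tt>MockLogger</tt>, exercise <tt>withdraw()</tt>, and then inspect <tt>messages</tt> to verify the interaction without touching a real logging backend.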
 
=== Compile all classes in a static library ===
 
In small to medium sized projects there is typically a build script which lists all source files and then compiles the executable in one go. The build scripts for the tests must list the needed source files again. It is easier to list the source files and the headers only once in a script to build a static library. Then main() will be linked against the static library to build the executable, and the tests will be linked against the static library as well.
 
== Building a test suite with qtestlib ==
 
qtestlib provides the tools to build an executable which contains one test class which typically tests one class of production code. In a real world project many classes shall be tested by running one command. This collection of tests is also called a test suite.
 
=== Using qmake ===

In Qt 4.7 or later, place <tt>CONFIG+=testcase</tt> in each test program's <tt>.pro</tt> file. Then, from a parent subdirs project, you may run <tt>make check</tt> to run all testcases.
 
The behavior may be customized in a few different ways. Examples:
 
<code>
make -j1 check                           -> run one test at a time
make -j4 check                           -> run four tests at a time - make sure the tests are written OK for this!
make check TESTRUNNER=path/to/testrunner -> run autotests through a custom test runner script (which may e.g. handle crashes, fails)
make check TESTARGS=-xml                 -> run autotests in XML logging mode
</code>
 
=== Using CMake and CTest ===

The KDE project uses [http://www.itk.org/Wiki/CMake_Testing_With_CTest CMake and CTest] to make a test suite. With CMake it is possible to label build targets as tests. All labeled targets will be run when <tt>make test</tt> is called on the command line. There are several other advantages with CMake. The result of a test run can be published on a webserver using CDash with virtually no effort. See the [http://doc-snapshot.qt.io/qt5-stable/cmake-manual.html CMake manual] for more about CMake and automatic moc invocation.
 
== Common problems with test machine setup ==
 
=== Screen savers ===
 
Screen savers can interfere with some of the tests for GUI classes, causing unreliable test results. Screen savers should be disabled to ensure that test results are consistent and reliable.
 
=== System dialogs ===
 
Dialogs displayed unexpectedly by the operating system or other running applications can steal input focus from widgets involved in an autotest, causing unreproducible failures.
 
Examples encountered in the past include online update notification dialogs on Mac OS X, false alarms from virus scanners, scheduled tasks such as virus signature updates, software updates pushed out to workstations by IT, and chat programs popping up windows on top of the stack.
 
=== Display usage ===
 
Some tests use the test machine's display, mouse and keyboard and can thus fail if the machine is being used for something else at the same time (e.g. as somebody's workstation), or if multiple tests are run in parallel.
 
The CI system uses dedicated test machines to avoid this problem, but if you don't have a dedicated test machine, you may be able to solve this problem by running the tests on a second display.
 
On Unix, one can also run the tests on a nested or virtual X-server, such as <tt>Xephyr</tt>. For example, to run the entire set of tests under <tt>Xephyr</tt>, execute the following commands:
 
<code>
Xephyr :1 -ac -screen 1920x1200 >/dev/null 2>&1 &
sleep 5
DISPLAY=:1 icewm >/dev/null 2>&1 &
cd tests/auto
make
DISPLAY=:1 make -k -j1 check
</code>
 
In Qt 5 there is a nice alternative, the offscreen platform plugin, which you can use like this:
<code>
TESTARGS="-platform offscreen" make check -k -j1
</code>
 
=== Window managers ===
 
On Unix, at least two autotests (<tt>tst_examples</tt> and <tt>tst_gestures</tt>) require a window manager to be running. Therefore if running these tests under a nested X-server, you must also run a window manager in that X-server.
 
Your window manager must be configured to position all windows on the display automatically. Some window managers (e.g. twm) have a mode where the user must manually position new windows, and this prevents the test suite running without user interaction.
 
Note that the twm window manager has been found to be unsuitable for running the full suite of Qt autotests, as the <tt>tst_gestures</tt> autotest causes twm to forget its configuration and revert to manual window placement.
 
== Miscellaneous topics ==
 
=== <tt>QSignalSpy</tt> and <tt>QVariant</tt> parameters ===
 
With Qt 4, <tt>QVariant</tt> parameters recorded by <tt>QSignalSpy</tt> have to be cast to <tt>QVariant</tt> (e.g., by <tt>qvariant_cast<QVariant>()</tt>) to get the actual value; this is because <tt>QSignalSpy</tt> wraps the value inside another <tt>QVariant</tt> (of type <tt>QMetaType::QVariant</tt>). The following snippet is from the <tt>tst_qpropertyanimation</tt> autotest:
 
<code>
QSignalSpy spy(&anim, SIGNAL (valueChanged(QVariant)));
 
...
 
QCOMPARE (spy.count(), 6); //we should have got everything from 0 to 5
for (int i = 0; i < spy.count(); ++i) {
    QCOMPARE (qvariant_cast<QVariant>(spy.at(i).first()).toInt(), i);
}
</code>
 
With Qt 5, casting the QVariant parameter to a QVariant is no longer required; <tt>QSignalSpy</tt> contains a direct (not wrapped) copy of the <tt>QVariant</tt> value. The above for loop can be written as follows:
 
<code>
for (int i = 0; i < spy.count(); ++i) {
    QCOMPARE (spy.at(i).first().toInt(), i);
}
</code>

Latest revision as of 10:01, 25 August 2021