Handling HTML

From Qt Wiki
Revision as of 21:24, 27 June 2015 by Wieland (talk | contribs) (Cleanup)

This page discusses various available options for working with HTML documents in your Qt application. Please also read the general considerations outlined in the Handling Document Formats article.

Reading / Writing

Scribe

Qt's Scribe framework (see Handling Document Formats) has built-in support for loading from / saving to HTML (see QTextDocument::setHtml and toHtml as well as QTextDocumentWriter). Together with the format-independent API that QTextDocument provides for modifying documents (or creating them from scratch), this makes Scribe an adequate framework for processing or generating HTML documents.

However, it supports only a limited subset of static HTML 4 / CSS 2.1, corresponding to the limited set of document features that QTextDocument supports internally.

QtWebKit

The WebKit-based web browser framework shipped with Qt provides the QWebPage and QWebFrame classes, which can be used to load an HTML document (or any web page) without actually rendering it, and to access or modify it through a DOM-like API. Saving back to HTML is possible using QWebFrame::toHtml.

Keep in mind, though, that loading an HTML document this way does not just passively parse it (as the Scribe framework or the XML/HTML parsers discussed below would), but actively evaluates it as a browser would: linked content such as iframes is loaded, JavaScript that is set to run on start-up is executed, and so on. Whether this is useful or problematic depends on your specific use case.

Manual XML processing

If your application needs to parse or write HTML/XHTML documents which are valid XML, consider processing them using Qt's XML handling classes (see Handling Document Formats).

Note that third-party tools and libraries are available for automatically converting "normal" (and even broken) HTML documents into valid XHTML/XML that is suitable for such processing:

name | type | platforms | license
HTML Tidy | stand-alone tool | Win, Mac, Linux, … | MIT-like [permissive]
TidyLib | C library | Win, Mac, Linux, … | MIT-like [permissive]
Chilkat | C++ library | Win, Mac, Linux, … | proprietary

Manual HTML processing

For specialized HTML parsers whose API is as low-level as that of Qt's XML handling classes, refer to third-party C/C++ libraries, e.g.:

name | API | parsing | writing | parsing modes | platforms | license
libxml2 | C | yes | ? | stream, SAX, DOM (non-validating?) | Win, Mac, Linux, … | MIT [permissive]
htmlcxx | C++ | yes | yes | SAX, DOM, ? (non-validating) | Win, Linux, ? | LGPL [weak copyleft]
libhtml | C | yes | yes | stream (strongly-validating) | Linux, ? | ICS [permissive]

Rendering / Interactive Viewing

Scribe

As already described above, Qt's Scribe framework supports automatically importing HTML content into a QTextDocument. Once in that form, you can…

Again, the restriction to the limited subset of static HTML 4 / CSS 2.1 supported by QTextDocument applies.

QtWebKit

If you need more powerful viewing / user-interaction capabilities, take a look at the QtWebKit browser framework which is included with Qt. It can interactively display pretty much any modern web document (which may make use of HTML 5, XHTML, CSS 3, SVG, JavaScript, plugins like Flash, etc.). The viewer component is available in the following forms:

The framework also allows rendering an HTML page to any QPaintDevice using QWebFrame::render.


See Also