Handling HTML: Difference between revisions
AutoSpider (talk | contribs) (Decode HTML entity numbers) |
Henri Vikki (talk | contribs) (Formatting.) |
Revision as of 12:15, 20 March 2015
Handling HTML
This page discusses various available options for working with HTML documents in your Qt application. Please also read the general considerations outlined on the Handling Document Formats page.
Note that this information is collaboratively collected by the community, with no promise of completeness or correctness. In particular, use your own research and judgment when evaluating third-party libraries or tools!
Reading / Writing
Scribe
Qt's Scribe framework (see Handling Document Formats) has built-in support for loading from / saving to HTML (see QTextDocument::setHtml and toHtml as well as QTextDocumentWriter). Together with the format-independent API that QTextDocument provides for modifying documents (or creating them from scratch), this makes Scribe an adequate framework for processing or generating HTML documents.
However, it only supports a limited subset of static HTML 4 / CSS 2.1 - corresponding to the limited set of built-in document features which QTextDocument supports internally.
QtWebKit
The WebKit-based web browser framework shipped with Qt provides the QWebPage and QWebFrame classes, which can be used to load an HTML document (or any web page) without actually rendering it, and to access or modify it through a DOM-like API. Saving back to HTML is possible using QWebFrame::toHtml.
Keep in mind though that loading an HTML document in this way will not just passively parse it (like the Scribe framework or the XML/HTML parsers discussed below would), but actively evaluate it like a browser would, i.e. loading linked content like iframes, running JavaScript that is set to run on start-up, etc. Whether this is useful or problematic will depend on your specific use-case.
TODO: Can someone confirm whether this is true (and unavoidable), and edit this section accordingly?
Manual XML processing
If your application needs to parse or write HTML/XHTML documents which are valid XML, consider processing them using Qt's XML handling classes (see Handling Document Formats).
Note that there are third-party tools/libraries available for automatically converting "normal" (and even broken) HTML documents into valid XHTML/XML which is suitable for processing:
name | type | platforms | license |
---|---|---|---|
HTML Tidy | stand-alone tool | Win, Mac, Linux, … | MIT-like [permissive] |
TidyLib | C library | Win, Mac, Linux, … | MIT-like [permissive] |
Chilkat | C++ library | Win, Mac, Linux, … | proprietary |
Manual HTML processing
For specialized HTML parsers with an API as low-level as that of Qt's XML handling classes, refer to third-party C/C++ libraries, e.g.:
library | API | parsing | writing | parsing modes | platforms | license |
---|---|---|---|---|---|---|
libxml2 | C | yes | ? | stream, SAX, DOM (non-validating?) | Win, Mac, Linux, … | MIT [permissive] |
htmlcxx | C++ | yes | yes | SAX, DOM, ? (non-validating) | Win, Linux, ? | LGPL [weak copyleft] |
libhtml | C | yes | yes | stream (strongly-validating) | Linux, ? | ISC [permissive] |
Rendering / Interactive Viewing
Scribe
As already described above, Qt's Scribe framework supports automatically importing HTML content into a QTextDocument. Once in that form, you can…
- …render it onto any QPaintDevice using QTextDocument::drawContents.
- …show it to the user through a QTextEdit widget (either in read-only mode, or in editable mode which allows the user to actually edit the document interactively).
Again, the restriction to the limited subset of static HTML 4 / CSS 2.1 supported by QTextDocument applies.
QtWebKit
If you need more powerful viewing / user-interaction capabilities, take a look at the QtWebKit browser framework which is included with Qt. It can interactively display pretty much any modern web document (which may make use of HTML 5, XHTML, CSS 3, SVG, JavaScript, plugins like Flash, etc.). The viewer component is available in the following forms:
- As a QWidget (QWebView)
- As a QGraphicsItem (QGraphicsWebView)
- As a QML element (WebView)
The framework also allows rendering an HTML page to any QPaintDevice using QWebFrame::render.
See Also
- Handling Document Formats
- other "text document" formats: