Handling HTML: Difference between revisions

Revision as of 14:27, 23 February 2015



Handling HTML

This page discusses the available options for working with "HTML":http://en.wikipedia.org/wiki/HTML documents in your Qt application. Please also read the general considerations outlined on the Handling Document Formats page.

Note that this information is collaboratively collected by the community, with no promise of completeness or correctness. In particular, use your own research and judgment when evaluating third-party libraries or tools!

Reading / Writing

Scribe

Qt's Scribe framework (see Handling Document Formats) has built-in support for loading from and saving to HTML (see "QTextDocument::setHtml":/doc/qt-4.8/qtextdocument.html#setHtml and "toHtml":/doc/qt-4.8/qtextdocument.html#toHtml, as well as QTextDocumentWriter). Together with the format-independent API that QTextDocument provides for modifying documents (or creating them from scratch), this makes Scribe an adequate framework for processing or generating HTML documents.

However, it only supports a "limited subset of static HTML 4 / CSS 2.1":/doc/qt-4.8/richtext-html-subset.html, corresponding to the limited set of built-in document features which QTextDocument supports internally.

QtWebKit

The WebKit-based web browser framework shipped with Qt provides the QWebPage and QWebFrame classes, which can be used to load an HTML document (or any web page) without actually rendering it, and to access or modify it through a DOM-like API. Saving back to HTML is possible using "QWebFrame::toHtml":/doc/qt-4.8/qwebframe.html#toHtml.

Keep in mind, though, that loading an HTML document in this way will not just passively parse it (as the Scribe framework or the XML/HTML parsers discussed below would), but actively evaluate it as a browser would: it loads linked content such as iframes, runs JavaScript that is set to run on start-up, and so on. Whether this is useful or problematic depends on your specific use case.

TODO: Can someone confirm whether this is true (and unavoidable), and edit this section accordingly?

Manual XML processing

If your application needs to parse or write HTML/XHTML documents which are valid XML, consider processing them using Qt's XML handling classes (see Handling Document Formats).

Note that there are third-party tools/libraries available for automatically converting "normal" (and even broken) HTML documents into valid XHTML/XML which is suitable for processing:

| |_. type |_. platforms |_. license |
| "HTML Tidy":http://tidy.sourceforge.net | stand-alone tool | Win, Mac, Linux, … | MIT-like [permissive] |
| "TidyLib":http://tidy.sourceforge.net/libintro.html | C library | Win, Mac, Linux, … | MIT-like [permissive] |
| "Chilkat":http://www.chilkatsoft.com/C++-HTML-Parser.asp | C++ library | Win, Mac, Linux, … | proprietary |

Manual HTML processing

For specialized HTML parsers with an API as low-level as that of Qt's XML handling classes, refer to third-party C/C++ libraries, e.g.:

| |_. API |_. parsing |_. writing |_. parsing modes |_. platforms |_. license |
| "libxml2":http://xmlsoft.org | C | yes | ? | stream, SAX, DOM (non-validating?) | Win, Mac, Linux, … | MIT [permissive] |
| "htmlcxx":http://htmlcxx.sourceforge.net | C++ | yes | yes | SAX, DOM, ? (non-validating) | Win, Linux, ? | LGPL [weak copyleft] |
| "libhtml":http://libhtml.bsd.lv | C | yes | yes | stream (strongly-validating) | Linux, ? | ISC [permissive] |

Rendering / Interactive Viewing

Scribe

As already described above, Qt's Scribe framework supports automatically importing HTML content into a QTextDocument.
Once in that form, you can…
* …render it onto any QPaintDevice using "QTextDocument::drawContents":/doc/qt-4.8/qtextdocument.html#drawContents.
* …show it to the user through a QTextEdit widget (either in read-only mode, or in editable mode, which allows the user to edit the document interactively).

Again, the restriction to the "limited subset of static HTML 4 / CSS 2.1":/doc/qt-4.8/richtext-html-subset.html supported by QTextDocument applies.

QtWebKit

If you need more powerful viewing / user-interaction capabilities, take a look at the "QtWebKit browser framework":/doc/qt-4.8/qtwebkit.html which is included with Qt. It can interactively display pretty much any modern web document (which may make use of HTML 5, XHTML, CSS 3, SVG, JavaScript, plugins like Flash, etc.). The viewer component is available in the following forms:
* As a QWidget (QWebView)
* As a QGraphicsItem (QGraphicsWebView)
* As a QML element ("WebView":/doc/qt-4.8/qml-webview.html)

The framework also allows rendering an HTML page to any QPaintDevice using "QWebFrame::render":/doc/qt-4.8/qwebframe.html#render.


See Also