Handling Document Formats: Difference between revisions

From Qt Wiki
=Handling Document Formats=
[[Category:Developing_with_Qt]]


There are many use cases that may require Qt applications to deal with document formats, usually involving either transparently parsing/writing documents or displaying documents to the user.<br />This page covers some general considerations and provides an overview of wiki pages discussing the available options for specific formats.


==General Considerations==


===The Scribe Framework===


While Qt cannot provide built-in functionality for every imaginable document-handling use case, it does ship with a generic [[doc/qt-4.8/richtext.html|rich text document framework]], nicknamed “Scribe”.


[[Image:richtext-examples.png]] It revolves around the class [[doc/qt-5/QTextDocument.html|QTextDocument]], which provides an [[doc/qt-4.8/richtext-structure.html|object-oriented frame-based representation]] of a document consisting of blocks (sub-frames, paragraphs, tables, lists, …), which in turn can contain strings of styled text fragments.<br />An API is included for loading from HTML and saving to HTML and ODT (see [[doc/qt-5/QTextDocumentWriter.html|QTextDocumentWriter]]), as well as for displaying documents to the user (in read-only or interactively editable mode) through [[doc/qt-5/QTextEdit.html|QTextEdit]].


The Scribe framework’s built-in document feature set (to which all built-in loading/saving/displaying/editing operations are limited) covers only the basics and does not come close to what modern full-featured document formats and authoring tools (like Microsoft Word) support, although it is sufficient for many tasks such as generating reports. Most parts of the framework are extensible through subclassing, so application authors can implement additional document features or save/load formats as they see fit.


===XML Processing===


Many modern document formats are based on XML, so depending on what kind of processing you wish to perform, manual parsing/writing using [[doc/qt-4.8/xml-processing.html|Qt’s powerful XML handling classes]] might be a viable option.


* The efficient [[doc/qt-4.8/xml-streaming.html|XML Streaming classes]] available in QtCore are recommended for most purposes.
* In some cases the SAX and DOM classes from the [[doc/qt-4.8/qtxml.html|QtXml module]] can be a useful alternative.
* If your application needs to repeatedly extract a certain piece of information, or apply a certain transformation, on many documents with a similar structure, then the [[doc/qt-4.8/qtxmlpatterns.html|QtXmlPatterns module]] might provide an elegant solution.


==Individual Formats==


For information/tips (gathered by the community) on how to work with a specific document format in your Qt application, click on the name of the format in the list below:


===Text Documents===


{| class="infotable line" style="width: 95%; margin-left: 2.5%"
| style="width: 18em" |
[[Handling HTML|'''HTML''']]
| style="width: 12em" | <font face="monospace"><font color="#567">.html .htm .xhtml</font></font>
|
|-
| style="width: 18em" |
[[Handling PDF|'''PDF''']]
| style="width: 12em" | <font face="monospace"><font color="#567">.pdf</font></font>
|
|-
| style="width: 18em" |
[[Handling Microsoft Word file format|'''Microsoft Word''']]
| style="width: 12em" | <font face="monospace"><font color="#567">.doc .docx</font></font>
| (native format of Microsoft Word)
|-
| style="width: 18em" |
[[Handling OpenDocument Text|'''OpenDocument Text''']]
| style="width: 12em" | <font face="monospace"><font color="#567">.odt</font></font>
| (native format of OpenOffice/LibreOffice Writer, among others)
|-
| style="width: 18em" |
[[Handling RTF|'''Rich Text Format''']]
| style="width: 12em" | <font face="monospace"><font color="#567">.rtf</font></font>
| (here referring specifically to Microsoft’s “RTF” format, not rich text in general)
|-
| style="width: 18em" |
[[Handling LaTeX|'''LaTeX''']]
| style="width: 12em" | <font face="monospace"><font color="#567">.tex</font></font>
|
|}


===Spreadsheets===


{| class="infotable line" style="width: 95%; margin-left: 2.5%"
| style="width: 18em" |
[[Handling Microsoft Excel file format|'''Microsoft Excel''']]
| style="width: 12em" | <font face="monospace"><font color="#567">.xls, .xlsx</font></font>
| (native format of Microsoft Excel)
|-
| style="width: 18em" |
[[Handling OpenDocument Spreadsheet|'''OpenDocument Spreadsheet''']]
| style="width: 12em" | <font face="monospace"><font color="#567">.ods</font></font>
| (native format of OpenOffice/LibreOffice Calc, among others)
|-
| style="width: 18em" |
[[Handling CSV|'''Comma-separated values''']]
| style="width: 12em" | <font face="monospace"><font color="#567">.csv</font></font>
| (simple file format that is widely supported by consumer, business, and scientific applications)
|}


===Presentations===


{| class="infotable line" style="width: 95%; margin-left: 2.5%"
| style="width: 18em" |
[[Handling Microsoft PowerPoint file format|'''Microsoft PowerPoint''']]
| style="width: 12em" | <font face="monospace"><font color="#567">.ppt, .pptx</font></font>
| (native format of Microsoft PowerPoint)
|-
| style="width: 18em" |
[[Handling OpenDocument Presentation|'''OpenDocument Presentation''']]
| style="width: 12em" | <font face="monospace"><font color="#567">.odp</font></font>
| (native format of OpenOffice/LibreOffice Impress, among others)
|}


===Math/Formulae===


{| class="infotable line" style="width: 95%; margin-left: 2.5%"
| style="width: 18em" |
[[Handling MathML|'''MathML''']]
| style="width: 12em" | <font face="monospace"><font color="#567">.mml</font></font>
|
|-
| style="width: 18em" |
[[Handling OpenDocument Formula|'''OpenDocument Formula''']]
| style="width: 12em" | <font face="monospace"><font color="#567">.odf</font></font>
| (native format of OpenOffice/LibreOffice Math, among others)
|-
| style="width: 18em" |
[[Handling LaTeX Math|'''LaTeX Math''']]
|
|
|}




==See Also==


* [[Handling Graphics Formats]]
* [[Handling Multimedia Formats]]
* [[Handling Data Formats]]
* [[Handling Container Formats]]
 

Revision as of 14:25, 23 February 2015

