Handling Document Formats: Difference between revisions

(5 intermediate revisions by 3 users not shown)
Line 1: Line 1:
=Handling Document Formats=


There are many use-cases that may require Qt applications to deal with document formats – usually either involving transparently parsing/writing documents, or displaying documents to the user.<br /> This page covers some general considerations, and provides an overview of wiki pages discussing available options for specific formats.


==General Considerations==
[[Category:Developing_with_Qt]]


===The Scribe Framework===


While not being able to provide built-in functionality for every imaginable document handling use-case, Qt does ship with a generic [[doc/qt-4.8/richtext.html|rich text document framework]], nicknamed “Scribe”.


[[Image:richtext-examples.png]] It revolves around the class [[doc/qt-5/QTextDocument.html|QTextDocument]], which provides an [[doc/qt-4.8/richtext-structure.html|object-oriented frame-based representation]] of a document consisting of blocks (sub-frames, paragraphs, tables, lists, …) which in turn can contain strings of styled text fragments.<br /><span class="caps">API</span> is included for loading from <span class="caps">HTML</span> and saving to <span class="caps">HTML</span> and <span class="caps">ODT</span> (see [[doc/qt-5/QTextDocumentWriter.html|QTextDocumentWriter]]), as well as for displaying documents to the user (in read-only or interactively editable mode) through [[doc/qt-5/QTextEdit.html|QTextEdit]] .


The Scribe framework’s built-in document feature set (which all built-in loading/saving/displaying/editing operations are limited to) covers only the basics and doesn’t come anywhere close to what modern full-featured document formats and authoring tools (like Microsoft Word) support, although it is sufficient for many tasks such as generating reports. Most parts of the framework are extensible through subclassing, so application authors can implement additional document features or save/load formats as they see fit.
There are many use-cases that may require Qt applications to deal with document formats - usually either involving transparently parsing/writing documents, or displaying documents to the user.
This page covers some general considerations, and provides an overview of wiki pages discussing available options for specific formats.


===<span class="caps">XML</span> Processing===
== General Considerations ==


Many modern document formats are based on <span class="caps">XML</span>. So depending on what kind of processing you wish to perform, manual parsing/writing using [[doc/qt-4.8/xml-processing.html|Qt’s powerful <span class="caps">XML</span> handling classes]] might be a viable option.
=== The Scribe Framework ===


* The efficient [[doc/qt-4.8/xml-streaming.html|<span class="caps">XML</span> Streaming classes]] available in QtCore are recommended for most purposes.
While not being able to provide built-in functionality for every imaginable document handling use-case, Qt does ship with a generic [http://doc.qt.io/qt-5/richtext.html rich text document framework], nicknamed "Scribe".
* In some cases the <span class="caps">SAX</span> and <span class="caps">DOM</span> classes from the [[doc/qt-4.8/qtxml.html|QtXml module]] can be a useful alternative.
* If your application needs to repeatedly extract a certain piece of information, or apply a certain transformation, on many documents with a similar structure, then the [[doc/qt-4.8/qtxmlpatterns.html|QtXmlPatterns module]] might provide an elegant solution.


==Individual Formats==
[[Image:Richtext-examples.png|right]] It revolves around the class [http://doc.qt.io/qt-5/QTextDocument.html QTextDocument], which provides an [http://doc.qt.io/qt-5/richtext-structure.html object-oriented frame-based representation] of a document consisting of blocks (sub-frames, paragraphs, tables, lists, …) which in turn can contain strings of styled text fragments.
API is included for loading from HTML and saving to HTML and ODT (see [http://doc.qt.io/qt-5/QTextDocumentWriter.html QTextDocumentWriter]), as well as for displaying documents to the user (in read-only or interactively editable mode) through [http://doc.qt.io/qt-5/QTextEdit.html QTextEdit].
 
The Scribe framework's built-in document feature set (which all built-in loading/saving/displaying/editing operations are limited to) covers only the basics and doesn't come anywhere close to what modern full-featured document formats and authoring tools (like Microsoft Word) support, although it is sufficient for many tasks such as generating reports. Most parts of the framework are extensible through subclassing, so application authors can implement additional document features or save/load formats as they see fit.
 
=== XML Processing ===
 
Many modern document formats are based on XML. So depending on what kind of processing you wish to perform, manual parsing/writing using [http://doc.qt.io/qt-4.8/xml-processing.html Qt's powerful XML handling classes] might be a viable option.
* The efficient [http://doc.qt.io/qt-4.8/xml-streaming.html XML Streaming classes] available in QtCore are recommended for most purposes.
* In some cases the SAX and DOM classes from the [http://doc.qt.io/qt-5/qtxml-index.html QtXml module] can be a useful alternative.
* If your application needs to repeatedly extract a certain piece of information, or apply a certain transformation, on many documents with a similar structure, then the [http://doc.qt.io/qt-5/qtxmlpatterns-index.html QtXmlPatterns module] might provide an elegant solution.
 
== Individual Formats ==


For information/tips (gathered by the community) on how to work with a specific document format in your Qt application, click on the name of the format in the list below:
For information/tips (gathered by the community) on how to work with a specific document format in your Qt application, click on the name of the format in the list below:


===Text Documents===
=== Text Documents ===


{| class="infotable line" style="width: 95%; margin-left: 2.5%"
 
| style="width: 18em" |
{| class="wikitable"
[[Handling HTML|'''<span class="caps">HTML</span>''']]
| [[Handling_HTML | '''HTML''']]  
| style="width: 12em" | <font face="monospace"><font color="#567">.html .htm .xhtml</font></font>
| .html .htm .xhtml  
|
|  
|-
|-
| style="width: 18em" |
| [[Handling_PDF | '''PDF''']]  
[[Handling PDF|'''<span class="caps">PDF</span>''']]
| .pdf  
| style="width: 12em" | <font face="monospace"><font color="#567">.pdf</font></font>
|  
|
|-
|-
| style="width: 18em" |
| [[Handling_Microsoft_Word_file_format | '''Microsoft Word''']]  
[[Handling Microsoft Word file format|'''Microsoft Word''']]
| .doc .docx  
| style="width: 12em" | <font face="monospace"><font color="#567">.doc .docx</font></font>
| (native format of Microsoft Word)  
| (native format of Microsoft Word)
|-
|-
| style="width: 18em" |
| [[Handling_OpenDocument_Text | '''OpenDocument Text''' ]]  
[[Handling OpenDocument Text|'''OpenDocument Text''']]
| .odt  
| style="width: 12em" | <font face="monospace"><font color="#567">.odt</font></font>
| (native format of OpenOffice/LibreOffice Writer, among others)  
| (native format of OpenOffice/LibreOffice Writer, among others)
|-
|-
| style="width: 18em" |
| [[Handling_RTF | '''Rich Text Format''' ]]  
[[Handling RTF|'''Rich Text Format''']]
| .rtf  
| style="width: 12em" | <font face="monospace"><font color="#567">.rtf</font></font>
| (here referring specifically to Microsoft's "RTF" format, not rich text in general)  
| (here referring specifically to Microsoft’s “<span class="caps">RTF</span>” format, not rich text in general)
|-
|-
| style="width: 18em" |
| [[Handling_LaTeX | '''LaTeX''' ]]  
[[Handling LaTeX|'''LaTeX''']]
| .tex
| style="width: 12em" | <font face="monospace"><font color="#567">.tex</font></font>
|
|}
|}


===Spreadsheets===
=== Spreadsheets ===


{| class="infotable line" style="width: 95%; margin-left: 2.5%"
 
| style="width: 18em" |
{| class="wikitable"
[[Handling Microsoft Excel file format|'''Microsoft Excel''']]
| [[Handling_Microsoft_Excel_file_format | '''Microsoft Excel''']]  
| style="width: 12em" | <font face="monospace"><font color="#567">.xls, .xlsx</font></font>
| .xls, .xlsx  
| (native format of Microsoft Excel)
| (native format of Microsoft Excel)  
|-
|-
| style="width: 18em" |
| [[Handling_OpenDocument_Spreadsheet | '''OpenDocument Spreadsheet''']]  
[[Handling OpenDocument Spreadsheet|'''OpenDocument Spreadsheet''']]
| .ods  
| style="width: 12em" | <font face="monospace"><font color="#567">.ods</font></font>
| (native format of OpenOffice/LibreOffice Calc, among others)  
| (native format of OpenOffice/LibreOffice Calc, among others)
|-
|-
| style="width: 18em" |
| [[Handling_CSV | '''comma-separated values''']]  
[[Handling CSV|'''comma separated values''']]
| .csv  
| style="width: 12em" | <font face="monospace"><font color="#567">.csv</font></font>
| (simple file format that is widely supported by consumer, business, and scientific applications.)  
| (simple file format that is widely supported by consumer, business, and scientific applications.)
|}
|}


===Presentations===
=== Presentations ===


{| class="infotable line" style="width: 95%; margin-left: 2.5%"
{| class="wikitable"
| style="width: 18em" |
| [https://wiki.qt.io/Handling_Microsoft_PowerPoint_file_format '''Microsoft PowerPoint''']
[[Handling Microsoft PowerPoint file format|'''Microsoft Powerpoint''']]
| .ppt, .pptx  
| style="width: 12em" | <font face="monospace"><font color="#567">.ppt, .pptx</font></font>
| (native format of Microsoft PowerPoint)  
| (native format of Microsoft PowerPoint)
|-
|-
| style="width: 18em" |
| [[Handling_OpenDocument_Presentation | '''OpenDocument Presentation''']]  
[[Handling OpenDocument Presentation|'''OpenDocument Presentation''']]
| .odp  
| style="width: 12em" | <font face="monospace"><font color="#567">.odp</font></font>
| (native format of OpenOffice/LibreOffice Impress, among others)  
| (native format of OpenOffice/LibreOffice Impress, among others)
|}
|}


===Math/Formulae===
=== Math/Formulae ===


{| class="infotable line" style="width: 95%; margin-left: 2.5%"
{| class="wikitable"
| style="width: 18em" |
| [[Handling_MathML | '''MathML''']]  
[[Handling MathML|'''MathML''']]
| .mml  
| style="width: 12em" | <font face="monospace"><font color="#567">.mml</font></font>
|  
|
|-
|-
| style="width: 18em" |
| [[Handling_OpenDocument_Formula | '''OpenDocument Formula''']]  
[[Handling OpenDocument Formula|'''OpenDocument Formula''']]
| .odf  
| style="width: 12em" | <font face="monospace"><font color="#567">.odf</font></font>
| (native format of OpenOffice/LibreOffice Math, among others)  
| (native format of OpenOffice/LibreOffice Math, among others)
|-
|-
| style="width: 18em" |
| [[Handling_LaTeX_Math | '''LaTeX Math''']]
[[Handling LaTeX Math|'''LaTeX Math''']]
|  
|
|  
|
|}
|}


==See Also==


* [[Handling Graphics Formats]]
* [[Handling Multimedia Formats]]
* [[Handling Data Formats]]
* [[Handling Container Formats]]


===Categories:===
== See Also ==


* [[:Category:Developing with Qt|Developing_with_Qt]]
* [[Handling_Graphics_Formats | Handling Graphics Formats]]
* [[Handling_Multimedia_Formats | Handling Multimedia Formats]]
* [[Handling_Data_Formats | Handling Data Formats]]

Latest revision as of 11:31, 20 March 2015



There are many use cases that may require Qt applications to deal with document formats, usually involving either transparently parsing/writing documents or displaying documents to the user. This page covers some general considerations and provides an overview of wiki pages discussing available options for specific formats.

General Considerations

The Scribe Framework

Although Qt cannot provide built-in functionality for every imaginable document-handling use case, it does ship with a generic rich text document framework, nicknamed "Scribe".

[Image: Richtext-examples.png]

It revolves around the class QTextDocument, which provides an object-oriented frame-based representation of a document consisting of blocks (sub-frames, paragraphs, tables, lists, …) which in turn can contain strings of styled text fragments.

API is included for loading from HTML and saving to HTML and ODT (see QTextDocumentWriter), as well as for displaying documents to the user (in read-only or interactively editable mode) through QTextEdit.

The Scribe framework's built-in document feature set (which all built-in loading/saving/displaying/editing operations are limited to) covers only the basics and doesn't come anywhere close to what modern full-featured document formats and authoring tools (like Microsoft Word) support, although it is sufficient for many tasks such as generating reports. Most parts of the framework are extensible through subclassing, so application authors can implement additional document features or save/load formats as they see fit.

XML Processing

Many modern document formats are based on XML, so depending on what kind of processing you wish to perform, manual parsing/writing using Qt's XML handling classes might be a viable option.

  • The efficient XML Streaming classes available in QtCore are recommended for most purposes.
  • In some cases the SAX and DOM classes from the QtXml module can be a useful alternative.
  • If your application needs to repeatedly extract a certain piece of information, or apply a certain transformation, on many documents with a similar structure, then the QtXmlPatterns module might provide an elegant solution.

Individual Formats

For information/tips (gathered by the community) on how to work with a specific document format in your Qt application, click on the name of the format in the list below:

Text Documents

HTML .html .htm .xhtml
PDF .pdf
Microsoft Word .doc .docx (native format of Microsoft Word)
OpenDocument Text .odt (native format of OpenOffice/LibreOffice Writer, among others)
Rich Text Format .rtf (here referring specifically to Microsoft's "RTF" format, not rich text in general)
LaTeX .tex

Spreadsheets

Microsoft Excel .xls, .xlsx (native format of Microsoft Excel)
OpenDocument Spreadsheet .ods (native format of OpenOffice/LibreOffice Calc, among others)
Comma-separated values .csv (simple file format that is widely supported by consumer, business, and scientific applications)

Presentations

Microsoft PowerPoint .ppt, .pptx (native format of Microsoft PowerPoint)
OpenDocument Presentation .odp (native format of OpenOffice/LibreOffice Impress, among others)

Math/Formulae

MathML .mml
OpenDocument Formula .odf (native format of OpenOffice/LibreOffice Math, among others)
LaTeX Math


See Also