Handling Microsoft Word file format: Difference between revisions

From Qt Wiki
[[Category:Developing_with_Qt]]

= Handling Microsoft Word (file format) =

This page discusses the available options for working with [http://en.wikipedia.org/wiki/Microsoft_Word#File_formats Microsoft Word] documents in your Qt application. Please also read the general considerations outlined on the [[Handling Document Formats]] page.

''<font size="0.9em"><font color="#335">Note that this information is collaboratively collected by the community, with no promise of completeness or correctness. In particular, use your own research and judgment when evaluating third-party libraries or tools!</font></font>''

One needs to distinguish between two different formats (this page deals with both of them):


{| class="infotable line" style="width: 95%; margin-left: 2.5%"
|
! Legacy “Word Document” format
! “Office Open XML Document” format
|-
| ''classification:''
| binary
| XML-based
|-
| ''main filename extension:''
| <font face="monospace">.doc</font>
| <font face="monospace">.docx</font>
|-
| ''main internet media type:''
| <span style="font: 0.9em monospace">application/msword</span>
| <span style="font: 0.9em monospace">application/vnd.openxmlformats-officedocument.wordprocessingml.document</span>
|-
| ''default format of Word:''
| until Word 2003
| since Word 2007
|}
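
Because the two formats use different containers, an application can often tell them apart from the first few bytes, before any real parsing. The following is a minimal sketch rather than part of the original page. It assumes that legacy .doc files are OLE2 compound files starting with the signature <code>D0 CF 11 E0 A1 B1 1A E1</code>, and that .docx files are ZIP archives starting with <code>PK\x03\x04</code>. The names <code>WordContainer</code> and <code>sniffWordContainer</code> are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical names for illustration; not part of Qt or any library.
enum class WordContainer { LegacyBinary, OfficeOpenXml, Unknown };

// Classifies a file by its leading bytes.
// Assumption: .doc is an OLE2 compound file, .docx is a ZIP archive.
inline WordContainer sniffWordContainer(const std::string &head)
{
    static const char ole2[] = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";
    static const char zip[] = "PK\x03\x04";

    if (head.size() >= 8 && std::memcmp(head.data(), ole2, 8) == 0)
        return WordContainer::LegacyBinary;
    if (head.size() >= 4 && std::memcmp(head.data(), zip, 4) == 0)
        return WordContainer::OfficeOpenXml;
    return WordContainer::Unknown;
}
```

A match identifies only the container, not the document type: other OLE2-based formats and any other ZIP archive carry the same signatures, so the result is best treated as a hint to be combined with the filename extension or media type.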


== Reading / Writing ==

=== Using Word itself ===

If you are exclusively targeting the Windows platform and Microsoft Word will be installed on all target machines, you can use [http://doc.qt.io/qt-4.8/activeqt.html Qt’s ActiveX framework] to access Word’s .doc and .docx processing functionality through OLE automation. For an introductory code example (and a way to list the API provided by Word’s COM object), consult [[Using ActiveX Object in QT|this how-to]], which focuses on Microsoft Excel but works the same way for Word.


{| class="infotable line" style="width: 95%; margin-left: 2.5%"
|
! DLL file name
! COM object name
! platforms
! license
|-
| [http://office.microsoft.com/word/ '''Microsoft Word''']
| ?
| <font face="monospace">Word.Application</font>
| Windows
| <font color="#458">commercial</font>
|}


=== Using independent parser/writer libraries ===

{| class="infotable line" style="width: 95%; margin-left: 2.5%"
|
! API
! <font face="monospace">.doc</font>
! <font face="monospace">.docx</font>
! reading
! writing
! platforms
! license
|-
| [http:// '''…''']
| …
| …
| …
| …
| …
| …
| …
|-
| [http://www.abisource.com/projects/ '''wv''']
| C
| <font color="#580">yes</font>
| <font color="#920">no</font>
| <font color="#580">yes</font>
| <font color="#920">no</font>
| Win, Mac, Linux
| GPL <font color="#458">[strong copyleft]</font>
|}


=== Using manual XML processing ===

Files in the XML-based (.docx) format can be processed using Qt’s XML handling classes (see [[Handling Document Formats]]).

''<font size="0.9em"><font color="#530">TODO: Expand this section.</font></font>''
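
As a rough illustration of what such processing involves: a .docx file is a ZIP archive whose main part is conventionally <code>word/document.xml</code>, in which visible text sits in <code>w:t</code> elements grouped into <code>w:p</code> paragraphs. The sketch below is an assumption-laden example, not the method of any library. It uses only the C++ standard library so that it stays self-contained, and the helper name <code>extractRunText</code> is hypothetical. It expects the already unpacked XML as a string, assumes the usual <code>w:</code> prefix, and does not decode entities such as <code>&amp;amp;</code>.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: collects the character data of every <w:t> element
// in a WordprocessingML part and ends each paragraph (<w:p>) with a newline.
// Deliberately naive: no entity decoding, no CDATA, no namespace resolution.
inline std::string extractRunText(const std::string &xml)
{
    static const std::string closeTag = "</w:t>";
    std::string out;
    std::size_t pos = 0;

    while ((pos = xml.find('<', pos)) != std::string::npos) {
        const std::size_t end = xml.find('>', pos);
        if (end == std::string::npos)
            break; // truncated tag, stop scanning

        const std::string tag = xml.substr(pos + 1, end - pos - 1);
        // Element name without attributes or a trailing '/'.
        const std::string name = tag.substr(0, tag.find_first_of(" \t\r\n/"));
        const bool selfClosing = !tag.empty() && tag.back() == '/';

        if (tag == "/w:p") {
            out += '\n'; // paragraph boundary
        } else if (name == "w:t" && !selfClosing) {
            const std::size_t close = xml.find(closeTag, end + 1);
            if (close == std::string::npos)
                break; // unterminated text run
            out.append(xml, end + 1, close - end - 1);
            pos = close + closeTag.size();
            continue;
        }
        pos = end + 1;
    }
    return out;
}
```

In a Qt application the same traversal would more naturally be written with one of Qt’s XML classes, which also take care of entities and namespaces. Unpacking the ZIP container is a separate step that is not shown here.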
=== Using batch conversion tools ===

If all else fails, there is always the option of using an existing tool to convert automatically between Microsoft Word files and a more manageable format, and letting your Qt application deal with that format instead. The conversion tool could be bundled with your application or specified as a prerequisite, and controlled via [[doc/QProcess|QProcess]]. Some possibilities are:


{| class="infotable line" style="width: 95%; margin-left: 2.5%"
!
! <font face="monospace">.doc</font> to
! <font face="monospace">.docx</font> to
! … to <font face="monospace">.doc</font>
! … to <font face="monospace">.docx</font>
! platforms
! license
|-
| [http://www.abisource.com '''AbiWord''']
| <font face="monospace">.txt .rtf .html .dbk .odt .docx …</font>
| <font face="monospace">.txt .rtf .html .dbk .odt …</font>
| –
| <font face="monospace">.txt .rtf .html .dbk .odt .doc …</font>
| Win, Mac, Linux, …
| GPL <font color="#458">[strong copyleft]</font>
|-
| [http://www.abisource.com/projects '''wvWare''']
| <font face="monospace">.txt .rtf .html .dbk …</font>
| –
| –
| –
| Win, Mac, Linux, …
| GPL <font color="#458">[strong copyleft]</font>
|-
| [http:// '''…''']
| <font face="monospace">…</font>
| <font face="monospace">…</font>
| <font face="monospace">…</font>
| <font face="monospace">…</font>
| …
| …
|}

''Notes:''<br />AbiWord can be used like this for batch conversion: <code>abiword --to=outputfile.rtf inputfile.doc</code>
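
When a converter is launched from an application rather than from a shell, it is usually safer to pass the program name and an argument list separately than to assemble a single command string, because file names containing spaces then need no quoting. The sketch below builds such a list for the AbiWord invocation quoted on this page (<code>abiword --to=outputfile.rtf inputfile.doc</code>). The helper name <code>abiwordConvertArguments</code> is hypothetical, and only the <code>--to=</code> option from that command line is assumed.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the arguments for
//   abiword --to=<outputFile> <inputFile>
// The program name ("abiword") is passed to the process launcher separately.
inline std::vector<std::string> abiwordConvertArguments(const std::string &inputFile,
                                                        const std::string &outputFile)
{
    return { "--to=" + outputFile, inputFile };
}
```

In a Qt application the resulting entries would be converted into a <code>QStringList</code> and handed to QProcess together with the program name. The application should then check the exit status and confirm that the output file exists before reading it.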
== Displaying / User-Interacting ==

=== Using Word itself ===

''<font size="0.9em"><font color="#530">TODO: If you know whether Word provides a “viewer” ActiveX control that can be embedded in a Qt application through ActiveQt, please fill out this section (include links to relevant resources!)</font></font>''

=== Manual solution ===

''<font size="0.9em"><font color="#530">TODO: Tips for implementing a custom Microsoft Word viewer widget, using Qt and the Microsoft Word parsing libraries mentioned above</font></font>''

== See Also ==

* [[Handling Document Formats]]
** ''other Microsoft Office formats:''
*** [[Handling Microsoft PowerPoint file format|Microsoft PowerPoint]]
*** [[Handling Microsoft Excel file format|Microsoft Excel]]
** ''other “Text Document” formats:''
*** [[Handling HTML|HTML]]
*** [[Handling PDF|PDF]]
*** [[Handling OpenDocument Text|OpenDocument Text]]
*** [[Handling RTF|RTF]]
 
===Categories:===
 
* [[:Category:Developing with Qt|Developing_with_Qt]]

Revision as of 14:31, 23 February 2015



Handling Microsoft Word (file format)

This page discusses the available options for working with "Microsoft Word":http://en.wikipedia.org/wiki/Microsoft_Word#File_formats documents in your Qt application. Please also read the general considerations outlined on the Handling Document Formats page.

Note that this information is collaboratively collected by the community, with no promise of completeness or correctness. In particular, use your own research and judgment when evaluating third-party libraries or tools!

One needs to distinguish between two different formats (this page deals with both of them):

| |_. Legacy "Word Document" format |_. "Office Open XML Document" format |
| classification: | binary | XML-based |
| main filename extension: | .doc | .docx |
| main internet media type: | application/msword | application/vnd.openxmlformats-officedocument.wordprocessingml.document |
| default format of Word: | until Word 2003 | since Word 2007 |

Reading / Writing

Using Word itself

If you are exclusively targeting the Windows platform and Microsoft Word will be installed on all target machines, you can use "Qt’s ActiveX framework":http://doc.qt.io/qt-4.8/activeqt.html to access Word’s .doc and .docx processing functionality through OLE automation. For an introductory code example (and a way to list the API provided by Word's COM object), consult "this how-to":http://wiki.qt.io/Using_ActiveX_Object_in_QT, which focuses on Microsoft Excel but works the same way for Word.

| |_. DLL file name |_. COM object name |_. platforms |_. license |
| "Microsoft Word":http://office.microsoft.com/word/ | ? | Word.Application | Windows | commercial |

Using independent parser/writer libraries

| |_. API |_. .doc |_. .docx |_. reading |_. writing |_. platforms |_. license |
| "…":http://… | … | … | … | … | … | … | … |
| "wv":http://www.abisource.com/projects/ | C | yes | no | yes | no | Win, Mac, Linux | GPL [strong copyleft] |

Using manual XML processing

Files in the XML-based (.docx) format can be processed using Qt's XML handling classes (see Handling Document Formats).

TODO: Expand this section.

Using batch conversion tools

If all else fails, there is always the option of using an existing tool to convert automatically between Microsoft Word files and a more manageable format, and letting your Qt application deal with that format instead. The conversion tool could be bundled with your application or specified as a prerequisite, and controlled via QProcess. Some possibilities are:

| |_. .doc to |_. .docx to |_. … to .doc |_. … to .docx |_. platforms |_. license |
| "AbiWord":http://www.abisource.com | .txt .rtf .html .dbk .odt .docx … | .txt .rtf .html .dbk .odt … | - | .txt .rtf .html .dbk .odt .doc … | Win, Mac, Linux, … | GPL [strong copyleft] |
| "wvWare":http://www.abisource.com/projects | .txt .rtf .html .dbk … | - | - | - | Win, Mac, Linux, … | GPL [strong copyleft] |
| "…":http:// | … | … | … | … | … | … |

Notes:
AbiWord can be used like this for batch conversion: abiword --to=outputfile.rtf inputfile.doc

Displaying / User-Interacting

Using Word itself

TODO: If you know whether Word provides a "viewer" ActiveX control that can be embedded in a Qt application through ActiveQt, please fill out this section (include links to relevant resources!)

Manual solution

TODO: Tips for implementing a custom Microsoft Word viewer widget, using Qt and the Microsoft Word parsing libraries mentioned above


See Also