Revision as of 13:09, 25 February 2015
Handling Microsoft Word (file format)
This page discusses various available options for working with "Microsoft Word":http://en.wikipedia.org/wiki/Microsoft_Word#File_formats documents in your Qt application. Please also read the general considerations outlined on the Handling Document Formats page.
Note that this information is collaboratively collected by the community, with no promise of completeness or correctness. In particular, use your own research and judgment when evaluating third-party libraries or tools!
One needs to distinguish between two different formats (this page deals with both of them):
| | Legacy "Word Document" format | "Office Open XML Document" format |
| classification: | binary | XML-based |
| main filename extension: | .doc | .docx |
| main internet media type: | application/msword | application/vnd.openxmlformats-officedocument.wordprocessingml.document |
| default format of Word: | until Word 2003 | since Word 2007 |
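Since a filename extension can be wrong, an application may want to check the first bytes of a file before choosing a code path. The following is a minimal sketch in plain standard C++ (no Qt required), assuming that a legacy .doc file is stored in an OLE compound file and that a .docx file is a ZIP package. The names `ContainerKind`, `detectContainer` and `detectContainerFromFile` are hypothetical and not part of any library. The check identifies only the container: other Office formats use the same containers, so a match does not prove that the file is a Word document.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>

// Container types recognised by this sketch (hypothetical names).
enum class ContainerKind { OleCompoundFile, ZipPackage, Unknown };

// Classifies a buffer by its leading signature bytes.
ContainerKind detectContainer(const unsigned char *data, std::size_t size)
{
    // Signature of an OLE compound file (container of legacy .doc files).
    static const unsigned char ole[8] = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // Signature of a ZIP local file header (container of .docx files).
    static const unsigned char zip[4] = {'P', 'K', 0x03, 0x04};

    if (size >= 8 && std::equal(ole, ole + 8, data))
        return ContainerKind::OleCompoundFile;
    if (size >= 4 && std::equal(zip, zip + 4, data))
        return ContainerKind::ZipPackage;
    return ContainerKind::Unknown;
}

// Reads up to eight bytes from the file and classifies them.
// A file that cannot be opened yields ContainerKind::Unknown.
ContainerKind detectContainerFromFile(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    unsigned char header[8] = {};
    in.read(reinterpret_cast<char *>(header), sizeof header);
    return detectContainer(header, static_cast<std::size_t>(in.gcount()));
}
```

In a Qt application the same comparison could be made on the first bytes read through QFile.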
Reading / Writing
Using Word itself
If you are exclusively targeting the Windows platform and Microsoft Word will be installed on all target machines, then you can use "Qt’s ActiveX framework":http://doc.qt.io/qt-4.8/activeqt.html to access Word’s .doc and .docx processing functionality through OLE automation. For an introductory code example (and a way to list the API provided by Word's COM object), consult "this how to":http://wiki.qt.io/Using_ActiveX_Object_in_QT (focuses on Microsoft Excel, but it works the same way for Word).
| | DLL file name | COM object name | platforms | license |
| "Microsoft Word":http://office.microsoft.com/word/ | ? | Word.Application | Windows | commercial |
Using independent parser/writer libraries
| | API | .doc | .docx | reading | writing | platforms | license |
| "…":http://… | … | … | … | … | … | … | … |
| "wv":http://www.abisource.com/projects/ | C | yes | no | yes | no | Win, Mac, Linux | GPL [strong copyleft] |
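Using manual XML processing
Files in the XML-based (.docx) format can be processed with Qt's XML handling classes (see Handling Document Formats). Note that a .docx file is a ZIP package: the main document text is stored in the package part word/document.xml, so that part has to be extracted from the archive before it can be parsed as XML.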
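To give an idea of what such processing involves, here is a rough sketch that pulls the visible text out of the main document part of a .docx file. It assumes that the XML of word/document.xml has already been extracted from the package and loaded into a string, and that the usual `w:` namespace prefix is used. It is written in plain standard C++ with a naive string scan so that it compiles without Qt; the function name `extractDocxText` is hypothetical. It does not decode XML entities such as `&amp;` and ignores tables, tabs and other structure, so a real application would more likely use a proper XML parser such as QXmlStreamReader.

```cpp
#include <cstddef>
#include <string>

// Collects the contents of all <w:t> (text run) elements and emits a
// newline at the end of every <w:p> (paragraph) element.
std::string extractDocxText(const std::string &xml)
{
    const std::string runClose = "</w:t>";
    const std::string paragraphClose = "</w:p>";
    std::string text;
    std::size_t pos = 0;

    while ((pos = xml.find('<', pos)) != std::string::npos) {
        if (xml.compare(pos, paragraphClose.size(), paragraphClose) == 0) {
            text += '\n';
            pos += paragraphClose.size();
        } else if (xml.compare(pos, 4, "<w:t") == 0 && pos + 4 < xml.size()
                   && (xml[pos + 4] == '>' || xml[pos + 4] == ' ')) {
            // Exactly <w:t>, not <w:tab/>, <w:tbl> or similar.
            const std::size_t openEnd = xml.find('>', pos);
            if (openEnd == std::string::npos)
                break;
            if (xml[openEnd - 1] == '/') { // empty element, no text
                pos = openEnd + 1;
                continue;
            }
            const std::size_t close = xml.find(runClose, openEnd);
            if (close == std::string::npos)
                break;
            text.append(xml, openEnd + 1, close - openEnd - 1);
            pos = close + runClose.size();
        } else {
            ++pos; // some other tag, skip it
        }
    }
    return text;
}
```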
TODO: Expand this section.
Using batch conversion tools
If all else fails, you can use an existing tool to convert automatically between Microsoft Word files and a more manageable format, and let your Qt application work with that format instead. The conversion tool can be bundled with your application or specified as a prerequisite, and controlled via QProcess. Some possibilities are:
| | .doc to | .docx to | … to .doc | … to .docx | platforms | license |
| "AbiWord":http://www.abisource.com | .txt .rtf .html .dbk .odt .docx … | .txt .rtf .html .dbk .odt … | - | .txt .rtf .html .dbk .odt .doc … | Win, Mac, Linux, … | GPL [strong copyleft] |
| "wvWare":http://www.abisource.com/projects | .txt .rtf .html .dbk … | - | - | - | Win, Mac, Linux, … | GPL [strong copyleft] |
| "…":http:// | … | … | … | … | … | … |
Notes:
AbiWord can be used like this for batch conversion: abiword --to=outputfile.rtf inputfile.doc
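When the conversion is started from an application, the command line has to be assembled programmatically. The sketch below builds the argument list for the AbiWord call shown above, deriving the output filename from the input filename. It uses only the standard library so that it can be compiled without Qt; `withExtension` and `abiwordArguments` are hypothetical helper names, and only the `--to=` option mentioned on this page is used.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns `path` with its filename extension replaced by `newExtension`
// (given without the leading dot). If the filename has no extension,
// the new one is appended.
std::string withExtension(const std::string &path, const std::string &newExtension)
{
    const std::size_t slash = path.find_last_of("/\\");
    const std::size_t nameStart = (slash == std::string::npos) ? 0 : slash + 1;
    const std::size_t dot = path.find_last_of('.');
    // A dot only counts if it lies inside the filename and is not its first character.
    const bool hasExtension = dot != std::string::npos && dot > nameStart;
    return (hasExtension ? path.substr(0, dot) : path) + "." + newExtension;
}

// Builds everything that follows the program name in
//   abiword --to=<output file> <input file>
std::vector<std::string> abiwordArguments(const std::string &inputFile,
                                          const std::string &targetExtension)
{
    return { "--to=" + withExtension(inputFile, targetExtension), inputFile };
}
```

In a Qt application, the resulting list could be converted to a QStringList and passed to QProcess::start() together with the program name, assuming the abiword executable can be found on the target machine.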
Displaying / User-Interacting
Using Word itself
TODO: If you know whether Word provides a "viewer" ActiveX control that can be embedded in a Qt application through ActiveQt, please fill out this section (include links to relevant resources!)
Manual solution
TODO: Tips for implementing a custom Microsoft Word viewer widget, using Qt and the Microsoft Word parsing libraries mentioned above
See Also
- Handling Document Formats
- other Microsoft Office formats:
  - Microsoft PowerPoint
  - Microsoft Excel
- other "Text Document" formats:
  - HTML
  - PDF
  - OpenDocument Text
  - RTF