[toc align_right="yes" depth="2"]
h1. Handling Microsoft Word (file format)
This page discusses the options available for working with "Microsoft Word":http://en.wikipedia.org/wiki/Microsoft_Word#File_formats documents in your Qt application. Please also read the general considerations outlined on the Handling Document Formats page.
p{width:60%;border:solid 1px #99a;background:#eef;color:#335;padding:2pt 4pt;font-size:0.9em;line-height:150%;font-style:italic}. Note that this information is collaboratively collected by the community, with no promise of completeness or correctness. In particular, use your own research and judgment when evaluating third-party libraries or tools!
There are two distinct formats to be aware of (this page covers both):
table{width:95%;margin-left:2.5%}.
| |_. Legacy "Word Document" format |_. "Office Open XML Document" format |
| classification: | binary | XML-based |
| main filename extension: | {font:1em monospace}.doc | {font-family:monospace}.docx |
| main internet media type: | {font:0.9em monospace}application/msword | {font:0.9em monospace}application/vnd.openxmlformats-officedocument.wordprocessingml.document |
| default format of Word: | until Word 2003 | since Word 2007 |
h2. Reading / Writing
h3. Using Word itself
If you are targeting Windows exclusively and Microsoft Word will be installed on all target machines, you can use "Qt's ActiveX framework":http://doc.qt.io/qt-4.8/activeqt.html to access Word's .doc and .docx processing functionality through OLE automation. For an introductory code example (and a way to list the API exposed by Word's COM object), consult "this how-to":http://wiki.qt.io/Using_ActiveX_Object_in_QT; it focuses on Microsoft Excel, but the same approach works for Word.
table{width:95%;margin-left:2.5%}.
| |_. DLL file name |_. COM object name |_. platforms |_. license |
| "Microsoft Word":http://office.microsoft.com/word/ | ? | {font-family:monospace}Word.Application | Windows | {color:#458}commercial |
h3. Using independent parser/writer libraries
table{width:95%;margin-left:2.5%}.
| |_. API |_. {font-family:monospace}.doc |_. {font-family:monospace}.docx |_. reading |_. writing |_. platforms |_. license |
| "…":http://… | … | … | … | … | … | … | … |
| "wv":http://www.abisource.com/projects/ | C | {color:#580}yes | {color:#920}no | {color:#580}yes | {color:#920}no | Win, Mac, Linux | GPL {color:#458}[strong copyleft] |
h3. Using manual XML processing
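Files in the XML-based (.docx) format can be processed using Qt's XML handling classes (see Handling Document Formats). Note that a .docx file is a ZIP package: the main document text is stored in the @word/document.xml@ entry, which has to be extracted from the archive before it can be parsed.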
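A minimal sketch of pulling plain text out of @word/document.xml@, assuming the entry has already been extracted from the .docx archive. In WordprocessingML, text lives in @<w:t>@ elements inside runs (@<w:r>@) inside paragraphs (@<w:p>@). The sketch is written in standard C++ (no Qt) so that it compiles on its own, and it is a naive tag scanner rather than a real XML parser; the function names @docxXmlToText()@ and @decodeXmlEntities()@ are hypothetical.

```cpp
#include <cstring>
#include <string>
#include <utility>

// Decodes the five predefined XML entities. Numeric character references
// (e.g. &#8217;) are left untouched by this simplified sketch.
std::string decodeXmlEntities(const std::string& s) {
    static const std::pair<const char*, char> kEntities[] = {
        {"&lt;", '<'}, {"&gt;", '>'}, {"&amp;", '&'},
        {"&quot;", '"'}, {"&apos;", '\''}};
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        bool matched = false;
        if (s[i] == '&') {
            for (const auto& e : kEntities) {
                std::size_t n = std::strlen(e.first);
                if (s.compare(i, n, e.first) == 0) {
                    out += e.second;
                    i += n;
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) out += s[i++];
    }
    return out;
}

// Extracts plain text from the contents of word/document.xml.
// Naive tag scan, NOT a full XML parser: it assumes no '>' inside
// attribute values and the conventional "w:" namespace prefix.
std::string docxXmlToText(const std::string& xml) {
    std::string out;
    std::size_t pos = 0;
    while (true) {
        std::size_t lt = xml.find('<', pos);
        if (lt == std::string::npos) break;
        std::size_t gt = xml.find('>', lt);
        if (gt == std::string::npos) break;
        const std::string tag = xml.substr(lt + 1, gt - lt - 1);
        // Matches <w:t> and <w:t xml:space="preserve">, but not a self-closing <w:t/>.
        const bool isTextOpen =
            (tag == "w:t" || tag.rfind("w:t ", 0) == 0) && tag.back() != '/';
        if (isTextOpen) {
            std::size_t end = xml.find("</w:t>", gt);
            if (end == std::string::npos) break;
            out += decodeXmlEntities(xml.substr(gt + 1, end - gt - 1));
            pos = end + 6;  // skip past "</w:t>"
            continue;
        }
        if (tag == "/w:p") out += '\n';         // end of paragraph
        else if (tag == "w:tab/") out += '\t';  // tab character
        else if (tag == "w:br/") out += '\n';   // manual line break
        pos = gt + 1;
    }
    return out;
}
```

In a real Qt application, QXmlStreamReader would be the more robust choice for the same job: match elements whose @name()@ is @"t"@ and whose @namespaceUri()@ is the WordprocessingML namespace @http://schemas.openxmlformats.org/wordprocessingml/2006/main@.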
p{border:dashed 1px #a94;background:#fbf3dd;color:#530;padding:2pt 4pt;margin-left:2pt;margin-right:2pt;font-size:0.9em;line-height:150%;font-style:italic}. TODO: Expand this section.
h3. Using batch conversion tools
If all else fails, you can use an existing tool to convert Microsoft Word files automatically to a more manageable format, and let your Qt application work with that format instead. The conversion tool can be bundled with your application or listed as a prerequisite, and driven via QProcess. Some possibilities are:
table{width:95%;margin-left:2.5%}.
|_. |_. {font-family:monospace}.doc to |_. {font-family:monospace}.docx to |_. … to {font-family:monospace}.doc |_. … to {font-family:monospace}.docx |_. platforms |_. license |
| "AbiWord":http://www.abisource.com | {font-family:monospace}.txt .rtf .html .dbk .odt .docx … | {font-family:monospace}.txt .rtf .html .dbk .odt … | - | {font-family:monospace}.txt .rtf .html .dbk .odt .doc … | Win, Mac, Linux, … | GPL {color:#458}[strong copyleft] |
| "wvWare":http://www.abisource.com/projects | {font-family:monospace}.txt .rtf .html .dbk … | - | - | - | Win, Mac, Linux, … | GPL {color:#458}[strong copyleft] |
| "…":http:// | {font-family:monospace}… | {font-family:monospace}… | {font-family:monospace}… | {font-family:monospace}… | … | … |
Notes:
AbiWord can be used for batch conversion like this: @abiword --to=outputfile.rtf inputfile.doc@
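A minimal sketch of assembling the AbiWord command line from the note above, assuming the output file should sit next to the input with only its extension changed. It is plain standard C++ so it compiles without Qt; the helper names @replaceExtension()@ and @abiwordArguments()@ and the output naming scheme are hypothetical, and only the @--to=@ option comes from the note.

```cpp
#include <string>
#include <vector>

// Swaps (or appends) the file extension of a path. Only a '.' after the
// last directory separator counts, so "dir.v2/report" gets ".rtf" appended.
std::string replaceExtension(const std::string& path, const std::string& newExt) {
    std::size_t slash = path.find_last_of("/\\");
    std::size_t dot = path.find_last_of('.');
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash)) {
        return path + "." + newExt;
    }
    return path.substr(0, dot) + "." + newExt;
}

// Builds the arguments for: abiword --to=<output file> <input file>
// (the program name "abiword" itself is not part of the list).
std::vector<std::string> abiwordArguments(const std::string& inputFile,
                                          const std::string& targetExt) {
    return {"--to=" + replaceExtension(inputFile, targetExt), inputFile};
}
```

The resulting strings could then be converted to a QStringList and handed to QProcess, for example via the static @QProcess::execute()@, which runs the program, waits for it to finish and returns its exit code. For long conversions, a QProcess instance started asynchronously avoids blocking the GUI.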
h2. Displaying / User Interaction
h3. Using Word itself
p{border:dashed 1px #a94;background:#fbf3dd;color:#530;padding:2pt 4pt;font-size:0.9em;line-height:150%;font-style:italic}. TODO: If you know whether Word provides a "viewer" ActiveX control that can be embedded in a Qt application through ActiveQt, please fill out this section (and include links to relevant resources!)
h3. Manual solution
p{border:dashed 1px #a94;background:#fbf3dd;color:#530;padding:2pt 4pt;font-size:0.9em;line-height:150%;font-style:italic}. TODO: Tips for implementing a custom Microsoft Word viewer widget, using Qt and the Microsoft Word parsing libraries mentioned above
p{color:#fff;border-bottom:solid 1px #ccc}. .
h2. See Also
- Handling Document Formats
- other Microsoft Office formats:
- other "Text Document" formats: