QtInternationalization
Written By : Girish Ramakrishnan, ForwardBias Technologies
This article explains how Qt implements i18n support.
h1. Qt i18n overview
The "Qt Internationalization":http://doc.qt.nokia.com/internationalization.html manual provides a comprehensive overview of the i18n support in Qt. In summary, the workflow is:
1. Strings meant for translation are marked with tr() in the source code.
2. lupdate scans the source files for tr() and places the strings in a .ts XML file. At this point the .ts file contains only the strings that are meant to be translated.
3. A translator provides translations by opening the .ts file in "Qt Linguist":http://doc.qt.nokia.com/linguist-programmers.html. At this point the .ts file contains both the strings to be translated and their translations.
4. lrelease takes the .ts file, which contains the translated strings, and converts it into a binary .qm file that can be loaded into an application at runtime.
5. The application uses QTranslator to load the .qm file(s) depending on the locale/settings.
6. QCoreApplication::installTranslator() is used to install the QTranslator.
7. Result: all tr() invocations from step 1 automatically return the translations made in step 3.
Let's go through the internals of each of the above steps.
h1. What gets translated
In Qt, for a string to be translated we need the following bits of information:
1. A source string that needs to be translated. For example, "OK"
2. A context. The string "OK" could be translated to different words in different contexts. For example, "OK" in an InformationDialog is probably different from the "OK" in a FileRemoveDialog. A context can be any arbitrary string. lupdate uses the name of the class (along with the namespace) in which tr() is used as the context.
3. A comment (aka disambiguation). "OK" may even need different translations within the methods of the same class. Think of the comment as a micro-context or sub-context. Comments are arbitrary strings and are optional.
4. A number for plural support - In some languages, the wording changes depending on a number. For example: 0 files removed, 1 file removed, 2 files removed (notice plurality of 'file' based on the number)
To summarize, Qt translates {context, source_string, comment, number} into a single string.
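As a rough mental model (not Qt's actual data structures), the lookup can be pictured as a map keyed by context, source string and comment. The names `Key`, `Catalog` and `lookup` are hypothetical and exist only in this sketch, the German strings are made-up sample data, and the plural number is left out for brevity.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical lookup key: {context, source_string, comment}.
using Key = std::tuple<std::string, std::string, std::string>;
using Catalog = std::map<Key, std::string>;

// Returns the translation stored for the key, or the source string itself
// when the catalog has no matching entry.
std::string lookup(const Catalog &catalog,
                   const std::string &context,
                   const std::string &source,
                   const std::string &comment = "")
{
    const auto it = catalog.find(Key{context, source, comment});
    return it != catalog.end() ? it->second : source;
}

// Usage: the same source string "OK" resolves differently depending on
// the context and the comment.
void lookupExample()
{
    Catalog catalog;
    catalog[Key{"InformationDialog", "OK", ""}] = "Verstanden";
    catalog[Key{"FileRemoveDialog", "OK", ""}] = "Entfernen";
    catalog[Key{"FileRemoveDialog", "OK", "retry"}] = "Nochmal";

    assert(lookup(catalog, "InformationDialog", "OK") == "Verstanden");
    assert(lookup(catalog, "FileRemoveDialog", "OK") == "Entfernen");
    assert(lookup(catalog, "FileRemoveDialog", "OK", "retry") == "Nochmal");
    assert(lookup(catalog, "UnknownDialog", "OK") == "OK"); // no entry: source text
}
```

The point of the sketch is only that context and comment are part of the key, so changing either one yields a different entry.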
h1. What is tr()?
One usually invokes tr() as
    m_label->setText(tr("OK"));
tr() serves two purposes. First, it is used by lupdate as a marker for translatable strings. lupdate parses the argument to tr() and places it in a .ts file. This means that if the argument to tr() is a variable, lupdate will not pick up the string and Qt will not translate the string as you expect. To illustrate,

    void Dialog::changeLabelText(const char *text)
    {
        m_label->setText(tr(text)); // Wrong usage of tr(): lupdate doesn't know what text is
    }
As discussed in the previous section, a translatable string is actually a {context, source_string, comment, number} tuple. lupdate determines the 'context' from the class in which the code resides. In the above code, the context is 'Dialog'. The comment is empty. If you need lupdate to pick up a comment you need to invoke tr() as
    void Dialog::setLicenseText()
    {
        m_label->setText(tr("License text follows", "setLicenseText")); // the comment is set to "setLicenseText"
    }
To enable plural support for the string, you need to add a %n to the source_string and pass the number as the third argument to tr(). For example, tr("%n file(s) removed", 0, count). It is important to understand that lupdate uses static code analysis to determine what goes into a .ts file. Relying on runtime information (like passing a variable to tr()) is most likely erroneous usage.
The second purpose of tr() is for the i18n run time. It is a C++ static method that translates the string given to it. Contrary to popular belief, it is neither a macro nor a virtual function. The reason for this is that lupdate uses the class name as the context of the translation. We want tr("OK") to expand as:

    m_label->setText(Dialog::tr("OK", "some_comment")); // "Dialog::" helps get the context of translation
The class name "Dialog" has to be figured out somehow. tr() cannot be a macro, since C++ has no compile-time macro that yields the class name. tr() cannot be virtual, since C++ has no mechanism to find out the name of the class in which the call was written. Note that what we want is different from QObject::className(): unlike className(), the context of tr("OK") in a base class is "Base" and the context of tr("OK") in a derived class is "Derived". The tr("OK") in the base class needs to become QCoreApplication::translate("Base", "OK"), and in the derived class it needs to become QCoreApplication::translate("Derived", "OK").
So, tr() is a static function that is set up with the help of moc. The Q_OBJECT macro, among other things, contains another macro, QT_TR_FUNCTIONS. QT_TR_FUNCTIONS is defined as

    #define QT_TR_FUNCTIONS \
        static inline QString tr(const char *s, const char *c = 0) \
            { return staticMetaObject.tr(s, c); }
Notice that it relies on the staticMetaObject that is generated by moc.
For non-Qt/non-QObject classes on which moc is not run, Qt provides Q_DECLARE_TR_FUNCTIONS(context), where one needs to provide the context explicitly. All one needs to do is place the macro in the class definition. What it does should be obvious by now:

    #define Q_DECLARE_TR_FUNCTIONS(context) \
    public: \
        static inline QString tr(const char *sourceText, const char *disambiguation = 0) \
            { return QCoreApplication::translate(#context, sourceText, disambiguation); }
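The effect of a per-class static tr() can be tried out without Qt. The following is a minimal sketch: `TOY_DECLARE_TR_FUNCTIONS` and the free function `translate` are hypothetical stand-ins invented for this example, and `translate` merely echoes the context it receives instead of consulting translators.

```cpp
#include <cassert>
#include <string>

// Stand-in for QCoreApplication::translate(): it only reveals which context
// was passed along; no real translation takes place.
inline std::string translate(const char *context, const char *sourceText)
{
    return std::string(context) + "::" + sourceText;
}

// Hypothetical macro modelled on the pattern above: the class name is
// stringified at compile time and baked into a static tr().
#define TOY_DECLARE_TR_FUNCTIONS(context) \
public: \
    static inline std::string tr(const char *sourceText) \
    { return translate(#context, sourceText); } \
private:

class Base
{
    TOY_DECLARE_TR_FUNCTIONS(Base)
public:
    std::string okText() const { return tr("OK"); } // resolves to Base::tr
};

class Derived : public Base
{
    TOY_DECLARE_TR_FUNCTIONS(Derived)
public:
    std::string cancelText() const { return tr("Cancel"); } // resolves to Derived::tr
};

void contextExample()
{
    Derived d;
    // Code written in Base keeps the context "Base", even when it runs on a
    // Derived object. A virtual function would have reported "Derived" here.
    assert(d.okText() == "Base::OK");
    assert(d.cancelText() == "Derived::Cancel");
}
```

Because tr() is resolved by ordinary static name lookup, the context always matches the class whose source code contains the call, which is the same class name lupdate sees during static analysis.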
For "free" strings (i.e. strings that are not part of any class), Qt provides QT_TR_NOOP and QT_TRANSLATE_NOOP. The NOOP suffix is there because the macros don't expand to tr(). They just mark the strings for lupdate and don't actually do any translation.
h1. QCoreApplication and QTranslator
A call to tr() is forwarded to staticMetaObject.tr(), which in turn ends up in QCoreApplication::translate(). QCoreApplication runs through the list of installed translators (installed using QCoreApplication::installTranslator()) in the reverse order of installation. The first QTranslator that returns a valid string from QTranslator::translate() supplies the translation for that text. Translations are not cached. If no translator can translate the string, the const char * is simply converted to a QString using QTextCodec::codecForTr() (see the "tr() and encoding" section below for more specifics).
Installing or removing a QTranslator might change translations, since Qt just searches the translators in reverse order and returns the first one that succeeds. So, a QEvent::LanguageChange event is sent to every widget in the application to announce a change in the list of translators. Widgets need to reimplement QWidget::changeEvent() and invoke tr() again to obtain the new translations. Note that for a UI designed using Qt Designer, the generated UI code contains a function retranslateUi() that can be used to invoke tr() again. The application developer still needs to reimplement changeEvent() and call that function.
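The search order can be modelled in a few lines of plain C++. This is a simplified sketch, not Qt code: `ToyTranslator` and `ToyApplication` are hypothetical stand-ins, the context and comment are ignored, and an empty string plays the role of "no translation found".

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a translator: maps source strings to
// translations and returns an empty string when it has none.
struct ToyTranslator
{
    std::map<std::string, std::string> entries;

    std::string translate(const std::string &source) const
    {
        const auto it = entries.find(source);
        return it != entries.end() ? it->second : std::string();
    }
};

// Hypothetical stand-in for the application's list of translators.
class ToyApplication
{
public:
    void installTranslator(const ToyTranslator *t) { translators_.push_back(t); }

    std::string translate(const std::string &source) const
    {
        // Walk from the most recently installed translator to the oldest.
        for (auto it = translators_.rbegin(); it != translators_.rend(); ++it) {
            const std::string result = (*it)->translate(source);
            if (!result.empty())
                return result; // the first hit wins
        }
        return source; // nobody could translate: fall back to the source text
    }

private:
    std::vector<const ToyTranslator *> translators_;
};

void reverseOrderExample()
{
    ToyTranslator generic;
    generic.entries["OK"] = "OK (generic)";
    generic.entries["Cancel"] = "Abbrechen";

    ToyTranslator specific;
    specific.entries["OK"] = "In Ordnung";

    ToyApplication app;
    app.installTranslator(&generic);
    app.installTranslator(&specific); // installed last, so consulted first

    assert(app.translate("OK") == "In Ordnung");    // answered by 'specific'
    assert(app.translate("Cancel") == "Abbrechen"); // falls through to 'generic'
    assert(app.translate("Help") == "Help");        // untranslated fallback
}
```

The sketch also shows why installing another translator can change the result for a string that was already translatable.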
h1. tr() and encoding
The encoding of the C-style string wrapped by tr() is assumed to be Latin-1 by default. One can wrap UTF-8 encoded strings by using trUtf8(), which works exactly the same way as tr(). This encoding information is used by tr() to convert the const char * to a QString when all translators fail to translate the string (QTranslator works with char * and not QString).
You can change the default encoding assumed by tr() from Latin-1 by using QTextCodec::setCodecForTr(). Note that this only provides the encoding information for the Qt i18n runtime. lupdate, which does static analysis of the C++ code, also needs to be informed that the strings wrapped by tr() use some other codec. One can set the CODECFORTR variable in the .pro file to specify the codec.
NOTE: trUtf8() is redundant if CODECFORTR is set to UTF-8 and the application has called QTextCodec::setCodecForTr(QTextCodec::codecForName("UTF-8")).
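To see why the assumed codec matters for the untranslated fallback, consider what happens when the same bytes are decoded as Latin-1 or as UTF-8. The helper `latin1ToUtf8` below is a hypothetical, Qt-free illustration that treats its input as Latin-1 and re-encodes it as UTF-8.

```cpp
#include <cassert>
#include <string>

// Re-encodes a Latin-1 byte string as UTF-8. Every Latin-1 byte equals the
// Unicode code point of the same value, so bytes >= 0x80 become two bytes.
std::string latin1ToUtf8(const std::string &latin1)
{
    std::string utf8;
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            utf8 += static_cast<char>(byte);
        } else {
            utf8 += static_cast<char>(0xC0 | (byte >> 6));
            utf8 += static_cast<char>(0x80 | (byte & 0x3F));
        }
    }
    return utf8;
}

void encodingExample()
{
    // 0xE9 is e-acute in Latin-1; its UTF-8 form is 0xC3 0xA9.
    assert(latin1ToUtf8("\xE9") == "\xC3\xA9");

    // The UTF-8 bytes of e-acute, misread as Latin-1, decode to two separate
    // characters (A-tilde and the copyright sign). This is the typical
    // symptom of source text and assumed codec not matching.
    assert(latin1ToUtf8("\xC3\xA9") == "\xC3\x83\xC2\xA9");

    // Pure ASCII is unaffected by the choice between the two codecs.
    assert(latin1ToUtf8("OK") == "OK");
}
```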
TODO: Document purpose of CODECFORSRC.
h1. The .ts file
The .ts file is just an XML file. The DTD is "here":http://doc.qt.nokia.com/4.6//linguist-ts-file-format.html. You can give each of the translatable items an id using "//=", and meta-data can be added to the XML using "//~". See "Adding Meta-data":http://doc.qt.nokia.com/4.6/i18n-source-translation.html#adding-meta-data-to-strings for more information.
lupdate detects the comment (disambiguation text) from the second argument of the call to tr(). This mechanism has often been abused to leave a note for the translator. It is an abuse because the comment is really a sub-context and forms part of the key that QTranslator uses to look up translations. So, a programmer would write tr("OK", "dear translator, this is the remove file dialog's OK"); and the love note to the translator is now part of the lookup key. A cleaner way, as of Qt 4.4, is to use "//:", like:

    //: dear translator, this is the file dialog's OK (and this is not part of the lookup key)
    m_label->setText(tr("OK"));
h1. The .qm file
The .qm file can be considered a big hash table (strings are hashed using the ELF hash). QTranslator hashes {context + source_string + comment} together to look up the translated text. The context + source_string + comment is (and has to be) stored in the same encoding as CODECFORTR, otherwise lookups won't work. The translated text is stored in the .qm file as UTF-16.
Note that the .qm file doesn't actually contain the source strings as such (they get mangled, since the comment is appended to them). Besides, looking up the original string given the translated string is a very expensive operation, and there is no API to reverse-translate.
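For reference, here is a minimal sketch of the classic ELF string hash in plain C++. The function `keyHash` is a hypothetical helper that simply joins the three parts as the text describes; the exact byte layout and any special-casing Qt applies inside a .qm file are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Classic ELF string hash: shift in four bits per character and fold the
// top nibble back into the lower bits whenever it becomes non-zero.
std::uint32_t elfHash(const std::string &text)
{
    std::uint32_t h = 0;
    for (unsigned char c : text) {
        h = (h << 4) + c;
        const std::uint32_t g = h & 0xf0000000u;
        if (g != 0)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

// Hypothetical key builder (an assumption of this sketch): concatenates the
// parts before hashing, so the parts are no longer separable afterwards.
std::uint32_t keyHash(const std::string &context,
                      const std::string &source,
                      const std::string &comment)
{
    return elfHash(context + source + comment);
}

void hashExample()
{
    assert(elfHash("") == 0u);
    assert(elfHash("a") == 97u);    // (0 << 4) + 'a'
    assert(elfHash("ab") == 1650u); // (97 << 4) + 'b'
    assert(keyHash("A", "b", "") == elfHash("Ab"));
}
```

Since a hash is a one-way mapping, this also hints at why recovering the source string from a .qm entry is not practical.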
h1. How plurality works
Linguist supports plural forms and translates correctly based on the language. To understand this feature, please read "Plural rules":http://doc.qt.nokia.com/4.6/i18n-plural-rules.html.
When %n is encountered in a string, Linguist lets the translator provide the various plural forms (depending on the translation language). All the plural form strings are embedded into the final .qm file. How does QTranslator determine which plural form to pick at runtime? The numerus rules are actually embedded inside the .qm file by lrelease. The rules (byte code) are loaded by QTranslator at runtime and executed (like in a virtual machine) to determine which plural form to pick.
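The selection step can be sketched with ordinary functions in place of the embedded byte code. This is a simplification under stated assumptions: `englishRule`, `russianRule` and `translatePlural` are hypothetical names, the rules are hard-coded rather than loaded from a .qm file, and n is assumed to be non-negative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Each rule returns the index of the plural form to use for a given n.
// English: one form for exactly 1, another for everything else.
int englishRule(int n) { return n == 1 ? 0 : 1; }

// Russian: three forms, selected by the last one or two digits of n.
int russianRule(int n)
{
    if (n % 10 == 1 && n % 100 != 11)
        return 0;
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 > 20))
        return 1;
    return 2;
}

// Picks a form using the rule, then replaces the first "%n" with the number.
std::string translatePlural(const std::vector<std::string> &forms,
                            int (*rule)(int), int n)
{
    std::string text = forms.at(static_cast<std::size_t>(rule(n)));
    const std::string::size_type pos = text.find("%n");
    if (pos != std::string::npos)
        text.replace(pos, 2, std::to_string(n));
    return text;
}

void pluralExample()
{
    const std::vector<std::string> english = {"%n file removed", "%n files removed"};
    assert(translatePlural(english, englishRule, 0) == "0 files removed");
    assert(translatePlural(english, englishRule, 1) == "1 file removed");
    assert(translatePlural(english, englishRule, 2) == "2 files removed");

    assert(russianRule(1) == 0 && russianRule(3) == 1 && russianRule(5) == 2);
    assert(russianRule(11) == 2 && russianRule(21) == 0 && russianRule(22) == 1);
}
```

Shipping the rule alongside the strings, as the .qm file does, means the application itself never needs to know how many plural forms a language has.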
h1. Id based translations
In the initial development stages of an application, strings in tr() usually change wildly. Updating translations involves a lot of rework. Even if the strings change only slightly, lupdate thinks new strings have appeared, and old translations are lost, since it uses the source_text as the key for merging old and new translations.
One approach to help lupdate merge existing translations better is to use a static unique id for all translatable text by using qtTrId():
    //% "This is text that keeps changing"
    m_label->setText(qtTrId("labelTextId"));
The idea is that at run time, qtTrId("labelTextId") gets translated to whatever "This is text that keeps changing" translates to. qtTrId() is implemented as calling tr with 'labelTextId' as the source_string and a null comment.
On seeing qtTrId(), lupdate uses the text from the //% "…" comment as the source string in the .ts XML. In addition, it records "labelTextId" as the id of the XML tag. Notice how the resulting .ts file is completely compatible with the .ts file that was created when using tr(). The only extra is the id. So, Linguist will continue to work just as before.
    <message id="labelTextId">
        <location filename="main.cpp" line="15"/>
        <source>This is text that keeps changing</source>
        <translation type="unfinished"></translation>
    </message>
lrelease needs to be told to use the id as the key instead of the source_text, since that is how qtTrId() will translate. To achieve that, pass the -idbased command line argument to lrelease.
The main advantage of the id based approach is that the text can keep changing and lupdate will merge old and new translations better since it has a static 'id' to work with.
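The difference between the two merge keys can be shown with a toy model. This sketch is not lupdate's actual algorithm: `Message` and `mergeTranslations` are hypothetical names, and the matching is reduced to an exact lookup by id when one is present, otherwise by source text.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One translatable message as a toy .ts entry.
struct Message
{
    std::string id;          // empty when plain tr() is used
    std::string source;
    std::string translation; // empty means "unfinished"
};

// Builds the merge key for a message: the id if there is one, else the source.
std::string mergeKey(const Message &m)
{
    return m.id.empty() ? "src:" + m.source : "id:" + m.id;
}

// Carries translations from the old entries over to the freshly extracted ones.
std::vector<Message> mergeTranslations(const std::vector<Message> &oldEntries,
                                       std::vector<Message> fresh)
{
    std::map<std::string, std::string> byKey;
    for (const Message &m : oldEntries)
        byKey[mergeKey(m)] = m.translation;

    for (Message &m : fresh) {
        const auto it = byKey.find(mergeKey(m));
        if (it != byKey.end())
            m.translation = it->second;
    }
    return fresh;
}

void mergeExample()
{
    // Keyed by source text: a reworded string no longer finds its translation.
    const std::vector<Message> a = mergeTranslations(
        {Message{"", "Remove file", "Datei entfernen"}},
        {Message{"", "Remove this file", ""}});
    assert(a[0].translation.empty());

    // Keyed by id: the same rewording keeps the existing translation.
    const std::vector<Message> b = mergeTranslations(
        {Message{"removeId", "Remove file", "Datei entfernen"}},
        {Message{"removeId", "Remove this file", ""}});
    assert(b[0].translation == "Datei entfernen");
}
```

In the id-keyed case the kept translation may of course be stale after the rewording, so a translator still has to review it; the gain is that it is not thrown away.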
h1. References
"Plural Form(s) in Translation(s)":http://doc.qt.nokia.com/qq/qq19-plurals.html