'''English''' [[QtQString-Korean|한국어]]


Written By : Girish Ramakrishnan, ForwardBias Technologies
=QString=
The fundamentals of encoding are covered in [http://developer.qt.nokia.com/wiki/BasicsOfStringEncoding BasicsOfStringEncoding].
QString stores Unicode strings. Because it stores Unicode, a QString knows which characters its contents represent. This is in contrast to a C-style string (char *), which carries no knowledge of its encoding. A QString can be rendered on the screen or to a printer, provided there is a font to display the characters that it holds. All user-visible strings in Qt are stored in QString.
Internally, QString stores the string using the UTF-16 encoding. Each 16-bit (2-byte) UTF-16 code unit is represented by a QChar. One main reason to use UTF-16 as the internal representation is that it makes it fast to pass strings to the native Unicode APIs on Mac OS X and Windows (which expect UTF-16).
For processing a C-style char pointer or an array of bytes, QByteArray should be used instead of QString. See [http://developer.qt.nokia.com/wiki/UsingQByteArray UsingQByteArray] for more details.
=Using C-style strings with QString=
Consider code that constructs a QString from a C-style string literal such as “Qt”. The source file containing that code is saved with some encoding, called the ''input charset''. The compiler generates code that puts the C-style string “Qt” in memory, possibly with some other encoding, called the ''exec charset''. At run time, QString gets a pointer to this memory location and needs to interpret the bytes and convert them to Unicode.
To convert the C-style string to Unicode, QString needs to know the exec charset. By default, Qt assumes that this is ASCII. Internally, this conversion uses the same code path as QString::fromAscii(). QString::fromAscii(), in turn, decodes the characters as Latin-1 (Latin-1 is a superset of ASCII, so ASCII text decodes identically). It is thus possible to get away with placing Latin-1 characters in C-strings.
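Decoding Latin-1 is cheap because every Latin-1 byte value equals the Unicode code point of the character it encodes, and the bytes below 0x80 are plain ASCII. The following is a minimal sketch of that mapping in plain C++, not Qt's actual implementation; the helper name `decodeLatin1` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: widen each Latin-1 byte to one UTF-16 code unit.
// Latin-1 byte values 0x00-0xFF coincide with code points U+0000-U+00FF,
// so no lookup table is needed.
std::u16string decodeLatin1(const char *bytes, std::size_t size)
{
    std::u16string result;
    result.reserve(size);
    for (std::size_t i = 0; i < size; ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        result.push_back(static_cast<char16_t>(static_cast<unsigned char>(bytes[i])));
    }
    return result;
}
```

With Qt itself, the explicit entry point for the same decoding is QString::fromLatin1().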
QTextCodec::setCodecForCStrings() can be used to change the encoding that Qt uses to decode C-style strings; it takes the QTextCodec that corresponds to the exec charset. After this call, QString::fromAscii() decodes C-style strings using the new charset (in other words, it no longer decodes them as ASCII).
The only reason to use QTextCodec::setCodecForCStrings() is that the exec charset is not ASCII. A common case is source files that contain non-ASCII characters: such files are saved as UTF-8 and the exec charset of the compiler is set to UTF-8. QTextCodec::setCodecForCStrings(QTextCodec::codecForName("UTF-8")) can then be used to make Qt interpret all char * pointers correctly as UTF-8.
Even though QTextCodec::setCodecForCStrings() is a nice convenience, it is recommended to use only ASCII characters in source files. The C++ standard only guarantees a basic character set that ASCII covers and does not specify which further encodings a compiler must support. A string can still be initialized with a non-ASCII character such as the euro sign (U+20AC), either by spelling out the bytes of its UTF-8 encoding with escape sequences and passing them to QString::fromUtf8(), or by constructing the string from the code point through QChar.
Techniques of this kind need nothing but ASCII in the source file.
=Unicode methods in QString=
A QChar holds one 16-bit UTF-16 code unit, which is a complete Unicode code point for every character in the Basic Multilingual Plane. QString::unicode() returns a pointer to the QChars of a QString. QString::utf16() returns the same data as a const ushort pointer. Notice that the function is ''not'' named toUtf16(): no conversion is involved, because the internal representation of QString is already UTF-16.
QString::normalized() can be used for Unicode composition and decomposition.
A QChar is always 16 bits wide. A code point outside the Basic Multilingual Plane is represented as a surrogate pair of two QChars. QChar::isHighSurrogate() and QChar::isLowSurrogate() tell which half of a pair a QChar is. QChar::unicode() returns the 16-bit value. QChar::cell() and QChar::row() return the lower byte and the higher byte of the QChar.
QString::length() returns the number of QChars. When the string contains supplementary characters (those encoded as surrogate pairs), the length is therefore larger than the number of actual characters.
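The arithmetic behind surrogate pairs, and the reason the number of 16-bit units can exceed the number of characters, can be sketched in plain C++ without Qt types. All function names below are hypothetical. The two range checks correspond to what QChar::isHighSurrogate() and QChar::isLowSurrogate() report for a 16-bit value.

```cpp
#include <cstddef>
#include <string>

// High (leading) surrogates occupy 0xD800-0xDBFF, low (trailing) 0xDC00-0xDFFF.
bool isHighSurrogate(char16_t unit) { return unit >= 0xD800 && unit <= 0xDBFF; }
bool isLowSurrogate(char16_t unit) { return unit >= 0xDC00 && unit <= 0xDFFF; }

// Encode one code point as UTF-16. Assumes a valid, non-surrogate
// code point no larger than 0x10FFFF.
std::u16string encodeUtf16(char32_t codePoint)
{
    std::u16string units;
    if (codePoint < 0x10000) {
        units.push_back(static_cast<char16_t>(codePoint));
    } else {
        const char32_t offset = codePoint - 0x10000; // 20 significant bits
        units.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));   // high half
        units.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF))); // low half
    }
    return units;
}

// Count code points rather than 16-bit units: a well-formed
// surrogate pair counts once.
std::size_t countCodePoints(const std::u16string &units)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < units.size(); ++i) {
        if (isHighSurrogate(units[i]) && i + 1 < units.size()
                && isLowSurrogate(units[i + 1])) {
            ++i; // skip the low half of the pair
        }
        ++count;
    }
    return count;
}
```

For example, U+1D11E (the musical G clef) encodes as the two units 0xD834 and 0xDD1E, so a string holding only that character has a length of 2 in 16-bit units but contains one code point.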
QString::toUtf8(), QString::fromUtf8(), QString::toUcs4() and QString::fromUcs4() convert to and from UTF-8 and UTF-32.
=Disabling QString(char *)=
Even though the automatic conversion from C-style strings to QString is convenient, it is often the source of subtle bugs when using third-party libraries. Qt provides an option to disable it: when the macro QT_NO_CAST_FROM_ASCII is defined, the implicit conversion from C-strings through QString::fromAscii() is switched off, so code that assigns a literal such as “Qt” directly to a QString results in a compile error.
These compile errors make the programmer reconsider whether a QString is needed at all (a QByteArray may be a better option) and work out the encoding of the C-style string.
After adding the define, such code has to state the encoding explicitly, for example through QLatin1String or QString::fromUtf8().
=Further reading=
[http://developer.qt.nokia.com/wiki/UsingQStringEffectively Using Qt Strings Effectively]
===Categories:===
* [[:Category:QtInternals|QtInternals]]

Revision as of 17:30, 14 January 2015