Locale Support in Qt 5

From Qt Wiki


WARNING: This page is a WORK-IN-PROGRESS and so may not be complete or accurate.

Early Qt localization support was weak: its design was based on the Windows APIs, but it used its own code and data. Over time new features have been introduced based on CLDR data, but the current QLocale implementation is still incomplete, inconsistent, not fully integrated with the host system locale, and unnecessarily bloats the core library. It is clear that Qt needs a new approach.

This page documents the current state of Qt localization and researches possible solutions to the issues.

Current Support

Qt provides two core options for localization:

  • System Locale: Use the default system locale settings, the implementation being platform specific.
  • Custom Locale: An application can create any custom locale supported by the CLDR data embedded in QtCore which is then used by Qt's internal code routines. These may or may not match the host's available locales.

In theory, the System Locale should use the facilities of the host system, falling back to Qt's CLDR-based facilities where the host system doesn't provide the required data or API. In practice, Windows and Mac use an inconsistent mixture of the host facilities and Qt's CLDR-based facilities. On all other platforms, such as Linux and Android, Qt's built-in CLDR-based facilities are always used.

A number of key requirements for future improvements have been identified:

  • Minimise locale data shipped with Qt
  • Minimise localization code needing to be maintained by Qt
  • Integrate fully with the user/system locale on all platforms
  • Integrate fully with any user level overrides
  • Add support for time zones
  • Add support for calendar systems
  • Add support for collation
  • Add support for advanced formatting/parsing options such as spell-out, ordinal, duration, etc.

These can be summarised as reducing the Qt code base, integrating fully with the host platform, and supporting advanced features.

In the past, Qt has emphasised consistent behaviour across all platforms as the key driver, at the expense of Qt apps looking and behaving the same as all other apps on the user's platform, i.e. making life easier for the developer rather than the end user. A better balance needs to be found between the two positions, with the emphasis on fitting in with the user's expectations for the fundamentals, i.e. date and number formats must be consistent with the environment the app is running in unless the developer has good reason to do otherwise. This means making better use of the host facilities for the core localization features, and providing optional advanced features for those apps that need them.

Solution Options

A number of options have been investigated for implementing these improvements.

The initial plan was to extend the existing QLocale code and data with new code, code from KDE, and more data from CLDR. As a test case, calendar system support was added using KDE code and CLDR data, but the increase in library size required by the new data proved unacceptable. It would also have resulted in a lot of new code that Qt would struggle to write and maintain, especially in difficult areas like collation. It also failed the test of improving host integration.

The second plan was to use ICU as the localization back-end on all platforms to minimise code and data requirements within Qt. This had the advantage of a single code base and a consistent behaviour and feature set across platforms. The main disadvantage was that ICU does not respect a user's personal preference overrides or the host settings, so Qt apps would be inconsistent with the rest of the host environment. This option nevertheless proved unworkable due to resistance from Windows developers to the extra dependency and download, Apple App Store policies preventing linking to ICU, and Android only shipping the Java version.

The third plan, adopted next, was to implement individual system back-ends for each host platform to fully utilise the system locale resources. While it needs more code than option 2 and is limited to a lowest-common-denominator feature set, it would at least provide an appearance fully integrated with the host system. It is also the only design pattern that will work on all platforms without large amounts of additional code or data, or external library dependencies. Unfortunately the lowest common feature set was too low: it was limited to what the Win32 API in Windows Vista or Windows Embedded 2013 supports, a feature set little better than the current QLocale support. Given that Windows 7 will need to be supported for at least 5 more years (official end of support is 2020), we are unlikely to be able to add any advanced formatting support into QtCore during the 5.x series, and possibly even a future 6.x series. This would fail to meet the needs of the many developers who require advanced formatting.

A fourth plan is now proposed: improve the existing QLocale code base where possible, adding a small number of missing features, and provide a new separate Qt Add-on library for advanced features based on ICU. This is the option documented below. Detailed planning for the ICU option 2 can be found at Qt-5-ICU; many of those details will still apply to the new separate library in this host-based plan.

Localization Features

This section describes the different localization features, their current implementation in Qt, and possible changes.

User and System Locales

The System Locale is the default locale configured for the entire system, i.e. the default that any new user or the root user would have. The User Locale is the locale chosen by the current user. It may differ from the System Locale and/or have individual settings customised by the user, usually the date format.

  • Windows and Mac have System and User locales, but Qt only uses/exposes the User Locale including user customizations.
  • Unix has System and User locale IDs, but Qt only uses/exposes the User IDs. The user can customize setting groups but not individual settings.

Ideally, QLocale would provide separate static methods for systemLocale() and userLocale(), and a way on Unix to override individual settings.
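A minimal sketch of the layering this implies follows. The user locale starts from the system locale, and individual settings can be overridden one at a time, which is the per-setting customisation Unix currently lacks. `LocaleSettings`, `UserLocale` and the setting keys are hypothetical illustrations, not Qt API.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical model: a locale ID plus named settings (e.g. "shortDateFormat").
struct LocaleSettings {
    std::string id;                             // e.g. "en_GB"
    std::map<std::string, std::string> values;  // setting name -> value
};

// Hypothetical user locale: the system locale plus per-setting user overrides.
class UserLocale {
public:
    explicit UserLocale(LocaleSettings system) : system_(std::move(system)) {}

    void setOverride(const std::string &key, const std::string &value) {
        assert(!key.empty());
        overrides_[key] = value;
    }

    void clearOverride(const std::string &key) { overrides_.erase(key); }

    // A user override wins; otherwise fall back to the system value, if any.
    std::optional<std::string> effectiveSetting(const std::string &key) const {
        if (auto it = overrides_.find(key); it != overrides_.end())
            return it->second;
        if (auto it = system_.values.find(key); it != system_.values.end())
            return it->second;
        return std::nullopt;
    }

    const LocaleSettings &systemLocale() const { return system_; }

private:
    LocaleSettings system_;
    std::map<std::string, std::string> overrides_;
};
```

Keeping the overrides separate from the system values means systemLocale() stays available unchanged, while the effective user value is computed on demand.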

Custom Locales

A Custom Locale is one where the app developer creates a locale of their own choice, for example to show output in different currencies, languages or locations. Qt currently supports this by shipping a subset of the CLDR data for the features it supports. This is only useful to a very small set of developers on Windows and Mac, at the cost of a lot of data in QtCore; on all other platforms, however, the data is currently required.

  • Windows provides a limited set of locales depending on the regional version installed or language packs downloaded, and allows the required locale to be passed into its localization API
  • Mac and most Unix systems ship the full set of locales from CLDR/ICU so do not need Qt's CLDR data
  • WinRT and Android may also ship the full set of CLDR data (TODO: confirm)

In short, on all main platforms host data can be utilised to create custom locales, reducing the need to ship Qt's copy of CLDR on those platforms. It can also be argued this is an advanced feature that should be enabled if required by the app developer on platforms that don't provide enough host data, either by enabling Qt's inbuilt data or choosing to build against ICU.
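Resolving a requested custom locale against host data, with Qt's bundled data as an opt-in fallback, could look like the sketch below. It assumes simple subtag truncation (de_CH to de) as the fallback chain, whereas CLDR also defines explicit parent locales. `resolveLocale()` and the source labels are hypothetical. At each level host data is preferred, so a bundled de_CH beats a host de, but a host de_CH beats both.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical result: which locale ID will be used and where its data comes from.
struct ResolvedLocale {
    std::string id;      // e.g. "de_CH", "de" or "C"
    std::string source;  // "host", "bundled" or "c"
};

// Drop the last "_Xxx" subtag: "zh_Hant_TW" -> "zh_Hant" -> "zh" -> "".
// Simplification: CLDR's explicit parent locales are ignored here.
std::string parentLocale(const std::string &id) {
    const auto pos = id.rfind('_');
    return pos == std::string::npos ? std::string() : id.substr(0, pos);
}

// At each fallback level, prefer host data, then bundled data if enabled.
ResolvedLocale resolveLocale(const std::string &requested,
                             const std::set<std::string> &hostLocales,
                             const std::set<std::string> &bundledLocales,
                             bool bundledDataEnabled) {
    assert(!requested.empty());
    for (std::string id = requested; !id.empty(); id = parentLocale(id)) {
        if (hostLocales.count(id))
            return {id, "host"};
        if (bundledDataEnabled && bundledLocales.count(id))
            return {id, "bundled"};
    }
    return {"C", "c"};  // final fallback
}
```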

Locale Changes

In theory, all Qt apps should fix the locale used at start-up and not change that locale until the entire UI can be refreshed at the same time. The app should respond to a system locale-change event by refreshing the QLocale and then refreshing the GUI. Unfortunately this facility appears to be incomplete on most if not all platforms, with no obvious API available (more research required).

  • On Windows and Unix, the locale ID is fixed on creation and an internal refresh call is available.
  • On Mac, CFLocaleCopyCurrent is called for every query and the result is then released, so the cached object may be reused, but this is not guaranteed. No internal refresh call is available.
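One way to get the behaviour described above is a cached snapshot plus change listeners: the running UI keeps a fixed locale, which is refreshed as a whole on a system change event. The sketch below is hypothetical throughout (`LocaleSnapshot`, `LocaleCache`, `onSystemLocaleChanged()`). The platform event that would trigger the refresh is exactly the missing piece noted above.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical snapshot of the values an app formats with.
struct LocaleSnapshot {
    std::string id;
    std::string decimalPoint;
};

// Hypothetical cache: the app reads a fixed snapshot taken at start-up and
// only replaces it when the host reports a locale change, then notifies every
// listener so the whole UI can be refreshed in one pass.
class LocaleCache {
public:
    using Loader = std::function<LocaleSnapshot()>;  // queries the host
    using Listener = std::function<void(const LocaleSnapshot &)>;

    // loader_ is declared before current_, so it is initialised first.
    explicit LocaleCache(Loader loader)
        : loader_(std::move(loader)), current_(loader_()) {}

    const LocaleSnapshot &current() const { return current_; }

    void addListener(Listener l) { listeners_.push_back(std::move(l)); }

    // To be called from the platform's "system locale changed" handler.
    void onSystemLocaleChanged() {
        current_ = loader_();
        for (const auto &l : listeners_)
            l(current_);
    }

private:
    Loader loader_;
    LocaleSnapshot current_;
    std::vector<Listener> listeners_;
};
```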

Number Localization

All number localization is done using the internal Qt code:

  • System Locales use cached host data for number symbols
  • Custom Locales use Qt's CLDR data for number symbols
  • All locales use Qt internal code routines, and libdouble where available
  • The Qt internal code only supports single-character number symbols whereas CLDR and host data can be multi-character (TODO: check how many actually are)
  • The Qt internal code does not support alternative number-grouping such as Chinese or Indian.
  • The QLocale API requires code routines for int64, uint64, and double, all other types can be derived from these
  • The Qt internal code is reused by other Qt classes QByteArray, QString, QTextStream, QIpAddress, QVersionNumber, QtGui::QValidator, and in external modules QtXmlPatterns and QtDeclarative (TODO: check if really need to use Qt internal routines, i.e. if only doing non-localized C conversions could use host, or if localized could use public QLocale api)
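Grouping support mostly needs two group sizes rather than one. CLDR patterns distinguish a primary size (the group nearest the decimal point) from a secondary size for the remaining groups, so the Indian pattern #,##,##0 groups 3 then 2. The sketch below groups digits that way with a separator that may be several characters (or several UTF-8 bytes). The function name and signature are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Insert group separators into a string of ASCII digits (no sign, no decimals).
// primary:   size of the group nearest the decimal point (3 in "#,##0").
// secondary: size of every further group (2 in the Indian "#,##,##0").
// separator: may be several bytes, e.g. a no-break space encoded as UTF-8.
std::string groupDigits(const std::string &digits, std::size_t primary,
                        std::size_t secondary, const std::string &separator) {
    assert(primary > 0 && secondary > 0);
    if (digits.size() <= primary)
        return digits;
    const std::size_t head = digits.size() - primary;  // digits left of the primary group
    std::size_t first = head % secondary;              // size of the leftmost group
    if (first == 0)
        first = secondary;
    std::string out(digits, 0, first);
    for (std::size_t pos = first; pos < head; pos += secondary) {
        out += separator;
        out.append(digits, pos, secondary);
    }
    out += separator;
    out.append(digits, head, primary);
    return out;
}
```

For example, `groupDigits("123456789", 3, 2, ",")` yields 12,34,56,789, while primary and secondary sizes of 3 give 123,456,789.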

Ideally Qt would not have to implement its own number localization code, as it is complex and hard to get exactly right in all cases, and so imposes a maintenance burden and possibly a performance burden. Ideally we would also support advanced number localization options such as spell-out, ordinal, etc., as done by ICU-based platforms.

There are three options here:

  • Continue to use our own code, but cleaned-up and enhanced with multi-char and grouping support
  • Use the host facilities
  • Use another external library or the standard C/C++ library

A number of factors affect this decision:

  • Using our own code is a maintenance burden
  • Using the host localization code may not support the functionality used by QLocale or other Qt code
  • The standard C/C++ code options only support basic number formatting and not advanced options as in ICU/WinRT/OSX
  • The Win32 API only supports the same basic formatting as the standard C/C++ library
  • The ICU C API only supports int32, int64 and double, it does not support unsigned int64 until ICU 52
  • The standard C/C++ libraries do not integrate with ICU or Qt's own CLDR data and do not provide easy ways to override the settings used

As some platforms cannot support the minimum Qt localization requirements, we must continue to offer an internal version or use an external library that accepts the required locale symbols as input. It may not make sense to use this code only on selected platforms and host code on others, so it could be used on all platforms and improved to include the required features. Alternatively, we could clean up our existing code to match the new C++11 std:: API calls and provide a thin wrapper around whichever is appropriate to use on a given platform.
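The thin-wrapper option, combined with the earlier point that only int64, uint64 and double need real routines, might take the shape below. It has a hypothetical `NumberBackend` interface with exactly those three entry points and a non-localized fallback built on the C++ standard library. A template widens every other arithmetic type to one of the three. All names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <type_traits>

// Hypothetical backend interface: only three routines must be implemented.
class NumberBackend {
public:
    virtual ~NumberBackend() = default;
    virtual std::string formatInt64(std::int64_t v) const = 0;
    virtual std::string formatUInt64(std::uint64_t v) const = 0;
    virtual std::string formatDouble(double v, int precision) const = 0;
};

// Non-localized fallback; assumes the process C locale is still "C".
class CNumberBackend : public NumberBackend {
public:
    std::string formatInt64(std::int64_t v) const override { return std::to_string(v); }
    std::string formatUInt64(std::uint64_t v) const override { return std::to_string(v); }
    std::string formatDouble(double v, int precision) const override {
        assert(precision > 0);
        char buf[64];
        std::snprintf(buf, sizeof buf, "%.*g", precision, v);
        return buf;
    }
};

// Every other arithmetic type is widened to one of the three core routines.
template <typename T>
std::string formatNumber(const NumberBackend &backend, T value, int precision = 6) {
    static_assert(std::is_arithmetic_v<T>, "formatNumber needs an arithmetic type");
    if constexpr (std::is_floating_point_v<T>)
        return backend.formatDouble(static_cast<double>(value), precision);
    else if constexpr (std::is_signed_v<T>)
        return backend.formatInt64(static_cast<std::int64_t>(value));
    else
        return backend.formatUInt64(static_cast<std::uint64_t>(value));
}
```

A host or ICU backend would then only need to override the three virtual functions, and QLocale-style callers would use the same template regardless of platform.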

Currency Localization

Date/Time Localization

Current QLocale support is as follows:

  • Host system data is used for Mac and Windows System Locales
  • Qt's copy of CLDR data is used for system locales on all other platforms, and for all custom locales on all platforms
  • All date parsing is done by Qt's own code, no host or standard library code is used
  • All custom date formatting (dd-MMM-yyyy etc) is done by Qt's own code, no host or standard library code is used
  • Fixed-format date formatting (LongDate, ShortDate, etc.) is done by the host on Mac and Windows, and by Qt's own code on all other platforms and for custom locales

Ideally Qt would not have to implement its own date/time localization code, as it is complex and hard to get exactly right in all cases, and so imposes a maintenance burden and possibly a performance burden.

A number of problems exist:

  • No support in the Qt code for non-Gregorian Calendar Systems, but the host data and formatting may return names and symbols for these
  • Time zone names are not properly supported
  • Not all standard format codes or formats supported
  • Advanced formatting options not supported
  • The format codes are a weird hybrid of Windows and Unicode
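Qt's format strings and the Unicode (UTS #35) date patterns share the same lexical shape. Runs of one repeated letter are fields, and single-quoted text (with '' for a literal quote) is copied through. The sketch below tokenizes that shared shape, assuming every ASCII letter is a pattern letter as in UTS #35. It leaves the meaning of each letter and run length to the caller, since that is where the two schemes diverge. `FormatToken` and `tokenizeFormat()` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token: either a run of one pattern letter or literal text.
struct FormatToken {
    bool isField;       // true for "yyyy", "MMM", ...; false for literals
    char letter;        // pattern letter when isField
    std::size_t count;  // run length when isField
    std::string text;   // literal text when !isField
};

std::vector<FormatToken> tokenizeFormat(const std::string &fmt) {
    std::vector<FormatToken> tokens;
    auto addLiteral = [&tokens](const std::string &s) {
        if (s.empty())
            return;
        if (!tokens.empty() && !tokens.back().isField)
            tokens.back().text += s;  // merge adjacent literals
        else
            tokens.push_back({false, '\0', 0, s});
    };
    for (std::size_t i = 0; i < fmt.size();) {
        const char c = fmt[i];
        if (c == '\'') {
            if (i + 1 < fmt.size() && fmt[i + 1] == '\'') {  // '' outside quotes
                addLiteral("'");
                i += 2;
                continue;
            }
            std::string lit;
            for (++i; i < fmt.size(); ++i) {
                if (fmt[i] == '\'') {
                    if (i + 1 < fmt.size() && fmt[i + 1] == '\'') {  // '' inside quotes
                        lit += '\'';
                        ++i;
                        continue;
                    }
                    break;  // closing quote
                }
                lit += fmt[i];
            }
            ++i;  // skip the closing quote (or step past the end if unterminated)
            addLiteral(lit);
        } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
            std::size_t j = i;
            while (j < fmt.size() && fmt[j] == c)
                ++j;
            tokens.push_back({true, c, j - i, {}});
            i = j;
        } else {
            addLiteral(std::string(1, c));
            ++i;
        }
    }
    return tokens;
}
```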

One popular option that should be provided is 24-hour time formatting for locales that default to 12-hour time.
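For locales whose patterns use a 12-hour clock, one approach is to rewrite the locale's own time pattern rather than substitute a fixed one, so separators and field order stay localized. The sketch below works on UTS #35 patterns under these assumptions: h and K become H, the a (AM/PM) field is dropped together with one adjacent ASCII space, and quoted literals are left alone. Real CLDR data may put other spacing characters (such as a narrow no-break space) next to a, which this sketch does not handle. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Rewrite a 12-hour UTS #35 time pattern as 24-hour, e.g. "h:mm a" -> "H:mm".
// Assumptions: 'h' (1-12) and 'K' (0-11) map to 'H' (0-23); the 'a' field is
// removed along with one adjacent ASCII space; quoted text is copied as-is.
std::string to24HourPattern(const std::string &pattern) {
    std::string out;
    bool inQuote = false;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\'') {
            inQuote = !inQuote;  // '' toggles twice, leaving the state unchanged
            out += c;
        } else if (inQuote) {
            out += c;
        } else if (c == 'h' || c == 'K') {
            out += 'H';
        } else if (c == 'a') {
            if (!out.empty() && out.back() == ' ')
                out.pop_back();  // drop the space before "a"
            else if (i + 1 < pattern.size() && pattern[i + 1] == ' ')
                ++i;             // or the space after it
        } else {
            out += c;
        }
    }
    return out;
}
```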

Unlike number localization, it may be possible to use the host facilities for all date/time parsing and formatting, except with the CLDR backend, which will still require internal Qt routines.

Solution Design

The proposed solution is split into two parts. The first is changes to the current QLocale code, implementing whatever new features and structural changes are possible in the Qt 5 series. The second is a new QtGlobalization library for advanced localization.

The QLocale changes would implement the minimum set of new features required:

  • Internal Qt code routines will continue to be used, but only with host platform data and with some extended features
  • Number formatting will support number grouping
  • Calendar System support will be added
  • Custom Locales will be limited to those available natively on the platform. On Win32 this can be supplemented by enabling the CLDR backend, or choosing to use QtGlobalization with ICU instead
  • The Mac platform backend will be completely rewritten to the latest available API
  • The Win32 platform backend will be updated to the latest available API
  • A new WinRT platform backend will be created
  • A new Android platform backend will be created using JNI
  • A new ICU platform backend will be created as the default for Unix platforms
  • The CLDR code will be refactored into a new platform backend as an optional alternative to the ICU backend. The CLDR data included will be configurable to allow embedded platforms to choose what locales and calendars to include
  • A new C platform backend will be created as a base class and final fallback
  • A new QLocaleCode class may be created
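The C backend described as "a base class and final fallback" suggests a shape like the sketch below. Every query has a C-locale answer in the base class, and a host backend overrides only what the host can provide, so anything it cannot answer falls through automatically. `PlatformLocale`, `HostLocale` and their methods are hypothetical, not the planned QPlatformLocale API. The host data is modelled as a simple map.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical base class: every query has a C ("POSIX") locale answer.
class PlatformLocale {
public:
    virtual ~PlatformLocale() = default;
    virtual std::string name() const { return "C"; }
    virtual std::string decimalPoint() const { return "."; }
    virtual std::string groupSeparator() const { return ""; }
    virtual std::string negativeSign() const { return "-"; }
};

// Hypothetical host backend: answers from host data where present and
// otherwise defers to the C implementation in the base class.
class HostLocale : public PlatformLocale {
public:
    HostLocale(std::string name, std::map<std::string, std::string> hostData)
        : name_(std::move(name)), data_(std::move(hostData)) {}

    std::string name() const override { return name_; }
    std::string decimalPoint() const override {
        return lookup("decimalPoint", PlatformLocale::decimalPoint());
    }
    std::string groupSeparator() const override {
        return lookup("groupSeparator", PlatformLocale::groupSeparator());
    }
    // negativeSign() is not overridden, so the C answer is used.

private:
    std::string lookup(const std::string &key, const std::string &fallback) const {
        const auto it = data_.find(key);
        return it != data_.end() ? it->second : fallback;
    }

    std::string name_;
    std::map<std::string, std::string> data_;
};
```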

Note that there is no minimal subset of these changes that can be implemented, they must be done as one inter-dependent unit, although Calendar Systems can be initially left out.

The QtGlobalization library will be a wrapper around either the host system localization API where an ICU-equivalent API is provided, or around ICU where it is not.

  • Mac and WinRT will use the host API directly
  • Android will use the host API via JNI
  • Windows 7 will use ICU with some customisation to cater for user overrides on the default locale
  • Unix and all other platforms will use ICU directly
  • Features available will be determined at runtime

The exact split of core functions and advanced functions is yet to be determined, as is how to share code required in both.

High Level

Changes to QtCore::QLocale:

  • Remove all custom locale data and only support the locales the host platform does. If an app requires more locales, it can use QtGlobalization for this purpose.
  • Remove all internal formatting code
  • Create a new private QPlatformLocale class to implement backends on each platform. The api will match QLocale to improve performance. Each platform will receive a clean implementation to ensure that only the latest api is used, and used correctly.
  • Windows will convert from LCID to Locale Names and will use full Win32 api available, supplemented by fallback code in limited circumstances
  • WinRT will use the new WinRT api instead for improved support
  • Mac will solely use Mac api
  • Linux will use ICU api
  • Embedded will have a minimal fallback C implementation for when platform devs do not require localization
  • A number of key missing features, such as 24-hour time formatting, could be added to QLocale to reduce the need to use QtGlobalization

Other QtCore changes:

  • Add QCalendarSystem support across all platforms
  • Update QTimeZone support to the new minimum platform features

New QtGlobalization module:

  • A Qt Add-on that ships as part of the main Qt release so it may be relied upon
  • The api implemented will be the ICU/CLDR-style api as implemented on all main platforms
  • The module will implement api support for as advanced a feature set as possible, with devs checking at runtime which features are available on a given host. Where a dev does call an unavailable feature, this will degrade gracefully by default
  • On all platforms except Windows 7, the module will use the full host api available
  • On Windows 7 it will be built against ICU and will require ICU to be shipped with the app
  • Any dev may choose to build using ICU for any advanced features not available on their platform.
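Runtime feature checks with graceful degradation could look like this sketch. The formatter reports which styles its backend supports, and an unsupported request falls back to plain decimal output while telling the caller which style was actually used. `NumberFormatter`, `NumberStyle` and the `hostHasOrdinal` flag (standing in for a real backend query) are hypothetical. The ordinal suffix rule is English-only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class NumberStyle { Decimal, Ordinal, SpellOut };

// Hypothetical formatter whose capabilities are decided at runtime.
class NumberFormatter {
public:
    // hostHasOrdinal stands in for querying the backend's capabilities.
    explicit NumberFormatter(bool hostHasOrdinal) : hasOrdinal_(hostHasOrdinal) {}

    // Decimal is always available; SpellOut is never implemented in this sketch.
    bool isSupported(NumberStyle s) const {
        return s == NumberStyle::Decimal || (s == NumberStyle::Ordinal && hasOrdinal_);
    }

    // Degrades to Decimal when the requested style is unavailable and,
    // if asked, reports the style that was actually applied.
    std::string format(std::uint64_t value, NumberStyle requested,
                       NumberStyle *used = nullptr) const {
        const NumberStyle style = isSupported(requested) ? requested : NumberStyle::Decimal;
        if (used)
            *used = style;
        if (style == NumberStyle::Ordinal)
            return std::to_string(value) + englishOrdinalSuffix(value);
        return std::to_string(value);
    }

private:
    // English only: 1st, 2nd, 3rd, 4th, ..., 11th-13th, 21st, ...
    static std::string englishOrdinalSuffix(std::uint64_t n) {
        const std::uint64_t lastTwo = n % 100;
        if (lastTwo >= 11 && lastTwo <= 13)
            return "th";
        switch (n % 10) {
        case 1: return "st";
        case 2: return "nd";
        case 3: return "rd";
        default: return "th";
        }
    }

    bool hasOrdinal_;
};
```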

Long term, once Windows 7 support is dropped, QLocale would be dropped and QtGlobalization would be merged into QtCore.

A major issue with this design is the heavy overlap of code between QtCore and QtGlobalization, i.e. the Linux backend in QtCore will implement ICU support, which will be replicated and enhanced in QtGlobalization. This is a necessary evil, however, to allow QtGlobalization to be built against ICU on Windows 7. That would not be possible if QtGlobalization used a shared private api exported by QtCore. It may still be possible for Mac and WinRT to share their implementations in this way, in which case the resulting QtGlobalization library could be very small.

An alternative is to make the new API part of QtCore, with Windows 7 devs advised that unless they use ICU the new API will not actually work. This would be controversial. In any event, the new code will be easiest to develop outside QtCore and migrate later if required.

Implementation Steps

  • CLDR data conversion - Make calendar data an option.
  • CLDR data conversion - Make configurable to only export a given list of locales and calendar systems

Platform Support

While we cannot use ICU directly on all platforms, all platforms except Win32 extended (Windows 7 and Windows Embedded 2013) use ICU or CLDR data as a base and so have a consistent feature set, object model and api that we can design to. Careful API design using new classes for the new features, an API that can be queried for supported features, and sensible fallback options where a feature is unsupported on minor platforms will allow for different levels of support on different platform versions while allowing the latest versions to make most new features available.

Choosing the supported lowest common set of advanced features is slightly tricky depending on whether we choose the lowest deployment version of each platform supported by the Community (say OSX 10.7 supporting ICU 4.6) or the lowest Reference or Officially supported platform (say OSX 10.8 supporting ICU 49). Currently the absolute minimum version would be RHEL using ICU 4.2 but this is very restricting.

As at Qt 5.7 the minimum supported platforms are:

Platform          | Reference          | Official           | Community | ICU
Windows           | 7                  | 7                  | 7         | Win32 Extended only
Windows Embedded  | 2013               | 2013               | 2013      | Win32 Extended only
Windows Phone     | 8.1                | 8.1?               |           | Own API
Windows RT        | 8.1                | 8.1?               |           | Own API
Mac OSX           | 10.8               | 10.8               | 10.7?     | 4.6 or 49, via own API
Mac iOS           | 8.0                | 5.0?               | ?         | via own API
Linux Ubuntu      | 14.04              | 14.04              | 11.10?    | 4.4 or 52
Linux RHEL        | 6.6                | 6.6                | ?         | 4.2
Linux OpenSuse    | 13.1               | 13.1               | ?         | 51
Android           | 4.1 (API Level 16) | 4.1 (API Level 16) |           | 4.8, via own API
QNX SDP           | 6.6                |                    |           | ?

ICU

ICU ships as standard on all modern Linux and BSD distros, and with QNX. As the POSIX locale functions are clearly deficient (e.g. no calendar system support, poor collation), ICU is the preferred back-end for all Unix-like platforms, including embedded and QNX. One limiting factor is that the ICU C++ ABI is not stable between versions, so the C API must be used instead, and it lacks many of the features of the C++ API.

The defining platform here is RHEL 6.6 which only uses ICU 4.2, but it could be argued that RHEL is mostly a server distro and so localization is perhaps not as high a profile requirement there, so a later target of ICU 51 (OpenSUSE 13.1) could be used with a degraded feature-set on RHEL.
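Choosing an ICU baseline and degrading on older versions means comparing version numbers, which has one trap. After 4.8, ICU switched numbering schemes and the next release was 49, so 4.8 must sort before 49. The sketch maps 4.x releases to 4x, matching the library suffixes (for example libicuuc.so.48 for ICU 4.8). It gates features on the minimum versions listed in the ICU Analysis section of this page. The input would presumably come from ICU's `u_getVersion()`; the function names and feature keys are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Collapse ICU's two numbering schemes into one comparable integer.
// Before 49, releases were "4.2", "4.4", ..., "4.8" (major 4, minor n);
// from 49 onwards the major number alone identifies the release.
int icuRelease(int major, int minor) {
    assert(major >= 0 && minor >= 0 && minor < 10);
    return major < 10 ? major * 10 + minor : major;
}

// Hypothetical feature table using the minimum releases noted on this page.
const std::map<std::string, int> &minimumIcuRelease() {
    static const std::map<std::string, int> table = {
        {"weekend", 44},                    // weekday/weekend API
        {"ambiguousWallTime", 49},
        {"tzTransitions", 50},
        {"calendar:dangi", 51},             // C API
        {"calendar:islamic-umalqura", 52},
        {"windowsTzId", 52},
    };
    return table;
}

// Unknown features are reported as unavailable.
bool icuFeatureAvailable(const std::string &feature, int major, int minor) {
    const auto &table = minimumIcuRelease();
    const auto it = table.find(feature);
    return it != table.end() && icuRelease(major, minor) >= it->second;
}
```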

Handling of user-level overrides could occur at app level, but a platform wide solution should be sought (e.g. so KDE can define desktop-wide overrides).

Linux                      | ICU
RHEL 6.6                   | 4.2
Ubuntu 11.10 (Oneiric)     | 4.4
Ubuntu 12.04 LTS (Precise) | 4.8
Ubuntu 14.04 LTS (Trusty)  | 52
OpenSuse 13.1              | 51
OpenSuse 13.2              | 53
QNX 6.6.0                  | ?

Mac OS X / iOS

Mac uses ICU for localization, but does not ship the ICU headers and prohibits apps that link to ICU from the App Store. Instead we must use the native Mac API, which in reality is a thin wrapper around ICU with a simplified API and some Mac convenience methods added. The Mac API also respects any user-level overrides, which using ICU directly would not. One problem is that the wrapper APIs may not have been updated to provide access to new features in each ICU release used.

Qt 5.4 supports 10.6 at Community level, and 10.7 at Official and Reference level. Qt 5.5 drops 10.7 as an Official and Reference platform, making OSX 10.8 and ICU 49 the lowest supported version.

OSX   | ICU
10.6  | 4.0
10.7  | 4.6
10.8  | 49
10.9  | 51
10.10 | 53
10.11 | 55

Reference: http://opensource.apple.com/

Android

Android uses ICU for localization, but only ships the Java version of the library and data files. We will use the Java Android api via JNI.

The ICU version changes in Android are as follows:

Version | Codename           | API Level | ICU
2.3.3   | Gingerbread        | 10        | 4.4
3.0     | Honeycomb          | 11        | 4.4
4.0     | Ice Cream Sandwich | 14        | 4.6
4.1     | Jelly Bean         | 16        | 4.8
4.3     | Jelly Bean         | 18        | 50
4.4     | KitKat             | 19        | 51
5.0     | Lollipop           | 21        | 53

Reference: http://developer.android.com/reference/java/util/Locale.html

Win32

Windows Vista, Windows 7 and Windows Embedded 2013 continue to use the Win32 API for localization functions. This is supplemented with some new functions that use Locale Names in place of LCIDs and provide improved calendar and custom locale data access. Unfortunately, the core number and date formatter calls have not changed, so they remain the lowest common denominator and prevent any new advanced formatting API. Initial analysis does indicate that the calendar api is sufficient to implement the minimal required cross-platform support.

Windows Runtime

WinRT and Windows Phone provide advanced localization functions clearly based on CLDR data and broadly comparable to ICU. An initial review of the WinRT api indicates all required features are exposed, but this needs to be fully documented.

Embedded / Fallback

Some embedded platforms may prefer not to ship ICU, or may only have very simple localization requirements, so a fall-back implementation will need to be provided for these platforms. This could be the existing QLocale code and database, but that would be a substantial maintenance burden. The alternative is to ship a simple C-locale back-end or a pass-through to POSIX. Another option is to provide good documentation on building the minimal ICU required to support the embedded platform's locales.

API Design

The new API will be implemented as a set of new classes completely separate from the existing QLocale class:

  • QLocaleCode
  • QNumberFormatter
  • QDateTimeFormatter
  • QCalendar
  • QTimeZone

There are a number of very strong reasons for this:

  • This is the design pattern used by ICU, OSX, Windows, and Java that all devs are already familiar with
  • It is more efficient as ICU splits the locale resource files that are loaded by the Number and DateTime formatters, so a monolithic class would take longer to load
  • It represents a clear break with the old API and format codes, making it clearer to devs that behaviour and codes have changed
  • It prevents API bloat by having a single formatter api for different formatter types rather than multiple calls with prefixes or extra enums
  • It allows different levels of feature support for different formatter types on different platforms or versions while keeping the same simple api across them all

The old QLocale backend code will be removed and replaced with calls to the new code. It is expected this compatibility layer will take considerable effort and testing to ensure backwards compatible behaviour.

ICU Analysis

This section analyses the available ICU API for a common feature set. This analysis is based on the ICU 4.2 C api as used in RHEL 6.6, but will also analyse later versions for new features.

Number Formatting

Date/Time Formatting

Calendar Support

Note ICU Calendar C API merges Calendar and Time Zone into one 'object'.

Supported range in ICU 56: Julian day numbers from -0x7F000000 to +0x7F000000, which corresponds to years from about 5,800,000 BCE to about 5,800,000 CE. The range was previously wider.
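A quick check of those figures, using the mean Gregorian year of 365.2425 days. Julian day 0 lies only a few thousand years before the common era, which is negligible at this scale:

```latex
\mathtt{0x7F000000} = 127 \times 2^{24} = 2\,130\,706\,432\ \text{days},
\qquad
\frac{2\,130\,706\,432\ \text{days}}{365.2425\ \text{days/year}} \approx 5.83 \times 10^{6}\ \text{years}
```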

Calendar Systems:

  • "gregorian"
  • "chinese"
  • "coptic"
  • "ethiopic"
  • "taiwan"
  • "indian"
  • "persian"
  • "japanese"
  • "buddhist"
  • "islamic"
  • "hebrew"
  • "dangi" (ICU4J 50, ICU4C 51)
  • "islamic-umalqura" (ICU 52)
  • "islamic-tbla" (ICU 52)

Date/Time Components:

  • UCAL_ERA
  • UCAL_YEAR
  • UCAL_MONTH
  • UCAL_WEEK_OF_YEAR
  • UCAL_WEEK_OF_MONTH
  • UCAL_DAY_OF_YEAR
  • UCAL_DAY_OF_WEEK
  • UCAL_DAY_OF_WEEK_IN_MONTH
  • UCAL_AM_PM
  • UCAL_HOUR
  • UCAL_HOUR_OF_DAY
  • UCAL_MINUTE
  • UCAL_SECOND
  • UCAL_MILLISECOND
  • UCAL_ZONE_OFFSET
  • UCAL_DST_OFFSET
  • UCAL_YEAR_WOY
  • UCAL_DOW_LOCAL
  • UCAL_EXTENDED_YEAR
  • UCAL_JULIAN_DAY
  • UCAL_MILLISECONDS_IN_DAY
  • UCAL_IS_LEAP_MONTH
  • UCAL_FIELD_COUNT
  • UCAL_DAY_OF_MONTH

Other:

  • Min/Max values
  • Lenient
  • Gregorian change date
  • First day of week
  • Min days in first week
  • ICU 4.4 Weekday/Weekend
  • ICU 49 Ambiguous Wall Time
  • ICU 50 TZ Transitions
  • ICU 51 Get TZ ID
  • ICU 52 Windows TZ ID

Mac Analysis

This section analyses the available Mac API for a common feature set. This analysis is based on a minimum OSX 10.8 but will also analyse later versions for new features.

Number Formatting

Date/Time Formatting

Calendar Support

Calendar Systems

  • API Docs say enum deprecated since 10.10?
  • NSGregorianCalendar
  • NSBuddhistCalendar
  • NSChineseCalendar
  • NSHebrewCalendar
  • NSIslamicCalendar
  • NSIslamicCivilCalendar
  • NSJapaneseCalendar
  • NSRepublicOfChinaCalendar
  • NSPersianCalendar
  • NSIndianCalendar
  • NSISO8601Calendar - Doc says not implemented?
  • OSX 10.11 System Preferences lists all above as available, plus Amete Alem, Umm al-Qura, Islamic Tabular

Win Analysis

This section analyses the available Windows API for a common feature set. This analysis is based on Windows 7, but will also analyse later versions for new features.

Number Formatting

Date/Time Formatting

Calendar Support

Components

QTimeZone

QTimeZone has been successfully implemented in Qt 5.2 using separate back-ends for each system but a common API. The design of this class will be copied for much of the new QLocale implementation, especially QCalendar.

In 5.3 a number of new features are still required, including a QEvent for TimeZoneChanged and a QTimeZoneDatabase class to load TZ databases on any platform.

Old design details can be found at http://wiki.qt.io/Qt-5-QTimeZone

QCalendar

QCalendar will follow the design of QTimeZone to wrap the system provided calendar calculators.

Not all platforms equally support the same set of calendar systems, although this is slowly converging thanks to the increasing use of CLDR and ICU. While QCalendar will have to define an enum for all possible calendar systems, it will also have to provide an availableCalendarSystems() api to describe what systems are available to be set in the formatter.

QCalendar will implement baseline support for as many calendars as possible so that even where a calendar system is not available on a host system it may still be used as an optional calculation class, but not in the formatter.
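The sketch below illustrates the split between formatter availability and baseline calculation. It keeps a hypothetical record of which calendar systems the host formatter reported, plus built-in arithmetic that works regardless of host support. Here that arithmetic is the standard integer conversion from a proleptic Gregorian date to a Julian Day Number, valid from late 4714 BCE (astronomical year -4713) onwards. `CalendarSystem`, `CalendarSupport` and `gregorianToJulianDay()` are illustrative names, not the QCalendar API.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

enum class CalendarSystem { Gregorian, Buddhist, Chinese, Hebrew, Islamic, Japanese, Persian };

// Hypothetical record of what a platform backend reported at runtime.
struct CalendarSupport {
    std::set<CalendarSystem> hostFormatterSystems;

    // Usable in the date formatter only if the host can format it.
    bool canFormat(CalendarSystem s) const { return hostFormatterSystems.count(s) != 0; }

    // Calculation is possible if the host provides the system, or if Qt has a
    // built-in baseline; this sketch only has a Gregorian baseline.
    bool canCalculate(CalendarSystem s) const {
        return s == CalendarSystem::Gregorian || canFormat(s);
    }
};

// Julian Day Number of a proleptic Gregorian date, using the standard integer
// formula. Relies on C++ integer division truncating towards zero.
std::int64_t gregorianToJulianDay(std::int64_t year, int month, int day) {
    assert(month >= 1 && month <= 12 && day >= 1 && day <= 31);
    const std::int64_t a = (month - 14) / 12;  // -1 for January/February, else 0
    return (1461 * (year + 4800 + a)) / 4
         + (367 * (month - 2 - 12 * a)) / 12
         - (3 * ((year + 4900 + a) / 100)) / 4
         + day - 32075;
}
```

For example, `gregorianToJulianDay(2000, 1, 1)` returns 2451545, the Julian Day Number of 1 January 2000.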

ICU C++: http://icu-project.org/apiref/icu4c/classicu_1_1Calendar.html

Calendar             | ICU                       | Mac | Android | WinRT | Win32 | KDE
Gregorian            | gregorian                 |     |         |       |       |
Chinese Lunar        | chinese                   |     |         |       |       |
Coptic               | coptic                    |     |         |       |       |
Ethiopic             | ethiopic                  |     |         |       |       |
Ethiopic Amete Alem  | ???                       |     |         |       |       |
Indian National      | indian                    |     |         |       |       |
Jalali               | persian                   |     |         |       |       |
Hebrew               | hebrew                    |     |         |       |       |
Islamic Civil        | islamic                   |     |         |       |       |
Islamic Um-al-Qura   | islamic-umalqura (ICU 52) |     |         |       |       |
Islamic Tabular      | islamic-tbla (ICU 52)     |     |         |       |       |
Japanese (Gregorian) | japanese                  |     |         |       |       |
Dangi (Korean Lunar) | dangi (ICU 51)            |     |         |       |       |
Taiwan (Gregorian)   | taiwan                    |     |         |       |       |
Thai (Gregorian)     | buddhist                  |     |         |       |       |