String Handling

From Qt Wiki

Latest revision as of 15:38, 30 November 2023


Session Summary

Session Owners

Marc Mutz

Notes

Issues:

  • QByteArray doubles as QUtf8String
  • UTF-8 is not a good in-memory format:
    • Comparison cannot early-exit based on size (when comparing to Latin-1 or UTF-16)
    • Indexing (too many multi-code-unit encodings; UTF-16 has fewer)
    • Searching (no Boyer-Moore for general UTF-8)
  • Suggestion of where to go:
    • complete mixed string operations (comparison, searching, tokenisation)
    • add UTF-8 searching and tokenisation
    • add UTF-32 (for Python compat)
    • add owning versions of all
  • Can QString get more constexpr support?
    • Unlikely, it requires more C++ support
    • QString has implicit sharing and a lot of out-of-line API
  • QAnyStringView is missing a lot of API
    • because QUtf8StringView also lacks such API
  • Thiago requests that we make a proof-of-concept of the final state of this API
    • Avoid breaking user code (e.g., implicit conversions to QString)
    • Create a Library API Design document describing what to create
      • Needs a plan for getting there from where we are
      • Return types
      • Where to keep simple QStrings (for other developers not working on QtCore)
  • QString considers equal only strings that have the exact same code units.
    • Equivalence based on Unicode transforms (NFD, NFC) is not taken into account
      • but there's API to do the conversions if needed.
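
Two of the points above can be illustrated with a minimal sketch in standard C++ (not Qt code): why UTF-8 sizes and indexes are awkward, and what "exact same code units" means for equality. The helper names `utf8CodePointCount`, `utf8OffsetOfCodePoint` and `sameCodeUnits` are hypothetical and exist only for this illustration. The UTF-8 helpers assume well-formed input and do no validation.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts code points in UTF-8 text by skipping
// continuation bytes (bit pattern 10xxxxxx). Assumes well-formed UTF-8.
std::size_t utf8CodePointCount(std::string_view utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Hypothetical helper: byte offset of the code point at position `index`.
// This is a linear scan, because the byte length of each code point varies.
// Returns utf8.size() if `index` is out of range.
std::size_t utf8OffsetOfCodePoint(std::string_view utf8, std::size_t index)
{
    std::size_t seen = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        if ((static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) {
            if (seen == index)
                return i;
            ++seen;
        }
    }
    return utf8.size();
}

// Hypothetical helper: equality in the sense the notes describe, i.e. the
// two UTF-16 strings must consist of the exact same code units.
bool sameCodeUnits(std::u16string_view a, std::u16string_view b)
{
    return a == b;
}
```

For the word "naïve", the UTF-8 form `"na\xC3\xAFve"` is 6 bytes long while the UTF-16 form `u"na\u00EFve"` is 5 code units long, so a mixed UTF-8/UTF-16 comparison cannot reject the pair just because the sizes differ. Likewise, `sameCodeUnits(u"\u00E9", u"e\u0301")` is false even though both render as "é": the first is the precomposed (NFC) form, the second the decomposed (NFD) form. In Qt, converting both sides with `QString::normalized()` before comparing would be one way to get normalization-aware equality.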