Rethinking serialization for Qt6

From Qt Wiki
[[Category:QtCS2019]]

Latest revision as of 10:29, 25 November 2019

Arnaud Clère

Serialization is an old problem, yet we keep writing code to serialize C++ data in specific ways again and again. With Qt5, for instance, you may have to write QDebug << to debug it, QDataStream << and >> to marshal it to another Qt application, QSettings code to make it persistent, QJson* or QXml* to convey it on the web, QCbor* for the IoT, and a QAbstractItemModel for the DB/GUI. Even though such code needs to be customized here and there, it is mostly boilerplate. So, can we make this simpler for Qt6?

Indeed, I will present a solution that makes it possible to read and write C++ data from/to any of those APIs by defining a single function, which can easily be customized to specific needs. Its runtime overhead is almost negligible, so I will go on to discuss many data formats, from QDataStream to XML, since that is where the actual performance/safety/interoperability tradeoffs are made. That should trigger an interesting discussion on the tradeoffs made by the only broadly implemented serialization for Qt types: QDataStream.

Finally, I hope to trigger enough interest from the community to review and polish this proposal, and to enable more serialization choices for most Qt6 types.
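
To make the "single function" idea from the abstract concrete, here is a minimal sketch in standard C++. It is not the proposed Qt API: Person, JsonWriter, DebugWriter, bind and serialize are hypothetical names chosen for illustration, and the writers skip details such as string escaping.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical user type.
struct Person {
    std::string name;
    int age;
};

// Hypothetical JSON-like back-end (no string escaping, for brevity).
struct JsonWriter {
    std::ostringstream out;
    void begin() { out << '{'; first = true; }
    void end() { out << '}'; }
    void field(const std::string& key, const std::string& v) {
        separator();
        out << '"' << key << "\":\"" << v << '"';
    }
    void field(const std::string& key, int v) {
        separator();
        out << '"' << key << "\":" << v;
    }
private:
    bool first = true;
    void separator() { if (!first) out << ','; first = false; }
};

// Hypothetical debug back-end exposing the same small interface.
struct DebugWriter {
    std::ostringstream out;
    void begin() { out << '('; }
    void end() { out << " )"; }
    template <class T>
    void field(const std::string& key, const T& v) { out << ' ' << key << '=' << v; }
};

// The single user-written function: it describes Person once,
// independently of the output format.
template <class Writer>
void bind(Writer& w, const Person& p) {
    w.begin();
    w.field("name", p.name);
    w.field("age", p.age);
    w.end();
}

// Convenience helper that runs bind() against a chosen back-end.
template <class Writer>
std::string serialize(const Person& p) {
    Writer w;
    bind(w, p);
    return w.out.str();
}
```

Calling serialize<JsonWriter>(person) and serialize<DebugWriter>(person) reuses the same bind() function; only the back-end changes.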

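[1] Presentation of potential use cases, benchmark, and comparison of various data formats (protobuf, QDataStream, CBOR, JSON, XML): https://gricad-gitlab.univ-grenoble-alpes.fr/modmed/modmedLog/blob/master/tests/QBind/Rethinking%20serialization.pdf

[2] Full proof-of-concept code with examples and benchmark: https://gricad-gitlab.univ-grenoble-alpes.fr/modmed/modmedLog/tree/master/tests/QBind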

Discussion

  • The JSON API has been based on CBOR for a couple of weeks

Yes, and the public API is unchanged: construct a generic QJsonValue (backed by QCborValue), then write it using QJsonDocument.

High write performance requires serializing data without necessarily constructing a generic data structure in memory.

Also, reading is tedious with these APIs.

QValue/QValueStatus, as an interface for (de)serializing data, is more flexible, for example:

value.record().bind(persons) // with recursion into person.children included

It can also use runtime reflection to get information from the meta-object, as with Q_DEFINE_ZAP_WITH_QMETAOBJECT (a single macro below Q_GADGET enables (de)serialization).
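
The fluent record().bind(...) style with recursion into children can be sketched in standard C++ as follows. This is only an illustration under assumptions, not the QValue implementation: the Value and Record classes, their members, and toJson are hypothetical, and the output is JSON-like text written directly to a string without building an intermediate tree.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical user type with nested children.
struct Person {
    std::string name;
    std::vector<Person> children;
};

class Record;

// Hypothetical cursor: a position in the output where one value is expected.
class Value {
public:
    explicit Value(std::string& out) : out_(out) {}
    Record record();  // starts a {...} record
    void bind(const std::string& s) { out_ += '"' + s + '"'; }
    void bind(const Person& p);  // defined below, recurses into children
    template <class T>
    void bind(const std::vector<T>& items) {
        out_ += '[';
        for (std::size_t i = 0; i < items.size(); ++i) {
            if (i) out_ += ',';
            Value(out_).bind(items[i]);
        }
        out_ += ']';
    }
private:
    std::string& out_;
};

// Hypothetical record cursor: each bind() writes one key/value pair
// and returns the record so that calls can be chained.
class Record {
public:
    explicit Record(std::string& out) : out_(out) { out_ += '{'; }
    template <class T>
    Record& bind(const char* key, const T& v) {
        if (!first_) out_ += ',';
        first_ = false;
        out_ += '"';
        out_ += key;
        out_ += "\":";
        Value(out_).bind(v);
        return *this;
    }
    void end() { out_ += '}'; }
private:
    std::string& out_;
    bool first_ = true;
};

inline Record Value::record() { return Record(out_); }

// Binding the children vector calls this function again for each child.
inline void Value::bind(const Person& p) {
    record().bind("name", p.name).bind("children", p.children).end();
}

inline std::string toJson(const Person& p) {
    std::string out;
    Value(out).bind(p);
    return out;
}
```

The data is streamed straight into the output string, which is the point made above about avoiding a generic in-memory structure.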

  • What is the proposal for Qt6?

[3] Patch on GitHub (for high-level design and API review; no tests, so not suitable for Gerrit yet): https://github.com/arnaud-clere/qtbase/compare/63a1a30a014eb75a67c390a16faa9aeb03a4a012...HEAD

Start by giving QCborStreamWriter a value() method that provides the QValue interface.

Good performance and maximum flexibility without sacrificing safety are the goals of this approach.

  • What about performance?

The overhead is indeed already very low.

  • Is this a metadata format?

Mainly data (JSON-like) plus metadata, like, say, in a .proto file, to provide flexibility (QAbstractItemModel tree, table, color, CBOR tags).

  • Does it need to be runtime? What about compile time?

Replacing if (value->mode()) with if constexpr did not show a notable improvement; the overhead is already very low.
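
For readers unfamiliar with the two variants being compared, the following standard C++17 sketch shows a runtime mode test next to an if constexpr version. It makes no performance claim and is not the proof-of-concept code: Mode, RuntimeCursor, StaticCursor, Point and the round-trip helpers are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Mode { Read, Write };

// Runtime variant: one cursor type, the mode is tested on every call.
struct RuntimeCursor {
    Mode mode;
    std::vector<int>& buffer;
    std::size_t pos = 0;
};

inline void bind(RuntimeCursor& c, int& x) {
    if (c.mode == Mode::Write) c.buffer.push_back(x);
    else x = c.buffer.at(c.pos++);
}

// Compile-time variant: the mode is a template parameter, so the
// untaken branch is discarded when the function is instantiated.
template <Mode M>
struct StaticCursor {
    std::vector<int>& buffer;
    std::size_t pos = 0;
};

template <Mode M>
void bind(StaticCursor<M>& c, int& x) {
    if constexpr (M == Mode::Write) c.buffer.push_back(x);
    else x = c.buffer.at(c.pos++);
}

// The user-written function is identical for both variants and for
// both directions (reading and writing).
struct Point { int x; int y; };

template <class Cursor>
void bind(Cursor& c, Point& p) {
    bind(c, p.x);
    bind(c, p.y);
}

inline Point roundTripRuntime(Point in) {
    std::vector<int> buf;
    RuntimeCursor w{Mode::Write, buf};
    bind(w, in);
    RuntimeCursor r{Mode::Read, buf};
    Point out{0, 0};
    bind(r, out);
    return out;
}

inline Point roundTripStatic(Point in) {
    std::vector<int> buf;
    StaticCursor<Mode::Write> w{buf};
    bind(w, in);
    StaticCursor<Mode::Read> r{buf};
    Point out{0, 0};
    bind(r, out);
    return out;
}
```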

  • What about breaking up the patch?

The presentation is about potential use cases of such an API.

The idea is to include this step by step, starting with CBOR + JSON (read + write).

  • Boost serialization has a similar idea in place, what about it? Worth comparing it.

Boost serialization was among the first ones studied.

Its Archive types provide read/write modes similar to QCborStreamWriter/Reader.

It is less flexible when it comes to some data formats, like XML.

Boost does not compare well in publicly available benchmarks.

protobuf is more interesting to compare with.

With QDataStream: code is the schema.

With protobuf: the problem moves to an external file (.proto) that still needs to be managed in a central way.
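
What "code is the schema" means can be illustrated with a small untagged binary stream in standard C++. This is a sketch under assumptions and does not reproduce QDataStream's actual wire format: ByteWriter, ByteReader, writePerson and readPerson are hypothetical. The stream carries no field names or tags, so the reader only works if it performs the same calls in the same order as the writer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Person {
    std::string name;
    std::uint32_t age;
};

// Hypothetical writer: big-endian 32-bit integers, length-prefixed strings.
struct ByteWriter {
    std::vector<std::uint8_t> bytes;
    void put32(std::uint32_t v) {
        for (int shift = 24; shift >= 0; shift -= 8)
            bytes.push_back(static_cast<std::uint8_t>((v >> shift) & 0xff));
    }
    void putString(const std::string& s) {
        put32(static_cast<std::uint32_t>(s.size()));
        bytes.insert(bytes.end(), s.begin(), s.end());
    }
};

// Hypothetical reader mirroring ByteWriter.
struct ByteReader {
    const std::vector<std::uint8_t>& bytes;
    std::size_t pos = 0;
    std::uint32_t get32() {
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i) v = (v << 8) | bytes.at(pos++);
        return v;
    }
    std::string getString() {
        const std::uint32_t n = get32();
        std::string s;
        for (std::uint32_t i = 0; i < n; ++i)
            s.push_back(static_cast<char>(bytes.at(pos++)));
        return s;
    }
};

// The order of these calls is the only "schema" there is.
inline void writePerson(ByteWriter& w, const Person& p) {
    w.putString(p.name);
    w.put32(p.age);
}

// Must mirror writePerson exactly; nothing in the bytes says which
// field comes first or what it is called.
inline Person readPerson(ByteReader& r) {
    Person p;
    p.name = r.getString();
    p.age = r.get32();
    return p;
}
```

Adding or reordering a field in writePerson without changing readPerson in lockstep silently breaks the reader, which is the maintenance burden that an external schema file moves elsewhere rather than removes.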

  • What about the provisions that are provided?
  • There are protocols that handle schema versioning.

Still, you need to update the receiving side with the updated schema, which is not always possible.

In contrast, with CBOR you can always process the data at runtime without a schema (with protobuf you cannot).
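
The reason CBOR data can be processed without a schema is that every item starts with a head byte that states its type (RFC 8949: major type in the top 3 bits, argument in the low 5 bits). The sketch below is a deliberately tiny decoder in standard C++ that renders a few major types as text; describeCbor and its helpers are hypothetical names, and byte strings, tags, floats, indefinite lengths and arguments longer than one byte are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Reads the argument of a CBOR head: values below 24 are stored inline,
// 24 means one following byte. Longer encodings are omitted in this sketch.
inline std::uint64_t readArgument(const std::vector<std::uint8_t>& in,
                                  std::size_t& pos, std::uint8_t info) {
    if (info < 24) return info;
    if (info == 24) return in.at(pos++);
    throw std::runtime_error("argument size not handled in this sketch");
}

// Renders one data item as text, using only what the bytes themselves say.
inline std::string describe(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    const std::uint8_t head = in.at(pos++);
    const std::uint8_t major = static_cast<std::uint8_t>(head >> 5);
    const std::uint8_t info = static_cast<std::uint8_t>(head & 0x1f);
    const std::uint64_t n = readArgument(in, pos, info);
    switch (major) {
    case 0:  // unsigned integer
        return std::to_string(n);
    case 1:  // negative integer, value is -1 - n
        return "-" + std::to_string(n + 1);
    case 3: {  // text string of n bytes
        std::string s = "\"";
        for (std::uint64_t i = 0; i < n; ++i) s += static_cast<char>(in.at(pos++));
        s += '"';
        return s;
    }
    case 4: {  // array of n items
        std::string s = "[";
        for (std::uint64_t i = 0; i < n; ++i) {
            if (i) s += ", ";
            s += describe(in, pos);
        }
        s += ']';
        return s;
    }
    case 5: {  // map of n key/value pairs
        std::string s = "{";
        for (std::uint64_t i = 0; i < n; ++i) {
            if (i) s += ", ";
            s += describe(in, pos);
            s += ": ";
            s += describe(in, pos);
        }
        s += '}';
        return s;
    }
    default:
        throw std::runtime_error("major type not handled in this sketch");
    }
}

inline std::string describeCbor(const std::vector<std::uint8_t>& bytes) {
    std::size_t pos = 0;
    return describe(bytes, pos);
}
```

The decoder never needs to know what the sender intended: the four bytes A1 61 61 01 are recognizable as a one-entry map from the text "a" to the integer 1 purely from their head bytes.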

Overall, there has to be a tradeoff between performance, safety, and convenience.

CBOR seems to me very well placed in the design space.

Action items after the session:

  • Present the ideas to Thiago, who was in another session:

DONE => the API is valuable to Qt users; the API needs to be discussed

  • Thiago and other approvers must provide feedback before Arnaud develops unit tests and pushes anything to gerrit