Rethinking serialization for Qt6
Arnaud Clère
Serialization is an old problem, yet we keep writing code to serialize C++ data in specific ways again and again. With Qt5, for instance, you may have to write QDebug << to debug it, QDataStream << and >> to marshal it to another Qt application, QSettings code to make it persistent, QJson* or QXml* code to convey it on the web, QCbor* code for the IoT, and a QAbstractItemModel for the DB/GUI. Even though such code needs to be customized here and there, it is mostly boilerplate. So, can we make this simpler for Qt6?
Indeed, I will present a solution that makes it possible to read and write C++ data from and to any of those APIs by defining a single function, which can easily be customized to specific needs. Its runtime overhead being almost negligible, I will go on to discuss many data formats, from QDataStream to XML, since that is where the actual performance/safety/interoperability tradeoffs are made. That should trigger an interesting discussion on the tradeoffs made by the only broadly implemented serialization for Qt types: QDataStream.
Finally, I hope to trigger enough interest from the community to review and polish this proposal, and to enable more serialization choices for most Qt6 types.
Notes
[1] Presentation of potential use cases, benchmarks, and a comparison of various data formats (protobuf, QDataStream, CBOR, JSON, XML)
- The JSON API has been based on CBOR for a couple of weeks
* Public API unchanged: construct a generic QJsonValue (backed by QCborValue), then write it using QJsonDocument.
* High write performance requires serializing data without necessarily constructing a generic data structure in memory (almost no memory allocation involved).
- QValue/QValueStatus as an interface for serializing data, using:
value.record(),
.bind() (with recursion included)
- Or use runtime reflection to get the information from the meta-object
Discussion
- What is the proposal for Qt6?
Patch on GitHub (for high-level design and API review; no tests yet, so not suitable for Gerrit) [2]. Start by enabling QCborStreamWriter with a value() method that provides the QValue interface. Flexibility and safety are the basis of this approach.
- What about performance?
The overhead is already very low.
- Is this a metadata format?
Mainly data (JSON-like) plus metadata to provide flexibility (QAbstractItemModel tree, table, color, CBOR tags)
- Does it need to be runtime? What about compile time?
- What if you break up the patch?
* The idea could be to include this step-by-step.
- Boost serialization has a similar idea in place, what about it?
* Worth comparing it.
The Archive type provides a read/write mode similar to QCborStreamWriter/Reader, but offers less flexibility for supporting some data formats, like XML. Boost does not compare well in publicly available benchmarks; protobuf is more interesting to compare with.
QDataStream: the code is the schema. protobuf: moves the problem to an external file (.proto) that still needs to be managed centrally.
- What about the provisions that are provided?
- There are protocols that handle schema versioning.
Still, you need to update the receiving side with the updated schema, which is not always possible. In contrast, with CBOR you can always process the data at runtime without a schema (with protobuf you cannot).
Overall, there has to be a tradeoff between performance, safety, and convenience. CBOR seems very well placed to me in the design space.