Lupdate clang-based c++ parser

From Qt Wiki

Update: The clang-based C++ parser of lupdate was removed

The clang-based C++ parser was removed in Qt 6.10. The removal commit is here: https://codereview.qt-project.org/c/qt/qttools/+/622277

Rationale:

Why was lupdate/clang done?

The C++ language gained new language features that lupdate/classic could not handle. The hand-written C++ "parser" was seen as unmaintainable. Adding new C++ language constructs to lupdate/classic was a task nobody wanted to tackle.

The silver bullet for this was to take an existing C++ parser and "just use it".

What are the problems of lupdate/clang?

Now that we have a full-blown C++ parser on our hands, why should we remove it again? All is fine, isn't it?

lupdate/clang needs setup

Being a full-blown C++ parser means that lupdate/clang cannot skip over uninteresting things. It needs the full picture of the project, including include paths and defines.

The user must pass a compile_commands.json file. CMake can help you with that. For QMake, you need a separate tool. The user must point lupdate/clang to that JSON file. The user might have to specify separate project roots.

Internally, lupdate/clang must retrieve the compiler's default include paths and platform defines, and this sometimes breaks unexpectedly. New CI platforms in particular, e.g. Windows on ARM, suffer from this. The autotests for lupdate/clang have often been the cause of flakiness in the qttools repository.

lupdate/clang is slow

Being a full-blown C++ parser means that we're way slower than the simple lupdate/classic. And people even complained about performance issues of _that_.

lupdate/clang does not see all of the code

Being a full-blown C++ parser, including a real C preprocessor, lupdate/clang can only see the code that is compiled for the current platform.

On the other hand, lupdate/classic can extract all these translatable strings:

  #if defined(Q_OS_WIN) 
      show(tr("I'm on Windows!")); 
  #elif defined(Q_OS_LINUX) 
      show(tr("I'm on Linux!")); 
  #else 
      show(tr("I'm on some other OS!")); 
  #endif 

This is an unfixable behavior difference.

The only thing we can do is to document this behavior and fix Qt and Qt Creator to move all translatable strings out of platform-specific code paths. But personally, I have the feeling that this is a no-go for an internationalization tool.

lupdate/clang is not used by anyone

Because of the above points, lupdate/clang has no users outside of our CI to our knowledge.

Do we actually need a full-blown C++ parser in lupdate?

To answer this question, we must understand what the C++ parser is used for: retrieving the context of the translatable string.

In the following code, lupdate extracts the string "Hi there!" and determines that it's in the context PreferencesDialog.

  class PreferencesDialog : public QDialog { 
  public: 
      PreferencesDialog(QWidget *parent = nullptr) 
          : QDialog(parent) 
      {} 
 
      void sayHi(); 
  }; 
 
  void PreferencesDialog::sayHi() 
  { 
      tr("Hi there!"); 
  }

That's it. Should lupdate ever fail to recognize the context, we can work around the issue like this:

  void PreferencesDialog::sayHi()
  { 
      PreferencesDialog::tr("Hi there!");    // or QCoreApplication::translate("PreferencesDialog", "Hi there!"); 
  } 

What seems like a nice idea at first will kick you when refactoring your code base. Let's say someone renames PreferencesDialog to SettingsDialog. All the translatable strings in that class have a different context now, creating more work for translators.

Projects like Qt Creator forcefully set the context for large parts of their code bases to avoid such issues.

My conclusion here is: at a certain project size, people will set the context per directory, library, plugin, or other project unit. Automatic context extraction is then all for nothing, meaning that having a full-blown C++ parser is not that important for lupdate.

Another point in favor of this claim: we're offering id-based translation, documented as being "industrial strength". This also doesn't need context extraction at all.

Is lupdate/classic actually unmaintainable?

I don't think so. The following list of fixed issues should be proof for that:

  • QTBUG-110949
  • QTBUG-59802
  • QTBUG-36589
  • QTBUG-99415
  • QTBUG-91521
  • QTBUG-53326

Conclusion

lupdate/clang has been an interesting experiment, and it's nice to see how far it came. However, given all of the above, we should remove it.

Introduction

lupdate is part of the qttools/linguist module. It parses the code in order to retrieve the text to be translated. It has a custom C++ parser, with its limitations. In order to overcome those limitations and modernize the parser, a new clang-based C++ parser was added to lupdate. The new parser is available as an option, accessible with -clang-parser.

The new clang-based lupdate C++ parser is based on clang tooling. Using clang tooling instead of libclang gives access to the full AST and allows the construction of a very customized parser.

This page documents some of the steps taken to implement this parser.

Clang-environment and Qt configuration

In order to add clang-related code to lupdate, clang needs to be available in the environment. QDoc already checks for clang availability at the Qt configuration level, in configure.pri and configure.json. Those files were originally located in qttools/src/qdoc.

Changes: Those files have been placed somewhere accessible to both qdoc and lupdate. Since the lupdate clang-based C++ parser needs other libraries than qdoc, dedicated lupdate checks have been added.

Note: if the clang environment is not suitable for the new lupdate clang-based parser, nothing breaks; the new parser option is simply not available. Nothing changes for qdoc.

How to make clang-environment checks available for both qdoc and lupdate?

Simply moving the configure.json and configure.pri files upstream, to the qttools level, does NOT work, because the configuration headers don't end up in a place that can be accessed from anywhere.

Solution: Creating a "header only" module:

  • name = tools
  • location = qttools/src/global

The configure.json and configure.pri are now there. A "header only" module is a module that only provides header files. In our case: qttools-config.h and qttools-config_p.h. Those files are created at configuration time and placed in builtDIR/qttools/src/global/

To have those files available in a reliable manner at compilation time, add "QtTools" => "$basedir/src/global" in qttools/sync.profile, which creates:

  • builtDir/qtbase/include/QtTools/qttools-config.h
  • builtDir/qtbase/include/QtTools/QT_VERSION/QtTools/private/qttools-config_p.h

Now we can access those headers from the code using #include <QtTools/private/qttools-config_p.h>

lupdate clang-based parser special needs

The new lupdate clang-based parser is based on clang tooling, which requires the clang-c++ and llvm libraries. It also requires clang version 7.0.0 or higher; otherwise the llvm/Support/InitLLVM.h header is missing. So, unlike qdoc, lupdate needs the clang-c++ and llvm libraries. They are not part of libclang, which contains only the clang-c related libraries.

For qdoc, the feature clang, which checks for the clang environment (directory, version, and libclang aka libclang-c), needs to be ON. For lupdate, a new feature clang_cpp, which checks for the additional clang libraries (clang c++ and llvm libraries), needs to be ON. This new feature is added in configure.json. It is ON if:

  • the clang feature is ON, technically if the libclang-test associated with the clang feature is passed
  • the variable 'has_clangcpp' is equal to true. This variable is set within the libclang-test.

This new clang_cpp feature does not rely on its own test, but on the already existing libclang-test, which is extended for the needs of the lupdate clang-based parser without modifying its behavior for qdoc.

LLVM/clang libraries availability

Unlike libclang (the C-related clang libraries needed for qdoc), the availability of the clang c++ and llvm libraries changes from one build to another and from one pre-built package to another. Having a valid clang environment for qdoc does not imply having one for the new lupdate clang-based parser. Let's try to give an overview of the situation.

When building LLVM, the clang-c++ and llvm libraries are by default installed as static libraries. Only since May-June 2019 does an option exist to export the clang-c++ libraries as shared, using CLANG_LINK_CLANG_DYLIB=ON (https://reviews.llvm.org/D61909).

When installing an official pre-built LLVM package (http://releases.llvm.org/download.html), clang-c++ libraries are either:

  • not available (Windows)
  • available as static (all the rest)

There is also the possibility for the user to install a pre-built llvm package from http://download.qt.io/development_releases/prebuilt/libclang/qt. In those packages, all the clang libraries are available as static for all platforms.

To summarize

In configure.pri, to fit the needs of the new lupdate clang-based parser, we need to check for the existence of the clang c++ and llvm libraries, as static or dynamic libraries. If none are present, the feature clang_cpp needs to be turned off. Additionally, one needs to check that the clang version is at least 8.0.0 in order to have the proper headers available.

Basic skeleton of the C++ parser clang tool, or how to build a clang tool

This section summarizes the basic things needed to build a clang tool.