QtCS2024 AI tooling for Qt developers

Session Summary

Qt already has a network of bots that augment development workflows: Cherry-Pick Bot, Submodule Update Bot, Flake8 Bot for Python, and so on.

With the rise of LLMs, a couple of bots in this vein have been implemented with fairly good success, even considering their limitations.

API Header Review Bot - Identifies changes to public headers, summarizes them, and flags the change for review before the next release
- Uses GPT-4 for analysis. Results are generally good, but in its current state the inputs are not comprehensive and do not represent a full "API change" that spans multiple change reviews.
- Useful enough to at least flag changes.
CI Failure Analysis Bot - Analyzes the failure log, test sources, and change diff to determine whether the change caused the failure. May suggest fixes if they are obvious.
- Very good results during a proof-of-concept trial run in the Qt Company bugfix sprint, H2 2024.
- Estimated accuracy of at least 90% in deciding whether a change caused the CI failure, based on manual sampling and review of outputs.
- Can identify infrastructure issues as the cause of failure.
- Can identify flaky tests as the cause of failure.
- Limited to a 128k context window, which covers all but the largest changes.
- Limited to analyzing atomic changes; it cannot take in a full relation chain or topic.
  - This sometimes results in multiple changes being blamed for a failure, with ambiguous analysis results, but even so the bot is usually correct that the changes are related.
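
The context limit noted above can be illustrated with a minimal sketch. This is not the bot's actual code: the function names, the 4-characters-per-token estimate, and the reply headroom are all assumptions made for illustration. The idea is to keep the diff and test sources whole, skip changes that are too large, and trim the failure log to its tail (where failures usually appear) so the request fits in the window.

```python
CONTEXT_TOKENS = 128_000  # context window size mentioned in the notes
CHARS_PER_TOKEN = 4       # rough assumption, not a real tokenizer
REPLY_RESERVE = 4_000     # assumed headroom left for the model's answer


def estimate_tokens(text: str) -> int:
    """Crude token estimate: ceiling of characters / CHARS_PER_TOKEN."""
    return -(-len(text) // CHARS_PER_TOKEN)


def build_inputs(diff, test_source, failure_log,
                 budget=CONTEXT_TOKENS - REPLY_RESERVE):
    """Return the inputs for one analysis request, or None if they cannot fit.

    The diff and test sources are never truncated; only the failure log
    is cut down, keeping its last characters.
    """
    fixed = estimate_tokens(diff) + estimate_tokens(test_source)
    if fixed >= budget:
        return None  # change too large to analyze in a single request
    remaining_chars = (budget - fixed) * CHARS_PER_TOKEN
    return {
        "diff": diff,
        "test_source": test_source,
        "failure_log": failure_log[-remaining_chars:],  # keep the tail
    }
```

A real implementation would count tokens with the model's own tokenizer rather than a character ratio; the sketch only shows why "all but the largest changes" fit.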

Daniel has been working with LLMs and the Qt review systems.

A primer on Large Language Models, and Daniel's findings and learnings.

Use Case 1: API Change Identification: In production, the bot will ask you to give feedback on a particular bug ticket.
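
As a rough illustration of the first step (spotting that a change touches public headers), here is a hedged sketch. It assumes the common Qt convention that private headers end in `_p.h`, and it treats any other `.h` file under `src/` outside `3rdparty` as public. The function names and example paths are hypothetical, and the real bot may use different rules.

```python
from pathlib import PurePosixPath


def is_public_header(path: str) -> bool:
    """Heuristic (assumption): a .h file under src/ that is neither
    private (*_p.h) nor bundled third-party code."""
    p = PurePosixPath(path)
    if p.suffix != ".h" or p.name.endswith("_p.h"):
        return False
    parts = p.parts
    return bool(parts) and parts[0] == "src" and "3rdparty" not in parts


def public_headers_touched(changed_files):
    """Return the sorted subset of changed files that look like public headers."""
    return sorted(f for f in changed_files if is_public_header(f))


# Illustrative file list; these paths are made up for the example.
changed = [
    "src/corelib/example/qexample.h",
    "src/corelib/example/qexample_p.h",
    "src/3rdparty/somelib/somelib.h",
    "tests/auto/example/helper.h",
    "src/gui/example/qexample.cpp",
]
flagged = public_headers_touched(changed)
```

Only changes with a non-empty `flagged` list would then be passed on for summarization and review.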

Use Case 2: CI Failure Analysis (work in progress)