QtCS2024 AI tooling for Qt developers

From Qt Wiki
Revision as of 13:23, 5 September 2024

Session Summary

Qt already has a network of bots that augment development workflows: Cherry-Pick Bot, Submodule Update Bot, Flake8 Bot for Python, and so on.

With the rise of LLMs, a couple of bots in this vein have been implemented with good results, even given their limitations.

  • API Header Review Bot - Identifies changes to public headers, summarizes them, and flags the change for review before the next release.
    • Uses GPT-4 for analysis. Results are generally good, but in its current state the inputs are not comprehensive and do not capture a full "API change" that spans multiple change reviews.
    • Useful enough to at least flag changes.
  • CI Failure Analysis Bot - Analyzes the failure log, test sources, and change diff to determine whether the change caused the failure. May suggest a fix if it is obvious.
    • Very good results during a proof-of-concept trial run in the Qt Company bugfix sprint, H2 2024.
    • Estimated accuracy of at least 90% in deciding whether a change caused the CI failure, based on manual sampling and review of outputs.
    • Identifies infrastructure issues as the cause of failure.
    • Identifies flaky tests as the cause of failure.
    • Limited to a 128k context, which covers all but the largest changes.
    • Limited to analyzing atomic changes; it cannot take in a full relation chain or topic.
      • This sometimes results in blaming multiple changes for a failure, with ambiguous analysis results, but even then the bot is usually correct that the changes are related.
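
To make the 128k context limitation of the CI Failure Analysis Bot more concrete, here is a minimal sketch of how the three inputs named above (failure log, test sources, change diff) might be packed into one prompt under a fixed budget. This is not the bot's actual implementation. The names `AnalysisInput`, `estimate_tokens`, and `build_prompt` are hypothetical, as are the prompt wording and the reply reserve. It also assumes the 128k figure counts tokens, and uses a crude four-characters-per-token estimate rather than a real tokenizer.

```python
from dataclasses import dataclass

CONTEXT_LIMIT_TOKENS = 128_000  # the 128k context limit from the notes (assumed to be tokens)
RESERVED_FOR_REPLY = 4_000      # assumption: room left for the model's answer
CHARS_PER_TOKEN = 4             # assumption: crude size estimate, not a real tokenizer


@dataclass
class AnalysisInput:
    """Hypothetical container for the three inputs named in the notes."""
    change_diff: str
    test_sources: str
    failure_log: str


def estimate_tokens(text: str) -> int:
    """Rough token estimate (rounded up); a real bot would use the model's tokenizer."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def build_prompt(inputs: AnalysisInput, limit: int = CONTEXT_LIMIT_TOKENS):
    """Return a prompt that fits the budget, or None if the change is too large.

    The diff and test sources are kept whole. Only the failure log is trimmed,
    keeping its tail, on the assumption that the failing step is near the end.
    """
    fixed = (
        "Did this change cause the CI failure? "
        "Answer: change, flaky test, or infrastructure.\n"
        "## Change diff\n" + inputs.change_diff + "\n"
        "## Test sources\n" + inputs.test_sources + "\n"
        "## Failure log (tail)\n"
    )
    budget = limit - RESERVED_FOR_REPLY - estimate_tokens(fixed)
    if budget <= 0:
        return None  # diff and test sources alone exceed the context budget
    max_log_chars = budget * CHARS_PER_TOKEN
    return fixed + inputs.failure_log[-max_log_chars:]
```

In this sketch, a change whose diff and test sources alone exceed the budget is rejected rather than silently truncated, which is one way to handle "all but the largest changes". The choice to trim only the log is a design assumption; a bot could equally summarize the log or drop unrelated test files.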

Session Owners

Daniel Smith

Notes