QtCS2024 AI tooling for Qt developers

Session Summary

Qt already has a network of bots that augment development workflows: the Cherry-Pick Bot, the Submodule Update Bot, the Flake8 Bot for Python, and so on.

With the rise of LLMs, a couple of bots in this vein have been implemented with good success, even considering their current limitations.

  • API Header Review Bot - Identifies changes to public headers, summarizes them, and flags the change for review before the next release
    • Uses GPT-4 for analysis. Results are generally good, but in the current state the inputs are not comprehensive and do not represent the full "API change" across multiple change reviews.
    • Useful enough to at least flag changes.
  • CI Failure Analysis Bot - Analyzes the failure log, test sources, and change diff to determine whether the change caused the failure. May suggest fixes if obvious. (A sketch of the prompt assembly follows this list.)
    • Very good results during a proof-of-concept trial run in the Qt Company H2 2024 bugfix sprint.
    • Estimated accuracy of at least 90% in judging whether a change caused the CI failure, based on manual sampling and review of outputs.
    • Identifies infrastructure issues as the cause of failure.
    • Identifies flaky tests as the cause of failure.
    • Limited to a 128k-token context, which covers all but the largest changes.
    • Limited to analyzing atomic changes; it cannot take in a full relation chain or topic.
      • This sometimes results in blaming multiple changes for the failure, with ambiguous analysis results, but even so the bot is usually correct that the changes are related.
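
The bot's implementation was not published, but the input assembly described above can be sketched. The following Python is illustrative only; the function name, character budget, and truncation strategy are assumptions, not the actual bot code:

MAX_CHARS = 120_000 * 4  # rough budget: ~4 characters per token under a 128k-token limit

def build_prompt(failure_log: str, test_source: str, change_diff: str) -> str:
    # Truncate the noisiest input (the log) first, keeping its tail, so the
    # diff and test source, which carry most of the causal signal, survive.
    budget = MAX_CHARS - len(test_source) - len(change_diff)
    log_tail = failure_log[-budget:] if budget > 0 else ""
    return (
        "You are analyzing a CI failure for a Qt change.\n"
        "Decide whether the change caused the failure, or whether an\n"
        "infrastructure issue or a flaky test is to blame. Suggest a fix\n"
        "only if one is obvious.\n\n"
        f"=== Failure log (tail) ===\n{log_tail}\n\n"
        f"=== Test source ===\n{test_source}\n\n"
        f"=== Change diff ===\n{change_diff}\n"
    )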

Session Owners

Daniel Smith

Notes

Daniel has been working with LLMs and the Qt review systems.

A primer on Large Language Models and Daniel's findings and learnings.

Use Case 1: API Change Identification. In production, the bot will ask you to give feedback on a particular bug ticket.
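
The identification step can be illustrated using the Qt convention that private headers end in "_p.h". The helper and example paths below are hypothetical, not the bot's code:

def public_header_changes(changed_files: list[str]) -> list[str]:
    # Qt convention: headers ending in "_p.h" are private and not public API.
    return [
        path for path in changed_files
        if path.endswith(".h") and not path.endswith("_p.h")
    ]

print(public_header_changes([
    "src/corelib/kernel/qobject.h",    # public header -> flagged for review
    "src/corelib/kernel/qobject_p.h",  # private header -> ignored
    "tests/auto/tst_qobject.cpp",      # not a header -> ignored
]))
# prints: ['src/corelib/kernel/qobject.h']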

Use Case 2: CI Failure Analysis (work in progress).

Prompt engineering: leveraging JSON-structured output to get the LLM to provide useful analysis (see the sketch below).
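
A minimal sketch of that idea with the OpenAI Python client; the schema fields and model name are illustrative, not the bot's actual schema:

import json
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "Analyze the CI failure report from the user. Reply with JSON only, "
    'matching this shape: {"change_caused_failure": true|false, '
    '"confidence": "low"|"medium"|"high", "explanation": "...", '
    '"suggested_fix": "..." or null}'
)

def analyze(failure_context: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4-turbo",                      # assumed; any JSON-mode model works
        response_format={"type": "json_object"},  # constrains output to valid JSON
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": failure_context},
        ],
    )
    # A parse failure here is itself a useful signal to retry or skip.
    return json.loads(resp.choices[0].message.content)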

Looking for ways to provide feedback on the ticket.

Looking for new ideas; any solution must be non-intrusive and must not have a high false-positive rate.

Q&A and Discussion

A RAG database would help provide better contextual knowledge; a sketch of the retrieval idea follows.
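
For example, past analyses and known flaky-test or infrastructure reports could be embedded, with the closest matches prepended to the prompt. A rough sketch; the corpus entries, model choice, and function names are assumptions:

import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in resp.data])

# Hypothetical knowledge base of previously diagnosed failures.
corpus = [
    "tst_qnetworkreply is known to be flaky on macOS under load",
    "provisioning timeouts on Windows nodes are an infrastructure issue",
]
corpus_vecs = embed(corpus)

def retrieve(query: str, k: int = 2) -> list[str]:
    # Cosine similarity between the query and every stored entry.
    q = embed([query])[0]
    scores = corpus_vecs @ q / (np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(q))
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]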

Discussion about training the models with feedback (using CI failure analysis as the example); fine-tuning has not shown much promise so far.

How to minimize hallucinations, etc.

Slide deck from the session