QtCS2024 AI tooling for Qt developers
		
		
		
		
		
		
				
		
		
	
Qt Contributor Summit 2024 Session: AI tooling for Qt developers
Session Owners
Daniel Smith (daniel.smith@qt.io)
Revision as of 13:23, 5 September 2024
Session Summary
Qt already has a network of bots that augment development workflows: Cherry-Pick Bot, Submodule Update Bot, Flake8 Bot for Python, and so on.
With the rise of LLMs, a couple of bots in this vein have been implemented with fairly good success, even considering their limitations.
- API Header Review Bot: identifies changes to public headers, summarizes them, and flags the change for review before the next release.
  - Uses GPT-4 for analysis. Results are generally good, but in its current state the inputs are not comprehensive and do not represent a full "API change" spanning multiple change reviews.
    - Useful enough to at least flag changes.
- CI Failure Analysis Bot: analyzes the failure log, test sources, and change diff to determine whether the change caused the failure. May suggest fixes if they are obvious.
  - Very good results during a proof-of-concept trial run in the Qt Company bugfix sprint, H2 2024.
    - At least 90% accuracy (a rough estimate) in judging whether a change caused the CI failure, based on manual sampling and review of outputs.
    - Identifies infrastructure issues as the cause of failure.
    - Identifies flaky tests as the cause of failure.
  - Limited to a 128k context, which covers all but the largest changes.
  - Limited to analyzing atomic changes; it cannot take in a full relation chain or topic.
    - This sometimes leads to multiple changes being blamed for a failure, with ambiguous analysis results; even so, the bot is usually correct that the changes are related.
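
To make the 128k context limitation concrete, here is a minimal Python sketch of how the three inputs named above (failure log, test sources, change diff) might be checked against a context budget before analysis. It is not the bot's actual code. The names `AnalysisInput`, `fits_in_context`, and `build_prompt`, the 4-characters-per-token estimate, the reading of "128k" as tokens, and the reserved reply budget are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumption: "128k context" is read as 128,000 tokens.
CONTEXT_LIMIT_TOKENS = 128_000
# Assumption: rough average of 4 characters per token; a real bot would
# use the model's own tokenizer instead of this estimate.
CHARS_PER_TOKEN = 4
# Hypothetical budget kept free for instructions and the model's reply.
RESERVED_TOKENS = 8_000


@dataclass
class AnalysisInput:
    """The three inputs the notes list for CI failure analysis."""
    failure_log: str
    test_sources: str
    change_diff: str


def estimate_tokens(text: str) -> int:
    """Estimate the token count by rounding up len(text) / CHARS_PER_TOKEN."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def fits_in_context(inputs: AnalysisInput) -> bool:
    """Return True if all inputs together fit in the remaining budget."""
    parts = (inputs.failure_log, inputs.test_sources, inputs.change_diff)
    total = sum(estimate_tokens(part) for part in parts)
    return total <= CONTEXT_LIMIT_TOKENS - RESERVED_TOKENS


def build_prompt(inputs: AnalysisInput) -> str:
    """Join the inputs into one labeled prompt, rejecting oversized changes."""
    if not fits_in_context(inputs):
        raise ValueError("change is too large for the context window")
    sections = [
        ("Failure log", inputs.failure_log),
        ("Test sources", inputs.test_sources),
        ("Change diff", inputs.change_diff),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)
```

In this sketch, oversized changes are rejected rather than truncated, which mirrors the note that the limit covers all but the largest changes. A real bot could instead trim the log or the test sources.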