[HN Gopher] LLM and Bug Finding: Insights from a $2M Winning Team
in the White House's AIxCC
___________________________________________________________________
LLM and Bug Finding: Insights from a $2M Winning Team in the White
House's AIxCC
Author : garlic_chives
Score : 20 points
Date : 2024-08-16 19:56 UTC (3 hours ago)
(HTM) web link (team-atlanta.github.io)
(TXT) w3m dump (team-atlanta.github.io)
| garlic_chives wrote:
| AIxCC is the AI Cyber Challenge, launched by DARPA in
| collaboration with ARPA-H.
|
| Notably, a zero-day vulnerability in SQLite3 was discovered and
| patched during the AIxCC semifinals, demonstrating the potential
| of LLM-based approaches in bug finding.
| hqzhao wrote:
| I'm part of the team, and we used LLM agents extensively for
| smart bug finding and patching. I'm happy to discuss some
| insights and to share all of our approaches after the grand
| final :)
| doctorpangloss wrote:
| Everyone thinks bug bounties should be higher. How high should
| they be? Who should pay for them?
| hqzhao wrote:
| It really depends on the target and the quality of the
| vulnerability. For example, low-quality software on GitHub
| might not warrant high bug bounties, and that's
| understandable. However, critical components like KVM, ESXi,
| WebKit, etc., need to be taken much more seriously.
|
| For vendor-specific software, the responsibility to pay
| should fall on the vendor. When it comes to open-source
| software, a foundation funded by the vendors who rely on it
| for core productivity would be ideal.
|
| For high-quality vulnerabilities, especially those that can
| demonstrate exploitability without any prerequisites (e.g.,
| zero-click remote jailbreaks), the bounties should be on par
| with those offered at competitions like Pwn2Own. :)
| simonw wrote:
| What kind of LLM agents did you use?
| hqzhao wrote:
| We've built several agents on top of popular pre-trained
| models such as GPT-4, Claude Sonnet, and Gemini 1.5,
| designed to mimic the behaviors and habits of the experts
| on our team.
|
| Our idea is straightforward: after a decade of auditing code
| and writing exploits, we've accumulated a wealth of
| experience. So, why not teach these agents to replicate what
| we do during bug hunting and exploit writing? Of course, the
| LLMs themselves aren't sufficient on their own, so we've
| integrated various program analysis techniques to augment the
| models and help the agents understand more complex and
| esoteric code.
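A minimal sketch of the combination described above: a cheap static-analysis triage pass narrows a codebase down to suspicious call sites, and those candidates are framed as an audit task for an LLM agent. Every name here (`static_triage`, `build_audit_prompt`, the risky-call list) is an illustrative assumption, not Team Atlanta's actual pipeline, and the model call itself is deliberately left out.

```python
import re

# Hypothetical sketch: pair a trivial static pass with an LLM "auditor".
# The static pass finds classically risky C calls; the agent would then
# reason about each candidate the way a human code auditor does.

RISKY_CALLS = ("strcpy", "memcpy", "sprintf", "gets")

def static_triage(source: str) -> list[str]:
    """Return source lines containing classically risky C calls."""
    hits = []
    for line in source.splitlines():
        if any(re.search(rf"\b{call}\s*\(", line) for call in RISKY_CALLS):
            hits.append(line.strip())
    return hits

def build_audit_prompt(candidates: list[str]) -> str:
    """Frame the triaged call sites as an audit task for an LLM agent.
    (Actually sending this to GPT-4/Claude/Gemini is out of scope.)"""
    listing = "\n".join(f"- {c}" for c in candidates)
    return (
        "You are an expert vulnerability auditor. For each call site, "
        "decide whether the destination buffer can overflow:\n" + listing
    )

demo = """
void greet(char *name) {
    char buf[16];
    strcpy(buf, name);   /* no bounds check */
    printf("hi %s\\n", buf);
}
"""

candidates = static_triage(demo)
print(candidates)  # only the strcpy line is flagged
print(build_audit_prompt(candidates))
```

The point of the split is the one hqzhao makes: the LLM alone is not sufficient, so program analysis (even something far richer than this regex stand-in, e.g. taint tracking or call-graph slicing) supplies the context that lets the agent handle complex, esoteric code.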
| adragos wrote:
| Hey, congrats on getting to the finals of AIxCC!
|
| Have you tested your CRS on weekend CTFs? I'm curious how well
| it'd be able to perform compared to other teams.
| hqzhao wrote:
| Thanks!
|
| We haven't tested it yet. Regarding CTFs, I have some
| experience. I'm a member of the Tea Deliverers CTF team, and
| I participated in the DARPA CGC CTF back in 2016 with team
| b1o0p.
|
| There are a few issues that make it challenging to directly
| apply our AIxCC approaches to CTF challenges:
|
| 1. *Format Compatibility:* This year's DEFCON CTF finals
| didn't follow a uniform format. The challenges were complex
| and involved formats like a Lua VM running on a custom
| Verilog simulator. Our system, however, is designed for
| source code repositories like Git repos.
|
| 2. *Binary vs. Source Code:* CTFs are heavily binary-
| oriented, whereas AIxCC is focused on source code. In CTFs,
| reverse engineering binaries is often required, but our
| system isn't equipped to handle that yet. We are, however,
| interested in supporting binary analysis in the future!
___________________________________________________________________
(page generated 2024-08-16 23:00 UTC)