[HN Gopher] LLM and Bug Finding: Insights from a $2M Winning Tea...
       ___________________________________________________________________
        
       LLM and Bug Finding: Insights from a $2M Winning Team in the White
       House's AIxCC
        
       Author : garlic_chives
       Score  : 20 points
       Date   : 2024-08-16 19:56 UTC (3 hours ago)
        
 (HTM) web link (team-atlanta.github.io)
        
       | garlic_chives wrote:
       | AIxCC is an AI Cyber Challenge launched by DARPA and ARPA-H.
       | 
       | Notably, a zero-day vulnerability in SQLite3 was discovered and
       | patched during the AIxCC semifinals, demonstrating the potential
       | of LLM-based approaches in bug finding.
        
       | hqzhao wrote:
       | I'm part of the team, and we used LLM agents extensively for
       | smart bug finding and patching. I'm happy to discuss some
       | insights, and share all of the approaches after the grand final :)
        
         | doctorpangloss wrote:
         | Everyone thinks bug bounties should be higher. How high should
         | they be? Who should pay for them?
        
           | hqzhao wrote:
           | It really depends on the target and the quality of the
           | vulnerability. For example, low-quality software on GitHub
           | might not warrant high bug bounties, and that's
           | understandable. However, critical components like KVM, ESXi,
           | WebKit, etc., need to be taken much more seriously.
           | 
           | For vendor-specific software, the responsibility to pay
           | should fall on the vendor. When it comes to open-source
           | software, a foundation funded by the vendors who rely on it
           | for core productivity would be ideal.
           | 
           | For high-quality vulnerabilities, especially those that can
           | demonstrate exploitability without any prerequisites (e.g.,
           | zero-click remote jailbreaks), the bounties should be on par
           | with those offered at competitions like Pwn2Own. :)
        
         | simonw wrote:
         | What kind of LLM agents did you use?
        
           | hqzhao wrote:
           | Based on popular pre-trained models like GPT-4, Claude
           | Sonnet, and Gemini 1.5, we've built several agents designed
           | to mimic the behaviors and habits of the experts on our team.
           | 
           | Our idea is straightforward: after a decade of auditing code
           | and writing exploits, we've accumulated a wealth of
           | experience. So, why not teach these agents to replicate what
           | we do during bug hunting and exploit writing? Of course, the
           | LLMs themselves aren't sufficient on their own, so we've
           | integrated various program analysis techniques to augment the
           | models and help the agents understand more complex and
           | esoteric code.
        
         | adragos wrote:
         | Hey, congrats on getting to the finals of AIxCC!
         | 
         | Have you tested your CRS on weekend CTFs? I'm curious how well
         | it'd be able to perform compared to other teams.
        
           | hqzhao wrote:
           | Thanks!
           | 
           | We haven't tested it yet. Regarding CTFs, I have some
           | experience. I'm a member of the Tea Deliverers CTF team, and
           | I participated in the DARPA CGC CTF back in 2016 with team
           | b1o0p.
           | 
           | There are a few issues that make it challenging to directly
           | apply our AIxCC approaches to CTF challenges:
           | 
           | 1. *Format Compatibility:* This year's DEFCON CTF finals
           | didn't follow a uniform format. The challenges were complex
           | and involved formats like a Lua VM running on a custom
           | Verilog simulator. Our system, however, is designed for
           | source code repositories like Git repos.
           | 
           | 2. *Binary vs. Source Code:* CTFs are heavily binary-
           | oriented, whereas AIxCC is focused on source code. In CTFs,
           | reverse engineering binaries is often required, but our
           | system isn't equipped to handle that yet. We are, however,
           | interested in supporting binary analysis in the future!
        
       ___________________________________________________________________
       (page generated 2024-08-16 23:00 UTC)