[HN Gopher] LLM and Bug Finding: Insights from a $2M Winning Tea...
___________________________________________________________________
LLM and Bug Finding: Insights from a $2M Winning Team in the White
House's AIxCC
Author : garlic_chives
Score : 154 points
Date : 2024-08-16 19:56 UTC (1 day ago)
(HTM) web link (team-atlanta.github.io)
(TXT) w3m dump (team-atlanta.github.io)
| garlic_chives wrote:
| AIxCC is an AI Cyber Challenge launched by DARPA and ARPA-H.
|
| Notably, a zero-day vulnerability in SQLite3 was discovered and
| patched during the AIxCC semifinals, demonstrating the potential
| of LLM-based approaches in bug finding.
| rfoo wrote:
| Notably, an undiscovered trivial NULL pointer dereference in
| SQLite3's SQL parser was discovered and patched. But yeah, it
| makes very good marketing material.
| hqzhao wrote:
| It's not a critical issue, but it was surprising since we
| didn't know that SQLite3 would be one of the challenges
| before the competition.
| hypeatei wrote:
| Are there any write-ups or CVE pages on that vulnerability? From
| a quick search, I can't find anything.
| hqzhao wrote:
| I'm part of the team, and we used LLM agents extensively for
| smart bug finding and patching. I'm happy to discuss some
| insights, and share all of the approaches after the grand final :)
| doctorpangloss wrote:
| Everyone thinks bug bounties should be higher. How high should
| they be? Who should pay for them?
| hqzhao wrote:
| It really depends on the target and the quality of the
| vulnerability. For example, low-quality software on GitHub
| might not warrant high bug bounties, and that's
| understandable. However, critical components like KVM, ESXi,
| WebKit, etc., need to be taken much more seriously.
|
| For vendor-specific software, the responsibility to pay
| should fall on the vendor. When it comes to open-source
| software, a foundation funded by the vendors who rely on it
| for core productivity would be ideal.
|
| For high-quality vulnerabilities, especially those that can
| demonstrate exploitability without any prerequisites (e.g.,
| zero-click remote jailbreaks), the bounties should be on par
| with those offered at competitions like Pwn2Own. :)
| tptacek wrote:
| Google and Apple bounties on zero-click remotes exceed the
| prize amounts I see from Pwn2Own?
| doctorpangloss wrote:
| It seems really hard for people to like, name some
| vulnerabilities, name some prices. I'm glad you are playing
| along. Which scenario makes more sense:
| The Punchline: Microsoft pays $10m for vulnerabilities like
| the kind used to exploit SolarWinds and the Azure token
| audience vulnerability.
|
| The Status Quo: Thousands of people pay CrowdStrike a total of
| billions of dollars, in exchange for urgent patching when
| vulnerabilities become known.
|
| Okay, do you see what I am getting at? On the one hand, if
| you pay bug bounties, the bugs get fixed, and they sure
| _seem_ expensive. But if you look into how much money is
| spent on valueless security theatre, it is a total drop in
| the bucket. But CrowdStrike hires security researchers!
|
| So what should the prices really be? For which
| vulnerabilities? The SolarWinds issue is probably worth
| more than $10m, if people are willing to pay 100x more to
| CrowdStrike for nothing.
| saagarjha wrote:
| The real question here is who is willing to pay $10
| million for such a bug.
| tptacek wrote:
| Nobody. That far exceeds the current market prices of the
| most in-demand bugs.
| doctorpangloss wrote:
| What is this market you speak of? Can you link me to it
| and show me the prices you are talking about? The
| Microsoft key vulnerability leaked all the State
| Department emails, and probably a lot more. It could have
| been used to compromise a lot of Azure. What is
| comparable?
| necovek wrote:
| It's not that simple: those billions of dollars are not
| just for this particular issue, or even just for security
| support.
|
| It's also a difference between keeping a software
| engineer on staff and hiring a contractor as needed. One
| is cheaper for the company even if the hourly rate is
| higher.
|
| The better question is how we can improve the overall
| security of the software we write, which this article is
| more focused on. But we understand that there will be
| bugs, and security bugs even, no matter how hard we try.
|
| Even DJB (of qmail fame) and Knuth (of TeX and TAOCP
| fame) pay out bug bounties, and they heavily focus on
| software correctness over large feature sets.
| logical_person wrote:
| p2o is pathetically low in comparison to other markets. is
| your experience limited to legitimate bug bounty programs
| like that?
| 77pt77 wrote:
| > KVM, ESXi, WebKit, etc., need to be taken much more
| seriously.
|
| Openssl
| tptacek wrote:
| Who thinks bug bounties should be higher? Why? Everybody
| definitely _does not_ think this.
| vasco wrote:
| There are always two or three people in every thread
| repeating the same thing without any understanding of
| marketplace dynamics. If you ask them how much it should be,
| you also get wild answers that don't reflect reality.
| simonw wrote:
| What kind of LLM agents did you use?
| hqzhao wrote:
| Based on popular pre-trained models like GPT-4, Claude
| Sonnet, and Gemini 1.5, we've built several agents designed
| to mimic the behaviors and habits of the experts on our team.
|
| Our idea is straightforward: after a decade of auditing code
| and writing exploits, we've accumulated a wealth of
| experience. So, why not teach these agents to replicate what
| we do during bug hunting and exploit writing? Of course, the
| LLMs themselves aren't sufficient on their own, so we've
| integrated various program analysis techniques to augment the
| models and help the agents understand more complex and
| esoteric code.
| simonw wrote:
| When you call these things "agents" what do you mean by
| that? Is this a system prompt combined with some defined
| tools, or is it a different definition?
| tinco wrote:
| An agent in this context is software that uses LLM prompt
| results to determine its next action, often looping to
| iteratively get to a good result.
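|
| A minimal sketch of that loop, under loud assumptions: `fake_model`
| is a hypothetical stand-in for a real LLM call, and the `grep` tool
| and toy `SOURCE` repository are invented for illustration. It is
| not the team's system, only the general shape of "ask the model,
| run the chosen tool, feed the result back, repeat".

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy "repository" the agent can inspect.
SOURCE = {"parse.c": "strcpy(buf, input);", "main.c": "return 0;"}


def fake_model(transcript: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM call (assumption, not a real API).

    Maps the transcript so far to the next (action, argument) pair.
    A real agent would send the transcript to a model and parse its reply.
    """
    if not any(line.startswith("grep:") for line in transcript):
        return ("grep", "strcpy")
    return ("finish", "possible overflow near strcpy")


def grep(pattern: str) -> str:
    """Hypothetical tool: list the toy files whose text contains pattern."""
    hits = [name for name, text in SOURCE.items() if pattern in text]
    return ",".join(hits) or "no matches"


TOOLS: Dict[str, Callable[[str], str]] = {"grep": grep}


def run_agent(model, task: str, max_steps: int = 5) -> str:
    """Loop: ask the model for an action, run the tool, record the result."""
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        action, arg = model(transcript)
        if action == "finish":
            return arg  # the model decided it has an answer
        tool = TOOLS.get(action)
        result = tool(arg) if tool else f"unknown tool {action}"
        transcript.append(f"{action}: {arg} -> {result}")
    return "gave up: step limit reached"


if __name__ == "__main__":
    print(run_agent(fake_model, "find unsafe copies"))
```

| The step limit matters in practice: without it, a model that never
| chooses to finish would loop forever.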
| dogma1138 wrote:
| Are you going to publish your RAG strategy?
| adragos wrote:
| Hey, congrats on getting to the finals of AIxCC!
|
| Have you tested your CRS on weekend CTFs? I'm curious how well
| it'd be able to perform compared to other teams
| hqzhao wrote:
| Thanks!
|
| We haven't tested it yet. Regarding CTFs, I have some
| experience. I'm a member of the Tea Deliverers CTF team, and
| I participated in the DARPA CGC CTF back in 2016 with team
| b1o0p.
|
| There are a few issues that make it challenging to directly
| apply our AIxCC approaches to CTF challenges:
|
| 1. *Format Compatibility:* This year's DEFCON CTF finals
| didn't follow a uniform format. The challenges were complex
| and involved formats like a Lua VM running on a custom
| Verilog simulator. Our system, however, is designed for
| source code repositories like Git repos.
|
| 2. *Binary vs. Source Code:* CTFs are heavily binary-
| oriented, whereas AIxCC is focused on source code. In CTFs,
| reverse engineering binaries is often required, but our
| system isn't equipped to handle that yet. We are, however,
| interested in supporting binary analysis in the future!
| wslh wrote:
| Congrats! ELI5: what insights do you have NOW that were not
| published/researched extensively in academic papers and/or
| publicly discussed yet?
| rockskon wrote:
| The AIxCC booth felt like it was meant for a tradeshow as opposed
| to being a place where someone could learn something.
| hqzhao wrote:
| I heard that the AIxCC booth prepared the same challenges for
| the audience to solve manually, but I didn't check the details.
|
| I believe there will be even more cool stuff in next year's
| grand final. If you want to get a sense of what to expect,
| check out the DARPA CGC from 2016. :)
| rockskon wrote:
| I hope that booth is gone for good. Def Con doesn't need
| marketers with a blank check putting a booth there. Leave
| that garbage at Black Hat.
| rockskon wrote:
| To clarify - I hope your "more cool stuff" doesn't mean
| more fog machines and LED strips. And some of the companies
| that seemed to ride DARPA's coattails there made my skin
| crawl. No slight on DARPA themselves.
| wslh wrote:
| BTW, have you seen the new LLMsic offensive tools such as XBOW
| [1]? They just received a funding round from Sequoia Capital
| [2].
|
| [1] https://xbow.com/
|
| [2] https://www.sequoiacap.com/article/partnering-with-xbow-
| the-...
| sim7c00 wrote:
| this is really impressive work. coverage-guided and especially
| directed fuzzing can be extremely difficult. it's mentioned that
| fuzzing is not a dumb technique. I think the classical idea is
| kind of dumb, in the sense of 'dumb fuzzers', but these days
| there is tons of intelligence built around it and poured into
| it, and i've always thought it's now beyond the classic idea of
| fuzz testing. i had colleagues who poured their soul into trying
| to use git commit info etc. to help find potentially bad code
| paths, and then coverage-guided fuzzing to try to get in there.
| I really like the little note at the bottom about this. adding
| such layers kind of does make it lean towards machine learning
| nowadays, and i'd think perhaps fuzzing is not the right term
| anymore. i don't think many people are actually still simply
| generating random inputs and trying to crash programs like
| that.
|
| this is really exciting new progress around this type of field
| guys. well done! can't wait to see what new tools and techniques
| will be yielded from all of this research.
|
| Will you guys be open to implementing something around libafl++
| perhaps? i remember we worked with that extensively. As a lot of
| shops use that already it might be cool to look at integration
| into such tools or would you think this deviates so far it'll
| amount to a new kind of tool entirely? Also, the work on datasets
| might be really valuable to other researchers. there was a
| mention of wasted work but labeled sets of data around CVE, bug
| and patch commits can help a lot of folks if there's new data in
| there.
|
| this kind of makes me miss having my head in this space :D cool
| stuff and massive congrats on being finalists. thanks for the
| extensive writeup!
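|
| The contrast above between "dumb" random inputs and coverage-guided
| fuzzing can be sketched with a toy example. Everything here is an
| assumption made for illustration: `target` is an invented program
| that reports which branches it hit, and `mutations` tries every
| single-byte change in order, where a real fuzzer would mutate
| randomly and get coverage from instrumentation. The point is only
| the feedback loop: inputs that reach new branches are kept and
| mutated further.

```python
from typing import Iterator, Optional, Set, Tuple


def target(data: bytes) -> Set[int]:
    """Toy program under test (hypothetical); returns branch IDs it hit."""
    hit = {0}
    if len(data) > 0 and data[0] == ord("F"):
        hit.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            hit.add(2)
            if len(data) > 2 and data[2] == ord("Z"):
                raise RuntimeError("crash")  # the planted bug
    return hit


def mutations(data: bytes) -> Iterator[bytes]:
    """Yield every input that differs from data in exactly one byte value.

    Deterministic on purpose so the sketch is reproducible; real fuzzers
    pick mutations at random.
    """
    for i in range(len(data)):
        for b in range(256):
            yield data[:i] + bytes([b]) + data[i + 1:]


def fuzz(seed: bytes, max_execs: int = 10_000) -> Tuple[Optional[bytes], int]:
    """Coverage-guided loop: keep any mutant that reaches a new branch."""
    corpus = [seed]
    seen = set(target(seed))  # assumes the seed itself does not crash
    execs = 0
    idx = 0
    # The budget is checked between corpus entries, not per execution.
    while idx < len(corpus) and execs < max_execs:
        for cand in mutations(corpus[idx]):
            execs += 1
            try:
                cov = target(cand)
            except RuntimeError:
                return cand, execs  # crashing input found
            if not cov <= seen:  # new coverage: keep this input
                seen |= cov
                corpus.append(cand)
        idx += 1
    return None, execs


if __name__ == "__main__":
    crash, execs = fuzz(b"AAA")
    print(crash, execs)
```

| Blind guessing over three bytes has 256**3 candidates; here the
| coverage feedback lets the search solve one byte at a time.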
| deeznuttynutz wrote:
| What's the good word!!
___________________________________________________________________
(page generated 2024-08-17 23:01 UTC)