[HN Gopher] G-3PO: A protocol droid for Ghidra, or GPT-3 for rev...
___________________________________________________________________
G-3PO: A protocol droid for Ghidra, or GPT-3 for reverse-
engineering
Author : AlbertoGP
Score : 158 points
Date : 2023-01-04 20:20 UTC (2 hours ago)
(HTM) web link (medium.com)
(TXT) w3m dump (medium.com)
| bri3d wrote:
| I'm partial to Gepetto for IDA, which includes an especially
| hilarious trick in which it instructs ChatGPT to phrase its
| responses in JSON, and then uses this JSON directly to name
| variables in the decompilation. If the JSON is incorrect, it
| politely asks ChatGPT to please fix its JSON output, which
| usually works.
|
| https://github.com/JusticeRage/Gepetto/blob/main/gepetto.py#...
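The ask-for-JSON-then-ask-again trick described above can be sketched roughly as follows. This is a minimal illustration, not Gepetto's actual code: ask_model is a hypothetical callable standing in for whatever sends a prompt to the model and returns its text reply, and the wording of the retry message is made up.

```python
import json
from typing import Callable, Dict


def request_renames(ask_model: Callable[[str], str], prompt: str,
                    max_attempts: int = 3) -> Dict[str, str]:
    """Ask for a JSON object of {old_name: new_name}; re-ask if it does not parse.

    ask_model is a hypothetical stand-in for the real model call.
    """
    message = prompt
    for _ in range(max_attempts):
        reply = ask_model(message)
        try:
            renames = json.loads(reply)
        except json.JSONDecodeError as err:
            # Politely hand the broken reply back and ask for a corrected one.
            message = (
                "Your previous reply was not valid JSON (" + str(err) + "). "
                "Please fix it and reply with only the corrected JSON:\n" + reply
            )
            continue
        # Only accept a flat string-to-string mapping before touching any names.
        if isinstance(renames, dict) and all(
            isinstance(k, str) and isinstance(v, str) for k, v in renames.items()
        ):
            return renames
        message = (
            "Please reply with a single JSON object mapping old names "
            "to new names:\n" + reply
        )
    raise ValueError("model did not return usable JSON")
```

Validating the shape of the parsed value before applying it matters here, since the result is used directly to rename variables in the decompilation.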
| popinman322 wrote:
| I've been waiting to see something like this. There's certainly
| room to fine-tune an LLM for this task; in that vein, I wonder
| whether Ghidra's pcode would produce better results? It's a bit
| better suited to this task in that the model wouldn't need to be
| tuned for each possible instruction set. Training on code
| compiled at different optimization levels might also produce
| interesting results.
|
| You could probably also take the explanations from the LLM,
| convert those into embeddings, and then do semantic search over
| all functions in a binary. For example, searching for "get
| process handle and inject dll" and getting a list of prospects.
| It's less useful in an obfuscated binary, but for things like
| modding games or extending end-of-life software it could be very
| useful.
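A rough sketch of the semantic-search idea above: embed each function's explanation, embed the query, and rank by cosine similarity. To stay self-contained it uses a toy bag-of-words vector in place of a real embedding model, so it only matches shared words. The function names and explanations in the usage example are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real setup would call an
    embedding model here instead (assumption, not shown)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def search(explanations: Dict[str, str], query: str,
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank functions by how closely their explanation matches the query."""
    q = embed(query)
    scored = [(name, cosine(embed(text), q))
              for name, text in explanations.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical explanations, as an LLM might produce per function.
    explanations = {
        "FUN_00401000": "Opens a process handle and writes a dll path "
                        "into the target to inject the dll",
        "FUN_00402000": "Parses the configuration file and stores the "
                        "window size",
        "FUN_00403000": "Computes a checksum over the packet buffer",
    }
    for name, score in search(explanations, "get process handle and inject dll"):
        print(name, round(score, 3))
```

Swapping embed for a learned embedding would let the search match paraphrases rather than exact words, which is the point of the suggestion.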
| TOMDM wrote:
| I'd never considered semantic search for code vulnerabilities.
|
| Maybe this is the next generation of automated code scanning.
|
| Next feature on github: "Our LLM has scanned your code and
| found a potential buffer overflow. Please mark as a bug or a
| false report"
| popinman322 wrote:
| I know there's some active work on this using LLMs rather than
| traditional methods, though on the source-analysis side
| rather than the binary side. See https://grit.io/, which tries
| to detect bugs (and maybe vulnerabilities?) and automatically
| submits PRs to patch them for you. I think morgante is their
| contact on HN.
|
| It feels like it'd be difficult to acquire a large corpus of
| vulnerabilities to train on.
| AlbertoGP wrote:
| A few days ago this went mostly ignored
| (https://news.ycombinator.com/item?id=34161642) and I was asked
| to re-submit it (https://news.ycombinator.com/item?id=34250150)
| so that it gets a second chance.
|
| That's a script for the reverse-engineering tool Ghidra that uses
| GPT-3 to de-compile machine code and to write plain English
| explanations of what a piece of code does.
|
| The article is quite detailed and describes both its capabilities
| and its limitations. That G-3PO script is open source, MIT
| license: https://github.com/tenable/ghidra_tools/tree/main/g3po
|
| There was also another HN story about what at first sight looks
| like an alternative implementation of the same idea: "GptHidra -
| Ghidra plugin that asks OpenAI Chat GPT to explain functions"
|
| https://news.ycombinator.com/item?id=34165291
|
| This one is more recent and lacks that good write-up mentioned
| above. The script is smaller and it seems to have fewer features.
|
| I suggest checking both of them.
| mdaniel wrote:
| Wow, I wouldn't have expected Tenable to shell out to curl,
| especially when the curl only adds two headers and they omitted
| the "--fail" that would cause HTTP error responses (400 and
| above) to return a non-zero exit code :-(
|
| https://github.com/tenable/ghidra_tools/blob/main/g3po/g3po....
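For comparison, a minimal sketch of making the same kind of request in-process, so that HTTP errors surface as exceptions instead of being silently ignored. This is not the G-3PO code: it assumes Python 3's urllib (Ghidra scripts run under Jython, where the module names differ), and it assumes the two headers are Content-Type and Authorization. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def build_request(url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST with the two headers (assumed: content type and auth)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def post_json(url: str, api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request; an HTTP error status raises instead of passing silently."""
    request = build_request(url, api_key, payload)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, which is roughly
        # what curl --fail reports through its exit code.
        raise RuntimeError("request failed with HTTP status %d" % err.code) from err
```

If shelling out to curl is kept, adding --fail and checking the exit code gives a similar guarantee.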
| saagarjha wrote:
| The real question is how a human should merge these results with
| their own reversing, honestly. I can't really trust GPT-3 to be
| accurate like I would actually trust the decompiler (and, as any
| reverser knows, you don't trust the decompiler). I think I would
| treat the output of this as I might a suggestion from a friend
| who I let glance over the code: "hmm, that might be a SHA-1?" and
| then I go confirm the results for myself.
| trenchgun wrote:
| Exactly. GPT-3 shines where we can have it turn hard problems
| into a form whose answers are easy to verify.
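The "that might be a SHA-1?" case is a good example of a cheap check. Assuming the function under analysis can be called somehow (via an emulator, a debugger harness, or a hand-written reimplementation), its output can be compared against a reference on a few inputs. The candidate callable below is a hypothetical stand-in for that function.

```python
import hashlib
from typing import Callable, Iterable

# A few inputs covering the empty message, a short one, exactly one
# 64-byte block, and a message longer than one block.
SAMPLES = (
    b"",
    b"abc",
    b"a" * 64,
    b"The quick brown fox jumps over the lazy dog" * 3,
)


def behaves_like_sha1(candidate: Callable[[bytes], bytes],
                      samples: Iterable[bytes] = SAMPLES) -> bool:
    """Return True if candidate matches hashlib's SHA-1 on every sample.

    candidate is a hypothetical wrapper around the function being reversed.
    """
    return all(candidate(msg) == hashlib.sha1(msg).digest() for msg in samples)
```

A match on a handful of inputs is strong evidence for a hash function; a mismatch only rules out plain SHA-1, since the binary may use a modified variant or a different output encoding.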
___________________________________________________________________
(page generated 2023-01-04 23:00 UTC)