[HN Gopher] LLM4Decompile: Decompiling Binary Code with LLM
___________________________________________________________________
LLM4Decompile: Decompiling Binary Code with LLM
Author : Davidbrcz
Score : 303 points
Date : 2024-03-17 10:15 UTC (12 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| potatoman22 wrote:
| It's interesting the 6b model outperforms the 33b model. I wonder
| if it means the 33b model needs more training data? It was
| pretrained on ~1 million C programs, compared to DeepSeek-Coder,
| which was trained on 2 trillion tokens, which is a few orders of
| magnitude more data.
|
| I'm also curious about how this compares to non-LLM solutions.
| mattashii wrote:
| > on ~1 million C programs, compared to [...] 2 trillion
| tokens, which is a few orders of magnitude more data.
|
| Is that a fair comparison? It would assume that the average
| C program in the set is orders (plural) of magnitude smaller
| than 2M tokens, which could indeed be true but sounds like an
| optimistic assumption.
| Der_Einzige wrote:
| These have been the dynamics with LLMs for a while. The majority
| of LLMs are massively _undertrained_. 7b models are the least
| "undertrained" mainstream models we have, which is why they have
| proliferated so much among the LLM fine-tuning community.
| maCDzP wrote:
| Can this be used for deobfuscation of code? I really hadn't
| thought about LLMs being a tool during reverse engineering.
| Tiberium wrote:
| Big LLMs like GPT-4 (and even GPT 3.5 Turbo) can be directly
| used to beautify obfuscated/minified JS, see e.g.
| https://thejunkland.com/blog/using-llms-to-reverse-javascrip...
| and https://news.ycombinator.com/item?id=34503233
| Eager wrote:
| I have tried feeding some of the foundation models obfuscated
| code from some of the competitions.
|
| People might think that the answers would be in the training
| data already, but I didn't find that to be the case. At least
| in my small experiments.
|
| The models did try to guess what the code does. They would say
| things like, "It seems to be trying to print some message to
| the console". I wasn't able to get full solutions.
|
| It's definitely worth more research, not just as a curiosity:
| these kinds of problems are good proxies for other tasks and
| also make excellent benchmarks for LLMs in particular.
| evmar wrote:
| I did a little experiment with this here:
|
| https://neugierig.org/software/blog/2023/01/compiling-advent...
| kken wrote:
| Pretty wild how well GPT4 is still doing in comparison. It's
| significantly better than their model at creating compilable
| code, but is less accurate at recreating functional code. Still
| quite impressive.
| nebula8804 wrote:
| It will be interesting to see if there is some way to train a
| decompilation module based on who we know developed the
| application, using their previous code as training data. For
| example: Super Mario 64 and Zelda 64 were fully decompiled and a
| handful of other N64 games are in the process. I wonder if we
| could map which developers worked on these two games (maybe even
| guess who did what module) and then use that to more easily
| decompile any other game that had those developers working on it.
|
| If this gets really good, maybe we can dream of having a fully
| de-obfuscated and open source life. All the layers of binary
| blobs in a PC can finally be decoded. All the drivers can be
| open. Why not do the OS as well! We don't have to settle for
| Linux, we can bring back Windows XP and back port modern security
| and app compatibility into the OS and Microsoft can keep their
| Windows 11 junk...at least one can dream! :D
| ZitchDog wrote:
| I doubt the code would be identifiable. It wouldn't be the
| actual code written, but it would be very similar. But I assume
| many elements of code style would be lost, and any semblance of
| code style would be more or less hallucinated.
| K0IN wrote:
| If it can generate tests from the decompiled code, we could
| reimplement it in our own code style. It might be cool to have
| a bunch of LLMs working together with feedback loops.
| coddle-hark wrote:
| I wrote my bachelor thesis on something tangential --
| basically, some researchers found that it was possible _in some
| very specific circumstances_ to train a classifier to do author
| attribution (i.e. figure out who wrote the program) based just
| on the compiled binaries they produced. I don't think the
| technique has been used for anything actually useful, but it's
| cool to see that individual coding style survives the
| compilation process, so much so that you can tell one person's
| compiled programs apart from another's.
| userbinator wrote:
| _If this gets really good, maybe we can dream of having a fully
| de-obfuscated and open source life. All the layers of binary
| blobs in a PC can finally be decoded. All the drivers can be
| open. Why not do the OS as well!_
|
| Decompilers already exist and are really good. If an LLM can do
| the same as these existing decompilers, you can bet the lawyers
| will consider it an equivalent process. The main problem is
| legal/political, not technical.
| kukas wrote:
| Hey, I am working on my own LLM-based decompiler for Python
| bytecode (https://github.com/kukas/deepcompyle). I feel there are
| not many people working on this research direction but I think it
| could be quite interesting, especially now that longer attention
| contexts are becoming feasible. If anyone knows a team that is
| working on this, I would be quite interested in cooperation.
| ok123456 wrote:
| Is there a benefit from using an LLM for Python byte code?
| Python byte code is high enough level that, in my experience,
| it's possible to translate it directly back to source code.
| kukas wrote:
| My motivation is that the existing decompilers only work for
| Python versions up to ~3.8. Having a model that could be
| finetuned with every new Python release might remove the need
| for a highly specialized programmer to update the decompiler
| for compatibility with each new version.
|
| It is also a toy example for me to set up a working pipeline
| and then try to decompile more interesting targets.
| a2code wrote:
| Why Python? First, Python is a language with a large body of
| open-source code. Second, I do not think it is often used for
| software that is distributed as binaries?
| jagrsw wrote:
| Decompilation is somewhat of a default choice for ML in the
| world of comp-sec.
|
| Searching for vulns and producing patches in source code is a bit
| problematic, as the databases of vulnerable source code examples
| and their corresponding patches are neither well-structured nor
| comprehensive, and sometimes very, very specific to the analyzed
| code (for higher abstraction type of problems). So, it's not easy
| to train something usable beyond standard mem safety problems and
| use of unsafe APIs.
|
| The area of fuzzing is somewhat messy, with sporadic efforts
| undertaken here and there, but it also requires a lot of
| preparatory work, and the results might not be groundbreaking
| until we reach a point where we can feed an ML model the entire
| source code of a project and have it analyze and identify all
| the bugs, produce fixes, and provide offending inputs. I.e.,
| not yet.
|
| Decompilation, by contrast, is a fairly standard problem: it is
| possible to produce input-output pairs almost at will from
| existing source code, using various compiler switches, CPU
| architectures, ABIs, obfuscations, and syscall calling
| conventions, and then train models on those pairs (i.e. in
| reversed order).
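|
| A minimal sketch of that pair generation (my own illustration,
| not the paper's pipeline; it assumes gcc on PATH and a pile of
| .c files):
|
|     import pathlib, subprocess, tempfile
|
|     OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3"]
|
|     def asm_for(c_file: pathlib.Path, opt: str) -> str:
|         # Compile one C file to assembly text with gcc -S.
|         with tempfile.TemporaryDirectory() as tmp:
|             out = pathlib.Path(tmp) / "out.s"
|             subprocess.run(["gcc", opt, "-S", "-o", str(out),
|                             str(c_file)], check=True)
|             return out.read_text()
|
|     def make_pairs(c_files):
|         # Train in the reversed direction: assembly in, source out.
|         for c_file in c_files:
|             source = c_file.read_text()
|             for opt in OPT_LEVELS:
|                 yield {"asm": asm_for(c_file, opt),
|                        "source": source, "opt": opt}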
| a2code wrote:
| The problem is interesting in at least two aspects. First, an
| ideal decompiler would eliminate proprietary source code. Second,
| the abundant publicly available C code allows you to simply make
| a dataset of paired ASM and source code. There is also a lot of
| variety with optimization level, compiler choice, and platform.
|
| What is unclear to me is: why did the authors fine-tune the
| DeepSeek-Coder model? Can you train an LLM from zero with a
| similar dataset? How big does the LLM need to be? Can it run
| locally?
| 3abiton wrote:
| I assume it's related to the cost of training vs fine-tuning.
| It could also be a starting point to validate an idea.
| mike_hearn wrote:
| Most proprietary code runs behind firewalls and won't be
| affected by this one way or another.
|
| It's basically always better to start training with a pre-
| trained model rather than random, even if what you want isn't
| that close to what you start with.
| madisonmay wrote:
| This is an excellent use case for LLM fine-tuning, purely because
| of the ease of generating a massive dataset of input / output
| pairs from public C code
| bt1a wrote:
| I would also think that generating a very large amount of C
| code using coding LLMs (using deepseek, for example, +
| verifying that the output compiles) as synthetic training data
| would be quite beneficial in this situation. Generally the
| quality of synthetic training data is one of the main concerns,
| but in this case whether the code compiles is the crux.
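|
| A rough sketch of that compile gate (my illustration; it assumes
| gcc on PATH, and generate_c_snippets is a hypothetical stand-in
| for whatever coding LLM produces the candidates):
|
|     import pathlib, subprocess, tempfile
|
|     def compiles(c_source: str) -> bool:
|         # Keep a synthetic sample only if gcc accepts it.
|         with tempfile.TemporaryDirectory() as tmp:
|             src = pathlib.Path(tmp) / "cand.c"
|             src.write_text(c_source)
|             obj = pathlib.Path(tmp) / "cand.o"
|             result = subprocess.run(
|                 ["gcc", "-c", "-o", str(obj), str(src)],
|                 capture_output=True)
|             return result.returncode == 0
|
|     # generate_c_snippets() is the hypothetical LLM sampler.
|     kept = [s for s in generate_c_snippets() if compiles(s)]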
| klik99 wrote:
| This is a fascinating idea, but (honest question, not a
| judgement) would the output be reliable? It would be hard to
| identify hallucinations since recompiling could produce different
| machine code. Particularly if there is some novel construct that
| could be a key part of the code. Are there ways of also reporting
| the LLM's confidence in sections like this when running
| generatively? It's an amazing idea but I worry it would stumble
| invisibly on the parts that are most critical. I suppose it would
| just need human confirmation of the output.
| Eager wrote:
| This is why round-tripping the code is important.
|
| If you decompile the binary to source, then compile the source
| back to binary you should get the original binary.
|
| You just need to do this enough times until the loss drops to
| some acceptable amount.
|
| It's a great task for reinforcement learning, which is known to
| be unreasonably effective for these types of problems.
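|
| A minimal sketch of that round trip (my illustration, assuming
| gcc and objdump; it diffs disassembly text rather than raw
| bytes, since bit-identical rebuilds are rarely achievable):
|
|     import difflib, pathlib, subprocess, tempfile
|
|     def disasm(binary: str) -> list:
|         # objdump -d gives a text form that is easy to diff.
|         out = subprocess.run(["objdump", "-d", binary],
|                              capture_output=True, text=True,
|                              check=True).stdout
|         return out.splitlines()
|
|     def round_trip_loss(original_bin, decompiled_c):
|         with tempfile.TemporaryDirectory() as tmp:
|             src = pathlib.Path(tmp) / "decompiled.c"
|             src.write_text(decompiled_c)
|             rebuilt = str(pathlib.Path(tmp) / "rebuilt")
|             cmd = ["gcc", "-O2", "-o", rebuilt, str(src)]
|             subprocess.run(cmd, check=True)
|             sm = difflib.SequenceMatcher(
|                 None, disasm(original_bin), disasm(rebuilt))
|             return 1.0 - sm.ratio()  # 0.0 == identical disassembly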
| thfuran wrote:
| >If you decompile the binary to source, then compile the
| source back to binary you should get the original binary.
|
| You really can't expect that if you're not using exactly the
| same version of exactly the same compiler with exactly the
| same flags, and often not even then.
| Eager wrote:
| You try your best, and if you provide enough examples, it
| will undoubtedly get figured out.
| thfuran wrote:
| What exactly are you suggesting will get figured out?
| spqrr wrote:
| The mapping from binary to source code.
| layer8 wrote:
| The question was about the reverse mapping.
| thfuran wrote:
| Even ignoring all sources of irreproducibility, there
| does not exist a bijection between source and binary
| artifact irrespective of tool chain. Two different
| toolchains could compile the same source to different
| binaries or different sources to the same binary. And you
| absolutely shouldn't be ignoring sources of
| irreproducibility in this context, since they'll cause
| even the same toolchain to keep producing different
| binaries given the same source.
| achrono wrote:
| Exactly, but neither the source nor the binary is what's
| truly important here. The real question is: can the LLM
| generate the _functionally valid_ source equivalent of
| the binary at hand? If I disassemble Microsoft Paint, can
| I get code that will result in a mostly functional
| version of Microsoft Paint, or will I just get 515
| compile errors instead?
| Brian_K_White wrote:
| This is what I thought the question was really about.
|
| I assume that an llm will simply see patterns that look
| similar to other patterns and make associations and assume
| equivalences on that level, meanwhile real code is full of
| things where the programmer, especially an assembly
| programmer, modifies something by a single instruction or
| offset value etc to get a very specific and functionally
| important result.
|
| Often the result is code that not only isn't obvious,
| it's nominally flatly wrong, violating standards, specs,
| intended function, datasheet docs, etc. If all you knew
| were the rules written in the docs, the code is broken
| and invalid.
|
| Is the llm really going to see or understand the intent
| of that?
|
| They find matching patterns in other existing stuff, and
| to the user who can not see the infinite body of that
| other stuff the llm pulled from, it looks like the llm
| understood the intent of a question, but I say it just
| found the prior work of some human who understood a
| similar intent somewhere else.
|
| Maybe an llm or some other flavor of ai can operate some
| other way like actually playing out the binary like
| executing in a debugger and map out the results not just
| look at the code as fuzzy matching patterns. Can that
| take the place of understanding the intents the way a
| human would reading the decompiled assembly?
|
| Guess we'll be finding out sooner or later, since of
| course it will all be tried.
| fao_ wrote:
| Except LLMs cannot reason.
| lolinder wrote:
| I think you're misunderstanding OP's objection. It's not
| simply a matter of going back and forth with the LLM
| until eventually (infinite monkeys on typewriters style)
| it gets the same binary as before: Even if you got the
| _exact same source code_ as the original there 's still
| no automated way to tell that you're done because the
| bits you get back out of the recompile step will almost
| certainly not be the same, even if your decompiled source
| were identical in every way. They might even vary quite
| substantially depending on a lot of different
| environmental factors.
|
| Reproducible builds are hard to pull off cooperatively,
| when you control the pipeline that built the original
| binary and can work to eliminate all sources of
| variation. It's simply not going to happen in a
| decompiler like this.
| blagie wrote:
| Well, no, but yes.
|
| The critical piece is that this can be done in training.
| If I collect a large number of C programs from github,
| compile them (in a deterministic fashion), I can use that
| as a training, test, and validation set. The output of
| the ML ought to compile to the same binary given the same
| environment.
|
| Indeed, I can train over multiple deterministic build
| environments (e.g. different compilers, different
| compiler flags) to be even more robust.
|
| The second critical piece is that for something like a
| GAN, it doesn't need to be identical. You have two ML
| algorithms competing:
|
| - One is trying to identify generated versus ground-truth
| source code
|
| - One is trying to generate source code
|
| Virtually all ML tasks are trained this way, and it
| doesn't matter. I have images and descriptions, and all
| the ML needs to do is generate an indistinguishable
| description.
|
| So if I give the poster a lot more benefit of the doubt
| on what they wanted to say, it can make sense.
| lolinder wrote:
| Oh, I was assuming that Eager was responding to klik99's
| question about how we could identify hallucinations in
| the output--round tripping doesn't help with that.
|
| If what they're actually saying is that it's possible to
| train a model to low loss and then you just have to trust
| the results, yes, what you say makes sense.
| blagie wrote:
| I haven't found many places where I trust the results of
| an ML algorithm. I've found many places where they work
| astonishingly well 30-95% of the time, which is to say,
| save me or others a bunch of time.
|
| It's been years, but I'm thinking back through things
| I've reverse-engineered before, and having something
| which kinda works most of the time would be super-useful
| still as a starting point.
| incrudible wrote:
| Have you ever trained a GAN?
| blagie wrote:
| Technically, yes!
|
| A more reasonable answer, though, is "no."
|
| I've technically gone through random tutorials and
| trained various toy networks, including a GAN at some
| point, but I don't think that should really count. I also
| have a ton of experience with neural networks that's
| decades out-of-date (HUNDREDS of nodes, doing things like
| OCR). And I've read a bunch of modern papers and used a
| bunch of Hugging Face models.
|
| Which is to say, I'm not completely ignorant, but I do
| not have credible experience training GANs.
| dheera wrote:
| Maybe we then need an LLM to tell us if two pieces of
| compiled code are equivalent in an input-output mapping
| sense (ignoring execution time).
|
| I'm actually serious; it would be exceedingly easy to get
| training data for this just by running the same source code
| through a bunch of different compiler versions and
| optimization flags.
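|
| A small sketch of that data generation (my illustration; it
| assumes every source file builds under each listed config):
|
|     import itertools, pathlib, subprocess
|
|     CONFIGS = [("gcc", "-O0"), ("gcc", "-O2"), ("clang", "-O2")]
|
|     def build(c_file, cc, opt, out_dir):
|         name = f"{c_file.stem}.{cc}{opt}"
|         out = pathlib.Path(out_dir) / name
|         subprocess.run([cc, opt, "-o", str(out), str(c_file)],
|                        check=True)
|         return out
|
|     def labelled_pairs(c_files, out_dir):
|         # same source, different config -> label 1 (equivalent)
|         # different sources -> label 0 (noisy negatives)
|         bins = {f: [build(f, cc, opt, out_dir)
|                     for cc, opt in CONFIGS] for f in c_files}
|         for variants in bins.values():
|             for a, b in itertools.combinations(variants, 2):
|                 yield a, b, 1
|         for f, g in itertools.combinations(c_files, 2):
|             yield bins[f][0], bins[g][0], 0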
| thfuran wrote:
| Why would an llm be the tool for that job?
| dheera wrote:
| Without analytical thinking, how else would you become
| convinced that two functions are identical across a
| computationally infeasible number of possible inputs?
| codethief wrote:
| > you should get the original binary
|
| According to the project's README, they only seem to be
| checking mere "re-compilability" and "re-executability" of
| the decompiled code, though.
| 1024core wrote:
| > If you decompile the binary to source, then compile the
| source back to binary you should get the original binary.
|
| Doesn't that depend on the compiler's version though? Or, for
| that matter, even the sub-version. Every compiler does things
| differently.
| sebastianconcpt wrote:
| Generators' nature is to hallucinate.
| DougBTX wrote:
| One man's hallucination is another's creativity.
| sebastianconcpt wrote:
| Well, we need to remember that "hallucination" here is not
| a concept but a figure of speech for the output of a
| stochastic parroting machine. So what you mentioned would
| be a digitally induced hallucination out of some dancing
| matrix multiplications / electrons on silicon.
| riedel wrote:
| One could as well use differential fuzzing.
| klik99 wrote:
| I'm amazed that among so many good responses above, only
| this one mentions fuzzing. In the context of security,
| inputs might be non-linear things like adjacent memory, so I
| don't see any way to be confident about equivalency without
| substantial fuzzing.
|
| Honestly I just don't see a way to formally verify this at
| all. It sounds like it could be a very useful tool, but I
| don't see a way for it to be fully confident. But, heck, just
| getting you 90% of the way towards understanding it with LLMs
| is still amazing and useful in real life.
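|
| For the easy case of two standalone binaries that read stdin
| and write stdout, differential fuzzing can be as blunt as this
| (my sketch; real targets with state or adjacent-memory effects
| need far more machinery):
|
|     import os, subprocess
|
|     def run(binary: str, data: bytes):
|         p = subprocess.run([binary], input=data,
|                            capture_output=True, timeout=5)
|         return p.returncode, p.stdout
|
|     def differ(original: str, recompiled: str, trials=10000):
|         for _ in range(trials):
|             data = os.urandom(64)       # dumb random inputs
|             if run(original, data) != run(recompiled, data):
|                 return data             # counterexample found
|         return None                     # no divergence observed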
| layer8 wrote:
| The way to do this is to have a formal verification tool that
| takes the input, the output, and a formal proof that the input
| matches the semantics of the output, and have the LLM create
| the formal proof alongside the output. Then you can run the
| verification tool to check if the LLM's output is correct
| according to the proof that it also provided.
|
| Of course, building and training an LLM that can provide such
| proofs will be the bigger challenge, but it would be a safe
| way to detect hallucinations.
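|
| As a toy illustration of the "mechanical checker" half (z3 is
| my choice of tool here, not something named in the thread): for
| one tiny function, the claim "the binary computes x*8 - x, the
| decompiled C returns x * 7" can be discharged automatically
| over 32-bit machine arithmetic.
|
|     from z3 import BitVec, prove
|
|     x = BitVec("x", 32)
|     machine_level = (x << 3) - x   # what the compiled code does
|     decompiled = x * 7             # what the decompiled C claims
|     prove(machine_level == decompiled)   # prints "proved"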
| thfuran wrote:
| Good luck formally proving Linux.
| layer8 wrote:
| The goal is to prove that the source code matches the
| machine code, not to prove that the code implements some
| intended higher-level semantics. This has nothing to do
| with formally proving the correctness of the Linux kernel.
| djinnandtonic wrote:
| What if there are hallucinations in the verification tool?
| thfuran wrote:
| Then it's not a formal verification tool. Generative models
| are profoundly unfit for that purpose.
| layer8 wrote:
| There may be bugs, but not hallucinations. Bugs are at
| least reproducible, and the source code of the verification
| tool is much, much smaller than an LLM, so there is a much
| higher chance of its finite number of bugs being found,
| whereas with an LLM it is probably impossible to remove all
| hallucinations.
|
| To turn your question around: What if the compiler that
| compiles your LLM implementation "hallucinates"? That would
| be the closer parallel.
| smellf wrote:
| I think the idea is that you'd have two independently-
| developed systems, one LLM decompiling the binary and the
| other LLM formally verifying. If the verifier disagrees
| with the decompiler you won't know which tool is right and
| which is wrong, but if they agree then you'll know the
| decompiled result is correct, since both tools are unlikely
| to hallucinate the same thing.
| layer8 wrote:
| No, the idea is that the verifier is a human-written
| program, like the many formal-verification tools that
| already exist, not an LLM. There is zero reason to make
| this an LLM.
|
| It makes sense to use LLMs for the decompilation and the
| proof generation, because both arguably require
| creativity, but a mere proof verifier requires zero
| creativity, only correctness.
| natsch wrote:
| That would require the tool to prove the equivalence of the
| two programs, which is generally undecidable. Maybe this
| could be weakened to preserving some properties of the
| program.
| ngruhn wrote:
| That doesn't mean that it's impossible, right? Just that no
| tool is guaranteed to give an answer in every case. And those
| cases might be 90%, 10%, or it-doesn't-matter-in-practice %.
| layer8 wrote:
| No, it would not. It would require the LLM to provide a
| proof for the program that it outputs, which seems
| reasonable in the same way that a human decompiling a
| program would be able to provide a record of his/her
| reasoning.
|
| The formal verifier would then merely check the provided
| proof, which is a simple mechanical process.
|
| This is analogous to a mathematician providing a detailed
| proof and a computer checking it.
|
| What is impossible due to undecidability is, for two
| _arbitrary_ programs, to either prove or disprove their
| equivalence. However, the two programs we are talking about
| are highly correlated, and thus not arbitrary at all with
| respect to each other. If an LLM is able to provide a
| correct decompilation, then in principle it should also be
| able to provide a proof of the correctness of that
| decompilation.
| afro88 wrote:
| They detail how they measure this in the README. This is
| directed at all the sibling comments as well!
|
| TLDR: they recompile and then re-execute (including test
| suites). From the results table it looks like GPT4 still
| "outperforms" their model in recompilation, but their
| recompiled code has a much better re-execution success rate
| (fewer hallucinations). But that re-execution rate is still
| pretty lacking (around 14%), even if better than GPT4's.
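|
| Per sample, that check amounts to roughly this (my sketch; it
| assumes an assert-style test harness per function, matching
| the "test suites" mentioned above):
|
|     import pathlib, subprocess, tempfile
|
|     def reexecutes(decompiled_c: str, harness_c: str) -> bool:
|         with tempfile.TemporaryDirectory() as tmp:
|             tmp = pathlib.Path(tmp)
|             (tmp / "func.c").write_text(decompiled_c)
|             (tmp / "test.c").write_text(harness_c)
|             exe = tmp / "a.out"
|             build = subprocess.run(
|                 ["gcc", "-o", str(exe), str(tmp / "func.c"),
|                  str(tmp / "test.c")], capture_output=True)
|             if build.returncode != 0:   # not even re-compilable
|                 return False
|             run = subprocess.run([str(exe)], timeout=10)
|             return run.returncode == 0  # all asserts passed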
| londons_explore wrote:
| Even if it isn't fully reliable, often it's only necessary to
| modify a few functions for most changes one wants to make to a
| binary.
|
| You'd therefore only need to recompile those few functions.
| userbinator wrote:
| LLMs are by nature probabilistic, which is why they work
| reasonably well for "imprecise" domains like natural language
| processing. Expecting one to do decompilation, or disassembly
| for that matter, is IMHO very much a "wrong tool for the job"
| --- but perhaps it's just an exploratory exercise for the "just
| use an LLM" meme that seems to be a common trend these days.
|
| The bigger argument against the effectiveness of this approach
| is that existing decompilers can already do a much better job
| with far less processing power.
| czl wrote:
| In the future, efficient rule-based compilers and decompilers
| may be generated by AI systems trained on inputs and outputs
| of what we use today.
|
| This effort is an exploration to find a radically different
| AI way that may give superior results.
|
| Yes. For all the reasons you give above, AI for this job is
| not practical today.
| ReptileMan wrote:
| Let's hope it kills Denuvo ...
| AndrewKemendo wrote:
| If successful, wouldn't you be replicating the compiler's machine
| code 1:1?
|
| In which case that means fully complete code can live in the
| "latent space" but is distributed as probabilities
|
| Or perhaps more likely would it be replicating the logic only,
| which can then be translated into the target language
|
| I would guess that any binary that requires a non-deterministic
| input (key, hash etc...) to compile would break this
|
| Fascinating
| m3kw9 wrote:
| Basically predicting code token by token, except now you don't
| even have a large enough context size and, worse, you are using
| RAG.
| xorvoid wrote:
| As someone who is actively developing a decompiler to reverse
| engineer old DOS 8086 video games, I'd have a hard time trusting
| an LLM to do this correctly. My standard is accurate semantics
| lifting from Machine Code to C. Reversing assembly to C is very
| delicate. There are many patterns that tend to _usually_ map to
| obvious C constructs... except when they don't. And that assumes
| the original source was C. Once you bump into routines that were
| hand-coded assembly and break every established rule in the
| calling conventions, all bets are off. I'm somewhat convinced
| that decompilation cannot be made fully-automatic. Instead a good
| decompiler is just a lever-arm on the manual work a reverser
| would otherwise be doing. Corollary: I'm also somewhat convinced
| that only the decompiler's developers can really use it most
| effectively because they know where the "bodies are buried" and
| where different heuristics and assumptions were made. Decompilers
| are compilers with all the usual engineering challenges, plus a
| hard inference problem tacked on top.
|
| All that said, I'm not a pessimist on this idea. I think it has
| pretty great promise as a technique for general reversing
| security analysis where the reversing is done mostly for
| "discovery" and "understanding" rather than for perfect semantic
| lifting to a high-level language. In that world, you can afford
| to develop "hypotheses" and then drill down to validate if you
| think you've discovered something big.
|
| Compiling and testing the resulting decompilation is a great
| idea. I do that as well. The limitation here is TEST SUITE. Some
| random binary doesn't typically come with a high-coverage test
| suite, so you have to develop your own acceptance criterion as
| you go along. In other words: write tests for a function whose
| computation you don't understand (ha). I suppose a form of
| static-analysis / symbolic-computation might be handy here (I
| haven't explored that). Here you're also beset with challenges of
| specifying which machine state changes are important and which
| are superfluous (e.g. is it okay if the x86 FLAGS register isn't
| modified in the decompiled version, probably yes, but sometimes
| no).
|
| In my case I don't have access to the original compiler and even
| if I did, I'm not sure I could convince it to reproduce the same
| code. Maybe this is more feasible for more modern binaries where
| you can assume GCC, Clang, MSVC, or ICC.
|
| At any rate: crazy hard, crazy fun problem. I'm sure LLMs have a
| role somewhere, but I'm not sure exactly where: the future will
| tell. My guess is some kind of "copilot" / "assistant" type role
| rather than directly making the decisions.
|
| (If this is your kind of thing... I'll be writing more about it
| on my blog soonish...)
| a2code wrote:
| I would devise a somewhat loose metric. Consider assigning a
| percentage for how much of a binary is decompiled. As in, 0%
| means the binary is still in assembly and 100% means the whole
| binary is now C code. The ideal decompiler would result in 100%
| for any binary.
|
| My prediction is that this percentage will increase with time.
| It would be interesting to construct data for this metric.
|
| It is important to define the limitations of using LLMs for
| this endeavor. I would like to emphasize your subtle point. The
| compiler used for the original binary may not be the same as
| the one you use. The probability of this increases with time,
| as compilers improve or the platform on which the binary runs
| becomes obsolete. This is a problem for validation, as in you
| cannot directly compare original assembly code with assembly
| after compiling C code (that came from decompiling).
|
| Perhaps assembly routines could be given a likelihood, as in
| how sure the LLM is that some C code maps to assembly. Then,
| routines with hand-coded assembly would have a lower
| likelihood.
| mdaniel wrote:
| relevant: https://news.ycombinator.com/item?id=34250872 ( _G-3PO:
| A protocol droid for Ghidra, or GPT-3 for reverse-engineering_ <h
| ttps://github.com/tenable/ghidra_tools/blob/main/g3po/g3po....>;
| Jan, 2023; 44 comments)
|
| _ed_: seems they have this list, too, which might be a good fit
| for your submission: https://github.com/tenable/awesome-llm-cybersecurity-
| tools#a...
| sinuhe69 wrote:
| For me, the huge difference between re-compilability and re-
| executability scores is very interesting. GPT4 achieved 8x% on
| re-compilability (syntactically correct) but an abysmal 1x% in
| re-executability (semantically correct), demonstrating once
| again its overgrown mimicry capacity.
| sitkack wrote:
| > overgrown mimicry
|
| I don't think it shows that. GPT4 was not trained on
| decompiling binaries back into C. Amazing result for an
| untrained task.
|
| We are soon going to have robust toolchain detection from
| binaries, and source recovery with variable and function names.
| speedylight wrote:
| I have thought about doing something similar for heavily
| obfuscated JavaScript. Very useful for security research I
| imagine!
| quantum_state wrote:
| It seems the next logical step would be LLMAssistedHacking to
| turn things upside down...
| mahaloz wrote:
| It's always cool to see different approaches in this area, but I
| worry its benchmarks are meaningless without a comparison with
| non-AI-based approaches (like IDA Pro). It would be interesting to
| see how this model holds up on metrics from previous papers in
| security.
| YeGoblynQueenne wrote:
| If I read the "re-executability" results in the Results figure
| right then that's a great idea but it doesn't really work:
|
| https://raw.githubusercontent.com/albertan017/LLM4Decompile/...
|
| To clarify:
|
| >> Re-executability provides this critical measure of semantic
| correctness. By re-compiling the decompiled output and running
| the test cases, we assess if the decompilation preserved the
| program logic and behavior. Together, re-compilability and re-
| executability indicate syntax recovery and semantic preservation
| - both essential for usable and robust decompilation.
___________________________________________________________________
(page generated 2024-03-17 23:00 UTC)