[HN Gopher] Large Language Models for Compiler Optimization
       ___________________________________________________________________
        
       Large Language Models for Compiler Optimization
        
       Author : og_kalu
       Score  : 42 points
       Date   : 2023-09-17 20:55 UTC (2 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | CalChris wrote:
        | There's a half-day tutorial at the LLVM Developers Meeting on
        | this, _ML-Guided Compiler Optimization in LLVM_. However, the
        | authors of this paper aren't giving that tutorial.
        
         | klohto wrote:
         | MLGO uses RL
        
           | mathisfun123 wrote:
            | Maybe today, but MLGO, the initiative, will use whatever
            | works. To wit: Chris Cummins (first author here) was on last
            | year's MLGO panel.
        
       | wyldfire wrote:
       | > understanding. We evaluate on a large suite of test programs.
       | Our approach achieves a 3.0% improvement in reducing instruction
       | counts over the compiler,
       | 
       | 3% code size reduction is really good. The challenge will be
       | having codegen like this that someone is willing to support. And
       | for that they'd want to be able to reason about why the compiler
       | made this decision or that one. IIUC that's an outstanding
       | problem for AI in general.
        
         | TheLoafOfBread wrote:
          | Also, bugs from this approach are going to be funny: a program
          | compiled with compiler version X will work as expected, while
          | the same program compiled with version X+1 starts crashing
          | because, under some circumstances, the AI decided that a
          | dereference of a specific pointer was unnecessary and never
          | emitted it into the assembly.
          | 
          | Good luck finding such a bug, because you will be staring at
          | correct source code while the computer executes invalid output.
        
           | a_wild_dandan wrote:
           | Damn. These things imitate humans too well. Guess we'll need
           | a giant test suite to feel stable. Sounds like work. Let's
           | train a GAN for that and put our feet up. Turtles all the way
           | down, my dudes.
        
       | foota wrote:
       | This kind of application of LLMs is most interesting to me, since
       | it's possible to evaluate correctness and performance
       | quantitatively.
        
         | jebarker wrote:
          | It seems like a poor fit to me precisely because correctness is
          | boolean and difficult to measure, and getting it wrong is very
          | bad. I do think there's a place for AI here, but it's probably
          | not LLMs in their current form.
        
           | foota wrote:
            | The approach seems to focus on selecting which optimizations
            | LLVM should apply (e.g., imagine: should this function be
            | inlined?), so the worst case is that the code is poorly
            | optimized; you can't get a wrong result just by choosing
            | which optimizations to apply.
            | 
            | I agree that what you say is true of compilation as a whole,
            | but that doesn't seem to be the focus here (rather, it's used
            | as a sort of crutch to help the LLM learn).
        
           | kukkamario wrote:
            | They are not using the LLM to directly produce the resulting
            | code, but as a tool that lists which optimisations should be
            | done and in which order, which is a fairly complex problem to
            | solve. But if the optimisation passes are implemented
            | correctly (which is required for a functioning optimising
            | compiler anyway), it cannot produce incorrect code, only code
            | that may be suboptimal compared to the default heuristics.
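            | 
            | A minimal sketch of that setup (assuming LLVM's opt tool is on
            | PATH and an input.ll exists; the pass list below is a made-up
            | placeholder standing in for whatever the model suggests):
            | 
            |   import subprocess
            | 
            |   def count_ir_instructions(ir_text):
            |       # Rough proxy: count non-empty lines inside function
            |       # bodies that aren't labels, comments or metadata.
            |       n, in_func = 0, False
            |       for line in ir_text.splitlines():
            |           s = line.strip()
            |           if s.startswith("define "):
            |               in_func = True
            |           elif s == "}":
            |               in_func = False
            |           elif (in_func and s and not s.endswith(":")
            |                 and not s.startswith((";", "!"))):
            |               n += 1
            |       return n
            | 
            |   def run_passes(src, passes):
            |       # Apply the given pass pipeline and return textual IR.
            |       return subprocess.run(
            |           ["opt", "-S", f"-passes={passes}", src],
            |           capture_output=True, text=True, check=True).stdout
            | 
            |   suggested = "sroa,instcombine,simplifycfg,gvn"  # placeholder
            |   baseline = run_passes("input.ll", "default<Oz>")
            |   candidate = run_passes("input.ll", suggested)
            |   print("Oz:", count_ir_instructions(baseline),
            |         "model:", count_ir_instructions(candidate))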
        
             | jebarker wrote:
             | If there's a list of known optimizations that preserve
             | correctness then it becomes an optimization problem based
             | on output length (as a proxy for cycle count). So is the
             | idea that an LLM is more efficient than a search or direct
             | optimization?
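              | 
              | For scale, a search baseline has to compile and measure
              | every candidate ordering it tries. A naive random-search
              | sketch (assuming LLVM's opt is on PATH and an input.ll
              | exists; the candidate pass set is arbitrary):
              | 
              |   import random, subprocess
              | 
              |   PASSES = ["sroa", "instcombine", "simplifycfg",
              |             "gvn", "dce", "reassociate"]
              | 
              |   def ir_size(order):
              |       # Compile with the given pass ordering and use a
              |       # crude line count of the textual IR as the cost.
              |       cmd = ["opt", "-S", f"-passes={','.join(order)}",
              |              "input.ll"]
              |       ir = subprocess.run(cmd, capture_output=True,
              |                           text=True, check=True).stdout
              |       return sum(1 for l in ir.splitlines()
              |                  if l.startswith("  ")
              |                  and not l.strip().endswith(":"))
              | 
              |   best = None
              |   for _ in range(100):  # 100 separate compilations
              |       order = random.sample(PASSES, len(PASSES))
              |       cost = ir_size(order)
              |       if best is None or cost < best[0]:
              |           best = (cost, order)
              |   print(best)
              | 
              | A model that predicts a good ordering in one shot skips
              | those hundred compiles, which is where the hoped-for win
              | over search is.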
        
               | sagarm wrote:
               | There tend to be a lot of heuristics involved when
               | deciding which optimizations to apply and in which order,
               | so there's plenty of room to apply some machine learning.
               | 
               | Whether LLMs are the right approach is a separate
               | question.
               | 
               | In SQL optimization, the problem is a bit trickier (IMO)
               | because compilation is in the query path. One successful
               | approach I know of is Bao:
               | https://arxiv.org/abs/2004.03814
        
           | armchairhacker wrote:
            | The trick is finding a way to ensure that the LLM produces
            | something which is always correct. Like in this case, the LLM
            | only changes compiler optimizations, not the assembly itself,
            | so no matter what it outputs the code is correct, it just may
            | be larger. Other possibilities: an LLM which applies
            | semantics-preserving program transformations, or an LLM
            | combined with a proof assistant to verify the output (more
            | generally, and for any domain, an LLM as an NP oracle).
            | 
            | But I agree, as of now I haven't seen good uses where LLMs
            | produce reliable output. Not only do you need the guarantee
            | that whatever it outputs yields a correct program, you also
            | need a problem where an LLM is considerably better than a
            | simple or random algorithm, and you need a lot of training
            | data (severely restricting how creative you can be with the
            | output).
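            | 
            | In the pass-selection setting that guarantee can be as cheap
            | as filtering the model's output against a whitelist of known
            | pass names, so any completion maps to a legal (if suboptimal)
            | pipeline. A sketch, where model_output stands in for a raw
            | LLM completion:
            | 
            |   # LLVM passes we allow the model to schedule; all of them
            |   # are semantics-preserving by construction.
            |   SAFE_PASSES = {"sroa", "instcombine", "simplifycfg",
            |                  "gvn", "dce", "licm", "reassociate"}
            | 
            |   def sanitize(model_output):
            |       # Keep only recognized pass names, in the order the
            |       # model suggested them; fall back to a stock pipeline
            |       # if nothing usable remains.
            |       kept = [p.strip() for p in model_output.split(",")
            |               if p.strip() in SAFE_PASSES]
            |       return kept or ["default<Oz>"]
            | 
            |   print(sanitize("sroa, instcmbine, gvn, rm -rf /"))
            |   # -> ['sroa', 'gvn']  (typos and junk are dropped)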
        
           | reic wrote:
            | Quantification can be done by measuring in at least two
            | dimensions: (1) the size of the synthesised code, and (2) how
            | precisely the generated code matches the input (which means,
            | roughly: on what fraction of inputs do the two programs give
            | different output). We have set up a challenge that _seeks_ to
            | entice the community to look into this problem domain more.
            | And we've simplified the assumptions, so as to make it more
            | tractable:
           | 
           | - Challenge:
           | https://codalab.lisn.upsaclay.fr/competitions/15096
           | 
           | - Paper describing the challenge:
           | https://arxiv.org/abs/2308.07899
           | 
           | (I am one of the authors, AMA)
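            | 
            | That second dimension can be estimated by differential
            | testing: run both programs on sampled inputs and report the
            | fraction where the outputs agree. A sketch, with ./reference
            | and ./synthesised standing in for binaries built from the two
            | programs (each reads a line from stdin and prints a result):
            | 
            |   import random, subprocess
            | 
            |   def output_of(binary, value):
            |       # Feed one input on stdin and capture whatever the
            |       # program prints.
            |       return subprocess.run(
            |           [binary], input=f"{value}\n",
            |           capture_output=True, text=True).stdout
            | 
            |   trials = 1000
            |   agree = 0
            |   for _ in range(trials):
            |       v = random.randint(-2**31, 2**31 - 1)
            |       if (output_of("./reference", v)
            |               == output_of("./synthesised", v)):
            |           agree += 1
            |   print(f"match fraction: {agree / trials:.3f}")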
        
       | haltist wrote:
        | The next step is to add verification of the optimized code from
        | the LLM with an SMT solver (like Z3) to remove "hallucinations".
        | If the input and output code can be verified to be equivalent,
        | then this would be a great addition to an optimization pipeline.
        | Once that's done, the same can be applied to intermediate
        | representations of GPU kernels in a recursive loop of AI
        | optimizing AI code for faster execution times.
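        | 
        | The small-scale version of that check is already easy with Z3's
        | Python bindings, e.g. verifying that a strength-reduction rewrite
        | is sound for every 32-bit input (a sketch; scaling this to whole
        | functions or loops is the hard part):
        | 
        |   from z3 import BitVec, prove
        | 
        |   x = BitVec("x", 32)
        |   # Claim: the "optimized" form x << 3 is equivalent to x * 8
        |   # for all 32-bit x. prove() prints "proved" or shows a
        |   # counterexample.
        |   prove(x * 8 == x << 3)
        |   # A wrong rewrite is rejected with a counterexample:
        |   prove(x * 8 == x << 2)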
        
         | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-09-17 23:00 UTC)