[HN Gopher] Learning Almost Nothing About LLVM
       ___________________________________________________________________
        
       Learning Almost Nothing About LLVM
        
       Author : mbellotti
       Score  : 63 points
       Date   : 2021-09-06 21:27 UTC (1 days ago)
        
 (HTM) web link (bellmar.medium.com)
 (TXT) w3m dump (bellmar.medium.com)
        
       | veltas wrote:
       | > Each of the major steps have tools available that will do 90%
       | of the work for you. On the lexer/parser side there's ANTLR4,
       | bison, yacc, flex.
       | 
       | In my experience, those tools will do like 10% of the work of
       | lexing or parsing for you, and you will spend equivalent to 20%
       | of the work understanding how to use them and integrating them.
       | And then you'll find out a sad hand-written recursive descent
       | parser is faster in practice and is what e.g. GCC and clang use.
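
A minimal sketch of the hand-written approach mentioned above, in Python for brevity. The toy arithmetic grammar and the names `tokenize`, `Parser` and `parse` are hypothetical illustrations. They are not taken from the article, GCC or clang. Each grammar rule becomes one method, which is the defining trait of recursive descent.

```python
import re

# Hypothetical toy grammar, chosen only for illustration:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(src):
    """Split the source string into number and operator tokens."""
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expr(self):
        # Left-associative chain of + and - over terms.
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        # Left-associative chain of * and / over factors.
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(src):
    """Parse a whole expression and reject trailing tokens."""
    parser = Parser(tokenize(src))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return tree


print(parse("1 + 2 * 3"))  # ('+', 1, ('*', 2, 3))
```

Operator precedence falls out of which method calls which, so no generator tool or grammar file is involved.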
        
       | mathgenius wrote:
       | The python numba people learnt this lesson also: forget about the
       | C++ api, just emit the (text) IR directly and feed that into
       | LLVM.
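
A rough sketch of what "emit the text IR directly" can look like, using plain Python string formatting. The helper names `emit_binary_function` and `emit_module` are hypothetical, and this makes no claim about how Numba is actually structured. The output is ordinary textual LLVM IR that could be saved to a `.ll` file and handed to the LLVM tools.

```python
def emit_binary_function(name, opcode):
    """Return textual LLVM IR for an i32 function applying one binary opcode."""
    lines = [
        f"define i32 @{name}(i32 %a, i32 %b) {{",
        "entry:",
        f"  %result = {opcode} i32 %a, %b",
        "  ret i32 %result",
        "}",
    ]
    return "\n".join(lines) + "\n"


def emit_module(functions):
    """Concatenate several function definitions into one module string."""
    return "\n".join(emit_binary_function(name, op) for name, op in functions)


# Hypothetical example module with two functions.
ir_text = emit_module([("add", "add"), ("times", "mul")])
print(ir_text)
```

The trade-off is that nothing validates the IR until LLVM parses it, so mistakes surface later than they would with a builder API.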
        
       | jcranmer wrote:
       | Some comments:
       | 
       | 1. The LLVM API is designed as a C++ API, and if you're serious
       | about using LLVM, you're likely to have to actually work with the
       | C++ API directly. There's a C API which is theoretically more
       | stable than the C++ API, but it is very heavily gimped--it has
       | basically no support for metadata, for example--and is mostly
       | feasible only for the most basic usage. Since the author
       | brings up needing to use custom metadata, that suggests that they
       | are intending to create custom optimization passes which is
       | basically impossible except via the C++ API.
       | 
       | 2. The complaint about metadata was very strange to me. I have
       | had to work with custom metadata very recently with my work in
       | LLVM, and I've had nothing like the pain the author suggests.
       | (I've also had to deal with TBAA, which is definitely an area
       | where LLVM lacks sorely in documentation, particularly examples).
       | The "defined before use" just simply isn't an issue, because
       | metadata is supposed to be global, so there is no define or
       | use...
       | 
       | I took a look at the llir library the author was using. On a
       | quick inspection, it appears to be a library for generating
       | textual LLVM IR _without having to link to LLVM at all_. Oy. The
       | problem isn't LLVM, nor even the LLVM IR itself. The problem is
       | your library to generate LLVM IR.
       | 
       | 3. About the SSA issue. LLVM actually does have facilities to
       | generate SSA correctly without going through allocas (though that
       | might be challenging to use for codegen instead of in the context
       | of an optimization pass). But, as established above, the author
       | is purposefully using LLVM in a way that precludes them from
       | availing themselves of this feature. Note that LLVM specifically
       | recommends that frontends generate variables as allocas in the
       | entry block and let the optimizer generate the SSA for you
       | (see
       | https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangI...).
       | 
       | 4. I'm not entirely sure what the author means when discussing
       | variable scope, but my guess is they neglected the "in the entry
       | block" part of the standard guidelines for generating variables.
       | If true, I'm left scratching my head where they got their answer
       | to the SSA issue from that didn't mention that part--it's a very
       | important part of generating allocas correctly, and getting it
       | wrong means you have some very broken mental semantics as to how
       | it's supposed to work.
       | 
       | 5. From the final paragraph, it seems the author's final step is
       | to... write a parser for LLVM IR, and then convert their custom-
       | parsed LLVM IR into SMTLIB2 code. As opposed to having LLVM parse
       | the IR itself, visiting that IR, and then doing the same. Just...
       | no.
       | 
       | This isn't to say that LLVM is perfect in terms of documentation
       | --it is _very_ far from it--but a lot of the issues seem to be
       | related to trying to actively avoid working with LLVM itself.
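
To make points 3 and 4 concrete, here is a small sketch of the "allocas in the entry block" convention, written as a textual IR emitter in Python. The `FunctionEmitter` class and its methods are hypothetical. The sketch assumes the opaque-pointer `ptr` syntax of recent LLVM releases; older releases spell the pointer type `i32*`. The emitter hoists every stack slot into the entry block no matter which block first assigns the variable, and leaves SSA construction to mem2reg.

```python
class FunctionEmitter:
    """Collects textual LLVM IR, hoisting every alloca into the entry block."""

    def __init__(self, name):
        self.name = name
        self.allocas = []              # emitted at the top of the entry block
        self.blocks = [("entry", [])]  # (label, instructions) in emission order
        self.slots = {}                # source variable name -> stack slot

    def declare(self, var):
        """Reserve one stack slot per variable, wherever it is first seen."""
        if var not in self.slots:
            slot = f"%{var}.addr"
            self.slots[var] = slot
            self.allocas.append(f"  {slot} = alloca i32")
        return self.slots[var]

    def start_block(self, label):
        self.blocks.append((label, []))

    def emit(self, instruction):
        self.blocks[-1][1].append(f"  {instruction}")

    def store(self, var, value):
        self.emit(f"store i32 {value}, ptr {self.declare(var)}")

    def finish(self):
        out = [f"define i32 @{self.name}(i1 %cond) {{"]
        for label, body in self.blocks:
            out.append(f"{label}:")
            if label == "entry":
                out.extend(self.allocas)  # allocas always come first
            out.extend(body)
        out.append("}")
        return "\n".join(out) + "\n"


fn = FunctionEmitter("pick")
fn.emit("br i1 %cond, label %then, label %else")
fn.start_block("then")
fn.store("x", 1)  # first assignment happens outside the entry block
fn.emit("br label %done")
fn.start_block("else")
fn.store("x", 2)
fn.emit("br label %done")
fn.start_block("done")
fn.emit(f"%x.val = load i32, ptr {fn.declare('x')}")
fn.emit("ret i32 %x.val")
ir_text = fn.finish()
print(ir_text)
```

Even though `x` is first stored in the `then` block, its `alloca` lands in `entry`, which is the form the mem2reg pass is designed to promote to registers.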
        
       | QuadrupleA wrote:
       | Also found LLVM to be pretty poorly documented - often what's out
       | there is out of date and incomplete. The sheer scale of it makes
       | it hard to narrow down what you're looking for too - I've
       | resorted to searching the source code a few times to see how
       | something works.
       | 
       | I love its multi-language, multi hardware target abilities, and
       | wicked fast compiled code - but its complexity, glacially slow
       | compiles, and sloppy documentation are currently a drag.
        
         | rightbyte wrote:
         | The documentation was a joke the last time I tried to use it. The
         | author should have tried libgccjit instead. It lacks LLVM's full
         | capabilities, but at least it is possible to get a grasp of
         | the whole API and read it fully.
        
       | veltas wrote:
       | The article aptly describes what it's like to start working with
       | essentially any large code project, be it open source or
       | proprietary. Unfortunately you are never going to see a project
       | with comprehensive documentation, I'm not sure what that would
       | even look like.
       | 
       | Good, maintainable code therefore becomes an important part of
       | the documentation and being realistic about this from the start
       | probably improves your "reference" or "manual", where it's better
       | to focus on high-level or architectural concepts, and link to
       | where to find the nitty-gritty in source.
        
         | plafl wrote:
         | >Unfortunately you are never going to see a project with
         | comprehensive documentation, I'm not sure what that would even
         | look like.
         | 
         | It probably would look like TeX, FWIW.
         | 
         | Edit: I move to the bookcase to look at TeX: The Program. There
         | it is sitting next to my volumes of TAOCP. A reminder of my
         | failures. I can almost feel the disapproving gaze of D. E.
         | Knuth. I'm not worthy.
        
       | eatonphil wrote:
       | I had a not terrible time emitting LLVM IR text directly as part
       | of an exploration of language backends.
       | 
       | Here are the three parts:
       | * Introduction: https://notes.eatonphil.com/compiler-basics-llvm.html
       | * Conditionals: https://notes.eatonphil.com/compiler-basics-llvm-conditionals.html
       | * And system calls: https://notes.eatonphil.com/compiler-basics-llvm-system-calls.html
       | 
       | The hardest part I can remember is figuring out how LLVM IR's
       | embedded assembly works since it's not exactly like Clang or
       | GCC's IIRC. And the documentation was definitely confusing.
       | 
       | I think the libraries wrapping LLVM IR are frankly harder to
       | figure out than emitting the IR text directly.
        
       | sva_ wrote:
       | https://archive.is/HFEKM
        
       | drmeister wrote:
       | I had a very different experience. I implemented Common Lisp
       | using LLVM-IR as the backend
       | (https://github.com/clasp-developers/clasp.git).
       | 
       | 1. I started with a primitive lisp interpreter written in C++ and
       | worked hard on exposing C++ functions/classes to my lisp using
       | C++ template programming. LLVM is a C++ library, and the C bindings
       | always lag behind the C++ API. So exposing the C++ API directly
       | gave me access to the latest and greatest API. That means you
       | need to keep up with LLVM - but clang helps a lot because API
       | changes appear as clang C++ compile time errors. I've been
       | "chasing the LLVM dragon" (cough - keeping up with the LLVM API)
       | from version 3.something to the upcoming 13.
       | 
       | 2. I wrote a Common Lisp compiler in my primitive lisp that
       | converted Common Lisp straight into LLVM-IR. I didn't want to
       | develop my own language - who's got time for that? So I just
       | picked a powerful one (Common Lisp) with macros, classes, generic
       | functions, existing libraries, a community etc.
       | 
       | 3. I used alloca/stack allocated variables everywhere and let
       | mem2reg optimize what it could to registers. I exposed and used
       | the llvm::IRBuilder class that makes generating IR a lot easier.
       | 
       | 4. Then I picked an experimental, developing compiler, "Cleavir",
       | written by Robert Strandh and bootstrapped that with my Common Lisp
       | compiler. It's like that movie "Inception" - but it makes sense
       | :-).
       | 
       | Now we have a Common Lisp programming environment that
       | interoperates with C++ at a very deep level. Common Lisp stack
       | frames intermingle perfectly with C++ stack frames and we can use
       | all the C/C++ development, debugging and profiling tools.
       | 
       | This Common Lisp programming environment supports "Cando", a
       | computational chemistry programming environment for developing
       | advanced therapeutics and diagnostic molecules.
       | 
       | We are looking for people who want to work with us - if
       | interested and you have a somewhat suitable background - drop me
       | a message at info@thirdlaw.tech
        
         | e12e wrote:
         | > 4. Then I picked an experimental, developing compiler
         | "Cleavir" written by Robert Strandh and bootstrap that with my
         | Common Lisp compiler.
         | 
         | I was wondering if this was some new twist on clasp that I was
         | unaware of - but then discovered that I know that project as
         | SICL (not cleavir).
         | 
         | Since you had a primitive cl compiler (from 2) - 4 added a
         | runtime/advanced cl compiler?
         | 
         | https://github.com/robert-strandh/SICL
        
       | dceddia wrote:
       | I've recently started down the rabbit hole of building a video
       | player + editor from scratch and this feels so relatable!
       | 
       | Lots of stumbling around, reading scarce and outdated resources,
       | and finding that really not many people have written about this
       | stuff and it's "easier"/necessary to dive into the source of
       | projects that do similar things. I spent a solid day mapping out
       | how ffplay.c works to try to figure out how to synchronize audio
       | and video properly. I have no background in video but I'm
       | learning as I go, and things are falling into place, and it's
       | been pretty fun most of the time.
       | 
       | But I definitely resonate with the feeling that, if/when I get
       | this thing working, I won't really know if it's "correct", and I
       | also won't know how much that affects anything. It's like one of
       | those infinitely zoomable fractal images, there's always some
       | higher level of detail than the one you can currently see!
        
       | kayodelycaon wrote:
       | I think I'm missing something here.
       | 
       | From what I'm seeing, the author skipped reading the documented
       | LLVM source code in favor of reading a completely undocumented Go
       | port (reimplementation?) of one part of LLVM?
       | 
       | They also seem to have misunderstood the level of abstraction
       | LLVM's IR provides.
       | 
       | Did they miss the forest for the trees? I'd like to think I'm
       | wrong. :/
        
       | jasperry wrote:
       | I empathize with the author's struggle and the pain of having to
       | use the C++ API to generate LLVM IR. It's not relevant to Go, but
       | the OCaml LLVM bindings are kept up-to-date and the documentation
       | is there, though there's very little tutorial material to be
       | found. Still, I find it much cleaner and nicer to use than C++.
       | 
       | Trying to generate LLVM IR from scratch seems like a lost cause;
       | when you realize how much the library code is keeping track of
       | for you to make it possible to emit correct LLVM, you know that
       | replicating all that just isn't worth it.
        
       | sillycross wrote:
       | Not an LLVM expert, but I don't agree with some of your arguments.
       | 
       | > My side of the code generator had to recognize when a variable
       | had already been defined and keep track of its pointer
       | 
       | For a human, it's natural to write code that references each
       | variable by its name. However, for a compiler, it's really error
       | prone (and inefficient) to reference a variable by its string
       | name (for example, think about shadowing). The natural way to
       | reference an entity is by its object pointer, which is what LLVM
       | does. This is especially true considering LLVM is designed to
       | perform various complex transformations.
       | 
       | > There is a pass called mem2reg that will convert to SSA, but it
       | needs you to allocate and store variables in memory (instead of
       | in registers).
       | 
       | The purpose of mem2reg is to make your job easier. It's weird to
       | say that it "needs" you to allocate allocas for your variables:
       | that's what it _allows_ you to do (for your own convenience). If
       | you prefer to generate PHI nodes directly, you can just do so.
       | 
       | > LLVM IR has opinions about variable scope
       | 
       | Not sure what you are referring to. LLVM only has 'alloca',
       | which knows nothing about "scope". It must be defined before
       | being referenced -- but this is true for everything in SSA.
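
As an illustration of the "generate PHI nodes directly" option, the sketch below emits textual LLVM IR for a value that differs per branch and merges it with a `phi`, with no `alloca` involved. The function name `emit_select_with_phi` and the block labels are hypothetical. The front end has to know which predecessor block supplies which value, and that bookkeeping is what the alloca plus mem2reg route spares you.

```python
def emit_select_with_phi(name, then_value, else_value):
    """Return textual LLVM IR that merges two constants with a phi node."""
    lines = [
        f"define i32 @{name}(i1 %cond) {{",
        "entry:",
        "  br i1 %cond, label %then, label %else",
        "then:",
        "  br label %done",
        "else:",
        "  br label %done",
        "done:",
        # One [value, predecessor] pair per incoming edge of %done.
        f"  %x = phi i32 [ {then_value}, %then ], [ {else_value}, %else ]",
        "  ret i32 %x",
        "}",
    ]
    return "\n".join(lines) + "\n"


phi_ir = emit_select_with_phi("pick", 1, 2)
print(phi_ir)
```

Here `%x` is defined exactly once, in `done`, which is the single-assignment property the comment above says holds for everything in SSA.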
        
         | scrubs wrote:
         | I've also gone down the lex/parse/LLVM rabbit hole. The OP
         | didn't write that LLVM is clueless or unstructured; she wrote
         | that LLVM could have a better user manual. C++ is my meal
         | ticket; LLVM can do nice stuff, certainly.
        
           | sillycross wrote:
           | I definitely agree that it would be better if LLVM had a
           | flatter learning curve and a more accessible manual.
           | 
           | I'm only pointing out that many "problems" listed in the post
           | are intentional design choices for good reasons. They are not
           | downsides that should be improved.
        
             | da_chicken wrote:
             | > I'm only pointing out that many "problems" listed in the
             | post are intentional design choices for good reasons. They
             | are not downsides that should be improved.
             | 
             | That's not really _less_ of a problem. If you can't tell
             | _why_ the designers made a choice and what the purpose and
             | intention was, and there's no documentation about that,
             | then your project has failed pretty catastrophically on
             | communication. That's not really a less severe failure or
             | an easier to fix problem than failing technically.
             | Although, I suspect a lot of developers would reflexively
             | disagree.
        
       | vzaliva wrote:
       | The author bravely tries to figure out some LLVM IR concepts
       | and gets some of them right and some wrong (like the purpose of
       | mem2reg). While I do not want to discourage this sort of
       | exploratory learning, I want to point out that what the author is
       | clearly lacking is some CS fundamentals. Perhaps taking some
       | compiler construction classes from here
       | https://www.classcentral.com/search?q=compiler
       | could make learning LLVM easier.
       | 
       | I also second the good point about LLVM examples and documentation
       | being heavy on the C/C++ API. I was also generating IR code from
       | another language and found this C/C++ API focus annoying.
        
       | muth02446 wrote:
       | Shameless plug: https://github.com/robertmuth/Cwerg A lot simpler
       | than LLVM but also a lot less mature.
        
       ___________________________________________________________________
       (page generated 2021-09-07 23:01 UTC)