[HN Gopher] Learning Almost Nothing About LLVM
___________________________________________________________________
Learning Almost Nothing About LLVM
Author : mbellotti
Score : 63 points
Date : 2021-09-06 21:27 UTC (1 day ago)
(HTM) web link (bellmar.medium.com)
(TXT) w3m dump (bellmar.medium.com)
| veltas wrote:
| > Each of the major steps have tools available that will do 90%
| of the work for you. On the lexer/parser side there's ANTLR4,
| bison, yacc, flex.
|
| In my experience, those tools will do like 10% of the work of
| lexing or parsing for you, and you will spend equivalent to 20%
| of the work understanding how to use them and integrating them.
| And then you'll find out a sad hand-written recursive descent
| parser is faster in practice and is what e.g. GCC and clang use.
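As a rough illustration of the hand-written recursive descent style mentioned above, here is a minimal sketch for integer arithmetic. The grammar, the `Parser` class, and the `evaluate` helper are all invented for this example; they are not taken from GCC, clang, or the article.

```python
import re


class Parser:
    """Recursive descent parser that evaluates integer arithmetic."""

    def __init__(self, text):
        # Tokens are integers or single-character operators; anything else
        # is silently skipped, which a real lexer would report as an error.
        self.tokens = re.findall(r"\d+|[-+*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.next() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.next() == "*":
                value *= self.factor()
            else:
                value //= self.factor()  # integer division for simplicity
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        tok = self.next()
        if tok == "(":
            value = self.expr()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)


def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at token {parser.peek()!r}")
    return value
```

Each grammar rule maps to one method, so operator precedence falls out of the call structure: `expr` calls `term`, which calls `factor`.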
| mathgenius wrote:
| The python numba people learnt this lesson also: forget about the
| C++ api, just emit the (text) IR directly and feed that into
| LLVM.
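A minimal sketch of what "emit the text IR directly" can look like, using nothing but string formatting. The `FunctionEmitter` class and its method names are hypothetical, invented for this illustration; this is not numba's actual code, and it only handles `i32` values in a single basic block.

```python
class FunctionEmitter:
    """Builds textual LLVM IR for a single-block function of i32 values."""

    def __init__(self, name, params):
        self.name = name
        self.params = params
        self.body = []
        self.counter = 0

    def fresh(self):
        # Every SSA value needs a unique name within the function.
        self.counter += 1
        return f"%t{self.counter}"

    def binop(self, op, lhs, rhs):
        # op is an LLVM integer instruction such as "add", "sub" or "mul".
        result = self.fresh()
        self.body.append(f"  {result} = {op} i32 {lhs}, {rhs}")
        return result

    def ret(self, value):
        self.body.append(f"  ret i32 {value}")

    def render(self):
        args = ", ".join(f"i32 %{p}" for p in self.params)
        header = f"define i32 @{self.name}({args}) {{"
        return "\n".join([header, "entry:", *self.body, "}"]) + "\n"


def build_example():
    # Emits a function computing (a + b) * b.
    f = FunctionEmitter("mul_sum", ["a", "b"])
    total = f.binop("add", "%a", "%b")
    product = f.binop("mul", total, "%b")
    f.ret(product)
    return f.render()
```

The resulting text can be written to a `.ll` file and handed to the LLVM tools; the emitter itself never links against LLVM.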
| jcranmer wrote:
| Some comments:
|
| 1. The LLVM API is designed as a C++ API, and if you're serious
| about using LLVM, you're likely to have to actually work with the
| C++ API directly. There's a C API which is theoretically more
| stable than the C++ API, but it is very heavily gimped--it has
| basically no support for metadata, for example--and is feasible
| only for the most basic usage. Since the author
| brings up needing to use custom metadata, that suggests that they
| are intending to create custom optimization passes which is
| basically impossible except via the C++ API.
|
| 2. The complaint about metadata was very strange to me. I have
| had to work with custom metadata very recently with my work in
| LLVM, and I've had nothing like the pain the author suggests.
| (I've also had to deal with TBAA, which is definitely an area
| where LLVM lacks sorely in documentation, particularly examples).
| The "defined before use" just simply isn't an issue, because
| metadata is supposed to be global, so there is no define or
| use...
|
| I took a look at the llir library the author was using. On a
| quick inspection, it appears to be a library for generating
| textual LLVM IR _without having to link to LLVM at all_. Oy. The
| problem isn't LLVM, nor even the LLVM IR itself. The problem is
| your library to generate LLVM IR.
|
| 3. About the SSA issue. LLVM actually does have facilities to
| generate SSA correctly without going through allocas (though that
| might be challenging to use for codegen instead of in the context
| of an optimization pass). But, as established above, the author
| is purposefully using LLVM in a way that precludes them from
| availing themselves of this feature. Note that LLVM specifically
| recommends that frontends generate variables as allocas in the
| entry block and let the optimizer generate the SSA for you
| (see
| https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/LangI...).
|
| 4. I'm not entirely sure what the author means when discussing
| variable scope, but my guess is they neglected the "in the entry
| block" part of the standard guidelines for generating variables.
| If true, I'm left scratching my head where they got their answer
| to the SSA issue from that didn't mention that part--it's a very
| important part of generating alloca's correctly, and getting it
| wrong means you have some very broken mental semantics as to how
| it's supposed to work.
|
| 5. From the final paragraph, it seems the author's final step is
| to... write a parser for LLVM IR, and then convert their custom-
| parsed LLVM IR into SMTLIB2 code. As opposed to having LLVM parse
| the IR itself, visiting that IR, and then doing the same. Just...
| no.
|
| This isn't to say that LLVM is perfect in terms of documentation
| --it is _very_ far from it--but a lot of the issues seem to be
| related to trying to actively avoid working with LLVM itself.
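The "allocas in the entry block" guideline from points 3 and 4 above can be sketched with a small text emitter that hoists every variable's stack slot to the top of the function, no matter where in the source the variable was declared. The `FunctionBuilder` class and its methods are hypothetical names for this illustration. The sketch assumes the `ptr` spelling used by recent LLVM releases; releases contemporary with this thread wrote the pointer type as `i32*`.

```python
class FunctionBuilder:
    """Emits textual IR, keeping all allocas at the top of the entry block."""

    def __init__(self, name):
        self.name = name
        self.entry_allocas = []  # rendered first, whatever the declare order
        self.body = []
        self.temp = 0

    def declare(self, var):
        # Assumes each variable name is declared once per function.
        slot = f"%{var}.addr"
        self.entry_allocas.append(f"  {slot} = alloca i32")
        return slot

    def store(self, value, slot):
        self.body.append(f"  store i32 {value}, ptr {slot}")

    def load(self, slot):
        self.temp += 1
        result = f"%t{self.temp}"
        self.body.append(f"  {result} = load i32, ptr {slot}")
        return result

    def emit(self, line):
        self.body.append(line)

    def render(self):
        lines = [f"define i32 @{self.name}() {{", "entry:"]
        lines += self.entry_allocas
        lines += self.body
        lines.append("}")
        return "\n".join(lines) + "\n"


def demo():
    fb = FunctionBuilder("demo")
    x = fb.declare("x")
    fb.store(1, x)
    fb.emit("  br label %then")
    fb.emit("then:")
    y = fb.declare("y")  # declared in a nested block, allocated in entry
    fb.store(41, y)
    a = fb.load(x)
    b = fb.load(y)
    fb.emit(f"  %sum = add i32 {a}, {b}")
    fb.emit("  ret i32 %sum")
    return fb.render()
```

Because every slot is a plain alloca in the entry block, a pass such as mem2reg is then in a position to promote the loads and stores to SSA values.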
| QuadrupleA wrote:
| Also found LLVM to be pretty poorly documented - often what's out
| there is out of date and incomplete. The sheer scale of it makes
| it hard to narrow down what you're looking for too - I've
| resorted to searching the source code a few times to see how
| something works.
|
| I love its multi-language, multi hardware target abilities, and
| wicked fast compiled code - but its complexity, glacially slow
| compiles, and sloppy documentation are currently a drag.
| rightbyte wrote:
| The documentation was a joke the last time I tried to use it. The
| author should have tried libgccjit instead. It lacks LLVM's full
| capabilities, but at least it is possible to get a grasp of
| the whole API and read it fully.
| veltas wrote:
| The article aptly describes what it's like to start working with
| essentially any large code project, be it open source or
| proprietary. Unfortunately you are never going to see a project
| with comprehensive documentation, I'm not sure what that would
| even look like.
|
| Good, maintainable code therefore becomes an important part of
| the documentation and being realistic about this from the start
| probably improves your "reference" or "manual", where it's better
| to focus on high-level or architectural concepts, and link to
| where to find the nitty-gritty in source.
| plafl wrote:
| >Unfortunately you are never going to see a project with
| comprehensive documentation, I'm not sure what that would even
| look like.
|
| It probably would look like TeX, FWIW.
|
| Edit: I move to the bookcase to look at TeX: The Program. There
| it is sitting next to my volumes of TAOCP. A reminder of my
| failures. I can almost feel the disapproving gaze of D. E.
| Knuth. I'm not worthy.
| eatonphil wrote:
| I had a not terrible time emitting LLVM IR text directly as part
| of an exploration of language backends.
|
| Here are the three parts:
| * Introduction: https://notes.eatonphil.com/compiler-basics-llvm.html
| * Conditionals: https://notes.eatonphil.com/compiler-basics-llvm-conditionals.html
| * System calls: https://notes.eatonphil.com/compiler-basics-llvm-system-calls.html
|
| The hardest part I can remember is figuring out how LLVM IR's
| embedded assembly works since it's not exactly like Clang or
| GCC's IIRC. And the documentation was definitely confusing.
|
| I think the libraries wrapping LLVM IR are frankly harder to
| figure out than emitting the IR text directly.
| [deleted]
| sva_ wrote:
| https://archive.is/HFEKM
| drmeister wrote:
| I had a very different experience. I implemented Common Lisp
| using LLVM-IR as the backend
| (https://github.com/clasp-developers/clasp.git).
|
| 1. I started with a primitive lisp interpreter written in C++ and
| worked hard on exposing C++ functions/classes to my lisp using
| C++ template programming. LLVM is a C++ library, the C bindings
| are always behind the C++ API. So exposing the C++ API directly
| gave me access to the latest, and greatest API. That means you
| need to keep up with LLVM - but clang helps a lot because API
| changes appear as clang C++ compile time errors. I've been
| "chasing the LLVM dragon" (cough - keeping up with the LLVM API)
| from version 3.something to the upcoming 13.
|
| 2. I wrote a Common Lisp compiler in my primitive lisp that
| converted Common Lisp straight into LLVM-IR. I didn't want to
| develop my own language - who's got time for that? So I just
| picked a powerful one (Common Lisp) with macros, classes, generic
| functions, existing libraries, a community etc.
|
| 3. I used alloca/stack allocated variables everywhere and let
| mem2reg optimize what it could to registers. I exposed and used
| the llvm::IRBuilder class that makes generating IR a lot easier.
|
| 4. Then I picked an experimental, developing compiler "Cleavir"
| written by Robert Strandh and bootstrapped that with my Common Lisp
| compiler. It's like that movie "Inception" - but it makes sense
| :-).
|
| Now we have a Common Lisp programming environment that
| interoperates with C++ at a very deep level. Common Lisp stack
| frames intermingle perfectly with C++ stack frames and we can use
| all the C/C++ development, debugging and profiling tools.
|
| This Common Lisp programming environment supports "Cando" a
| computational chemistry programming environment for developing
| advanced therapeutics and diagnostic molecules.
|
| We are looking for people who want to work with us - if
| interested and you have a somewhat suitable background - drop me
| a message at info@thirdlaw.tech
| e12e wrote:
| > 4. Then I picked an experimental, developing compiler
| "Cleavir" written by Robert Strandh and bootstrap that with my
| Common Lisp compiler.
|
| I was wondering if this was some new twist on clasp that I was
| unaware of - but then discovered that I know that project as
| SICL (not cleavir).
|
| Since you had a primitive cl compiler (from 2) - 4 added a
| runtime/advanced cl compiler?
|
| https://github.com/robert-strandh/SICL
| dceddia wrote:
| I've recently started down the rabbit hole of building a video
| player + editor from scratch and this feels so relatable!
|
| Lots of stumbling around, reading scarce and outdated resources,
| and finding that really not many people have written about this
| stuff and it's "easier"/necessary to dive into the source of
| projects that do similar things. I spent a solid day mapping out
| how ffplay.c works to try to figure out how to synchronize audio
| and video properly. I have no background in video but I'm
| learning as I go, and things are falling into place, and it's
| been pretty fun most of the time.
|
| But I definitely resonate with the feeling that, if/when I get
| this thing working, I won't really know if it's "correct", and I
| also won't know how much that affects anything. It's like one of
| those infinitely zoomable fractal images, there's always some
| higher level of detail than the one you can currently see!
| kayodelycaon wrote:
| I think I'm missing something here.
|
| From what I'm seeing, the author skipped reading the documented
| LLVM source code in favor of reading a completely undocumented Go
| port (reimplementation?) of one part of LLVM?
|
| They also seem to have misunderstood the level of abstraction
| LLVM's IR provides.
|
| Did they miss the forest for the trees? I'd like to think I'm
| wrong. :/
| jasperry wrote:
| I empathize with the author's struggle and the pain of having to
| use the C++ API to generate LLVM IR. It's not relevant to Go, but
| the OCaml LLVM bindings are kept up-to-date and the documentation
| is there, though there's very little tutorial material to be
| found. Still, I find it much cleaner and nicer to use than C++.
|
| Trying to generate LLVM IR from scratch seems like a lost cause;
| when you realize how much the library code is keeping track of
| for you to make it possible to emit correct LLVM, you know that
| replicating all that just isn't worth it.
| sillycross wrote:
| Not an LLVM expert, but I don't agree with some of your arguments.
|
| > My side of the code generator had to recognize when a variable
| had already been defined and keep track of its pointer
|
| For a human, it's natural to write code that references each
| variable by its name. However, for a compiler, it's really
| error-prone (and inefficient) to reference a variable by its string
| name (for example, think about shadowing). The natural way to
| reference an entity is by its object pointer, which is what LLVM
| does. This is especially true considering LLVM is designed to
| perform various complex transformations.
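The shadowing point above can be sketched as a scope stack that hands out one object per declaration, so later compiler stages hold the object rather than the string name. `Variable`, `ScopeStack`, and the slot naming scheme are hypothetical and invented for this illustration; LLVM's own C++ API works with `Value*` pointers instead.

```python
class Variable:
    """One declared variable; object identity, not the name, tells them apart."""

    def __init__(self, name, slot):
        self.name = name
        self.slot = slot  # hypothetical unique IR name for its storage


class ScopeStack:
    """Maps source names to Variable objects, innermost scope first."""

    def __init__(self):
        self.scopes = [{}]
        self.uses = {}  # how many times each name has been declared

    def push(self):
        self.scopes.append({})

    def pop(self):
        self.scopes.pop()

    def declare(self, name):
        count = self.uses.get(name, 0)
        self.uses[name] = count + 1
        # Shadowing declarations get a numeric suffix so slots never collide.
        slot = f"%{name}.addr" if count == 0 else f"%{name}.addr{count}"
        var = Variable(name, slot)
        self.scopes[-1][name] = var
        return var

    def lookup(self, name):
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"undeclared variable {name!r}")
```

After lookup, the code generator passes the `Variable` object around, so two variables that share a source name can never be confused.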
|
| > There is a pass called mem2reg that will convert to SSA, but it
| needs you to allocate and store variables in memory (instead of
| in registers).
|
| The purpose of mem2reg is to make your job easier. It's weird to
| say that it "needs" you to allocate allocas for your variables:
| that's what it _allows_ you to do (for your own convenience). If
| you prefer to generate PHI nodes directly, you can just do so.
|
| > LLVM IR has opinions about variable scope
|
| Not sure what you are referring to. LLVM only has 'alloca',
| which knows nothing about "scope". It must be defined before
| being referenced -- but this is true for everything in SSA.
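For comparison with the alloca route, here is a minimal sketch of generating a PHI node directly, as the comment above says is possible. It emits textual IR for a hypothetical `max` function; the function name `emit_max` and the block labels are invented for this example.

```python
def emit_max():
    """Return textual LLVM IR for max(a, b) using a phi instead of an alloca."""
    lines = [
        "define i32 @max(i32 %a, i32 %b) {",
        "entry:",
        "  %cmp = icmp sgt i32 %a, %b",
        "  br i1 %cmp, label %then, label %else",
        "then:",
        "  br label %merge",
        "else:",
        "  br label %merge",
        "merge:",
        # The phi picks a value based on which predecessor block ran.
        "  %result = phi i32 [ %a, %then ], [ %b, %else ]",
        "  ret i32 %result",
        "}",
    ]
    return "\n".join(lines) + "\n"
```

This works for a fixed shape like this one, but the frontend must know every predecessor block of each merge point, which is the bookkeeping the alloca plus mem2reg approach avoids.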
| scrubs wrote:
| I've also gone down the lex/parse/LLVM rabbit hole. The OP
| didn't write that LLVM is clueless or unstructured; she writes
| that LLVM could have a better user manual. C++ is my meal ticket;
| LLVM can do nice stuff, certainly.
| sillycross wrote:
| I definitely agree that it would be better if LLVM had a
| gentler learning curve and a more accessible manual.
|
| I'm only pointing out that many "problems" listed in the post
| are intentional design choices for good reasons. They are not
| downsides that should be improved.
| da_chicken wrote:
| > I'm only pointing out that many "problems" listed in the
| post are intentional design choices for good reasons. They
| are not downsides that should be improved.
|
| That's not really _less_ of a problem. If you can't tell
| _why_ the designers made a choice and what the purpose and
| intention was, and there's no documentation about that,
| then your project has failed pretty catastrophically on
| communication. That's not really a less severe failure or
| an easier to fix problem than failing technically.
| Although, I suspect a lot of developers would reflexively
| disagree.
| vzaliva wrote:
| The author bravely tries to figure out some LLVM IR concepts,
| getting some right and some wrong (like the purpose of mem2reg).
| While I do not want to discourage this sort of exploratory
| learning, I want to point out that what the author is clearly
| lacking is some CS fundamentals. Perhaps taking some compiler
| construction classes from here
| https://www.classcentral.com/search?q=compiler could have made
| learning LLVM easier.
|
| I also second a good point about LLVM examples and documentation
| being heavy on the C/C++ API. I was also generating IR code from
| another language and found this C/C++ API focus annoying.
| muth02446 wrote:
| Shameless plug: https://github.com/robertmuth/Cwerg A lot simpler
| than LLVM but also a lot less mature.
___________________________________________________________________
(page generated 2021-09-07 23:01 UTC)