[HN Gopher] Compilers: Incrementally and Extensibly (2024)
___________________________________________________________________
Compilers: Incrementally and Extensibly (2024)
Author : todsacerdoti
Score : 138 points
Date : 2025-04-05 12:55 UTC (1 days ago)
(HTM) web link (okmij.org)
(TXT) w3m dump (okmij.org)
| soegaard wrote:
| The notes reference: "An Incremental Approach
| to Compiler Construction" Abdulaziz Ghuloum
| http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf
|
| Can't recommend it enough.
| sitkack wrote:
| Andy Keep - Writing a Nanopass Compiler
| https://www.youtube.com/watch?v=Os7FE3J-U5Q
|
| https://github.com/akeep/scheme-to-c
| mncharity wrote:
| And as a github search[1], I recall it turning up fun work[2]
| amidst clutter.
|
| Hmm, I wonder if an LLM could sift a "Essentials of
| Compilation" search[3] for _interesting_ repos?
|
| [1] https://github.com/search?q=Ghuloum&type=repositories
| https://github.com/search?q=incremental%20approach%20to%20co...
| [2] just some old bookmarks: https://github.com/namin/inc
| https://github.com/iambrj/imin
| https://github.com/jaseemabid/inc [3]
| https://github.com/search?q=Essentials%20of%20Compilation&ty...
| djoldman wrote:
| Learning and working on this stuff almost always felt "all or
| nothing," (either it worked or it didn't), which was very
| frustrating.
|
| This looks interesting:
|
| > Like Ghuloum [2006], this course is different. Our development
| is by extending the complete, working compiler one small step at
| a time. At each step we end up with the working compiler, for a
| subset of the source language. Specifically, the methodology
| ([Ghuloum, 2006, SS2.6]):
|
| > 1. choose a small subset of the source language that is easy to
| directly compile to assembly;
|
| > 2. Write the extensive test cases;
| Jtsummers wrote:
| Another text using this approach:
|
| https://mitpress.mit.edu/9780262047760/essentials-of-compila... -
| Jeremy Siek's _Essentials of Compilation_ (there 's also a Python
| version)
|
| Admittedly I didn't finish working through it (life got in the
| way), but I worked through a good chunk of it with a colleague
| from a non-CS background. He wanted to understand compilers and
| it seemed to be much more effective than other texts he'd tried
| to work through. I'll have to look at this one at some point,
| probably this summer when things are usually calmer for me.
| sitkack wrote:
| This is such a wonderful book.
|
| This book is open access,
| https://github.com/IUCompilerCourse/Essentials-of-Compilatio...
| both the Python and the Racket versions.
|
| If you are the kinda nerd that has nerd book club with friends,
| this is book is _perfect_ for it.
| nate_martin wrote:
| >The compiler itself is to be developed in OCaml.
|
| These seems like a misstep that I've seen in a few other compiler
| implementation courses. For some reason these programming
| language professors always insist on completing the project in
| their personal favorite language (Haskell, OCaml, Standard ML,
| etc).
|
| As a student this makes the project significantly more difficult.
| Instead of just learning how to implement a complier, you're also
| learning a new (and likely difficult) programming language.
| Jtsummers wrote:
| Learning a new programming language isn't that hard of a task,
| every decent programmer will learn a dozen or maybe even dozens
| over their career.
|
| Also, neither OCaml nor SML are hard to learn. Haskell is more
| challenging, but that's because it's become, in a sense,
| multiple languages. The core of Haskell is no harder than OCaml
| or SML to learn, except for reasoning about lazy evaluation and
| some of its consequences. All the things people use on top of
| Haskell, though, does make it more to learn but what you'd need
| to reach equivalent utility as SML or OCaml for a compilers
| course is not that hard to learn.
| remexre wrote:
| Many universities that use OCaml in upper-division courses
| also use it in lower-division; my university requires all CS
| majors to take a course that is taught in OCaml, and covers
| higher-order programming, "advanced" (Hindley-Milner) type
| systems, equational reasoning, etc., typically in their
| sophomore year.
|
| The compilers class can then be taught in it without worrying
| about that problem much.
| mrkeen wrote:
| I grew up with "easy" languages. Pascal, VB, C, C++, Java, C#.
| And frankly I'm getting real sick of them.
|
| I'm porting Dijkstra's algorithm over to C# at the moment, and
| in the last several hours here's the two most clownish things
| that have happened:
|
| 1) I have: if (node is null) { ..
| }
|
| My IDE is telling me "Expression is always false according to
| nullable reference types' annotations". Nevertheless it enters
| that section every time.
|
| 2) I have: SortedSet<int> nums = [];
| Console.Out.WriteLine(nums.Min);
|
| You know what this prints? 0
|
| The minimal element of a set which has no elements is 0.
|
| Yes, every language has its warts, and anecdotes like this
| aren't going to change anyone's mind. I only wrote these up
| because they're hitting me right now in one of those languages
| that companies use "because anyone can learn them". I like
| harder languages better than easy languages, because they're
| easy.
| neonsunset wrote:
| This is possible if you disregarded compiler warnings and
| assigned null to a non-nullable location.
| nextaccountic wrote:
| Feels like this should be a compiler error. Or saying
| otherwise, this location isn't very non-nullable if you can
| assign null to it
| neonsunset wrote:
| Just add <WarningsAsErrors>nullable</WarningsAsErrors> to
| the project manifest.
| tester756 wrote:
| 2)
|
| In general I'd recommend using Min() (from LINQ) which works
| as expected
|
| But this property has this remark: "If the SortedSet<T> has
| no elements, then the Min property returns the default value
| of T."
|
| 1)
|
| Feels like you used compiler hints incorrectly
| mattgreenrocks wrote:
| Algebraic data types are superb for compilers. Case
| exhaustiveness checks are really helpful in this domain,
| especially when doing any sort of semantic analysis.
|
| Frankly, I try to avoid languages that don't have ADTs as much
| as possible. They are incredibly useful for specifying
| invariants via your design, and their constraints on inputs
| lend themselves to easier implementation and maintenance.
| linguae wrote:
| In defense of such compilers professors, part of the purpose of
| a good undergraduate computer science program is to expose
| students to different programming language paradigms. Computing
| professionals are expected to grasp new languages, APIs, and
| tools quickly. Additionally, certain problems are easier to
| express in certain paradigms. For example, I would use a
| language like C or perhaps Rust in an operating systems class
| since we need to operate at a level that is much closer to the
| hardware. If I were teaching a course on machine learning, I'd
| use Python for its excellent ecosystem of libraries, though R
| and Julia are good alternatives. A course on relational
| databases should teach basic SQL.
|
| Back in the 2000s there were some CS undergraduate programs
| that attempted to use Java in the entire curriculum, from
| introductory courses all the way to senior-level courses such
| as compilers. There was even an operating systems textbook that
| had Java examples throughout the text
| (https://www.amazon.com/Operating-System-Concepts-Abraham-
| Sil...).
|
| I think using only one language for the entire undergraduate CS
| curriculum is a mistake. Sure, students don't have to spend
| time learning additional languages. However, everything has to
| fit into that one language, depriving students the opportunity
| to see how languages that are better suited to specific types
| of problems could actually enhance their understanding of the
| concepts they are learning. In the case of Java, it's a good
| general-purpose programming language, but there are classes
| such as computer organization and operating systems where it's
| important to discuss low-level memory management, which
| conflicts with Java's automatic memory management.
|
| When it comes to writing compilers, it turns out that
| functional programming languages with algebraic data types and
| pattern matching make working with abstract syntax trees much
| easier. I learned this the hard way when I took compilers in
| 2009 at Cal Poly. At the beginning of the course, we were given
| two weeks to write an AST interpreter of a subset of Scheme. My
| lab partner and I didn't like Dr. Scheme (now known as Racket),
| which we "endured" the previous quarter in a class on
| implementing programming language interpreters, and so we set
| about writing our interpreter in C++. It turned out to be a big
| mistake. We got it done, but it took us 40 hours to implement,
| and we had a complex class hierarchy to implement the AST. We
| realized that Dr. Scheme's features were well-suited for
| manipulating and traversing ADTs. We never complained about Dr.
| Scheme or functional programming again, and we gladly did our
| other compiler assignments in that language.
|
| 16 years later, I teach Haskell to my sophomore-level students
| in my discrete mathematics class at a Bay Area community
| college that uses C++ for the introductory programming course.
| DKordic wrote:
| Interesting arguments.
|
| Lisp Machines? "Separation of Concerns" [1]? Conway's Law?
|
| [1]: http://alarmingdevelopment.org/?p=805
| sn9 wrote:
| You have to pick some language and some amount of students are
| likely to be new to it, so you might as well pick one that
| excels at the task of writing compilers as ML-family languages
| are.
| DKordic wrote:
| I have a challenge for You: make applications of Your choice of
| BlackPill [1] or Radxa ZERO 3E [2].
|
| Hint: You might be interested in Forth Systems and Lisp
| Machines.
|
| [1]: https://github.com/WeActStudio/WeActStudio.MiniSTM32F4x1
| [2]: https://radxa.com/products/zeros/zero3e
| norir wrote:
| The hardest part of writing a compiler is designing a language.
| In my opinion, most compiler courses seem too fixated on the
| implementation details. The fact that this course targets x86
| already misses the mark for me. Computers are very fast, there is
| no reason to start with a language that translates to asm. This
| is incidental complexity that distracts from language design.
| wseqyrku wrote:
| > The hardest part of writing a compiler is designing a
| language.
|
| That's a completely separate field. If you want to learn about
| language design you should study that, though it helps if you
| have some background on both to get started with either one.
| dccsillag wrote:
| Hard disagree. There are a lot of implementation "details"
| that, if you want to do properly, are a lot of hard work and
| very much nontrivial. For example, do try to write a compiler
| with efficient incremental compilation, and especially one that
| does so while also having optimization passes. And that's just
| one example, most things in compiler implementations actually
| turn out to be fairly complex. And lots of features that modern
| languages support e.g. more powerful typesystems,
| trait/typeclass systems, etc. are also very very tricky.
|
| While designing a language is by no means trivial, it generally
| really occupies just a very small fraction of the
| language/compiler developer's time. And, in most cases, the two
| things (language design + implementation details) have to walk
| hand-in-hand, since small changes to the language design can
| vastly improve the implementation end.
| tiu wrote:
| Regarding the language design part, many universities I know of
| offer a {Advanced} Programming Languages class later on after a
| Compilers 101 class and that is where many of the topics like
| EBNF, Grammar Design along with different constructs are done.
|
| Besides, for someone learning it for the first time I think
| designing a new language seems a bit difficult.
| f1shy wrote:
| I disagree. First of all, you can write a compiler for any
| existing language. But second, inhave experience writing
| compilers and interpreters. For me the hardest was always to
| give good information/feedback on errors. Because the compiler
| cannot go ahead, means it is "confused" eliminates that
| confusion to help the user is very much non trivial.
| codr7 wrote:
| I didn't make much progress on my own languages until I
| discovered Forth, always seemed to get stuck and lose motivation
| in the parser.
|
| Writing the parser is the least interesting part of building a
| language imo.
| nickpsecurity wrote:
| That varies by person. While it may or may not interest, I'll
| say learning how to write or generate parsers is a useful skill
| that might keep paying off.
|
| I keep running into situations where I'd like to describe data
| in a high level. BNF grammars often fit those situations, are
| more readable than regex's, and could make for nice parsers.
| One must know how to parse, though. :)
| DKordic wrote:
| Exactly! "Life After Parsing" [1].
|
| IMHO C Family is a symptom of "low-level" of competence. I
| often get surprised by superficiality of their arguments.
| "Sophistry" is a useful concept (in a single word). Ad nauseam?
|
| [1]:
| http://www.semanticdesigns.com/Products/DMS/LifeAfterParsing...
| tiu wrote:
| I just got over writing 2 (well 2.5) toy compilers and I think a
| lot of the material in the compiler-teaching space lack some
| subtle developmental aspects.
|
| I wish there was a course designed somewhere which talked about
| more ingrained issues: how to structure/design the AST[0], buffer
| based vs noncontextual tokenization/parser design, index handling
| and error sync in the parser, supporting multiple codegen
| architectures, handling FFI, exposing the compiler as an API for
| external tooling, namespaces and linkage handling etc. etc. etc.
|
| It is refreshing to see how Carbon designed some of its
| components (majorly the frontend, yet to take a look at the
| backend) as it touches on some of the subtleties I mentioned. If
| someone is starting out on writing one, I would recommend taking
| a look at it or any of the talks.
|
| Always nice to see new material coming up. A few resources that I
| would like to mention would be dabaez's compiler course, Khoury
| college's compiler course (in Rust, previously i think and
| Ocaml), Nora Sandler's book as well as http://compilerbook.org;
| Which I consider to be the best guide out there for writing small
| learning compilers, the videos are good as well.
|
| [0]: Some related content that I enjoyed reading:
| https://lesleylai.info/en/ast-in-cpp-part-1-variant/
| seanmcdirmid wrote:
| You can learn all that stuff by doing and looking at other
| people's design, it's a bit niche to package that up into a ln
| advanced language design/compiler class, everyone is making
| different trade offs so are focusing on different things. There
| also isn't really a market for more than a few highly trained
| language designers and implementers, and most wind up doing
| different things than what they were trained for.
| tester756 wrote:
| >supporting multiple codegen architectures
|
| Shouldnt it be a result of modular software design?
|
| >exposing the compiler as an API for external tooling
|
| Isn't it just generating libraries instead of executables?
| remexre wrote:
| Making the _compiler itself_ provide an API enables things
| like LSP, which don't want to generate machine code at all. A
| traditional single-pass compiler usually can't accommodate
| this without re-plumbing.
| halffullbrain wrote:
| The Eclipse Compiler for Java [1] is a notable exception,
| architected around incremental compilation, an API for
| "live" AST manipulation, and a layered non-batch approach
| to when to invoke various analysis steps.
|
| The LSP for Java [2] used in eg. VSCode's Java plugins,
| builds on this API.
|
| But, no, I haven't seen a generalized approach to this
| architecture discussed in literature.
|
| 1: https://github.com/eclipse-jdt/eclipse.jdt.core 2:
| https://github.com/eclipse-jdtls/eclipse.jdt.ls
| tester756 wrote:
| I'm aware, I've used C#/.NET Compiler ecosystem and their
| Compiler as a Service approach
|
| But it seems like you're just creating libs that everyone
| and embedd / use and that's it
|
| e.g `auto ast = Compiler.GenerateAST(code);`
| tiu wrote:
| I wrote 'multiple codegen architectures' instead of 'multiple
| architectures for codegen'.
|
| As far as I have done in the toy compilers and seen the
| things in actual production ready compilers, the codegen is
| still very much tied to the one thing or the other rest llvm.
| elcritch wrote:
| Nim has multiple backends and is relatively mature. It's
| fairly readable as compilers go.
|
| There's also a new experimental rewrite of the Nim compiler
| called Nimony which targets a new intermediate called NIFC.
| That is intended to the be transformed to C, LLVM,
| JavaScript, etc.
| tester756 wrote:
| >I wrote 'multiple codegen architectures' instead of
| 'multiple architectures for codegen'.
|
| You want to generate e.g x86 + ARM + RISCV, yea?, and
| shouldnt it be a result of modular architecture?
|
| like your various codegens just take your AST and generate
| output
| yencabulator wrote:
| The compilerbook link is broken in multiple ways, but
| www.compilerbook.com redirects to
| https://www3.nd.edu/~dthain/compilerbook/
| tiu wrote:
| Yes, thank you. That is the Douglas Thain book I meant.
| (compilerbook.org seems to work for me, only broken thing
| would be the semicolon).
| convivialdingo wrote:
| I've been modifying the the MIR c2mir JIT compiler to extend the
| c11 compiler to support simple classes, boxed strings(immutable,
| nun-nullable) with AOT support.
|
| Imagine if Java and C had a love child, basically.
|
| MIR is a fantastic piece of engineering.
|
| Honestly the hardest part is representing types. Having played
| around with other compilers it seems to be a typical problem.
|
| I'm stuck in the minutiae of representing highly structured
| complexity and defining behavior in c. I can understand why many
| languages have an intermediate compiler - it's too much work and
| it will probably change over time.
| fuhsnn wrote:
| >I've been modifying the the MIR c2mir JIT compiler to extend
| the c11 compiler to support simple classes, boxed
| strings(immutable, nun-nullable) with AOT support
|
| Is the project public? Really interested in the AOT support,
| I've always wanted to see its generated code but didn't find an
| easy way to dump it.
| convivialdingo wrote:
| Once things are a little more stable, I will put it up!
|
| Right now you can just break before the (fun_call)() delegate
| and disassemble the fun_call in gdb.
|
| The basic trick is to add reloc support to the x86 translate
| code, mark external calls and replace with 0x0 placeholders,
| and copy out the machine_code and data segment output to an
| object file.
|
| I can do basic main functions with simple prints calls but
| not much more. It's a hack for now but I'll refactor it until
| it's solid.
___________________________________________________________________
(page generated 2025-04-06 23:02 UTC)