[HN Gopher] Compiler Development: Rust or OCaml?
       ___________________________________________________________________
        
       Compiler Development: Rust or OCaml?
        
       Author : bshanks
       Score  : 75 points
       Date   : 2023-08-06 21:05 UTC (1 days ago)
        
 (HTM) web link (hirrolot.github.io)
 (TXT) w3m dump (hirrolot.github.io)
        
       | devit wrote:
       | The author doesn't seem to have much experience in writing Rust.
       | 
       | For instance, passing RefCell<u32> by value as their code does
       | makes no sense (just use u32...), and the code seems to have a
       | lot of clones, most of which are probably unnecessary, while not
       | having a single instance of the "mut" keyword.
       | 
       | In fact, I'm pretty sure it's completely broken, since their
       | gensym doesn't do what they want, due to their wrong use of
       | clones and refcells (it should take an &mut u32 and just
       | increment it).
       | 
       | And definitely not idiomatic at all.
        
       | beaub wrote:
       | Compilers are in this weird spot where they are really
       | mathematically defined programs (which OCaml excels at
       | implementing), while also having high runtime efficiency as a
       | requirement (the reason why C/C++ are such prominent languages
       | for compilers).
       | 
       | With such requirements, I think a point that is fair to make is
       | that Rust acts as a great middle-ground. It avoids the cost of
       | automatic memory management and provides low-level control while
       | also having a more powerful type system and a more "functional"
       | style.
       | 
       | Brushing off the actual efficiency of the produced binary seems
       | like a huge oversight when dealing with a compiler.
        
         | memefrog wrote:
         | I am not sure that the runtime efficiency of the compiler
         | binary is that important. People like fast compile times, but
         | that is more to do with language design than the choice of
         | language for the compiler.
         | 
         | You could write a compiler for Pascal in Python or another very
         | slow language and it would be faster than a Rust or C++
         | compiler written in Rust or C++. That is because those
         | languages have designs that make compilation algorithmically
         | slow, while Pascal was designed to be fast to compile.
        
         | tester756 wrote:
         | >while also having high runtime efficiency as a requirement
         | (the reason why C/C++ are such prominent languages for
         | compilers
         | 
         | I'd want to believe that compiler engineers really put effort
         | into compilers performance, but I just don't buy it.
         | 
         | LLVM, GCC, MSVC, etc, etc all of them touch C/C++ and are slow
         | as hell
         | 
         | For compilers written in other languages I'd say that still
         | LLVM is the bottleneck
         | 
         | >It avoids the cost of automatic memory management and provides
         | low-level control.
         | 
         | What "low-level control" do you need? It is not firmware
         | development.
         | 
         | Btw: Microsoft rewrote their C# compiler from C++ to C#.
        
       | CollinEMac wrote:
       | Why does it seem like I'm hearing about OCaml all the time now?
       | It could just be frequency bias but it wasn't that long ago that
       | I'd never heard of it and now it seems to be getting a lot of
       | attention online.
        
         | UncleOxidant wrote:
         | > Why does it seem like I'm hearing about OCaml all the time
         | now?
         | 
         | I felt that way about a dozen years ago. These things have
         | cycles, apparently. But they also recently released multi-core
         | OCaml in OCaml 5 which opens some doors for OCaml that were
         | previously not open.
        
         | zem wrote:
         | the ocaml ecosystem has been going through something of a
         | renaissance over the last few years (i believe because the
         | build and package management tooling hit some sort of
         | inflection point with dune and opam respectively), so there's
         | been a lot of increased interest in it. it was (imo, of course)
         | always a very pleasant language to use, and produced small and
         | fast executables; the tooling was what was really holding it
         | back.
        
         | DonaldPShimoda wrote:
         | I think we're cycling back towards a general preference for
         | statically typed languages, for one thing. Additionally, a
         | number of traditionally functional language characteristics
         | have been finding more widespread adoption among popular
         | languages. Putting these together, OCaml is on a short list of
         | languages that are functional and statically typed and, uh,
         | perhaps "intuitive" is the word I want -- Haskell is not very
         | intuitive for many people due to its lazy evaluation scheme.
         | 
         | In my opinion, OCaml would see even _more_ widespread use if
         | the documentation were improved. I find it a chore to figure
         | out how to use OCaml well. I also would like to use third-party
         | libraries like Jane Street 's Base because they've put a lot of
         | work into providing even more functionality in their standard
         | library, but their documentation is absolutely atrocious (where
         | it exists at all).
         | 
         | OCaml is a mature language but does not have a very supportive
         | ecosystem. I'm hoping the renewed interest will prompt changes
         | there.
        
       | auggierose wrote:
       | I suggest TypeScript.
        
       | josephg wrote:
       | I normally love articles comparing programming languages at real
       | tasks, but this article seems very low quality to me. The author
       | clearly doesn't understand how rust thinks about programs.
       | Instead, they're trying to pretend that rust is an alternate
       | syntax for ocaml and being surprised to find it comes up short.
       | 
       | The same article could easily be written the other way around. We
       | could start with a high performance rust program (which makes use
       | of arena allocators, internal mutation and any other rust
       | features you love) and then try and convert it line by line into
       | ocaml. We would find that many of rust's concepts can't be
       | clearly expressed in ocaml. The ocaml code would end up uglier
       | and measurably slower than rust. And just like that the article
       | would reach the opposite conclusion - that rust is clearly the
       | better language!
       | 
       | But this is silly.
       | 
       | In general, you obviously can't translate between languages line
       | by line like this and expect to have a good time. A beautiful C
       | program is constructed using different ideas than a beautiful Lua
       | program. And a beautiful Ocaml program is very different from a
       | beautiful rust program.
       | 
       | Some obvious examples of ocaml ideas being overapplied to rust in
       | this article:
       | 
       | 1. The types don't really need to be wrapped in Rc here.
       | 
       | 2. Rust generally prefers mutable imperative code over
       | applicative code. And if you insist on applicative patterns,
       | functions should take a &Foo.
       | 
       | 3. Rust code usually doesn't rely on recursion that much, so the
       | lack of guaranteed TCO isn't something people in the community
       | care about.
       | 
       | 4. Rust is optimized for runtime performance over code beauty or
       | code size. Of course rust is less elegant looking than a garbage
       | collected language! The trade is that it should also run faster.
       | But where are the benchmarks to make the comparison fair?
       | 
       | The match example is just straight out bad rust code. This code:
       | fn eval(term: &Term) -> Value {             match term {
       | Bool(b) => Value::Bool(*b),                 Not(m) => match
       | eval(m) {                     Value::Bool(b) => Value::Bool(!b),
       | _ => panic!("`Not` on a non-boolean value"),                 },
       | // ... lots more nested matches & panics             }         }
       | 
       | Can be flattened, to approximately halve the length of the
       | program like this:                   fn eval(term: &Term) ->
       | Value {             match term {                 Bool(b) =>
       | Value::Bool(*b),                 Not(Value::Bool(b)) =>
       | Value::Bool(!b),                 // ... (all other valid
       | patterns)                 _ => panic!("{term} invalid"),
       | }         }
       | 
       | There's an old saying: "Every programming language you learn
       | should teach you to see programs in a new way". Rust is not a
       | crappy alternate syntax for ocaml any more than ocaml is a
       | crappy, alternate syntax for rust. The only thing I learned from
       | the article is that the author doesn't know rust well enough to
       | evaluate it.
        
       | yafbum wrote:
       | This starts out as a fair comparison but evolves pretty quickly
       | towards a one-sided recommendation for Ocaml. I'm quite sure that
       | there are _some_ advantages of Rust that are not listed here and
       | would be curious to learn more about them too.
        
         | eddd-ddde wrote:
         | Even the written tone feels biased > Here is CPS conversion
         | written in Rust vs > The same algorithm in idiomatic OCaml
         | 
         | Sounds to me like they are comparing bad code and good code,
         | not the languages themselves.
        
       | nicechianti wrote:
       | [dead]
        
       | zerr wrote:
       | Real World OCaml book needs a second edition.
        
         | hardwaregeek wrote:
         | There is one! https://dev.realworldocaml.org
        
       | telios wrote:
       | I don't quite follow the algorithm here, but I'm not sure the
       | `gensym` Rust implementation works as expected. `RefCell::clone`
       | does not return a copy of the reference; it returns a new
       | `RefCell` with the current `RefCell`'s value, resulting in
       | duplicate IDs. However, a `RefCell` isn't even necessary here,
       | since a `Cell` would do just fine - and you'd pass around a
       | reference to that `Cell` instead of cloning it.
       | 
       | It does feel like the code was ported as-is to Rust, and only
       | adjusted slightly to compile; there are going to be pain points
       | as a result of this process. I suspect this is the source of some
       | of the author's complaints, especially given:
       | 
       | > Although it provides us with a greater sense of how the code is
       | executing, it brings very little value to the algorithm itself.
       | 
       | Rust is, in general, for people who find value in having that
       | information; it is okay to not want to have to worry about
       | ownership, borrowing, safety, etc., but it seems a bit odd to
       | complain about this when that's what Rust is for? If you want to
       | focus on just the algorithm, and not how it's executing, then
       | OCaml is definitely a valid choice.
       | 
       | However, the point about GADTs - can Rust's recently-stabilized
       | GATs not work in the same way? Though I will admit that Rust's
       | GATs don't seem nearly as powerful as OCaml's GADTs in this
       | regard.
        
         | zem wrote:
         | > it seems a bit odd to complain about this when that's what
         | Rust is for
         | 
         | that's the point of the article - rust gives you a lot of low-
         | level control, but if you don't actually need that control then
         | you're paying the cost in ergonomics for nothing.
        
       | pjmlp wrote:
       | If one cares about productivity in typical compiler data
       | structures, naturally OCaml.
        
       | norir wrote:
       | Here are the features that I think are most important for
       | compiler development:
       | 
       | 1) built in eval -- this allows you to transpile to the host
       | language which is invaluable for writing small tests
       | 
       | 2) multiline string syntax -- for evaling more than just one
       | liners
       | 
       | 3) built in associative and sequential arrays (for the ast)
       | 
       | 4) first class closures
       | 
       | 5) panic support (for aborting early from unimplemented use
       | cases)
       | 
       | The AST can be represented as an associative array. Each element
       | type can have a 'type' field and rather than pattern matching,
       | you can use if/else. Performance doesn't really matter for the
       | bootstrap compiler because it will only ever be run on relatively
       | small input sets. To get started, you simply walk the ast to
       | transpile to the host language. The snippet is then evaled in the
       | host language to test functionality. Closures allow you to
       | implement the visitor pattern for each ast node, which allows
       | contextual information to be seamlessly interwoven amongst ast
       | nodes during the analysis/transpilation steps.
       | 
       | Keeping all of this in mind, I have identified luajit as my
       | personal favorite language for compiler development. It checks
       | the boxes above, has excellent all around performance for a
       | dynamic language (particularly when startup time is included --
       | js implementations may beat it on many benchmarks but almost
       | always have slow start up time relative to luajit) and provides a
       | best in class ffi for host system calls. You can run 5000+ line
       | lua scripts faster than most compilers can compile hello, world.
       | 
       | The other reason I like lua(jit) is the minimalism. Once you
       | master lua (which is possible because of its small size) it
       | becomes very obvious that if you can implement something in lua,
       | you can translate the lua implementation to essentially any other
       | language. In this way, there is a sense in which writing a lua
       | implementation becomes almost like a rosetta stone in which a
       | translation can be produced for nearly any other language. With
       | more powerful languages, it is hard to resist the temptation to
       | utilized features that can't always be easily transported to
       | another language. In other words, lua makes it easy to write
       | portable code. This is true both in the sense that lua can be
       | installed on practically any computer in at most a few minutes
       | and in the sense that the underlying structure of a lua program
       | that transcends the syntax of the language can be ported to
       | another computing environment/language.
       | 
       | Another benefit of transpiling to lua is that your new language
       | can easily inherit lua's best properties such as embeddability,
       | fast start up time and cross platform support while removing
       | undesirable features like global variables. Your language can
       | then also be used to replace lua in programs like nginx, redis
       | and neovim that have lua scripting engines. This of course
       | extends to transpiling to any language, which again should be
       | relatively easy if you have already transpiled to lua.
        
         | dunham wrote:
         | I would add pattern matching. I've found this really helpful
         | for manipulating ASTs by matching multiple levels of the tree
         | and pulling out values simultaneously.
        
         | CapsAdmin wrote:
         | I chose luajit for my language and while I agree with many of
         | your points I really miss a typesystem. Somewhat ironically I'm
         | working on a typesystem for luajit..
         | 
         | I also wish it was a bit more performant, but here it's likely
         | my medium to high level code and not luajit's fault. However
         | running the test suite in plain Lua seem some order of
         | magnitude slower than luajit, so it's a lot faster than plain
         | Lua at least.
        
         | CapsAdmin wrote:
         | Is your project public? I'm curious to see how you would go
         | about writing a transpiler in luajit.
        
       | convolvatron wrote:
       | isn't rust kind of a nonstarter for cfg representations which are
       | irreducibly cyclic? yes, one can use the pointers are array-
       | indices thing, but ..
        
         | tomjakubowski wrote:
         | No, it's not. For a toy compiler (or for compiling programs
         | with small CFGs), you can use Rc/Weak to represent cycles. For
         | a "real" compiler you'd be using an arena for allocations
         | anyway, which amounts to pointers-are-array-indices.
        
       | nu11ptr wrote:
       | The article has fair points, but after trying OCaml and Rust... I
       | chose Rust. Without going into huge amounts of detail, a compiler
       | is more than simply a parser/ast/code generator and there are
       | other aspects to consider such as the richness of the ecosystem,
       | editor support, etc. Also, I suspect the author is more familiar
       | with OCaml than Rust as you wouldn't typically box everything but
       | likely use an arena for the AST. In the same way, I am more
       | familiar with Rust than OCaml, so some of the warts I observed
       | may be to lack of familiarity. As such I suspect the authors
       | perspective is biased...as is mine. Nothing wrong with that.
        
         | ecshafer wrote:
         | Can you write an equivalent piece of code that shows why Rust
         | wins here with more familiarity and leveraging the ecosystem?
         | With compilers I don't think there is a huge amount of using
         | the ecosystem. I think what TFA does is a good case study in
         | trying to be objective: write it both ways and compare.
        
           | nu11ptr wrote:
           | I'm just saying a compiler is a program, and while the more
           | important things are heavy algorithm related, supporting
           | libraries like those for error handling, etc. all still
           | matter and add up. No problem if you disagree - just my
           | perspective. This isn't so much an objective thing as it is a
           | personal opinion.
        
             | brmgb wrote:
             | Personally, it's not so much that I disagree. It's more
             | that Ocaml has a top notch ecosystem for compiler writing.
             | That's probably by far its strongest point with library
             | like Menhir having few equivalent in different ecosystem.
             | 
             | Not that there is anything wrong with Rust if you are ready
             | to pay the price of having so much low level things to deal
             | with.
        
               | hardwaregeek wrote:
               | Can you explain what Menhir does better than other parser
               | frameworks? For what it's worth, I'm not a huge fan of
               | parser frameworks. I tend to prefer hand written parsers,
               | either via a combinator library or fully manually.
               | 
               | Rust has a pretty darn good ecosystem too btw. chumsky
               | for parsing, rowan for syntax trees, salsa for
               | incremental computation, miette for errors,
               | inkwell/walrus for code generation.
        
               | arcticbull wrote:
               | Wow those library recommendations are fantastic,
               | especially Chumsky. Thanks for sharing.
        
               | brmgb wrote:
               | It can generate elegant and efficient parsers for LR(1)
               | grammars.
               | 
               | > I tend to prefer hand written parsers, either via a
               | combinator library or fully manually.
               | 
               | That's common with people used to languages which provide
               | poor parser generators.
        
               | moomin wrote:
               | I mean, it's also common with people who value great
               | feedback in their tools.
        
               | hardwaregeek wrote:
               | What makes them more elegant than the average parser?
               | How's the error recovery? Can you parse into high
               | fidelity syntax trees efficiently?
               | 
               | I don't know of many production compilers that use parser
               | generators
        
           | wredue wrote:
           | There was nothing objective about the article. The moment I
           | saw seemingly random new lines in the rust for no reason, I
           | knew there was going to be a "line counts!!!!" Sentence. When
           | there was, I stopped reading because it was apparent that all
           | matter of objectivity is completely missing.
           | 
           | I don't even like rust.
        
             | reikonomusha wrote:
             | It would be nice if somebody could offer what they consider
             | to be an idiomatic Rust solution to this very routine
             | problem in compilation if they believe the author is being
             | intentionally deceiving. The Rust and OCaml code from the
             | article looked decent to me. m
        
         | hurril wrote:
         | I've been seriously at it with Rust for about six months now
         | and really loving it. What do you mean by "use an arena for the
         | AST?" What is an arena in this context?
        
           | bobbylarrybobby wrote:
           | If you're curious about arena allocators, look at the rust
           | compiler itself. It uses one for all of its allocations as
           | regular heap allocation was not performant enough.
        
           | fuklief wrote:
           | I believe it means flattening the AST, here is a nice blog
           | post about this technique
           | https://www.cs.cornell.edu/~asampson/blog/flattening.html
        
             | hurril wrote:
             | Nice! Thank you! I'd forgotten about that technique - think
             | I read about it in a ancient compiler book (by french
             | authors no less!)
             | 
             | EDIT: this was a really interesting read!
        
           | remexre wrote:
           | https://docs.rs/bumpalo/latest/bumpalo/
        
           | mughinn wrote:
           | i assume they're talking about an Arena Allocator
           | 
           | https://blog.logrocket.com/guide-using-arenas-rust/
        
             | hurril wrote:
             | I figured as much but was unsure. So the gain is that you
             | still take heap allocations, but you do it in a "novelty"
             | heap allocator that you will probably dipose of entirely at
             | some point?
        
               | vidarh wrote:
               | Allocation happens roughly like this:
               | 
               | If current top of heap + allocation size > buffer size,
               | allocate an extra buffer for the arena. Save the current
               | top in a temp variable. Add the allocation size to the
               | top Return the temp variable.
               | 
               | It's fast, and the amortized memory overhead per
               | allocation is near zero because you don't need to track
               | allocation sizes or free lists, since you only free the
               | whole arena at once (one or more buffers).
               | 
               | It's ideal for anything where the lifetime of allocations
               | is known to be roughly the same. E.g. a data structure
               | you'll free all at once.
        
       | hardwaregeek wrote:
       | I skimmed the article, and comparing the two programs, yes, the
       | OCaml one is shorter and more elegant. But it also reads like a
       | math magic spell. There's no type annotations for me to figure
       | out what the heck each term is. The naming conventions lean
       | extremely terse. Perhaps it's my lack of experience with OCaml,
       | but it doesn't feel as legible.
       | 
       | The Rust one reads like...well a program. A program that's not as
       | beautiful, but is very much designed to be taken apart, debugged,
       | improved, etc.
       | 
       | I fully agree that if you're writing pure, recursive, data
       | structure manipulations, OCaml is likely a better fit. It's
       | closer to mathematical notation and I see the elegance in that.
       | But if I were to take that data structure manipulation and turn
       | it into a compiler with thousands of lines that I navigate with
       | an IDE, with logging, with annoying little edge cases, with
       | dozens of collaborators, I'd choose Rust.
        
         | brmgb wrote:
         | I think it's your inexperience talking here as programs in both
         | languages here use mostly the same naming conventions and
         | abbreviations, the main difference being that Ocaml is its
         | usual terse and to the point while Rust is well far less terse
        
           | hardwaregeek wrote:
           | Yeah isn't that my point? Rust isn't trying to be short or
           | elegant. There's no zen of Rust. There are elegant aspects of
           | Rust, but it's not a central goal. Whereas with OCaml it's
           | trying to be elegant. It's encouraging people to write a
           | program where you read it, go "wait, how does that work?",
           | then re-read it and marvel at the beauty of it.
           | 
           | To be clear, elegance is important. A language absent of
           | elegance would be a bore to write ( _cough_ Java _cough_ ).
           | But too much elegance and it can eclipse the legibility of
           | the language. No type annotations is elegant. Is it legible?
           | Not in my opinion. But perhaps it is in yours.
        
             | brmgb wrote:
             | It is definitely very legible here and at least as legible
             | as the Rust equivalent. It's actually a lot more legible
             | than the Rust equivalent here to be fair.
             | 
             | The Ocaml program is mostly matching followed by a chain of
             | operations. It's far removed from elegance for elegance
             | sake. Meanwhile Rust is handicapped by the machinery it
             | forces you to deal with as a lower level language
             | introducing life time.
             | 
             | Type annotations are a non issue. They are systematically
             | provided in Ocaml in a different file than the code. This
             | header file is not provided here because well it's a blog
             | post.
        
               | hardwaregeek wrote:
               | I suppose that's ultimately in the eye of the beholder. I
               | suspect that if this compiler were to be 10x, 100x
               | bigger, you'd see a pretty big difference. Like rustc, as
               | massive as it is, is pretty legible as compilers go.
        
               | brmgb wrote:
               | I will take any of the dozen of sometimes very large
               | compilers written in Ocaml above rustc personally but to
               | each their own.
        
               | hardwaregeek wrote:
               | I'd love links to them! I tried reading the OCaml
               | compiler source but quickly got overwhelmed.
        
         | grumpyprole wrote:
         | > There's no type annotations for me to figure out what the
         | heck each term is
         | 
         | You can add additional annotations in OCaml if you want, or
         | just query the type of a term in Merlin.
         | 
         | > a compiler with thousands of lines that I navigate with an
         | IDE, with logging, with annoying little edge cases, with dozens
         | of collaborators, I'd choose Rust.
         | 
         | Why? OCaml supports logging and IDEs. Simple elegant code
         | without the burden of manual memory management, makes it better
         | able to cope with edge cases, being taken apart and refactored
         | etc. Less of the complexity budget has already been spent.
        
       ___________________________________________________________________
       (page generated 2023-08-07 23:00 UTC)