[HN Gopher] Compiler Development: Rust or OCaml?
___________________________________________________________________
Compiler Development: Rust or OCaml?
Author : bshanks
Score : 75 points
Date : 2023-08-06 21:05 UTC (1 days ago)
(HTM) web link (hirrolot.github.io)
(TXT) w3m dump (hirrolot.github.io)
| devit wrote:
| The author doesn't seem to have much experience in writing Rust.
|
| For instance, passing RefCell<u32> by value as their code does
| makes no sense (just use u32...), and the code seems to have a
| lot of clones, most of which are probably unnecessary, while not
| having a single instance of the "mut" keyword.
|
| In fact, I'm pretty sure it's completely broken, since their
| gensym doesn't do what they want, due to their wrong use of
| clones and refcells (it should take an &mut u32 and just
| increment it).
|
| And definitely not idiomatic at all.
| beaub wrote:
| Compilers are in this weird spot where they are really
| mathematically defined programs (which OCaml excels at
| implementing), while also having high runtime efficiency as a
| requirement (the reason why C/C++ are such prominent languages
| for compilers).
|
| With such requirements, I think a point that is fair to make is
| that Rust acts as a great middle-ground. It avoids the cost of
| automatic memory management and provides low-level control while
| also having a more powerful type system and a more "functional"
| style.
|
| Brushing off the actual efficiency of the produced binary seems
| like a huge oversight when dealing with a compiler.
| memefrog wrote:
| I am not sure that the runtime efficiency of the compiler
| binary is that important. People like fast compile times, but
| that is more to do with language design than the choice of
| language for the compiler.
|
| You could write a compiler for Pascal in Python or another very
| slow language and it would be faster than a Rust or C++
| compiler written in Rust or C++. That is because those
| languages have designs that make compilation algorithmically
| slow, while Pascal was designed to be fast to compile.
| tester756 wrote:
| >while also having high runtime efficiency as a requirement
| (the reason why C/C++ are such prominent languages for
| compilers
|
| I'd want to believe that compiler engineers really put effort
| into compilers performance, but I just don't buy it.
|
| LLVM, GCC, MSVC, etc, etc all of them touch C/C++ and are slow
| as hell
|
| For compilers written in other languages I'd say that still
| LLVM is the bottleneck
|
| >It avoids the cost of automatic memory management and provides
| low-level control.
|
| What "low-level control" do you need? It is not firmware
| development.
|
| Btw: Microsoft rewrote their C# compiler from C++ to C#.
| CollinEMac wrote:
| Why does it seem like I'm hearing about OCaml all the time now?
| It could just be frequency bias but it wasn't that long ago that
| I'd never heard of it and now it seems to be getting a lot of
| attention online.
| UncleOxidant wrote:
| > Why does it seem like I'm hearing about OCaml all the time
| now?
|
| I felt that way about a dozen years ago. These things have
| cycles, apparently. But they also recently released multi-core
| OCaml in OCaml 5 which opens some doors for OCaml that were
| previously not open.
| zem wrote:
| the ocaml ecosystem has been going through something of a
| renaissance over the last few years (i believe because the
| build and package management tooling hit some sort of
| inflection point with dune and opam respectively), so there's
| been a lot of increased interest in it. it was (imo, of course)
| always a very pleasant language to use, and produced small and
| fast executables; the tooling was what was really holding it
| back.
| DonaldPShimoda wrote:
| I think we're cycling back towards a general preference for
| statically typed languages, for one thing. Additionally, a
| number of traditionally functional language characteristics
| have been finding more widespread adoption among popular
| languages. Putting these together, OCaml is on a short list of
| languages that are functional and statically typed and, uh,
| perhaps "intuitive" is the word I want -- Haskell is not very
| intuitive for many people due to its lazy evaluation scheme.
|
| In my opinion, OCaml would see even _more_ widespread use if
| the documentation were improved. I find it a chore to figure
| out how to use OCaml well. I also would like to use third-party
| libraries like Jane Street 's Base because they've put a lot of
| work into providing even more functionality in their standard
| library, but their documentation is absolutely atrocious (where
| it exists at all).
|
| OCaml is a mature language but does not have a very supportive
| ecosystem. I'm hoping the renewed interest will prompt changes
| there.
| auggierose wrote:
| I suggest TypeScript.
| josephg wrote:
| I normally love articles comparing programming languages at real
| tasks, but this article seems very low quality to me. The author
| clearly doesn't understand how rust thinks about programs.
| Instead, they're trying to pretend that rust is an alternate
| syntax for ocaml and being surprised to find it comes up short.
|
| The same article could easily be written the other way around. We
| could start with a high performance rust program (which makes use
| of arena allocators, internal mutation and any other rust
| features you love) and then try and convert it line by line into
| ocaml. We would find that many of rust's concepts can't be
| clearly expressed in ocaml. The ocaml code would end up uglier
| and measurably slower than rust. And just like that the article
| would reach the opposite conclusion - that rust is clearly the
| better language!
|
| But this is silly.
|
| In general, you obviously can't translate between languages line
| by line like this and expect to have a good time. A beautiful C
| program is constructed using different ideas than a beautiful Lua
| program. And a beautiful Ocaml program is very different from a
| beautiful rust program.
|
| Some obvious examples of ocaml ideas being overapplied to rust in
| this article:
|
| 1. The types don't really need to be wrapped in Rc here.
|
| 2. Rust generally prefers mutable imperative code over
| applicative code. And if you insist on applicative patterns,
| functions should take a &Foo.
|
| 3. Rust code usually doesn't rely on recursion that much, so the
| lack of guaranteed TCO isn't something people in the community
| care about.
|
| 4. Rust is optimized for runtime performance over code beauty or
| code size. Of course rust is less elegant looking than a garbage
| collected language! The trade is that it should also run faster.
| But where are the benchmarks to make the comparison fair?
|
| The match example is just straight out bad rust code. This code:
| fn eval(term: &Term) -> Value { match term {
| Bool(b) => Value::Bool(*b), Not(m) => match
| eval(m) { Value::Bool(b) => Value::Bool(!b),
| _ => panic!("`Not` on a non-boolean value"), },
| // ... lots more nested matches & panics } }
|
| Can be flattened, to approximately halve the length of the
| program like this: fn eval(term: &Term) ->
| Value { match term { Bool(b) =>
| Value::Bool(*b), Not(Value::Bool(b)) =>
| Value::Bool(!b), // ... (all other valid
| patterns) _ => panic!("{term} invalid"),
| } }
|
| There's an old saying: "Every programming language you learn
| should teach you to see programs in a new way". Rust is not a
| crappy alternate syntax for ocaml any more than ocaml is a
| crappy, alternate syntax for rust. The only thing I learned from
| the article is that the author doesn't know rust well enough to
| evaluate it.
| yafbum wrote:
| This starts out as a fair comparison but evolves pretty quickly
| towards a one-sided recommendation for Ocaml. I'm quite sure that
| there are _some_ advantages of Rust that are not listed here and
| would be curious to learn more about them too.
| eddd-ddde wrote:
| Even the written tone feels biased > Here is CPS conversion
| written in Rust vs > The same algorithm in idiomatic OCaml
|
| Sounds to me like they are comparing bad code and good code,
| not the languages themselves.
| nicechianti wrote:
| [dead]
| zerr wrote:
| Real World OCaml book needs a second edition.
| hardwaregeek wrote:
| There is one! https://dev.realworldocaml.org
| telios wrote:
| I don't quite follow the algorithm here, but I'm not sure the
| `gensym` Rust implementation works as expected. `RefCell::clone`
| does not return a copy of the reference; it returns a new
| `RefCell` with the current `RefCell`'s value, resulting in
| duplicate IDs. However, a `RefCell` isn't even necessary here,
| since a `Cell` would do just fine - and you'd pass around a
| reference to that `Cell` instead of cloning it.
|
| It does feel like the code was ported as-is to Rust, and only
| adjusted slightly to compile; there are going to be pain points
| as a result of this process. I suspect this is the source of some
| of the author's complaints, especially given:
|
| > Although it provides us with a greater sense of how the code is
| executing, it brings very little value to the algorithm itself.
|
| Rust is, in general, for people who find value in having that
| information; it is okay to not want to have to worry about
| ownership, borrowing, safety, etc., but it seems a bit odd to
| complain about this when that's what Rust is for? If you want to
| focus on just the algorithm, and not how it's executing, then
| OCaml is definitely a valid choice.
|
| However, the point about GADTs - can Rust's recently-stabilized
| GATs not work in the same way? Though I will admit that Rust's
| GATs don't seem nearly as powerful as OCaml's GADTs in this
| regard.
| zem wrote:
| > it seems a bit odd to complain about this when that's what
| Rust is for
|
| that's the point of the article - rust gives you a lot of low-
| level control, but if you don't actually need that control then
| you're paying the cost in ergonomics for nothing.
| pjmlp wrote:
| If one cares about productivity in typical compiler data
| structures, naturally OCaml.
| norir wrote:
| Here are the features that I think are most important for
| compiler development:
|
| 1) built in eval -- this allows you to transpile to the host
| language which is invaluable for writing small tests
|
| 2) multiline string syntax -- for evaling more than just one
| liners
|
| 3) built in associative and sequential arrays (for the ast)
|
| 4) first class closures
|
| 5) panic support (for aborting early from unimplemented use
| cases)
|
| The AST can be represented as an associative array. Each element
| type can have a 'type' field and rather than pattern matching,
| you can use if/else. Performance doesn't really matter for the
| bootstrap compiler because it will only ever be run on relatively
| small input sets. To get started, you simply walk the ast to
| transpile to the host language. The snippet is then evaled in the
| host language to test functionality. Closures allow you to
| implement the visitor pattern for each ast node, which allows
| contextual information to be seamlessly interwoven amongst ast
| nodes during the analysis/transpilation steps.
|
| Keeping all of this in mind, I have identified luajit as my
| personal favorite language for compiler development. It checks
| the boxes above, has excellent all around performance for a
| dynamic language (particularly when startup time is included --
| js implementations may beat it on many benchmarks but almost
| always have slow start up time relative to luajit) and provides a
| best in class ffi for host system calls. You can run 5000+ line
| lua scripts faster than most compilers can compile hello, world.
|
| The other reason I like lua(jit) is the minimalism. Once you
| master lua (which is possible because of its small size) it
| becomes very obvious that if you can implement something in lua,
| you can translate the lua implementation to essentially any other
| language. In this way, there is a sense in which writing a lua
| implementation becomes almost like a rosetta stone in which a
| translation can be produced for nearly any other language. With
| more powerful languages, it is hard to resist the temptation to
| utilized features that can't always be easily transported to
| another language. In other words, lua makes it easy to write
| portable code. This is true both in the sense that lua can be
| installed on practically any computer in at most a few minutes
| and in the sense that the underlying structure of a lua program
| that transcends the syntax of the language can be ported to
| another computing environment/language.
|
| Another benefit of transpiling to lua is that your new language
| can easily inherit lua's best properties such as embeddability,
| fast start up time and cross platform support while removing
| undesirable features like global variables. Your language can
| then also be used to replace lua in programs like nginx, redis
| and neovim that have lua scripting engines. This of course
| extends to transpiling to any language, which again should be
| relatively easy if you have already transpiled to lua.
| dunham wrote:
| I would add pattern matching. I've found this really helpful
| for manipulating ASTs by matching multiple levels of the tree
| and pulling out values simultaneously.
| CapsAdmin wrote:
| I chose luajit for my language and while I agree with many of
| your points I really miss a typesystem. Somewhat ironically I'm
| working on a typesystem for luajit..
|
| I also wish it was a bit more performant, but here it's likely
| my medium to high level code and not luajit's fault. However
| running the test suite in plain Lua seem some order of
| magnitude slower than luajit, so it's a lot faster than plain
| Lua at least.
| CapsAdmin wrote:
| Is your project public? I'm curious to see how you would go
| about writing a transpiler in luajit.
| convolvatron wrote:
| isn't rust kind of a nonstarter for cfg representations which are
| irreducibly cyclic? yes, one can use the pointers are array-
| indices thing, but ..
| tomjakubowski wrote:
| No, it's not. For a toy compiler (or for compiling programs
| with small CFGs), you can use Rc/Weak to represent cycles. For
| a "real" compiler you'd be using an arena for allocations
| anyway, which amounts to pointers-are-array-indices.
| nu11ptr wrote:
| The article has fair points, but after trying OCaml and Rust... I
| chose Rust. Without going into huge amounts of detail, a compiler
| is more than simply a parser/ast/code generator and there are
| other aspects to consider such as the richness of the ecosystem,
| editor support, etc. Also, I suspect the author is more familiar
| with OCaml than Rust as you wouldn't typically box everything but
| likely use an arena for the AST. In the same way, I am more
| familiar with Rust than OCaml, so some of the warts I observed
| may be to lack of familiarity. As such I suspect the authors
| perspective is biased...as is mine. Nothing wrong with that.
| ecshafer wrote:
| Can you write an equivalent piece of code that shows why Rust
| wins here with more familiarity and leveraging the ecosystem?
| With compilers I don't think there is a huge amount of using
| the ecosystem. I think what TFA does is a good case study in
| trying to be objective: write it both ways and compare.
| nu11ptr wrote:
| I'm just saying a compiler is a program, and while the more
| important things are heavy algorithm related, supporting
| libraries like those for error handling, etc. all still
| matter and add up. No problem if you disagree - just my
| perspective. This isn't so much an objective thing as it is a
| personal opinion.
| brmgb wrote:
| Personally, it's not so much that I disagree. It's more
| that Ocaml has a top notch ecosystem for compiler writing.
| That's probably by far its strongest point with library
| like Menhir having few equivalent in different ecosystem.
|
| Not that there is anything wrong with Rust if you are ready
| to pay the price of having so much low level things to deal
| with.
| hardwaregeek wrote:
| Can you explain what Menhir does better than other parser
| frameworks? For what it's worth, I'm not a huge fan of
| parser frameworks. I tend to prefer hand written parsers,
| either via a combinator library or fully manually.
|
| Rust has a pretty darn good ecosystem too btw. chumsky
| for parsing, rowan for syntax trees, salsa for
| incremental computation, miette for errors,
| inkwell/walrus for code generation.
| arcticbull wrote:
| Wow those library recommendations are fantastic,
| especially Chumsky. Thanks for sharing.
| brmgb wrote:
| It can generate elegant and efficient parsers for LR(1)
| grammars.
|
| > I tend to prefer hand written parsers, either via a
| combinator library or fully manually.
|
| That's common with people used to languages which provide
| poor parser generators.
| moomin wrote:
| I mean, it's also common with people who value great
| feedback in their tools.
| hardwaregeek wrote:
| What makes them more elegant than the average parser?
| How's the error recovery? Can you parse into high
| fidelity syntax trees efficiently?
|
| I don't know of many production compilers that use parser
| generators
| wredue wrote:
| There was nothing objective about the article. The moment I
| saw seemingly random new lines in the rust for no reason, I
| knew there was going to be a "line counts!!!!" Sentence. When
| there was, I stopped reading because it was apparent that all
| matter of objectivity is completely missing.
|
| I don't even like rust.
| reikonomusha wrote:
| It would be nice if somebody could offer what they consider
| to be an idiomatic Rust solution to this very routine
| problem in compilation if they believe the author is being
| intentionally deceiving. The Rust and OCaml code from the
| article looked decent to me. m
| hurril wrote:
| I've been seriously at it with Rust for about six months now
| and really loving it. What do you mean by "use an arena for the
| AST?" What is an arena in this context?
| bobbylarrybobby wrote:
| If you're curious about arena allocators, look at the rust
| compiler itself. It uses one for all of its allocations as
| regular heap allocation was not performant enough.
| fuklief wrote:
| I believe it means flattening the AST, here is a nice blog
| post about this technique
| https://www.cs.cornell.edu/~asampson/blog/flattening.html
| hurril wrote:
| Nice! Thank you! I'd forgotten about that technique - think
| I read about it in a ancient compiler book (by french
| authors no less!)
|
| EDIT: this was a really interesting read!
| remexre wrote:
| https://docs.rs/bumpalo/latest/bumpalo/
| mughinn wrote:
| i assume they're talking about an Arena Allocator
|
| https://blog.logrocket.com/guide-using-arenas-rust/
| hurril wrote:
| I figured as much but was unsure. So the gain is that you
| still take heap allocations, but you do it in a "novelty"
| heap allocator that you will probably dipose of entirely at
| some point?
| vidarh wrote:
| Allocation happens roughly like this:
|
| If current top of heap + allocation size > buffer size,
| allocate an extra buffer for the arena. Save the current
| top in a temp variable. Add the allocation size to the
| top Return the temp variable.
|
| It's fast, and the amortized memory overhead per
| allocation is near zero because you don't need to track
| allocation sizes or free lists, since you only free the
| whole arena at once (one or more buffers).
|
| It's ideal for anything where the lifetime of allocations
| is known to be roughly the same. E.g. a data structure
| you'll free all at once.
| hardwaregeek wrote:
| I skimmed the article, and comparing the two programs, yes, the
| OCaml one is shorter and more elegant. But it also reads like a
| math magic spell. There's no type annotations for me to figure
| out what the heck each term is. The naming conventions lean
| extremely terse. Perhaps it's my lack of experience with OCaml,
| but it doesn't feel as legible.
|
| The Rust one reads like...well a program. A program that's not as
| beautiful, but is very much designed to be taken apart, debugged,
| improved, etc.
|
| I fully agree that if you're writing pure, recursive, data
| structure manipulations, OCaml is likely a better fit. It's
| closer to mathematical notation and I see the elegance in that.
| But if I were to take that data structure manipulation and turn
| it into a compiler with thousands of lines that I navigate with
| an IDE, with logging, with annoying little edge cases, with
| dozens of collaborators, I'd choose Rust.
| brmgb wrote:
| I think it's your inexperience talking here as programs in both
| languages here use mostly the same naming conventions and
| abbreviations, the main difference being that Ocaml is its
| usual terse and to the point while Rust is well far less terse
| hardwaregeek wrote:
| Yeah isn't that my point? Rust isn't trying to be short or
| elegant. There's no zen of Rust. There are elegant aspects of
| Rust, but it's not a central goal. Whereas with OCaml it's
| trying to be elegant. It's encouraging people to write a
| program where you read it, go "wait, how does that work?",
| then re-read it and marvel at the beauty of it.
|
| To be clear, elegance is important. A language absent of
| elegance would be a bore to write ( _cough_ Java _cough_ ).
| But too much elegance and it can eclipse the legibility of
| the language. No type annotations is elegant. Is it legible?
| Not in my opinion. But perhaps it is in yours.
| brmgb wrote:
| It is definitely very legible here and at least as legible
| as the Rust equivalent. It's actually a lot more legible
| than the Rust equivalent here to be fair.
|
| The Ocaml program is mostly matching followed by a chain of
| operations. It's far removed from elegance for elegance
| sake. Meanwhile Rust is handicapped by the machinery it
| forces you to deal with as a lower level language
| introducing life time.
|
| Type annotations are a non issue. They are systematically
| provided in Ocaml in a different file than the code. This
| header file is not provided here because well it's a blog
| post.
| hardwaregeek wrote:
| I suppose that's ultimately in the eye of the beholder. I
| suspect that if this compiler were to be 10x, 100x
| bigger, you'd see a pretty big difference. Like rustc, as
| massive as it is, is pretty legible as compilers go.
| brmgb wrote:
| I will take any of the dozen of sometimes very large
| compilers written in Ocaml above rustc personally but to
| each their own.
| hardwaregeek wrote:
| I'd love links to them! I tried reading the OCaml
| compiler source but quickly got overwhelmed.
| grumpyprole wrote:
| > There's no type annotations for me to figure out what the
| heck each term is
|
| You can add additional annotations in OCaml if you want, or
| just query the type of a term in Merlin.
|
| > a compiler with thousands of lines that I navigate with an
| IDE, with logging, with annoying little edge cases, with dozens
| of collaborators, I'd choose Rust.
|
| Why? OCaml supports logging and IDEs. Simple elegant code
| without the burden of manual memory management, makes it better
| able to cope with edge cases, being taken apart and refactored
| etc. Less of the complexity budget has already been spent.
___________________________________________________________________
(page generated 2023-08-07 23:00 UTC)