[HN Gopher] A complete compiler and VM in 150 lines of code
___________________________________________________________________
A complete compiler and VM in 150 lines of code
Author : p4bl0
Score : 62 points
Date : 2022-07-16 20:51 UTC (2 hours ago)
(HTM) web link (gist.github.com)
(TXT) w3m dump (gist.github.com)
| userbinator wrote:
| For those wondering what language this is for before clicking on
| it: it's not much more than what I think would traditionally be
| called an expression evaluator, and _very_ simple one at that ---
| it only recognises binary logic expressions, and the 150 lines
| doesn 't include the parser or lexer either. Not a bad attempt
| for what is presumably a first try, but I think "complete
| compiler" is overselling it.
| a-dub wrote:
| there's no type checking, which i'd say is a fairly integral
| part of a compiler...
|
| but it does use a parser generator to generate a parser that
| constructs an ast and it then iterates that ast to generate a
| bytecode.
|
| that's basically a compiler, even if the language is basic
| expressions.
| userbinator wrote:
| B has no type checking either.
|
| On the other hand, using a parser generator and not counting
| its output in the line count does seem like cheating.
| p4bl0 wrote:
| > using a parser generator and not counting its output in
| the line count does seem like cheating
|
| Yeah maybe I should count OCaml's standard library lines of
| code too. Maybe also count expanded macros and inlined
| functions at each call site? At which point should I stop?
| Are assembly language pseudo-instructions cheating too?
|
| Common, don't be ridiculous.
| p4bl0 wrote:
| Hey there, author here :). The language is intentionally _very_
| simple indeed, because the goal is to showcase all
| indispensable steps of a complete compiler. I initially wrote
| it for my students as an introduction to my compiler class (for
| the very first session).
|
| > would traditionally be called an expression evaluator
|
| It is not an evaluator, evaluators are included (the different
| eval functions).
|
| > the 150 lines doesn't include the parser or lexer either
|
| This is just wrong: $ cat *.ml{,l,y} *.c |
| sed '/^\s*$/d' | wc -l 149
|
| and these 149 lines (okay, 169 if you count blank lines which
| are removed using sed in the count above) include the two
| `eval` and the `run` functions which are not actually part of
| the compiler but here for clarity (they help understanding how
| the successive intermediate representations of the boolean
| expressions work -- without them the entire project is less
| than 130 lines).
|
| > Not a bad attempt for what is presumably a first try
|
| It's not ;). It is however targeted at beginner compsci
| students who have absolutely no idea how compilers are built
| yet :).
|
| > but I think "complete compiler" is overselling it.
|
| It does lexing & parsing to produce an AST of a boolean
| expression, then a traversal of this AST to transform it into a
| second intermediate representation (an AST of NAND-only
| expression), then _compiles_ the expression into a list of
| instructions (and this paradigm shift is a big step that is
| tough to teach to some students, believe me ^^), and then emits
| the corresponding bytecode (which here is 1:1 with the
| imperative instructions), and then the VM interprets this
| bytecode. I don 't think I'm overselling it :).
|
| The only way it is not complete -- and that's written in the
| README -- is that it has absolutely no semantics analysis
| (there is only one type, and no names, so there is nothing to
| analyze ^^), but that is not _mandatory_ to be called a
| compiler.
|
| Then again, this is only the first hour of a semester long
| compiler class (which covers semantic analysis, don't worry)
| ;).
| userbinator wrote:
| _The only way it is not complete -- and that 's written in
| the README -- is that it has absolutely no semantics
| analysis_
|
| I was referring to the complete lack of flow control. Could
| you compile the compiler with itself, for example?
|
| _Then again, this is only the first hour of a semester long
| compiler class_
|
| If you're taking a course, you might enjoy "leaping ahead" by
| studying C4 and C4x86. They are also VM-based, and _self-
| compiling_ , yet still tiny. I think they really raised the
| bar on what a tiny-and-complete compiler is.
| (https://news.ycombinator.com/item?id=8558822
| https://news.ycombinator.com/item?id=8746054)
| p4bl0 wrote:
| > Could you compile the compiler with itself, for example?
|
| That's only possible if you write the compiler in the
| language it compiles. By this account writing a Python
| compiler in something else than Python, for example in C,
| would make it incomplete? It doesn't make much sense.
|
| > If you're taking a course (...)
|
| You should read my post more carefully, I think.
| tiagoleifert wrote:
| > I was referring to the complete lack of flow control.
| Could you compile the compiler with itself, for example?
|
| This is not a requirement for it to be a compiler.
|
| > They are also VM-based, and self-compiling, yet still
| tiny. I think they really raised the bar on what a tiny-
| and-complete compiler is.
|
| The purpose of OP's compiler is educational so this
| comparison makes no sense.
| blacklion wrote:
| Sooo nice :-) It looks like small cute kitten to me, sorry.
| denotational wrote:
| As another comment also alludes, there's probably quite a lot
| more work required before this could be called a complete
| compiler (without heavy qualification as to what it actually
| compiles, since the language of boolean expressions probably
| isn't something that most people would consider to be a
| programming language per se); that said, toy languages like this
| are a great way to explore some fundamental principles of
| programming language theory.
|
| Boolean expressions have very trivial denotational semantics
| (since there's no recursion or state), so you could try
| formalising these in Coq (FRAP [1] and Software Foundations Vol.
| 2 [2] are both good places to learn more about this).
|
| Next, you could write formal operational semantics for your VM,
| and then finally prove that your compiler is correct in a formal
| sense (i.e. the image of any well-formed input under the compiler
| operation has a terminating execution in the operational
| semantics that yields the same value as would be obtained by the
| execution of the input expression under the denotational
| semantics).
|
| The last part would require implementing the compiler in Coq, but
| since you've written it in OCaml it probably wouldn't be very
| hard to port.
|
| [1] : https://frap.csail.mit.edu/main
|
| [2] : https://softwarefoundations.cis.upenn.edu/plf-
| current/toc.ht...
|
| Edit: Having looked at your profile I now realise that you
| probably know all of this already, but it might be informative
| for anyone else reading :)
| prezjordan wrote:
| Where can I learn more about mll and mly files? I've hand-rolled
| parsers and it's always my least favorite part of toy languages.
| [deleted]
| a-dub wrote:
| https://v2.ocaml.org/manual/lexyacc.html
| tekknolagi wrote:
| Search for ocamllex and ocamlyacc. Also Menhir.
| mftb wrote:
| So weird, within the _last hour_ I was wondering why I never
| see regular, old, lex and yacc around anymore, and then I
| came across this, which made realize it 's probably because
| they generate C. Funny. I guess these output OCaml?
| p4bl0 wrote:
| Yes, they do. Lex and yacc equivalent tools are available
| for many languages.
|
| Examples :
|
| - Python: https://ply.readthedocs.io/ (I already used it in
| a very small project, again for my students, here is an
| example of how it works: https://gist.github.com/p4bl0-/f5e
| d1e60fdc5e76a2d321bc8708a7...)
|
| - Racket: https://docs.racket-lang.org/parser-tools/ (it's
| really great, I previously used Racket for my compiler
| course but I've switched to OCaml a few years back, because
| Racket parser tools don't go well with Typed Racket).
| skybrian wrote:
| I think people are getting hung up on the word "complete." This
| is a complete compiler in the sense that "hello world" is a
| complete (but minimal) program.
| p4bl0 wrote:
| Yes exactly :), I meant that it does everything from lexing the
| source code to emitting bytecode instructions (and provides the
| VM to run this bytecode). It is not a part of a compiler, it is
| a tiny, simple, stupid, but _complete_ , compiler.
___________________________________________________________________
(page generated 2022-07-16 23:00 UTC)