[HN Gopher] Compiler Class
       ___________________________________________________________________
        
       Compiler Class
        
       Author : ingve
       Score  : 113 points
       Date   : 2021-02-09 10:43 UTC (12 hours ago)
        
 (HTM) web link (norswap.com)
 (TXT) w3m dump (norswap.com)
        
       | theXspidy wrote:
       | Get daily informative news from noisy news straight to your inbox
       | with https://CutThroughTheNoise.co For free
        
       | artemonster wrote:
       | I really hope this wouldn't end up like 95% of other "compiler"
       | classes that usually just focus on parsing, rather than on
       | emitting code, cpu architectures & optimizing
        
         | ofrzeta wrote:
         | Arguably generating code for a real CPU requires so much
         | knowledge about that specific hardware that it's not really any
         | longer about compilers. Therefore most compiler courses target
         | a simpler RISC CPU because it's easier to teach the algorithms
         | behind it.
        
         | hvidgaard wrote:
         | Going from a grammar to an AST is the first and one of the
         | hardest step IMO. And there is plenty of open source tools for
         | exactly that and in general it is uninteresting and boring.
         | 
         | Transforming the AST to something that can be used to emit
         | instructions is the major thing to focus on IMO. Getting to an
         | AST that only use very simple primitives that enables general
         | optimizations and emitting code is hard and a different task
         | every time.
        
           | mhh__ wrote:
           | The semantic analysis is by far and away the hardest thing
           | day to day. Parsing is easy
        
             | azhenley wrote:
             | Parsing most certainly is *not* easy for someone new to
             | compilers. My students remind me of that all the time.
        
               | jcranmer wrote:
               | Lexing and parsing probably falls into the category of
               | "obvious once you learn it, mystifying until then."
               | 
               | (Although, that said, compiler courses could probably do
               | well to ditch a lot of the LL(1)/LR(1) parsing theory to
               | focus on other topics instead. You still need to teach
               | grammars, and focus a bit on how to actually parse
               | grammars to AST under recursive descent, including
               | expression precedence, but you could probably excise a
               | lot of the parser generator theory.)
        
               | mhh__ wrote:
               | I know embarrassingly little parser theory - I definitely
               | couldn't write a decent parser generator without looking
               | it up in a book for a few hours - but at least there is a
               | theory: Semantic analysis has lots of mathematical
               | symbology to describe it, doesn't help you write code for
               | the most part.
        
               | marcosdumay wrote:
               | As strange as it looks, parser theory is not very
               | relevant on practice for writing parsers. You just use
               | some parser generator or a parser combinators library if
               | your language allows them, and you won't have many
               | problems (except for maybe they being slow).
               | 
               | It is way more relevant for optimizing parsers
               | performance and for designing languages syntax.
        
               | mhh__ wrote:
               | That's true but it is one and done when you get it
               | working, as opposed to sema which is a constantly
               | changing monster in many languages.
        
             | norswap wrote:
             | I'd say parsing is easier when you know how to do it, but
             | getting to understand it the first time is harder.
        
           | chrisseaton wrote:
           | > Going from a grammar to an AST is the first and one of the
           | hardest step IMO
           | 
           | I can give some concrete data about this. I've been working
           | professionally on a single compiler full-time for seven
           | years.
           | 
           | I think out of all that time I think I've spent probably less
           | than a week dealing with lexer and parser issues all
           | together. I'd say in practice parsing is _vanishingly_
           | insignificant.
        
             | MaxBarraclough wrote:
             | Things would be different if you were in the business of,
             | say, keeping g++ up to date with the latest C++ spec,
             | right?
        
               | chrisseaton wrote:
               | I have to keep up with Ruby changes.
               | 
               | But even for example the parser in clang doesn't seem to
               | be touched very often.
               | 
               | https://github.com/llvm/llvm-
               | project/commits/main/clang/lib/...
               | 
               | Take a look at the commits there and see how much each
               | changes the parser vs the rest of the compiler.
               | 
               | Also, at the bottom of page 2 for parser changes you're
               | already all the way back in 2017!
        
               | jcranmer wrote:
               | It's somewhat disingenuous to link only
               | clang/lib/Parse/Parser.cpp--a lot of the parsing code
               | isn't in Parser.cpp, but other files in lib/Parser. Look
               | at the entire directory, and you get about 1 commit per
               | few days (somewhere between daily and weekly).
        
               | chrisseaton wrote:
               | I think a lot of the commits you're seeing there are
               | really more semantic analysis and tooling rather than
               | actual parsing - they just interact with the parser.
               | 
               | I can only offer my experience - and I think most of my
               | colleagues in the field seem to say the same thing.
        
             | zabzonk wrote:
             | > I've been working professionally on a single compiler
             | full-time for seven years.
             | 
             | Which one?
        
               | chrisseaton wrote:
               | TruffleRuby (says in my profile.)
        
             | jabl wrote:
             | I guess it's a difference who the course audience is?
             | 
             | I'd wager most people who take a compilers 101 class at uni
             | won't end up working on compilers for their careers. For
             | them, some basic knowledge of parsing will likely serve
             | them well, since designing and parsing file formats and
             | network protocols etc. is something quite a lot of people
             | need to do now and then (perhaps less so nowadays that you
             | can find a library or framework to do almost anything you
             | might want to, but still). These people will, again, be
             | well served by some basic intuition how optimizing
             | compilers work, but most likely they will never need to
             | implement such a thing themselves.
             | 
             | OTOH, if working on optimizing compilers is your career,
             | sure, the parser is likely something that occasionally
             | needs some minor updating, but the vast majority of time
             | you spend on the middle/back-end optimization passes. Very
             | different focus from the average Joe programmer taking a
             | compilers 101 class at uni.
        
               | chrisseaton wrote:
               | I think it could possibly be a good idea to have separate
               | parsing and compiler courses, and teach parsing as a more
               | general technique, and have it as a prerequisite for
               | compilers.
        
         | qalmakka wrote:
         | While there are (a lot) of parser generators out there,
         | learning how to parse and validate text using a LL recursive-
         | descent parser is still very useful, because you can easily
         | implement one yourself and have lots of possible applications.
         | Writing an LL parser by hand requires nothing more than a bit
         | of theoretical knowledge and a few man-weeks. Heck, if you
         | really wish, you can even implement your own C parser in a few
         | weekends.
         | 
         | Meanwhile, writing a (non-trivial, i.e. something that does
         | more than MOV, JMP, CALL, CMP, ..) backend and optimizer
         | requires a few decades and an investment of a few million
         | dollars.
         | 
         | That's why most languages nowadays either emit C (which I find
         | is a very effective way to learn code generation, by the way)
         | or interface with LLVM. Even for the sake of learning, writing
         | a backend that generates actual code is the equivalent of
         | milling your own flour in order to bake bread: fancy, but
         | pointless. As much as people stopped milling their own flour
         | centuries ago, people stopped writing compiler their own
         | backends a long time ago, because it makes close to 0 sense
         | unless you plan to cover very specific use cases.
         | 
         | I think if you are interested in code generation, you should
         | rather learn the math behind optimizing code and hack a bit
         | around something existing like LLVM or GCC, because that
         | represents better how stuff is done in practice. Writing a
         | simple JIT compiler is also IMHO time well, but it's arguably
         | different than writing a compiler backend.
        
           | chrisseaton wrote:
           | > Meanwhile, writing a (non-trivial, i.e. something that does
           | more than MOV, JMP, CALL, CMP, ..) backend and optimizer
           | requires a few decades and an investment of a few million
           | dollars.
           | 
           | I think that's massively over-estimating it, based on
           | practical experience of such compilers being built.
           | 
           | > most languages nowadays either emit C (which I find is a
           | very effective way to learn code generation, by the way) or
           | interface with LLVM
           | 
           | I think this is also not grounded in reality. I rarely see a
           | language emitting C. LLVM is popular but not as universal as
           | you're making out.
        
             | qalmakka wrote:
             | > I think that's massively over-estimating it, based on
             | practical experience of such compilers being built.
             | 
             | Who still writes backends from scratch nowadays (i.e. after
             | ~2010)? The only real examples from recent times I'm aware
             | of are Go and Cranelift: the first is backed by Google and
             | still (arguaby) generates worse code than GCC and LLVM, the
             | second is vastly experimental.
             | 
             | > LLVM is popular but not as universal as you're making out
             | 
             | If you exclude JITs (such as the CLR, JVM and JS engines,
             | all of which have massive corporate backing behind) almost
             | a 100% of _all_ languages that came out in the last decade
             | use LLVM for code generation, even those coming from big
             | players:
             | 
             | - Swift (LLVM)
             | 
             | - Rust (LLVM)
             | 
             | - Zig (LLVM)
             | 
             | - Julia (LLVM)
             | 
             | - Pony (LLVM)
             | 
             | - Kotlin Native (LLVM)
             | 
             | The only other recent languages that come to my mind are
             | Nim and Vala, and the both generate C. Outside of these, no
             | new language has ever rolled out its own backend, unless
             | there was a huge (Google-class) corporation behind.
             | 
             | The only compilers I know which have their own backends and
             | are still relevant (i.e. they are used in the real world)
             | are:
             | 
             | - FreePascal - GHC - OCaml - DMD (now largely supplanted by
             | GDC and LDC)
             | 
             | These all have been started in '90s or before, and thus had
             | their own backends already when LLVM became popular.
             | Everything else is JIT, or is either backed by very big
             | companies (such as Microsoft) or uses GCC.
             | 
             | Why would anyone waste time writing a compiler backend
             | nowadays? It took LLVM almost 20 years to reach performance
             | parity with GCC, and still it is sometimes problematic due
             | to not supporting as many architectures as GCC.
        
               | chrisseaton wrote:
               | > Who still writes backends from scratch nowadays (i.e.
               | after ~2010)?
               | 
               | Didn't V8 _just_ got a new custom backend like literally
               | last week? And wasn 't B3 a completely new backend? And
               | what about Graal? That had much of its development and
               | several of its backends after 2010. Not everyone is using
               | LLVM like you think they are.
               | 
               | > Why would anyone waste time writing a compiler backend
               | nowadays?
               | 
               | Because LLVM isn't great at everything! It's not designed
               | for dynamic compilation, and while it can be used for
               | that, you have to put a pretty powerful IR in front of
               | it.
        
           | mhh__ wrote:
           | One huge argument in favour of recursive descent is that the
           | code effectively _is_ the grammar
        
         | mhh__ wrote:
         | Unlikely. Although if I ever get round to it I'd like to start
         | an open source compiler textbook to try and centralise
         | knowledge in this area since the field is simply too advanced
         | to summarise in a single book these days.
        
           | jcranmer wrote:
           | There's definitely a lack of advanced compiler textbooks, and
           | the venerable Dragon book definitely shows its age nowadays.
           | 
           | I've gone so far as to write an outline for what I'd want to
           | cover, and came to the conclusion that it needs several books
           | to adequately cover the material. And there's a lot of topics
           | that I am personally too ignorant of to do any justice to.
        
             | mhh__ wrote:
             | The latter point is why I think it has to be open source
             | rather than a single man or woman's work
        
         | m12k wrote:
         | Sounds like you're looking for a "Compilers 201" class.
        
         | norswap wrote:
         | Lecturer here - we will definitely cover some other topics,
         | including semantic analysis (type checking, etc), code
         | generation, basic optimizations, ...
         | 
         | I still expect parsing will be the major part of the course,
         | maybe 50%?
         | 
         | There are some good reasons for that too. Parsing has a rich
         | (and interesting) theoretical background that you don't really
         | have with basic semantic analysis (unless you go to symbolic
         | execution) and optimizations. It's more of an engineering-only
         | angle (which we cover), while parsing has both aspects.
         | 
         | Background in parsing is also more likely to serve most the
         | average student in practice :) I also think it's less easy to
         | figure out on your own.
        
           | setr wrote:
           | > Lecturer here - we will definitely cover some other topics,
           | including semantic analysis (type checking, etc), code
           | generation, basic optimizations, ...
           | 
           | How is this being updated? I've noted that all current videos
           | were posted on Feb 1, but I guess there's more coming up?
           | 
           | Also the video times are a little weird -- I assumed this was
           | being pulled from your normal classes, but the video lengths
           | range from 10 min to 30 min kind of randomly.
           | 
           | I suppose my main question is: whats the background to this
           | course?
        
       | justicezyx wrote:
       | The hidden surge of demand on compiler engineers is well below
       | the radar of most people and online channels.
       | 
       | But I am betting this will soon happen. As ASICs especially IoT
       | development will require more and more investment in compiler
       | tool chains and other programming related tools.
        
       | kuter wrote:
       | GCC has a book recommendation page [0] I had been using some of
       | these books to self study.
       | 
       | [0] https://gcc.gnu.org/wiki/ListOfCompilerBooks
        
         | mhh__ wrote:
         | Be warned that some of the books are ancient now
        
         | chrisseaton wrote:
         | Some of these books are so incredibly out of date that you will
         | end up with almost no understanding of how actual compilers
         | work. Use with care.
        
       | mrkeen wrote:
       | Cool! Any plans for emitting assembly, c or llvm?
       | 
       | There seem to be an awful lot of resources on compiler frontends,
       | and not so much on the backend.
        
         | myk9001 wrote:
         | You might find [CS 6120: Advanced Compilers: The Self-Guided
         | Online Course][1] interesting. I'm slowly working through it,
         | but basically its focus is intermediate representations,
         | optimizations, etc. A link to the course was on the first page
         | of HN some time ago.
         | 
         | If you want to know more about LLVM IR specifically, check this
         | [talk from 2019 EuroLLVM Developers' Meeting][2]
         | 
         | Also -- and you knew it was coming -- I've written [a toy-
         | compiler of a Scala subset][3] myself :)
         | 
         | I'm new to F# and writing compilers, so I'm sure the code is
         | full of rookie mistakes. Still, it works and does generate
         | assembly and executables for Linux and Windows.
         | 
         | [1]: https://www.cs.cornell.edu/courses/cs6120/2020fa/self-
         | guided...
         | 
         | [2]: https://www.youtube.com/watch?v=m8G_S5LwlTo
         | 
         | [3]: https://github.com/mykolav/coollang-2020-fs
        
         | norswap wrote:
         | I plan to show how to emit Java bytecode - the advantage being
         | that you don't have to write your own GC, and you benefit from
         | the JVM JIT compiler.
         | 
         | I think in general the backend is much easier (especially if
         | you don't have to do your own optimizations) - you mostly need
         | to know a couple of patterns for working with tree-like
         | structures.
        
         | mhh__ wrote:
         | There are a lot of resources on compiler backends, just that
         | GCC and LLVM don't follow them.
         | 
         | The GCC and LLVM machine models for example are very
         | sophisticated and I've never seen anything like them documented
         | in a textbook because they all basically assume a early 2000s
         | RISC machine rather than a CISC with a very bulky decoder.
        
       | jstanley wrote:
       | Thanks, this is serendipitous timing. I definitely plan to watch
       | the YouTube playlist.
       | 
       | I'm designing a simple CPU at the moment[1] and one of the things
       | that keeps me awake at night is thinking about how to make a
       | compiler for it.
       | 
       | It only supports 64K words of RAM (and I'm not planning to
       | implement bank switching), and I definitely want the compiler to
       | be self-hosting on my CPU, so retargeting LLVM or similar won't
       | be applicable.
       | 
       | I've looked at Small-C[2], Cowgol[3], and PL/0[4] which all
       | target related goals, from different directions.
       | 
       | [1] https://github.com/jes/scamp-cpu
       | 
       | [2] https://en.wikipedia.org/wiki/Small-C
       | 
       | [3] http://cowlark.com/cowgol/
       | 
       | [4] https://en.wikipedia.org/wiki/PL/0
        
         | retrac wrote:
         | Consider Forth. The simplicity of the language's syntax means
         | that a compiler is basically just a table lookup, and they can
         | fit into just a few kilobytes of code.
         | 
         | Even if you don't port or implement an actual Forth, a Forth-
         | like threaded code model fits tiny memory constraints very
         | well, with very high code density and only moderate overhead.
         | If you add a couple registers to the VM for easing pointer
         | access, it becomes a suitable target for C-like languages, too.
        
         | lenkite wrote:
         | Have you taken a look at Ferret ? https://ferret-lang.org/ ?
         | 
         | Ferret is a free software lisp implementation designed to be
         | used in real time embedded control systems. Ferret lisp
         | compiles down to self contained C++11. Generated code is
         | portable between any Operating System and/or Microcontroller
         | that supports a C++11 compliant compiler. It has been verified
         | to run on architectures ranging from embedded systems with as
         | little as 2KB of RAM to general purpose computers running
         | Linux/Mac OS X/Windows.
        
           | yudlejoza wrote:
           | > ... real time ... embedded ... microcontroller ...
           | 
           | > ... that supports ... C++11 ...
           | 
           | Makes perfect sense.
        
           | jstanley wrote:
           | I hadn't seen it, and it looks cool, thanks.
           | 
           | I'm really keen to be able to run the compiler end-to-end on
           | the CPU itself, so I think getting C++11 working would be the
           | stumbling block there.
        
             | read_if_gay_ wrote:
             | Maybe this is interesting to you as well?
             | https://github.com/cksystemsteaching/selfie
        
         | dexen wrote:
         | Consider the KenCC[1][2] - somewhat minimalistic, and the
         | flavor of C it uses is very sane.
         | 
         | --
         | 
         | [1] http://gsoc.cat-v.org/projects/kencc/
         | 
         | [2] yes, by _that_ Ken
        
       ___________________________________________________________________
       (page generated 2021-02-09 23:01 UTC)