[HN Gopher] Compiler Class
___________________________________________________________________
Compiler Class
Author : ingve
Score : 113 points
Date : 2021-02-09 10:43 UTC (12 hours ago)
(HTM) web link (norswap.com)
(TXT) w3m dump (norswap.com)
| artemonster wrote:
| I really hope this won't end up like 95% of other "compiler"
| classes that usually just focus on parsing rather than on
| emitting code, CPU architectures & optimization
| ofrzeta wrote:
| Arguably, generating code for a real CPU requires so much
| knowledge about that specific hardware that it's no longer
| really about compilers. That's why most compiler courses
| target a simple RISC CPU: it's easier to teach the algorithms
| behind it.
| hvidgaard wrote:
| Going from a grammar to an AST is the first and, IMO, one of
| the hardest steps. And there are plenty of open-source tools
| for exactly that; in general it is uninteresting and boring.
|
| Transforming the AST into something that can be used to emit
| instructions is the major thing to focus on, IMO. Getting to an
| AST that only uses very simple primitives that enable general
| optimizations and code emission is hard, and a different task
| every time.
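
As a sketch of what "very simple primitives" can mean here: a hypothetical
Python lowering of a while-loop AST node into labels and conditional
branches. All names and tuple shapes are invented for illustration, not
taken from any real compiler:

```python
# Illustrative sketch: lowering a high-level AST node into a few
# simple primitives (labels, branches) that are easy to optimize
# and to emit instructions from.

def make_fresh():
    """Return a generator of fresh label names (L1, L2, ...)."""
    counter = {'n': 0}
    def fresh():
        counter['n'] += 1
        return f"L{counter['n']}"
    return fresh

def lower_while(cond, body_ops, fresh):
    """Lower ('while', cond, body) into label/branch primitives.

    `body_ops` is the already-lowered loop body; `fresh` supplies
    unique label names."""
    top, done = fresh(), fresh()
    return [
        ('label', top),
        ('branch_if_false', cond, done),  # exit loop when cond fails
        *body_ops,                        # lowered body
        ('jump', top),                    # back edge to the test
        ('label', done),
    ]
```

Once everything is in this shape, optimizations and code emission only ever
have to deal with a handful of op kinds, regardless of the source construct.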
| mhh__ wrote:
| Semantic analysis is by far the hardest thing
| day to day. Parsing is easy.
| azhenley wrote:
| Parsing most certainly is *not* easy for someone new to
| compilers. My students remind me of that all the time.
| jcranmer wrote:
| Lexing and parsing probably falls into the category of
| "obvious once you learn it, mystifying until then."
|
| (Although, that said, compiler courses could probably do
| well to ditch a lot of the LL(1)/LR(1) parsing theory to
| focus on other topics instead. You still need to teach
| grammars, and focus a bit on how to actually parse
| grammars to AST under recursive descent, including
| expression precedence, but you could probably excise a
| lot of the parser generator theory.)
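
The "recursive descent including expression precedence" part is compact in
practice. A minimal sketch in Python of precedence climbing over a made-up
arithmetic grammar (tokenizer, precedence table, and AST shape are all
illustrative):

```python
import re

# Tokens: integers and the operators/parens of a toy expression grammar.
TOKEN = re.compile(r"\d+|[+\-*/()]")
PREC = {'+': 1, '-': 1, '*': 2, '/': 2}

def tokenize(src):
    return TOKEN.findall(src)

def parse(tokens):
    def peek():
        return tokens[0] if tokens else None

    def parse_atom():
        tok = tokens.pop(0)
        if tok == '(':
            node = parse_expr(0)
            tokens.pop(0)  # consume ')'
            return node
        return int(tok)

    def parse_expr(min_prec):
        # Precedence climbing: fold every operator whose precedence
        # is at least min_prec into the left operand.
        left = parse_atom()
        while peek() in PREC and PREC[peek()] >= min_prec:
            op = tokens.pop(0)
            right = parse_expr(PREC[op] + 1)  # +1 => left-associative
            left = (op, left, right)
        return left

    return parse_expr(0)

# "1 + 2 * 3" parses as ('+', 1, ('*', 2, 3)): '*' binds tighter.
```

The whole precedence mechanism is the `min_prec` parameter and one table,
which is why hand-written parsers rarely need the full LR machinery.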
| mhh__ wrote:
| I know embarrassingly little parser theory - I definitely
| couldn't write a decent parser generator without looking
| it up in a book for a few hours - but at least there is a
| theory. Semantic analysis has lots of mathematical
| symbology to describe it, but that doesn't help you write
| code for the most part.
| marcosdumay wrote:
| As strange as it looks, parser theory is not very
| relevant in practice for writing parsers. You just use
| some parser generator, or a parser combinator library if
| your language allows them, and you won't have many
| problems (except perhaps for them being slow).
|
| It is much more relevant for optimizing parser
| performance and for designing language syntax.
| mhh__ wrote:
| That's true but it is one and done when you get it
| working, as opposed to sema which is a constantly
| changing monster in many languages.
| norswap wrote:
| I'd say parsing is easier when you know how to do it, but
| getting to understand it the first time is harder.
| chrisseaton wrote:
| > Going from a grammar to an AST is the first and one of the
| hardest step IMO
|
| I can give some concrete data about this. I've been working
| professionally on a single compiler full-time for seven
| years.
|
| I think out of all that time I've spent probably less
| than a week dealing with lexer and parser issues
| altogether. I'd say in practice parsing is _vanishingly_
| insignificant.
| MaxBarraclough wrote:
| Things would be different if you were in the business of,
| say, keeping g++ up to date with the latest C++ spec,
| right?
| chrisseaton wrote:
| I have to keep up with Ruby changes.
|
| But even for example the parser in clang doesn't seem to
| be touched very often.
|
| https://github.com/llvm/llvm-project/commits/main/clang/lib/...
|
| Take a look at the commits there and see how much each
| changes the parser vs the rest of the compiler.
|
| Also, at the bottom of page 2 for parser changes you're
| already all the way back in 2017!
| jcranmer wrote:
| It's somewhat disingenuous to link only
| clang/lib/Parse/Parser.cpp--a lot of the parsing code
| isn't in Parser.cpp, but in other files in lib/Parse. Look
| at the entire directory, and you get about one commit
| every few days (somewhere between daily and weekly).
| chrisseaton wrote:
| I think a lot of the commits you're seeing there are
| really more semantic analysis and tooling rather than
| actual parsing - they just interact with the parser.
|
| I can only offer my experience - and I think most of my
| colleagues in the field seem to say the same thing.
| zabzonk wrote:
| > I've been working professionally on a single compiler
| full-time for seven years.
|
| Which one?
| chrisseaton wrote:
| TruffleRuby (says in my profile.)
| jabl wrote:
| I guess it makes a difference who the course audience is?
|
| I'd wager most people who take a compilers 101 class at uni
| won't end up working on compilers for their careers. For
| them, some basic knowledge of parsing will likely serve
| them well, since designing and parsing file formats and
| network protocols etc. is something quite a lot of people
| need to do now and then (perhaps less so nowadays, when
| you can find a library or framework to do almost anything
| you might want, but still). These people will, again, be
| well served by some basic intuition about how optimizing
| compilers work, but most likely they will never need to
| implement such a thing themselves.
|
| OTOH, if working on optimizing compilers is your career,
| sure, the parser is likely something that occasionally
| needs some minor updating, but the vast majority of time
| you spend on the middle/back-end optimization passes. Very
| different focus from the average Joe programmer taking a
| compilers 101 class at uni.
| chrisseaton wrote:
| I think it could possibly be a good idea to have separate
| parsing and compiler courses, and teach parsing as a more
| general technique, and have it as a prerequisite for
| compilers.
| qalmakka wrote:
| While there are a lot of parser generators out there,
| learning how to parse and validate text using an LL recursive-
| descent parser is still very useful, because you can easily
| implement one yourself, and it has lots of possible
| applications. Writing an LL parser by hand requires nothing
| more than a bit of theoretical knowledge and a few man-weeks.
| Heck, if you really wish, you can even implement your own C
| parser in a few weekends.
|
| Meanwhile, writing a (non-trivial, i.e. something that does
| more than MOV, JMP, CALL, CMP, ..) backend and optimizer
| requires a few decades and an investment of a few million
| dollars.
|
| That's why most languages nowadays either emit C (which I find
| is a very effective way to learn code generation, by the way)
| or interface with LLVM. Even for the sake of learning, writing
| a backend that generates actual code is the equivalent of
| milling your own flour in order to bake bread: fancy, but
| pointless. Just as people stopped milling their own flour
| centuries ago, people stopped writing their own compiler
| backends a long time ago, because it makes close to zero
| sense unless you plan to cover very specific use cases.
|
| I think if you are interested in code generation, you should
| rather learn the math behind optimizing code and hack a bit
| around something existing like LLVM or GCC, because that
| represents better how stuff is done in practice. Writing a
| simple JIT compiler is also IMHO time well spent, but it's
| arguably different from writing a compiler backend.
| chrisseaton wrote:
| > Meanwhile, writing a (non-trivial, i.e. something that does
| more than MOV, JMP, CALL, CMP, ..) backend and optimizer
| requires a few decades and an investment of a few million
| dollars.
|
| I think that's massively over-estimating it, based on
| practical experience of such compilers being built.
|
| > most languages nowadays either emit C (which I find is a
| very effective way to learn code generation, by the way) or
| interface with LLVM
|
| I think this is also not grounded in reality. I rarely see a
| language emitting C. LLVM is popular but not as universal as
| you're making out.
| qalmakka wrote:
| > I think that's massively over-estimating it, based on
| practical experience of such compilers being built.
|
| Who still writes backends from scratch nowadays (i.e. after
| ~2010)? The only real examples from recent times I'm aware
| of are Go and Cranelift: the first is backed by Google and
| still (arguably) generates worse code than GCC and LLVM, the
| second is vastly experimental.
|
| > LLVM is popular but not as universal as you're making out
|
| If you exclude JITs (such as the CLR, JVM and JS engines,
| all of which have massive corporate backing behind them),
| almost 100% of _all_ languages that came out in the last
| decade
| use LLVM for code generation, even those coming from big
| players:
|
| - Swift (LLVM)
|
| - Rust (LLVM)
|
| - Zig (LLVM)
|
| - Julia (LLVM)
|
| - Pony (LLVM)
|
| - Kotlin Native (LLVM)
|
| The only other recent languages that come to my mind are
| Nim and Vala, and they both generate C. Outside of these, no
| new language has ever rolled out its own backend, unless
| there was a huge (Google-class) corporation behind.
|
| The only compilers I know which have their own backends and
| are still relevant (i.e. they are used in the real world)
| are:
|
| - FreePascal
|
| - GHC
|
| - OCaml
|
| - DMD (now largely supplanted by GDC and LDC)
|
| These were all started in the '90s or earlier, and thus had
| their own backends already when LLVM became popular.
| Everything else is JIT, or is either backed by very big
| companies (such as Microsoft) or uses GCC.
|
| Why would anyone waste time writing a compiler backend
| nowadays? It took LLVM almost 20 years to reach performance
| parity with GCC, and still it is sometimes problematic due
| to not supporting as many architectures as GCC.
| chrisseaton wrote:
| > Who still writes backends from scratch nowadays (i.e.
| after ~2010)?
|
| Didn't V8 _just_ get a new custom backend like literally
| last week? And wasn't B3 a completely new backend? And
| what about Graal? That had much of its development and
| several of its backends after 2010. Not everyone is using
| LLVM like you think they are.
|
| > Why would anyone waste time writing a compiler backend
| nowadays?
|
| Because LLVM isn't great at everything! It's not designed
| for dynamic compilation, and while it can be used for
| that, you have to put a pretty powerful IR in front of
| it.
| mhh__ wrote:
| One huge argument in favour of recursive descent is that the
| code effectively _is_ the grammar
| mhh__ wrote:
| Unlikely. Although if I ever get round to it I'd like to start
| an open source compiler textbook to try and centralise
| knowledge in this area since the field is simply too advanced
| to summarise in a single book these days.
| jcranmer wrote:
| There's definitely a lack of advanced compiler textbooks, and
| the venerable Dragon book definitely shows its age nowadays.
|
| I've gone so far as to write an outline for what I'd want to
| cover, and came to the conclusion that it needs several books
| to adequately cover the material. And there's a lot of topics
| that I am personally too ignorant of to do any justice to.
| mhh__ wrote:
| The latter point is why I think it has to be open source
| rather than a single man or woman's work
| m12k wrote:
| Sounds like you're looking for a "Compilers 201" class.
| norswap wrote:
| Lecturer here - we will definitely cover some other topics,
| including semantic analysis (type checking, etc), code
| generation, basic optimizations, ...
|
| I still expect parsing will be the major part of the course,
| maybe 50%?
|
| There are some good reasons for that too. Parsing has a rich
| (and interesting) theoretical background that you don't really
| have with basic semantic analysis (unless you go into symbolic
| execution) or optimizations. Those have more of an
| engineering-only angle (which we cover), while parsing has
| both aspects.
|
| Background in parsing is also more likely to serve the
| average student in practice :) I also think it's less easy to
| figure out on your own.
| setr wrote:
| > Lecturer here - we will definitely cover some other topics,
| including semantic analysis (type checking, etc), code
| generation, basic optimizations, ...
|
| How is this being updated? I've noted that all current videos
| were posted on Feb 1, but I guess there's more coming up?
|
| Also the video times are a little weird -- I assumed this was
| being pulled from your normal classes, but the video lengths
| range from 10 min to 30 min kind of randomly.
|
| I suppose my main question is: what's the background to this
| course?
| justicezyx wrote:
| The hidden surge of demand for compiler engineers is well
| below the radar of most people and online channels.
|
| But I am betting this will soon change, as ASIC and especially
| IoT development will require more and more investment in
| compiler toolchains and other programming-related tools.
| kuter wrote:
| GCC has a book recommendation page [0]. I have been using
| some of these books for self-study.
|
| [0] https://gcc.gnu.org/wiki/ListOfCompilerBooks
| mhh__ wrote:
| Be warned that some of the books are ancient now
| chrisseaton wrote:
| Some of these books are so incredibly out of date that you will
| end up with almost no understanding of how actual compilers
| work. Use with care.
| mrkeen wrote:
| Cool! Any plans for emitting assembly, C, or LLVM?
|
| There seem to be an awful lot of resources on compiler
| frontends, and not so many on backends.
| myk9001 wrote:
| You might find [CS 6120: Advanced Compilers: The Self-Guided
| Online Course][1] interesting. I'm slowly working through it,
| but basically its focus is intermediate representations,
| optimizations, etc. A link to the course was on the first page
| of HN some time ago.
|
| If you want to know more about LLVM IR specifically, check this
| [talk from 2019 EuroLLVM Developers' Meeting][2]
|
| Also -- and you knew it was coming -- I've written [a toy
| compiler of a Scala subset][3] myself :)
|
| I'm new to F# and writing compilers, so I'm sure the code is
| full of rookie mistakes. Still, it works and does generate
| assembly and executables for Linux and Windows.
|
| [1]: https://www.cs.cornell.edu/courses/cs6120/2020fa/self-guided...
|
| [2]: https://www.youtube.com/watch?v=m8G_S5LwlTo
|
| [3]: https://github.com/mykolav/coollang-2020-fs
| norswap wrote:
| I plan to show how to emit Java bytecode - the advantage being
| that you don't have to write your own GC, and you benefit from
| the JVM JIT compiler.
|
| I think in general the backend is much easier (especially if
| you don't have to do your own optimizations) - you mostly need
| to know a couple of patterns for working with tree-like
| structures.
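
One such tree-walking pattern, sketched in Python with invented opcode
names (not real JVM mnemonics, though emitting JVM bytecode with a library
like ASM has the same shape): a post-order walk over an expression AST that
emits stack-machine instructions:

```python
# Post-order code emission for a stack machine: emit both operands,
# then the operator. AST nodes are ints (constants) or
# (op, left, right) tuples; opcode names are illustrative.

OPS = {'+': 'ADD', '-': 'SUB', '*': 'MUL', '/': 'DIV'}

def emit(node, out):
    if isinstance(node, int):
        out.append(('PUSH', node))   # leaf: load a constant
    else:
        op, left, right = node
        emit(left, out)              # operands first (post-order)
        emit(right, out)
        out.append((OPS[op],))       # then the operator

# ('*', ('+', 1, 2), 3)  =>  PUSH 1, PUSH 2, ADD, PUSH 3, MUL
```

Because the JVM is itself a stack machine, this one traversal pattern covers
most expression codegen; statements mostly reduce to similar walks plus
labels and jumps.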
| mhh__ wrote:
| There are a lot of resources on compiler backends; it's just
| that GCC and LLVM don't follow them.
|
| The GCC and LLVM machine models, for example, are very
| sophisticated, and I've never seen anything like them
| documented in a textbook, because textbooks basically all
| assume an early-2000s RISC machine rather than a CISC with a
| very bulky decoder.
| jstanley wrote:
| Thanks, this is serendipitous timing. I definitely plan to watch
| the YouTube playlist.
|
| I'm designing a simple CPU at the moment[1] and one of the things
| that keeps me awake at night is thinking about how to make a
| compiler for it.
|
| It only supports 64K words of RAM (and I'm not planning to
| implement bank switching), and I definitely want the compiler to
| be self-hosting on my CPU, so retargeting LLVM or similar won't
| be applicable.
|
| I've looked at Small-C[2], Cowgol[3], and PL/0[4] which all
| target related goals, from different directions.
|
| [1] https://github.com/jes/scamp-cpu
|
| [2] https://en.wikipedia.org/wiki/Small-C
|
| [3] http://cowlark.com/cowgol/
|
| [4] https://en.wikipedia.org/wiki/PL/0
| retrac wrote:
| Consider Forth. The simplicity of the language's syntax means
| that a compiler is basically just a table lookup, and one can
| fit into just a few kilobytes of code.
|
| Even if you don't port or implement an actual Forth, a Forth-
| like threaded code model fits tiny memory constraints very
| well, with very high code density and only moderate overhead.
| If you add a couple registers to the VM for easing pointer
| access, it becomes a suitable target for C-like languages, too.
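
A minimal illustration of the "compiler as table lookup" and threaded-code
ideas, in Python rather than a real Forth (the words and representation
here are invented for the sketch): a compiled word is just a list of
callables, and compilation is a dictionary lookup per token:

```python
# Tiny threaded-code sketch: a shared data stack, a dictionary of
# primitive words, and a "compiler" that maps each source token to
# either a dictionary entry or a literal-pushing closure.

stack = []

def lit(n):
    """Build a primitive that pushes the literal n."""
    return lambda: stack.append(n)

def add():
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def dup():
    stack.append(stack[-1])

DICT = {'+': add, 'dup': dup}

def compile_word(source):
    """'Compile' by looking each token up in the dictionary;
    unknown tokens are treated as integer literals."""
    return [DICT[t] if t in DICT else lit(int(t)) for t in source.split()]

def run(word):
    for op in word:   # the "inner interpreter": call each slot in turn
        op()

# run(compile_word("2 dup + 3 +")) leaves 7 on the stack
```

On a real small machine the slots would be code addresses and `run` a
few-instruction address interpreter, which is where the code density comes
from.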
| lenkite wrote:
| Have you taken a look at Ferret? https://ferret-lang.org/
|
| Ferret is a free software lisp implementation designed to be
| used in real time embedded control systems. Ferret lisp
| compiles down to self contained C++11. Generated code is
| portable between any Operating System and/or Microcontroller
| that supports a C++11 compliant compiler. It has been verified
| to run on architectures ranging from embedded systems with as
| little as 2KB of RAM to general purpose computers running
| Linux/Mac OS X/Windows.
| yudlejoza wrote:
| > ... real time ... embedded ... microcontroller ...
|
| > ... that supports ... C++11 ...
|
| Makes perfect sense.
| jstanley wrote:
| I hadn't seen it, and it looks cool, thanks.
|
| I'm really keen to be able to run the compiler end-to-end on
| the CPU itself, so I think getting C++11 working would be the
| stumbling block there.
| read_if_gay_ wrote:
| Maybe this is interesting to you as well?
| https://github.com/cksystemsteaching/selfie
| dexen wrote:
| Consider the KenCC[1][2] - somewhat minimalistic, and the
| flavor of C it uses is very sane.
|
| --
|
| [1] http://gsoc.cat-v.org/projects/kencc/
|
| [2] yes, by _that_ Ken
___________________________________________________________________
(page generated 2021-02-09 23:01 UTC)