[HN Gopher] Writing a C Compiler: Build a Real Programming Langu...
       ___________________________________________________________________
        
       Writing a C Compiler: Build a Real Programming Language from
       Scratch
        
       Author : shoggouth
       Score  : 219 points
       Date   : 2024-08-12 18:26 UTC (3 days ago)
        
 (HTM) web link (nostarch.com)
 (TXT) w3m dump (nostarch.com)
        
       | shoggouth wrote:
       | It also will be available via Amazon after August 20, 2024.
       | 
       | https://www.amazon.com/Writing-Compiler-Programming-Language...
        
       | quibono wrote:
       | I swear I've seen this cover before... is this a new release or
       | an updated edition of an older book?
        
         | jdnendm wrote:
         | Book is not yet published but in early access since a couple of
         | years
         | 
         | Was featured here a couple of times.
         | 
         | Unfortunately the timing of the release is quite unfortunate
         | with regards to the summer holidays. Will take a look at it
         | next year
        
           | sgbeal wrote:
           | > Book is not yet published but in early access since a
           | couple of years
           | 
           | According to the top post's link, it was released in July
           | 2024.
        
             | hdbxbxndj wrote:
             | I seem to be mistaken. The 20th of August as listed by
             | Amazon must be the European release date then
        
           | byteplane wrote:
           | It's actually out now, I have a copy! Ordered directly fro No
           | Starch Press.
        
         | halfcat wrote:
         | _"Automate the Boring Stuff with Python"_ has a similar cover,
         | by the same publisher.
        
         | orktes wrote:
         | Many compiler related books take inspiration from the "Dragon
         | book" (Compilers: Principles, Techniques and Tools). So with
         | likely lots of books with similar looking covers.
        
           | hdbxbxndj wrote:
           | The cover looks nothing like the dragon book however?
        
             | orktes wrote:
             | Point was more that there are bunch of compiler related
             | books featuring a dragon (and knight/other character). The
             | original also has a different looking dragon & knight
             | themed cover for every edition.
             | 
             | Like if I'd see the book on a shelf I would instantly guess
             | it's related to compilers. And I bet that's completely
             | intentional homage to the original.
        
         | Almondsetat wrote:
         | I believe the author first started by making blog posts and
         | then interrupted them to simply make a book about it
        
         | thejteam wrote:
         | There was a HN article about the same book about a month ago:
         | 
         | https://news.ycombinator.com/item?id=40940799
         | 
         | So maybe you saw it then.
        
       | Coolbeanstoo wrote:
       | This looks cool, been interested in learning more about compilers
       | since I did the basics in college. Lots of things seem to focus
       | on making interpreters and never make it to the code generation
       | part so its nice to see that this features information about
       | that.
        
         | spinningslate wrote:
         | With no disrespect to the book that's the subject of this
         | thread as I haven't read it, but Bob Nystrom's Crafting
         | Interpreter [0] is a fantastic book. It covers all phases in
         | compilation, including both an interpreter and a VM.
         | 
         | It's been covered on several threads here over the years [1].
         | 
         | [0]: https://craftinginterpreters.com/ [1]:
         | https://hn.algolia.com/?q=crafting+interpreters
        
           | jcpst wrote:
           | I remember seeing this a while back. That typesetting is
           | beautiful. Thank you for bringing it up here, I might have to
           | pick that one up.
           | 
           | I've been bored with building line-of-business applications,
           | despite designing for complex requirements in high-volume
           | distributed systems.
           | 
           | In fact I took a break from CS learning entirely 9 months
           | ago. Even reading HN. I've been studying electronics and
           | analog signal processing instead.
           | 
           | But now that I've built about 50 guitar pedals of increasing
           | complexity, I feel ready to switch back to CS studies again.
        
       | i_don_t_know wrote:
       | Somewhat unrelated: Is there a book that walks you through
       | building a database system from storage to queries, optimizer,
       | execution, indexing, transactions, etc?
        
         | gtirloni wrote:
         | Related: https://news.ycombinator.com/item?id=27731896
        
         | rramadass wrote:
         | In the early 90's Al Stevens wrote 2 books _C Database
         | Development_ and _C++ Database Development_ with source code
         | which might be a good starting point.
        
           | myth_drannon wrote:
           | Interesting suggestion! here is the book on archive.org:
           | https://archive.org/details/cdatabasedevelop00stev/mode/2up
        
         | kragen wrote:
         | _transaction processing_ by gray (rip) and reuter was pretty
         | close back in the 90s. i don 't think it covered query
         | optimization because it's really about tp monitors rather than
         | databases, but, perhaps surprisingly, it does cover the other
         | topics you're asking about
        
         | rednab wrote:
         | Database Design and Implementation, ISBN 3030338355 1). Java
         | source code for the SimpleDB system from the book available
         | from the author's website 2).
         | 
         | 1) https://www.amazon.com/dp/3030338355/
         | 
         | 2) http://www.cs.bc.edu/~sciore/simpledb/
        
       | hasbot wrote:
       | So what's different about writing a compiler in 2024 than say 10,
       | 20, or 30 years ago? When I started writing compilers in the 80's
       | and 90's lex/flex and yacc/bison were popular. ANTLR came out but
       | I never had a chance to use it. Everything after lexing and
       | parsing was always hand rolled.
        
         | mjburgess wrote:
         | Parser-generators were always academic projects that had little
         | relevance to making real-world programming languages -- where
         | parsing is very easy to write, and necessarily benefits from
         | doing it (ie., you can get better error handling/etc.).
         | 
         | Today most languages are front-ends for LLVM IR, but LLVM is
         | very slow and takes a long time to optimize. Many new languages
         | target x86/arm directly with their own weakly optimized
         | backends, and output an LLVM IR for "release builds".
        
           | sestep wrote:
           | Could you give some specific examples of those new languages
           | with their own backends for faster builds?
        
             | mjburgess wrote:
             | I was immediately thinking of Jai, Zig, .. but I've seen a
             | few
        
               | stefanos82 wrote:
               | Correct me if I'm wrong, but I think both Zig and Jai use
               | LLVM as their default backend...at least, that's what I
               | have seen via live streaming for Jai, and from Zig's
               | repo.
        
               | vulcan01 wrote:
               | Zig is moving away from LLVM
               | (https://github.com/ziglang/zig/issues/16270) and Rust
               | has added Cranelift as a debug backend
               | (https://lwn.net/Articles/964735/).
               | 
               | Not sure about Jai.
        
               | stefanos82 wrote:
               | I don't think it's moving away, because if you have read
               | the entire thread, the gaming community reacts negatively
               | to say the least and they assure everyone that LLVM does
               | not going anywhere any time soon.
        
           | thechao wrote:
           | When your computer was anemic, and could barely do the tasks
           | required for it, eking out a few percent -- or a 2x! -- from
           | an optimizer was important.
           | 
           | Now-a-days, the difference between "big compiler optimized"
           | and "little compiler not optimized" can be quite dramatic;
           | but, is probably no more than 4x -- certainly within range of
           | the distinction between "systems programming language" and
           | "high tuned JITted scripting language". I think _most_ people
           | are perfectly fine with the performance of highly-tuned
           | scripting languages. The result is that all of the overhead
           | of  "big compiler" is just ... immaterial; overhead. This is
           | especially true for the case of extremely well-tuned code,
           | where the algorithm and -- last resort -- assembly, will
           | easily beat out the best optimizer by at least an order-of-
           | magnitude, or more.
        
             | auselen wrote:
             | Wouldn't just a simple case of bad code generation render
             | little compiler into a toy one?
             | 
             | Repeating others, today's compilers are really just
             | "optimizing compilers", there is no room for toying in
             | production environments.
        
               | fuhsnn wrote:
               | >just a simple case of bad code generation render little
               | compiler into a toy one
               | 
               | If you find some time to go through gcc bugzilla you'll
               | find shockingly simple snippets of code that miscompiled
               | (often by optimization passes), with fixes never
               | backported to older versions that production environments
               | like RHEL are using.
        
               | auselen wrote:
               | Ok, but that's sw engineering issue.
               | 
               | I still insist that a production grade compiler can't
               | leave performance on table. Which is where the current
               | battlefield is.
        
               | jsnnsjxj wrote:
               | I do not agree in the general case. There are very useful
               | DSL compilers which do not consider performance at all,
               | but just compile to a target which does the optimization
               | for them (JVM, LLVM IR or even just C)
        
               | auselen wrote:
               | Yes but those are called transpilers, right?
        
               | bigstrat2003 wrote:
               | I think a production grade compiler not only can, but
               | must, leave performance on the table when the cost is
               | correctness (unless the performance gain is incredibly
               | high and the correctness loss is minimal). Correctness is
               | not _all_ important, but it is the most important thing.
               | Unfortunately, compiler writers do not agree and they do
               | silly things like  "let's assume UB cannot ever happen
               | and optimize based on that".
        
               | auselen wrote:
               | Correctness is already a must, how did you arrive to
               | this?
        
               | kragen wrote:
               | if only the gcc and llvm maintainers, and c standard
               | authors, agreed with you
        
               | auselen wrote:
               | I mean these compilers build all these SW stacks, even
               | this very browser I'm using, where are these correctness
               | issues you are talking about?
        
               | kragen wrote:
               | there are new items in this category every day, but
               | https://blog.cr.yp.to/20240803-clang.html is noteworthy
        
               | auselen wrote:
               | Eh https://news.ycombinator.com/item?id=41146860
        
               | auselen wrote:
               | I realized with all the rhel systems I'm using, we are
               | never using default toolchains on them. Just use those
               | old systems to run stuff, even newer toolchains.
        
               | kragen wrote:
               | if you aren't running on the gpu you're leaving 80+% of
               | your computer's performance on the table. no optimizing
               | compiler is going to make your legacy c or lisp or rust
               | code run efficiently on the gpu, or even in most cases on
               | a multicore cpu. nor, as thechao points out, can it
               | compete with assembly-language programmers for simd
               | vectorization on the cpu
               | 
               | in summary, optimizing compilers for c or pascal or zig
               | or rust or whatever can only be used for code where
               | considerations like compatibility, ease of programming,
               | security, and predictability are more important than
               | performance
               | 
               | probably the vast majority of production code is already
               | in python and javascript, which don't even have
               | reasonable compilers at all
        
               | auselen wrote:
               | How come you made this about computer's performance :)
        
               | kragen wrote:
               | because computer performance is the topic of
               | https://news.ycombinator.com/item?id=41255695,
               | https://news.ycombinator.com/item?id=41255365, and
               | https://news.ycombinator.com/item?id=41255195, the three
               | levels of parent comments in the thread, one of which was
               | by you
        
               | auselen wrote:
               | No.
               | 
               | Main post is about a C compiler, post I responded is
               | saying
               | 
               | > When your computer was anemic, and could barely do the
               | tasks required for it
               | 
               | which I could only interpret as a machine that can run a
               | single process at a time, so not really about gpus or
               | what.
        
           | Croftengea wrote:
           | > Parser-generators were always academic projects
           | 
           | Were they? GCC abandoned bison in favour of their own parser
           | relatively recently.
        
             | userbinator wrote:
             | "relatively recently" as in around 20 years ago, in GCC 3.x
             | according to sources I found.
        
               | kragen wrote:
               | gcc was first released in 01987, but it didn't replace
               | its bison parser for c until gcc 4.1
               | https://gcc.gnu.org/gcc-4.1/changes.html which was in
               | 02006 https://gcc.gnu.org/releases.html, only 18 years
               | ago, and 19 years after it was released. joseph myers
               | first proposed doing this in 02004
               | https://gcc.gnu.org/legacy-ml/gcc-
               | patches/2004-10/msg01969.h...
               | 
               | so gcc has literally been using a parser-generator-
               | generated parser for c for more than half its existence,
               | at which point it had already become the most popular c
               | compiler across the unix world and basically the only one
               | used for linux, which had already mostly annihilated the
               | proprietary unix market. it was also imposingly dominant
               | in the embedded space
               | 
               | and i think that kind of development path is fairly
               | typical; getting a parser up and running is easier with a
               | parser generator, but it can be tricky to get it to do
               | strange things when that's what you want (especially with
               | lalr, less so with peg)
        
           | kragen wrote:
           | yacc/bison was used for lex, bc, pcc, gcc, original awk, the
           | bsd pascal compiler, eqn, m4 (!), and many other languages.
           | it's still used for pcc, oawk, mawk, pari/gp, and units.
           | that's just what i have sitting around in my downloads
           | directory
           | 
           | and, while we're talking about ocaml, ocaml does use ocamllex
           | and ocamlyacc for its own parser
           | 
           | so, while you can certainly do without parser generators,
           | they have very commonly been used for making real-world
           | programming languages. almost every programming language
           | anyone here has ever heard of was first implemented with a
           | parser generator. the main exceptions are probably fortran,
           | cobol, algol, lisps, c, and pascal
        
             | mjburgess wrote:
             | I guess I wasn't very clear. I didnt mean to say, as a
             | historical matter, were irrelevant.
             | 
             | I meant to say the idea of a parser generator is a solution
             | to a problem that that real world langs don't really have.
             | When writing a programming language, your issue isnt how
             | much time the parser is going to take to write, or how
             | complex it's going to be. The parser is a relatively
             | trivial part of the problem.
             | 
             | Due to language designers often being taught to develop
             | langauges in this fashion, many have relied on these tools.
             | But the modern view of compliers as "programming langauge
             | UIs" and the focus on DX, i'd argue its actively
             | pathological to use a parser generator.
             | 
             | Much academic work has, til recently, focused on these
             | areas -- whereas today, the bulk of the difficulty is in
             | understanding SSA/LLVM/ARM/Basic Optimizatiosn/etc. details
             | which are "boring, circumstantial" etc. and not really
             | research projects. I was just pointing this out since a lot
             | of people, myself included, go down the EBNF parser-
             | generator rabbit hole and think inventing a langauge is
             | some formal exercise -- when the reality is the opposite:
             | it's standard programming-engineering work.
        
               | kragen wrote:
               | oh, i agree that parsing is not the hard part of writing
               | a compiler, and that compilers classes overemphasize it
               | 
               | but no language starts out as a 'real world lang'; every
               | language is initially a toy language, and only becomes a
               | 'real world lang' in the unlikely case that it turns out
               | to be useful. and parser generators are _very_ useful for
               | creating toy languages. that 's why virtually every real
               | world lang you've ever used was implemented first using a
               | parser generator, even if the parser you're using for it
               | now is handwritten
               | 
               | having a formally defined grammar is also very helpful
               | for dx features like syntax highlighting and automated
               | refactoring
        
               | layer8 wrote:
               | The one thing parser generators do is that they ensure
               | that the language you implement actually matches the
               | grammar you wrote. That's still an important assurance to
               | have.
        
               | kragen wrote:
               | they also tell you where the grammar is ambiguous, which
               | can be useful
        
               | mjburgess wrote:
               | I've created many programming languages. All the ones I
               | finished and were useful did not have grammars that "I
               | wrote".
        
               | kragen wrote:
               | do you mean https://github.com/mjburgess/Lyssa and
               | https://github.com/mjburgess/Quazar? it's true that i
               | can't find a grammar in either of them (https://github.co
               | m/mjburgess/Lyssa/blob/master/src/impl.py#L... seems more
               | forthy than anything else) but (while they are very much
               | the sort of things that i like, thank you for sharing)
               | they also seem somewhat less like 'real-world programming
               | languages' than things like awk, ocaml, or our other
               | example in this thread, gcc c and objective-c until gcc
               | replaced the bison parser with a handwritten one in
               | 02006. that last compiler was the compiler nextstep was
               | built on, which got steve jobs back in control of apple,
               | to replace macos with nextstep. seems pretty real-world
               | to me
               | 
               | maybe you're talking about stuff you haven't released?
        
               | layer8 wrote:
               | Are you saying your programming languages don't have a
               | defined grammar?
        
               | kragen wrote:
               | i was puzzled about that too and got even more puzzled
               | when i looked at the languages i could find that he's
               | implemented
        
             | pklausler wrote:
             | Fortran compilers did go through a phase when table-driven
             | parsers were used, but it had the disadvantage of needing
             | complicated lexers and statement classifiers that rely on
             | semantic information. Fortran's a hard language to parse,
             | given its lack of reserved words, optional spaces, and many
             | ambiguities.
             | 
             | The f18 compiler's parser uses parser combinations to
             | construct a backtracking recursive descent parser that
             | builds a parse tree for the whole source file before doing
             | any semantic analysis. This approach allows good error
             | recovery without having to revert any updates to the symbol
             | table.
        
               | kragen wrote:
               | that's an interesting approach! though probably not
               | applicable to c and c++
               | 
               | i assume that by 'parser combinations' you mean parser
               | combinators
               | 
               | what i meant about fortran is that the _first_ fortran
               | compiler didn 't use a parser generator
        
         | pfdietz wrote:
         | Compiling today might be done at run time, exploiting dynamic
         | information. The line between a compiler and an interpreter is
         | blurred.
        
         | shortrounddev2 wrote:
         | ANTLR generate a visitor system from a BNF grammar that you
         | need to implement the logic of in your chosen language, which I
         | believe is similar to C++'s std::visit. lex and yacc generate
         | state machines using BNF grammar and you implement the logic in
         | the tools themselves
        
         | trealira wrote:
         | The book doesn't have 2024 in the title. I suspect they put it
         | there because last time a post about this book was made, I
         | noted that it was from 2022, not realizing that the book has
         | now been released in 2024.
         | 
         | https://news.ycombinator.com/item?id=40940799
         | 
         | > So what's different about writing a compiler in 2024 than say
         | 10, 20, or 30 years ago?
         | 
         | As far as I can tell, the main difference is that static single
         | assignment (SSA) as an intermediate form was not the norm 30
         | years ago, but it is nowadays. Also, in newer books, it's more
         | common to go over global register allocation now, whether
         | that's graph coloring or linear scan register allocation. If
         | you read old compiler books, the main optimizations they talk
         | about are use-def chains, moving computations out of loops, and
         | using the local and tree-based Sethi-Ullman register allocation
         | algorithm.
        
         | jcranmer wrote:
         | There are actually quite a few changes!
         | 
         | The most obvious change you'll see is the use of SSA, which has
         | become the dominant representation in IR starting 25-30 years
         | ago.
         | 
         | There's also been an increase in the importance of compiler
         | IRs, and especially the concept of code passing through
         | _multiple_ IRs before reaching machine code.
         | 
         | Formal semantics has become more of a thing in the past decade
         | or so. It's now routine that even weak memory models have a
         | detailed formal model of how they work. In LLVM, it's now a
         | requirement that you demonstrate formal correctness of new
         | InstCombine transformations (which are essentially peephole
         | optimizations).
         | 
         | The use of parser generators has really fallen into disrepute;
         | everything has transitioned to handrolled parsers these days.
         | Language standards themselves are starting to rely on context-
         | sensitive keywords, which are hard to implement in a generator-
         | based lexer/parser setup.
         | 
         | Optimizations have generally broadened in scope; we're now
         | seeing whole-function level of optimization being the default
         | way to look at stuff for analysis, and there's been a variety
         | of techniques introduced to make whole-program optimization
         | (aka LTO) much more tractable.
         | 
         | Another subtle but major shift is that compilers are
         | increasingly reliant on inferred analysis from dumber
         | representations over the original declared intent of the code.
         | For example, SROA in LLVM (which breaks up structs so that
         | individual fields can be independently allocated in registers)
         | relies not on looking at the way the struct is declared but the
         | offsets within the struct that various load and store
         | operations use.
         | 
         | A final major shift is the trend towards support for ancillary
         | tooling in the programming language space, so that the compiler
         | isn't merely a tool that goes from source to object code, but
         | is something that can be actively queried for information about
         | the source code. Things like the language server, or smart
         | formatting, or automatic refactoring tooling.
        
           | elteto wrote:
           | This is a fantastic comment, thanks.
        
         | late_again wrote:
         | You can use the "classic" tool set and parse many programming
         | languages. The real trick in writing compilers is taking
         | advantage of the hardware (disclaimer: I designed and wrote
         | middle-pass portions for IBM 360/370 processors (and clones),
         | supercomputers from Cray and ETA Systems, and Sun 4
         | workstations among others, including some RISC systems that
         | just disappeared around the bursting of the dot-com bubble) I
         | tried and failed to optimize MPP systems from all of the "name"
         | players in the 1990's. It kind-of broke my heart...
         | 
         | That "middle-pass" approach that will let you address many
         | targets is still valid; the trick is finding a sufficiently
         | robust and flexible internal representation at the right level.
         | You also have to be able to out-guess the chip vendors where
         | before you could go to the architect or a complete "System"
         | book and get the real scoop, including things you shouldn't do.
         | Oddly enough, there is simultaneously useful and completely
         | worthless documentation scattered about the internet.
         | 
         | You might want to take a look at Muchnick and Jones'
         | _Program_Flow_Analysis_ (yes, it's from 1981) but chapters 4-6
         | can be applied at code-generation time. How that fits modern
         | Intel processors (for example) is unknown. Idealizing your
         | processor as a RISC-V might be a reasonable way to proceed but
         | in the end, you'll have to translate the code for the target --
         | it will be reasonably straight-forward if you drive it all from
         | tables but it's not trivial.
        
       | badsectoracula wrote:
       | Weird that this is about building a _C_ compiler[0] in _OCaml_. I
       | expected the implementation language to also be C both for
       | consistency but also because i 'm willing to bet that there are
       | more people who can read C than OCaml.
       | 
       | [0] actually from the readme in the github repo[1] it seems to be
       | a C subset, not all of C
       | 
       | [1] https://github.com/nlsandler/nqcc2
        
         | Croftengea wrote:
         | OCaml? Thanks for saving me a click!
        
           | hdbxbxndj wrote:
           | OCaml is one of the most used languages for compiler design
           | 
           | A good engineer should be able to use the right tool for the
           | job
        
             | materielle wrote:
             | For hobbyist compiler implementations, right? Compilers for
             | the most popular languages are either written in C/C++, or
             | self-hosted.
             | 
             | You can write compilers in almost any language. I fail to
             | see how C, C++, or even Java or Python aren't the right
             | tool for the job here. I like pattern matching too, but
             | given that hundreds of successful production compilers have
             | been written without pattern matching, it's surely just a
             | personal preference.
        
               | porcoda wrote:
               | I've worked on multiple compilers in industry that are
               | written in Ocaml. A number of industrial static analyzers
               | are written in Ocaml too (eg, Infer from Facebook/Meta).
               | Yes, LLVM and GCC are the big ones written in the C/C++
               | family but they don't represent everything.
        
               | jsnnsjxj wrote:
               | I mean, we _are_ talking about a book which invites you
               | to build your own toy C compiler ^^
               | 
               | Nevertheless, OCaml is very strong in compiler design.
               | For example Rust and Hack were written in OCaml
               | initially.
               | 
               | Nevertheless you are not wrong that compilers needing the
               | very last bit of performance like the JVM and LLVM tend
               | to be written in C++
               | 
               | But the barrier is quite a lot more tending to high
               | performance/very high performance and not toy/production
               | 
               | Java and Python are suitable for implementing a toy
               | Compiler and the auther invites you to use any language
               | you like. Just the reference implementation is using
               | OCaml
               | 
               | I would however argue that using C++ is quite advanced
               | since it does not have pattern matching and using C is
               | just masochm. You will be fighting against the language
               | to do even trivial things instead of fighting the actual
               | problem at hand
        
         | kragen wrote:
         | ocaml makes writing a compiler enormously more accessible, and
         | learning to read ocaml, while it can be somewhat intimidating
         | at first, is much easier than learning to write a compiler
         | 
         | (imagine a medieval accountant trying to learn to do long
         | division in roman numerals. he'll be much better off learning
         | the western arabic numerals fibonacci is so excited about)
        
           | shortrounddev2 wrote:
           | I really really really want to get more into Ocaml but as far
           | as I know there is no good support for a debugger to use in
           | an IDE like VSCode or even vim. Everyone I talk to says they
           | just do print debugging. It's easier to 'reason about your
           | code' in FP, but I do NOT want to go back to a time where I
           | coded without breakpoints. I use F# because of its first
           | party tooling support
        
             | kragen wrote:
             | it does have a debugger with breakpoints, which even
             | supports time-travel debugging (except on windows
             | obviously), but i've never used it. it even has first-party
             | ide integration:
             | https://ocaml.org/manual/5.2/debugger.html#s:inf-debugger
             | 
             | i use debuggers a lot when i'm programming in assembly, and
             | from time to time when i'm programming in c or c++, but in
             | ocaml i've never needed one. it's not that i've never had
             | bugs in ocaml, but they tend to be of a different flavor, a
             | flavor for which breakpoints are of little value
             | 
             | it sounds like your f# experience is different; what kinds
             | of bugs have you recently found the debugger valuable for
             | in f#?
             | 
             | debug logging is usually a superior option to breakpoint
             | debugging for problems to which both are applicable,
             | because debug logging shows you the whole history of your
             | program's execution rather than just a single point in
             | time. breakpoint debugging requires a lot of manual labor,
             | painstakingly operating the machine to navigate the
             | execution to the state that has the problem. it's like
             | grinding in a video game. i'd rather program the computer
             | to do that labor for me, at which point i no longer need
             | the debugger
             | 
             | (except in c and assembly)
        
               | shortrounddev2 wrote:
               | > it does have a debugger with breakpoints, which even
               | supports time-travel debugging (except on windows
               | obviously), but i've never used it. it even has first-
               | party ide integration:
               | https://ocaml.org/manual/5.2/debugger.html#s:inf-debugger
               | 
               | 1. I am developing on windows so that's an issue for me
               | and
               | 
               | 2. I don't use emacs, I use VScode and I've not been able
               | to get the experimental debugger working for the VScode
               | plugin.
               | 
               | It's not that the debugger is purely for fixing bugs; I
               | use it as an active part of development. I want to be
               | able to freeze the program and inspect values as they are
               | running. I may know what the types or state of the
               | program are just by viewing the code, but I want to be
               | able to inspect the actual data for myself as the program
               | is running.
               | 
               | > debug logging shows you the whole history of your
               | program's execution rather than just a single point in
               | time
               | 
               | breakpoints also provide stack traces, so they provide a
               | kind of history as well. I'd rather inspect and interact
               | with a program than dig through thousands of lines of
               | logs
               | 
               | > breakpoint debugging requires a lot of manual labor,
               | painstakingly operating the machine to navigate the
               | execution to the state that has the problem
               | 
               | I see things the opposite as you: print debugging is
               | tedious and requires restarting the program from the
               | beginning whenever you make a change to the source,
               | unless you are constructing your program piecemeal with
               | the repl, which I consider to be extremely tedious as
               | well. To me, a debugger with breakpoints is a far more
               | efficient way to code than print debugging.
               | 
               | I think there is a cultural difference in software
               | engineering between people who use debuggers and people
               | who don't. John Carmack once pointed out that people who
               | come from the game dev and Windows/PC world use debuggers
               | while people from the linux and web dev world tend not
               | to. It seems to be a matter of preference/taste, and I
               | think FP programmers seem to have a distaste for
               | debuggers and graphical debugging/development
               | environments
        
               | kragen wrote:
               | > _John Carmack once pointed out that people who come
               | from the game dev and Windows /PC world use debuggers
               | while people from the linux and web dev world tend not
               | to. It seems to be a matter of preference/taste, and I
               | think FP programmers seem to have a distaste for
               | debuggers and graphical debugging/development
               | environments_
               | 
               | i think that's true! but i don't think it's purely a
               | matter of preference; it's also a matter of what the
               | ecosystems support well and what you're trying to
               | achieve. lisp is a huge exception to the rule about fp
               | programmers; lisp systems, even including emacs, have
               | extremely strong debugging support. but non-lisp fp
               | languages tend to heavily emphasize static typing,
               | thinking things through ahead of time, and correctness by
               | construction, which reduce the need for debugging. but
               | those are more valuable for writing compilers than for
               | writing games or uis in general, where it's hard to
               | define what counts as 'correct' ahead of time but easy to
               | recognize it when you test it interactively
               | 
               | webdev of course has the problem that you can't stop your
               | http response handler while the browser is waiting for a
               | response. and, often, it's sadly not very concerned with
               | ux. and it's often concerned with operating systems in a
               | way where you have to debug problems _after_ they occur,
               | in part because of the scale of the problems
               | 
               | automated testing is another ecosystem thing that reduces
               | the need for debuggers; to a significant extent it
               | competes with static typing
               | 
               | one of the things i really appreciate about godot is
               | being able to continuously adjust parameters and observe
               | variables in my games as they're running. godot is
               | motherfucking awesome, man. i definitely don't have a
               | distaste for graphical debugging and development
               | environments!
               | 
               | > _breakpoints also provide stack traces, so they provide
               | a kind of history as well. I 'd rather inspect and
               | interact with a program than dig through thousands of
               | lines of logs (...) print debugging is tedious and
               | requires restarting the program from the beginning
               | whenever you make a change to the source_
               | 
               | oh, see, i have programs like grep and emacs that dig
               | through thousands of lines of logs for me. often when i'm
               | debugging from logs i don't run the program at all; i
               | just look at the logs and the source code. sometimes the
               | program is running someplace i _can 't_ interact with it
               | --memorably, on some occasions, on a satellite out of
               | range of the groundstation. and usually exceptions on
               | python or java give me a pretty decent stack trace,
               | though other log messages unfortunately don't
               | 
               | there's another ecosystem support issue here--although
               | i've sometimes developed on systems that supported fix-
               | and-continue (cmucl, gforth, squeak, basic-80, godot),
               | python and gcc support it very poorly or not at all. so
               | for me i have to restart the program from the beginning
               | whenever i make a change to the source in any case,
               | whether i'm doing printf debugging or not. godot, again,
               | is a very nice exception to this rule, and incidentally
               | lets me add print debugging to the game while it's
               | running
               | 
               | one of the great things about a breakpoint debugger, from
               | my point of view, is that it makes it possible to add
               | logging after the fact to a running program without
               | editing the source or restarting it
               | 
               | i really appreciate you sharing your experience!
        
         | userbinator wrote:
         | The bonus of writing a C compiler in C is that you get to being
         | able to experiment with self-compilation.
        
         | anta40 wrote:
         | Written in OCaml? Ahh interesting. Suddenly I remember similar
         | book:
         | 
         | Modern Compiler Implementation in ML:
         | https://www.cs.princeton.edu/~appel/modern/ml/
         | 
         | As an undergrad student, I think the C version is kinda easier
         | to understand, though.
        
         | norir wrote:
         | There are many disadvantages to writing a compiler in c (and I
         | have done it). For me, the biggest is simply that it is very
         | verbose for the type of things that you do commonly in a
         | compiler. I have written a self-hosting compiler that
         | transpiles to c and the generated code is about 5x as long as
         | the original code. I could go through and clean it up and
         | probably get it down to 2-3x, but there are certain concepts
         | that cannot easily be expressed compactly in c (except possibly
         | with preprocessor abuse that I don't have patience for). A
         | simple example is that if you want to represent a union type
         | you have to manually define 2 structs and an enum. Matching
         | with switch is also verbose as you have to both match and
         | manually assign the value in the match case. Again, you can use
         | macros to do this though at that point you arguably aren't even
         | using c but rather a hybrid language. A language like ocaml,
         | makes it much more straightforward to define and match on
         | unions as well as gives you other nice high level things like
         | gc so you can focus on the compiler itself and not low level
         | details.
        
       | sergius wrote:
       | How does it compare with N.Wirth's?
       | 
       | https://onlinebooks.library.upenn.edu/webbin/book/lookupid?k...
        
         | hdbxbxndj wrote:
         | The book is a very hands on tutorial whereas Wirths is basic
         | literature for the general case.
         | 
         | While they teach similar content, they have a different
         | approach.
         | 
         | There are literally thousands of compiler design books out
         | there, I don't really see anything particularly comparable
         | between this book and Wirth's
        
         | anta40 wrote:
         | Similar to studying OS concepts using Silberschatz' Operating
         | System Concept and Tanenbaum's Operating Systems Design and
         | Implementation. The former only explains the theoritical ideas,
         | while the latter is the documentation of an implementation.
        
         | cxr wrote:
         | Wirth's book does not implement a "real" programming language.
         | Whatever your thoughts on Oberon and Pascal-like SHOUTCASE
         | languages, it's largely irrelevant. Oberon is arguably a "real"
         | language (and operating system), but Wirth's book does not
         | cover the implementation of Oberon. It covers the
         | implementation of Oberon0, an inarguably toy subset of Oberon.
         | (Actually, "subset" is not even correct.) The example code has
         | also diverged from the book, with Wirth abandoning the strategy
         | described in the book for avoiding redundant initialization of
         | the module static base, among other things.
         | 
         | Aside from that, I encourage everyone who cites Compiler
         | Construction to actually work through the first 10% of the book
         | and then count the number of errata.
        
       | alok-g wrote:
       | I would love to see a book that talks about going all the way to
       | generate machine code, i.e., not stopping at generation of
       | assembly.
       | 
       | Alternatively, I would like to learn about not just how to make a
       | compiler, but also simultaneously a debugger, hot-reloading, etc.
        
         | hdbxbxndj wrote:
         | Writing an simple assembler is trivial. Even macro assemblers
         | are very easy.
         | 
         | However, it's also boring.
         | 
         | Nevertheless the contents of the book cover all the techniques
         | required to write an assembler, if you'd really like to
        
           | alok-g wrote:
           | I understand that assembly file can be parsed in the same
           | way. However, I want to learn about the machine instructions
           | to the level of bits, and likewise the layouts of binary
           | files. Unless I am able to go all the way to machine code
           | loaded in memory, I would not know where in memory to add a
           | breakpoint instruction when a developer wants the same on a
           | line of code.
           | 
           | If there is some library that can help create machine code
           | from assembly instructions on a line by line basis (at least
           | as opposed to invoking a separate program that generates the
           | entire binary collectively from the assembly code), that
           | could also work.
           | 
           | In my case, I already know enough of the lexer, parser, etc.,
           | parts. What's missing is going all the way to making a
           | debugger, profiler, etc.
        
             | fuhsnn wrote:
             | >If there is some library that can help create machine code
             | from assembly instructions on a line by line basis
             | 
             | That's what JIT libraries do, for example asmjit: https://g
             | ithub.com/asmjit/asmjit/blob/master/test/asmjit_tes...
        
               | alok-g wrote:
               | Looks cool! Thanks.
        
               | dymk wrote:
               | asmjit is great; I used it for building a primitive (but
               | quite fast) query engine for in-memory graphs. It made it
               | simple to go from query AST to native instructions, most
               | of which was "call this other compiled function".
        
             | rramadass wrote:
             | Checkout _GNU Binutils_ and their usage of _libbfd_ and
             | _libopcodes_.
        
             | jsnnsjxj wrote:
             | Building a debugger and profiler is quite an advanced task
             | compared to building an assembler though ^^
             | 
             | Also much of that work is heavily dependent on the used
             | operating system.
             | 
             | Nevertheless, I'm wishing you all the best on your journey!
        
         | synack wrote:
         | The debugger book is coming soon.
         | https://nostarch.com/building-a-debugger
        
           | alok-g wrote:
           | Awesome! Thanks.
        
       | signaru wrote:
       | Have read the first few chapters and it expects that you either
       | read the accompanying source code or implement your own and pass
       | the tests. The pseudo code presented in the book often look like
       | function calls with the inner details not there in the book.
       | Furthermore, as already pointed out in another comment, the
       | available implementation is in OCaml, which is probably not
       | something many C programmers have experience with.
       | 
       | Nevertheless, I think I'm learning more from this book than most
       | other books I've tried before that are far more theoretical or
       | abstract. I'm still eager to reach the chapter on implementing C
       | types. I think it's a good book, but it requires more effort than
       | something like Crafting Interpreters or Writing a
       | Compiler/Interpreter in Go, while also covering topics not in
       | those books.
        
         | wrycoder wrote:
         | Nand2Tetris is also like that - they provide an outline and
         | tests, but you have to do the work. And, having the
         | implementation language be different from the target language
         | reduces confusion.
         | 
         | Plus, you get to become proficient in OCaml, which is a pretty
         | good language.
        
           | kragen wrote:
           | that's a good point--it was pretty confusing when i wrote ur-
           | scheme in scheme, or stoneknifeforth in stoneknifeforth,
           | because i kept getting confused about which level of the
           | language i was changing things in
        
         | myko wrote:
         | I thought this book looked neat but closed the tab before
         | reading the comments here, and after this one decided to go
         | ahead and buy it. Sounds really fun!
        
       | synack wrote:
       | I've been working through this book implementing the compiler in
       | Ada. So far, I'm really enjoying it. The book doesn't make too
       | many assumptions about implementation details, leaving you free
       | to experiment and fill in the blanks yourself.
       | 
       | It feels like a more advanced version of Crafting Interpreters.
       | 
       | I haven't looked at the OCaml implementation at all. The text and
       | unit tests are all you need.
       | 
       | Discussion on the Ada Forum: https://forum.ada-lang.io/t/writing-
       | a-c-compiler/1024
        
       | fuhsnn wrote:
       | chibicc[0] complement this book nicely, in addition to a basic
       | compiler, it guides you through writing the preprocessor and
       | driver, which, although not addressed much in literature, are the
       | missing link between the compiler built from the book and real C
       | projects.
       | 
       | [0] https://github.com/rui314/chibicc
        
       | sunday_serif wrote:
       | I'm working through this book now and really enjoying it!
       | 
       | Each chapter of the book includes a test suite to run against the
       | code you've written.
       | 
       | In some ways, the tests in this book feel very similar to the
       | labs in the book Computer Systems: A programmers perspective --
       | which is high praise!
        
       | sylware wrote:
       | I wonder why there is not the same book for c++... mmmmh... I
       | really wonder... (irony).
        
         | sylware wrote:
         | It is because c++ has an absurdely and grotesquely massive and
         | complex syntax (like rust...).
        
           | stevefolta wrote:
           | Yeah, Rust is the language for people who think C++ is not
           | complex (or hostile) _enough_.
        
       | the_panopticon wrote:
       | In Ocaml, interesting. I was similarly surprised when I learned
       | that the firs Rust compiler was written in Ocaml, too
       | https://users.rust-lang.org/t/understanding-how-the-rust-com...
        
         | mananaysiempre wrote:
         | Tree processing is best done in a language with decent
         | algebraic datatypes and pattern matching. I would've preferred
         | Standard ML, but, well, pot-ay-to, pot-ah-to. Haskell is
         | another choice but the techniques you need to use there (while
         | undeniably gaining you some possibilities) don't really
         | generalize to other languages, so you're now writing a book
         | about compiler construction in Haskell rather than just
         | compiler construction. Ditto for Rust. Kotlin has deliberately
         | anemic pattern matching. C# or F# leave you depending on
         | Microsoft's benevolence (sic). Metalua and Sweet.js both have
         | decent ADT support but both are pretty much dead. Racket
         | exists, I guess, and there are some pattern-matching libraries
         | for normal Scheme as well, but the charisma malus of the
         | parenthesis is real even if I don't understand what causes it.
         | 
         | So OCaml was probably the most mainstream choice among the
         | languages with appropriate tools, as funny as that sounds. And
         | honestly, once you get over the syntax, it doesn't actually
         | have anything outrageous.
        
         | bunderbunder wrote:
         | ML (short for "meta-language") was originally designed for use
         | in programming language research, and really shines for that
         | purpose. And OCaml is probably the most pragmatic dialect for
         | the purpose.
         | 
         | SML is _very_ dated and the standard library and ecosystem lack
         | many things that are considered table stakes for a viable
         | programming language nowadays. And F# and Scala are fine as
         | enterprise languages, but being tied to .NET and Java
         | respectively makes them less desirable for implementing a
         | language that won 't itself be coupled to one of those
         | runtimes.
        
       | viraj_shah wrote:
       | Dropping this one here! (no affiliation)
       | 
       | https://www.linuxfromscratch.org/
       | 
       | "Linux From Scratch (LFS) is a project that provides you with
       | step-by-step instructions for building your own custom Linux
       | system, entirely from source code."
        
         | pull_my_finger wrote:
         | Why though? It doesn't seem to be related at all to the OP
         | other than both are tutorial books?
        
         | jsnnsjxj wrote:
         | This has nothing to do with the post?
        
       | sim7c00 wrote:
       | cool, remember some tutorials online i think from the same author
       | (not 100% sure) doing stuff around c compilation in python. shame
       | its not in a language i want to learn. the other book on
       | compilers i got is almost to heavy to lift! :D
        
       | jerjerjer wrote:
       | I uh misread the title and thought someone built a C compiler in
       | Scratch.
       | 
       | On topic, though: wouldn't a simpler language (maybe even a
       | pseudo language) be a better target for a first learning
       | compiler. I understand they don't build a full C compiler, but
       | still. It looks to me like there's a lot of complexity add from
       | choosing such a lofty target.
        
         | tuveson wrote:
         | What do you think would make a better target? C maps pretty
         | closely to assembly, so it seems like it would be the simplest.
         | Maybe Pascal or BASIC, but most people these days don't have
         | experience with Pascal, and BASIC would probably be too simple
         | for a full-length book.
         | 
         | For writing an interpreter or transpiler, there are probably
         | better options, but for a true compiler I can't think of a
         | better choice than C (or at least a subset of C).
        
       | francogt wrote:
       | I see many comments saying that the book implements the C
       | compiler in ocaml. In the introduction the author states that the
       | book actually uses pseudo code so you are actually free to
       | implement it in any language. The only recommendation is that you
       | use a language with pattern matching because the pseudo code
       | makes heavy use of it. The reference implementation is in ocaml.
        
         | ashconnor wrote:
         | Useful list considering that feature:
         | https://en.wikipedia.org/wiki/Category:Pattern_matching_prog...
        
         | markus_zhang wrote:
         | Thanks, can you please lemme know which part uses pattern
         | matching? I'd assume mostly in the lexer, but the parser should
         | just be something that consume the tokens and spit out AST.
         | Unless of course it combines the two.
        
         | CrimsonCape wrote:
         | Question for HN, pattern matching is defined as "runtime
         | type/value checking", is that correct?
         | 
         | Is duck typing the pseudo-unsafe alternative? (Not unsafe as in
         | accessing unsafe memory, but as in throwing exceptions if the
         | duck-typed function doesn't exist on the current type)
         | 
         | Can C handle both?
         | 
         | Coming from a static type system like rust and c#, i'm doing
         | alot of "if this is a duck, duck.quack()" and i'm looking for
         | faster alternatives and less verbosity if possible
        
           | trealira wrote:
           | One thing is that pattern matching can make writing tree
           | manipulation code succinct and easier to read. For example,
           | take this article[0] that describes the difference list
           | algorithm. Basically, it's kind of like a rope, but for
           | lists. It's a tree with lists at the leaves, and when you
           | want to convert it into a list, you rewrite the tree to be
           | right-leaning, and then concatenate all the lists at once.
           | This turns repeated concatenation at the end of lists from
           | taking quadratic time into one that takes linear time (strcpy
           | can be an example of this in C [1]). The code can be written
           | like this:                 data Tree a = Leaf a | Branch
           | (Tree a) (Tree a)            fromList :: [a] -> Tree [a]
           | fromList = Leaf            toList :: Tree [a] -> [a]
           | toList (Leaf x) = x       toList (Branch (Leaf x) r) = x ++
           | toList r       toList (Branch (Branch l1 l2) r)
           | = toList (Branch l1 (Branch l2 r))            append :: Tree
           | [a] -> Tree [a] -> Tree [a]       append = Branch
           | 
           | In a language that doesn't have tree pattern matching, the
           | code wouldn't be this short and easy to understand, and I
           | don't think it could be replicated just by having duck
           | typing. Rust has pattern matching, but because it's primarily
           | focused on lower-level concerns like pointers and memory
           | ownership, pattern matching isn't this nice, because you have
           | to pattern match on the pointer first.
           | 
           | Since a compiler is all about tree manipulation, support for
           | tree pattern matching should be a boon.
           | 
           | [0]: http://h2.jaguarpaw.co.uk/posts/demystifying-dlist/
           | 
           | [1]: https://en.wikipedia.org/wiki/Joel_Spolsky#Schlemiel_the
           | _Pai...
        
       | WalterBright wrote:
       | I learned how to write a compiler by studying BYTE magazine in
       | the 70's which published the source to a complete Pascal compiler
       | as an article!
       | 
       | https://archive.org/details/byte-magazine-1978-09 (part 1)
       | 
       | All 3 parts of Tiny Pascal:
       | 
       | https://albillo.hpcalc.org/publications/Easter%20Egg%20-%20T...
        
       ___________________________________________________________________
       (page generated 2024-08-15 23:00 UTC)