[HN Gopher] Writing a C Compiler: Build a Real Programming Langu...
___________________________________________________________________
Writing a C Compiler: Build a Real Programming Language from
Scratch
Author : shoggouth
Score : 219 points
Date : 2024-08-12 18:26 UTC (3 days ago)
(HTM) web link (nostarch.com)
(TXT) w3m dump (nostarch.com)
| shoggouth wrote:
| It also will be available via Amazon after August 20, 2024.
|
| https://www.amazon.com/Writing-Compiler-Programming-Language...
| quibono wrote:
| I swear I've seen this cover before... is this a new release or
| an updated edition of an older book?
| jdnendm wrote:
| Book is not yet published but in early access since a couple of
| years
|
| Was featured here a couple of times.
|
| Unfortunately the timing of the release is quite unfortunate
| with regards to the summer holidays. Will take a look at it
| next year
| sgbeal wrote:
| > Book is not yet published but in early access since a
| couple of years
|
| According to the top post's link, it was released in July
| 2024.
| hdbxbxndj wrote:
| I seem to be mistaken. The 20th of August as listed by
| Amazon must be the European release date then
| byteplane wrote:
| It's actually out now, I have a copy! Ordered directly fro No
| Starch Press.
| halfcat wrote:
| _"Automate the Boring Stuff with Python"_ has a similar cover,
| by the same publisher.
| orktes wrote:
| Many compiler related books take inspiration from the "Dragon
| book" (Compilers: Principles, Techniques and Tools). So with
| likely lots of books with similar looking covers.
| hdbxbxndj wrote:
| The cover looks nothing like the dragon book however?
| orktes wrote:
| Point was more that there are bunch of compiler related
| books featuring a dragon (and knight/other character). The
| original also has a different looking dragon & knight
| themed cover for every edition.
|
| Like if I'd see the book on a shelf I would instantly guess
| it's related to compilers. And I bet that's completely
| intentional homage to the original.
| Almondsetat wrote:
| I believe the author first started by making blog posts and
| then interrupted them to simply make a book about it
| thejteam wrote:
| There was a HN article about the same book about a month ago:
|
| https://news.ycombinator.com/item?id=40940799
|
| So maybe you saw it then.
| Coolbeanstoo wrote:
| This looks cool, been interested in learning more about compilers
| since I did the basics in college. Lots of things seem to focus
| on making interpreters and never make it to the code generation
| part so its nice to see that this features information about
| that.
| spinningslate wrote:
| With no disrespect to the book that's the subject of this
| thread as I haven't read it, but Bob Nystrom's Crafting
| Interpreter [0] is a fantastic book. It covers all phases in
| compilation, including both an interpreter and a VM.
|
| It's been covered on several threads here over the years [1].
|
| [0]: https://craftinginterpreters.com/ [1]:
| https://hn.algolia.com/?q=crafting+interpreters
| jcpst wrote:
| I remember seeing this a while back. That typesetting is
| beautiful. Thank you for bringing it up here, I might have to
| pick that one up.
|
| I've been bored with building line-of-business applications,
| despite designing for complex requirements in high-volume
| distributed systems.
|
| In fact I took a break from CS learning entirely 9 months
| ago. Even reading HN. I've been studying electronics and
| analog signal processing instead.
|
| But now that I've built about 50 guitar pedals of increasing
| complexity, I feel ready to switch back to CS studies again.
| i_don_t_know wrote:
| Somewhat unrelated: Is there a book that walks you through
| building a database system from storage to queries, optimizer,
| execution, indexing, transactions, etc?
| gtirloni wrote:
| Related: https://news.ycombinator.com/item?id=27731896
| rramadass wrote:
| In the early 90's Al Stevens wrote 2 books _C Database
| Development_ and _C++ Database Development_ with source code
| which might be a good starting point.
| myth_drannon wrote:
| Interesting suggestion! here is the book on archive.org:
| https://archive.org/details/cdatabasedevelop00stev/mode/2up
| kragen wrote:
| _transaction processing_ by gray (rip) and reuter was pretty
| close back in the 90s. i don 't think it covered query
| optimization because it's really about tp monitors rather than
| databases, but, perhaps surprisingly, it does cover the other
| topics you're asking about
| rednab wrote:
| Database Design and Implementation, ISBN 3030338355 1). Java
| source code for the SimpleDB system from the book available
| from the author's website 2).
|
| 1) https://www.amazon.com/dp/3030338355/
|
| 2) http://www.cs.bc.edu/~sciore/simpledb/
| hasbot wrote:
| So what's different about writing a compiler in 2024 than say 10,
| 20, or 30 years ago? When I started writing compilers in the 80's
| and 90's lex/flex and yacc/bison were popular. ANTLR came out but
| I never had a chance to use it. Everything after lexing and
| parsing was always hand rolled.
| mjburgess wrote:
| Parser-generators were always academic projects that had little
| relevance to making real-world programming languages -- where
| parsing is very easy to write, and necessarily benefits from
| doing it (ie., you can get better error handling/etc.).
|
| Today most languages are front-ends for LLVM IR, but LLVM is
| very slow and takes a long time to optimize. Many new languages
| target x86/arm directly with their own weakly optimized
| backends, and output an LLVM IR for "release builds".
| sestep wrote:
| Could you give some specific examples of those new languages
| with their own backends for faster builds?
| mjburgess wrote:
| I was immediately thinking of Jai, Zig, .. but I've seen a
| few
| stefanos82 wrote:
| Correct me if I'm wrong, but I think both Zig and Jai use
| LLVM as their default backend...at least, that's what I
| have seen via live streaming for Jai, and from Zig's
| repo.
| vulcan01 wrote:
| Zig is moving away from LLVM
| (https://github.com/ziglang/zig/issues/16270) and Rust
| has added Cranelift as a debug backend
| (https://lwn.net/Articles/964735/).
|
| Not sure about Jai.
| stefanos82 wrote:
| I don't think it's moving away, because if you have read
| the entire thread, the gaming community reacts negatively
| to say the least and they assure everyone that LLVM does
| not going anywhere any time soon.
| thechao wrote:
| When your computer was anemic, and could barely do the tasks
| required for it, eking out a few percent -- or a 2x! -- from
| an optimizer was important.
|
| Now-a-days, the difference between "big compiler optimized"
| and "little compiler not optimized" can be quite dramatic;
| but, is probably no more than 4x -- certainly within range of
| the distinction between "systems programming language" and
| "high tuned JITted scripting language". I think _most_ people
| are perfectly fine with the performance of highly-tuned
| scripting languages. The result is that all of the overhead
| of "big compiler" is just ... immaterial; overhead. This is
| especially true for the case of extremely well-tuned code,
| where the algorithm and -- last resort -- assembly, will
| easily beat out the best optimizer by at least an order-of-
| magnitude, or more.
| auselen wrote:
| Wouldn't just a simple case of bad code generation render
| little compiler into a toy one?
|
| Repeating others, today's compilers are really just
| "optimizing compilers", there is no room for toying in
| production environments.
| fuhsnn wrote:
| >just a simple case of bad code generation render little
| compiler into a toy one
|
| If you find some time to go through gcc bugzilla you'll
| find shockingly simple snippets of code that miscompiled
| (often by optimization passes), with fixes never
| backported to older versions that production environments
| like RHEL are using.
| auselen wrote:
| Ok, but that's sw engineering issue.
|
| I still insist that a production grade compiler can't
| leave performance on table. Which is where the current
| battlefield is.
| jsnnsjxj wrote:
| I do not agree in the general case. There are very useful
| DSL compilers which do not consider performance at all,
| but just compile to a target which does the optimization
| for them (JVM, LLVM IR or even just C)
| auselen wrote:
| Yes but those are called transpilers, right?
| bigstrat2003 wrote:
| I think a production grade compiler not only can, but
| must, leave performance on the table when the cost is
| correctness (unless the performance gain is incredibly
| high and the correctness loss is minimal). Correctness is
| not _all_ important, but it is the most important thing.
| Unfortunately, compiler writers do not agree and they do
| silly things like "let's assume UB cannot ever happen
| and optimize based on that".
| auselen wrote:
| Correctness is already a must, how did you arrive to
| this?
| kragen wrote:
| if only the gcc and llvm maintainers, and c standard
| authors, agreed with you
| auselen wrote:
| I mean these compilers build all these SW stacks, even
| this very browser I'm using, where are these correctness
| issues you are talking about?
| kragen wrote:
| there are new items in this category every day, but
| https://blog.cr.yp.to/20240803-clang.html is noteworthy
| auselen wrote:
| Eh https://news.ycombinator.com/item?id=41146860
| auselen wrote:
| I realized with all the rhel systems I'm using, we are
| never using default toolchains on them. Just use those
| old systems to run stuff, even newer toolchains.
| kragen wrote:
| if you aren't running on the gpu you're leaving 80+% of
| your computer's performance on the table. no optimizing
| compiler is going to make your legacy c or lisp or rust
| code run efficiently on the gpu, or even in most cases on
| a multicore cpu. nor, as thechao points out, can it
| compete with assembly-language programmers for simd
| vectorization on the cpu
|
| in summary, optimizing compilers for c or pascal or zig
| or rust or whatever can only be used for code where
| considerations like compatibility, ease of programming,
| security, and predictability are more important than
| performance
|
| probably the vast majority of production code is already
| in python and javascript, which don't even have
| reasonable compilers at all
| auselen wrote:
| How come you made this about computer's performance :)
| kragen wrote:
| because computer performance is the topic of
| https://news.ycombinator.com/item?id=41255695,
| https://news.ycombinator.com/item?id=41255365, and
| https://news.ycombinator.com/item?id=41255195, the three
| levels of parent comments in the thread, one of which was
| by you
| auselen wrote:
| No.
|
| Main post is about a C compiler, post I responded is
| saying
|
| > When your computer was anemic, and could barely do the
| tasks required for it
|
| which I could only interpret as a machine that can run a
| single process at a time, so not really about gpus or
| what.
| Croftengea wrote:
| > Parser-generators were always academic projects
|
| Were they? GCC abandoned bison in favour of their own parser
| relatively recently.
| userbinator wrote:
| "relatively recently" as in around 20 years ago, in GCC 3.x
| according to sources I found.
| kragen wrote:
| gcc was first released in 01987, but it didn't replace
| its bison parser for c until gcc 4.1
| https://gcc.gnu.org/gcc-4.1/changes.html which was in
| 02006 https://gcc.gnu.org/releases.html, only 18 years
| ago, and 19 years after it was released. joseph myers
| first proposed doing this in 02004
| https://gcc.gnu.org/legacy-ml/gcc-
| patches/2004-10/msg01969.h...
|
| so gcc has literally been using a parser-generator-
| generated parser for c for more than half its existence,
| at which point it had already become the most popular c
| compiler across the unix world and basically the only one
| used for linux, which had already mostly annihilated the
| proprietary unix market. it was also imposingly dominant
| in the embedded space
|
| and i think that kind of development path is fairly
| typical; getting a parser up and running is easier with a
| parser generator, but it can be tricky to get it to do
| strange things when that's what you want (especially with
| lalr, less so with peg)
| kragen wrote:
| yacc/bison was used for lex, bc, pcc, gcc, original awk, the
| bsd pascal compiler, eqn, m4 (!), and many other languages.
| it's still used for pcc, oawk, mawk, pari/gp, and units.
| that's just what i have sitting around in my downloads
| directory
|
| and, while we're talking about ocaml, ocaml does use ocamllex
| and ocamlyacc for its own parser
|
| so, while you can certainly do without parser generators,
| they have very commonly been used for making real-world
| programming languages. almost every programming language
| anyone here has ever heard of was first implemented with a
| parser generator. the main exceptions are probably fortran,
| cobol, algol, lisps, c, and pascal
| mjburgess wrote:
| I guess I wasn't very clear. I didnt mean to say, as a
| historical matter, were irrelevant.
|
| I meant to say the idea of a parser generator is a solution
| to a problem that that real world langs don't really have.
| When writing a programming language, your issue isnt how
| much time the parser is going to take to write, or how
| complex it's going to be. The parser is a relatively
| trivial part of the problem.
|
| Due to language designers often being taught to develop
| langauges in this fashion, many have relied on these tools.
| But the modern view of compliers as "programming langauge
| UIs" and the focus on DX, i'd argue its actively
| pathological to use a parser generator.
|
| Much academic work has, til recently, focused on these
| areas -- whereas today, the bulk of the difficulty is in
| understanding SSA/LLVM/ARM/Basic Optimizatiosn/etc. details
| which are "boring, circumstantial" etc. and not really
| research projects. I was just pointing this out since a lot
| of people, myself included, go down the EBNF parser-
| generator rabbit hole and think inventing a langauge is
| some formal exercise -- when the reality is the opposite:
| it's standard programming-engineering work.
| kragen wrote:
| oh, i agree that parsing is not the hard part of writing
| a compiler, and that compilers classes overemphasize it
|
| but no language starts out as a 'real world lang'; every
| language is initially a toy language, and only becomes a
| 'real world lang' in the unlikely case that it turns out
| to be useful. and parser generators are _very_ useful for
| creating toy languages. that 's why virtually every real
| world lang you've ever used was implemented first using a
| parser generator, even if the parser you're using for it
| now is handwritten
|
| having a formally defined grammar is also very helpful
| for dx features like syntax highlighting and automated
| refactoring
| layer8 wrote:
| The one thing parser generators do is that they ensure
| that the language you implement actually matches the
| grammar you wrote. That's still an important assurance to
| have.
| kragen wrote:
| they also tell you where the grammar is ambiguous, which
| can be useful
| mjburgess wrote:
| I've created many programming languages. All the ones I
| finished and were useful did not have grammars that "I
| wrote".
| kragen wrote:
| do you mean https://github.com/mjburgess/Lyssa and
| https://github.com/mjburgess/Quazar? it's true that i
| can't find a grammar in either of them (https://github.co
| m/mjburgess/Lyssa/blob/master/src/impl.py#L... seems more
| forthy than anything else) but (while they are very much
| the sort of things that i like, thank you for sharing)
| they also seem somewhat less like 'real-world programming
| languages' than things like awk, ocaml, or our other
| example in this thread, gcc c and objective-c until gcc
| replaced the bison parser with a handwritten one in
| 02006. that last compiler was the compiler nextstep was
| built on, which got steve jobs back in control of apple,
| to replace macos with nextstep. seems pretty real-world
| to me
|
| maybe you're talking about stuff you haven't released?
| layer8 wrote:
| Are you saying your programming languages don't have a
| defined grammar?
| kragen wrote:
| i was puzzled about that too and got even more puzzled
| when i looked at the languages i could find that he's
| implemented
| pklausler wrote:
| Fortran compilers did go through a phase when table-driven
| parsers were used, but it had the disadvantage of needing
| complicated lexers and statement classifiers that rely on
| semantic information. Fortran's a hard language to parse,
| given its lack of reserved words, optional spaces, and many
| ambiguities.
|
| The f18 compiler's parser uses parser combinations to
| construct a backtracking recursive descent parser that
| builds a parse tree for the whole source file before doing
| any semantic analysis. This approach allows good error
| recovery without having to revert any updates to the symbol
| table.
| kragen wrote:
| that's an interesting approach! though probably not
| applicable to c and c++
|
| i assume that by 'parser combinations' you mean parser
| combinators
|
| what i meant about fortran is that the _first_ fortran
| compiler didn 't use a parser generator
| pfdietz wrote:
| Compiling today might be done at run time, exploiting dynamic
| information. The line between a compiler and an interpreter is
| blurred.
| shortrounddev2 wrote:
| ANTLR generate a visitor system from a BNF grammar that you
| need to implement the logic of in your chosen language, which I
| believe is similar to C++'s std::visit. lex and yacc generate
| state machines using BNF grammar and you implement the logic in
| the tools themselves
| trealira wrote:
| The book doesn't have 2024 in the title. I suspect they put it
| there because last time a post about this book was made, I
| noted that it was from 2022, not realizing that the book has
| now been released in 2024.
|
| https://news.ycombinator.com/item?id=40940799
|
| > So what's different about writing a compiler in 2024 than say
| 10, 20, or 30 years ago?
|
| As far as I can tell, the main difference is that static single
| assignment (SSA) as an intermediate form was not the norm 30
| years ago, but it is nowadays. Also, in newer books, it's more
| common to go over global register allocation now, whether
| that's graph coloring or linear scan register allocation. If
| you read old compiler books, the main optimizations they talk
| about are use-def chains, moving computations out of loops, and
| using the local and tree-based Sethi-Ullman register allocation
| algorithm.
| jcranmer wrote:
| There are actually quite a few changes!
|
| The most obvious change you'll see is the use of SSA, which has
| become the dominant representation in IR starting 25-30 years
| ago.
|
| There's also been an increase in the importance of compiler
| IRs, and especially the concept of code passing through
| _multiple_ IRs before reaching machine code.
|
| Formal semantics has become more of a thing in the past decade
| or so. It's now routine that even weak memory models have a
| detailed formal model of how they work. In LLVM, it's now a
| requirement that you demonstrate formal correctness of new
| InstCombine transformations (which are essentially peephole
| optimizations).
|
| The use of parser generators has really fallen into disrepute;
| everything has transitioned to handrolled parsers these days.
| Language standards themselves are starting to rely on context-
| sensitive keywords, which are hard to implement in a generator-
| based lexer/parser setup.
|
| Optimizations have generally broadened in scope; we're now
| seeing whole-function level of optimization being the default
| way to look at stuff for analysis, and there's been a variety
| of techniques introduced to make whole-program optimization
| (aka LTO) much more tractable.
|
| Another subtle but major shift is that compilers are
| increasingly reliant on inferred analysis from dumber
| representations over the original declared intent of the code.
| For example, SROA in LLVM (which breaks up structs so that
| individual fields can be independently allocated in registers)
| relies not on looking at the way the struct is declared but the
| offsets within the struct that various load and store
| operations use.
|
| A final major shift is the trend towards support for ancillary
| tooling in the programming language space, so that the compiler
| isn't merely a tool that goes from source to object code, but
| is something that can be actively queried for information about
| the source code. Things like the language server, or smart
| formatting, or automatic refactoring tooling.
| elteto wrote:
| This is a fantastic comment, thanks.
| late_again wrote:
| You can use the "classic" tool set and parse many programming
| languages. The real trick in writing compilers is taking
| advantage of the hardware (disclaimer: I designed and wrote
| middle-pass portions for IBM 360/370 processors (and clones),
| supercomputers from Cray and ETA Systems, and Sun 4
| workstations among others, including some RISC systems that
| just disappeared around the bursting of the dot-com bubble) I
| tried and failed to optimize MPP systems from all of the "name"
| players in the 1990's. It kind-of broke my heart...
|
| That "middle-pass" approach that will let you address many
| targets is still valid; the trick is finding a sufficiently
| robust and flexible internal representation at the right level.
| You also have to be able to out-guess the chip vendors where
| before you could go to the architect or a complete "System"
| book and get the real scoop, including things you shouldn't do.
| Oddly enough, there is simultaneously useful and completely
| worthless documentation scattered about the internet.
|
| You might want to take a look at Muchnick and Jones'
| _Program_Flow_Analysis_ (yes, it's from 1981) but chapters 4-6
| can be applied at code-generation time. How that fits modern
| Intel processors (for example) is unknown. Idealizing your
| processor as a RISC-V might be a reasonable way to proceed but
| in the end, you'll have to translate the code for the target --
| it will be reasonably straight-forward if you drive it all from
| tables but it's not trivial.
| badsectoracula wrote:
| Weird that this is about building a _C_ compiler[0] in _OCaml_. I
| expected the implementation language to also be C both for
| consistency but also because i 'm willing to bet that there are
| more people who can read C than OCaml.
|
| [0] actually from the readme in the github repo[1] it seems to be
| a C subset, not all of C
|
| [1] https://github.com/nlsandler/nqcc2
| Croftengea wrote:
| OCaml? Thanks for saving me a click!
| hdbxbxndj wrote:
| OCaml is one of the most used languages for compiler design
|
| A good engineer should be able to use the right tool for the
| job
| materielle wrote:
| For hobbyist compiler implementations, right? Compilers for
| the most popular languages are either written in C/C++, or
| self-hosted.
|
| You can write compilers in almost any language. I fail to
| see how C, C++, or even Java or Python aren't the right
| tool for the job here. I like pattern matching too, but
| given that hundreds of successful production compilers have
| been written without pattern matching, it's surely just a
| personal preference.
| porcoda wrote:
| I've worked on multiple compilers in industry that are
| written in Ocaml. A number of industrial static analyzers
| are written in Ocaml too (eg, Infer from Facebook/Meta).
| Yes, LLVM and GCC are the big ones written in the C/C++
| family but they don't represent everything.
| jsnnsjxj wrote:
| I mean, we _are_ talking about a book which invites you
| to build your own toy C compiler ^^
|
| Nevertheless, OCaml is very strong in compiler design.
| For example Rust and Hack were written in OCaml
| initially.
|
| Nevertheless you are not wrong that compilers needing the
| very last bit of performance like the JVM and LLVM tend
| to be written in C++
|
| But the barrier is quite a lot more tending to high
| performance/very high performance and not toy/production
|
| Java and Python are suitable for implementing a toy
| Compiler and the auther invites you to use any language
| you like. Just the reference implementation is using
| OCaml
|
| I would however argue that using C++ is quite advanced
| since it does not have pattern matching and using C is
| just masochm. You will be fighting against the language
| to do even trivial things instead of fighting the actual
| problem at hand
| kragen wrote:
| ocaml makes writing a compiler enormously more accessible, and
| learning to read ocaml, while it can be somewhat intimidating
| at first, is much easier than learning to write a compiler
|
| (imagine a medieval accountant trying to learn to do long
| division in roman numerals. he'll be much better off learning
| the western arabic numerals fibonacci is so excited about)
| shortrounddev2 wrote:
| I really really really want to get more into Ocaml but as far
| as I know there is no good support for a debugger to use in
| an IDE like VSCode or even vim. Everyone I talk to says they
| just do print debugging. It's easier to 'reason about your
| code' in FP, but I do NOT want to go back to a time where I
| coded without breakpoints. I use F# because of its first
| party tooling support
| kragen wrote:
| it does have a debugger with breakpoints, which even
| supports time-travel debugging (except on windows
| obviously), but i've never used it. it even has first-party
| ide integration:
| https://ocaml.org/manual/5.2/debugger.html#s:inf-debugger
|
| i use debuggers a lot when i'm programming in assembly, and
| from time to time when i'm programming in c or c++, but in
| ocaml i've never needed one. it's not that i've never had
| bugs in ocaml, but they tend to be of a different flavor, a
| flavor for which breakpoints are of little value
|
| it sounds like your f# experience is different; what kinds
| of bugs have you recently found the debugger valuable for
| in f#?
|
| debug logging is usually a superior option to breakpoint
| debugging for problems to which both are applicable,
| because debug logging shows you the whole history of your
| program's execution rather than just a single point in
| time. breakpoint debugging requires a lot of manual labor,
| painstakingly operating the machine to navigate the
| execution to the state that has the problem. it's like
| grinding in a video game. i'd rather program the computer
| to do that labor for me, at which point i no longer need
| the debugger
|
| (except in c and assembly)
| shortrounddev2 wrote:
| > it does have a debugger with breakpoints, which even
| supports time-travel debugging (except on windows
| obviously), but i've never used it. it even has first-
| party ide integration:
| https://ocaml.org/manual/5.2/debugger.html#s:inf-debugger
|
| 1. I am developing on windows so that's an issue for me
| and
|
| 2. I don't use emacs, I use VScode and I've not been able
| to get the experimental debugger working for the VScode
| plugin.
|
| It's not that the debugger is purely for fixing bugs; I
| use it as an active part of development. I want to be
| able to freeze the program and inspect values as they are
| running. I may know what the types or state of the
| program are just by viewing the code, but I want to be
| able to inspect the actual data for myself as the program
| is running.
|
| > debug logging shows you the whole history of your
| program's execution rather than just a single point in
| time
|
| breakpoints also provide stack traces, so they provide a
| kind of history as well. I'd rather inspect and interact
| with a program than dig through thousands of lines of
| logs
|
| > breakpoint debugging requires a lot of manual labor,
| painstakingly operating the machine to navigate the
| execution to the state that has the problem
|
| I see things the opposite as you: print debugging is
| tedious and requires restarting the program from the
| beginning whenever you make a change to the source,
| unless you are constructing your program piecemeal with
| the repl, which I consider to be extremely tedious as
| well. To me, a debugger with breakpoints is a far more
| efficient way to code than print debugging.
|
| I think there is a cultural difference in software
| engineering between people who use debuggers and people
| who don't. John Carmack once pointed out that people who
| come from the game dev and Windows/PC world use debuggers
| while people from the linux and web dev world tend not
| to. It seems to be a matter of preference/taste, and I
| think FP programmers seem to have a distaste for
| debuggers and graphical debugging/development
| environments
| kragen wrote:
| > _John Carmack once pointed out that people who come
| from the game dev and Windows /PC world use debuggers
| while people from the linux and web dev world tend not
| to. It seems to be a matter of preference/taste, and I
| think FP programmers seem to have a distaste for
| debuggers and graphical debugging/development
| environments_
|
| i think that's true! but i don't think it's purely a
| matter of preference; it's also a matter of what the
| ecosystems support well and what you're trying to
| achieve. lisp is a huge exception to the rule about fp
| programmers; lisp systems, even including emacs, have
| extremely strong debugging support. but non-lisp fp
| languages tend to heavily emphasize static typing,
| thinking things through ahead of time, and correctness by
| construction, which reduce the need for debugging. but
| those are more valuable for writing compilers than for
| writing games or uis in general, where it's hard to
| define what counts as 'correct' ahead of time but easy to
| recognize it when you test it interactively
|
| webdev of course has the problem that you can't stop your
| http response handler while the browser is waiting for a
| response. and, often, it's sadly not very concerned with
| ux. and it's often concerned with operating systems in a
| way where you have to debug problems _after_ they occur,
| in part because of the scale of the problems
|
| automated testing is another ecosystem thing that reduces
| the need for debuggers; to a significant extent it
| competes with static typing
|
| one of the things i really appreciate about godot is
| being able to continuously adjust parameters and observe
| variables in my games as they're running. godot is
| motherfucking awesome, man. i definitely don't have a
| distaste for graphical debugging and development
| environments!
|
| > _breakpoints also provide stack traces, so they provide
| a kind of history as well. I 'd rather inspect and
| interact with a program than dig through thousands of
| lines of logs (...) print debugging is tedious and
| requires restarting the program from the beginning
| whenever you make a change to the source_
|
| oh, see, i have programs like grep and emacs that dig
| through thousands of lines of logs for me. often when i'm
| debugging from logs i don't run the program at all; i
| just look at the logs and the source code. sometimes the
| program is running someplace i _can 't_ interact with it
| --memorably, on some occasions, on a satellite out of
| range of the groundstation. and usually exceptions on
| python or java give me a pretty decent stack trace,
| though other log messages unfortunately don't
|
| there's another ecosystem support issue here--although
| i've sometimes developed on systems that supported fix-
| and-continue (cmucl, gforth, squeak, basic-80, godot),
| python and gcc support it very poorly or not at all. so
| for me i have to restart the program from the beginning
| whenever i make a change to the source in any case,
| whether i'm doing printf debugging or not. godot, again,
| is a very nice exception to this rule, and incidentally
| lets me add print debugging to the game while it's
| running
|
| one of the great things about a breakpoint debugger, from
| my point of view, is that it makes it possible to add
| logging after the fact to a running program without
| editing the source or restarting it
|
| i really appreciate you sharing your experience!
| userbinator wrote:
| The bonus of writing a C compiler in C is that you get to being
| able to experiment with self-compilation.
| anta40 wrote:
| Written in OCaml? Ahh interesting. Suddenly I remember similar
| book:
|
| Modern Compiler Implementation in ML:
| https://www.cs.princeton.edu/~appel/modern/ml/
|
| As an undergrad student, I think the C version is kinda easier
| to understand, though.
| norir wrote:
| There are many disadvantages to writing a compiler in c (and I
| have done it). For me, the biggest is simply that it is very
| verbose for the type of things that you do commonly in a
| compiler. I have written a self-hosting compiler that
| transpiles to c and the generated code is about 5x as long as
| the original code. I could go through and clean it up and
| probably get it down to 2-3x, but there are certain concepts
| that cannot easily be expressed compactly in c (except possibly
| with preprocessor abuse that I don't have patience for). A
| simple example is that if you want to represent a union type
| you have to manually define 2 structs and an enum. Matching
| with switch is also verbose as you have to both match and
| manually assign the value in the match case. Again, you can use
| macros to do this though at that point you arguably aren't even
| using c but rather a hybrid language. A language like ocaml,
| makes it much more straightforward to define and match on
| unions as well as gives you other nice high level things like
| gc so you can focus on the compiler itself and not low level
| details.
| sergius wrote:
| How does it compare with N.Wirth's?
|
| https://onlinebooks.library.upenn.edu/webbin/book/lookupid?k...
| hdbxbxndj wrote:
| The book is a very hands on tutorial whereas Wirths is basic
| literature for the general case.
|
| While they teach similar content, they have a different
| approach.
|
| There are literally thousands of compiler design books out
| there, I don't really see anything particularly comparable
| between this book and Wirth's
| anta40 wrote:
| Similar to studying OS concepts using Silberschatz' Operating
| System Concept and Tanenbaum's Operating Systems Design and
| Implementation. The former only explains the theoritical ideas,
| while the latter is the documentation of an implementation.
| cxr wrote:
| Wirth's book does not implement a "real" programming language.
| Whatever your thoughts on Oberon and Pascal-like SHOUTCASE
| languages, it's largely irrelevant. Oberon is arguably a "real"
| language (and operating system), but Wirth's book does not
| cover the implementation of Oberon. It covers the
| implementation of Oberon0, an inarguably toy subset of Oberon.
| (Actually, "subset" is not even correct.) The example code has
| also diverged from the book, with Wirth abandoning the strategy
| described in the book for avoiding redundant initialization of
| the module static base, among other things.
|
| Aside from that, I encourage everyone who cites Compiler
| Construction to actually work through the first 10% of the book
| and then count the number of errata.
| alok-g wrote:
| I would love to see a book that talks about going all the way to
| generate machine code, i.e., not stopping at generation of
| assembly.
|
| Alternatively, I would like to learn about not just how to make a
| compiler, but also simultaneously a debugger, hot-reloading, etc.
| hdbxbxndj wrote:
| Writing an simple assembler is trivial. Even macro assemblers
| are very easy.
|
| However, it's also boring.
|
| Nevertheless the contents of the book cover all the techniques
| required to write an assembler, if you'd really like to
| alok-g wrote:
| I understand that assembly file can be parsed in the same
| way. However, I want to learn about the machine instructions
| to the level of bits, and likewise the layouts of binary
| files. Unless I am able to go all the way to machine code
| loaded in memory, I would not know where in memory to add a
| breakpoint instruction when a developer wants the same on a
| line of code.
|
| If there is some library that can help create machine code
| from assembly instructions on a line by line basis (at least
| as opposed to invoking a separate program that generates the
| entire binary collectively from the assembly code), that
| could also work.
|
| In my case, I already know enough of the lexer, parser, etc.,
| parts. What's missing is going all the way to making a
| debugger, profiler, etc.
| fuhsnn wrote:
| >If there is some library that can help create machine code
| from assembly instructions on a line by line basis
|
| That's what JIT libraries do, for example asmjit: https://g
| ithub.com/asmjit/asmjit/blob/master/test/asmjit_tes...
| alok-g wrote:
| Looks cool! Thanks.
| dymk wrote:
| asmjit is great; I used it for building a primitive (but
| quite fast) query engine for in-memory graphs. It made it
| simple to go from query AST to native instructions, most
| of which was "call this other compiled function".
| rramadass wrote:
| Checkout _GNU Binutils_ and their usage of _libbfd_ and
| _libopcodes_.
| jsnnsjxj wrote:
| Building a debugger and profiler is quite an advanced task
| compared to building an assembler though ^^
|
| Also much of that work is heavily dependent on the used
| operating system.
|
| Nevertheless, I'm wishing you all the best on your journey!
| synack wrote:
| The debugger book is coming soon.
| https://nostarch.com/building-a-debugger
| alok-g wrote:
| Awesome! Thanks.
| signaru wrote:
| Have read the first few chapters and it expects that you either
| read the accompanying source code or implement your own and pass
| the tests. The pseudo code presented in the book often look like
| function calls with the inner details not there in the book.
| Furthermore, as already pointed out in another comment, the
| available implementation is in OCaml, which is probably not
| something many C programmers have experience with.
|
| Nevertheless, I think I'm learning more from this book than most
| other books I've tried before that are far more theoretical or
| abstract. I'm still eager to reach the chapter on implementing C
| types. I think it's a good book, but it requires more effort than
| something like Crafting Interpreters or Writing a
| Compiler/Interpreter in Go, while also covering topics not in
| those books.
| wrycoder wrote:
| Nand2Tetris is also like that - they provide an outline and
| tests, but you have to do the work. And, having the
| implementation language be different from the target language
| reduces confusion.
|
| Plus, you get to become proficient in OCaml, which is a pretty
| good language.
| kragen wrote:
| that's a good point--it was pretty confusing when i wrote ur-
| scheme in scheme, or stoneknifeforth in stoneknifeforth,
| because i kept getting confused about which level of the
| language i was changing things in
| myko wrote:
| I thought this book looked neat but closed the tab before
| reading the comments here, and after this one decided to go
| ahead and buy it. Sounds really fun!
| synack wrote:
| I've been working through this book implementing the compiler in
| Ada. So far, I'm really enjoying it. The book doesn't make too
| many assumptions about implementation details, leaving you free
| to experiment and fill in the blanks yourself.
|
| It feels like a more advanced version of Crafting Interpreters.
|
| I haven't looked at the OCaml implementation at all. The text and
| unit tests are all you need.
|
| Discussion on the Ada Forum: https://forum.ada-lang.io/t/writing-
| a-c-compiler/1024
| fuhsnn wrote:
| chibicc[0] complement this book nicely, in addition to a basic
| compiler, it guides you through writing the preprocessor and
| driver, which, although not addressed much in literature, are the
| missing link between the compiler built from the book and real C
| projects.
|
| [0] https://github.com/rui314/chibicc
| sunday_serif wrote:
| I'm working through this book now and really enjoying it!
|
| Each chapter of the book includes a test suite to run against the
| code you've written.
|
| In some ways, the tests in this book feel very similar to the
| labs in the book Computer Systems: A programmers perspective --
| which is high praise!
| sylware wrote:
| I wonder why there is not the same book for c++... mmmmh... I
| really wonder... (irony).
| sylware wrote:
| It is because c++ has an absurdely and grotesquely massive and
| complex syntax (like rust...).
| stevefolta wrote:
| Yeah, Rust is the language for people who think C++ is not
| complex (or hostile) _enough_.
| the_panopticon wrote:
| In Ocaml, interesting. I was similarly surprised when I learned
| that the firs Rust compiler was written in Ocaml, too
| https://users.rust-lang.org/t/understanding-how-the-rust-com...
| mananaysiempre wrote:
| Tree processing is best done in a language with decent
| algebraic datatypes and pattern matching. I would've preferred
| Standard ML, but, well, pot-ay-to, pot-ah-to. Haskell is
| another choice but the techniques you need to use there (while
| undeniably gaining you some possibilities) don't really
| generalize to other languages, so you're now writing a book
| about compiler construction in Haskell rather than just
| compiler construction. Ditto for Rust. Kotlin has deliberately
| anemic pattern matching. C# or F# leave you depending on
| Microsoft's benevolence (sic). Metalua and Sweet.js both have
| decent ADT support but both are pretty much dead. Racket
| exists, I guess, and there are some pattern-matching libraries
| for normal Scheme as well, but the charisma malus of the
| parenthesis is real even if I don't understand what causes it.
|
| So OCaml was probably the most mainstream choice among the
| languages with appropriate tools, as funny as that sounds. And
| honestly, once you get over the syntax, it doesn't actually
| have anything outrageous.
| bunderbunder wrote:
| ML (short for "meta-language") was originally designed for use
| in programming language research, and really shines for that
| purpose. And OCaml is probably the most pragmatic dialect for
| the purpose.
|
| SML is _very_ dated and the standard library and ecosystem lack
| many things that are considered table stakes for a viable
| programming language nowadays. And F# and Scala are fine as
| enterprise languages, but being tied to .NET and Java
| respectively makes them less desirable for implementing a
| language that won 't itself be coupled to one of those
| runtimes.
| viraj_shah wrote:
| Dropping this one here! (no affiliation)
|
| https://www.linuxfromscratch.org/
|
| "Linux From Scratch (LFS) is a project that provides you with
| step-by-step instructions for building your own custom Linux
| system, entirely from source code."
| pull_my_finger wrote:
| Why though? It doesn't seem to be related at all to the OP
| other than both are tutorial books?
| jsnnsjxj wrote:
| This has nothing to do with the post?
| sim7c00 wrote:
| cool, remember some tutorials online i think from the same author
| (not 100% sure) doing stuff around c compilation in python. shame
| its not in a language i want to learn. the other book on
| compilers i got is almost to heavy to lift! :D
| jerjerjer wrote:
| I uh misread the title and thought someone built a C compiler in
| Scratch.
|
| On topic, though: wouldn't a simpler language (maybe even a
| pseudo language) be a better target for a first learning
| compiler. I understand they don't build a full C compiler, but
| still. It looks to me like there's a lot of complexity add from
| choosing such a lofty target.
| tuveson wrote:
| What do you think would make a better target? C maps pretty
| closely to assembly, so it seems like it would be the simplest.
| Maybe Pascal or BASIC, but most people these days don't have
| experience with Pascal, and BASIC would probably be too simple
| for a full-length book.
|
| For writing an interpreter or transpiler, there are probably
| better options, but for a true compiler I can't think of a
| better choice than C (or at least a subset of C).
| francogt wrote:
| I see many comments saying that the book implements the C
| compiler in ocaml. In the introduction the author states that the
| book actually uses pseudo code so you are actually free to
| implement it in any language. The only recommendation is that you
| use a language with pattern matching because the pseudo code
| makes heavy use of it. The reference implementation is in ocaml.
| ashconnor wrote:
| Useful list considering that feature:
| https://en.wikipedia.org/wiki/Category:Pattern_matching_prog...
| markus_zhang wrote:
| Thanks, can you please lemme know which part uses pattern
| matching? I'd assume mostly in the lexer, but the parser should
| just be something that consume the tokens and spit out AST.
| Unless of course it combines the two.
| CrimsonCape wrote:
| Question for HN, pattern matching is defined as "runtime
| type/value checking", is that correct?
|
| Is duck typing the pseudo-unsafe alternative? (Not unsafe as in
| accessing unsafe memory, but as in throwing exceptions if the
| duck-typed function doesn't exist on the current type)
|
| Can C handle both?
|
| Coming from a static type system like rust and c#, i'm doing
| alot of "if this is a duck, duck.quack()" and i'm looking for
| faster alternatives and less verbosity if possible
| trealira wrote:
| One thing is that pattern matching can make writing tree
| manipulation code succinct and easier to read. For example,
| take this article[0] that describes the difference list
| algorithm. Basically, it's kind of like a rope, but for
| lists. It's a tree with lists at the leaves, and when you
| want to convert it into a list, you rewrite the tree to be
| right-leaning, and then concatenate all the lists at once.
| This turns repeated concatenation at the end of lists from
| taking quadratic time into one that takes linear time (strcpy
| can be an example of this in C [1]). The code can be written
| like this: data Tree a = Leaf a | Branch
| (Tree a) (Tree a) fromList :: [a] -> Tree [a]
| fromList = Leaf toList :: Tree [a] -> [a]
| toList (Leaf x) = x toList (Branch (Leaf x) r) = x ++
| toList r toList (Branch (Branch l1 l2) r)
| = toList (Branch l1 (Branch l2 r)) append :: Tree
| [a] -> Tree [a] -> Tree [a] append = Branch
|
| In a language that doesn't have tree pattern matching, the
| code wouldn't be this short and easy to understand, and I
| don't think it could be replicated just by having duck
| typing. Rust has pattern matching, but because it's primarily
| focused on lower-level concerns like pointers and memory
| ownership, pattern matching isn't this nice, because you have
| to pattern match on the pointer first.
|
| Since a compiler is all about tree manipulation, support for
| tree pattern matching should be a boon.
|
| [0]: http://h2.jaguarpaw.co.uk/posts/demystifying-dlist/
|
| [1]: https://en.wikipedia.org/wiki/Joel_Spolsky#Schlemiel_the
| _Pai...
| WalterBright wrote:
| I learned how to write a compiler by studying BYTE magazine in
| the 70's which published the source to a complete Pascal compiler
| as an article!
|
| https://archive.org/details/byte-magazine-1978-09 (part 1)
|
| All 3 parts of Tiny Pascal:
|
| https://albillo.hpcalc.org/publications/Easter%20Egg%20-%20T...
___________________________________________________________________
(page generated 2024-08-15 23:00 UTC)