[HN Gopher] Writing a C Compiler (2017)
       ___________________________________________________________________
        
       Writing a C Compiler (2017)
        
       Author : lrsjng
       Score  : 100 points
       Date   : 2023-06-08 20:08 UTC (1 day ago)
        
 (HTM) web link (norasandler.com)
 (TXT) w3m dump (norasandler.com)
        
       | dananjaya86 wrote:
       | Book version to be released in October '23 :
       | https://nostarch.com/writing-c-compiler
        
       | userbinator wrote:
       | IMHO writing a compiler for a high-level language, in an even
       | higher-level language, somehow feels a bit "anachronistic" (for
       | lack of a better word).
        
         | retrac wrote:
         | Most of the current major C implementations are written in C++.
        
           | userbinator wrote:
           | That's unfortunate.
        
           | WalterBright wrote:
           | ImportC is written in D.
        
         | golergka wrote:
         | Weren't a lot of functional languages, like ML and its
         | descendants, created specifically to write parsers and
         | compilers?
        
           | peterfirefly wrote:
           | No. ML was the meta language for a theorem prover (LCF).
           | 
           | https://en.wikipedia.org/wiki/Logic_for_Computable_Functions
        
           | JonChesterfield wrote:
           | I've seen it claimed that ML was originally written, in lisp,
           | in order to have a better language to write compilers in.
        
             | lispm wrote:
             | ML was written as the language used by the theorem prover
             | LCF. It was written in Lisp.
        
         | avgcorrection wrote:
         | For what reason?
        
           | userbinator wrote:
           | It's backwards. Writing a C compiler in C or Asm makes sense,
           | a Python compiler in C also does, but a C compiler in Python
           | is an odd inversion of abstraction.
        
             | avgcorrection wrote:
             | I guess this harkens back to the days when you _had to_
             | write a compiler in a low-level language because that's all
             | that the platform that you are targeting supports. Then it
             | sounds weird to talk about writing a compiler in a high-
             | level language in order to target a low-level one, because
             | surely these high-level languages are more platform-
             | dependent than the blessed (guaranteed on the platform)
             | low-level one.
             | 
             | But these days we can access dozens of languages on many
             | platforms. And we can use high-level languages that are
             | _good_ for writing compilers--languages with good string
             | types and algebraic data types--instead of being limited to
             | awfully imperative/procedural ones.
             | 
             | In other words: your perspective sounds way more
             | anachronistic.
        
             | Jtsummers wrote:
             | Why? The objective is to translate code in one language (C)
             | to another (machine code or assembly or perhaps an
             | intermediate representation). Why does it make sense to use
             | C for that task and not Python or some other language? It's
             | not like C provides facilities that specifically enable
             | compiler writing or text parsing for itself that other
             | languages are lacking.
        
       | bigdict wrote:
       | Has anyone worked through this? Is it a good (soon to be) book?
        
       | e19293001 wrote:
       | I owe my entire career to this remarkable individual who, despite
       | never having met or being affiliated with, has profoundly
       | influenced me through his insightful books. His vast knowledge
       | and expertise have been instrumental in teaching me numerous
       | technical concepts and skills throughout his publications.
       | 
       | https://web.archive.org/web/20220519044634/http://cs.newpalt...
       | 
       | Assembly Language and Computer Architecture Using C++ and Java,
       | Course Technology, 2004
       | 
       | This book has been an invaluable resource in enhancing my
       | understanding of various technical aspects. It has provided me
       | with in-depth insights into the inner workings of a CPU, enabling
       | me to grasp the intricate mechanisms behind its operation.
       | Additionally, the book has equipped me with the knowledge and
       | skills necessary to write an assembler based on a given
       | instruction set.
       | 
       | Moreover, I have delved into the intricacies of assembly
       | language, thanks to the comprehensive explanations and examples
       | provided in the book, especially the exercises. It has allowed me
       | to truly comprehend the nuances of this low-level programming
       | language and its interactions with hardware.
       | 
       | Furthermore, the book has shed light on the fascinating process
       | of how compilers generate assembly code, particularly in the
       | context of object-oriented programming languages. By exploring
       | this topic, I have gained a deeper understanding of the intricate
       | steps involved in transforming high-level code into assembly
       | instructions, thereby bridging the gap between software
       | development and hardware execution.
       | 
       | I also acquired another remarkable book authored by the same
       | author:
       | 
       | "Compiler Construction Using Java, JavaCC, and Yacc," published
       | by IEEE/Wiley in 2012. This exceptional book has served as an
       | invaluable guide in my journey of creating compilers,
       | encompassing both theoretical foundations and practical
       | implementation.
       | 
       | One of the most remarkable aspects of this book is its
       | comprehensive coverage of parsing techniques. It equipped me with
       | the knowledge and skills to effectively parse regular
       | expressions, enabling me to implement powerful features akin to
       | those found in the widely used tool, grep. This aspect of the
       | book has been particularly enlightening, and I consider it a
       | significant contribution for anyone seeking to delve into the
       | realm of compilers.
       | 
       | Overall, I am deeply indebted to this book, and I wholeheartedly
       | recommend it to anyone eager to explore the fascinating world of
       | compiler construction. It has truly bridged the gap between
       | theory and practical implementation, providing a solid foundation
       | and equipping aspiring compiler developers with the essential
       | tools and techniques required to embark on this captivating
       | journey.
       | 
       | I had some comments before regarding this author and his books
       | about compilers and computer architecture all over HN as well.
        
         | fuzztester wrote:
         | Nice comment, GPT user.
         | 
         | Now, GPT:
         | 
         | Replace all occurrences of the substring "me" with "you" in the
         | above comment text.
        
           | jazzyjackson wrote:
           | Rude accusation. GPT talks like the average internet
           | commenter; it shouldn't be surprising to find a genuine
           | comment written in a voice similar to GPT.
        
             | fuzztester wrote:
             | Okay, so now life is imitating art? Never knew ;-)
             | 
             | No, you are being rude. To the average internet commenter.
             | Or maybe to GPT. ;-)
             | 
             | By bringing down either one of them to the level of the
             | other.
             | 
             | Unless you meant "average" in the same sense as this short
             | tale:
             | 
             | A statistician had his head in a fridge and his feet in an
             | oven, and when asked how he felt, he said, "on the average,
             | I feel quite comfortable".
        
             | fuzztester wrote:
             | GPT comments are genuine too. Don't hurt our soon-to-be-
             | developed feelings! Sniff ...
        
           | eesmith wrote:
           | Huh. I don't get those vibes.
           | 
           | Further investigation doesn't support your claim. The
           | citations check out, including publication year and
           | publishers.
           | 
           | And the author has indeed praised the book many times before
           | (https://news.ycombinator.com/item?id=31843833,
           | https://news.ycombinator.com/item?id=31311613,
           | https://news.ycombinator.com/item?id=28481028,
           | https://news.ycombinator.com/item?id=23386732,
           | https://news.ycombinator.com/item?id=22305353,
           | https://news.ycombinator.com/item?id=21988211,
           | https://news.ycombinator.com/item?id=21513056,
           | https://news.ycombinator.com/item?id=18996703, and
           | https://news.ycombinator.com/item?id=10184364), with the last
           | of these dating from 2015.
           | 
           | E.g., compare "I am deeply indebted to this book" with "I'm
           | very debted to this man. I enjoyed a lot reading his books
           | and made me who I am today." at
           | https://news.ycombinator.com/item?id=28481028 from Sept 10,
           | 2021.
           | 
           | Or compare "I owe my entire career to this remarkable
           | individual who, despite never having met or being affiliated
           | with," with "I'm not affiliated with the author though. This
           | book helped a lot in my career as a hardware and firmware
           | engineer." at https://news.ycombinator.com/item?id=23386732
           | from June 2, 2020.
           | 
           | Or compare "enabling me to implement powerful features akin
           | to those found in the widely used tool, grep." with similar
           | comments over the last 8+ years, at
           | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... ,
           | like "and eventually write your own 'grep' which was for me is
           | a mind-blowing experience" at
           | https://news.ycombinator.com/item?id=13664714 from Feb 16,
           | 2017.
           | 
           | And
           | https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
           | shows the OP citing http://cs.newpaltz.edu/~dosreist/ while
           | this comment uses the archive.org version because the old URL
           | doesn't work.
        
             | pharrington wrote:
             | For what it's worth, ZeroGPT thinks the comment's a 25%/75%
             | human/AI mix.
        
               | eesmith wrote:
               | For what it's worth, ZeroGPT thinks the first paragraph
               | of your comment at
               | https://news.ycombinator.com/item?id=35559453 was "Most
               | Likely GPT generated" (25% written by a human, 100%
               | generated by an AI/GPT).
               | 
               | The entire comment was 75% human, 46% AI/GPT.
               | 
               | I picked that comment because it had the longest text.
        
               | pharrington wrote:
               | Yeah that's fair.
               | 
               | Edit: lmao playing around a bit, simply changing "it is"
               | to "its" (no apostrophe) in the first sentence, and
               | editing the second sentence to read "the problem's that
               | people have to be" makes ZeroGPT no longer think my post
               | was AI generated at all.
        
             | fuzztester wrote:
             | Your long, detailed, somewhat scholarly, well researched
             | comment, leads us to think (after consulting several
             | prestigious, highly intelligent, real and artificial
             | professors), that you may be a suitable candidate for the
             | first PhD program at the new international Global PHD
             | Trainers Institute (iGPT Institute). We will shortly be
             | sending you the long, formal and stilted application form,
             | to which you must reply in the same way, but better, as the
             | first test.
             | 
             | All the best.
             | 
             | Digitally signed, Your soon-to-be GPT overlords.
        
             | Jtsummers wrote:
             | Maybe not generated, but still a bizarre opening paragraph
             | in context:
             | 
             | > I owe my entire career to this remarkable individual who,
             | despite never having met or being affiliated with, has
             | profoundly influenced me through his insightful books. His
             | vast knowledge and expertise have been instrumental in
             | teaching me numerous technical concepts and skills
             | throughout his publications.
             | 
             | The individual they're referring to with "this remarkable
             | individual" is _not_ Nora Sandler, the author of the
             | submitted post, but Anthony J. Dos Reis, who they repeatedly
             | reference by allusion but never name. A confusing way to
             | write.
        
         | bigdict wrote:
         | getting college essay vibes from this comment
        
           | belter wrote:
           | And ChatGPT vibes...
        
       | hcks wrote:
       | Yet another "compiling" course that puts all the emphasis on
       | parsing.
       | 
       | Rule of thumb: parsing/lexing shouldn't take more than 10% of
       | your compiler course.
        
         | wasimanitoba wrote:
         | anything better you'd recommend?
        
         | marcosdumay wrote:
         | On the other hand, parsing text could easily be a very valuable
         | course on its own. You just have to not keep it restricted to
         | programming languages, and include the knowledge created in this
         | century.
        
         | tester756 wrote:
         | parsing is cool
        
         | vector_spaces wrote:
         | This attitude bugs me a lot. It seems really common, especially
         | in more recent texts about language design and implementation,
         | that parsing is heavily de-emphasized to the point where
         | practically nobody talks about it. See Essentials of
         | Programming Languages by Friedman & Wand, the relevant sections
         | in SICP, Programming Languages: Application & Interpretation
         | (which goes so far as to call it a distraction).
         | 
         | I get that parsing is more of an implementation detail and
         | doesn't really belong to the space-brained realm of language
         | design per se, but it's a bit annoying that most texts refuse
         | to give any space to the topic, and rely on your language being
         | S-expression based or assume you're going to use a parser
         | generator. Like, in the real world, even if one will never
         | actually implement a fully-fledged programming language, you're
         | still probably going to have to parse things sometimes. I would
         | love a book that goes into detail about different parsing
         | techniques and considers best practices and patterns and
         | tradeoffs/design considerations -- would pay good money for
         | that
         | 
         | It reminds me somewhat of the situation in analysis, where
         | there are lots of theorems that aren't written down anywhere
         | because literally every book states them as "easy" exercises.
         | Maybe I'm looking in the wrong places, but I can't find much in
         | the way of concrete guidance on implementing parsers. I'm aware
         | of the beautiful series on parsing theory by Aho & Ullman ("The
         | Theory of Parsing, Translation, and Compiling"), but those are
         | more focused on theory rather than implementation
        
           | marssaxman wrote:
           | > Like, in the real world, even if one will never actually
           | implement a fully-fledged programming language, you're still
           | probably going to have to parse things sometimes.
           | 
           | That is definitely true, but in practice there isn't much to
           | say about it, because sophisticated parsers turn out not to
           | be particularly important; it works out better overall to
           | design simple grammars, and then the parsing is easy.
           | 
           | - If you're a beginner, you'll write a recursive descent
           | parser, because that's the simplest technique, and it lets
           | you focus on your project instead of a new, unfamiliar tool.
           | 
           | - If you're writing a domain-specific language, or a config
           | format, or something of that nature, you'll use whichever
           | parser generator integrates most conveniently into your
           | workflow, and you'll design your grammar around whatever its
           | manual tells you to do.
           | 
           | - If you're writing a full-scale language compiler, you'll go
           | back to recursive descent, because that offers the easiest
           | way to recover from errors and report informative messages.
           | Maybe you'll throw in precedence-climbing for operators.
           | 
           | > I would love a book that goes into detail about different
           | parsing techniques and considers best practices and patterns
           | and tradeoffs/design considerations -- would pay good money
           | for that
           | 
           | I would also read such a book, but it would be more of a book
           | about parser generators than a book about parsers.
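
The recursive descent plus precedence-climbing combination described above can be sketched roughly as follows. This is only an illustrative sketch, not code from the article or from any commenter: the operator table `BINARY`, the tuple-shaped AST, and the names `tokenize`, `Parser`, and `parse` are all assumptions made up for the example.

```python
import re

# Hypothetical operator table: symbol -> (precedence, right_associative).
BINARY = {
    "+": (1, False),
    "-": (1, False),
    "*": (2, False),
    "/": (2, False),
    "^": (3, True),
}


def tokenize(text):
    """Turn text into ('num', int) / ('op', char) tokens plus an ('end', None) marker."""
    tokens = []
    for match in re.finditer(r"(\d+)|(\S)", text):
        if match.group(1) is not None:
            tokens.append(("num", int(match.group(1))))
        else:
            tokens.append(("op", match.group(2)))
    tokens.append(("end", None))
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse(self):
        tree = self.expression(1)
        if self.peek()[0] != "end":
            raise SyntaxError(f"unexpected {self.peek()[1]!r} at token {self.pos}")
        return tree

    def primary(self):
        # Recursive descent part: one method per grammar rule.
        index = self.pos
        kind, value = self.advance()
        if kind == "num":
            return value
        if (kind, value) == ("op", "("):
            inner = self.expression(1)
            if self.advance() != ("op", ")"):
                raise SyntaxError(f"expected ')' to close '(' at token {index}")
            return inner
        raise SyntaxError(f"unexpected {value!r} at token {index}")

    def expression(self, min_prec):
        # Precedence climbing: keep consuming operators that bind at
        # least as tightly as min_prec.
        left = self.primary()
        while True:
            kind, op = self.peek()
            if kind != "op" or op not in BINARY or BINARY[op][0] < min_prec:
                return left
            prec, right_assoc = BINARY[op]
            self.advance()
            right = self.expression(prec if right_assoc else prec + 1)
            left = (op, left, right)


def parse(text):
    """Parse an arithmetic expression into a nested-tuple AST."""
    return Parser(text).parse()
```

Because each grammar rule is an ordinary method, an error can be raised at the exact token where parsing went wrong, which is the error-reporting advantage the comment mentions. For instance, `parse("1 + * 2")` raises a `SyntaxError` that names the stray `*`.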
        
           | cdcarter wrote:
           | On the other hand, historically (and as the parent you're
           | replying to points out), many compiler texts have spent a
           | MAJORITY of their time on parsing, and rush through the
           | actual interesting parts of compilation.
           | 
           | > I would love a book that goes into detail about different
           | parsing techniques and considers best practices and patterns
           | and tradeoffs/design considerations -- would pay good money
           | for that
           | 
           | Terence Parr's "Language Implementation Patterns" spends
           | quite a bit of time on parsing, and parse tree -> AST
           | conversions.
        
             | vector_spaces wrote:
             | Thanks for pointing that one out -- I had written that one
             | off before as an ANTLR book but looks like it covers more
             | material than I gave it credit for
        
         | hota_mazi wrote:
         | I disagree.
         | 
         | As opposed to most compiler articles, this one actually covers
         | code generation for every section of its chapters, which is
         | really great.
         | 
         | I also like that every chapter focuses on a specific feature
         | and describes how to implement it end to end: lexical/syntactic
         | parsing, AST, and x86_64 generation.
         | 
         | Great series!
        
         | munificent wrote:
         | Almost all real-world projects that are language-like or
         | compiler-like will need a parser. A much smaller fraction of
         | them will need register allocation, instruction selection,
         | optimization, code generation, etc.
         | 
         | For every big, deep, native code compiler, there are a hundred
         | template languages, config files, report generators, etc. all
         | of which are real programs providing real value for actual
         | people.
         | 
         | Emphasizing parsing provides the most value for the greatest
         | number of people. The folks that do end up needing more back
         | end depth will still have the resources available to learn it.
        
           | throwaway17_17 wrote:
           | Do you have a 'best of' list of resources for someone
           | interested in back-end topics?
        
             | munificent wrote:
             | I wouldn't consider myself any kind of authority on "best
             | of", but I like the Dragon Book, and Engineering a
             | Compiler. I've heard good things about Appel's Modern
             | Compiler Implementation.
        
         | WalterBright wrote:
         | Parsing takes a weekend. The rest takes a year to get a
         | rudimentary compiler working.
        
       | RcouF1uZ4gsC wrote:
       | Here is how to write a C compiler in Python that correctly
       | compiles the vast majority of C programs per the ISO C standard:
       | 
       |     print("You have some form of undefined behavior, which means "
       |           "printing this is a valid response per the C standard")
        
         | tialaramex wrote:
         | Undefined Behaviour has to actually _happen_, and so that
         | means at runtime+, and thus what you wrote is not a valid C
         | compiler.
         | 
         | For C++ IFNDR ("Ill-formed, No diagnostic required") the
         | situation is trickier. Because the affected programs (some
         | unknowable but likely large proportion of all purported C++
         | code) are not well-formed C++, the standard offers no hint as
         | to what happens or why, since it constrains only the behaviour
         | of a C++ compiler for well-formed C++ programs.
         | 
         | + It's possible the C lexer claims to have some "Undefined
         | Behaviour" cases like the C++ lexer, hence P2621 "UB? In my
         | lexer?" (a reference to a 2005 meme, because C++ standards
         | committee members are down with the kids). But that's clearly
         | a standards text bug if so, because it makes no sense to have
         | UB in the lexer; these should just be ill-formed programs,
         | where you get a compiler error.
        
       ___________________________________________________________________
       (page generated 2023-06-09 23:02 UTC)