[HN Gopher] Hacking the Go compiler to add a new keyword
___________________________________________________________________
Hacking the Go compiler to add a new keyword
Author : todsacerdoti
Score : 100 points
Date : 2021-12-08 17:01 UTC (5 hours ago)
(HTM) web link (avi.im)
(TXT) w3m dump (avi.im)
| not-my-account wrote:
| A great video with a similar topic is George Hotz adding "fore"
| loops to clang, which runs the body 4 times per loop.
|
| https://m.youtube.com/watch?v=ee1bXLDN60U
| eatonphil wrote:
| On the topic, there was another good post recently on hacking in
| a new operator to the Go compiler. Yes they rebuild the ^
| operator but it's still very illustrative of hacking on the Go
| project!
|
| https://medium.com/trendyol-tech/contributing-the-go-compile...
| Maksadbek wrote:
| There is a similar article here
| https://eli.thegreenplace.net/2019/go-compiler-internals-add...,
| where the author adds the `until` keyword to the Go compiler.
| avinassh wrote:
| hey! author of the submitted post here. I did refer to Eli's
| articles and they were incredibly helpful. I have mentioned
| them in my post too.
|
| This was the only post I could find on the internet which talked
| about Go compiler internals.
| Maksadbek wrote:
| Yup, didn't read the beginning of your post and missed the
| reference. I guess my comment is redundant since you already
| added a reference.
| samhw wrote:
| As someone who once did the same thing to attempt to
| strengthen the type system in Go, I offer you my endless
| sympathy...
| SilasX wrote:
| Semi-related: in the nand2tetris course, they teach you enough so
| they can add a keyword in their (admittedly toy) system -- the
| course involves implementing the entire toolchain, including the
| high-level language[1] compiler, which emits VM code, and the VM
| code's translator into assembly.
|
| By the time you've implemented a compiler that "just works" for
| the language, you notice that you have really inefficient code
| for those times when all you need to do is increment a variable
| by 1, given that the hardware has an opcode for that! In order to
| have nice, general compilation across all use cases, you've
| programmed the compiler to implement any case of "add X to
| variable" via the VM commands for "push X's value onto stack,
| push variable's value onto stack, call add, pop stack into variable".
|
| So, I figured I could add "inc" as a keyword. You have the
| compiler recognize that keyword and translate "inc <var>" into a
| VM instruction, and then tell the translator how to turn that VM
| instruction into something that makes use of the opcode.
|
| (Alternatively, you can have it just recognize when it's doing
| something of the form "<variable> = <variable> + 1", but that's
| trickier once you've written the whole VM emission step as a
| single-pass operation.)
|
| I know, pretty basic stuff from the standpoint of a professional
| compiler programmer, but pretty neat to be able to make an
| addition to the language like that!
|
| [1] It uses "Jack", a syntactic-sugar-free Java-like language
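|
| A minimal sketch of the "inc" idea above, in Go rather than the
| course's own tooling. The command names follow the nand2tetris stack
| VM; the `inc` VM command and the helper names (emitAddOne, emitInc,
| translateInc) are assumptions made up for illustration, not part of
| the course's specification.

```go
package main

import (
	"fmt"
	"strings"
)

// emitAddOne is the general path: "x = x + 1" becomes four stack VM commands.
func emitAddOne(segment string, index int) []string {
	return []string{
		fmt.Sprintf("push %s %d", segment, index),
		"push constant 1",
		"add",
		fmt.Sprintf("pop %s %d", segment, index),
	}
}

// emitInc is the special case: a single, hypothetical "inc" VM command.
func emitInc(segment string, index int) []string {
	return []string{fmt.Sprintf("inc %s %d", segment, index)}
}

// translateInc turns the hypothetical "inc local N" into Hack assembly
// that ends in the hardware's in-place increment.
func translateInc(index int) []string {
	return []string{
		"@LCL", // pointer to the base of the local segment
		"D=M",
		fmt.Sprintf("@%d", index),
		"A=D+A", // address of local[index]
		"M=M+1", // increment in place, no stack traffic
	}
}

func main() {
	fmt.Println(strings.Join(emitAddOne("local", 0), "\n"))
	fmt.Println("--- versus ---")
	fmt.Println(strings.Join(emitInc("local", 0), "\n"))
	fmt.Println("--- translated ---")
	fmt.Println(strings.Join(translateInc(0), "\n"))
}
```

| The general path costs four VM commands, each of which expands to
| several assembly instructions; the special case stays at a handful
| of instructions in total.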
| lxe wrote:
| > The hash method considers the token's first and second
| characters and the length.
|
| Quite a kludgey optimization for the token hash
| Leszek wrote:
| V8 developer here (who happened to also implement perfect
| hashing in V8's tokenizer) - perfect hashing is a very common
| compiler optimisation, and as the sibling comment says, is
| worth it for the runtime scanner speed improvement. If you do
| end up adding new keywords (which is ~never for anything that
| wants to preserve backwards compat), then you just recalculate
| your perfect hash with gperf or by hand or however.
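|
| For readers who have not seen the trick, here is a rough sketch of a
| keyword hash built from the first two bytes and the length, modeled
| on the scheme quoted above. The 64-slot table, the exact mixing
| expression, and the names (hash, buildTable, isKeyword) are
| assumptions for illustration and may not match the compiler's source.

```go
package main

import "fmt"

// The 25 Go keywords.
var keywords = []string{
	"break", "case", "chan", "const", "continue", "default", "defer",
	"else", "fallthrough", "for", "func", "go", "goto", "if", "import",
	"interface", "map", "package", "range", "return", "select", "struct",
	"switch", "type", "var",
}

const tableSize = 1 << 6 // power of two, so masking replaces a modulo

// hash mixes the first two bytes and the length, then masks to the
// table size. It assumes len(s) >= 2.
func hash(s string) uint {
	return (uint(s[0])<<4 ^ uint(s[1]) + uint(len(s))) & (tableSize - 1)
}

// buildTable stores each word in its slot and reports the first collision,
// which is the signal that the hash needs recalculating.
func buildTable(words []string) ([tableSize]string, error) {
	var table [tableSize]string
	for _, w := range words {
		h := hash(w)
		if table[h] != "" {
			return table, fmt.Errorf("imperfect hash: %q collides with %q", w, table[h])
		}
		table[h] = w
	}
	return table, nil
}

// isKeyword costs one hash and one string comparison.
func isKeyword(table *[tableSize]string, s string) bool {
	return len(s) >= 2 && table[hash(s)] == s
}

func main() {
	table, err := buildTable(keywords)
	if err != nil {
		panic(err)
	}
	fmt.Println(isKeyword(&table, "func"), isKeyword(&table, "funk"))

	// A hypothetical extra keyword can break the scheme.
	_, err = buildTable(append(append([]string{}, keywords...), "until"))
	fmt.Println(err)
}
```

| With this particular mix, all 25 keywords land in distinct slots, but
| a hypothetical `until` falls into the same slot as `import`, which is
| the situation where you would recalculate the hash.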
| tptacek wrote:
| The language isn't extensible, so these are changes that happen
| very rarely; the compiler is simply optimized for the actual
| task it has. What would be weird would be expending any real
| effort --- or, worse, runtime cost --- for an engineering case
| we know is never going to happen.
| benhoyt wrote:
| Yes, and specifically in this case no keywords have been
| added to the Go language at all since its 1.0 release (I just
| checked the 1.0.1 spec and there are still 25 keywords). Even
| the addition of generics coming soon in 1.18 will add no new
| keywords (though it will add a new token, "~").
| VWWHFSfQ wrote:
| I find it disconcerting that the go compiler is such a mess that
| it took this much effort just to alias a new keyword. I know the
| Ruby internals are famously very nasty but I'm surprised go is
| this bad.
| throwaway894345 wrote:
| What's a language that allows for easily adding new keywords?
| What tradeoffs were involved in facilitating that property? Is
| this property compatible with Go's goals?
| nikanj wrote:
| Many languages are easy and fast to extend. Namely toy
| languages that are not really going anywhere. A dead-simple,
| easy-to-extend compiler is a few thousand LOC - and produces
| completely crap code for all platforms.
|
| If you want your language to actually have some real-world
| usage, you need real-world performance numbers. Which tends
| to lead to compiler codebases a few orders of magnitude
| larger, and much gnarlier to extend.
| adgjlsfhk1 wrote:
| Also, good languages don't have many keywords, so they
| aren't optimized for adding them.
| wk_end wrote:
| In this context (I see you Lispers) the difficulty of adding
| a new keyword to the compiler is less a property of the
| language than a property of how the compiler's implemented.
|
| Specifically, the weird stuff the author encountered like:
|
| * Generating the token list by parsing the comments of a
| source file
|
| * only parsing up to a hard-coded token instead of all of the
| known tokens (?!)
|
| * using a hacky token hashing mechanism that only looks at
| the first two characters of the token
|
| have nothing to do with Go-the-language.
| preseinger wrote:
| A language and its principal compiler are I think not so
| decoupled as you're implying.
| philosopher1234 wrote:
| Why does this mean it's a mess? Why should the codebase be
| optimized for adding new keywords, when that happens maybe once
| a decade? Your comment seems overly negative.
| benhoyt wrote:
| Indeed. In fact, even with the release of generics in Go
| 1.18, which is coming out in early 2022 exactly a decade
| after Go 1.0, there will be no new keywords. So it won't even
| have happened once in a decade. :-)
| johnisgood wrote:
| > Other than Eli's post, there are no documentation or articles
| on Go compiler internals. How does someone get started working on
| them? How do they navigate and find all these intricacies without
| spending hours? Maybe Google has some internal documentation on
| the Go compiler.
|
| It would be nice to have more information out there on the
| internals of the Go compiler. Perhaps there is.
|
| I found stuff like:
|
| - https://github.com/emluque/golang-internals-resources
|
| - https://www.altoros.com/blog/golang-internals-part-1-main-co...
|
| - https://github.com/teh-cmc/go-internals
|
| But yeah, Eli's articles[1] are pretty good.
|
| [1] https://eli.thegreenplace.net/2019/go-compiler-internals-add...
| avinassh wrote:
| Thank you for these links! I think I have seen the first two at
| some point, but they weren't helpful. The `go-internals` repo looks
| great, and I will check it out.
|
| I am also curious about the daily development cycle by a
| regular Go contributor. How do they make changes, and how do they
| run quick tests before running the whole test suite?
| melony wrote:
| The Go compiler is about as straightforward as it can get. Just
| read the source:
|
| https://github.com/golang/go
| avinassh wrote:
| I don't think I would have figured out how one adds a new
| token if not for Eli's post. This comment [0] perfectly
| explains the quirks I ran into.
|
| As an exercise, can you help me figure out how to add a token
| just from the source and discover these quirks?
|
| On second thought, reading from the source and figuring it
| out could have been possible if you spent hours. But don't
| you think it should also have some comments to navigate?
|
| [0] - https://news.ycombinator.com/item?id=29489113
| londons_explore wrote:
| Google has an _amazing_ code search tool. You can try it
| out here[1]. That generally makes browsing source code much
| quicker and easier, which in turn makes understanding the
| structure of huge codebases much easier.
|
| With that tool, I prefer to just dive into the source in
| most cases rather than read documentation, especially when
| there is a good chance the documentation is wrong/outdated.
|
| [1]: https://source.chromium.org
| preseinger wrote:
| Code explains what and how, but not why. Why is necessary
| for building a robust mental model of any system, and can
| only be provided by documentation (or other humans).
| londons_explore wrote:
| Google's source code files frequently have 50+ lines of
| comments at the top of the file to explain the why...
| johnisgood wrote:
| > https://source.chromium.org
|
| Wow, it is pretty cool! Is there any open source software
| that is similar to this? It reminds me of
| https://elixir.bootlin.com/. I really want something like
| these two.
|
| Currently checking out https://github.com/bootlin/elixir.
| ferdowsi wrote:
| I clicked into the Go Github repo and found documentation
| pretty easily. The compiler code itself is well documented, and
| Go's code navigation tooling itself helps learning.
|
| https://github.com/golang/go/tree/master/src/cmd/compile
| avinassh wrote:
| I did run into this. The page linked to is a high level
| documentation with very few details which are specific to the
| codebase.
|
| Take the example of adding a new token. You have to run go
| generate to generate token strings. But nowhere in the docs
| or in the code is it mentioned what exactly 'stringer' is
| or how to install it.
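|
| For anyone hitting the same wall: stringer is a code generator from
| golang.org/x/tools that writes a String() method for an integer
| constant type. The sketch below uses a made-up Token type, not the
| compiler's own, and hand-writes the method that stringer would
| otherwise generate so the file runs on its own. That the compiler's
| token file uses stringer in exactly this way is an assumption.

```go
package main

import "fmt"

// Token is a hypothetical token type used only for this sketch.
//
// Installing the tool:
//
//	go install golang.org/x/tools/cmd/stringer@latest
//
// A directive of the following form, written flush-left as a
// "//go:generate ..." comment, makes `go generate` derive String()
// from the line comments on the constants below:
//
//	stringer -type Token -linecomment
type Token uint

const (
	Break Token = iota + 1 // break
	Case                   // case
	Until                  // until
)

// tokenNames stands in for the table that stringer would generate.
var tokenNames = [...]string{Break: "break", Case: "case", Until: "until"}

// String mirrors the shape of a generated method: known values map to
// their comment text, anything else falls back to a numeric form.
func (t Token) String() string {
	if t == 0 || int(t) >= len(tokenNames) {
		return fmt.Sprintf("Token(%d)", uint(t))
	}
	return tokenNames[t]
}

func main() {
	fmt.Println(Break, Case, Until, Token(99))
}
```

| With -linecomment, the text after each constant is what the token
| prints as, which is why editing a comment and re-running go generate
| changes the token strings.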
| _wldu wrote:
| This is a great demo. The old C compiler backdoor, but in Go:
|
| https://github.com/yeokm1/reflections-on-trusting-trust-go
|
| The GopherCon Singapore (2018) video is a really great summary
| (20 mins). He modifies the compiler and inserts the backdoor
| during the presentation:
|
| https://www.youtube.com/watch?v=T82JttlJf60&list=PLq2Nv-Sh8E...
| rodmena wrote:
| > How do they navigate and find all these intricacies without
| spending hours?
|
| Technical documentation of internals is not important for
| corporate-built languages -- which is a shame.
| fefe23 wrote:
| Why do we link to some dude applying a HOWTO instead of the
| HOWTO?
| quotemstr wrote:
| If you do perfect hashing, you should make it infallible ---
| retry map creation with tweaked hash functions until it works.
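|
| A sketch of that retry loop, assuming a first-two-bytes-plus-length
| hash with one tunable multiplier. The parameterisation, the 64-slot
| table, and the names (hashWith, isPerfect, findMultiplier) are all
| hypothetical; a real generator such as gperf searches a richer space.

```go
package main

import "fmt"

var keywords = []string{
	"break", "case", "chan", "const", "continue", "default", "defer",
	"else", "fallthrough", "for", "func", "go", "goto", "if", "import",
	"interface", "map", "package", "range", "return", "select", "struct",
	"switch", "type", "var",
}

const tableSize = 1 << 6

// hashWith is a parameterised hash; mult is the knob the search turns.
// It assumes len(s) >= 2.
func hashWith(mult uint, s string) uint {
	return (uint(s[0])*mult ^ uint(s[1]) + uint(len(s))) & (tableSize - 1)
}

// isPerfect reports whether mult sends every word to a distinct slot.
func isPerfect(mult uint, words []string) bool {
	var used [tableSize]bool
	for _, w := range words {
		h := hashWith(mult, w)
		if used[h] {
			return false
		}
		used[h] = true
	}
	return true
}

// findMultiplier retries with multipliers 1..maxMult and returns the
// first one that yields a perfect hash, if any.
func findMultiplier(words []string, maxMult uint) (uint, bool) {
	for m := uint(1); m <= maxMult; m++ {
		if isPerfect(m, words) {
			return m, true
		}
	}
	return 0, false
}

func main() {
	// "until" is a hypothetical new keyword added to the set.
	words := append(append([]string{}, keywords...), "until")
	if m, ok := findMultiplier(words, 255); ok {
		fmt.Println("perfect hash found with multiplier", m)
	} else {
		fmt.Println("no multiplier in range works; grow the table or change the mix")
	}
}
```

| The search can still come up empty, so a build step using it would
| need a fallback such as a larger table or a different mixing function.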
| didip wrote:
| I wonder if the author has heard of https://github.com/goplus/gop
|
| He'd have fun reverse engineering it.
| avinassh wrote:
| I have seen it on HN, but I hadn't looked closely. Thanks
| for linking again!
___________________________________________________________________
(page generated 2021-12-08 23:00 UTC)