[HN Gopher] Incremental Parsing in Go
___________________________________________________________________
Incremental Parsing in Go
Author : willdaly
Score : 134 points
Date : 2022-10-22 13:45 UTC (9 hours ago)
(HTM) web link (dev-nonsense.com)
(TXT) w3m dump (dev-nonsense.com)
| diffxx wrote:
    | > The parsers produce a sequence of tokens, not a full syntax
| tree. Writing a tokenizer is much easier than parsing full syntax
| trees...Most other editors don't construct the full syntax tree.
|
| Syntax highlighting is nice, but fast semantic analysis is the
| real holy grail.
|
| I have come to think that the best way to develop a new language
| would be to implement a text editor in that language before
| releasing the language. The editor should (ideally) have emacs
| and vim key bindings, though it would surely at least have the
    | bindings that the language author uses. The compiler/interpreter
    | for the language would be embedded in the editor. This would allow
| for a much richer editing experience that goes beyond syntax
| highlighting. Indeed, the source code would become like a living
    | document in the editor where the editor could display inline
| information about both syntax _and_ semantics. The editor need
| not be fancy. It could be written as a terminal application, kind
| of like a language specific nano or vim.
|
| If the language/editor author is careful in how they design the
| editor, all of the syntactical and semantic tooling should be
| exportable into packages that can be consumed by other editors
| with plugin systems/lsp like vscode or neovim. Then the rich
| editing experience can be relatively easily exported to any other
| text editor. Tool authors would also then be able to write static
| analysis/linting/formatting/whatever tools on top of the semantic
| tooling that the language supports.
|
| In some ways what I am describing is a more minimalist version of
| what one would get out of an image/IDE based language like
| smalltalk but the code representation would still remain as text
| files. The editor then becomes like a REPL except rather than
| being an ephemeral process, the REPL state is continually being
| written to disk and is resumable at any time from any computer
| capable of running the editor.
| maxbond wrote:
| I think it would be sufficient & more valuable to implement a
| language server for your new language, which keeps you focused
| on the parts related to your language rather than the struggles
| of implementing an editor, and then you'll be able to drop into
| VSCode, neovim, etc.
| thechao wrote:
    | Does anyone have a _good_ intro tutorial for writing a
| language server in, say, C for some simple language? I find
| most of the docs a little too inscrutable to follow for just
| a bit of dabbling.
| Gibbon1 wrote:
    | I agree with the parent. The big advantage of what he's
    | suggesting is that code objects exist as first class objects. They
    | don't have to be created by parsing a text file. Which means the
    | editor can directly manipulate them.
|
| If you've ever used CAD programs especially Schematic Capture
| and PCB Layout programs you'll get the idea. Everything
| displayed on the screen is an object in a database.
|
    | Instead of a parser blindly parsing a text file and coming
    | across a struct definition, a struct is an object created by the
    | programmer. Which means you can have hard links between the
    | objects that make up the program. And those can be directly
    | manipulated manually or programmatically.
|
| The big advantage comes from maintenance and refactoring.
    | Change the name of a field? It happens in exactly one place in
    | the program. So instead of a diff with 1537 files changed you
    | have 'renamed struct fobar to struct foobar'. Change a
    | comment? Well, it's just a changed comment.
| maxbond wrote:
| I'd like to see languages like that, but it's a massive
| lift. Our systems of source control and CI/CD are built
| around the assumption that languages are text files that
| are somewhat line oriented. It's valuable for people to be
| able to use different editors and tools, and text files are
| the integration point that makes this work.
|
    | Fallible parsers are a bit of a hack, but they're a hack
    | that works and is widely deployed already, and I'm not
    | convinced that moving to a symbolic/binary representation
    | creates enough value to justify the risk involved in the
    | migration. That being said, I'd love to see
| it happen. We'd end the tabs and spaces debate once and for
| all!
| diffxx wrote:
| If you write your own editor with language server features,
| writing a language server implementation _should_ (dangerous
| word) be relatively easy. By writing the editor in your
| language, you prove that your language is actually useful for
| a real world problem. You also have the flexibility to add
| any functionality that you wish without contorting your
| thinking to whatever may be imposed on you by the lsp model.
|
| It isn't necessarily clear to me that writing a language
| server implementation is that much easier than writing an
| editor. To be fair, I have personally done neither (though I
| have written REPLs) so perhaps I am wildly off in my
| estimation.
| maxbond wrote:
| I don't really disagree, but would point out that, for the
| languages I hack on, they're DSLs that wouldn't be
| appropriate to write an editor in. I agree that you should
| be writing code in your language & solving problems with it
| though, even before there's a working implementation. It
| does really help hone your vision and focus on the problems
| you're solving.
|
| I've not written a language server either, and I'm positive
| it's one or two orders of magnitude more difficult than a
| simple editor, but I think the time is better spent for
| many projects because it gets you into the nitty gritty
| mechanics of your language & sets you up for a good
| developer experience. If your language specializes in
| writing things like editors though, that would certainly
| not be the case.
|
| Tangentially, my language development hot take is that YAML
| is a really great platform to start with, so you can focus
| on prototyping your runtime & semantics and kick the can on
| parsing & syntax. Parsing takes up a lot of time and you
| might not know if this language idea is worthwhile or not
| yet. Additionally, YAML is pretty darn good and you may
| never need to move off of it.
| diffxx wrote:
| > for the languages I hack on, they're DSLs that wouldn't
| be appropriate to write an editor in
|
| Ha, I have been working on a DSL for writing DSLs, which
| perhaps explains my perspective a bit.
| maxbond wrote:
| Sounds cool! If you have something public I'd love to see
| it. Email is in my profile.
| diffxx wrote:
| It's not quite ready to share but hopefully soon :)
| maxbond wrote:
| Best of luck & happy hacking
| xyzzy4747 wrote:
| For max optimization, wouldn't it be better to create a Rust or C
| library for parsing that Go links into? I personally don't see
| the usefulness of trying to optimize Go itself too much as it's
| handicapped by the runtime and garbage collection.
| Thaxll wrote:
    | I've seen some real world examples where Go was as fast as or
    | faster than Rust for CPU / IO intensive tasks.
|
| Go is a fast language even with a GC.
|
| https://github.com/boyter/scc/#performance
| akira2501 wrote:
    | For maximum return on investment, wouldn't it be better to focus
    | on something other than raw speed? I personally don't see the
    | usefulness of trying to make everything in Rust itself too much
    | as it's handicapped by its compiler and lack of specification.
| tester756 wrote:
    | A C library for parsing?
    |
    | Isn't that dangerous from a security perspective?
| xyzzy4747 wrote:
| It's just an example. The options are really Rust (what I'd
| prefer), C++, C, or perhaps something like Nim that compiles
| to C.
|
| If you're trying to make an unoptimized parser, then use
| whatever you want.
| [deleted]
| [deleted]
| pharmakom wrote:
| For some reason we insist that language parsers are implemented
| in the language itself, even when the language isn't great for
| parsers.
| 37ef_ced3 wrote:
| You're in for a big surprise. Try using the language.
|
| Spend some time using Go, and you will be impressed by its
| performance.
|
| You'll wonder, "Were all those haters on Hacker News
| misinformed?"
| bugfix-66 wrote:
| Correct, Go is fast, very close to C.
|
| And just like in C, if you want to avoid memory management
| overhead you can use a slice of structs, integers instead of
| pointers, and a freelist (if needed). For example, here is a
| pointerless sparse bit vector:
|
    | https://bugfix-66.com/7256e0772dc3b02d72abf15b171731c933fd44...
|
| The article is storing parses in a balanced binary tree, like
| a packrat memoizing parser.
|
| Here is the fastest balanced search tree in Go. It allocates
| (and uses Go's garbage collector) but you can easily use a
| slice of structs with integer index pointers and a freelist
| instead:
|
    | https://bugfix-66.com/c93e950965804eba90a34e0055985b1c42d5a1...
|
| The above code will perform very similarly to C.
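The slice-of-structs idiom described above can be sketched without reference to the linked code. The following is a toy (unbalanced) binary search tree that uses `int32` indices in place of pointers, with a freelist for recycled slots; it is an illustration of the general technique only, not the linked implementation:

```go
package main

import "fmt"

// node uses int32 indices instead of pointers; -1 means "nil".
type node struct {
	key         int
	left, right int32
}

// tree keeps all nodes in one contiguous slice plus a freelist of
// recycled indices, so the GC sees only a couple of allocations.
type tree struct {
	nodes []node
	free  []int32 // indices of deleted nodes, ready for reuse
	root  int32
}

func newTree() *tree { return &tree{root: -1} }

// alloc reuses a freed slot if one exists, otherwise grows the slice.
func (t *tree) alloc(key int) int32 {
	if n := len(t.free); n > 0 {
		i := t.free[n-1]
		t.free = t.free[:n-1]
		t.nodes[i] = node{key: key, left: -1, right: -1}
		return i
	}
	t.nodes = append(t.nodes, node{key: key, left: -1, right: -1})
	return int32(len(t.nodes) - 1)
}

// insert is a plain (unbalanced) BST insert, enough to show the idiom.
// We allocate first, then walk by index, so a slice reallocation in
// alloc can never invalidate anything we hold.
func (t *tree) insert(key int) {
	n := t.alloc(key)
	if t.root == -1 {
		t.root = n
		return
	}
	i := t.root
	for {
		if key < t.nodes[i].key {
			if t.nodes[i].left == -1 {
				t.nodes[i].left = n
				return
			}
			i = t.nodes[i].left
		} else {
			if t.nodes[i].right == -1 {
				t.nodes[i].right = n
				return
			}
			i = t.nodes[i].right
		}
	}
}

func (t *tree) contains(key int) bool {
	for i := t.root; i != -1; {
		n := t.nodes[i]
		switch {
		case key == n.key:
			return true
		case key < n.key:
			i = n.left
		default:
			i = n.right
		}
	}
	return false
}

func main() {
	t := newTree()
	for _, k := range []int{5, 2, 8, 1} {
		t.insert(k)
	}
	fmt.Println(t.contains(8), t.contains(7)) // true false
}
```

Because the nodes live in one slice and link by index, the structure survives being copied, serialized, or handed to a freelist without the GC ever tracing interior pointers.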
| xyzzy4747 wrote:
| If you're making something requiring CPU optimization as a
| core feature, you might as well go with one of the fastest
| languages instead of handicapping your project from Day 1. Go
| is not considered one of the fastest. It's better for network
| or filesystem logic that is I/O limited.
| samatman wrote:
| The optimization here is using incremental parsing, so that
| changing parse state goes from O(n) to may-as-well-be-O(1).
| It's probably linear with tree depth.
|
| Any language is fast enough to do this, certainly Go is.
| Naive parser combinators written in slow languages can
| tokenize six-figure LOC files fast enough that the user
| won't notice.
| jerf wrote:
| This is kind of a test of how nuanced your understanding of
| programming languages can be.
|
| Rust with a bit of effort put into optimization will be
| faster than Go with a bit of effort put into optimization,
| it is true. However, you need to double-check your
| intuition for how big and how consequential the delta is,
| because I'd guesstimate it as roughly a factor of two,
| possibly a touch less. It is true that Rust does a
| _crapton_ more "optimizations", but a _lot_ of those
| optimizations have diminishing returns.
|
| A factor of 2 may still sound large, but in practice it
| isn't as large as it sounds, because my qualification "a
| bit of effort put into optimization" is not redundant. Go
| with a bit of optimization will probably beat someone's
| first draft of Rust. Go with a ton of careful optimization
| will probably beat Rust with a bit of optimization. The raw
| performance of the two are reasonably close, and smaller
| than the improvements you can usually get with
| optimization. So Rust's speed advantage, which is real,
| generally only matters in cases where you're going to
| optimize very heavily.
|
| Is this one of them? For that I can give a solid... Maybe!
| There are times and places where parsing is something you
| want optimized to within an inch of its life, certainly.
| However... it isn't all the places, and some of your
| intuitions may lead you astray if you're not careful; you
| might think a heavy duty programming language would need a
| great parser, but if it's going to chew on optimizations
| for possibly literally 100x the time, it may matter a lot
| less.
|
| In general, Rust is capable of going faster than Go (again,
    | I'd guesstimate about a factor of 2, with isolated tasks where
    | it may be able to go even faster, but that only matters if
| that's the bulk of your program), but Go is going to be
| fast enough that that only matters in certain limited
| places where you're willing to put some non-trivial effort
| into performance in the first place.
|
| This is in contrast to a comparison between Go/Rust and
| Python, where even casually written Go/Rust can outpace
| optimized pure Python, even before we start talking about
| how much better Go/Rust will be using multiple CPUs. This
| is because Python is just that slow, and let's not even
| talk about how slow Python can be if you don't write it to
| be optimized and you start heavily using all its features
| without realizing how expensive they are. From the point of
| view of Python, Go and Rust have very similar performance
| characteristics. (But then, of course, one must be careful
| with Python because something like NumPy will blow your
| socks off when it turns out to not really be Python at
| all.)
|
| It's a rich, nuanced problem space and should not be
| approached with sloganeering and "my language is better
| than yours".
|
    | My summary of Go is: It's a slow compiled language... but
    | it is still a compiled language, and it is faster than
    | pretty much everything that isn't, with the possible
    | exception of LuaJIT, and the delta between the slowest and
    | fastest compiled languages is only ~2-3x, which in the overall
    | space of programming language speed isn't actually that great.
| kaba0 wrote:
    | Not sure if Rust vs Go would be the best example here.
    | Rust vs Java would be a better one -- Go has a very
    | primitive GC in comparison, and Java does optimize hot
    | loops to a higher degree, so a naive code base would be
    | very hard to beat in a lower level language.
| richieartoul wrote:
| I do a lot of "high throughput" stuff at work in both Go
| and Java, and the Go stuff is usually faster by default.
|
| Java tends to win for really naive programs where the
| author didn't bother caring about performance or
| allocations at all, but if any care was put into it at
| all Go usually wins in my experience.
|
    | The trope that Go's GC is primitive in comparison to
    | Java's is not really accurate. You can't consider a
    | language's GC in isolation.
|
| Java's GC and JIT are extremely complex because the
| language semantics are terrible for performance by
    | default. The "everything is an object" model made sense
    | when the language was designed and main memory access
    | times were roughly equal to a CPU cycle, but that's no
    | longer true by a factor of 100 to 200 now.
|
| Go's GC makes different trade offs (low latency,
| extremely high concurrency, tight integration with the
| runtime and scheduler) because the language semantics are
| much more sympathetic to modern hardware ("true" structs,
| automatic escape analysis, etc), so it can.
| kaba0 wrote:
    | Sure, Go can get away with a more primitive GC exactly
    | because it has "value types", so less garbage is created.
    | But they are still much worse: lower latency only means
    | that they pause threads to get more breathing space if
    | they have been allocating too heavily; they are
    | absolutely not even close to the same league as Java's
    | low-latency ZGC.
| geodel wrote:
| > they are absolutely not even close to the same league
| Java's low latency ZGC does
|
| This is the kind of thing always offered without any
| serious numbers extracted from real life or even
| realistic test programs.
|
    | So even if technically true in a very narrow sense, it is
    | more of _high performance car marketing_ with fancy
    | algorithm and data structure names. By the time GCs are
    | used in end user programs with tons of libraries,
    | frameworks, design patterns, and inefficiently implemented
    | business rules, those GCs show little of the difference
    | that the fancy ads promised on _TV_.
| richieartoul wrote:
| It's really the usage of the word primitive that I'm
| arguing with. Java's GC comes with a lot of additional
| trade offs that Go's doesn't.
|
| For example, the fact that the Java GC is copying and
| generational means that there is a LOT more overhead
| introduced by write barriers.
|
| If you benchmark the rate at which the GCs can clean up
| garbage, Java always wins, but the Java GC impairs you a
| lot more in other situations that the Go one doesn't.
|
| It's trade offs, but the Go one makes much better trade
| offs for modern hardware IMO.
| kaba0 wrote:
| Write barriers are a single local conditional on the fast
| path, if I'm not mistaken. Also, since a JIT compiler is
| in action, it may have a much wider range than every
| object interaction. It's basically free on modern
| computers with branch prediction.
|
| ZGC (the low-lat GC) does employ read barriers though
| which will decrease throughput considerably, but latency
| and throughput are almost universally opposites of each
| other.
| chrsig wrote:
| Do you have benchmarks you can share?
| kaba0 wrote:
| Well, cross-language benchmarking is always hard, but for
| purely testing the GC this is not too bad:
| https://benchmarksgame-
| team.pages.debian.net/benchmarksgame/...
|
    | See how far ahead Java is of any other managed language (and
    | it doesn't really make sense to do this benchmark with
    | non-GCd languages)? Though this is done with the G1 GC,
    | not the low-latency one - this default GC is more
    | throughput oriented with a max pause time target goal.
    | Also note how Java uses multiple times more memory,
    | as it "postpones running GC when it knows it will still
    | have enough time to collect all of it without missing
    | the target goal" - this is also the reason why Java is
    | quite far ahead on "energy efficiency" reports as well. And
    | also, GCs work better with more heap space.
| geodel wrote:
| > this is also the reason why java is quite ahead on
| "energy efficiency" reports as well.
|
    | Very soon businesses will be asking for "dollar
    | efficiency" also. I think, going by the effort of Java and
    | framework vendors to pack more instances of Java
    | processes/pods onto a VM, it is already being asked for by
    | tech savvy customers.
    |
    | So that old _fact_ that on the server side customers
    | only care about raw throughput and not about machine
    | size because _RAM/CPU/disk is cheap_ is not holding up well
    | in cloud based deployments, where now each of these
    | matters.
| kaba0 wrote:
    | To be honest, I really don't get this microservice/cloud
    | hype. Stack Overflow (which, let's be honest, will be bigger
    | than that 34th startup) runs on a single (very beefy
    | though) dedicated server machine.
    |
    | I pay like 5 dollars a month for a VM with very low
    | specs, but even that will happily run anything.
    | Especially since the DB likely can't be sharded, the bus
    | factor will remain 1.
| fsdjkflsjfsoij wrote:
| Java requires a much more advanced GC and JIT because
| Java programs tend to allocate a lot more and have
| extremely bad memory layout when you're not restricting
| yourself to primitives. Project Valhalla's value types
| will significantly improve the situation. Relying so
| heavily on the JIT also has other problems especially in
| programs that have widely varying execution paths.
| kaba0 wrote:
    | Sure, that's exactly the incentive for the team to have
    | spent many, many work hours improving their GC - just
    | because the JVM typically depends more on a good GC doesn't
    | make it any less useful - long running processes do make
    | significant use of automatic memory management.
|
| Also, Java's GCs are moving/compacting GCs, so while the
| immediate memory representation is indeed inefficient,
| again, for long running processes Java will place often
| together-used objects physically close to each other, and
| will defragment the heap. But Valhalla can't come soon
| enough, I agree.
|
| > Relying so heavily on the JIT also has other problems
| especially in programs that have widely varying execution
| paths
|
| Has it? I would think that an AOT program would have a
| worse time with widely varying execution paths, while a
| JIT compiler is free to reoptimise based on changing
| application state.
| geodel wrote:
| > just because the JVM typically depends more on a good
| GC doesn't make it any less useful -
|
    | I mean, it feels like a personal choice. Do I _praise_ my
    | spouse when they bring the whole kitchen down while making
    | a dish and clean up quickly afterwards? Or do I take it
    | as "Well, you made the mess, so it was a basic expectation
    | that you clean up fast for later use"?
| kaba0 wrote:
    | I would wager that most applications have plenty of
    | object lifetimes that are not regular at all -- a web
    | server with some stateful sessions, for example. So your
    | analogy doesn't really make sense -- Go can't avoid these
    | situations at all and will perform significantly worse in
    | these cases.
| foldr wrote:
| You'd generally expect Rust and Go to perform about the
| same for CPU bound workloads. Rust has access to more
| advanced codegen and optimizations via LLVM, but Go's
| garbage collector will often be faster than refcounting (or
| whatever manual memory management technique your Rust code
| ends up using). This is especially so given that the GC
| runs on a separate thread without any additional effort on
| your part, making it almost 'free' in the context of
| parsers (which tend to be single threaded).
|
| A real world example of this is esbuild. The author posted
| on HN that his initial Rust implementation was actually
| somewhat slower than the subsequent Go version.
| super_flanker wrote:
| > Go's garbage collector will often be faster than
| refcounting (or whatever manual memory management
| technique your Rust code ends up using)
|
    | I'm not supporting the argument that everything should be
    | written in Rust (or whatever) for good performance.
    | However, blanket statements like this are not true; micro-
    | benchmarks are often misleading. There are many factors
    | which affect performance, and they come with
    | tradeoffs, so you can choose which options favor you most.
    | In the end, Rust objectively offers more ways to optimize
    | your program.
| dymk wrote:
| This is a premature optimization, and keeping everything in
| the same language has benefits like greatly simplified
| tooling and building
| xyzzy4747 wrote:
    | It's not a premature optimization - it's deciding how far
    | the parser can ever be optimized in the future.
    | Choosing Go sets a lower ceiling.
|
| > Keeping everything in the same language has benefits
| like greatly simplified tooling and building
|
| Surely there are other Go libraries that incorporate C,
| C++, or Rust? Also if both parsers existed and were
| equally easy to set up, and you were planning on doing a
| ton of parsing, it would make sense to go with the faster
| one.
| dymk wrote:
| It absolutely is a premature optimization. If it's fast
| enough, then it's fast enough. The author hasn't
| indicated that the current Go implementation is hitting a
| ceiling imposed by the language yet.
|
| If you'd like to, you can provide some real-world
| examples - or even microbenchmarks - showing that Go is
| so much slower than <your choice here> that it's going to
| make a difference.
|
| > Also if both parsers existed and were equally easy to
| set up
|
    | They're not equally easy to set up. Language interop is a
    | pain in the ass.
| Jtsummers wrote:
| Look at the current Makefile:
|
| https://github.com/aretext/aretext/blob/main/Makefile
|
| Build is literally a `go build ...` and install is `go
| install`. Adding any other language to the mix would make
| this a polyglot project and not be "equally easy to set
| up". The other question is, do both parsers exist? In
| this write-up they point to tree-sitter as a possibility
| which is a JS program that produces C code. This would be
| viable, but here's the author's take:
|
| > I considered integrating tree-sitter, an incremental
| parsing library with parsers for many existing languages.
| However, running JavaScript to generate parsers and
| linking to a C library would have greatly complicated the
| build process. Today, aretext can be built on almost any
| platform using a single go install command. I've had
| users install aretext on ARM laptops, FreeBSD servers,
| Chromebooks, and Android phones. _To maintain
| portability, I wanted a pure Go implementation._
|
| So this wasn't some casual decision, but something they
| at least considered long enough to describe here.
|
| And the parsing library itself is only around 1200 lines
| total (comments, blanks, and code). The parsers for each
| language add a lot more, of course, but should be roughly
| equivalent given the same library and interface. I
| imagine that if this project really takes off and
| performance becomes a real problem they can do the
| rewrite at that point. Right now, the code works, seems
| to work fast enough for its author and primary users, and
| it's trivial to install on any platform supported by Go.
| So yes, it would have been a premature optimization to
| complicate the build process, probably reduce the number
| of supported platforms (or greatly increase the effort to
| support the same number of platforms), just to have a
| slightly faster parser.
| rollcat wrote:
| One of Go's primary goals has always been compilation speed.
|
| Go started out in C, and was later (post-1.0) incrementally
| rewritten to be self-hosting. One of the authors (Ken Thompson)
| is also one of the co-creators of C. I would argue these guys
| know what they are doing.
| kaba0 wrote:
    | I don't know; not implementing generics when they were
    | pretty obviously needed was a huge oversight, so I'm not
    | sure.
    |
    | Also, the reason for compiler bootstrapping is more of a
    | "beauty thing" than practicality. It would definitely be
    | faster in a low-level language, but I doubt it would matter
    | to an end user.
| fredrikholm wrote:
    | You aren't sure if _Ken Thompson_ knows what he's doing?
| kaba0 wrote:
    | As a software architect? Absolutely. Programming language
    | designer? Not sure; neither C nor Go is a good language in
    | my personal opinion.
|
| EDIT: I meant to write that I think very highly of him as
| an architect/developer.
| sjansen wrote:
| Experience has shown that often "worse is better". Go
| does an amazing job of balancing complexity and power. I
    | haven't seen a "better" language that isn't either
    | slower, harder to become productive in, or both.
|
| https://en.wikipedia.org/wiki/Worse_is_better
| kcartlidge wrote:
| > _not implementing generics when it was pretty obviously
| needed was a huge oversight_
|
    | I _get_ the desire for generics. I do a lot of C# and have
    | used generics for many years. Yet I've been writing
    | Go for around 6 or 7 years, and other than in the beginning
    | (when I was new to it) I haven't found myself missing them
    | at all.
|
| In other words, for many people the lack of generics comes
| across as an oversight. For others, including myself
| (again, a heavy generics user in C#) that really isn't the
| case. I write Go in the style of Go and it just hasn't been
| an issue.
|
| Blanket statements are rarely true. YMMV.
| kaba0 wrote:
| Well, it is not a blanket statement, it's just the
| generic truth (pun not intended) based on decades of
| evolution of programming languages and a relatively
| expensive mistake for Java, which would have been a
| perfect opportunity to learn from.
|
    | Sure, they are seldom missed by an end user, but for a
    | library author they are essential. That's why map and the
    | like had to be hard coded into the language, and why
    | concurrent versions couldn't be implemented for a long
    | time in the language, the way they have been in Java
    | forever.
| kcartlidge wrote:
| > _Well, it is not a blanket statement, it's just the
| generic truth_
|
| A bit contradictory, really, but that's just semantics I
| suppose.
|
| More importantly even if you could say 99% of devs agree
| (and you can't because they don't) that still doesn't
| make it an _oversight_.
|
    | If they'd neglected to add generics because it _wasn't
    | considered_, that's an oversight. If it was neglected
    | because in the opinion of the creators of the language it
    | _wasn't needed for the purposes they created it for_,
    | that isn't an oversight but a thought-through engineering
    | decision.
|
| Of course you're free to disagree with that decision, but
| an oversight it was not.
| kaba0 wrote:
    | Fair enough, I may not have used the correct word, but it
    | is still a typical "told you so" situation, both during
    | development, after Go's initial appearance, and ever since,
    | until it was finally decided that they should indeed be
    | implemented.
| kcartlidge wrote:
| True.
|
| Whilst I haven't missed it in Go myself, enough other
| people say they _do_ that its inclusion was inevitable.
| Which means it probably should have gone in sooner.
| chrsig wrote:
    | There's a big misconception that the creators of Go
    | didn't _want_ generics. They've stated a number of times
    | that they didn't have a design that they all thought was
    | adequate.
|
| After several years and attempts at a good enough
| proposal, Ian Lance Taylor put one out that was able to
| cross the finish line, and now we have generics.
| kcartlidge wrote:
    | > _They've stated a number of times that they didn't
    | have a design that they all thought was adequate_
    |
    | You know what, with all the 'discussions' in recent years
    | about _whether_ Go should have generics I'd actually
    | lost track of that amongst all the noise. Which is
    | irritating, as I do now remember the early conversations
    | about it.
    |
    | Thank you for the reminder.
| sesm wrote:
    | This article uses the word 'rune' extensively. From the context I
    | assumed it means 'lexeme' or 'token' (i.e. the unit the
    | lexer/scanner produces and feeds to the parser). But then the
    | article uses the word 'token' to mean the output of the parser
    | ('keyword token'), while in the usual terminology the parser's
    | output is called a 'parse tree'.
    |
    | So, in this terminology, the parser consumes 'runes' and outputs
    | 'tokens', while in the usual terminology the parser consumes
    | 'tokens'/'lexemes' and outputs a 'parse tree'.
| pgwhalen wrote:
    | Rune is a type alias in Go, which more or less maps to the more
    | common terms "character" or "code point".
|
| https://go.dev/blog/strings
| sesm wrote:
| Thanks! I don't know Go, so this terminology was surprising
| to me.
| sjansen wrote:
| In Go, `rune` is an alias for `int32` and is used to indicate
| the value is a Unicode "code point".
|
| For characters in the ASCII range, that means it's just a
| character encoded using more bits. If you need to worry about
| the full Unicode range then it's important to understand
| Unicode Normalization Forms.
|
| https://go.dev/blog/strings
|
| https://en.wikipedia.org/wiki/Unicode_equivalence
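A quick illustration of the byte/rune distinction described above (the string here is just an example):

```go
package main

import "fmt"

func main() {
	s := "héllo"
	// Indexing or len() on a string works in bytes:
	fmt.Println(len(s)) // 6: 'é' takes two bytes in UTF-8
	// Ranging over a string decodes runes; i is the byte offset
	// of each rune, so it can skip numbers.
	for i, r := range s {
		fmt.Printf("%d:%c ", i, r)
	}
	fmt.Println()
	// Converting to []rune counts code points instead.
	fmt.Println(len([]rune(s))) // 5
}
```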
| tester756 wrote:
    | The difference in complexity between an IDE's parser and a
    | compiler's parser feels like an order of magnitude.
    |
    | IDEs you want to be very fast, so you use techniques like
    | partial tree reparse, and now when I think about it, you also
    | may need to update other places.
    |
    | Like when you use a type that is defined somewhere below:
    |
    | at the first parse that type definition doesn't compile, so the
    | type usage above shows an error,
    |
    | and when you change the type definition so it compiles, then if
    | you only update that part of the tree, the earlier part would
    | still scream about the error.
    |
    | It's really tricky.
    |
    | The theory behind how to deal with all of these problems seems
    | to be easy,
    |
    | but when you actually get to the coding, you have to be
    | really thoughtful, careful, and experienced in order to get the
    | modeling right.
|
| https://learn.microsoft.com/en-us/shows/seth-juarez/anders-h...
|
| ________
|
    | I have some experience with simpler and more complex parsers,
    |
    | and for me no other type of software requires as much careful
    | thought as parsers do, if you want to address all those things
    | like:
    |
    | correctness, speed, recovery on broken code fragments, good
    | error messages, good code, maybe concurrency.
| binwiederhier wrote:
    | Interesting read. Thank you for sharing. I always found parsers
    | fascinating and mystical. It seems like these parser functions
    | (which I think are analogous to what Rob Pike calls state
    | functions) are a common way to do parsing, though I know very
    | little about it. I especially found the combinators intriguing,
    | though I don't care much for the functional programming syntax in
    | a language like Go.
|
| Anyway, thanks for sharing.
|
| Tangentially, I wrote a little mini parser [0] of my own for my
| side project. It is inspired by Rob Pike's talk on parsers [1].
| It doesn't use state functions, but instead just uses the call
| stack to keep track of where we are.
|
| [0]
| https://github.com/binwiederhier/ntfy/blob/main/server/actio...
|
| [1] https://www.youtube.com/watch?v=HxaD_trXwRE and
| https://go.dev/src/text/template/parse/lex.go
| skohan wrote:
| Yeah parsers are fun! We did a recursive descent parser for a
| toy language in uni and I think it was one of the most
| illuminating and fun projects we did at school.
|
    | Lately I've been working on a tool to make it easy to implement
    | a parser, and I end up using it for everything, because DSLs
    | are so nice to work with.
| [deleted]
| throwaway290 wrote:
| Off-topic but are there any aretext users? How does it fare?
| pharmakom wrote:
| > a successful parse consumes at least one rune
|
| This avoids infinite loops?
| Jtsummers wrote:
    | Yes, there would be two outcomes for an attempted parse.
    | Either it succeeds and makes progress (and eventually
    | terminates) or it fails and consumes nothing (and terminates
    | because eventually you run out of parsers to try).
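That invariant can be shown with a small sketch. This is a hypothetical tokenizer loop (not the article's code) in which every iteration consumes at least one rune, so termination is guaranteed even on input no parser recognizes:

```go
package main

import "fmt"

// parseFunc tries to match a prefix of input and returns the number
// of runes consumed; 0 means the parse failed.
type parseFunc func(input []rune) int

// digits matches a run of ASCII digits.
func digits(in []rune) int {
	n := 0
	for n < len(in) && in[n] >= '0' && in[n] <= '9' {
		n++
	}
	return n
}

// spaces matches a run of spaces.
func spaces(in []rune) int {
	n := 0
	for n < len(in) && in[n] == ' ' {
		n++
	}
	return n
}

// tokenize applies the first parser that succeeds, falling back to
// consuming a single rune when none match. Since every iteration
// consumes at least one rune, the loop always terminates.
func tokenize(input []rune, parsers []parseFunc) int {
	tokens := 0
	for len(input) > 0 {
		n := 0
		for _, p := range parsers {
			if n = p(input); n > 0 {
				break
			}
		}
		if n == 0 {
			n = 1 // catch-all: skip one rune so progress is guaranteed
		}
		input = input[n:]
		tokens++
	}
	return tokens
}

func main() {
	// "12", " ", "34", and the unrecognized "x" -> 4 tokens.
	fmt.Println(tokenize([]rune("12 34x"), []parseFunc{digits, spaces})) // 4
}
```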
| hk__2 wrote:
    | If you want a hard/interesting parsing challenge, try Clojure's
    | #_ reader macro. It's a powerful construct that allows you to
    | comment out the next form. If you're not used to Clojure, it's
    | like writing #_ in front of anything -- a function, an array, a
    | keyword, etc. -- to comment it out, even if it spans multiple
    | lines.
|
    | For example:
    |
    |     #_ (defn foo [x y]
    |          (println x y))
    |
    | This is equivalent to commenting out the three lines. Things
    | become even harder when you learn that these things "swallow"
    | the next form and can be used anywhere:
    |
    |     (let [a #_ b 43]
    |       #_ #_ hello (H N) (print a))
    |
    | The code above is equivalent to the following:
    |
    |     (let [a 43] (print a))
    |
    | All the rest is comments.
|
| The hard/interesting bit is that to tokenize you must construct a
| syntax tree in order to correctly parse the next form, but in
| order to construct a syntax tree you first need to tokenize the
| code.
| derek8bai wrote:
| cool stuff
___________________________________________________________________
(page generated 2022-10-22 23:00 UTC)