[HN Gopher] Incremental Parsing in Go
___________________________________________________________________
Incremental Parsing in Go
Author : willdaly
Score : 134 points
Date : 2022-10-22 13:45 UTC (9 hours ago)
(HTM) web link (dev-nonsense.com)
(TXT) w3m dump (dev-nonsense.com)
| diffxx wrote:
    | > The parsers produce a sequence of tokens, not a full syntax
| tree. Writing a tokenizer is much easier than parsing full syntax
| trees...Most other editors don't construct the full syntax tree.
|
| Syntax highlighting is nice, but fast semantic analysis is the
| real holy grail.
|
| I have come to think that the best way to develop a new language
| would be to implement a text editor in that language before
| releasing the language. The editor should (ideally) have emacs
| and vim key bindings, though it would surely at least have the
    | bindings that the language author uses. The compiler/interpreter
    | for the language would be embedded in the editor. This would allow
| for a much richer editing experience that goes beyond syntax
| highlighting. Indeed, the source code would become like a living
    | document in the editor where the editor could display inline
| information about both syntax _and_ semantics. The editor need
| not be fancy. It could be written as a terminal application, kind
| of like a language specific nano or vim.
|
| If the language/editor author is careful in how they design the
| editor, all of the syntactical and semantic tooling should be
| exportable into packages that can be consumed by other editors
| with plugin systems/lsp like vscode or neovim. Then the rich
| editing experience can be relatively easily exported to any other
| text editor. Tool authors would also then be able to write static
| analysis/linting/formatting/whatever tools on top of the semantic
| tooling that the language supports.
|
| In some ways what I am describing is a more minimalist version of
| what one would get out of an image/IDE based language like
| smalltalk but the code representation would still remain as text
| files. The editor then becomes like a REPL except rather than
| being an ephemeral process, the REPL state is continually being
| written to disk and is resumable at any time from any computer
| capable of running the editor.
| maxbond wrote:
| I think it would be sufficient & more valuable to implement a
| language server for your new language, which keeps you focused
| on the parts related to your language rather than the struggles
| of implementing an editor, and then you'll be able to drop into
| VSCode, neovim, etc.
| thechao wrote:
    | Does anyone have a _good_ intro tutorial for writing a
| language server in, say, C for some simple language? I find
| most of the docs a little too inscrutable to follow for just
| a bit of dabbling.
| Gibbon1 wrote:
    | I agree with the parent. The big advantage of what he's
    | suggesting is that code objects exist as first class objects. They
    | don't have to be created by parsing a text file. Which means the
    | editor can directly manipulate them.
|
| If you've ever used CAD programs especially Schematic Capture
| and PCB Layout programs you'll get the idea. Everything
| displayed on the screen is an object in a database.
|
    | Instead of a parser blindly parsing a text file and coming
    | across a struct definition, a struct is an object created by the
    | programmer. Which means you can have hard links between the
    | objects that make up the program. And those can be directly
    | manipulated manually or programmatically.
|
| The big advantage comes from maintenance and refactoring.
    | Change the name of a field? It happens in exactly one place in
    | the program. So instead of a diff with 1537 files changed you
    | have 'renamed struct fobar to struct foobar'. Change a
    | comment? Well, it's just a changed comment.
| maxbond wrote:
| I'd like to see languages like that, but it's a massive
| lift. Our systems of source control and CI/CD are built
| around the assumption that languages are text files that
| are somewhat line oriented. It's valuable for people to be
| able to use different editors and tools, and text files are
| the integration point that makes this work.
|
    | Fallible parsers are a bit of a hack, but they're a hack
    | that works and is widely deployed already, and I'm not
    | convinced that moving to a symbolic/binary representation
    | creates enough value to justify the risk involved in the
    | migration. That being said, I'd love to see
| it happen. We'd end the tabs and spaces debate once and for
| all!
| diffxx wrote:
| If you write your own editor with language server features,
| writing a language server implementation _should_ (dangerous
| word) be relatively easy. By writing the editor in your
| language, you prove that your language is actually useful for
| a real world problem. You also have the flexibility to add
| any functionality that you wish without contorting your
| thinking to whatever may be imposed on you by the lsp model.
|
| It isn't necessarily clear to me that writing a language
| server implementation is that much easier than writing an
| editor. To be fair, I have personally done neither (though I
| have written REPLs) so perhaps I am wildly off in my
| estimation.
| maxbond wrote:
| I don't really disagree, but would point out that, for the
| languages I hack on, they're DSLs that wouldn't be
| appropriate to write an editor in. I agree that you should
| be writing code in your language & solving problems with it
| though, even before there's a working implementation. It
| does really help hone your vision and focus on the problems
| you're solving.
|
| I've not written a language server either, and I'm positive
| it's one or two orders of magnitude more difficult than a
| simple editor, but I think the time is better spent for
| many projects because it gets you into the nitty gritty
| mechanics of your language & sets you up for a good
| developer experience. If your language specializes in
| writing things like editors though, that would certainly
| not be the case.
|
| Tangentially, my language development hot take is that YAML
| is a really great platform to start with, so you can focus
| on prototyping your runtime & semantics and kick the can on
| parsing & syntax. Parsing takes up a lot of time and you
| might not know if this language idea is worthwhile or not
| yet. Additionally, YAML is pretty darn good and you may
| never need to move off of it.
| diffxx wrote:
| > for the languages I hack on, they're DSLs that wouldn't
| be appropriate to write an editor in
|
| Ha, I have been working on a DSL for writing DSLs, which
| perhaps explains my perspective a bit.
| maxbond wrote:
| Sounds cool! If you have something public I'd love to see
| it. Email is in my profile.
| diffxx wrote:
| It's not quite ready to share but hopefully soon :)
| maxbond wrote:
| Best of luck & happy hacking
| xyzzy4747 wrote:
| For max optimization, wouldn't it be better to create a Rust or C
| library for parsing that Go links into? I personally don't see
| the usefulness of trying to optimize Go itself too much as it's
| handicapped by the runtime and garbage collection.
| Thaxll wrote:
    | I've seen some real world examples where Go was as fast as or
    | faster than Rust for CPU / IO intensive tasks.
|
| Go is a fast language even with a GC.
|
| https://github.com/boyter/scc/#performance
| akira2501 wrote:
    | For maximum return on investment, wouldn't it be better to focus
    | on something other than raw speed? I personally don't see the
    | usefulness of trying to make everything in Rust itself too much
    | as it's handicapped by its compiler and lack of specification.
| tester756 wrote:
    | A C library for parsing?
    |
    | Isn't that dangerous from a security perspective?
| xyzzy4747 wrote:
| It's just an example. The options are really Rust (what I'd
| prefer), C++, C, or perhaps something like Nim that compiles
| to C.
|
| If you're trying to make an unoptimized parser, then use
| whatever you want.
| [deleted]
| [deleted]
| pharmakom wrote:
| For some reason we insist that language parsers are implemented
| in the language itself, even when the language isn't great for
| parsers.
| 37ef_ced3 wrote:
| You're in for a big surprise. Try using the language.
|
| Spend some time using Go, and you will be impressed by its
| performance.
|
| You'll wonder, "Were all those haters on Hacker News
| misinformed?"
| bugfix-66 wrote:
| Correct, Go is fast, very close to C.
|
| And just like in C, if you want to avoid memory management
| overhead you can use a slice of structs, integers instead of
| pointers, and a freelist (if needed). For example, here is a
| pointerless sparse bit vector:
|
    | https://bugfix-66.com/7256e0772dc3b02d72abf15b171731c933fd44...
|
| The article is storing parses in a balanced binary tree, like
| a packrat memoizing parser.
|
| Here is the fastest balanced search tree in Go. It allocates
| (and uses Go's garbage collector) but you can easily use a
| slice of structs with integer index pointers and a freelist
| instead:
|
    | https://bugfix-66.com/c93e950965804eba90a34e0055985b1c42d5a1...
|
| The above code will perform very similarly to C.
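The slice-of-structs idiom described above can be sketched without reference to the linked code. The following is a toy (unbalanced) binary search tree that uses `int32` indices in place of pointers, with a freelist for recycled slots; it is an illustration of the general technique only, not the linked implementation:

```go
package main

import "fmt"

// node uses int32 indices instead of pointers; -1 means "nil".
type node struct {
	key         int
	left, right int32
}

// tree keeps all nodes in one contiguous slice plus a freelist of
// recycled indices, so the GC sees only a couple of allocations.
type tree struct {
	nodes []node
	free  []int32 // indices of deleted nodes, ready for reuse
	root  int32
}

func newTree() *tree { return &tree{root: -1} }

// alloc reuses a freed slot if one exists, otherwise grows the slice.
func (t *tree) alloc(key int) int32 {
	if n := len(t.free); n > 0 {
		i := t.free[n-1]
		t.free = t.free[:n-1]
		t.nodes[i] = node{key: key, left: -1, right: -1}
		return i
	}
	t.nodes = append(t.nodes, node{key: key, left: -1, right: -1})
	return int32(len(t.nodes) - 1)
}

// insert is a plain (unbalanced) BST insert, enough to show the idiom.
// We allocate first, then walk by index, so a slice reallocation in
// alloc can never invalidate anything we hold.
func (t *tree) insert(key int) {
	n := t.alloc(key)
	if t.root == -1 {
		t.root = n
		return
	}
	i := t.root
	for {
		if key < t.nodes[i].key {
			if t.nodes[i].left == -1 {
				t.nodes[i].left = n
				return
			}
			i = t.nodes[i].left
		} else {
			if t.nodes[i].right == -1 {
				t.nodes[i].right = n
				return
			}
			i = t.nodes[i].right
		}
	}
}

func (t *tree) contains(key int) bool {
	for i := t.root; i != -1; {
		n := t.nodes[i]
		switch {
		case key == n.key:
			return true
		case key < n.key:
			i = n.left
		default:
			i = n.right
		}
	}
	return false
}

func main() {
	t := newTree()
	for _, k := range []int{5, 2, 8, 1} {
		t.insert(k)
	}
	fmt.Println(t.contains(8), t.contains(7)) // true false
}
```

Because the nodes live in one slice and link by index, the structure survives being copied, serialized, or handed to a freelist without the GC ever tracing interior pointers.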
| xyzzy4747 wrote:
| If you're making something requiring CPU optimization as a
| core feature, you might as well go with one of the fastest
| languages instead of handicapping your project from Day 1. Go
| is not considered one of the fastest. It's better for network
| or filesystem logic that is I/O limited.
| samatman wrote:
| The optimization here is using incremental parsing, so that
| changing parse state goes from O(n) to may-as-well-be-O(1).
| It's probably linear with tree depth.
|
| Any language is fast enough to do this, certainly Go is.
| Naive parser combinators written in slow languages can
| tokenize six-figure LOC files fast enough that the user
| won't notice.
| jerf wrote:
| This is kind of a test of how nuanced your understanding of
| programming languages can be.
|
| Rust with a bit of effort put into optimization will be
| faster than Go with a bit of effort put into optimization,
| it is true. However, you need to double-check your
| intuition for how big and how consequential the delta is,
| because I'd guesstimate it as roughly a factor of two,
| possibly a touch less. It is true that Rust does a
| _crapton_ more "optimizations", but a _lot_ of those
| optimizations have diminishing returns.
|
| A factor of 2 may still sound large, but in practice it
| isn't as large as it sounds, because my qualification "a
| bit of effort put into optimization" is not redundant. Go
| with a bit of optimization will probably beat someone's
| first draft of Rust. Go with a ton of careful optimization
| will probably beat Rust with a bit of optimization. The raw
| performance of the two are reasonably close, and smaller
| than the improvements you can usually get with
| optimization. So Rust's speed advantage, which is real,
| generally only matters in cases where you're going to
| optimize very heavily.
|
| Is this one of them? For that I can give a solid... Maybe!
| There are times and places where parsing is something you
| want optimized to within an inch of its life, certainly.
| However... it isn't all the places, and some of your
| intuitions may lead you astray if you're not careful; you
| might think a heavy duty programming language would need a
| great parser, but if it's going to chew on optimizations
| for possibly literally 100x the time, it may matter a lot
| less.
|
| In general, Rust is capable of going faster than Go (again,
    | I'd guesstimate about a factor of 2, with isolated tasks where
    | it may be able to go even faster, but that only matters if
| that's the bulk of your program), but Go is going to be
| fast enough that that only matters in certain limited
| places where you're willing to put some non-trivial effort
| into performance in the first place.
|
| This is in contrast to a comparison between Go/Rust and
| Python, where even casually written Go/Rust can outpace
| optimized pure Python, even before we start talking about
| how much better Go/Rust will be using multiple CPUs. This
| is because Python is just that slow, and let's not even
| talk about how slow Python can be if you don't write it to
| be optimized and you start heavily using all its features
| without realizing how expensive they are. From the point of
| view of Python, Go and Rust have very similar performance
| characteristics. (But then, of course, one must be careful
| with Python because something like NumPy will blow your
| socks off when it turns out to not really be Python at
| all.)
|
| It's a rich, nuanced problem space and should not be
| approached with sloganeering and "my language is better
| than yours".
|
    | My summary of Go is: It's a slow compiled language... but
    | it is still a compiled language, and it is faster than
    | pretty much everything that isn't, with the possible
    | exception of LuaJIT, and the delta between the slowest and
    | fastest compiled languages is only ~2-3x, which in the overall
    | space of programming language speed isn't actually that great.
| kaba0 wrote:
    | Not sure if Rust vs Go would be the best example here.
    | Rust vs Java would be a better one -- Go has a very
    | primitive GC in comparison, and Java does optimize hot
    | loops to a higher degree, so a naive code base would be
    | very hard to beat in a lower level language.
| richieartoul wrote:
| I do a lot of "high throughput" stuff at work in both Go
| and Java, and the Go stuff is usually faster by default.
|
| Java tends to win for really naive programs where the
| author didn't bother caring about performance or
| allocations at all, but if any care was put into it at
| all Go usually wins in my experience.
|
    | The trope that Go's GC is primitive in comparison to
    | Java's is not really accurate. You can't consider a
    | language's GC in isolation.
|
| Java's GC and JIT are extremely complex because the
| language semantics are terrible for performance by
    | default. The "everything is an object" model made sense
    | when the language was designed and main memory access
    | times were roughly equal to a CPU cycle, but that's no
    | longer true by a factor of 100 to 200 now.
|
| Go's GC makes different trade offs (low latency,
| extremely high concurrency, tight integration with the
| runtime and scheduler) because the language semantics are
| much more sympathetic to modern hardware ("true" structs,
| automatic escape analysis, etc), so it can.
| kaba0 wrote:
    | Sure, Go can get away with a more primitive GC exactly
    | because it has "value types", so less garbage is created.
    | But they are still much worse: lower latency only means
    | that they pause threads to get more breathing space if
    | they have been allocating too heavily; they are
    | absolutely not even close to the same league as Java's
    | low-latency ZGC.
| geodel wrote:
| > they are absolutely not even close to the same league
| Java's low latency ZGC does
|
| This is the kind of thing always offered without any
| serious numbers extracted from real life or even
| realistic test programs.
|
    | So even if technically true in a very narrow sense, it is
    | more of _high performance car marketing_ with fancy
    | algorithm and data structure names. By the time GCs are
    | used in end user programs with tons of libraries,
    | frameworks, design patterns, and inefficiently implemented
    | business rules, those GCs show little of the difference
    | that the fancy ads promised on _TV_.
| richieartoul wrote:
| It's really the usage of the word primitive that I'm
| arguing with. Java's GC comes with a lot of additional
| trade offs that Go's doesn't.
|
| For example, the fact that the Java GC is copying and
| generational means that there is a LOT more overhead
| introduced by write barriers.
|
| If you benchmark the rate at which the GCs can clean up
| garbage, Java always wins, but the Java GC impairs you a
| lot more in other situations that the Go one doesn't.
|
| It's trade offs, but the Go one makes much better trade
| offs for modern hardware IMO.
| kaba0 wrote:
| Write barriers are a single local conditional on the fast
| path, if I'm not mistaken. Also, since a JIT compiler is
| in action, it may have a much wider range than every
| object interaction. It's basically free on modern
| computers with branch prediction.
|
| ZGC (the low-lat GC) does employ read barriers though
| which will decrease throughput considerably, but latency
| and throughput are almost universally opposites of each
| other.
| chrsig wrote:
| Do you have benchmarks you can share?
| kaba0 wrote:
| Well, cross-language benchmarking is always hard, but for
| purely testing the GC this is not too bad:
| https://benchmarksgame-
| team.pages.debian.net/benchmarksgame/...
|
    | See how far ahead Java is of any other managed language (and
    | it doesn't really make sense to do this benchmark with
    | non-GCd languages)? Though this is done with the G1 GC,
    | not the low-latency one - this default GC is more
    | throughput oriented with a max pause time target goal.
    | Also note how Java uses multiple times more memory,
    | as it "postpones running GC when it knows it will still
    | have enough time to collect all of it without missing
    | the target goal" - this is also the reason why Java is
    | quite far ahead on "energy efficiency" reports as well. And
    | also, GCs work better with more heap space.
| geodel wrote:
| > this is also the reason why java is quite ahead on
| "energy efficiency" reports as well.
|
    | Very soon businesses will be asking for "dollar
    | efficiency" also. I think, going by the effort of Java and
    | framework vendors to pack more instances of Java
    | processes/pods onto a VM, it is already being asked for by
    | tech savvy customers.
    |
    | So that old _fact_ that on the server side customers
    | only care about raw throughput and not about machine
    | size because _RAM/CPU/disk is cheap_ is not holding up well
    | in cloud based deployments, where now each of these
    | matters.
| kaba0 wrote:
    | To be honest, I really don't get this microservice/cloud
    | hype. Stack Overflow (which, let's be honest, will be bigger
    | than that 34th startup) runs on a single (very beefy
    | though) dedicated server machine.
    |
    | I pay like 5 dollars a month for a VM with very low
    | specs, but even that will happily run anything.
    | Especially since the DB likely can't be sharded, the bus
    | factor will remain 1.
| fsdjkflsjfsoij wrote:
| Java requires a much more advanced GC and JIT because
| Java programs tend to allocate a lot more and have
| extremely bad memory layout when you're not restricting
| yourself to primitives. Project Valhalla's value types
| will significantly improve the situation. Relying so
| heavily on the JIT also has other problems especially in
| programs that have widely varying execution paths.
| kaba0 wrote:
    | Sure, that's exactly the incentive for the team to have
    | spent many, many work hours improving their GC - just
    | because the JVM typically depends more on a good GC doesn't
    | make it any less useful - long running processes do make
    | significant use of automatic memory management.
|
| Also, Java's GCs are moving/compacting GCs, so while the
| immediate memory representation is indeed inefficient,
| again, for long running processes Java will place often
| together-used objects physically close to each other, and
| will defragment the heap. But Valhalla can't come soon
| enough, I agree.
|
| > Relying so heavily on the JIT also has other problems
| especially in programs that have widely varying execution
| paths
|
| Has it? I would think that an AOT program would have a
| worse time with widely varying execution paths, while a
| JIT compiler is free to reoptimise based on changing
| application state.
| geodel wrote:
| > just because the JVM typically depends more on a good
| GC doesn't make it any less useful -
|
    | I mean, it feels like a personal choice. Do I _praise_ my
    | spouse when they bring the whole kitchen down while making
    | a dish and clean up quickly afterwards? Or do I take it
    | as "Well, you made the mess, so it was a basic expectation
    | that you clean up fast for later use"?
| kaba0 wrote:
    | I would wager that most applications have plenty of
    | object lifetimes that are not regular at all -- a web
    | server with some stateful sessions, for example. So your
    | analogy doesn't really make sense -- Go can't avoid these
    | situations at all and will perform significantly worse in
    | these cases.
| foldr wrote:
| You'd generally expect Rust and Go to perform about the
| same for CPU bound workloads. Rust has access to more
| advanced codegen and optimizations via LLVM, but Go's
| garbage collector will often be faster than refcounting (or
| whatever manual memory management technique your Rust code
| ends up using). This is especially so given that the GC
| runs on a separate thread without any additional effort on
| your part, making it almost 'free' in the context of
| parsers (which tend to be single threaded).
|
| A real world example of this is esbuild. The author posted
| on HN that his initial Rust implementation was actually
| somewhat slower than the subsequent Go version.
| super_flanker wrote:
| > Go's garbage collector will often be faster than
| refcounting (or whatever manual memory management
| technique your Rust code ends up using)
|
    | I'm not supporting the argument that everything should be
    | written in Rust (or whatever) for good performance.
    | However, blanket statements like this are not true; micro-
    | benchmarks are often misleading. There are many factors
    | which affect performance, and they come with
    | tradeoffs, so you can choose which options favor you most.
    | In the end, Rust objectively offers more ways to optimize
    | your program.
| dymk wrote:
| This is a premature optimization, and keeping everything in
| the same language has benefits like greatly simplified
| tooling and building
| xyzzy4747 wrote:
    | It's not a premature optimization - it's deciding how far
    | the parser can ever be optimized in the future.
    | Choosing Go sets a lower ceiling.
|
| > Keeping everything in the same language has benefits
| like greatly simplified tooling and building
|
| Surely there are other Go libraries that incorporate C,
| C++, or Rust? Also if both parsers existed and were
| equally easy to set up, and you were planning on doing a
| ton of parsing, it would make sense to go with the faster
| one.
| dymk wrote:
| It absolutely is a premature optimization. If it's fast
| enough, then it's fast enough. The author hasn't
| indicated that the current Go implementation is hitting a
| ceiling imposed by the language yet.
|
| If you'd like to, you can provide some real-world
| examples - or even microbenchmarks - showing that Go is
| so much slower than <your choice here> that it's going to
| make a difference.
|
| > Also if both parsers existed and were equally easy to
| set up
|
    | They're not equally easy to set up. Language interop is a
    | pain in the ass.
| Jtsummers wrote:
| Look at the current Makefile:
|
| https://github.com/aretext/aretext/blob/main/Makefile
|
| Build is literally a `go build ...` and install is `go
| install`. Adding any other language to the mix would make
| this a polyglot project and not be "equally easy to set
| up". The other question is, do both parsers exist? In
| this write-up they point to tree-sitter as a possibility
| which is a JS program that produces C code. This would be
| viable, but here's the author's take:
|
| > I considered integrating tree-sitter, an incremental
| parsing library with parsers for many existing languages.
| However, running JavaScript to generate parsers and
| linking to a C library would have greatly complicated the
| build process. Today, aretext can be built on almost any
| platform using a single go install command. I've had
| users install aretext on ARM laptops, FreeBSD servers,
| Chromebooks, and Android phones. _To maintain
| portability, I wanted a pure Go implementation._
|
| So this wasn't some casual decision, but something they
| at least considered long enough to describe here.
|
| And the parsing library itself is only around 1200 lines
| total (comments, blanks, and code). The parsers for each
| language add a lot more, of course, but should be roughly
| equivalent given the same library and interface. I
| imagine that if this project really takes off and
| performance becomes a real problem they can do the
| rewrite at that point. Right now, the code works, seems
| to work fast enough for its author and primary users, and
| it's trivial to install on any platform supported by Go.
| So yes, it would have been a premature optimization to
| complicate the build process, probably reduce the number
| of supported platforms (or greatly increase the effort to
| support the same number of platforms), just to have a
| slightly faster parser.
| rollcat wrote:
| One of Go's primary goals has always been compilation speed.
|
| Go started out in C, and was later (post-1.0) incrementally
| rewritten to be self-hosting. One of the authors (Ken Thompson)
| is also one of the co-creators of C. I would argue these guys
| know what they are doing.
| kaba0 wrote:
    | I don't know; not implementing generics when they were
    | pretty obviously needed was a huge oversight, so I'm not
    | sure.
    |
    | Also, the reason for compiler bootstrapping is more of a
    | "beauty thing" than practicality. It would definitely be
    | faster in a low-level language, but I doubt it would matter
    | to an end user.
| fredrikholm wrote:
    | You aren't sure if _Ken Thompson_ knows what he's doing?
| kaba0 wrote:
    | As a software architect? Absolutely. Programming language
    | designer? Not sure; neither C nor Go is a good language in
    | my personal opinion.
|
| EDIT: I meant to write that I think very highly of him as
| an architect/developer.
| sjansen wrote:
| Experience has shown that often "worse is better". Go
| does an amazing job of balancing complexity and power. I
    | haven't seen a "better" language that isn't either
    | slower, harder to become productive in, or both.
|
| https://en.wikipedia.org/wiki/Worse_is_better
| kcartlidge wrote:
| > _not implementing generics when it was pretty obviously
| needed was a huge oversight_
|
    | I _get_ the desire for generics. I do a lot of C# and have
    | used generics for many years. Yet I've been writing
    | Go for around 6 or 7 years, and other than in the beginning
    | (when I was new to it) I haven't found myself missing them
    | at all.
|
| In other words, for many people the lack of generics comes
| across as an oversight. For others, including myself
| (again, a heavy generics user in C#) that really isn't the
| case. I write Go in the style of Go and it just hasn't been
| an issue.
|
| Blanket statements are rarely true. YMMV.
| kaba0 wrote:
| Well, it is not a blanket statement, it's just the
| generic truth (pun not intended) based on decades of
| evolution of programming languages and a relatively
| expensive mistake for Java, which would have been a
| perfect opportunity to learn from.
|
    | Sure, they are seldom missed by an end user, but for a
    | library author they are essential. That's why map and the
    | like had to be hard coded into the language, and why
    | concurrent versions couldn't be implemented for a long
    | time in the language, the way they have been in Java
    | forever.
| kcartlidge wrote:
| > _Well, it is not a blanket statement, it's just the
| generic truth_
|
| A bit contradictory, really, but that's just semantics I
| suppose.
|
| More importantly even if you could say 99% of devs agree
| (and you can't because they don't) that still doesn't
| make it an _oversight_.
|
    | If they'd neglected to add generics because it _wasn't
    | considered_, that's an oversight. If it was neglected
    | because in the opinion of the creators of the language it
    | _wasn't needed for the purposes they created it for_,
    | that isn't an oversight but a thought-through engineering
    | decision.
|
| Of course you're free to disagree with that decision, but
| an oversight it was not.
| kaba0 wrote:
    | Fair enough, I may not have used the correct word, but it
    | is still a typical "told you so" situation, both during
    | development, after Go's initial appearance, and ever since,
    | until it was finally decided that they should indeed be
    | implemented.
| kcartlidge wrote:
| True.
|
| Whilst I haven't missed it in Go myself, enough other
| people say they _do_ that its inclusion was inevitable.
| Which means it probably should have gone in sooner.
| chrsig wrote:
    | There's a big misconception that the creators of Go
    | didn't _want_ generics. They've stated a number of times
    | that they didn't have a design that they all thought was
    | adequate.
|
| After several years and attempts at a good enough
| proposal, Ian Lance Taylor put one out that was able to
| cross the finish line, and now we have generics.
| kcartlidge wrote:
    | > _They've stated a number of times that they didn't
    | have a design that they all thought was adequate_
    |
    | You know what, with all the 'discussions' in recent years
    | about _whether_ Go should have generics I'd actually
    | lost track of that amongst all the noise. Which is
    | irritating, as I do now remember the early conversations
    | about it.
    |
    | Thank you for the reminder.
| sesm wrote:
    | This article uses the word 'rune' extensively. From the context I
    | assumed it means 'lexeme' or 'token' (i.e. the unit the
    | lexer/scanner produces and feeds to the parser). But then the
    | article uses the word 'token' to mean the output of the parser
    | ('keyword token'), while in the usual terminology the parser's
    | output is called a 'parse tree'.
    |
    | So, in this terminology, the parser consumes 'runes' and outputs
    | 'tokens', while in the usual terminology the parser consumes
    | 'tokens'/'lexemes' and outputs a 'parse tree'.
| pgwhalen wrote:
    | Rune is a type alias in Go, which more or less maps to the more
    | common terms "character" or "code point".
|
| https://go.dev/blog/strings
| sesm wrote:
| Thanks! I don't know Go, so this terminology was surprising
| to me.
| sjansen wrote:
| In Go, `rune` is an alias for `int32` and is used to indicate
| the value is a Unicode "code point".
|
| For characters in the ASCII range, that means it's just a
| character encoded using more bits. If you need to worry about
| the full Unicode range then it's important to understand
| Unicode Normalization Forms.
|
| https://go.dev/blog/strings
|
| https://en.wikipedia.org/wiki/Unicode_equivalence
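A quick illustration of the byte/rune distinction described above (the string here is just an example):

```go
package main

import "fmt"

func main() {
	s := "héllo"
	// Indexing or len() on a string works in bytes:
	fmt.Println(len(s)) // 6: 'é' takes two bytes in UTF-8
	// Ranging over a string decodes runes; i is the byte offset
	// of each rune, so it can skip numbers.
	for i, r := range s {
		fmt.Printf("%d:%c ", i, r)
	}
	fmt.Println()
	// Converting to []rune counts code points instead.
	fmt.Println(len([]rune(s))) // 5
}
```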
| tester756 wrote:
    | The difference in complexity between an IDE's parser and a
    | compiler's parser feels like an order of magnitude.
    |
    | IDEs you want to be very fast, so you use techniques like
    | partial tree reparse, and now when I think about it, you also
    | may need to update other places.
    |
    | Like when you use a type that is defined somewhere below:
    |
    | at the first parse that type definition doesn't compile, so the
    | type usage above shows an error,
    |
    | and when you change the type definition so it compiles, then if
    | you only update that part of the tree, the earlier part would
    | still scream about the error.
    |
    | It's really tricky.
    |
    | The theory behind how to deal with all of these problems seems
    | to be easy,
    |
    | but when you actually get to the coding, you have to be
    | really thoughtful, careful, and experienced in order to get the
    | modeling right.
|
| https://learn.microsoft.com/en-us/shows/seth-juarez/anders-h...
|
| ________
|
    | I have some experience with simpler and more complex parsers,
    |
    | and for me no other type of software requires as much careful
    | thought as parsers do, if you want to address all those things
    | like:
    |
    | correctness, speed, recovery on broken code fragments, good
    | error messages, good code, maybe concurrency.
| binwiederhier wrote:
    | Interesting read. Thank you for sharing. I always found parsers
    | fascinating and mystical. It seems like these parser functions
    | (which I think are analogous to what Rob Pike calls state
    | functions) are a common way to do parsing, though I know very
    | little about it. I especially found the combinators intriguing,
    | though I don't care much for the functional programming syntax in
    | a language like Go.
|
| Anyway, thanks for sharing.
|
| Tangentially, I wrote a little mini parser [0] of my own for my
| side project. It is inspired by Rob Pike's talk on parsers [1].
| It doesn't use state functions, but instead just uses the call
| stack to keep track of where we are.
|
| [0]
| https://github.com/binwiederhier/ntfy/blob/main/server/actio...
|
| [1] https://www.youtube.com/watch?v=HxaD_trXwRE and
| https://go.dev/src/text/template/parse/lex.go
| skohan wrote:
| Yeah parsers are fun! We did a recursive descent parser for a
| toy language in uni and I think it was one of the most
| illuminating and fun projects we did at school.
|
    | Lately I've been working on a tool to make it easy to implement
    | a parser, and I end up using it for everything, because DSLs
    | are so nice to work with.
| [deleted]
| throwaway290 wrote:
| Off-topic but are there any aretext users? How does it fare?
| pharmakom wrote:
| > a successful parse consumes at least one rune
|
| This avoids infinite loops?
| Jtsummers wrote:
    | Yes, there would be two outcomes for an attempted parse.
    | Either it succeeds and makes progress (and eventually
    | terminates) or it fails and consumes nothing (and terminates
    | because eventually you run out of parsers to try).
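That invariant can be shown with a small sketch. This is a hypothetical tokenizer loop (not the article's code) in which every iteration consumes at least one rune, so termination is guaranteed even on input no parser recognizes:

```go
package main

import "fmt"

// parseFunc tries to match a prefix of input and returns the number
// of runes consumed; 0 means the parse failed.
type parseFunc func(input []rune) int

// digits matches a run of ASCII digits.
func digits(in []rune) int {
	n := 0
	for n < len(in) && in[n] >= '0' && in[n] <= '9' {
		n++
	}
	return n
}

// spaces matches a run of spaces.
func spaces(in []rune) int {
	n := 0
	for n < len(in) && in[n] == ' ' {
		n++
	}
	return n
}

// tokenize applies the first parser that succeeds, falling back to
// consuming a single rune when none match. Since every iteration
// consumes at least one rune, the loop always terminates.
func tokenize(input []rune, parsers []parseFunc) int {
	tokens := 0
	for len(input) > 0 {
		n := 0
		for _, p := range parsers {
			if n = p(input); n > 0 {
				break
			}
		}
		if n == 0 {
			n = 1 // catch-all: skip one rune so progress is guaranteed
		}
		input = input[n:]
		tokens++
	}
	return tokens
}

func main() {
	// "12", " ", "34", and the unrecognized "x" -> 4 tokens.
	fmt.Println(tokenize([]rune("12 34x"), []parseFunc{digits, spaces})) // 4
}
```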
| hk__2 wrote:
    | If you want a hard/interesting parsing challenge, try Clojure's
    | #_ reader macro. It's a powerful construct that allows you to
    | comment out the next form. If you're not used to Clojure, it's
    | like writing #_ in front of anything -- a function, an array, a
    | keyword, etc. -- to comment it out, even if it spans multiple
    | lines.
|
    | For example:
    |
    |     #_ (defn foo [x y]
    |          (println x y))
    |
    | This is equivalent to commenting out the three lines. Things
    | become even harder when you learn that these things "swallow"
    | the next form and can be used anywhere:
    |
    |     (let [a #_ b 43]
    |       #_ #_ hello (H N) (print a))
    |
    | The code above is equivalent to the following:
    |
    |     (let [a 43] (print a))
    |
    | All the rest is comments.
|
| The hard/interesting bit is that to tokenize you must construct a
| syntax tree in order to correctly parse the next form, but in
| order to construct a syntax tree you first need to tokenize the
| code.
| derek8bai wrote:
| cool stuff
___________________________________________________________________
(page generated 2022-10-22 23:00 UTC)