[HN Gopher] Tree-sitter: an incremental parsing system for progr...
___________________________________________________________________
Tree-sitter: an incremental parsing system for programming tools
Author : sbt567
Score : 331 points
Date : 2021-02-22 15:03 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| alissasobo wrote:
| You can watch a good Strangeloop presentation on Tree Sitter.
| https://www.youtube.com/watch?v=Jes3bD6P0To
| wiradikusuma wrote:
| While we're in this discussion: Say I want to implement "SQL" for
| my app (if you've used Jira, I want to make my own JQL). Is this
| the tool for that? I'm looking for something much simpler than
| ANTLR.
| Grimm1 wrote:
| I recently used this to put together a unified PL classification
| model. It's nice because any language treesitter grows to support
| we'll support pretty effortlessly and treesitter captures more
| than enough nuance per language to derive high quality
| classifications.
|
| It's fair to say we can classify a snippet of code based on
| either single or multiple AST paths produced by treesitter. Right
| now only doing the programming language but extending it to
| function classification or description etc isn't out of the
| question we just don't need it right now.
| drewdennison wrote:
| We've been using tree-sitter for Semgrep and it's nothing short
| of incredible. Amazing work by Max and team.
| ahelwer wrote:
| I half-wrote a tree-sitter grammar for a niche DSL (the PRISM
| probabilistic model checking language). It was a very nice
| experience. It's part of another half-written side project to
| create a language server for PRISM; I still haven't gotten around
| to making the whole end-to-end pipeline work.
|
| With its syntax tree query frontend I wonder whether tree-sitter
| would make a good interpreter frontend for some niche languages,
| or you need something more powerful.
| amelius wrote:
| Next steps: incrementally resolve symbols and type-check?
| dcreager wrote:
| We're currently working on a more precise version of the Code
| Nav that's shipped on github.com, which is very similar in
| spirit to this!
| ritter2a wrote:
| I tried to use this to ease the front end work load of students
| in a compiler project (building a C compiler) for a University
| course, so that the project could be focused on the more
| interesting middle and back end parts of the compiler. However,
| reported bugs in the C grammar that saw no activity at all [1]
| made this impossible. From this small sample of experiences, I
| was left with the impression that Tree Sitter is great for things
| like syntax highlighting, where wrong results are annoying but
| not dramatic, but not so suitable for tools that need a really
| correct syntax tree.
|
| --- [1] https://github.com/tree-sitter/tree-sitter-c/issues/51
| dang wrote:
| If curious, past threads:
|
| _Tree-sitter: new incremental parsing system for programming
| tools (2018) [video]_ -
| https://news.ycombinator.com/item?id=21675113 - Dec 2019 (28
| comments)
|
| _Tree-sitter - a new parsing system for programming tools
| [video]_ - https://news.ycombinator.com/item?id=18213022 - Oct
| 2018 (25 comments)
|
| Others?
| maxbrunsfeld wrote:
| One more that I know of:
|
| _Atom understands your code better than ever before_ -
| https://news.ycombinator.com/item?id=18349013 - Oct 2018
| Annili wrote:
| I'm curious to see if Tree-sitter can be used to provide fast and
| rich code navigation. I was able to implement simple goto
| definition/references [1], not sure if it can be used for more
| advanced navigation features in a language-agnostic way.
|
| If you're interested, GitHub is already using it [2] for that
| purpose, and Sourcegraph is experimenting with it [3].
|
| [1] https://github.com/alidn/lsif-os
| [2] https://github.com/github/semantic
| [3] https://github.com/sourcegraph/sourcegraph/issues/17378
| maxbrunsfeld wrote:
| At GitHub, we're in the process of building a more precise code
| navigation system on top of Tree-sitter, that models language-
| specific name-resolution rules in detail.
|
| Our currently-available code navigation system also uses Tree-
| sitter, but it is pretty simple; it just matches up references
| and definitions by their name.
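The name-based matching described above can be sketched roughly as follows. This is only an illustration, not GitHub's implementation: the `Symbol` record and `match_by_name` function are hypothetical names, and the hand-written lists stand in for whatever a real tool would extract from Tree-sitter parse trees.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """A named occurrence in source code (hypothetical record type)."""
    name: str
    path: str
    line: int


def match_by_name(definitions, references):
    """Pair each reference with every definition that shares its name.

    No scoping or import rules are applied, so a reference to `parse`
    matches every definition called `parse`, wherever it lives.
    """
    by_name = defaultdict(list)
    for definition in definitions:
        by_name[definition.name].append(definition)
    return {ref: by_name.get(ref.name, []) for ref in references}


defs = [
    Symbol("parse", "a.py", 3),
    Symbol("parse", "b.py", 10),
    Symbol("emit", "a.py", 20),
]
refs = [Symbol("parse", "main.py", 7), Symbol("missing", "main.py", 9)]
matches = match_by_name(defs, refs)
```

The ambiguity is visible in the result: the single `parse` reference maps to both definitions, which is the imprecision that language-specific name-resolution rules would be meant to remove.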
| himujjal wrote:
| Wrote tree-sitter-svelte. Was a good experience. I am also
| writing a programming language of my own similar to TypeScript
| and I am using tree-sitter for the same. It's a delight to work
| with it. Removes a lot of the worries.
| ducktective wrote:
| Is this the same thing neovim uses for syntax highlighting?
|
| Is there a chance for it getting integrated to vim? Last I
| checked vim used a regex method which was slow and faulty.
| ckolkey wrote:
| Yup, neovim 0.5+ will be using treesitter for any supported
| languages, with the current Regex highlighting as a fallback.
| [deleted]
| mkingston wrote:
| Follow the nvim 0.5 release here:
| https://github.com/neovim/neovim/milestone/19
| guerrilla wrote:
| Is the use case for this mainly IDEs or is it intended to replace
| traditional lexer and parser generators too?
| dcreager wrote:
| We are also using this to power a lot of the program analysis
| features on github.com. We use it to generate the symbol list
| for Code Navigation, as an example, and are starting to look at
| extracting more semantic information about some languages using
| tree-sitter parse trees as intermediaries.
| patrec wrote:
| I have used tree-sitter, but only for a very simple use case.
| The main shortcoming I am aware of is error messages; see
| here:
|
| https://github.com/tree-sitter/tree-sitter/issues/255
|
| Tree sitter will basically always generate a parse tree, even
| for malformed input, in which case it will add ERROR nodes for
| the bits it doesn't like (it will also inform you that there
| were problems with the parse by setting a boolean attribute).
| So you have some information you can use to construct a useful
| error message yourself, but some parser generators will handle
| this better (although it has to be said that the difficulty of
| obtaining good error messages from a parser generator is still
| one of the main reasons production parsers are mostly
| written by hand).
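A first pass at building error messages from ERROR nodes might look like the sketch below. The `Node` class is a simplified stand-in invented for this example, not the Tree-sitter API; it only assumes that each node has a type, a zero-based start position, and children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Simplified stand-in for a parse-tree node (not the Tree-sitter API)."""
    type: str
    row: int
    column: int
    children: list = field(default_factory=list)


def collect_errors(node):
    """Yield every ERROR node in the tree, in depth-first order."""
    if node.type == "ERROR":
        yield node
    for child in node.children:
        yield from collect_errors(child)


def error_messages(root, filename):
    """Turn ERROR nodes into first-pass, compiler-style messages."""
    return [
        # Positions are zero-based, so add 1 for human-readable output.
        f"{filename}:{err.row + 1}:{err.column + 1}: syntax error"
        for err in collect_errors(root)
    ]


tree = Node("program", 0, 0, [
    Node("assignment", 0, 0),
    Node("ERROR", 1, 4, [Node("identifier", 1, 4)]),
])
messages = error_messages(tree, "test.rb")
```

A more helpful message would also inspect the ERROR node's children and neighbours to guess what was expected; that part is left out here.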
| guerrilla wrote:
| Ah I see, so the repair isn't avoidable for now? That
| doesn't seem very appropriate for compilers then.
| patrec wrote:
| Why would it not be appropriate? The only annoyance I see
| is that currently you will have to generate a good error
| message from it yourself, but a first pass at the problem
| shouldn't be too onerous.
| guerrilla wrote:
| Ok, I misunderstood. I thought it repaired without error
| sometimes but I see that you were clear that that isn't
| the case.
| srcreigh wrote:
| Tree Sitter is amazing. The parsing is fast enough to run on
| every keystroke. The parse tree is extremely concise and
| readable. It resembles an AST more than a parse tree (ie no 11
| levels of binary op precedence rules in the tree). The parse tree
| emits specific ERROR nodes, so you can get a semi-functional tree
| even with broken syntax.
|
| I can't wait for the tools to get built with this. Paredit for
| TypeScript. Syntax-tree based highlighting (vs regex
| highlighting). A command to "add an arg to current function"
| which works across languages. A command to add a CSS class to the
| nearest JSX node, or to walk up the tree at the className="| ..."
| position, adding a new className if it doesn't exist.
|
| There's a nicely documented Emacs package for this [1]. The
| documentation is at [2]. The parse trees work great. There's
| syntax highlighting support and tree-walking APIs. There's a bit
| of confusion about TSX vs typescript langs but it's fixable with
| some config change [3].
|
| [1]: https://github.com/ubolonton/emacs-tree-sitter
| [2]: https://ubolonton.github.io/emacs-tree-sitter/
| [3]: https://github.com/ubolonton/emacs-tree-sitter/issues/66#iss...
| Annili wrote:
| > "Paredit for TypeScript"
|
| Is there a list of ideas for Structural Editing in C-like
| languages?
|
| I can think of `extend-selection`, `move to parent block`, `add
| arg to function`
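As a rough illustration of `extend-selection`, the sketch below grows a selection to the smallest syntax node that strictly contains it. The `Node` class and the hand-built tree for `foo(a, b + c)` are assumptions made for the example; a real editor would get node ranges from its parser.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Simplified syntax node covering the half-open byte range [start, end)."""
    type: str
    start: int
    end: int
    children: list = field(default_factory=list)


def extend_selection(root, start, end):
    """Return the smallest node that strictly contains the selection.

    Falls back to the root when nothing larger exists.
    """
    best = root
    node = root
    while True:
        for child in node.children:
            covers = child.start <= start and end <= child.end
            strictly_larger = (child.start, child.end) != (start, end)
            if covers and strictly_larger:
                best = node = child
                break
        else:
            # No child is a strictly larger cover, so `best` is the answer.
            return best


# Hand-built tree for the source text `foo(a, b + c)`.
tree = Node("call", 0, 13, [
    Node("identifier", 0, 3),
    Node("arguments", 3, 13, [
        Node("identifier", 4, 5),
        Node("binary_expression", 7, 12, [
            Node("identifier", 7, 8),
            Node("identifier", 11, 12),
        ]),
    ]),
])
first = extend_selection(tree, 11, 12)                 # cursor on `c`
second = extend_selection(tree, first.start, first.end)
```

Repeated calls walk outward from `c` to `b + c` to `(a, b + c)`, which is the kind of motion Paredit provides for s-expressions.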
| dcreager wrote:
| Worth calling out that the syntax highlighting support is used
| to highlight several languages in github.com. (Linguist is
| still used for the long tail of languages, but we plan to
| migrate more and more over to tree-sitter-based highlighting
| over time.)
|
| The query language is also what's used to drive the
| fuzzy/ctags-like Code Navigation feature. Both of those are
| powered by tree-sitter query files defined in each language's
| repo, like these for Go:
| https://github.com/tree-sitter/tree-sitter-go/tree/master/qu...
| eins1234 wrote:
| Awesome to hear that amazing tech like tree-sitter lives on
| even though Atom, the product it was built for, is pretty
| much on life support at this point.
|
| Curious if there's any efforts to bring tree-sitter to
| VSCode? Exposing tree-sitter to extensions could open up so
| many possibilities like OP mentioned.
| josteink wrote:
| Tooting my own horn, Emacs' csharp-mode[1] is undergoing a
| rewrite to be 100% based on tree-sitter rather than regexps.
|
| The new code runs way faster and is so much nicer to work with.
|
| Once all the kinks are gone, I can't imagine going back.
|
| [1] https://github.com/emacs-csharp/csharp-mode/blob/master/csha...
| robto wrote:
| I'm so excited for this to become built-in in more places! I
| think once non-lisp users can experience the Power of
| Structural Editing they'll say, "Hey, I understand now why you
| all feel so passionate about your parentheses!"
|
| And I can stop feeling like my fingers have all lost a knuckle
| when I'm writing Typescript :)
| rgossiaux wrote:
| Neovim nightly already has some tools available as plugins. I'm
| using tree-sitter for syntax highlighting, text objects, and
| folding right now. Pretty satisfied so far.
| mkingston wrote:
| The official release of built-in treesitter comes with neovim
| 0.5, which _looks_ like it'll be out pretty soon. I've been
| watching a fairly steady march toward release here:
| https://github.com/neovim/neovim/milestone/19
| tazjin wrote:
| A friend of mine started working on an experimental Emacs mode
| to provide structural navigation of code based on tree-sitter:
| https://cs.tvl.fyi/depot/-/tree/users/Profpatsch/emacs-tree-...
|
| The potential for this is essentially something like Paredit,
| but for all languages.
| yewenjie wrote:
| Can someone point to some examples of what `paredit` for other
| languages provides? I do various Lisp programming occasionally
| but have not used `paredit` yet.
| tazjin wrote:
| Check out this video for a quick demo:
| http://emacsrocks.com/e14.html
|
| If you know a Lisp I recommend just giving paredit a spin
| for a few minutes, it's an interesting experience.
| z3t4 wrote:
| Looks like it's mainly tree/code manipulation. Typing
| code on the keyboard is probably the least taxing thing
| when it comes to software development. But I guess it
| will be nice once it has become a "reflex" rather than a
| conscious key-combo.
| tazjin wrote:
| It's not so much about reducing the amount of characters
| typed, and instead moving the way you think about code
| from the character level to a more structural level.
|
| Calling it a "reflex" is an interesting phrase! Tools
| like magit let me encode complicated processes into
| muscle memory, in a way where retrieval doesn't have to
| go through remembering and typing a string. Structural
| editing is similar.
| mumblemumble wrote:
| I only started using it a few months ago. It's such a
| natural way to edit code, it only took me about a day for
| it to become reflexive.
|
| Now it just feels vaguely annoying to work without it.
| It's fine, it's just one of those ergonomic changes that
| nags at you a bit. Kind of like the opposite of that
| feeling of taking off uncomfortable business clothes at
| the end of the day. Or what I imagine people who are
| better at vim than me keep talking about.
| lalaithion wrote:
| Maybe I can finally have this syntax highlighting style:
| https://youtu.be/b0EF0VTs9Dc?t=900
| srcreigh wrote:
| There is an emacs package for this (maybe beta). I can't
| remember the name of it and Google is failing me.
|
| EDIT: finally found it https://github.com/alphapapa/prism.el
| jackcviers3 wrote:
| Rainbow delimiters mode kind of does this, but doesn't
| maintain the scope color of referenced variables.
| brundolf wrote:
| The idea is pretty awesome, but my eyes nearly rolled out of
| my head from the needless condescension at the beginning.
| maxbrunsfeld wrote:
| Hey, Tree-sitter author here. Thanks for posting! Let me know if
| you have questions about the project.
| gravypod wrote:
| When I played around with tree sitter a bit I noticed there
| were situations where ast elements didn't exactly contain what
| I'd expect them to. For example: comments are represented in
| the AST but unfortunately they don't have the contents of the
| comment parsed out following the language's conventions.
|
| I was wondering if this is a case I could open an issue about?
| Is this for the main tree sitter repo or should I open one
| language-by-language?
|
| I was looking into automating some stuff across all languages
| with tree-sitter, but handling all of the languages' comment
| syntaxes made it very hard.
| maxbrunsfeld wrote:
| Most tree-sitter grammars just parse comments as a single
| token. Can you give an example of what you mean when you say
| "contents of the comment parsed out"?
|
| Are you talking about conventions like JSDoc, for putting
| structured data inside of comments? On GitHub, we handle that
| by parsing JSDoc comments in a separate pass, using a
| separate parser. We do it this way because JSDoc isn't really
| part of the JavaScript language, not all projects use JSDoc,
| and not all applications are interested in parsing the text
| inside of comments.
| gugagore wrote:
| My guess is that they meant parsing code that has been
| "commented out".
| lemming wrote:
| Is it possible to use tree-sitter to generate parsers in
| languages other than C? How hard would it be to modify it to
| create parsers in e.g. Java?
|
| _Edit:_ sorry, I just saw that you had answered that below.
| anaerobicover wrote:
| I've done two grammars for my own use in the last few months
| (well, one isn't quite complete yet) and it's been quite an
| enjoyable (learning) experience. Thanks for sharing this tool!
| maxbrunsfeld wrote:
| That's great to hear. Thanks!
| autoditype wrote:
| Thanks for building this. I had not heard of it before, but it
| looks great. Are there more tutorials elsewhere on the Internet
| you would recommend, besides what is in the documentation?
| maxbrunsfeld wrote:
| Not that I know of, right now :(.
|
| In the near future, we'll create some more GitHub-specific
| documentation that walks you through how to add advanced
| language support for any programming language on GitHub, by
| writing a Tree-sitter grammar, and then by writing the _tree
| queries_ that are used for syntax highlighting, simple code
| navigation, and someday soon... _precise code navigation_.
| yig wrote:
| Are there any plans to support modifying the grammar on the fly
| or without recompiling?
| maxbrunsfeld wrote:
| One day, I would love to generalize the web-based playground
| so that you could edit the grammars. But it's complicated,
| because we use C as our output language, so you would always
| need to recompile the C after changing the grammar.
|
| So, I would say that it's not on our near-term roadmap.
| dcreager wrote:
| I don't think you can do this without recompiling, since the
| grammars get translated into C code before use. But the
| built-in command line tools ('tree-sitter parse', etc) all
| support a mode where they will detect local changes to a
| checked-out grammar definition, and recompile on the fly if
| needed. (This happens each time the CLI program is started
| up; it doesn't happen during a long-running process.)
| sitkack wrote:
| The obvious answer is to embed TCC or another C compiler
| and either generate a dynamic library or generate wasm and
| load it directly into the process.
|
| exec_wasm(generate_wasm(generate_c(grammar)))
|
| Now if you can make that whole fn chain incremental, then a
| delta_grammar -> delta_c -> delta_wasm ->
| delta_recomputed_wasm_call stack, this will propagate
| deltas down to exec_wasm and you could dynamically execute
| the generated code as the grammar changes.
| akavel wrote:
| There's been some recent discussion as to whether tree-sitter
| grammars can be used to parse markdown with some hacks or not
| (currently it's being done by working around all the tree-
| sitter machinery, resulting in a lot of problems), with no
| consensus among plugin authors:
|
| https://github.com/nvim-treesitter/nvim-treesitter/issues/87...
|
| Could you possibly chime into that discussion and help them
| with any possible insights you might have on that? That would
| be really awesome! TIA <3
| fiddlerwoaroof wrote:
| I've been using tree-sitter via FFI from Common Lisp, but what
| I'd really like would be a way to write my own code generator
| so that the generated parser could be "native" lisp code.
| Otherwise, it's an amazing tool: my only other complaint would
| be the lack of a grammar for objective-c which would be useful
| for a lisp/objective-c bridge I've been working on.
| maxbrunsfeld wrote:
| I think that it'd be pretty easy to generate parser code in
| other languages besides C, but it would be a lot of work to
| port the core library itself [1] to those other
| languages.
|
| [1] https://github.com/tree-sitter/tree-sitter/tree/master/lib/s...
|
| I agree about the Objective-C grammar! Although it looks like
| somebody's started work on it:
|
| https://github.com/merico-dev/tree-sitter-objc
| josephg wrote:
| There's an architecture for compilers that I've been wanting
| for years where a keystroke change to the sourcecode results in
| an incremental change to the AST, and then the compiler can
| consume that AST delta to generate a binary patch to the
| compiled executable.
|
| Would tree-sitter be able to be used for that? (What I want is
| to feed tree-sitter a stream of keystroke changes and get out a
| stream of minimal AST changes as a result).
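The input side of that idea, turning a keystroke into a description of the changed region, can be sketched as below. The dictionary keys mirror the fields of the `TSInputEdit` struct in Tree-sitter's C API; the `describe_edit` and `point_at` helpers are hypothetical and written only for this example. Whether the output side (a minimal AST delta) is obtainable is the open question asked above, and this sketch does not address it.

```python
def point_at(data: bytes, offset: int):
    """Return the zero-based (row, column) of a byte offset."""
    row = data.count(b"\n", 0, offset)
    line_start = data.rfind(b"\n", 0, offset) + 1
    return (row, offset - line_start)


def describe_edit(old: bytes, new: bytes):
    """Describe the smallest changed region between two versions of a file."""
    limit = min(len(old), len(new))
    # Length of the common prefix.
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    # Length of the common suffix that does not overlap the prefix.
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    old_end = len(old) - suffix
    new_end = len(new) - suffix
    return {
        "start_byte": prefix,
        "old_end_byte": old_end,
        "new_end_byte": new_end,
        "start_point": point_at(old, prefix),
        "old_end_point": point_at(old, old_end),
        "new_end_point": point_at(new, new_end),
    }


# Typing a "2" after "14" on the first line.
edit = describe_edit(b"x = 14\nx\n", b"x = 142\nx\n")
```

An editor usually knows the edited range directly from the keystroke, so the diffing here is only a way to make the example self-contained.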
| chrisseaton wrote:
| Tree-sitter is unfathomable to me. This is the grammar for Ruby:
|
| https://github.com/tree-sitter/tree-sitter-ruby/blob/master/...
|
| I find it absolutely amazing that a grammar for something as
| complicated as Ruby can be so concise. Less than a thousand
| lines. The corresponding Bison grammar is 13k lines. And I think
| the tree-sitter one is scannerless so also includes the lexer?!
| How do they do it?
| codesnik wrote:
| bison should be compared to
| https://github.com/tree-sitter/tree-sitter-ruby/blob/master/...
| probably?
| chrisseaton wrote:
| No the JSON file there is generated (I believe?) from the
| JavaScript I linked, while the Bison file is hand-written.
|
| With tree-sitter you're hand-writing a 1k file. With Bison
| you're hand-writing a 13k file.
| dcreager wrote:
| This is more a function of Ruby than of tree-sitter. The tree-
| sitter grammars for other languages are hopefully less
| inscrutable. For Ruby, we basically just ported whitequark's
| parser [1] over to tree-sitter's grammar DSL and scanner API.
|
| [1] https://github.com/whitequark/parser
| chrisseaton wrote:
| I didn't mean the tree-sitter grammar was not understandable
| - it's very understandable - I just can't work out how you
| managed to find such a concise way to express grammars. Even
| compared to Whitequark it's 1/3 the size. What's the unique
| thing you do that makes it so concise?
|
| It also seems somehow to be completely declarative? How have
| you managed to transform Ruby parsing to be context-free? For
| example where's the set of what's currently a local variable
| so you can distinguish from method calls?
| tp3 wrote:
| The code is obviously much simpler than its syntax - most
| importantly, its syntactical simplicity makes it way easier
| to deal with. So when you write the code to parse it you
| don't have to try to parse it in one fell swoop like you do
| in Whitequark.
|
| So you can't read anything from a method call! I can make
| it so, if you're doing a class method (of any kind) you
| have to invoke the constructor, as described in "What is a
| method?" There's also a few new techniques like
| "new_class_method", which requires creating an object (of
| some kind) for that class... but what about that? It's not
| "I've just fixed Tree-sitter's problem"; it's that Tree-
| sitter hasn't resolved the problem yet - there are
| other parsing problems besides Tree-sitter in Ruby itself
| like those of classes (and classes are not part of Tree-
| sitter) and things that are known as "type-traits" and so
| on - so as it's not quite enough it can be done by other
| things. The reason for using LR grammar is that when it
| comes to this - what do I want from that grammar?
|
| The point I'm making here is that LR doesn't give a reason
| for what you're doing. As a programmer you are trying to
| write code that is portable because - if it works in a
| domain you don't understand (such as Ruby) - then you don't
| know what you're doing is wrong. There can be a domain (as
| in any language) that's a lot more complex than this - but
| since we've got that, how can I be sure it won't mess up
| the code I'm writing?
| dcreager wrote:
| Ahh my mistake! :-)
|
| To be fair, we're cheating a little bit because the Ruby
| grammar relies so heavily on an external scanner, which is
| just under 1,000 lines of C++:
| https://github.com/tree-sitter/tree-sitter-ruby/blob/master/...
| chrisseaton wrote:
| But for example how do you parse the difference between
| `x = 14; x` and `y = 14; x`? In the latter case `x` is a
| method call, and in the former it's a local variable
| read. I can't see where the parser maintains a set of
| local variables and where it queries this set. Is it
| somehow done declaratively? If so that's a huge
| achievement I don't think that's really been done before
| in a parser generator.
|
| I really want to try tree-sitter for using in an actual
| Ruby implementation because it's so beautiful!
| dcreager wrote:
| [EDITED to make the example actually line up with OP's
| test]
|
| There's no symbol table in the parser, so at parse time,
| we don't distinguish those cases:
|
|     $ cat test.rb
|     module Test
|       def test1
|         x = 14; x
|       end
|
|       def test2
|         y = 14; x
|       end
|     end
|     $ tree-sitter parse test.rb
|     (program [0, 0] - [9, 0]
|       (module [0, 0] - [8, 3]
|         name: (constant [0, 7] - [0, 11])
|         (method [1, 2] - [3, 5]
|           name: (identifier [1, 6] - [1, 11])
|           (assignment [2, 4] - [2, 10]
|             left: (identifier [2, 4] - [2, 5])
|             right: (integer [2, 8] - [2, 10]))
|           (identifier [2, 12] - [2, 13]))
|         (method [5, 2] - [7, 5]
|           name: (identifier [5, 6] - [5, 11])
|           (assignment [6, 4] - [6, 10]
|             left: (identifier [6, 4] - [6, 5])
|             right: (integer [6, 8] - [6, 10]))
|           (identifier [6, 12] - [6, 13]))))
|
| In both cases the bit after the semicolon just parses as
| (identifier).
|
| For some use cases (e.g. syntax highlighting, depending
| on your colorization rules) it doesn't matter, and so we
| don't want to pay the cost. If it does matter (like in an
| actual implementation), then you'd have to implement this
| yourself and drive it by the parse tree you get from
| tree-sitter.
| chrisseaton wrote:
| Right you could just have a phase to fix-it-up after
| parsing. Much better than trying to shoe-horn an
| imperative action into a nice more-pure parser. Great
| idea!
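Such a fix-up phase could look roughly like the sketch below, which labels the trailing `x` in each of the two methods from the example. The `Node` class is a simplified stand-in rather than the Tree-sitter API, and the rule used (an identifier is a local only if the same method assigned to it earlier) is a deliberate simplification that ignores parameters, blocks, and the rest of Ruby's scoping rules.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Simplified stand-in for a parse-tree node (not the Tree-sitter API)."""
    type: str
    text: str = ""
    children: list = field(default_factory=list)


def classify_identifiers(method):
    """Label each bare identifier statement in a method body, in source order.

    An identifier counts as a local variable read only if an assignment
    to that name appeared earlier in the same method.
    """
    assigned = set()
    labels = []
    for statement in method.children:
        if statement.type == "assignment":
            left = statement.children[0]
            assigned.add(left.text)
        elif statement.type == "identifier":
            kind = "local_variable" if statement.text in assigned else "method_call"
            labels.append((statement.text, kind))
    return labels


def build_method(assigned_name):
    """Build the tree for a body of the form `<assigned_name> = 14; x`."""
    return Node("method", children=[
        Node("assignment", children=[
            Node("identifier", assigned_name),
            Node("integer", "14"),
        ]),
        Node("identifier", "x"),
    ])


test1 = classify_identifiers(build_method("x"))  # x = 14; x
test2 = classify_identifiers(build_method("y"))  # y = 14; x
```

The parse trees for the two methods have the same shape; only the post-pass, with its set of assigned names, tells them apart.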
| anaerobicover wrote:
| No, the Ruby grammar is actually an outlier from what I've
| seen; it has one of the largest/most complex external scanners:
| https://github.com/tree-sitter/tree-sitter-ruby/blob/master/...
|
| Precisely because the language is complicated and less amenable
| to LR parsing.
| ComputerGuru wrote:
| Not a ruby developer here: that sounds terrifying! Does it
| make it harder to have a proper mental model of the language
| (note: not the libraries) or is this mainly because of
| flexibility (too many ways to skin one cat)?
| anaerobicover wrote:
| I don't write Ruby regularly either, but I wouldn't say
| that _syntactic_ complexity is necessarily equivalent to
| _semantic_ complexity. And the syntax is the only part that's
| relevant to Tree-sitter: it's not an
| interpreter/compiler.
|
| Note also that (as I alluded to above) the parsing
| technique that Tree-sitter uses, "LR parsing", makes some
| things more difficult to parse than they'd be with another
| kind of parser. This is a deliberate trade-off, because LR
| parsing makes certain features of Tree-sitter, like fast
| re-parsing in response to input changes, much much easier.
| tp3 wrote:
| So, a syntactic tree is a list of elements, grouped by
| their ordering, which are to be parsed from their
| arguments, as they appeared in the input. Or a grammar
| tree, which is a set of elements. There's many things we
| can do to make Tree-sitter simpler to read and write.
| Perhaps, like in Perl, there are syntactic categories of
| types that make it much easier to find things like nodes
| in a tree, since they're the ones that come in the input.
| Or I'd be willing to say that maybe, like in Haskell,
| certain aspects of the language, are syntactic
| categories, like the parser. So some things that might
| not be obvious in code, like what the syntax for a class
| of names is, might be obvious in theory, too. Or, at
| least they might be obvious in a particular way. Or some
| aspects of the compiler are really special, and we can
| infer those in terms of what the compiler does. Or, of
| course, we can do all these other things, too. We can
| rewrite the parser, or the compiler, to try to do more or
| less anything that the parser does. Or maybe we can make
| Tree-sitter a lot simpler in general. Which I think is
| probably what you've been thinking about.
| codesnik wrote:
| It's mostly to be _less_ surprising to the programmer,
| AFAIR. Probably most of the complexity comes from having to
| differentiate local variables and methods depending on whether
| the symbol had an earlier assignment in the scope.
| revscat wrote:
| Flexibility. "Too many" is debatable: most organizations
| wind up settling on a subset of the idioms that Ruby
| provides, and some of the more esoteric constructs see
| infrequent use anywhere.
|
| There has been, however, discussion about the need to clean
| up some of the lesser-used language features, but obviously
| doing so carries risks.
| RangerScience wrote:
| My mental model of Ruby is one of the simplest of any of the
| languages I've worked with, but it's also the hardest to
| put into any words. JS actually does beat it out, and then
| Scala and Python come after.
|
| Everything is kind-of-but-not-really an object, a
| reference, and a function, all at the same time - which
| _sounds_ complicated but in my head... turns out to be
| pretty simple. Everything's just kind of different flavors
| of the same thing. `attr_accessor` is a good place to see
| this in action.
|
| The flexibility comes more from the variety of available
| core language options (procs, blocks, and lambdas) and core
| libraries (map/each/collect, for example), not from a
| variety of underlying concepts.
| e12e wrote:
| > Not a ruby developer here: that sounds terrifying! Does
| it make it harder to have a proper mental model of the
| language
|
| It is a little terrifying in the sense that I'd not want to
| write language level tools (eg: syntax highlighter).
|
| But if you have scheme on one end and natural language on
| the other, ruby leans a bit towards natural language - but
| in a good way. In some ways ruby isn't that different from
| Smalltalk - but it has a lot (sometimes I think too many,
| sometimes not) _conveniences_.
|
| Parentheses and brackets are largely optional "where it
| makes sense". Conditionals support postfix, e.g. these are
| equivalent:
|
|     if should_send?()
|       send_mail({to: 'u@x.com'})
|     end
|
|     send_mail to: 'u@x.com' if should_send?
| brundolf wrote:
| Here's what it looks like to call it from Rust:
| https://github.com/tree-sitter/tree-sitter/tree/master/lib/b...
|
| Seems like this would make it much easier to bootstrap a
| performant language-server. Very cool; maybe that will be my next
| project.
| dcreager wrote:
| We also have several of the language grammars published as
| crates: https://crates.io/search?q=tree-sitter (And doing the
| same for other grammars is a fairly painless process.)
|
| So if you're writing a tool for a single language (like a
| language server), it should be as easy as adding tree-sitter
| and tree-sitter-blah to your cargo manifest.
| brundolf wrote:
| Awesome! Though my thinking was that it would have an
| especially large impact for languages that aren't popular
| enough to have their own LSP yet; you no longer have to be an
| expert in writing interactive compilers to set up a
| respectable LSP for a niche language, or even a home-grown
| one
| dcreager wrote:
| Yes! This is a great point. It's similar to what I
| mentioned over on this thread [1] about how we're working
| on a more precise version of Code Navigation based on tree-
| sitter. The tl;dr is that you'd write something like tree-
| sitter queries [2], just like you do for the current fuzzy
| Code Nav, but the query DSL would be a bit more
| sophisticated, allowing you to specify the actual name
| resolution rules of your language. One of the things we're
| using to test this is an LSP shim that lets us test our
| rules in VS Code (or any other LSP-compliant editor).
|
| [1] https://news.ycombinator.com/item?id=26227476
| [2] https://tree-sitter.github.io/tree-sitter/using-parsers#patt...
| pcr910303 wrote:
| To me, the most impressive use of tree-sitter was an iOS text
| editor that uses it to parse huge JSON files / mixed language
| files and highlight them in a very robust way. [0][1] I'm hoping
| tree-sitter becomes more common like LSP and Emacs can get exact
| highlighting and other tools with it...
|
| [0]: https://twitter.com/simonbs/status/1352697855845273600
|
| [1]: https://twitter.com/simonbs/status/1362492842141171720?s=21
| ducktective wrote:
| Yeah but I don't think LSP specs contain syntax-highlighting or
| semantic highlighting.
| [deleted]
| orra wrote:
| LSP supports semantic highlighting:
| https://microsoft.github.io/language-server-protocol/specifi...
|
| Though AIUI the basic syntax highlighting is done by the
| editor (e.g. VSCode uses Textmate grammar support).
| picardythird wrote:
| FYI there is tree-sitter.el for Emacs.
| ACosmicDust wrote:
| Emacs does have a package to use tree-sitter [0]. I think
| emacs-lsp is aware of this highlighting backend and performs
| pretty well.
|
| (semantic highlighting is pretty slow for C++ with font-lock,
| with tree-sitter it's a breeze :))
|
| [0]: https://github.com/ubolonton/emacs-tree-sitter
___________________________________________________________________
(page generated 2021-02-22 23:00 UTC)