[HN Gopher] Emacs: Feature/tree-sitter merged into master
___________________________________________________________________
Emacs: Feature/tree-sitter merged into master
Author : signa11
Score : 220 points
Date : 2022-11-23 04:46 UTC (18 hours ago)
(HTM) web link (lists.gnu.org)
(TXT) w3m dump (lists.gnu.org)
| Wonnk13 wrote:
| So if I have a fairly unremarkable setup with LSP to give me
| completions, what do I get by fooling around with tree-sitter? It
| seems like this is more geared toward building an AST, so I'm not
| sure how it would present itself to the end user currently?
| omnicognate wrote:
| Faster and more correct syntax highlighting is the main benefit
| atm, as I understand it.
|
| In general it's for functionality that needs to understand
| syntax but doesn't need a full compilation-level understanding
| of the code, that can benefit from much faster responses than
| an LSP server can provide and that people may want working out
| of the box for many languages without having to install and
| configure language servers, generate compilation DBs, etc.
| bergheim wrote:
| You are correct. tree-sitter is not in competition with lsp;
| lsp is project wide (different files), so will do say code
| completion. tree-sitter is analyzing the current buffer and
| applying things like highlighting, brackets etc.
|
| lsp can do some of these things as well, but sending the entire
| buffer over to the lsp server every time you want to update the
| buffer is an expensive operation. tree-sitter does it locally.
| josteink wrote:
| I think at this point it's a new building primitive mostly
| aimed at major-mode authors.
|
| That said, tree-sitter should make it possible to create
| paredit-like implementations for non-Lisp languages and other
| stuff like that, which IMO could turn out to be really neat.
|
| As a change, this is quite significant, but not directly aimed
| at end-users.
| arc-in-space wrote:
| Interesting. Current syntax highlighting in emacs is mostly fine,
| except for how it occasionally blows up - an unterminated quote,
| in some languages, can run out and match the entire tail as a
| string, potentially freezing on a large file. Paredit() avoids
| this by not even letting you do that unless you ask very nicely.
| I wonder if tree-sitter helps there.
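
To make the failure mode above concrete, here is a minimal Python sketch. It is not Emacs or tree-sitter code; the two regexes and the `string_spans` helper are hypothetical, chosen only to contrast a string pattern that lets an unterminated quote swallow following lines with one that gives up at the end of the line.

```python
import re

# Naive pattern: an opening quote, then anything up to a closing quote
# or, failing that, the end of the buffer. Newlines are allowed inside.
RUNAWAY = re.compile(r'"(?:\\.|[^"\\])*(?:"|\Z)', re.DOTALL)

# Bounded pattern: the same idea, but a string may not cross a newline.
BOUNDED = re.compile(r'"(?:\\.|[^"\\\n])*(?:"|$)', re.MULTILINE)


def string_spans(pattern, text):
    """Return (start, end) offsets of everything highlighted as a string."""
    return [m.span() for m in pattern.finditer(text)]


source = 'x = "unterminated\ny = 1\nz = "ok"\n'

runaway = string_spans(RUNAWAY, source)   # first span covers "y = 1" too
bounded = string_spans(BOUNDED, source)   # first span stops at end of line 1
```

With the naive pattern the code on the second line ends up inside a "string" and the real string on the third line is missed; the bounded pattern limits the damage to the line that is actually broken.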
| clircle wrote:
| Maybe you and I are using different major modes, but I see the
| warts of regex highlighting in Emacs. It's quite bad in AUCTeX.
| aidenn0 wrote:
| Incremental parsing of incorrect code is one of those things that
| is literally impossible in the general case, but tree-sitter has
| found a lot of good ways to do it that are not just possible for
| a large fraction of reality, but also _performant_. It's hard to
| overstate how impressive a piece of engineering this is.
| Ian_Macharia wrote:
| I use tree-sitter in neovim and the syntax highlighting is on par
| with VSCode
| mickeyp wrote:
| If you're wondering what Tree-Sitter is and why Emacs would want
| it, I wrote about it a while ago:
|
| https://www.masteringemacs.org/article/tree-sitter-complicat...
| afry1 wrote:
| That point you make about syntax highlighting being slow while
| using eglot/LSP-mode is a great one. I've been a bit
| underwhelmed with eglot, and I think that must be the reason:
| it feels like I'm programming in a bowl of oatmeal with every
| keystroke.
|
| Do you have any tips or guides for using treesitter for syntax
| highlighting/structural editing and eglot/LSP-mode for
| everything else?
| omnicognate wrote:
| AFAIK eglot/lsp-mode don't do syntax highlighting. The
| article's just explaining why that is (i.e. because it would
| be too slow).
|
| If you don't have tree-sitter your syntax highlighting will
| be done by the regex based font-lock-mode. I don't think
| eglot/lsp-mode make that slower, and I believe tree-sitter
| should speed it up (and make it more correct) without
| affecting them. I haven't tried it yet, though.
| TeMPOraL wrote:
| It must be a matter of configuration. At work, I use buffer
| re-fontification as an indicator that clangd correctly
| processed the C++ source file I just opened. That's with an
| Emacs built from source ~half a year ago + LSP mode.
| omnicognate wrote:
| Oh, interesting, it does appear lsp-mode now does
| "semantic highlighting" if the server supports it. I
| switched to eglot a while back (before it was added),
| which doesn't.
|
| I don't think I'd want that. Syntax highlighting and
| indentation are things I want instant feedback from.
|
| That affects the answer to the question. I assume you'd
| need to persuade lsp-mode not to do this and leave it to
| tree-sitter, but I don't know how to do that.
| TeMPOraL wrote:
| Last I checked, "semantic tokens" were opt-in; I think
| it's still the case:
|
| https://emacs-lsp.github.io/lsp-mode/page/settings/semantic-...
| sph wrote:
| There is not an Emacs topic I'd like to know more about that you
| haven't already covered on your website.
|
| Thanks, your articles and your book are the best guides into
| the world of Emacs.
| mickeyp wrote:
| Thank you :) I'm glad you like my site and my book!
| davidkunz wrote:
| Congratulations, Emacs! I hope it will be a similar success story
| as in Neovim. If more systems use it, the question "should my
| programming language provide a Tree-Sitter parser" becomes a
| no-brainer.
| signa11 wrote:
| For those unfamiliar with it, tree-sitter
| (https://emacs-tree-sitter.github.io/) aims to be a foundational
| package that understands code structurally (think abstract syntax
| trees). This was done earlier via regexes, which have their
| limitations.
|
| This talk by the author is quite instructive as well:
| https://www.thestrangeloop.com/2018/tree-sitter---a-new-pars...
| tmalsburg2 wrote:
| Is there a chance that this is going to make the parsing of
| large org mode files faster?
| AlanYx wrote:
| It's not related to tree-sitter, but recent work on using
| text properties instead of overlays for folded regions in org
| has improved performance opening org files with folded
| regions from O(n^2) to O(n log n), which is a big improvement
| in practice. See
| https://blog.tecosaur.com/tmio/2022-05-31-folding.html
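
For a rough sense of scale, the following Python sketch compares the two growth rates for a hypothetical file with 10,000 folded regions. The value of `n` is made up, and constant factors are ignored, so this only illustrates the asymptotic gap and says nothing about measured Org timings.

```python
import math

n = 10_000                      # hypothetical number of folded regions

quadratic = n ** 2              # steps for an O(n^2) approach
linearithmic = n * math.log2(n) # steps for an O(n log n) approach

ratio = quadratic / linearithmic
print(f"n^2 = {quadratic:.0f}, n log2 n = {linearithmic:.0f}, ratio = {ratio:.0f}x")
```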
| robenkleene wrote:
| One thing I'll add, because I think it's an interesting insight
| about the priorities of code parsing for text editors: Tree-
| sitter is specifically designed to be very effective at parsing
| code that's in an invalid state. E.g., think about adding a new
| line to a program: the line you're adding is typically invalid
| most of the time until you've finished typing it out.
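
A toy Python sketch of the idea of parsing code that is in an invalid state. It assumes a made-up one-statement-per-line language of `name = number` assignments; the `Node` type and `parse` function are hypothetical. Tree-sitter's real error recovery is far more sophisticated; the only point here is that a half-typed line becomes an ERROR node instead of aborting the whole parse.

```python
import re
from dataclasses import dataclass

# One valid statement form in this toy language: name = number
ASSIGN = re.compile(r"\s*([A-Za-z_]\w*)\s*=\s*(\d+)\s*$")


@dataclass
class Node:
    kind: str   # "assignment" or "ERROR"
    line: int   # 1-based line number
    text: str


def parse(source):
    """Parse every line; never raise, mark malformed lines as ERROR."""
    nodes = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        kind = "assignment" if ASSIGN.match(text) else "ERROR"
        nodes.append(Node(kind, lineno, text))
    return nodes


half_typed = "a = 1\nb = \nc = 3"   # the middle line is still being typed
kinds = [node.kind for node in parse(half_typed)]
```

The lines before and after the unfinished one still parse normally, so highlighting and navigation for the rest of the buffer keep working while you type.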
| josteink wrote:
| This HN post though is about a (new) core tree-sitter
| implementation in Emacs itself, which is not the same as the
| third party package[1] you linked. To give credit where credit
| is due though, it was obviously inspired by this work and what
| it allowed in community-maintained packages.
|
| The new implementation has been authored by Yuan Fu in close
| collaboration with the core Emacs maintainers and the rest of
| the community. It has been an ongoing effort for many, many
| months.
|
| This is great news. It means that core Emacs language bindings,
| provided as part of Emacs itself, will now be able to make use
| of tree-sitter based parsers as well, something which wouldn't
| have happened if they had to depend on a third-party package to
| get those bindings.
|
| I've been somewhat involved in the process, although not a
| major player, but needless to say I'm very excited about this
| news and can't wait to see what sort of improvements this
| enables across the board once people start using it.
|
| [1] https://github.com/emacs-tree-sitter/elisp-tree-sitter
| phtrivier wrote:
| So, we still have to wait for each major-mode maintainer to
| update their code in order to benefit from this change? In
| this case, how big should the change be for a "typical" mode?
| Is it going to happen for C/Python/TypeScript/etc. anytime
| soon?
| josteink wrote:
| If you follow the emacs-devel mailing list, you will see that
| many of the built-in modes are adding support for tree-sitter
| to various degrees. Lots of languages are already on the list
| (C, Python, JavaScript, Bash, JSON and CSS).
|
| It also includes some new language modes which have never
| been part of Emacs before (like TypeScript).
|
| I'd love to see C# on the list, but that might depend on me
| having the time to land a production-grade major mode, so
| that might end up happening later rather than sooner.
|
| Anyway, from what I understand what has been merged so far
| should all be available as part of Emacs 29 once released.
| erganemic wrote:
| I'm really impressed with the strides Emacs has made recently:
| native compilation, project.el, eglot, and now tree-sitter?
|
| As a user who hadn't kept up with development news until
| recently, I'd always mentally sorted Emacs into the same taxonomy
| as stuff like `find`: old, powerful, with a clunky interface and
| a stodgy resistance to updating how it does things (though not
| without reason).
|
| I'm increasingly feeling like that's an unfair classification on
| my part--I'm genuinely super excited to see where Emacs is in 5
| years.
| zelphirkalt wrote:
| I have the same feeling.
|
| There is one more, possibly gigantic, thing though: Better
| handling of very long strings. I know the data structures for
| strings have various tradeoffs, but properly abstracted, it
| should be possible to even give a choice, no? So users could
| choose the data structure, based on their use cases. But I know
| little about the internals and maybe that is all too low level
| to be something a user could choose from the user interface or
| configuration.
|
| I hope the string data structure is properly abstracted, so
| that it is exchangeable for another data structure, but I have
| my doubts. I would like to be surprised here by anyone credibly
| telling me that the string data structure in Emacs has an
| abstraction barrier around it and is actually exchangeable by
| implementing basic string functions like "get nth character" or
| "get substring" in terms of another data structure.
|
| If it is not properly abstracted, then of course it could
| be a nightmare to change the data structure.
| b3morales wrote:
| This was also something that was enhanced recently and will
| be in Emacs 29:
| https://github.com/emacs-mirror/emacs/blob/21b387c39bd9cf07c...
|
| > Emacs is now capable of editing files with very long lines.
|
| > The display of long lines has been optimized, and Emacs should
| > no longer choke when a buffer on display contains long lines.
|
| > ...
| ilyt wrote:
| I use IntelliJ products but still prefer Emacs as an editor. I
| moved off it for code because of IDE features: even where I
| managed to get some of that convenience in Emacs, it ran
| synchronously, which meant the experience could be pretty laggy
| at times, vs "at worst a popup with extra info will be delayed"
| in IDEA.
| sph wrote:
| Yes, it feels there is a lot of momentum going on recently.
|
| Both neovim and Emacs are being improved at breakneck pace, and
| it is quite incredible for such an old piece of software with,
| dare I say, a quirky contribution model. The maintainers are
| working really hard on keeping it current and competitive.
| bloopernova wrote:
| I'm really hoping that Emacs becomes multithreaded somehow. Or
| at least improves some operations so that they're non-blocking.
|
| I've been using Emacs primarily for org-mode/roam/babel for a
| few years now. I'm very glad for its existence, I really think
| I've become a more effective DevOps person because of it.
| s0l1dsnak3123 wrote:
| Indeed, I'm using Emacs for code, reading/writing documents
| and emails, as well as consuming RSS feeds. The ecosystem and
| values that underpin Emacs are fantastic - in my personal
| case the only downside to heavy use of Emacs is that it can
| struggle to utilise my hardware. This tends to be
| particularly noticeable when using TRAMP and Eglot, or
| producing large org tables.
| wyuenho wrote:
| I'll be entirely satisfied with a process/event queue/loop
| that we can submit tasks to like Javascript's. There is
| already a command loop in Emacs, we just can't use it for
| anything other than input events and commands. Once we have
| a good event loop, we can build a state machine like Redux
| on it, then we can start rebuilding the display machinery,
| then we can start deleting all those hooks that constantly
| interfere with each other...
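
The kind of loop described above can be sketched in a few lines of Python. This is an illustration of the general "submit tasks to a queue" model only, not a proposal for how Emacs would implement it; the `EventLoop` class and the `refresh_packages` task are hypothetical names.

```python
from collections import deque


class EventLoop:
    """Single-threaded FIFO task queue, loosely modelled on a JS event loop."""

    def __init__(self):
        self._queue = deque()

    def submit(self, fn, *args):
        """Schedule fn(*args) to run on a later turn of the loop."""
        self._queue.append((fn, args))

    def run(self):
        """Run queued tasks in order until none are left."""
        while self._queue:
            fn, args = self._queue.popleft()
            fn(*args)


log = []
loop = EventLoop()


def refresh_packages(step):
    # A long job split into short steps that re-queue themselves,
    # so other tasks get a turn in between.
    log.append(f"refresh step {step}")
    if step < 2:
        loop.submit(refresh_packages, step + 1)


loop.submit(refresh_packages, 1)
loop.submit(log.append, "handle keystroke")
loop.run()
```

Because the long job yields back to the queue after each step, the keystroke is handled between the two refresh steps rather than after the whole job finishes.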
| ilyt wrote:
| Yeah the extra micro-waits introduced by some IDE-like
| features were annoying last time I used it.
| deagle50 wrote:
| Hope is on the horizon:
| https://old.reddit.com/r/emacs/comments/ymrkyn/async_nonbloc...
| rs_rs_rs_rs_rs wrote:
| This is excellent!
| tmalsburg2 wrote:
| Emacs does have threads:
| https://www.gnu.org/software/emacs/manual/html_node/elisp/Th...
| bloopernova wrote:
| I probably didn't use the right terminology. I mean that if
| I list-packages then U, then x to start updating, I should
| be able to go back to my editor and continue working.
| natrys wrote:
| There was a package[1] that did exactly that, so it
| should be technically possible; unfortunately it has been
| unmaintained for a while. In any case I/O asynchronicity
| is achievable without actual multithreading (there are
| also IRC/telegram/matrix/mastodon clients that don't
| freeze the UI).
|
| [1] https://github.com/Malabarba/paradox
| tmalsburg2 wrote:
| I think a lot of packages are not yet using threads. And
| to be honest, I'm a bit scared of packages starting to
| use threads because there are a million ways in which you
| can mess up with threads especially given Emacs'
| architecture. What if two threads start manipulating the
| same buffer? Emacs wasn't built with these scenarios in
| mind. But perhaps I'm too pessimistic and there are good
| answers for that.
| TeMPOraL wrote:
| I want to see good interactive tools for working with and
| introspecting threads / async / other concurrency models
| first. In general, because I don't know of any, and in
| Emacs in particular.
|
| My current experience with Emacs concurrency is mostly
| negative - occasionally, an async-heavy package (like
| e.g. Magit-style UI for Docker) will break, and I find it
| hard to figure out why. Futures-heavy code I've seen
| tends to keep critical data local (lexically let-bound),
| which is the opposite of what you want in a malleable
| system like Emacs. For example, I'd like to have a way to
| list _all_ unresolved futures everywhere in Emacs, the
| way I can with e.g. external processes. But it seems that
| at least the async library used (aio, IIRC) is not
| designed for that.
| klibertp wrote:
| > For example, I'd like to have a way to list all
| unresolved futures everywhere in Emacs, the way I can
| with e.g. external processes.
|
| I think you could get this done by advising promise
| creation/resolution functions, aio-promise and aio-resolve.
| The async/await macros are wrappers around
| generators-over-promises in this library.
|
| But yes, in general Emacs concurrency sucks. The least
| bad option I found was using promises' implementation
| (chuntaro/emacs-promise) that uses `cl-defgeneric` for
| `promise-then` and (obviously) moving as much processing
| to a subprocess as possible. The former allows you to
| make any type "thenable" by implementing the method for
| it, which is nice for bundling the state around async
| operations. cl-defstructs are nice for the purpose.
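
The "hook creation and resolution" idea can be sketched outside Emacs in Python. This is not the aio or emacs-promise API; `TrackedPromise` and `list_pending` are hypothetical names, and the registry updates in the constructor and in `resolve` stand in for what advice around the creation and resolution functions would do.

```python
class TrackedPromise:
    """Promise-like object that registers itself while unresolved."""

    pending = {}      # id -> promise, for every promise not yet resolved
    _next_id = 0

    def __init__(self, label):
        self.label = label
        self.value = None
        self.id = TrackedPromise._next_id
        TrackedPromise._next_id += 1
        TrackedPromise.pending[self.id] = self      # "creation advice"

    def resolve(self, value):
        self.value = value
        TrackedPromise.pending.pop(self.id, None)   # "resolution advice"


def list_pending():
    """Labels of all unresolved promises, oldest first."""
    return [p.label for _, p in sorted(TrackedPromise.pending.items())]


fetch = TrackedPromise("docker ps")
build = TrackedPromise("docker build")
fetch.resolve("3 containers")
still_waiting = list_pending()
```

Keeping the registry global rather than lexically bound is the design point: anything can inspect what is still outstanding, much like listing external processes.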
| morelisp wrote:
| I was sad the day I saw Emacs implemented threads before
| a proper async event loop / futures / etc. Do those
| first, see what kinds of concurrent code people actually
| want to write, then write a multithreaded scheduler for
| that.
|
| Instead it's backwards, now we have hard-to-use
| concurrency primitives and still shitty UIs.
| wyuenho wrote:
| Like the way Python has threads lol. Emacs has generators
| too, and there are promises implemented on top of them, but
| they aren't very useful in the elisp ecosystem because at
| some point you are still going to have to poll due to a
| lack of a JS-like event loop that users can submit tasks
| to.
| bjourne wrote:
| Check out the emacs-devel@gnu.org list sometime. It's
| incredibly well run and is in my opinion the secret sauce that
| keeps the project running.
| jamborine wrote:
| I'm so impressed by this
| antipaul wrote:
| What's the "explain it like I'm 5 years old" (ELI5) for tree-
| sitter? Why should I, an emacs user but not lisp hacker, care
| about it?
| chriswarbo wrote:
| tree-sitter creates parsers, e.g. for programming languages,
| config formats, etc.
|
| Emacs modes can use those parsers on buffer contents, e.g. for
| syntax colouring/highlighting, finding matching delimiters
| (e.g. moving the cursor over an `if`, and having all the
| corresponding clauses (e.g. else/elif/fi) highlighted), for
| contextual editing (e.g. escaping " when inside a string), etc.
|
| This can be remarkably tricky to get right; e.g. consider
| languages which can splice expressions inside strings (which
| can themselves contain strings, containing spliced expressions,
| etc.)
|
| Using tree-sitter should make this easier and more robust (i.e.
| less time spent implementing parsers; more time spent
| implementing features!). I _think_ it would also allow grammars
| to be re-used across different tools, which should improve
| support for obscure/niche languages.
| 2pEXgD0fZ5cF wrote:
| Does this mean that every emacs language package would
| automatically make use of this once it is built in? Or will
| this rather enable the possibility to write/rewrite
| programming language modes so they make use of tree-sitter
| because they can assume it is available in the default emacs
| install from then on?
| omnicognate wrote:
| It needs to be explicitly used. As far as I'm aware it
| doesn't slot in behind an existing API and magically make
| things better.
| 2pEXgD0fZ5cF wrote:
| Got it. Are there any beginner guides yet on how to write
| an emacs (language) package while making use of it?
| mdaniel wrote:
| Unknown if this qualifies as "beginner guide" but the
| in-tree document is titled "STARTER GUIDE ON WRITING MAJOR
| MODE WITH TREE-SITTER":
| https://git.savannah.gnu.org/cgit/emacs.git/tree/admin/notes...
| giraffe_lady wrote:
| You know how emacs typically has the worst syntax highlighting
| of all mainstream editors for a given language? This makes it
| better.
| lawn wrote:
| Another useful feature is that it makes it easier to support
| mixing languages in the same file.
|
| Think highlighting for HTML/JS/CSS in a single file or fully
| featured highlighting inside Markdown code snippets.
| mcqueenjordan wrote:
| I have a huge belief in tree-sitter. I think it's going to
| continue to grow and become an important tool, especially in
| security/code tooling contexts.
| norir wrote:
| The main innovation of tree-sitter, even more than incremental
| parsing, as I see it is that it provides a uniform api for
| traversing a parse tree, which makes it relatively
| straightforward to onboard a new language to a tool with tree-
| sitter support. The problem though is that the tree-sitter
| grammar is nearly always going to be an approximation to the
| actual language grammar, unless the language
| compiler/interpreter uses tree-sitter for parsing. To me, this
| is problematic for tooling because it is always possible for a
| tree-sitter based tool to be flat out wrong relative to the
| actual language. For syntax highlighting, this is generally not
| a huge deal (and tree-sitter does generally work well, though
| there are exceptions), but I'd be more cautious with security
| tools based on tree-sitter.
|
| If all languages changed their reference parsers to tree-
| sitter, this would be moot, but that seems unlikely. Language
| parsers are often optimized beyond what is possible in a
| general purpose parser generator like tree-sitter and/or have
| ambiguities that cannot be resolved with the tree-sitter dsl.
|
| What feels perhaps likely in the future is that a standard
| parse tree api emerges, analogous to lsp, and then language
| parsers could emit trees traversable by this api. Maybe it's
| just the tree-sitter c api with an alternate front end? Hard to
| say, but I suspect either something better than (but likely at
| least partially inspired by) tree-sitter will emerge or we will
| get stuck in a local minimum with tooling based on slightly
| incorrect language parsers.
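
What a uniform traversal API buys a tool author can be shown with a small Python sketch. The `Node` type, the `walk` and `names_of` helpers, and the node kind names are all made up for illustration and are not the tree-sitter C API; the two trees are hand-built stand-ins for what per-language parsers would produce.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def walk(node):
    """Yield node and all of its descendants, depth first, pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def names_of(tree, kind):
    """Text of every node of the given kind, in document order."""
    return [n.text for n in walk(tree) if n.kind == kind]


# Hand-built trees standing in for the output of two different grammars.
python_tree = Node("module", children=[
    Node("function_definition", children=[Node("identifier", "main")]),
])
c_tree = Node("translation_unit", children=[
    Node("function_definition", children=[
        Node("primitive_type", "int"),
        Node("identifier", "main"),
    ]),
])

python_names = names_of(python_tree, "identifier")
c_names = names_of(c_tree, "identifier")
```

The same `names_of` call works on both trees without language-specific code, which is the onboarding benefit; whether the tree matches what the real compiler would accept is a separate question.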
| debugnik wrote:
| > unless the language compiler/interpreter uses tree-sitter
| for parsing
|
| Doubtful, last time I tried tree-sitter would parse invalid
| inputs without even tagging any errors in the parse tree. For
| example, it would silently accept extra tokens, or keywords
| in the place of identifiers. Replacing the built-in lexer and
| then validating the parse tree for correctness would be close
| to writing the grammar twice.
|
| And accepting partially correct inputs within the compiler
| toolchain isn't too hard, so I don't really see the advantage
| of agreeing on tree-sitter and not just on a parse tree
| representation that editors can then query, as you then
| suggested. If the big deal is having it execute client-side
| or being sandboxed, I feel that's orthogonal to parsing
| algorithms.
| difflens wrote:
| > as I see it is that it provides a uniform api for
| traversing a parse tree, which makes it relatively
| straightforward to onboard a new language to a tool with
| tree-sitter support. The problem though is that the tree-
| sitter grammar is nearly always going to be an approximation
| to the actual language grammar, unless the language
| compiler/interpreter uses tree-sitter for parsing.
|
| Author of DiffLens
| (https://marketplace.visualstudio.com/items?itemName=DiffLens...)
| here. A uniform API for traversing a parse tree for all
| languages would be amazing for DiffLens!
| However, I fear languages are different enough that this
| ideal may never be reached :) Or maybe there would be a core
| set of APIs and extensions for the idiosyncrasies of each
| language. For DiffLens though, we try to use the language's
| official parser/compiler if it exposes an AST.
| cjohansson wrote:
| tree-sitter is a bit better than regexp, but it is not an
| actual parser of grammars. A fast actual parser of all
| languages for syntax coloring is the future, I think;
| tree-sitter is a pragmatic middle ground while we wait for
| the prime solution.
___________________________________________________________________
(page generated 2022-11-23 23:01 UTC)