[HN Gopher] Structured Editing and Incremental Parsing
___________________________________________________________________
Structured Editing and Incremental Parsing
Author : ltratt
Score : 121 points
Date : 2024-11-27 20:42 UTC (1 days ago)
(HTM) web link (tratt.net)
(TXT) w3m dump (tratt.net)
| practal wrote:
| I think simpler is better when it comes to structured editing.
| Recursive teXt has the advantage that it proposes a simple block
| structure built into the text itself [1]. Of course, you will
| need to design your language to take advantage of this block
| structure.
|
| [1] http://recursivetext.com
| computably wrote:
| Since Lisp has been around since 1960... Congratulations,
| you're only about 64 years late.
| practal wrote:
| No doubt, brackets of course also convey structure. But I
| think indentation is better for visualising block structure.
| Inside these blocks, you can still use brackets, and errors
| like missing opening or closing brackets will not spill over
| into other blocks.
|
| And yeah, I am definitely coming for Lisp.
| llm_trw wrote:
| I wrote a racket function that would need 35 levels of
| indentation ten minutes ago. White space isn't coming for
| lisps until we figure out 12 dimensional displays.
| cjfd wrote:
| Is a function that would need 35 levels of indentation a
| good idea? I have seen C code with about 12 levels of
| indentation and that was not too great.
| llm_trw wrote:
| What other languages use syntax for lisps use function
| applications for.
|
| Viz. the array reference a[0] in algols is a function
| application in lisps (vector-ref a 0).
|
| The same is true for everything else. Semantic white
| space in such a language is a terrible idea as everyone
| eventually finds out.
| practal wrote:
| Recursive teXt (RX) would be a great fit for Lisp,
| although I am more interested in replacing Lisp entirely
| with a simpler language rooted in abstraction logic.
|
| Note that RX is not like normal semantic white space, but
| simpler. It is hard coded into the text without taking
| its content into consideration. RX is basically nested
| records of text, making it very robust, but encoded as
| plain text instead of JSON or something like that.
| llm_trw wrote:
| No it won't be. S-expressions are the simplest way to
| linearize a tree.
|
| Everyone thinks there's something better and the very
| motivated write an interpreter for their ideal language
| in lisp.
|
| The ideas inevitably have so much jank when used in anger
| that you always come back to sexp.
|
| Now if you discover a way to linearize dags or arbitrary
| graphs without having to keep a table of symbols I'd love
| to hear it.
| practal wrote:
| > S-expressions are the simplest way to linearize a tree.
|
| S-expressions are _one_ way to linearize a tree.
|
| Now, "simple" can mean different things depending on what
| you are trying to achieve. RX is simpler than
| s-expressions if you prefer indentation over brackets,
| and like the robustness that it brings. Abstraction
| algebra terms are simpler than s-expressions if you want
| to actually reason about and with them.
| llm_trw wrote:
| In your own examples you're using both brackets and white
| space to delineate structure. This is complex because you
| need two parsers to even start working on the input
| stream and the full parser must know how to switch
| between them.
|
| In short: I get all the pain of semantic white space with
| all the pain of lisp s-exp's with the benefits of
| neither.
| kazinator wrote:
| If you don't allow the indentation based parsing to be
| nested within a bracket-based expression, it doesn't look
| too bad. At the top level you have the indentation-based
| parser. When that sees an open bracket, it recurses to
| the regular parser which doesn't care about indentation.
| llm_trw wrote:
| This works until you start using macros.
| benatkin wrote:
| An everyday example is the difference between
| JSON.stringify(data, null, 2) and a pretty printer like
| ipython or Deno.inspect. Having only one node per line
| reduces the amount of data that can be displayed. It's
| the same with code.
| practal wrote:
| Assuming this number of indentation is really necessary
| (which I doubt; maybe a few auxiliary definitions are in
| order?), obviously only the first few levels would be
| represented as their own Recursive teXt blocks.
| llm_trw wrote:
| >obviously only the first few levels would be represented
| as their own Recursive teXt blocks.
|
| This is not at all obvious.
| practal wrote:
| It is obvious to me, nobody would want to represent 35
| levels of nesting by indentation. So I would represent
| the first few (2-4) levels in RX, and the rest by other
| means, such as brackets. Your language should be designed
| such that the cutoff is up to you, the writer of the
| code, and really just a matter of syntax, not semantics.
|
| Obviously, I (usually) would not want to write things
| like + *
| a b * c
| d
|
| but rather + a * b
| c * d
|
| or, even better of course, (a * b) + (c
| * d)
|
| I think of blocks more as representing high-level
| structure, while brackets are there for low-level and
| fine-grained structure. As the border between these can
| be fluid, where you choose the cutoff will depend on the
| situation and also be fluid.
| llm_trw wrote:
| >Your language should be designed such that the cutoff is
| up to you, the writer of the code, and really just a
| matter of syntax, not semantics.
|
| I have more important things to think about in my code
| than when I switch between two dialects of the language.
|
| Especially since I get no extra expressive power by doing
| so.
|
| >Obviously, I (usually) would not want to write things
| like
|
| Or just use (+ (* a b) (* c d)) which is simpler that any
| of the example above. Then I can chose between any of:
| ( + (* a b) (* c d)) ( +
| ( * a b ) ( *
| c d ) )
|
| Or whatever else you want to do.
|
| >As the border between these can be fluid, where you
| choose the cutoff will depend on the situation and also
| be fluid.
|
| It's only fluid because you've had XX years of infix
| notation caused brain damage to make you think that.
| shakna wrote:
| Wisp [0] is the big one talked about in Scheme land, and
| it wouldn't need 35 levels of indentation. Like the rest
| of Lisp-world, there's flexibility where you need it.
|
| For example, this shows a bit of the mix available:
| for-each l : task if : and
| (equal? task "help") (not (null? (delete "help" tasks)))
| map l : x
| help #f #f x delete "help" tasks
| run-task task . tasks
|
| [0] https://srfi.schemers.org/srfi-119/srfi-119.html
| shakna wrote:
| Trying to use any kind of syntax highlighter with TeX is a pain
| in the butt. I didn't mean LaTeX there. I mean TeX, which can
| rewrite it's own lexer, and a lot of libraries work by doing so.
| I move in and out of TeXInfo syntax and it basically just causes
| most editors to sit there screaming that everything is broken.
| llm_trw wrote:
| Yes its pretty funny when you realise what a tiny corner of the
| design space of programs most users inhabit that they think
| things like lsp are an amazing tool instead of a weekend
| throwaway project.
|
| What's even funnier is how much they attack anyone who points
| this out.
| foo42 wrote:
| perhaps the "attacks" relate to the condescending tone with
| which you relate your superior skills.
|
| I think most people's amazement with lsp relates to the
| practical benefits of such a project _not_ being thrown away
| but taken that last 10% (which is 90% of the work) to make it
| suitable for so many use cases and selling people on the idea
| of doing so.
| llm_trw wrote:
| What's amazing about lsp isn't the polish, it's that we've
| hobbled our selves so much that a tool like it is even
| useful.
|
| Only having exposure to the algol family of languages does
| for your mental capabilities what a sugar only diet does
| for your physical capabilities. It used to be the case that
| all programmers had exposure to assembly/machine code which
| broke them out of the worst habits algols instill. No
| longer.
|
| Pointing out that the majority of programmers today have
| the mental equivalent of scurvy is somehow condescending
| but the corp selling false teeth along with their sugar
| buckets is somehow commendable.
| dzaima wrote:
| Knowing non-algol languages won't make editor actions any
| less useful for algol-like. If anything, it'll just make
| you pretend that you don't need such and as such will end
| up less productive than you could be.
|
| And editor actions can be useful for any language which
| either allow you to edit things, or has more than one way
| to do the same thing (among a bunch of other things),
| which includes basically everything. Of course editor
| functionality isn't a thing that'd be 100% beneficial
| 100% of the time, but it's plenty above 0% if you don't
| purposefully ignore it.
| lolinder wrote:
| > Pointing out that the majority of programmers today
| have the mental equivalent of scurvy is somehow
| condescending
|
| You can (and many people do!) say the exact same thing in
| a different tone and with different word choice and have
| people nod along in agreement. If you're finding that
| people consistently react negatively to you when you say
| it, please consider that it might be because of the way
| in which you say it.
|
| I'm one of those who would normally nod along in
| agreement and writing in support, but your comments here
| make me want to disagree on principle because you come
| off as unbearably smug.
| llm_trw wrote:
| >I'm one of those who would normally nod along in
| agreement and writing in support, but your comments here
| make me want to disagree on principle because you come
| off as unbearably smug.
|
| So much the worse for you.
| itishappy wrote:
| Kinda feels like this might be an instance of simply
| simply holding your tools wrong. What about ASM prevents
| incremental parsing and structured editing from being
| useful?
|
| Some concrete examples for us lesser mortals please.
| llm_trw wrote:
| The fact that there is no separation between data,
| addresses and commands?
|
| The general advice is that you shouldn't mix them, but
| the general advice today is that you shouldn't use ASM
| anyway.
| throw10920 wrote:
| This is _sneering_ , where you don't respond to a particular
| poster's point, but instead attack a fictional group of
| people that you made up based on two the intersection of
| different attributes.
|
| It's against the HN guidelines
| (https://news.ycombinator.com/newsguidelines.html), boring,
| unenlightening, not intellectually gratifying, degrades the
| quality of the site, and takes far less intelligence than the
| "mental equivalent of scurvy" that you name. Don't do it.
| llm_trw wrote:
| So is yours:
|
| > Please respond to the strongest plausible interpretation
| of what someone says, not a weaker one that's easier to
| criticize. Assume good faith.
| sudahtigabulan wrote:
| > it is now clear to me that there is ongoing work on structured
| editing which either doesn't know about incremental parsing in
| general, or Tim's algorithms specifically. I hope this post
| serves as a useful advert to such folk
|
| I'm curious about this unnamed ongoing work (that is unaware of
| incremental parsing).
|
| Anyone know what he is referring to?
| DannyBee wrote:
| I don't know specifically - but even now, i still end up having
| to explain to people that incremental parsing/lexing
| (particularly without error recovery) is not hard, it is not
| really complicated, and as the author here said, Tim (et al)
| have made beautiful algorithms that make this stuff easy.
|
| Heck, incremental lexing is even easy to explain. For each
| token, track where the lexer _actually looked_ in the input
| stream to make decisions. Any time that part of the input
| stream changes, every token to actually look at the changed
| portion of the input stream is re-lexed, and if the result
| changes, keep re-lexing until the before /after tokenstreams
| sync up again or you run out of input. That's it.
|
| You can also make a dumber version that statically calculates
| the maximum lookahead (lookbehind if you support that too) of
| the entire grammer, or the maximum possible lookahead per
| token, and uses that instead of tracking the actual lookahead
| used. In practice, this is often harder than just tracking the
| actual lookahead used.
|
| In an LL system like ANTLR, incremental parsing is very similar
| - since it generates top-down parsers, it's the same basic
| theory - track what token ranges were looked at as you parse.
| During incremental update, only descend into portions of the
| parse tree where the token ranges looked at contain modified
| tokens.
|
| Bottom up is trickier. Error recovery is the meaningfully
| tricky part in all of this.
|
| Before tree-sitter, I was constantly explaining this stuff to
| people (I followed the projects that these algorithms came out
| of - ENSEMBLE, HARMONIA, etc). After more people get that there
| are ways of doing this, but you still run into people who are
| re-creating things we solved in pretty great ways many years
| ago.
| toomim wrote:
| It's really fun seeing my old research groups mentioned
| decades later. Thanks for writing this up.
| bgoated01 wrote:
| I'm not extremely familiar with the details of incremental
| parsing, but I have used Cursorless, a VSCode extension based on
| tree-sitter for voice controlled structured editing, and it is
| pretty powerful. You can use the structured editing when you want
| and also normal editing in between. Occasionally the parser will
| get things wrong and only change/take/select part of a function
| or what have you, but in general it's very useful, and I tend to
| miss it now that I am no longer voice coding much. I seem to
| remember that there was a similar extension for emacs (sans voice
| control). treemacs, or something? Anyone used that?
|
| [0] https://www.cursorless.org/
| codetrotter wrote:
| Does anything similar exist for JetBrains IDEs, but fully open
| source? (Open source plugin, and open source voice recognition
| model running locally.)
| servilio wrote:
| The one I know about is Combobulate[1], it uses treesitter but
| without voice control.
|
| [1] https://github.com/mickeynp/combobulate
| setopt wrote:
| Treemacs is not TreeSitter-related, it's just a file tree plug-
| in.
| gushogg-blake wrote:
| I did some work on structural editing a while back, using Tree-
| sitter to get the AST (abstract syntax tree, the parse tree used
| for structural edits). I now use the editor as my daily driver
| but don't use or feel the need for structural editing commands
| that often - probably partly out of habit and partly because text
| edits are just better for most editing.
|
| I do miss the "wrap" command when using other editors, but it
| could be implemented reasonably easily without a parse tree. I
| found that a lot of the structural edits correspond to
| indentation levels anyway, but the parse tree definitely helps.
|
| HN post: https://news.ycombinator.com/item?id=29787861
| etse wrote:
| > why structured editors haven't taken over the world: most
| programmers find them so annoying to use in some situations that
| they don't view the pros as outweighing the cons.
|
| Seems like "annoying" refers to a user interface annoyance.
|
| I'm guessing the following since I couldn't tell what structured
| editing is like from the article:
|
| Keyboard entry is immediate, but prone to breaking the program
| structure. Structured editing through specific commands is an
| abstraction on top of key entry (or mouse), both of which add a
| layer of resistance. Another layer might come from having to
| recall the commands, or if recognizing them, having to peruse a
| list of them, at least while learning it.
|
| What does the developer's experience with incremental parsing
| feel like?
| Eliah_Lakhin wrote:
| > What does the developer's experience with incremental parsing
| feel like?
|
| It's essentially the experience most of us already have when
| using Visual Studio, IntelliJ, or any modern IDE on a daily
| basis.
|
| The term "incremental parsing" might be a bit misleading. A
| more accurate (though wordier) term would be a "stateful parser
| capable of reparsing the text in parts". The core idea is that
| you can write text seamlessly while the editor dynamically
| updates local fragments of its internal representation (usually
| a syntax tree) in real time around the characters you're
| typing.
|
| An incremental parser is one of the key components that enable
| modern code editors to stay responsive. It allows the editor to
| keep its internal syntax tree synchronized with the user's
| edits without needing to reparse the entire project on every
| keystroke. This stateful approach contrasts with stateless
| compilers that reparse the entire project from scratch.
|
| This continuous (or incremental) patching of the syntax tree is
| what enables modern IDEs to provide features like real-time
| code completion, semantic highlighting, and error detection.
| Essentially, while you focus on writing code, the editor is
| constantly maintaining and updating a structural representation
| of your program behind the scenes.
|
| The article's author suggests an alternative idea: instead of
| reparsing the syntax tree incrementally, the programmer would
| directly edit the syntax tree itself. In other words, you would
| be working with the program's structure rather than its raw
| textual representation.
|
| This approach could simplify the development of code editors.
| The editor would primarily need to offer a GUI for tree
| structure editing, which might still appear as flat text for
| usability but would fundamentally involve structural
| interactions.
|
| Whether this approach improves the end-user experience is hard
| to say. It feels akin to graphical programming languages, which
| already have a niche (e.g., visual scripting in game engines).
| However, the challenge lies in the interface.
|
| The input device (keyboard) designed for natural text input and
| have limitations when it comes to efficiently interacting with
| structural data. In theory, these hurdles could be overcome
| with time, but for now, the bottleneck is mostly a question of
| UI/UX design. And as of today, we lack a clear, efficient
| approach to tackle this problem.
___________________________________________________________________
(page generated 2024-11-28 23:01 UTC)