[HN Gopher] Structured Editing and Incremental Parsing
       ___________________________________________________________________
        
       Structured Editing and Incremental Parsing
        
       Author : ltratt
       Score  : 121 points
       Date   : 2024-11-27 20:42 UTC (1 days ago)
        
 (HTM) web link (tratt.net)
 (TXT) w3m dump (tratt.net)
        
       | practal wrote:
       | I think simpler is better when it comes to structured editing.
       | Recursive teXt has the advantage that it proposes a simple block
       | structure built into the text itself [1]. Of course, you will
       | need to design your language to take advantage of this block
       | structure.
       | 
       | [1] http://recursivetext.com
        
         | computably wrote:
         | Since Lisp has been around since 1960... Congratulations,
         | you're only about 64 years late.
        
           | practal wrote:
           | No doubt, brackets of course also convey structure. But I
           | think indentation is better for visualising block structure.
           | Inside these blocks, you can still use brackets, and errors
           | like missing opening or closing brackets will not spill over
           | into other blocks.
           | 
           | And yeah, I am definitely coming for Lisp.
        
             | llm_trw wrote:
             | I wrote a racket function that would need 35 levels of
             | indentation ten minutes ago. White space isn't coming for
             | lisps until we figure out 12 dimensional displays.
        
               | cjfd wrote:
               | Is a function that would need 35 levels of indentation a
               | good idea? I have seen C code with about 12 levels of
               | indentation and that was not too great.
        
               | llm_trw wrote:
               | What other languages use syntax for lisps use function
               | applications for.
               | 
               | Viz. the array reference a[0] in algols is a function
               | application in lisps (vector-ref a 0).
               | 
               | The same is true for everything else. Semantic white
               | space in such a language is a terrible idea as everyone
               | eventually finds out.
        
               | practal wrote:
               | Recursive teXt (RX) would be a great fit for Lisp,
               | although I am more interested in replacing Lisp entirely
               | with a simpler language rooted in abstraction logic.
               | 
               | Note that RX is not like normal semantic white space, but
               | simpler. It is hard coded into the text without taking
               | its content into consideration. RX is basically nested
               | records of text, making it very robust, but encoded as
               | plain text instead of JSON or something like that.
        
               | llm_trw wrote:
               | No it won't be. S-expressions are the simplest way to
               | linearize a tree.
               | 
               | Everyone thinks there's something better and the very
               | motivated write an interpreter for their ideal language
               | in lisp.
               | 
               | The ideas inevitably have so much jank when used in anger
               | that you always come back to sexp.
               | 
               | Now if you discover a way to linearize dags or arbitrary
               | graphs without having to keep a table of symbols I'd love
               | to hear it.
        
               | practal wrote:
               | > S-expressions are the simplest way to linearize a tree.
               | 
               | S-expressions are _one_ way to linearize a tree.
               | 
               | Now, "simple" can mean different things depending on what
               | you are trying to achieve. RX is simpler than
               | s-expressions if you prefer indentation over brackets,
               | and like the robustness that it brings. Abstraction
               | algebra terms are simpler than s-expressions if you want
               | to actually reason about and with them.
        
               | llm_trw wrote:
               | In your own examples you're using both brackets and white
               | space to delineate structure. This is complex because you
               | need two parsers to even start working on the input
               | stream and the full parser must know how to switch
               | between them.
               | 
               | In short: I get all the pain of semantic white space with
               | all the pain of lisp s-exp's with the benefits of
               | neither.
        
               | kazinator wrote:
               | If you don't allow the indentation based parsing to be
               | nested within a bracket-based expression, it doesn't look
               | too bad. At the top level you have the indentation-based
               | parser. When that sees an open bracket, it recurses to
               | the regular parser which doesn't care about indentation.
        
               | llm_trw wrote:
               | This works until you start using macros.
        
               | benatkin wrote:
               | An everyday example is the difference between
               | JSON.stringify(data, null, 2) and a pretty printer like
               | ipython or Deno.inspect. Having only one node per line
               | reduces the amount of data that can be displayed. It's
               | the same with code.
        
               | practal wrote:
               | Assuming this number of indentation is really necessary
               | (which I doubt; maybe a few auxiliary definitions are in
               | order?), obviously only the first few levels would be
               | represented as their own Recursive teXt blocks.
        
               | llm_trw wrote:
               | >obviously only the first few levels would be represented
               | as their own Recursive teXt blocks.
               | 
               | This is not at all obvious.
        
               | practal wrote:
               | It is obvious to me, nobody would want to represent 35
               | levels of nesting by indentation. So I would represent
               | the first few (2-4) levels in RX, and the rest by other
               | means, such as brackets. Your language should be designed
               | such that the cutoff is up to you, the writer of the
               | code, and really just a matter of syntax, not semantics.
               | 
               | Obviously, I (usually) would not want to write things
               | like                   +              *
               | a                  b             *                  c
               | d
               | 
               | but rather                   +             a * b
               | c * d
               | 
               | or, even better of course,                   (a * b) + (c
               | * d)
               | 
               | I think of blocks more as representing high-level
               | structure, while brackets are there for low-level and
               | fine-grained structure. As the border between these can
               | be fluid, where you choose the cutoff will depend on the
               | situation and also be fluid.
        
               | llm_trw wrote:
               | >Your language should be designed such that the cutoff is
               | up to you, the writer of the code, and really just a
               | matter of syntax, not semantics.
               | 
               | I have more important things to think about in my code
               | than when I switch between two dialects of the language.
               | 
               | Especially since I get no extra expressive power by doing
               | so.
               | 
               | >Obviously, I (usually) would not want to write things
               | like
               | 
               | Or just use (+ (* a b) (* c d)) which is simpler that any
               | of the example above. Then I can chose between any of:
               | ( +           (* a b)           (* c d))              ( +
               | ( *            a            b           )           ( *
               | c             d           )          )
               | 
               | Or whatever else you want to do.
               | 
               | >As the border between these can be fluid, where you
               | choose the cutoff will depend on the situation and also
               | be fluid.
               | 
               | It's only fluid because you've had XX years of infix
               | notation caused brain damage to make you think that.
        
               | shakna wrote:
               | Wisp [0] is the big one talked about in Scheme land, and
               | it wouldn't need 35 levels of indentation. Like the rest
               | of Lisp-world, there's flexibility where you need it.
               | 
               | For example, this shows a bit of the mix available:
               | for-each            l : task                if : and
               | (equal? task "help") (not (null? (delete "help" tasks)))
               | map                       l : x
               | help #f #f x                       delete "help" tasks
               | run-task task            . tasks
               | 
               | [0] https://srfi.schemers.org/srfi-119/srfi-119.html
        
       | shakna wrote:
       | Trying to use any kind of syntax highlighter with TeX is a pain
       | in the butt. I didn't mean LaTeX there. I mean TeX, which can
       | rewrite it's own lexer, and a lot of libraries work by doing so.
       | I move in and out of TeXInfo syntax and it basically just causes
       | most editors to sit there screaming that everything is broken.
        
         | llm_trw wrote:
         | Yes its pretty funny when you realise what a tiny corner of the
         | design space of programs most users inhabit that they think
         | things like lsp are an amazing tool instead of a weekend
         | throwaway project.
         | 
         | What's even funnier is how much they attack anyone who points
         | this out.
        
           | foo42 wrote:
           | perhaps the "attacks" relate to the condescending tone with
           | which you relate your superior skills.
           | 
           | I think most people's amazement with lsp relates to the
           | practical benefits of such a project _not_ being thrown away
           | but taken that last 10% (which is 90% of the work) to make it
           | suitable for so many use cases and selling people on the idea
           | of doing so.
        
             | llm_trw wrote:
             | What's amazing about lsp isn't the polish, it's that we've
             | hobbled our selves so much that a tool like it is even
             | useful.
             | 
             | Only having exposure to the algol family of languages does
             | for your mental capabilities what a sugar only diet does
             | for your physical capabilities. It used to be the case that
             | all programmers had exposure to assembly/machine code which
             | broke them out of the worst habits algols instill. No
             | longer.
             | 
             | Pointing out that the majority of programmers today have
             | the mental equivalent of scurvy is somehow condescending
             | but the corp selling false teeth along with their sugar
             | buckets is somehow commendable.
        
               | dzaima wrote:
               | Knowing non-algol languages won't make editor actions any
               | less useful for algol-like. If anything, it'll just make
               | you pretend that you don't need such and as such will end
               | up less productive than you could be.
               | 
               | And editor actions can be useful for any language which
               | either allow you to edit things, or has more than one way
               | to do the same thing (among a bunch of other things),
               | which includes basically everything. Of course editor
               | functionality isn't a thing that'd be 100% beneficial
               | 100% of the time, but it's plenty above 0% if you don't
               | purposefully ignore it.
        
               | lolinder wrote:
               | > Pointing out that the majority of programmers today
               | have the mental equivalent of scurvy is somehow
               | condescending
               | 
               | You can (and many people do!) say the exact same thing in
               | a different tone and with different word choice and have
               | people nod along in agreement. If you're finding that
               | people consistently react negatively to you when you say
               | it, please consider that it might be because of the way
               | in which you say it.
               | 
               | I'm one of those who would normally nod along in
               | agreement and writing in support, but your comments here
               | make me want to disagree on principle because you come
               | off as unbearably smug.
        
               | llm_trw wrote:
               | >I'm one of those who would normally nod along in
               | agreement and writing in support, but your comments here
               | make me want to disagree on principle because you come
               | off as unbearably smug.
               | 
               | So much the worse for you.
        
               | itishappy wrote:
               | Kinda feels like this might be an instance of simply
               | simply holding your tools wrong. What about ASM prevents
               | incremental parsing and structured editing from being
               | useful?
               | 
               | Some concrete examples for us lesser mortals please.
        
               | llm_trw wrote:
               | The fact that there is no separation between data,
               | addresses and commands?
               | 
               | The general advice is that you shouldn't mix them, but
               | the general advice today is that you shouldn't use ASM
               | anyway.
        
           | throw10920 wrote:
           | This is _sneering_ , where you don't respond to a particular
           | poster's point, but instead attack a fictional group of
           | people that you made up based on two the intersection of
           | different attributes.
           | 
           | It's against the HN guidelines
           | (https://news.ycombinator.com/newsguidelines.html), boring,
           | unenlightening, not intellectually gratifying, degrades the
           | quality of the site, and takes far less intelligence than the
           | "mental equivalent of scurvy" that you name. Don't do it.
        
             | llm_trw wrote:
             | So is yours:
             | 
             | > Please respond to the strongest plausible interpretation
             | of what someone says, not a weaker one that's easier to
             | criticize. Assume good faith.
        
       | sudahtigabulan wrote:
       | > it is now clear to me that there is ongoing work on structured
       | editing which either doesn't know about incremental parsing in
       | general, or Tim's algorithms specifically. I hope this post
       | serves as a useful advert to such folk
       | 
       | I'm curious about this unnamed ongoing work (that is unaware of
       | incremental parsing).
       | 
       | Anyone know what he is referring to?
        
         | DannyBee wrote:
         | I don't know specifically - but even now, i still end up having
         | to explain to people that incremental parsing/lexing
         | (particularly without error recovery) is not hard, it is not
         | really complicated, and as the author here said, Tim (et al)
         | have made beautiful algorithms that make this stuff easy.
         | 
         | Heck, incremental lexing is even easy to explain. For each
         | token, track where the lexer _actually looked_ in the input
         | stream to make decisions. Any time that part of the input
         | stream changes, every token to actually look at the changed
         | portion of the input stream is re-lexed, and if the result
         | changes, keep re-lexing until the before /after tokenstreams
         | sync up again or you run out of input. That's it.
         | 
         | You can also make a dumber version that statically calculates
         | the maximum lookahead (lookbehind if you support that too) of
         | the entire grammer, or the maximum possible lookahead per
         | token, and uses that instead of tracking the actual lookahead
         | used. In practice, this is often harder than just tracking the
         | actual lookahead used.
         | 
         | In an LL system like ANTLR, incremental parsing is very similar
         | - since it generates top-down parsers, it's the same basic
         | theory - track what token ranges were looked at as you parse.
         | During incremental update, only descend into portions of the
         | parse tree where the token ranges looked at contain modified
         | tokens.
         | 
         | Bottom up is trickier. Error recovery is the meaningfully
         | tricky part in all of this.
         | 
         | Before tree-sitter, I was constantly explaining this stuff to
         | people (I followed the projects that these algorithms came out
         | of - ENSEMBLE, HARMONIA, etc). After more people get that there
         | are ways of doing this, but you still run into people who are
         | re-creating things we solved in pretty great ways many years
         | ago.
        
           | toomim wrote:
           | It's really fun seeing my old research groups mentioned
           | decades later. Thanks for writing this up.
        
       | bgoated01 wrote:
       | I'm not extremely familiar with the details of incremental
       | parsing, but I have used Cursorless, a VSCode extension based on
       | tree-sitter for voice controlled structured editing, and it is
       | pretty powerful. You can use the structured editing when you want
       | and also normal editing in between. Occasionally the parser will
       | get things wrong and only change/take/select part of a function
       | or what have you, but in general it's very useful, and I tend to
       | miss it now that I am no longer voice coding much. I seem to
       | remember that there was a similar extension for emacs (sans voice
       | control). treemacs, or something? Anyone used that?
       | 
       | [0] https://www.cursorless.org/
        
         | codetrotter wrote:
         | Does anything similar exist for JetBrains IDEs, but fully open
         | source? (Open source plugin, and open source voice recognition
         | model running locally.)
        
         | servilio wrote:
         | The one I know about is Combobulate[1], it uses treesitter but
         | without voice control.
         | 
         | [1] https://github.com/mickeynp/combobulate
        
         | setopt wrote:
         | Treemacs is not TreeSitter-related, it's just a file tree plug-
         | in.
        
       | gushogg-blake wrote:
       | I did some work on structural editing a while back, using Tree-
       | sitter to get the AST (abstract syntax tree, the parse tree used
       | for structural edits). I now use the editor as my daily driver
       | but don't use or feel the need for structural editing commands
       | that often - probably partly out of habit and partly because text
       | edits are just better for most editing.
       | 
       | I do miss the "wrap" command when using other editors, but it
       | could be implemented reasonably easily without a parse tree. I
       | found that a lot of the structural edits correspond to
       | indentation levels anyway, but the parse tree definitely helps.
       | 
       | HN post: https://news.ycombinator.com/item?id=29787861
        
       | etse wrote:
       | > why structured editors haven't taken over the world: most
       | programmers find them so annoying to use in some situations that
       | they don't view the pros as outweighing the cons.
       | 
       | Seems like "annoying" refers to a user interface annoyance.
       | 
       | I'm guessing the following since I couldn't tell what structured
       | editing is like from the article:
       | 
       | Keyboard entry is immediate, but prone to breaking the program
       | structure. Structured editing through specific commands is an
       | abstraction on top of key entry (or mouse), both of which add a
       | layer of resistance. Another layer might come from having to
       | recall the commands, or if recognizing them, having to peruse a
       | list of them, at least while learning it.
       | 
       | What does the developer's experience with incremental parsing
       | feel like?
        
         | Eliah_Lakhin wrote:
         | > What does the developer's experience with incremental parsing
         | feel like?
         | 
         | It's essentially the experience most of us already have when
         | using Visual Studio, IntelliJ, or any modern IDE on a daily
         | basis.
         | 
         | The term "incremental parsing" might be a bit misleading. A
         | more accurate (though wordier) term would be a "stateful parser
         | capable of reparsing the text in parts". The core idea is that
         | you can write text seamlessly while the editor dynamically
         | updates local fragments of its internal representation (usually
         | a syntax tree) in real time around the characters you're
         | typing.
         | 
         | An incremental parser is one of the key components that enable
         | modern code editors to stay responsive. It allows the editor to
         | keep its internal syntax tree synchronized with the user's
         | edits without needing to reparse the entire project on every
         | keystroke. This stateful approach contrasts with stateless
         | compilers that reparse the entire project from scratch.
         | 
         | This continuous (or incremental) patching of the syntax tree is
         | what enables modern IDEs to provide features like real-time
         | code completion, semantic highlighting, and error detection.
         | Essentially, while you focus on writing code, the editor is
         | constantly maintaining and updating a structural representation
         | of your program behind the scenes.
         | 
         | The article's author suggests an alternative idea: instead of
         | reparsing the syntax tree incrementally, the programmer would
         | directly edit the syntax tree itself. In other words, you would
         | be working with the program's structure rather than its raw
         | textual representation.
         | 
         | This approach could simplify the development of code editors.
         | The editor would primarily need to offer a GUI for tree
         | structure editing, which might still appear as flat text for
         | usability but would fundamentally involve structural
         | interactions.
         | 
         | Whether this approach improves the end-user experience is hard
         | to say. It feels akin to graphical programming languages, which
         | already have a niche (e.g., visual scripting in game engines).
         | However, the challenge lies in the interface.
         | 
         | The input device (keyboard) designed for natural text input and
         | have limitations when it comes to efficiently interacting with
         | structural data. In theory, these hurdles could be overcome
         | with time, but for now, the bottleneck is mostly a question of
         | UI/UX design. And as of today, we lack a clear, efficient
         | approach to tackle this problem.
        
       ___________________________________________________________________
       (page generated 2024-11-28 23:01 UTC)