[HN Gopher] Zed Decoded: Rope and SumTree
       ___________________________________________________________________
        
       Zed Decoded: Rope and SumTree
        
       Author : avinassh
       Score  : 144 points
       Date   : 2024-04-28 12:00 UTC (11 hours ago)
        
 (HTM) web link (zed.dev)
 (TXT) w3m dump (zed.dev)
        
       | rochak wrote:
       | Great explanation. If anyone has good references for learning
       | this by making a minimal editor like sed from scratch, please
       | share. Wait, now that I think about it, does the name "Zed" draw
       | inspiration from "sed"? If yes, that's rad.
        
         | nickzelei wrote:
         | I assumed it was the alternative pronunciation of Zero.
        
         | humzashahid98 wrote:
         | I have a simple (well, simple by my standards because I've
         | spent a long time with it) implementation of a rope in Standard
         | ML [0], OCaml [1] and F# [2]. (See tiny_rope.sml, brolib.fs or
         | lib/tiny_rope.ml for implementation if you can read any of
         | those languages; the F# implementation is probably easiest to
         | read.)
         | 
         | [0] https://github.com/hummy123/brolib-sml
         | 
         | [1] https://github.com/hummy123/brolib
         | 
         | [2] https://github.com/hummy123/brolib-fs
         | 
         | The essence of a data structure is a binary tree where the
         | internal nodes (the N2 case) contains a pointer to the left
         | subtree, an integer containing the total length of the strings
         | in the left subtree and a pointer to the right subtree. Then
         | there are leaf nodes (the N0 case) which contain simply
         | strings. (There are some other cases in the type like N1, N3
         | and L2, but those are solely for balancing because I built my
         | ropes on top of 1-2 Brother Trees as described by Ralf Hinze,
         | and those aren't essential to the rope data structure.)
         | 
         | When indexing (which is necessary for the insertion and
         | deletion operations), you have a simple recursive algorithm
         | which can be best seen in the recursive "ins" function. In the
         | internal N2 nodes, the algorithm is to compare the index (given
         | as an argument) with the left metadata. If the index argument
         | is less than the left metadata, recurse to the left subtree
         | passing the same index; otherwise, recurse to the right
         | subtree, subtracting the index argument with the left metadata.
         | 
         | By the end, when you eventually reach the leaf case, the index
         | argument is equal to the position you want to insert into in
         | the current node. (I haven't tried to understand the maths
         | behind this but it's how the data structure works.) At that
         | point, all you do is insert into the leaf node's string (this
         | is the same as inserting at an arbitrary index in any normal
         | string) if you can without exceeding the maximum limit, or else
         | you can insert another node.
         | 
         | Then you unroll the recursion. Unrolling the recursion involves
         | updating the left subtree metadata when you reach the parent,
         | and it also involves balancing. (I'm using 1-2 Brother Trees
         | for balancing but ropes don't really care which balancing you
         | use or if you use one at all.)
         | 
         | That's pretty much all there is to ropes. The deletion and
         | substring algorithms just require minor modifications (the user
         | might specify a range that includes more than one subtree, so
         | you might need to recurse on both subtrees).
         | 
         | You can extend the idea behind ropes to hold more metadata too.
         | For example, rope.sml (Standard ML) also tracks line metadata
         | to allow indexing by line number. The changes required for this
         | are: store an array at the leaf nodes containing indices of
         | line breaks in the string also at this node, and at internal N2
         | nodes you should also store an additional integer indicating
         | the number of lines in the left subtree.
         | 
         | There is an idea I haven't found too useful which is, if two
         | leaf nodes have strings that can be joined without reaching the
         | maximum limit, then join them. I haven't found this idea to
         | improve performance much although it theoretically should.
         | 
         | I want to give a shout out to the MLton compiler for Standard
         | ML here - the two rope implementations compiled with it handily
         | beat the fastest ropes in Rust which is surprising. (My code
         | performs well with F# and OCaml, but MLton takes it a level
         | beyond that.)
        
         | bartekpacia wrote:
         | It draws from the original UNIX text editor called "ed".
         | 
         | https://zed.dev/faq#why-zed
        
       | alberth wrote:
       | I really want to love Zed.
       | 
       | And truly believe they are onto something big with multiplayer
       | coding, chat, collab.
       | 
       | But I've run into so many bugs I've had to stop using Zed.
       | 
       | The amount of time I've spent filing bugs, troubleshooting,
       | documenting errors etc - is counter productive.
       | 
       | It makes me realize that I'm old enough now to just want things
       | to work.
       | 
       | I'll give it another go once it hits 1.0
       | 
       | Until then, wishing them the best and for them to make the
       | developer world better for us all.
        
         | Terretta wrote:
         | > _The amount of time I 've spent troubling shoot..._
         | 
         | Conveys a lovely element of gardening: new bits of code appear
         | like shoots in the garden, often troubling if they aren't what
         | one planted, taking time to weed giving nutritious shoots room
         | to grow.
        
           | verticalscaler wrote:
           | Zed? More like Zen.
        
         | icholy wrote:
         | This is how I've been feeling about neovim recently.
        
           | pkos98 wrote:
           | are you referring to the actual editor or to the plugins you
           | used?
           | 
           | the editor itself has been very stable for me (daily usage)
           | and the plugin ecosystem definitely offers so many choices
           | that most of the time I can find stable/well-working ones for
           | my use case.
        
           | _virtu wrote:
           | I'm curious as to why. I finally went all in on neovim about
           | two years ago and have been having an amazingly stable
           | experience.
        
       | Syzygies wrote:
       | Ropes remind me of Haskell's                 type ShowS = String
       | -> String
       | 
       | `Text` is generally preferred to `String` in Haskell, but I've
       | happily written parsers that depend on `ShowS` to defer
       | concatenation to linear time final output. `Text` to me always
       | felt more C-like, and better suited to C-like applications.
       | 
       | I've survived in math partly by attempting to catalog how others
       | think. I sense a divide in Haskell, between people who prefer to
       | view the compiler as a hermetically sealed abstraction, and those
       | of us who try exploit what we think the compiler is doing. To
       | view `ShowS` as a rope one needs to consider how the compiler
       | handles lazy evaluation.
        
         | lupire wrote:
         | You don't need to anything about the compiler internals to know
         | that a String is a linked list, and that concatenating a linked
         | list is expensive, and that ShowS is a function interface that
         | enables optimized contatenation.
         | 
         | Rope is a data structure, not a function.
         | 
         | https://hackage.haskell.org/package/base-4.19.1.0/docs/GHC-S...
         | 
         | https://hackage.haskell.org/package/base-4.19.1.0/docs/Data-...
        
           | Syzygies wrote:
           | Data structures are useless without functions, and functions
           | are useless without data structures. In use, ropes are like
           | `ShowS`. It feels to me like you're making a grammatical
           | distinction, of a kind I try to avoid so I can think more
           | abstractly. Our categories are historical accident.
        
       | fallat wrote:
       | There are so many high performance editors that are just lists of
       | strings. It's evidently false they are not high performance.
        
         | MatthiasPortzel wrote:
         | Like what? Vim, Emacs, Helix, VSCode, Zed, Xi, Lapce, and
         | Monaco all do not.
        
           | sp33der89 wrote:
           | Kakoune does, and I found it (without extensions) to perform
           | excellent even with multicursor.
           | 
           | Not saying that array of lines is more performant than ropes,
           | but just adding a datapoint here.
        
           | aktau wrote:
           | What is Vim's internal data structure nowadays? I thought it
           | was essentially a list (array) of strings.
        
         | simonw wrote:
         | From the article:                   If you really love strings,
         | you might now         be thinking "better than a string? easy:
         | multiple strings." And you wouldn't be         that far off!
         | Some editors do represent         text as an array-of-lines
         | with each line         being a string. VS Code's Monaco editor
         | worked that way for quite a while, but an         array of
         | strings can still be plagued by         the same problems as a
         | single string.         Excessive memory consumption and
         | performance issues made the VS Code team         look for
         | something better.
        
       | mightyham wrote:
       | Similar to a SumTree, if all you need is incrementally computed
       | values without data persistence, a heap makes for a more
       | efficient data structure. I remember learning about this from the
       | following blog:
       | https://timvieira.github.io/blog/post/2016/11/21/heaps-for-i...
        
       | MatthiasPortzel wrote:
       | There's a good series of posts called "Rope Science" by the
       | author of the now-unmaintained Xi editor. These posts were cited
       | as an inspiration by the Lapce author.
       | 
       | => https://xi-editor.io/docs/rope_science_00.html
        
         | ayewo wrote:
         | I first learned of the rope data structure and piece table [1]
         | from the Tree Sitter docs: https://tree-sitter.github.io/tree-
         | sitter/using-parsers#prov...
         | 
         | 1: https://en.wikipedia.org/wiki/Piece_table
        
       | BaculumMeumEst wrote:
       | I've tried Zed and it's really nice but there is a very high bar
       | for switching to a new editor - it has to be sufficiently better
       | than alternatives and also have widespread adoption. There are so
       | many man hours that went into things like undo-tree and magit.
       | And thanks to the sheer popularity of mainstream editors, I can
       | get language support for languages that aren't even publicly
       | available like Jai.
       | 
       | I want to use it, but I don't think I am going to be able to
       | justify investing in new editors at this point unless there is
       | something extremely compelling about it.
        
         | cedws wrote:
         | I tried Zed a few weeks ago and was extremely impressed. I was
         | expecting it to be another half baked 'Rewrite it in Rust'
         | hobby project, but it's actually very much usable for the kind
         | of development I'm doing. Since then I've completely swapped it
         | out with VSCode and loving it so far. There's only one or two
         | small issues which I'm hoping will be ironed out soon. Zed is
         | also really really fast. I can feel the input latency is
         | significantly lower than VSCode and faster feedback helps me
         | write code more fluently.
        
           | bioneuralnet wrote:
           | Ditto! The only thing that occasionally bumps me is a few
           | missing features in vim-mode (marks & recording). But it
           | feels so much faster than VS Code, I'm willing to put up with
           | it so far.
        
       | jgalt212 wrote:
       | pylint yells at me when my modules are more than 1500 lines.
       | maybe it assumes the module is stored as string and not a rope.
        
       | josephg wrote:
       | I've seen the SumTree pattern show up a bunch recently. As I
       | understand it, VSCode has something similar for syntax
       | highlighting. And I've both seen and written several
       | implementations of this pattern in the various text CRDTs I've
       | interacted with. - though in each case the structure is given a
       | different name. (Eg RangeTree, ContentTree, G-Tree, etc).
       | 
       | It's nice to see the idea is catching on. It's a good one.
        
       ___________________________________________________________________
       (page generated 2024-04-28 23:00 UTC)