[HN Gopher] Zed Decoded: Rope and SumTree
___________________________________________________________________
Zed Decoded: Rope and SumTree
Author : avinassh
Score : 144 points
Date : 2024-04-28 12:00 UTC (11 hours ago)
(HTM) web link (zed.dev)
(TXT) w3m dump (zed.dev)
| rochak wrote:
| Great explanation. If anyone has good references for learning
| this by making a minimal editor like sed from scratch, please
| share. Wait, now that I think about it, does the name "Zed" draw
| inspiration from "sed"? If yes, that's rad.
| nickzelei wrote:
| I assumed it was the alternative pronunciation of Zero.
| humzashahid98 wrote:
| I have a simple (well, simple by my standards because I've
| spent a long time with it) implementation of a rope in Standard
| ML [0], OCaml [1] and F# [2]. (See tiny_rope.sml, brolib.fs or
| lib/tiny_rope.ml for implementation if you can read any of
| those languages; the F# implementation is probably easiest to
| read.)
|
| [0] https://github.com/hummy123/brolib-sml
|
| [1] https://github.com/hummy123/brolib
|
| [2] https://github.com/hummy123/brolib-fs
|
| The essence of a data structure is a binary tree where the
| internal nodes (the N2 case) contains a pointer to the left
| subtree, an integer containing the total length of the strings
| in the left subtree and a pointer to the right subtree. Then
| there are leaf nodes (the N0 case) which contain simply
| strings. (There are some other cases in the type like N1, N3
| and L2, but those are solely for balancing because I built my
| ropes on top of 1-2 Brother Trees as described by Ralf Hinze,
| and those aren't essential to the rope data structure.)
|
| When indexing (which is necessary for the insertion and
| deletion operations), you have a simple recursive algorithm
| which can be best seen in the recursive "ins" function. In the
| internal N2 nodes, the algorithm is to compare the index (given
| as an argument) with the left metadata. If the index argument
| is less than the left metadata, recurse to the left subtree
| passing the same index; otherwise, recurse to the right
| subtree, subtracting the index argument with the left metadata.
|
| By the end, when you eventually reach the leaf case, the index
| argument is equal to the position you want to insert into in
| the current node. (I haven't tried to understand the maths
| behind this but it's how the data structure works.) At that
| point, all you do is insert into the leaf node's string (this
| is the same as inserting at an arbitrary index in any normal
| string) if you can without exceeding the maximum limit, or else
| you can insert another node.
|
| Then you unroll the recursion. Unrolling the recursion involves
| updating the left subtree metadata when you reach the parent,
| and it also involves balancing. (I'm using 1-2 Brother Trees
| for balancing but ropes don't really care which balancing you
| use or if you use one at all.)
|
| That's pretty much all there is to ropes. The deletion and
| substring algorithms just require minor modifications (the user
| might specify a range that includes more than one subtree, so
| you might need to recurse on both subtrees).
|
| You can extend the idea behind ropes to hold more metadata too.
| For example, rope.sml (Standard ML) also tracks line metadata
| to allow indexing by line number. The changes required for this
| are: store an array at the leaf nodes containing indices of
| line breaks in the string also at this node, and at internal N2
| nodes you should also store an additional integer indicating
| the number of lines in the left subtree.
|
| There is an idea I haven't found too useful which is, if two
| leaf nodes have strings that can be joined without reaching the
| maximum limit, then join them. I haven't found this idea to
| improve performance much although it theoretically should.
|
| I want to give a shout out to the MLton compiler for Standard
| ML here - the two rope implementations compiled with it handily
| beat the fastest ropes in Rust which is surprising. (My code
| performs well with F# and OCaml, but MLton takes it a level
| beyond that.)
| bartekpacia wrote:
| It draws from the original UNIX text editor called "ed".
|
| https://zed.dev/faq#why-zed
| alberth wrote:
| I really want to love Zed.
|
| And truly believe they are onto something big with multiplayer
| coding, chat, collab.
|
| But I've run into so many bugs I've had to stop using Zed.
|
| The amount of time I've spent filing bugs, troubleshooting,
| documenting errors etc - is counter productive.
|
| It makes me realize that I'm old enough now to just want things
| to work.
|
| I'll give it another go once it hits 1.0
|
| Until then, wishing them the best and for them to make the
| developer world better for us all.
| Terretta wrote:
| > _The amount of time I 've spent troubling shoot..._
|
| Conveys a lovely element of gardening: new bits of code appear
| like shoots in the garden, often troubling if they aren't what
| one planted, taking time to weed giving nutritious shoots room
| to grow.
| verticalscaler wrote:
| Zed? More like Zen.
| icholy wrote:
| This is how I've been feeling about neovim recently.
| pkos98 wrote:
| are you referring to the actual editor or to the plugins you
| used?
|
| the editor itself has been very stable for me (daily usage)
| and the plugin ecosystem definitely offers so many choices
| that most of the time I can find stable/well-working ones for
| my use case.
| _virtu wrote:
| I'm curious as to why. I finally went all in on neovim about
| two years ago and have been having an amazingly stable
| experience.
| Syzygies wrote:
| Ropes remind me of Haskell's type ShowS = String
| -> String
|
| `Text` is generally preferred to `String` in Haskell, but I've
| happily written parsers that depend on `ShowS` to defer
| concatenation to linear time final output. `Text` to me always
| felt more C-like, and better suited to C-like applications.
|
| I've survived in math partly by attempting to catalog how others
| think. I sense a divide in Haskell, between people who prefer to
| view the compiler as a hermetically sealed abstraction, and those
| of us who try exploit what we think the compiler is doing. To
| view `ShowS` as a rope one needs to consider how the compiler
| handles lazy evaluation.
| lupire wrote:
| You don't need to anything about the compiler internals to know
| that a String is a linked list, and that concatenating a linked
| list is expensive, and that ShowS is a function interface that
| enables optimized contatenation.
|
| Rope is a data structure, not a function.
|
| https://hackage.haskell.org/package/base-4.19.1.0/docs/GHC-S...
|
| https://hackage.haskell.org/package/base-4.19.1.0/docs/Data-...
| Syzygies wrote:
| Data structures are useless without functions, and functions
| are useless without data structures. In use, ropes are like
| `ShowS`. It feels to me like you're making a grammatical
| distinction, of a kind I try to avoid so I can think more
| abstractly. Our categories are historical accident.
| fallat wrote:
| There are so many high performance editors that are just lists of
| strings. It's evidently false they are not high performance.
| MatthiasPortzel wrote:
| Like what? Vim, Emacs, Helix, VSCode, Zed, Xi, Lapce, and
| Monaco all do not.
| sp33der89 wrote:
| Kakoune does, and I found it (without extensions) to perform
| excellent even with multicursor.
|
| Not saying that array of lines is more performant than ropes,
| but just adding a datapoint here.
| aktau wrote:
| What is Vim's internal data structure nowadays? I thought it
| was essentially a list (array) of strings.
| simonw wrote:
| From the article: If you really love strings,
| you might now be thinking "better than a string? easy:
| multiple strings." And you wouldn't be that far off!
| Some editors do represent text as an array-of-lines
| with each line being a string. VS Code's Monaco editor
| worked that way for quite a while, but an array of
| strings can still be plagued by the same problems as a
| single string. Excessive memory consumption and
| performance issues made the VS Code team look for
| something better.
| mightyham wrote:
| Similar to a SumTree, if all you need is incrementally computed
| values without data persistence, a heap makes for a more
| efficient data structure. I remember learning about this from the
| following blog:
| https://timvieira.github.io/blog/post/2016/11/21/heaps-for-i...
| MatthiasPortzel wrote:
| There's a good series of posts called "Rope Science" by the
| author of the now-unmaintained Xi editor. These posts were cited
| as an inspiration by the Lapce author.
|
| => https://xi-editor.io/docs/rope_science_00.html
| ayewo wrote:
| I first learned of the rope data structure and piece table [1]
| from the Tree Sitter docs: https://tree-sitter.github.io/tree-
| sitter/using-parsers#prov...
|
| 1: https://en.wikipedia.org/wiki/Piece_table
| BaculumMeumEst wrote:
| I've tried Zed and it's really nice but there is a very high bar
| for switching to a new editor - it has to be sufficiently better
| than alternatives and also have widespread adoption. There are so
| many man hours that went into things like undo-tree and magit.
| And thanks to the sheer popularity of mainstream editors, I can
| get language support for languages that aren't even publicly
| available like Jai.
|
| I want to use it, but I don't think I am going to be able to
| justify investing in new editors at this point unless there is
| something extremely compelling about it.
| cedws wrote:
| I tried Zed a few weeks ago and was extremely impressed. I was
| expecting it to be another half baked 'Rewrite it in Rust'
| hobby project, but it's actually very much usable for the kind
| of development I'm doing. Since then I've completely swapped it
| out with VSCode and loving it so far. There's only one or two
| small issues which I'm hoping will be ironed out soon. Zed is
| also really really fast. I can feel the input latency is
| significantly lower than VSCode and faster feedback helps me
| write code more fluently.
| bioneuralnet wrote:
| Ditto! The only thing that occasionally bumps me is a few
| missing features in vim-mode (marks & recording). But it
| feels so much faster than VS Code, I'm willing to put up with
| it so far.
| jgalt212 wrote:
| pylint yells at me when my modules are more than 1500 lines.
| maybe it assumes the module is stored as string and not a rope.
| josephg wrote:
| I've seen the SumTree pattern show up a bunch recently. As I
| understand it, VSCode has something similar for syntax
| highlighting. And I've both seen and written several
| implementations of this pattern in the various text CRDTs I've
| interacted with. - though in each case the structure is given a
| different name. (Eg RangeTree, ContentTree, G-Tree, etc).
|
| It's nice to see the idea is catching on. It's a good one.
___________________________________________________________________
(page generated 2024-04-28 23:00 UTC)