[HN Gopher] Ropey - A UTF8 text rope for manipulating and editin...
       ___________________________________________________________________
        
       Ropey - A UTF8 text rope for manipulating and editing large text
        
       Author : keepamovin
       Score  : 163 points
       Date   : 2025-01-15 15:27 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | javier2 wrote:
       | Any editors using this?
        
         | recov wrote:
         | Not that library in particular, but https://zed.dev/blog/zed-
         | decoded-rope-sumtree
        
         | cschmidt wrote:
         | It seems like Helix is using it https://github.com/helix-
         | editor/helix/blob/master/docs/archi...
        
           | abound wrote:
           | FWIW, I use Helix as my main editor and every time it has
           | crashed (probably a few dozen times over a year or two, I've
           | filed issues), it's related to bad text position stuff, where
           | it effectively goes "out of bounds" on the text data
           | structure.
           | 
           | I think its mostly due to multiple buffers showing the same
           | content, as opposed to this Ropey library directly.
        
       | rdimartino wrote:
       | I hadn't heard of rope data structures until I read about the xi
       | editor (also written in Rust) a few years ago, but it looks like
       | that's been discontinued.
       | 
       | https://github.com/xi-editor/xi-editor
        
         | kibwen wrote:
         | The authors of Xi are currently working on Xilem, an
         | experimental reactive UI framework for Rust:
         | https://github.com/linebender/xilem
         | 
         | In the announcement post, they mention that work on Xi is
         | considered "on hold" rather than strictly discontinued:
         | https://raphlinus.github.io/rust/gui/2022/05/07/ui-architect...
        
           | infogulch wrote:
           | Legendary-tier yak shaving.
           | 
           | "I want to build an editor, but first I must solve rendering
           | 2D graphics purely on the GPU, invent a parallelizable path
           | solver, and code a human perception-based color value
           | manipulation library."
        
             | PoignardAzur wrote:
             | You have no idea.
             | 
             | I think we're at five or six levels of yaks by now.
             | 
             | (xi -> xilem -> masonry -> vello -> peniko -> color)
        
             | bbkane wrote:
             | It's a lot of fun to follow, especially as its so different
             | than my developmental expertise.
             | 
             | You can see the current projects (13 active) on
             | https://linebender.org , and several members post
             | interesting checkins in https://xi.zulipchat.com/
        
             | kstrauser wrote:
             | This is the path to Enlightenment (17).
        
             | mananaysiempre wrote:
             | > first I must solve rendering 2D graphics purely on the
             | GPU
             | 
             | To be fair, the original author of Xi ('raphlinus) has been
             | working on GPU-side 2D rendering much longer than on Xi.
        
           | amanda99 wrote:
           | Repo says "discontinued".
        
             | daveguy wrote:
             | Yes, the xi repo is discontinued. They recommend the lapce
             | editor as the spiritual successor:
             | 
             | https://github.com/lapce/lapce
        
               | satvikpendem wrote:
               | I'd also recommend Helix [0] (which also uses the rope
               | data structure [1]), that's a more widely used editor
               | also written in Rust.
               | 
               | [0] https://github.com/helix-editor/helix
               | 
               | [1] https://github.com/helix-
               | editor/helix/blob/master/docs/archi...
        
               | VeejayRampay wrote:
               | helix is a really really good text editor / terminal IDE
               | 
               | I'm seriously impressed by the level of quality out of
               | the box
        
               | pryelluw wrote:
               | Thanks for posting. I discovered floem
               | https://github.com/lapce/floem I've been looking for
               | something like it
        
         | itishappy wrote:
         | Zed uses something similar to ropes as well:
         | 
         | https://zed.dev/blog/zed-decoded-rope-sumtree
        
           | PittleyDunkin wrote:
           | Zed seems to be a gui-oriented editor here: https://zed.dev/
        
             | supriyo-biswas wrote:
             | You still need a backing data structure that holds the
             | contents of your editor, and that's where you'd use a rope.
        
           | infogulch wrote:
           | Zed's Sum Tree is my favorite datastructure ever and is the
           | future of database indexes.
        
             | senderista wrote:
             | I think this is what Guy Steele called a "monoid-cached
             | tree":
             | 
             | https://www.youtube.com/watch?v=ftcIcn8AmSY
        
         | secondcoming wrote:
         | Is it even possible to write any text editor without some form
         | of rope data structure?
        
           | caconym_ wrote:
           | Gap buffers are the other classic option, and there are
           | others too, e.g. piece tables.
        
           | cschmidt wrote:
           | Here's a paper reviewing the various choices, that is often
           | mentioned in discussions around data structures for text
           | editors:
           | 
           | https://www.cs.unm.edu/~crowley/papers/sds.pdf
        
           | marssaxman wrote:
           | Most certainly: gap buffers, piece tables, and line arrays
           | are also popular choices.
        
       | neilv wrote:
       | How would you associate non-character data with ranges of
       | characters, such as for syntax coloring, semantic links, and
       | references to points in the text?
       | 
       | (I couldn't find a mention of this in the README, design.md, or
       | examples.)
       | 
       | In Emacs buffers, the concepts include _text properties_ ,
       | _overlays_ , and _markers_.
        
         | filcuk wrote:
         | That would depend on your editor's implementation.
        
           | neilv wrote:
           | But, within this API, is there any support for the
           | associations with non-character data?
           | 
           | For example, if you delete some text these Ropey data
           | structure, does Ropey have facilities to update the
           | associated non-character data (such as deleting all or part
           | of one or more chunks of the non-character data, and/or
           | updating positional information)? Or do you have to do that
           | separately outside of Ropey?
        
             | zaphar wrote:
             | A rope is only concerned with manipulating a string with
             | very low cpu overhead while maintaining the illusion of a
             | sequence of characters or bytes. It doesn't really care or
             | maintain any other text decoration you might be
             | maintaining. That is the concern of the consumer of the
             | rope and I'm not sure there is a good common interface for
             | that.
        
               | neilv wrote:
               | Thanks.
               | 
               | I was a little confused, because the lede sentence was
               | "Ropey is a utf8 text rope for Rust, designed to be the
               | backing text-buffer for applications such as text
               | editors."
               | 
               | Pretty much all text editors are expected to implement
               | decorations and references, _somehow_ , and some popular
               | text buffer APIs support those.
        
               | favorited wrote:
               | If you'd like an example of how this can be done, Swift's
               | AttributedString type is exactly that. It manages the
               | association of runs of attributes (font, kerning, color,
               | etc.) with Unicode text, and its backing storage uses a
               | rope type provided by the swift-collections package. (The
               | rope module itself isn't stabilized yet, so it's really
               | only used for AttributedString at this point, AFAIK.)
               | 
               | https://github.com/swiftlang/swift-
               | foundation/tree/main/Sour...
               | 
               | https://github.com/apple/swift-
               | collections/tree/main/Sources...
        
               | adastra22 wrote:
               | I thought the defining feature of a text editor (as
               | opposed to a word processor) is that it didn't have rich
               | text decorations. Are we talking about the same thing?
        
               | nicoburns wrote:
               | Most text editors will support things like syntax
               | highlighting, which are text-decorations even if they're
               | nor user-managed.
        
               | neilv wrote:
               | Not rich text, but decorations like decorations on a data
               | structure. (I was trying to match the terminology that I
               | thought a previous commenter was using.)
        
         | iLemming wrote:
         | _I 'm sorry, it's only vaguely related, but maybe someone can
         | share some ideas._
         | 
         | What would be some good use-cases for using Ropey with Emacs?
         | Maybe re-formatting/beautifying huge json files or something
         | like that?
         | 
         |  _I didn 't have time yet to explore the project more closely,
         | but it looks very interesting._
        
       | ComputerGuru wrote:
       | Rust is missing an abstraction over non-contiguous chunks of
       | contiguous allocations of data that would make handling ropes
       | seamless and more natural even for smaller sizes.
       | 
       | C# has the concept of "Sequences" which is basically a
       | generalization of a deque with associated classes and apis such
       | as ReadOnlySequence and SequenceReader to encourage reduced
       | allocations, reuse of existing buffers/slices even for
       | composition, etc
       | 
       | Knowing the rust community, I wouldn't be surprised if there's
       | already an RFC for something like this.
        
         | jzelinskie wrote:
         | Is buf-list[0] what you're describing?
         | 
         | [0]: https://crates.io/crates/buf-list
        
         | gpm wrote:
         | I think you might be looking for the bytes crate, which is
         | pretty widely used in networking code:
         | https://docs.rs/bytes/latest/bytes/index.html
         | 
         | In general this sort of structure is the sort of thing I'd
         | expect to see in an external crate in rust, not the standard
         | library. So it's unlikely there's any RFCs, and more likely
         | there's a few competing implementations lying around.
        
           | zamalek wrote:
           | Bytes is essentially multiple slices over a optimistically
           | single contiguous arc buffer. It's basically the inverse of
           | what the root comment is after (an array of buffers). It's a
           | rather strange crate because network IO doesn't actually need
           | contiguous memory.
           | 
           | std does actually have a vague version of what the root
           | comment wants: https://doc.rust-
           | lang.org/std/io/struct.IoSlice.html and its sibling
           | IoSliceMut (slicing, appending, inserting, etc. is out of
           | scope for both - so not usable for rope stuff)
        
             | cmrdporcupine wrote:
             | Yah I'd Bytes' chief use is avoiding copies when dealing
             | with distinct portions of (contiguous) buffers.
             | 
             | It is not a tool for composing disparate pieces into one
             | (while avoiding copies)
        
             | derefr wrote:
             | > It's a rather strange crate because network IO doesn't
             | actually need contiguous memory.
             | 
             | Network IO doesn't _need_ contiguous memory, no, but each
             | side of the duplex kind of benefits from it in its own way:
             | 
             | 1. on receive, you can treat a _contiguous_ received
             | network datagram as its own little memory arena -- write
             | code that sends sliced references to the contents of the
             | datagram to other threads to work with, where those
             | references keep the datagram arena itself alive for as long
             | as it 's being worked with; and then drop the whole thing
             | when the handling of the datagram is complete.
             | 
             | (This is somewhat akin to the Erlang approach -- where the
             | received message is a globally-shared binary; it gets
             | passed by refcount into an actor started just for handling
             | that request; that actor is spawned with its own
             | preallocated memory arena; into that arena, the actor spits
             | any temporaries related to copying/munging the slices of
             | the shared binary, without having to grow the arena; the
             | actor quickly finishes and dies; the arena is deallocated
             | without ever having had to GC, and the refcount of the
             | shared binary goes to zero -- unless non-copied slices of
             | it were async-forwarded to other actors for further
             | processing.)
             | 
             | Also note that the whole premise here is _zero-copy_
             | networking (as the bytes docs say:
             | https://docs.rs/bytes/1.9.0/bytes/#bytes). The "message"
             | being received here isn't a copy of the one from the
             | network card, but literally the same physical wired memory
             | the PHY sees as being part of its IO ring-buffer -- just
             | also mapped into your process's memory on (zero-copy)
             | receive. If this data came chunked, you'd _need_ to copy
             | some of it to assemble those chunks into a contiguous
             | string or data structure. But since it arrives
             | contiguously, you can just slice it, and cast the resulting
             | slice into whatever type you like.
             | 
             | 2. on send -- presuming you're doing non-blocking IO --
             | it's nice to once again have a _preallocated arena_ into
             | which you can write out byte-sequences before flinging them
             | at the kernel as [vectors of] large, contiguous DMA
             | requests, without having to stop to allocate. (This removes
             | the CPU as a bottleneck from IO performance -- think
             | writev(2).)
             | 
             | The ideal design here is that you allocate fixed-sized
             | refcounted buffers; fill them up until the next thing you
             | want to write doesn't fit+; and then intentionally drop the
             | current buffer, switching your write_arena reference to
             | point to a freshly-allocated buffer; and repeating. Each
             | buffer then lives until all its slice-references get
             | consumed. This forms kind of a "memory-lifetime-managed
             | buffer-persisted message queue" -- with the backing buffers
             | of your messages living until all the messages held in them
             | get "ACKed" [i.e. dropped by the receiving threads.]
             | 
             | Also, rather than having the buffers deallocate when you
             | "use them up" -- requiring you to allocate the next time
             | you need a buffer -- you can instead have the buffer's
             | destructor release the memory it's holding into a buffer
             | pool; and then have your next-buffer-please logic pull from
             | that pool in preference to allocating. But then you'll want
             | a higher-level "writable stream that is actually a mempool
             | + current write_arena reference" type. (Hey, that's
             | BufMut!)
             | 
             | + And at that point, when the next message doesn't fit, you
             | _do not split the message_. That violates the whole premise
             | of vectorizing the writes. Instead, you leave some of the
             | buffer unused, and push the large message into a fresh
             | buffer, so that the message will still correspond to a
             | single vectorized-write element  / io_uring call / DMA
             | request / etc. If the message is so large it won't fit in
             | your default buffer size, you allocate a buffer just for
             | that one message, or better yet, you utilize a special
             | second pool of _larger_ fixed-size buffers.  "Jumbo"
             | buffers, per se.
             | 
             | (Get it yet? Networking _hardware_ is also doing exactly
             | what I 'm describing here to pack and unpack your packets
             | into frames. For a NIC or switch, the buffers _are_ the
             | [bodies of the] frames; a jumbo buffer is an Ethernet jumbo
             | frame; and so on.)
        
               | zamalek wrote:
               | > Get it yet
               | 
               | I'm not sure if your comment was meant to be
               | condescending, but it really does come across at that.
               | I'm very well versed in this domain.
               | 
               | Having a per-request/connection arena isn't the only
               | option. What I have seen/use, which is still zero copy
               | (as far as IO zero copy can be in Rust without resorting
               | to bytemuck/blittable types), is to have a pool of
               | buffers of a specific length - typically page-sized by
               | default and definitely page-aligned. These buffers can
               | come from a single large contiguous allocation. If you
               | run out of space in a buffer you grab a new/reused one
               | from the pool, add it to your vec of buffers, and carry
               | on. At the end of the story you would use vectored IO to
               | submit all of them at once - all the way down to the NIC
               | and everything.
               | 
               | This approach is more widespread mainly due to historical
               | reasons: it's really easy to fragment 32bit address
               | space, so allocating jumbo buffers simply wasn't an
               | option if you didn't want your server OOMing with 1GB of
               | available (but non-contiguous) memory.
               | 
               | https://man7.org/linux/man-pages/man3/iovec.3type.html
               | 
               | https://learn.microsoft.com/en-
               | us/windows/win32/api/ws2def/n...
        
         | deathanatos wrote:
         | Hmm. It's similar to, but not fully, a `BufRead`? Maybe a
         | `BufRead + Seek`. The slicing ability isn't really covered by
         | those traits, though, but I think you could wrap a BufRead+Seek
         | in something that effectively slices it.
         | 
         | A `BufRead + Seek` need not be backed by memory, though, except
         | in the midst of being read. (A buffered normal file implements
         | `BufRead + Seek`, for example.)
         | 
         | I feel like either Iterator or in some rare case of requiring
         | generic indexing, Index, are more important than "it is
         | composed of some number of linked memory allocations"?
         | 
         | A ReadOnlySequence seems to imply a linked-list of memory
         | sections though; I'm not sure a good rope is going to be able
         | to non-trivially interface with that, since the rope is a tree;
         | walking the nodes in sequence is possible, but it's a tree
         | walk, and something like ReadOnlySequenceSegment::Next() is
         | then a bit _tricky_. (You could gather the set of nodes into an
         | array ahead of time, but now merely turning it into that is
         | O(nodes) which is sad.)
         | 
         | (And while it might be tempting to say "have the leaf nodes be
         | a LL", I don't think you _want_ to, as it means that inserts
         | need to adjust those links, and I think you would rather have
         | mutations produce a cheaply made but entirely new tree, which I
         | don 't think permits a LL of the leafs. You want this to make
         | undo/redo cheap: it's just "go back to the last rope", and then
         | all the ropes share the underlying character data that's not
         | changing rope to rope. The rope in the OP seems to support
         | this: "Cloning ropes is extremely cheap. Rope clones share
         | data,")
        
         | caconym_ wrote:
         | I wrote a utf-8 capable (but also fully generic over element
         | type) rope implementation in Rust last fall (edit: 2023) and
         | the main issue I ran into was the lack of a suitable regex
         | library capable of working across slice boundaries. With some
         | finagling I did manage to get it to work with most/all of the
         | other relevant iterator/reader traits IIRC, and it benchmarked
         | fairly well from a practical perspective, though it's not as
         | fast as some of the other explicitly performance-focused
         | implementations out there.
         | 
         | I'm afraid I might not have that much free time again for a
         | long time, but maybe when I do, somebody will have solved the
         | regex issue for me...
        
         | duped wrote:
         | Vec<Vec<T>>?
        
         | theolivenbaum wrote:
         | There's also a really nice implementation of Rope for C# here:
         | https://github.com/FlatlinerDOA/Rope
        
       | pixelpoet wrote:
       | Author is perhaps better known for his really great path tracer
       | (and attendant blog), Psychopath:
       | https://github.com/cessen/psychopath
       | 
       | Also, I have to wonder when this fad of loudly announcing when
       | something is written in Rust will finally come to pass. Maybe
       | software written in other languages should loudly announce it in
       | every second sentence? To me at least it's become as self-
       | aggrandizingly cringe as "Sent from my iPhone" at the end of
       | every email...
        
         | mattdw wrote:
         | When it's a library of code, the language it is written in is
         | pretty pertinent information as that's the language it has to
         | be consumed from...
        
           | pixelpoet wrote:
           | I get that of course, but on the other hand I'm sure you know
           | what I'm getting at too: users of certain languages,
           | platforms etc feel the need to announce it as a point of
           | pride, or feature in itself, and frequently. Every language
           | will have this to some degree (with the possible exception of
           | COBOL, lol), but there are definite outliers.
        
             | itishappy wrote:
             | I believe HN actually had an article for a Minecraft server
             | written in COBOL about a month ago, lol.
             | 
             | https://news.ycombinator.com/item?id=42513022
        
         | uecker wrote:
         | Maybe it is already considered an achievement if someone
         | manages to write a program in Rust.
        
         | ricciardo wrote:
         | Think about the implications of the language though. When
         | something is written in rust, management of memory is safer,
         | multi threaded applications are safer, etc... (due to the
         | nature of the language). If something is written in C++, the
         | developer might be more inclined to review the code and tests
         | to ensure proper handling of memory as well as determining if
         | (when there is) non-deterministic behavior is safe. Hence, when
         | highlighting something is written in Rust, it might not be just
         | for the buzzword but also for something like developer
         | confidence.
        
         | xboxnolifes wrote:
         | I think you've just become primed to reflex on seeing the word
         | rust. The Readme has only 2 uses of the word rust: the header,
         | and in the features. One tells you the language the library is
         | for, the other is used as context for a type.
        
         | J_Shelby_J wrote:
         | When it stops baiting engagement with these sort of comments,
         | probably.
        
         | keybored wrote:
         | I for one am fed up with people advertising that they are using
         | rope for their string implementation in an editor or editor-
         | adjacent library. Ugh. We get it. Niche datastructure, woah so
         | cool. You really went above and beyond and read that appendix
         | to your data structures and algorithms textbook...!
        
       | qup wrote:
       | What a perfect readme.
       | 
       | Kudos to the author.
        
         | mathfailure wrote:
         | No need for sarcasm, maybe target auditory already knows what
         | 'rope' is.
        
           | doormatt wrote:
           | Why do you assume sarcasm? I thought the readme was
           | unsarcastically excellent as well.
        
           | qup wrote:
           | No sarcasm, it was good enough to compliment
        
       | rileytg wrote:
       | i hope someone can use this to create an editor similar to
       | notepad++ that is cross platform. I have not found an editor that
       | can handle large files as well as notepad++ on non windows
       | systems. Last I looked into this, the issue was lack of low level
       | libraries to handle large files.
        
         | ori_b wrote:
         | You're someone. Start typing.
        
         | nicoburns wrote:
         | It's not open source, but sublime text does well with large
         | files (depending on your definition of large, but several GB is
         | fine)
        
       | RockRobotRock wrote:
       | built in rust btw
        
       | ggregoire wrote:
       | Didn't know about this data structure. What are some use cases
       | other than text editors? The article on wikipedia [1] doesn't
       | expand much on this.
       | 
       | [1] https://en.wikipedia.org/wiki/Rope_(data_structure)
        
         | josephg wrote:
         | We use them pretty heavily in realtime collaborative editing
         | libraries for text. Ie, text CRDTs. In diamond types, merging a
         | branch into another branch requires (essentially) replaying a
         | whole lot of edit operations. Using a rope makes that fast.
        
       | devit wrote:
       | "Handling texts that are larger than available memory. Ropey is
       | an in-memory data structure."
       | 
       | That seems to make it of dubious use, not really suitable for a
       | well-engineered text editor.
       | 
       | The fact that it's UTF-8 only is also a serious problem since
       | files can contain arbitrary byte sequences.
        
         | TheRealPomax wrote:
         | What text editor do you use where loading a large text file
         | _doesn 't_ put that entire text file in memory? In a world
         | where even sub-$100 devices have gigabytes of memory, this
         | doesn't seem even remotely problematic.
        
         | abtinf wrote:
         | What kind of wild use case involves _editing_ a _text file_
         | larger than modern ram?
        
       ___________________________________________________________________
       (page generated 2025-01-15 23:00 UTC)