[HN Gopher] Ropey - A UTF8 text rope for manipulating and editin...
___________________________________________________________________
Ropey - A UTF8 text rope for manipulating and editing large text
Author : keepamovin
Score : 163 points
Date : 2025-01-15 15:27 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| javier2 wrote:
| Any editors using this?
| recov wrote:
| Not that library in particular, but https://zed.dev/blog/zed-
| decoded-rope-sumtree
| cschmidt wrote:
| It seems like Helix is using it https://github.com/helix-
| editor/helix/blob/master/docs/archi...
| abound wrote:
| FWIW, I use Helix as my main editor and every time it has
| crashed (probably a few dozen times over a year or two, I've
| filed issues), it's related to bad text position stuff, where
| it effectively goes "out of bounds" on the text data
| structure.
|
| I think its mostly due to multiple buffers showing the same
| content, as opposed to this Ropey library directly.
| rdimartino wrote:
| I hadn't heard of rope data structures until I read about the xi
| editor (also written in Rust) a few years ago, but it looks like
| that's been discontinued.
|
| https://github.com/xi-editor/xi-editor
| kibwen wrote:
| The authors of Xi are currently working on Xilem, an
| experimental reactive UI framework for Rust:
| https://github.com/linebender/xilem
|
| In the announcement post, they mention that work on Xi is
| considered "on hold" rather than strictly discontinued:
| https://raphlinus.github.io/rust/gui/2022/05/07/ui-architect...
| infogulch wrote:
| Legendary-tier yak shaving.
|
| "I want to build an editor, but first I must solve rendering
| 2D graphics purely on the GPU, invent a parallelizable path
| solver, and code a human perception-based color value
| manipulation library."
| PoignardAzur wrote:
| You have no idea.
|
| I think we're at five or six levels of yaks by now.
|
| (xi -> xilem -> masonry -> vello -> peniko -> color)
| bbkane wrote:
| It's a lot of fun to follow, especially as its so different
| than my developmental expertise.
|
| You can see the current projects (13 active) on
| https://linebender.org , and several members post
| interesting checkins in https://xi.zulipchat.com/
| kstrauser wrote:
| This is the path to Enlightenment (17).
| mananaysiempre wrote:
| > first I must solve rendering 2D graphics purely on the
| GPU
|
| To be fair, the original author of Xi ('raphlinus) has been
| working on GPU-side 2D rendering much longer than on Xi.
| amanda99 wrote:
| Repo says "discontinued".
| daveguy wrote:
| Yes, the xi repo is discontinued. They recommend the lapce
| editor as the spiritual successor:
|
| https://github.com/lapce/lapce
| satvikpendem wrote:
| I'd also recommend Helix [0] (which also uses the rope
| data structure [1]), that's a more widely used editor
| also written in Rust.
|
| [0] https://github.com/helix-editor/helix
|
| [1] https://github.com/helix-
| editor/helix/blob/master/docs/archi...
| VeejayRampay wrote:
| helix is a really really good text editor / terminal IDE
|
| I'm seriously impressed by the level of quality out of
| the box
| pryelluw wrote:
| Thanks for posting. I discovered floem
| https://github.com/lapce/floem I've been looking for
| something like it
| itishappy wrote:
| Zed uses something similar to ropes as well:
|
| https://zed.dev/blog/zed-decoded-rope-sumtree
| PittleyDunkin wrote:
| Zed seems to be a gui-oriented editor here: https://zed.dev/
| supriyo-biswas wrote:
| You still need a backing data structure that holds the
| contents of your editor, and that's where you'd use a rope.
| infogulch wrote:
| Zed's Sum Tree is my favorite datastructure ever and is the
| future of database indexes.
| senderista wrote:
| I think this is what Guy Steele called a "monoid-cached
| tree":
|
| https://www.youtube.com/watch?v=ftcIcn8AmSY
| secondcoming wrote:
| Is it even possible to write any text editor without some form
| of rope data structure?
| caconym_ wrote:
| Gap buffers are the other classic option, and there are
| others too, e.g. piece tables.
| cschmidt wrote:
| Here's a paper reviewing the various choices, that is often
| mentioned in discussions around data structures for text
| editors:
|
| https://www.cs.unm.edu/~crowley/papers/sds.pdf
| marssaxman wrote:
| Most certainly: gap buffers, piece tables, and line arrays
| are also popular choices.
| neilv wrote:
| How would you associate non-character data with ranges of
| characters, such as for syntax coloring, semantic links, and
| references to points in the text?
|
| (I couldn't find a mention of this in the README, design.md, or
| examples.)
|
| In Emacs buffers, the concepts include _text properties_ ,
| _overlays_ , and _markers_.
| filcuk wrote:
| That would depend on your editor's implementation.
| neilv wrote:
| But, within this API, is there any support for the
| associations with non-character data?
|
| For example, if you delete some text these Ropey data
| structure, does Ropey have facilities to update the
| associated non-character data (such as deleting all or part
| of one or more chunks of the non-character data, and/or
| updating positional information)? Or do you have to do that
| separately outside of Ropey?
| zaphar wrote:
| A rope is only concerned with manipulating a string with
| very low cpu overhead while maintaining the illusion of a
| sequence of characters or bytes. It doesn't really care or
| maintain any other text decoration you might be
| maintaining. That is the concern of the consumer of the
| rope and I'm not sure there is a good common interface for
| that.
| neilv wrote:
| Thanks.
|
| I was a little confused, because the lede sentence was
| "Ropey is a utf8 text rope for Rust, designed to be the
| backing text-buffer for applications such as text
| editors."
|
| Pretty much all text editors are expected to implement
| decorations and references, _somehow_ , and some popular
| text buffer APIs support those.
| favorited wrote:
| If you'd like an example of how this can be done, Swift's
| AttributedString type is exactly that. It manages the
| association of runs of attributes (font, kerning, color,
| etc.) with Unicode text, and its backing storage uses a
| rope type provided by the swift-collections package. (The
| rope module itself isn't stabilized yet, so it's really
| only used for AttributedString at this point, AFAIK.)
|
| https://github.com/swiftlang/swift-
| foundation/tree/main/Sour...
|
| https://github.com/apple/swift-
| collections/tree/main/Sources...
| adastra22 wrote:
| I thought the defining feature of a text editor (as
| opposed to a word processor) is that it didn't have rich
| text decorations. Are we talking about the same thing?
| nicoburns wrote:
| Most text editors will support things like syntax
| highlighting, which are text-decorations even if they're
| nor user-managed.
| neilv wrote:
| Not rich text, but decorations like decorations on a data
| structure. (I was trying to match the terminology that I
| thought a previous commenter was using.)
| iLemming wrote:
| _I 'm sorry, it's only vaguely related, but maybe someone can
| share some ideas._
|
| What would be some good use-cases for using Ropey with Emacs?
| Maybe re-formatting/beautifying huge json files or something
| like that?
|
| _I didn 't have time yet to explore the project more closely,
| but it looks very interesting._
| ComputerGuru wrote:
| Rust is missing an abstraction over non-contiguous chunks of
| contiguous allocations of data that would make handling ropes
| seamless and more natural even for smaller sizes.
|
| C# has the concept of "Sequences" which is basically a
| generalization of a deque with associated classes and apis such
| as ReadOnlySequence and SequenceReader to encourage reduced
| allocations, reuse of existing buffers/slices even for
| composition, etc
|
| Knowing the rust community, I wouldn't be surprised if there's
| already an RFC for something like this.
| jzelinskie wrote:
| Is buf-list[0] what you're describing?
|
| [0]: https://crates.io/crates/buf-list
| gpm wrote:
| I think you might be looking for the bytes crate, which is
| pretty widely used in networking code:
| https://docs.rs/bytes/latest/bytes/index.html
|
| In general this sort of structure is the sort of thing I'd
| expect to see in an external crate in rust, not the standard
| library. So it's unlikely there's any RFCs, and more likely
| there's a few competing implementations lying around.
| zamalek wrote:
| Bytes is essentially multiple slices over a optimistically
| single contiguous arc buffer. It's basically the inverse of
| what the root comment is after (an array of buffers). It's a
| rather strange crate because network IO doesn't actually need
| contiguous memory.
|
| std does actually have a vague version of what the root
| comment wants: https://doc.rust-
| lang.org/std/io/struct.IoSlice.html and its sibling
| IoSliceMut (slicing, appending, inserting, etc. is out of
| scope for both - so not usable for rope stuff)
| cmrdporcupine wrote:
| Yah I'd Bytes' chief use is avoiding copies when dealing
| with distinct portions of (contiguous) buffers.
|
| It is not a tool for composing disparate pieces into one
| (while avoiding copies)
| derefr wrote:
| > It's a rather strange crate because network IO doesn't
| actually need contiguous memory.
|
| Network IO doesn't _need_ contiguous memory, no, but each
| side of the duplex kind of benefits from it in its own way:
|
| 1. on receive, you can treat a _contiguous_ received
| network datagram as its own little memory arena -- write
| code that sends sliced references to the contents of the
| datagram to other threads to work with, where those
| references keep the datagram arena itself alive for as long
| as it 's being worked with; and then drop the whole thing
| when the handling of the datagram is complete.
|
| (This is somewhat akin to the Erlang approach -- where the
| received message is a globally-shared binary; it gets
| passed by refcount into an actor started just for handling
| that request; that actor is spawned with its own
| preallocated memory arena; into that arena, the actor spits
| any temporaries related to copying/munging the slices of
| the shared binary, without having to grow the arena; the
| actor quickly finishes and dies; the arena is deallocated
| without ever having had to GC, and the refcount of the
| shared binary goes to zero -- unless non-copied slices of
| it were async-forwarded to other actors for further
| processing.)
|
| Also note that the whole premise here is _zero-copy_
| networking (as the bytes docs say:
| https://docs.rs/bytes/1.9.0/bytes/#bytes). The "message"
| being received here isn't a copy of the one from the
| network card, but literally the same physical wired memory
| the PHY sees as being part of its IO ring-buffer -- just
| also mapped into your process's memory on (zero-copy)
| receive. If this data came chunked, you'd _need_ to copy
| some of it to assemble those chunks into a contiguous
| string or data structure. But since it arrives
| contiguously, you can just slice it, and cast the resulting
| slice into whatever type you like.
|
| 2. on send -- presuming you're doing non-blocking IO --
| it's nice to once again have a _preallocated arena_ into
| which you can write out byte-sequences before flinging them
| at the kernel as [vectors of] large, contiguous DMA
| requests, without having to stop to allocate. (This removes
| the CPU as a bottleneck from IO performance -- think
| writev(2).)
|
| The ideal design here is that you allocate fixed-sized
| refcounted buffers; fill them up until the next thing you
| want to write doesn't fit+; and then intentionally drop the
| current buffer, switching your write_arena reference to
| point to a freshly-allocated buffer; and repeating. Each
| buffer then lives until all its slice-references get
| consumed. This forms kind of a "memory-lifetime-managed
| buffer-persisted message queue" -- with the backing buffers
| of your messages living until all the messages held in them
| get "ACKed" [i.e. dropped by the receiving threads.]
|
| Also, rather than having the buffers deallocate when you
| "use them up" -- requiring you to allocate the next time
| you need a buffer -- you can instead have the buffer's
| destructor release the memory it's holding into a buffer
| pool; and then have your next-buffer-please logic pull from
| that pool in preference to allocating. But then you'll want
| a higher-level "writable stream that is actually a mempool
| + current write_arena reference" type. (Hey, that's
| BufMut!)
|
| + And at that point, when the next message doesn't fit, you
| _do not split the message_. That violates the whole premise
| of vectorizing the writes. Instead, you leave some of the
| buffer unused, and push the large message into a fresh
| buffer, so that the message will still correspond to a
| single vectorized-write element / io_uring call / DMA
| request / etc. If the message is so large it won't fit in
| your default buffer size, you allocate a buffer just for
| that one message, or better yet, you utilize a special
| second pool of _larger_ fixed-size buffers. "Jumbo"
| buffers, per se.
|
| (Get it yet? Networking _hardware_ is also doing exactly
| what I 'm describing here to pack and unpack your packets
| into frames. For a NIC or switch, the buffers _are_ the
| [bodies of the] frames; a jumbo buffer is an Ethernet jumbo
| frame; and so on.)
| zamalek wrote:
| > Get it yet
|
| I'm not sure if your comment was meant to be
| condescending, but it really does come across at that.
| I'm very well versed in this domain.
|
| Having a per-request/connection arena isn't the only
| option. What I have seen/use, which is still zero copy
| (as far as IO zero copy can be in Rust without resorting
| to bytemuck/blittable types), is to have a pool of
| buffers of a specific length - typically page-sized by
| default and definitely page-aligned. These buffers can
| come from a single large contiguous allocation. If you
| run out of space in a buffer you grab a new/reused one
| from the pool, add it to your vec of buffers, and carry
| on. At the end of the story you would use vectored IO to
| submit all of them at once - all the way down to the NIC
| and everything.
|
| This approach is more widespread mainly due to historical
| reasons: it's really easy to fragment 32bit address
| space, so allocating jumbo buffers simply wasn't an
| option if you didn't want your server OOMing with 1GB of
| available (but non-contiguous) memory.
|
| https://man7.org/linux/man-pages/man3/iovec.3type.html
|
| https://learn.microsoft.com/en-
| us/windows/win32/api/ws2def/n...
| deathanatos wrote:
| Hmm. It's similar to, but not fully, a `BufRead`? Maybe a
| `BufRead + Seek`. The slicing ability isn't really covered by
| those traits, though, but I think you could wrap a BufRead+Seek
| in something that effectively slices it.
|
| A `BufRead + Seek` need not be backed by memory, though, except
| in the midst of being read. (A buffered normal file implements
| `BufRead + Seek`, for example.)
|
| I feel like either Iterator or in some rare case of requiring
| generic indexing, Index, are more important than "it is
| composed of some number of linked memory allocations"?
|
| A ReadOnlySequence seems to imply a linked-list of memory
| sections though; I'm not sure a good rope is going to be able
| to non-trivially interface with that, since the rope is a tree;
| walking the nodes in sequence is possible, but it's a tree
| walk, and something like ReadOnlySequenceSegment::Next() is
| then a bit _tricky_. (You could gather the set of nodes into an
| array ahead of time, but now merely turning it into that is
| O(nodes) which is sad.)
|
| (And while it might be tempting to say "have the leaf nodes be
| a LL", I don't think you _want_ to, as it means that inserts
| need to adjust those links, and I think you would rather have
| mutations produce a cheaply made but entirely new tree, which I
| don 't think permits a LL of the leafs. You want this to make
| undo/redo cheap: it's just "go back to the last rope", and then
| all the ropes share the underlying character data that's not
| changing rope to rope. The rope in the OP seems to support
| this: "Cloning ropes is extremely cheap. Rope clones share
| data,")
| caconym_ wrote:
| I wrote a utf-8 capable (but also fully generic over element
| type) rope implementation in Rust last fall (edit: 2023) and
| the main issue I ran into was the lack of a suitable regex
| library capable of working across slice boundaries. With some
| finagling I did manage to get it to work with most/all of the
| other relevant iterator/reader traits IIRC, and it benchmarked
| fairly well from a practical perspective, though it's not as
| fast as some of the other explicitly performance-focused
| implementations out there.
|
| I'm afraid I might not have that much free time again for a
| long time, but maybe when I do, somebody will have solved the
| regex issue for me...
| duped wrote:
| Vec<Vec<T>>?
| theolivenbaum wrote:
| There's also a really nice implementation of Rope for C# here:
| https://github.com/FlatlinerDOA/Rope
| pixelpoet wrote:
| Author is perhaps better known for his really great path tracer
| (and attendant blog), Psychopath:
| https://github.com/cessen/psychopath
|
| Also, I have to wonder when this fad of loudly announcing when
| something is written in Rust will finally come to pass. Maybe
| software written in other languages should loudly announce it in
| every second sentence? To me at least it's become as self-
| aggrandizingly cringe as "Sent from my iPhone" at the end of
| every email...
| mattdw wrote:
| When it's a library of code, the language it is written in is
| pretty pertinent information as that's the language it has to
| be consumed from...
| pixelpoet wrote:
| I get that of course, but on the other hand I'm sure you know
| what I'm getting at too: users of certain languages,
| platforms etc feel the need to announce it as a point of
| pride, or feature in itself, and frequently. Every language
| will have this to some degree (with the possible exception of
| COBOL, lol), but there are definite outliers.
| itishappy wrote:
| I believe HN actually had an article for a Minecraft server
| written in COBOL about a month ago, lol.
|
| https://news.ycombinator.com/item?id=42513022
| uecker wrote:
| Maybe it is already considered an achievement if someone
| manages to write a program in Rust.
| ricciardo wrote:
| Think about the implications of the language though. When
| something is written in rust, management of memory is safer,
| multi threaded applications are safer, etc... (due to the
| nature of the language). If something is written in C++, the
| developer might be more inclined to review the code and tests
| to ensure proper handling of memory as well as determining if
| (when there is) non-deterministic behavior is safe. Hence, when
| highlighting something is written in Rust, it might not be just
| for the buzzword but also for something like developer
| confidence.
| xboxnolifes wrote:
| I think you've just become primed to reflex on seeing the word
| rust. The Readme has only 2 uses of the word rust: the header,
| and in the features. One tells you the language the library is
| for, the other is used as context for a type.
| J_Shelby_J wrote:
| When it stops baiting engagement with these sort of comments,
| probably.
| keybored wrote:
| I for one am fed up with people advertising that they are using
| rope for their string implementation in an editor or editor-
| adjacent library. Ugh. We get it. Niche datastructure, woah so
| cool. You really went above and beyond and read that appendix
| to your data structures and algorithms textbook...!
| qup wrote:
| What a perfect readme.
|
| Kudos to the author.
| mathfailure wrote:
| No need for sarcasm, maybe target auditory already knows what
| 'rope' is.
| doormatt wrote:
| Why do you assume sarcasm? I thought the readme was
| unsarcastically excellent as well.
| qup wrote:
| No sarcasm, it was good enough to compliment
| rileytg wrote:
| i hope someone can use this to create an editor similar to
| notepad++ that is cross platform. I have not found an editor that
| can handle large files as well as notepad++ on non windows
| systems. Last I looked into this, the issue was lack of low level
| libraries to handle large files.
| ori_b wrote:
| You're someone. Start typing.
| nicoburns wrote:
| It's not open source, but sublime text does well with large
| files (depending on your definition of large, but several GB is
| fine)
| RockRobotRock wrote:
| built in rust btw
| ggregoire wrote:
| Didn't know about this data structure. What are some use cases
| other than text editors? The article on wikipedia [1] doesn't
| expand much on this.
|
| [1] https://en.wikipedia.org/wiki/Rope_(data_structure)
| josephg wrote:
| We use them pretty heavily in realtime collaborative editing
| libraries for text. Ie, text CRDTs. In diamond types, merging a
| branch into another branch requires (essentially) replaying a
| whole lot of edit operations. Using a rope makes that fast.
| devit wrote:
| "Handling texts that are larger than available memory. Ropey is
| an in-memory data structure."
|
| That seems to make it of dubious use, not really suitable for a
| well-engineered text editor.
|
| The fact that it's UTF-8 only is also a serious problem since
| files can contain arbitrary byte sequences.
| TheRealPomax wrote:
| What text editor do you use where loading a large text file
| _doesn 't_ put that entire text file in memory? In a world
| where even sub-$100 devices have gigabytes of memory, this
| doesn't seem even remotely problematic.
| abtinf wrote:
| What kind of wild use case involves _editing_ a _text file_
| larger than modern ram?
___________________________________________________________________
(page generated 2025-01-15 23:00 UTC)