[HN Gopher] Text Editor: Data Structures
       ___________________________________________________________________
        
       Text Editor: Data Structures
        
       Author : ibobev
       Score  : 117 points
       Date   : 2023-12-26 15:48 UTC (7 hours ago)
        
 (HTM) web link (www.averylaird.com)
 (TXT) w3m dump (www.averylaird.com)
        
       | Hammershaft wrote:
       | Emacs is the fastest editor the author has worked with!!??
        
         | fleur-de-lotus wrote:
         | He never tried anything else. For speed, vi is number one,
         | followed by sublimetext.
        
           | TwentyPosts wrote:
           | Even the suggestion that vim is fast is funny to me.
           | 
           | Vim feels sluggish compared to NeoVim.
           | 
           | NeoVim feels sluggish compared to Helix.
        
             | behnamoh wrote:
             | > NeoVim feels sluggish compared to Helix.
             | 
             | Of course, if you offer only a portion of NeoVim's features
             | the speed improvement should not be surprising.
        
               | TwentyPosts wrote:
               | At the same time, NeoVim lacks many, many of Helix'
               | features unless you use plugins. That's fine, of course,
               | plugins and customization are part of what makes NeoVim
               | so great.
               | 
               | Anyway, can you name a few NeoVim features you miss in
               | Helix? I still use NeoVim now and then, so I'd be happy
               | to learn more, even if it's something small.
        
         | tom_ wrote:
         | There are other editors?!
        
         | TacticalCoder wrote:
         | To be fair on a modern setup it _is_ fast. Sure it 's not nano
         | or vi(m) but compared to Electron-based stuff, it's plenty
         | fast.
         | 
         | Five years ago:                   time emacs -Q -eval '(kill-
         | emacs)'
         | 
         | would be slow at 320 ms.
         | 
         | But on a "recent" AMD 7700X that'd be 80ms.
         | 
         | That's starting emacs, evaluating elisp code, and exiting
         | Emacs.
         | 
         | Surely if you can start Emacs, execute elisp, code and exit in
         | 80ms it shouldn't lag too much to insert a character.
         | 
         | Now, I know, some modes can be slow. But Emacs in recent
         | version as seen _lots_ of changes enhancing its performances.
         | And hardware have become really very fast.
         | 
         | YMMV but my Emacs is really incredibly responsive.
         | 
         | P.S: I was already using Emacs on a 486 back in the "Eight
         | Megabytes And Constantly Swapping" days. The joke was fun and
         | true. Nowadays Emacs is actually one of the lightest sofware I
         | use.
        
         | medo-bear wrote:
         | As an emacs user I find it offensive that he tried other
         | editors
        
       | theusus wrote:
       | Rope has been the hype since years and now author is claim that
       | it was piece table all along.
        
       | jph wrote:
       | Ropes have all kinds of bonus benefits. The Xi editor author has
       | some good notes about text editor rope coding:
       | https://abishov.com/xi-editor/docs/rope_science_01.html
        
         | chucksmash wrote:
         | Project site linked from the GitHub[0] is https://xi-editor.io.
         | Linked doc is a mirror of this[1], which was originally written
         | by Raph Linus.
         | 
         | [0]: https://github.com/xi-editor/xi-editor
         | 
         | [1]: https://xi-editor.io/docs/rope_science_01.html
        
       | Rusky wrote:
       | You can get the best of piece tables and ropes using a B-tree-
       | based rope- at small sizes, you get a single node that works like
       | a sorted piece table, but you can also scale up to larger
       | documents without the high constant factors of a binary tree.
        
         | josephg wrote:
         | You can also get the best of btrees and gap buffers by putting
         | a gap buffer in each leaf node in the tree. That allows the
         | leaves to be much bigger (400 bytes seems to work well in
         | practice) - and that helps keep the tree itself small and thus
         | in the cpu cache.
         | 
         | As the article says, it's quite a complex data structure
         | though.
        
       | guidoism wrote:
       | Honestly an array isn't really the worst... for smallish files on
       | modern processors. Sequential memcpy is pretty dang fast and
       | pointer indirection can be slow.
       | 
       | Just sayin'
        
         | naniwaduni wrote:
         | This sort of data structure evaluation mostly starts to make
         | sense when you approach interactive large-file editing as a
         | soft-realtime problem.
        
         | 082349872349872 wrote:
         | and if an array doesn't work, a gap buffer isn't much
         | additional complexity.
         | 
         | all of my C homebrew editors have been gap buffer based, and I
         | told myself I'd reimplement if I ever wanted to edit something
         | large enough for that to be a problem -- but it never was.
        
         | eludwig wrote:
         | Worked on a homegrown Mac wsywyg editor back in the 90s. Arrays
         | worked perfectly. If you are assuming that files fit in memory,
         | using BlockMove() was very, very fast indeed.
         | 
         | I can see if you need to edit multi-gig log files and things
         | will not fit in memory, but for small files, array is totally
         | fine.
         | 
         | There were other tricks that were done back then to keep the
         | number of single char inserts down to a minimum while typing.
         | Like reading chars into a small buffer during fast typing and
         | then inserting all the keystrokes at once as soon as you had
         | the time.
        
         | whazor wrote:
         | Yes. When typing normally there is no slowness because you're
         | swapping one string for another.
         | 
         | But when pressing enter, pasting a bigger snippet, or deleting
         | lines the slowdown is less noticeable. When you type a word,
         | the deltas are tiny and it is easy to process for your brain.
         | While the changes in multiple lines happen less often, so
         | waiting a couple of frames doesn't bother you much.
         | 
         | This is at millions of lines. With small files it is indeed too
         | fast.
        
         | billforsternz wrote:
         | I was amused that this approach was dismissed without analysis
         | as impractical. Sure it was impractical in living memory when
         | CPUs and memory were many many orders of magnitude slower.
        
       | Nilithus wrote:
       | VSCode have a blog article talking out their move to using Piece
       | Table as their main data structure.
       | https://code.visualstudio.com/blogs/2018/03/23/text-buffer-r...
       | 
       | Has some good visualizations as well.
        
       | afc wrote:
       | In my text editor (https://github.com/alefore/edge) I use a
       | balanced binary tree containing small chunks (std::vector) of
       | contiguous lines.
       | 
       | That works well enough for me: https://asciinema.org/a/314752
       | 
       | This requires loading the entire file into memory, but computers
       | are so fast today that optimizing that away is pointless IMO (as
       | the video shows, a 12MB file with 400k lines can be
       | loaded/edited/saved reasonably).
       | 
       | The tree structure is implemented as a generic container here:
       | https://github.com/alefore/edge/blob/master/src/language/con....
       | The tree is actually deeply immutable; each modification
       | operation returns a new tree (where most contents are shared with
       | the original tree). The leafs contain std::vectors that hold
       | between 128 and 256 lines (save for a few cases). The actual
       | specialization that holds a sequence of lines is
       | LineSequence::Lines here:
       | https://github.com/alefore/edge/blob/master/src/language/tex...
        
         | vidarh wrote:
         | I just use a (Ruby) array of strings for mine, on the basis
         | that I have never on purpose wanted to actually _edit_ a file
         | large enough that this is a problem. Computers keeps getting
         | faster, but files I want to edit really don 't.
         | 
         | If I open a huge one it's usually either an accident or because
         | I want to view it, not edit it. It's easy enough to support
         | abnormally large files, but it felt to me like a distraction
         | that adds complexity for the sake of a fringe case I was
         | perfectly happy to just define as out of scope.
         | 
         | Incidentally, the server process for my editor holds every
         | buffer that I've ever opened (over several years) and not
         | explicitly killed in memory, and it's consuming too little for
         | me to bother adding any garbage collection for buffers or
         | similar (it's synced to disk, so on reboot I have the full same
         | set of buffers available). We're talking many hundred buffers.
         | RAM is cheap and plentiful.
        
           | cdcarter wrote:
           | You've worked primarily in your own home built editor for
           | years? That's pretty cool.
        
             | afc wrote:
             | I'm not the person you directly replied to, but I've worked
             | on my text editor since 2014 and used it exclusively since
             | ~2015. It's been a lot of fun!
        
         | atombender wrote:
         | How does that perform when the entire 12MB file is a single
         | line?
         | 
         | Before you say it's unrealistic; sure, it's not a normal use
         | case for code, but I frequently find myself wanting to open a
         | huge JSON file, for example, just to perform a search/replace,
         | or a minor edit, or copy some fragment, or just look at its
         | contents. Pretty-printing first, or doing the change with a
         | specialized tool like jq, is also something I do, but being
         | able to just load the file into a text editor is often more
         | convenient.
         | 
         | In my experience, editors tend to break down on edge cases like
         | these. Editors (e.g. Jetbrains) often turn off syntax
         | highlighting and/or edit capabilities on large files, not to
         | mention put an upper limit on the number of cursors.
        
           | afc wrote:
           | Thank you for your response. You ask a good question.
           | 
           | I expect very long lines should work just fine, but I'll give
           | it a try when I have a chance.
           | 
           | I represent lines using a virtual class that has a few
           | implementations. IIRC, the line will start as a tree
           | containing 64kb chunks. As edits are applied, these chunks
           | will gradually get replaced with other implementations (of
           | the virtual class) that delegate to the original instances
           | (and the representation is occasionally optimized (which can
           | incur copying) to avoid too much nesting). So simplifying, I
           | think this should just work due to the optimized
           | representation that I use for lines, which doesn't require
           | the characters to be contiguous in memory.
           | 
           | I fully agree with your observation about how things often
           | break in the corner cases. I actually also put a configurable
           | cap on the number of cursors (e.g., a reg exp search stops
           | after 100 matches).
           | 
           | For syntax highlighting I don't currently have a cap, but
           | this is ~fine: it just ~wastes a background thread (all
           | syntax parsing is offloaded to a background thread, not
           | blocking the main thread). But yeah, I should probably make
           | this thread give up after a configurable time out.
        
         | layer8 wrote:
         | 12 MB isn't necessarily that much. Here is a 96 MB SVG file:
         | https://commons.m.wikimedia.org/wiki/File:Koppen-Geiger_Map_...
         | 
         | Here are genome sequence files that are several hundred MB
         | compressed (the decompressed format is text):
         | https://ftp.ncbi.nlm.nih.gov/genomes/refseq/
         | 
         | Finally here is an almost 2 TB XML file:
         | https://wiki.openstreetmap.org/wiki/Planet.osm
         | 
         | These aren't the most typical use cases, but it's nice if an
         | editor can at least handle > 32-bit files (larger than 2 or 4
         | GB).
        
       | ayhanfuat wrote:
       | Previous discussion from 2017:
       | https://news.ycombinator.com/item?id=15381886 (121 comments)
        
       | mcarmichael wrote:
       | A nice companion piece, focused on a related probem:
       | 
       | "An Efficient Data Structure For A Hex Editor"
       | https://www.chiark.greenend.org.uk/~sgtatham/tweak/btree.htm...
        
       | behnamoh wrote:
       | But what is a good structure for modern text editors in 2024?
       | Nowadays text editors need to keep track of git commits (by whom,
       | when, which line(s)), reference and footnotes (the author
       | mentions Markdown's manual [^...] notation but ideally you'd want
       | something automatic--even latex isn't automatic by default), etc.
        
       ___________________________________________________________________
       (page generated 2023-12-26 23:00 UTC)