[HN Gopher] Text Editor: Data Structures
___________________________________________________________________
Text Editor: Data Structures
Author : ibobev
Score : 117 points
Date : 2023-12-26 15:48 UTC (7 hours ago)
(HTM) web link (www.averylaird.com)
(TXT) w3m dump (www.averylaird.com)
| Hammershaft wrote:
| Emacs is the fastest editor the author has worked with!!??
| fleur-de-lotus wrote:
| He never tried anything else. For speed, vi is number one,
| followed by sublimetext.
| TwentyPosts wrote:
| Even the suggestion that vim is fast is funny to me.
|
| Vim feels sluggish compared to NeoVim.
|
| NeoVim feels sluggish compared to Helix.
| behnamoh wrote:
| > NeoVim feels sluggish compared to Helix.
|
| Of course, if you offer only a portion of NeoVim's features
| the speed improvement should not be surprising.
| TwentyPosts wrote:
| At the same time, NeoVim lacks many, many of Helix'
| features unless you use plugins. That's fine, of course,
| plugins and customization are part of what makes NeoVim
| so great.
|
| Anyway, can you name a few NeoVim features you miss in
| Helix? I still use NeoVim now and then, so I'd be happy
| to learn more, even if it's something small.
| tom_ wrote:
| There are other editors?!
| TacticalCoder wrote:
| To be fair on a modern setup it _is_ fast. Sure it 's not nano
| or vi(m) but compared to Electron-based stuff, it's plenty
| fast.
|
| Five years ago: time emacs -Q -eval '(kill-
| emacs)'
|
| would be slow at 320 ms.
|
| But on a "recent" AMD 7700X that'd be 80ms.
|
| That's starting emacs, evaluating elisp code, and exiting
| Emacs.
|
| Surely if you can start Emacs, execute elisp, code and exit in
| 80ms it shouldn't lag too much to insert a character.
|
| Now, I know, some modes can be slow. But Emacs in recent
| version as seen _lots_ of changes enhancing its performances.
| And hardware have become really very fast.
|
| YMMV but my Emacs is really incredibly responsive.
|
| P.S: I was already using Emacs on a 486 back in the "Eight
| Megabytes And Constantly Swapping" days. The joke was fun and
| true. Nowadays Emacs is actually one of the lightest sofware I
| use.
| medo-bear wrote:
| As an emacs user I find it offensive that he tried other
| editors
| theusus wrote:
| Rope has been the hype since years and now author is claim that
| it was piece table all along.
| jph wrote:
| Ropes have all kinds of bonus benefits. The Xi editor author has
| some good notes about text editor rope coding:
| https://abishov.com/xi-editor/docs/rope_science_01.html
| chucksmash wrote:
| Project site linked from the GitHub[0] is https://xi-editor.io.
| Linked doc is a mirror of this[1], which was originally written
| by Raph Linus.
|
| [0]: https://github.com/xi-editor/xi-editor
|
| [1]: https://xi-editor.io/docs/rope_science_01.html
| Rusky wrote:
| You can get the best of piece tables and ropes using a B-tree-
| based rope- at small sizes, you get a single node that works like
| a sorted piece table, but you can also scale up to larger
| documents without the high constant factors of a binary tree.
| josephg wrote:
| You can also get the best of btrees and gap buffers by putting
| a gap buffer in each leaf node in the tree. That allows the
| leaves to be much bigger (400 bytes seems to work well in
| practice) - and that helps keep the tree itself small and thus
| in the cpu cache.
|
| As the article says, it's quite a complex data structure
| though.
| guidoism wrote:
| Honestly an array isn't really the worst... for smallish files on
| modern processors. Sequential memcpy is pretty dang fast and
| pointer indirection can be slow.
|
| Just sayin'
| naniwaduni wrote:
| This sort of data structure evaluation mostly starts to make
| sense when you approach interactive large-file editing as a
| soft-realtime problem.
| 082349872349872 wrote:
| and if an array doesn't work, a gap buffer isn't much
| additional complexity.
|
| all of my C homebrew editors have been gap buffer based, and I
| told myself I'd reimplement if I ever wanted to edit something
| large enough for that to be a problem -- but it never was.
| eludwig wrote:
| Worked on a homegrown Mac wsywyg editor back in the 90s. Arrays
| worked perfectly. If you are assuming that files fit in memory,
| using BlockMove() was very, very fast indeed.
|
| I can see if you need to edit multi-gig log files and things
| will not fit in memory, but for small files, array is totally
| fine.
|
| There were other tricks that were done back then to keep the
| number of single char inserts down to a minimum while typing.
| Like reading chars into a small buffer during fast typing and
| then inserting all the keystrokes at once as soon as you had
| the time.
| whazor wrote:
| Yes. When typing normally there is no slowness because you're
| swapping one string for another.
|
| But when pressing enter, pasting a bigger snippet, or deleting
| lines the slowdown is less noticeable. When you type a word,
| the deltas are tiny and it is easy to process for your brain.
| While the changes in multiple lines happen less often, so
| waiting a couple of frames doesn't bother you much.
|
| This is at millions of lines. With small files it is indeed too
| fast.
| billforsternz wrote:
| I was amused that this approach was dismissed without analysis
| as impractical. Sure it was impractical in living memory when
| CPUs and memory were many many orders of magnitude slower.
| Nilithus wrote:
| VSCode have a blog article talking out their move to using Piece
| Table as their main data structure.
| https://code.visualstudio.com/blogs/2018/03/23/text-buffer-r...
|
| Has some good visualizations as well.
| afc wrote:
| In my text editor (https://github.com/alefore/edge) I use a
| balanced binary tree containing small chunks (std::vector) of
| contiguous lines.
|
| That works well enough for me: https://asciinema.org/a/314752
|
| This requires loading the entire file into memory, but computers
| are so fast today that optimizing that away is pointless IMO (as
| the video shows, a 12MB file with 400k lines can be
| loaded/edited/saved reasonably).
|
| The tree structure is implemented as a generic container here:
| https://github.com/alefore/edge/blob/master/src/language/con....
| The tree is actually deeply immutable; each modification
| operation returns a new tree (where most contents are shared with
| the original tree). The leafs contain std::vectors that hold
| between 128 and 256 lines (save for a few cases). The actual
| specialization that holds a sequence of lines is
| LineSequence::Lines here:
| https://github.com/alefore/edge/blob/master/src/language/tex...
| vidarh wrote:
| I just use a (Ruby) array of strings for mine, on the basis
| that I have never on purpose wanted to actually _edit_ a file
| large enough that this is a problem. Computers keeps getting
| faster, but files I want to edit really don 't.
|
| If I open a huge one it's usually either an accident or because
| I want to view it, not edit it. It's easy enough to support
| abnormally large files, but it felt to me like a distraction
| that adds complexity for the sake of a fringe case I was
| perfectly happy to just define as out of scope.
|
| Incidentally, the server process for my editor holds every
| buffer that I've ever opened (over several years) and not
| explicitly killed in memory, and it's consuming too little for
| me to bother adding any garbage collection for buffers or
| similar (it's synced to disk, so on reboot I have the full same
| set of buffers available). We're talking many hundred buffers.
| RAM is cheap and plentiful.
| cdcarter wrote:
| You've worked primarily in your own home built editor for
| years? That's pretty cool.
| afc wrote:
| I'm not the person you directly replied to, but I've worked
| on my text editor since 2014 and used it exclusively since
| ~2015. It's been a lot of fun!
| atombender wrote:
| How does that perform when the entire 12MB file is a single
| line?
|
| Before you say it's unrealistic; sure, it's not a normal use
| case for code, but I frequently find myself wanting to open a
| huge JSON file, for example, just to perform a search/replace,
| or a minor edit, or copy some fragment, or just look at its
| contents. Pretty-printing first, or doing the change with a
| specialized tool like jq, is also something I do, but being
| able to just load the file into a text editor is often more
| convenient.
|
| In my experience, editors tend to break down on edge cases like
| these. Editors (e.g. Jetbrains) often turn off syntax
| highlighting and/or edit capabilities on large files, not to
| mention put an upper limit on the number of cursors.
| afc wrote:
| Thank you for your response. You ask a good question.
|
| I expect very long lines should work just fine, but I'll give
| it a try when I have a chance.
|
| I represent lines using a virtual class that has a few
| implementations. IIRC, the line will start as a tree
| containing 64kb chunks. As edits are applied, these chunks
| will gradually get replaced with other implementations (of
| the virtual class) that delegate to the original instances
| (and the representation is occasionally optimized (which can
| incur copying) to avoid too much nesting). So simplifying, I
| think this should just work due to the optimized
| representation that I use for lines, which doesn't require
| the characters to be contiguous in memory.
|
| I fully agree with your observation about how things often
| break in the corner cases. I actually also put a configurable
| cap on the number of cursors (e.g., a reg exp search stops
| after 100 matches).
|
| For syntax highlighting I don't currently have a cap, but
| this is ~fine: it just ~wastes a background thread (all
| syntax parsing is offloaded to a background thread, not
| blocking the main thread). But yeah, I should probably make
| this thread give up after a configurable time out.
| layer8 wrote:
| 12 MB isn't necessarily that much. Here is a 96 MB SVG file:
| https://commons.m.wikimedia.org/wiki/File:Koppen-Geiger_Map_...
|
| Here are genome sequence files that are several hundred MB
| compressed (the decompressed format is text):
| https://ftp.ncbi.nlm.nih.gov/genomes/refseq/
|
| Finally here is an almost 2 TB XML file:
| https://wiki.openstreetmap.org/wiki/Planet.osm
|
| These aren't the most typical use cases, but it's nice if an
| editor can at least handle > 32-bit files (larger than 2 or 4
| GB).
| ayhanfuat wrote:
| Previous discussion from 2017:
| https://news.ycombinator.com/item?id=15381886 (121 comments)
| mcarmichael wrote:
| A nice companion piece, focused on a related probem:
|
| "An Efficient Data Structure For A Hex Editor"
| https://www.chiark.greenend.org.uk/~sgtatham/tweak/btree.htm...
| behnamoh wrote:
| But what is a good structure for modern text editors in 2024?
| Nowadays text editors need to keep track of git commits (by whom,
| when, which line(s)), reference and footnotes (the author
| mentions Markdown's manual [^...] notation but ideally you'd want
| something automatic--even latex isn't automatic by default), etc.
___________________________________________________________________
(page generated 2023-12-26 23:00 UTC)