[HN Gopher] Zed Editor: All Occurrences Search from 1s to 4ms
       ___________________________________________________________________
        
       Zed Editor: All Occurrences Search from 1s to 4ms
        
       Author : drakerossman
       Score  : 113 points
       Date   : 2024-02-18 10:16 UTC (12 hours ago)
        
 (HTM) web link (registerspill.thorstenball.com)
 (TXT) w3m dump (registerspill.thorstenball.com)
        
       | tiffanyh wrote:
        | Here's the pull request:
       | 
       | https://github.com/zed-industries/zed/pull/6700
        
       | adamtaylor_13 wrote:
        | This blog post about how they built Zed from the lessons they
        | learned building Atom was fascinating to me. They mention this
        | upgrade in the interview: https://zed.dev/blog/we-have-to-start-over
        
         | CharlesW wrote:
         | FYI this was discussed yesterday:
         | https://news.ycombinator.com/item?id=39408288
        
       | avinassh wrote:
       | This is a fun post!
       | 
       | I installed Zed recently and I indeed find it fast. I was able to
       | quickly open lots of large files / projects.
       | 
        | I am trying to hack on SQLite and, unfortunately, I couldn't
        | figure out how to set up the project (which comes with a
        | Makefile) with Zed for code completion and go-to-definition. I
        | couldn't find any help in the documentation either. VSCode was
        | able to do it automatically once I installed the C plugin. I
        | will keep an eye on the project and probably try again after a
        | few months.
        
         | elbear wrote:
         | I had a similar experience while trying to use Zed with Python.
          | I expected there to be a setup step, or at least to be told
          | what to install, like VSCode does. Since neither of those
          | things happened, I left it for now and returned to vim.
        
       | canistel wrote:
       | Don't have Mac. Won't buy Mac. Please, Linux.
        
         | nitinreddy88 wrote:
         | https://lapce.dev/
         | 
          | This is a Zed replacement for me. Cross-platform. Same
          | performance as Zed. Written in Rust, with all the benefits.
        
           | canistel wrote:
           | I have to say I have used Lapce and was quite impressed. A
            | little rough around the edges, but it is still in alpha (if I
            | am not wrong).
        
           | itishappy wrote:
           | How's it perform at the task at hand: finding all instances
           | of `<span` in Poe's poetry?
           | 
            | https://github.com/standardebooks/edgar-allan-poe_poetry/blo...
        
         | zokier wrote:
         | Do we really need to have this discussion in every single Zed
         | thread? Support for other platforms is planned and some work
         | towards that is already happening.
        
           | azeemba wrote:
           | I mean Zed is clearly making a marketing push recently.
           | 
            | With that attention, many new people find it but then
            | discover that they can't use it.
        
             | jsheard wrote:
             | It doesn't help that their frontpage sales pitch never
             | mentions that it's currently Mac-only, so it's not until
             | you get invested and go to download it that you realise you
             | can't actually use it.
        
         | jasode wrote:
         | _> Please, Linux._
         | 
          | If you skim through the comments on the GitHub ticket about Zed
          | being ported to Linux, the TLDR is... _they're currently
          | working on it_:
          | https://github.com/zed-industries/zed/issues/7015
         | 
          | It's a small set of contributors, so expect that it will be a
          | while.
        
       | diimdeep wrote:
        | Has anyone tried to build it from source? I bet what will kill
        | this editor in the end is slow Rust build times and the
        | complexity of the language.
        
         | soap- wrote:
         | Why would that kill it? Have you ever tried to compile a
         | browser?
        
           | diimdeep wrote:
            | No browser is fully built in Rust, and that is why.
        
         | shakow wrote:
         | > I bet what would kill this editor in the end is slow rust
         | build time
         | 
         | I hope you don't know how long it takes to build
         | Linux/llvm/macOS/gcc/Chrome/Firefox/...
        
         | 149765 wrote:
          | A full debug build on my 7-year-old PC takes 1m 33s with all
          | dependencies already downloaded.
        
         | itishappy wrote:
          | VS Code cannot be built from (public) sources at all, and that
          | doesn't seem to have affected its market share.
         | 
         | > Building VS Code application is proprietary and hence this is
         | not shared, but you can always build your own open source
         | version of VS Code by cloning it.
         | 
         | https://github.com/Microsoft/vscode/issues/18083
        
       | jonstewart wrote:
       | I think the conclusion, about Rust having zero-cost abstractions,
       | is completely wrong. The before code and the after code are
       | functionally equivalent. Find all the matches, do some checking,
       | select all of them -- if Rust's abstractions were zero-cost, the
       | performance differences between the versions would be negligible.
       | 
       | Obviously, it's a straw man argument for me to make about the
       | zero-cost abstraction claim. But the point here, about which the
       | blog is curiously silent, is that contemporary CPUs are fickle
       | beasts, and they can run 250X faster, without resorting to exotic
       | techniques, if you structure your code just so. This is the
       | complete opposite of the zero-cost abstraction claim, which I
       | don't mean to be pedantic about, but is worth stating bald-faced
       | because it's what performance engineers have learned, that the
       | answer is always "it depends."
        
       | karolist wrote:
        | I don't like my editors being fast and pushing the burden of
        | productivity onto me. I need something to blame with regards to
        | my output, so no thanks!
        
       | jsnell wrote:
       | I'm kind of confused by this post. It talks about how this is an
       | example of zero-cost abstractions, while actually showing the
       | opposite.
       | 
        | The original implementation was grotesquely inefficient; it's
        | only a 172kB file! To fix that, they ended up manually inlining
        | and duplicating 70 lines of code. And the optimized version is
        | still 50x slower than grep --color would be.
       | 
        |  _Why_ was the original version so slow? It can't be due to the
        | searching, given they were _4 orders of magnitude off_ from where
        | they should be. It can't be due to this kind of batch-oriented
        | programming having more efficient memory access patterns. The
       | data set is so tiny that it's all going to fit in caches.
       | Something in the editor's internal data structures working better
       | when the stages are not interleaved? That would be my first
       | guess. Maybe making changes to the editor data structures
       | invalidated the search state, forced each search step to restart
       | from the start of the buffer, and you ended up with an
       | accidentally quadratic algorithm?
       | 
       | There's clearly a real performance problem in their system that
       | was papered over by copy-pasting some code, but that they'll
        | inevitably keep hitting over and over as people write the obvious
        | code: it seems to work fine, but then has bad performance in
       | practice. The right thing to do is to either fix that underlying
       | problem, or to make it as easy to write the well-performing
       | version as the bad one. (E.g. have some kind of scoped "batch
       | context" abstraction, which queues up the edits and then applies
       | them in a batch when leaving the scope.)
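        | 
        | As a minimal sketch of that idea (hypothetical types and names,
        | not Zed's actual API), a guard can queue edits and commit them
        | all in one pass when it goes out of scope:
        | 
        |     use std::ops::Range;
        | 
        |     // Hypothetical stand-ins for the editor's real types.
        |     enum Edit {
        |         Select(Range<usize>),
        |     }
        | 
        |     struct Editor {
        |         selections: Vec<Range<usize>>,
        |     }
        | 
        |     impl Editor {
        |         // One pass over editor state for the whole batch.
        |         fn apply_all(&mut self, edits: Vec<Edit>) {
        |             for edit in edits {
        |                 match edit {
        |                     Edit::Select(r) => self.selections.push(r),
        |                 }
        |             }
        |         }
        |         fn batch(&mut self) -> BatchGuard<'_> {
        |             BatchGuard { editor: self, pending: Vec::new() }
        |         }
        |     }
        | 
        |     struct BatchGuard<'a> {
        |         editor: &'a mut Editor,
        |         pending: Vec<Edit>,
        |     }
        | 
        |     impl BatchGuard<'_> {
        |         // Queue the edit; nothing touches the editor yet.
        |         fn select(&mut self, range: Range<usize>) {
        |             self.pending.push(Edit::Select(range));
        |         }
        |     }
        | 
        |     impl Drop for BatchGuard<'_> {
        |         fn drop(&mut self) {
        |             // Leaving the scope commits everything at once.
        |             let pending = std::mem::take(&mut self.pending);
        |             self.editor.apply_all(pending);
        |         }
        |     }
        | 
        |     fn main() {
        |         let mut editor = Editor { selections: Vec::new() };
        |         {
        |             let mut batch = editor.batch();
        |             batch.select(0..5);
        |             batch.select(14..19);
        |         } // guard dropped; both edits applied in one batch
        |         assert_eq!(editor.selections.len(), 2);
        |     }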
       | 
       | Anyway, all of this makes the conclusion quite odd. The real
       | moral of the story should be that CPUs are really fast, not that
       | Rust is magical and will change the way one thinks.
        
         | jonstewart wrote:
         | First, I agree with your overall point, that this post
         | illustrates the opposite of what it claims.
         | 
         | However, the discrepancy can be due to cache misses (and likely
         | is, in part). One thing people get wrong about caches is
         | looking at some smallish data set and thinking, "oh that fits
         | in cache."
         | 
         | However, cache is divided into small 64-byte chunks ("lines").
         | And there are typically three levels of cache, each using
         | lines. Hardware may detect sequential access and start pulling
         | in next lines for you in advance, but it depends a lot on
         | access patterns.
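          | 
          | For rough scale (assuming typical cache sizes): at 64 bytes
          | per line, the ~172kB file spans about 172 * 1024 / 64 ~= 2,750
          | cache lines -- more than a typical 32-48kB L1d holds, though
          | it fits easily in L2.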
         | 
          | With the original loop, the access patterns are such that
          | there's no guarantee the next bit of data will be in cache.
         | It's proceeding sequentially, perhaps, but through several
         | different areas of memory. I'd love to instrument the code to
         | compare dcache miss rates, and L1 vs L2 vs L3, though not
         | enough to deal with building zed myself. My guess is that the
         | before version has a worse hit rate than the after version.
         | 
         | What's more, there's also the instruction cache. The original
         | version pulls a lot of code into that loop. My guess is that
         | the icache hit rate is also worse in the original. That's
         | deadly -- the CPU does nothing. And it might stall enough that
         | the OS context switches (after all, the times reported in the
         | blog post are probably wall clock time; reducing idle enough so
          | that the OS doesn't context switch goes a long way towards
         | explaining a 250X speed up).
         | 
         | Finally, with the batched loops, there's not that much code in
         | each one and they're classic for-loops. The compiler can apply
         | unrolling; even if it doesn't, the CPU reorder
         | buffer/dispatcher basically will. My bet is that the after
          | version has a significantly higher instructions-per-cycle (IPC)
          | score, i.e. better instruction-level parallelism.
         | 
         | I'm agnostic about your conclusion, that the main data
         | structures in place in zed doom it to similar performance
         | problems. Not sure? It may have reasonable data structures. But
         | as you point out with the grep comparison, the competition when
          | it comes to these operations is tools and techniques that have
         | been honed to an unreasonable degree over decades, to the point
         | where their codebases and CPU architectures have co-evolved.
         | It's a steep hill to climb.
        
           | jsnell wrote:
           | It really can't be just cache effects. You might not be
           | appreciating just how slow that original program was, or how
           | little data this is.
           | 
           | To illustrate, I wrote a Python program[0] that does the
           | search using the naive algorithm, implemented using pure
           | interpreted Python rather than calling out to any optimized
           | builtin or native code libraries, with each character being a
            | separate object, each stored in a separate 128-element array
            | just to waste more cache. So
           | really absurdly inefficient.
           | 
           | It executes the benchmark search on the sample file in 2ms on
           | a Ryzen 2700.
           | 
           | Sure, cache effects would have contributed a bit to the
           | difference between the original and optimized version, but
           | the contribution just has to be minor.
           | 
           | [0] https://www.snellman.net/tmp/39417829.py
        
             | jonstewart wrote:
             | As context, I've implemented a specialized grep engine in
             | C++, with decent performance (not the best, but better than
              | classic PCRE). So, not saying you're wrong, simply that I
              | have some background/intuition about searching, too.
             | 
             | Why is the original Rust code slow, though? You say it's
             | bad code -- what's bad? Why are the data structures bad? I
             | haven't looked further into zed than this blog post so
             | curious whether you have more insight.
             | 
             | With your Python script, it's a straightforward state
             | machine and just counts the number of matches. Now, Python
             | is slow. But this should be fairly simple, small pcode all
             | the same and doesn't compare to all that the original code
             | is doing (ie manipulating the offsets and making
             | selections).
             | 
              | Do you have insight into what their selection code is
              | doing? Is it drawing at the same time? Or does it somehow
              | have complicated nonlinear access times? I'm just learning
              | Rust and don't yet have good intuition for its pitfalls.
             | 
             | I find it curious that they didn't present any info from a
             | profiler. I suspect it's not their search code that's slow,
              | but, again, I'm not clear on Rust or the surrounding
              | code/context of zed and its data structures. Appreciate
             | your comments.
        
               | jsnell wrote:
               | First, just to be clear, I didn't call their code or data
               | structures bad. The only context where I used that word
               | was "make it as easy to write the well-performing version
               | as the bad one", which was talking about the performance,
               | not the code.
               | 
               | I know nothing about their code or data structures beyond
               | what was in this blog post. I'm just working from the
               | numbers, which don't add up.
               | 
               | In the interleaved version they're spending about 430
               | microseconds for each of the editor operations. In the
               | batch version they're spending <2 microseconds. As per
                | the blog post, the order of operations is the same. (The
                | bit about removing overlapping search results is
                | irrelevant; their search term can't self-overlap, so
                | doing that portion in batch doesn't change anything.)
               | 
                | So, given the order of operations is the same, how much
                | cache pressure is there between two calls to the editor
                | data structures? The file is tiny and the results are
                | dense. There's a span element on average every 74 bytes.
                | There's basically no pressure. And still the interleaved
                | version is taking 430 microseconds longer per call.
               | Given the minimal cache pressure between the calls, it is
               | just impossible for that to be due to cache misses. It'd
               | basically require that having accessed a couple of
               | cachelines worth of (easily prefetchable) data would have
               | the side effect of turning thousands of memory accesses
               | from L1 hits to L3 misses in the next function call.
               | Seems pretty implausible.
               | 
               | Why do you think that this is a more likely explanation
                | than second-order effects from the interleaving? (It
               | doesn't need to be my initial guess of edits resetting
               | the search state, there's plenty of other similar issues
               | where a leaky abstraction causes minor changes in the
               | order of operations to have outsized performance impact.
               | For example, maybe they don't commit edits immediately,
               | but only on the next read operation.)
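                | 
                | (For scale, those per-call numbers are just the headline
                | figures divided out: one span per ~74 bytes of a ~172kB
                | file is roughly 2,300 matches, and 2,300 calls at ~430
                | microseconds each is about 1 second, while at <2
                | microseconds each it's under 5 milliseconds.)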
        
             | kibibu wrote:
             | To me, the order of magnitude suggests that the select
             | function used in the first version did UI updates for each
              | occurrence, which the new version deferred and did only
             | once.
        
         | KTibow wrote:
          | It's worth noting that this isn't a find command; it's a
          | find-and-select command.
        
       | nightowl_games wrote:
       | Would love to know more about why the old version was slow. Would
       | love to see profiler output.
        
       | jasonjmcghee wrote:
        | My understanding of zero-cost abstractions is that Rust doesn't
        | have the runtime overhead - things like vtable lookups when
        | calling a function during execution - that many other languages
        | do.
       | 
       | Isn't that concept completely unrelated to this?
       | 
        | Unless I'm misunderstanding the issue and fix, this is a
        | performance improvement due to batched operations.
       | 
       | Search for all occurrences, perform necessary processing.
       | 
       | Vs.
       | 
       | if not complete, find next occurrence, perform processing,
       | repeat.
       | 
       | What am I missing?
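        | 
        | As a toy sketch of the difference (my own code, not Zed's):
        | 
        |     // Stand-in for an editor operation with per-call
        |     // overhead (imagine layout/redraw work on each call).
        |     fn select(selections: &mut Vec<usize>, offset: usize) {
        |         selections.push(offset);
        |     }
        | 
        |     // Batched counterpart: the fixed cost is paid once.
        |     fn select_all(selections: &mut Vec<usize>,
        |                   all: Vec<usize>) {
        |         *selections = all;
        |     }
        | 
        |     fn main() {
        |         let buffer = "<span>a</span><span>b</span>";
        |         let mut selections: Vec<usize> = Vec::new();
        | 
        |         // Interleaved: find next occurrence, process, repeat.
        |         let mut from = 0;
        |         while let Some(i) = buffer[from..].find("<span") {
        |             select(&mut selections, from + i);
        |             from += i + 1;
        |         }
        | 
        |         selections.clear();
        | 
        |         // Batched: find everything first, process once.
        |         let matches: Vec<usize> = buffer
        |             .match_indices("<span")
        |             .map(|(i, _)| i)
        |             .collect();
        |         select_all(&mut selections, matches);
        |     }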
        
         | noam_k wrote:
          | I didn't delve too deeply into the code in the post, but no, I
          | wouldn't say that this is an example of zero-cost abstractions
          | at all.
         | 
          | Zero cost can come in two forms (that come to my mind right
          | now; feel free to comment with more): runtime and memory. You
          | mentioned dynamic dispatch (when calling a vtable method),
          | which _can_ be zero cost when using generics: the compiler
          | will "inline" the type and know what method is being called.
          | Note that there is a compile-time trade-off here, and the
          | binary size would be larger than using `&dyn` (a pointer to a
          | vtable).
         | 
          | To illustrate memory zero-cost, consider an optional boolean
          | argument to a program. A boolean has two valid states but uses
          | 8 bits of memory, so `Option<bool>` would naively use one byte
          | for the discriminant and another for the value. Rust uses the
          | niche optimization[0] to promise that this type will only use
          | one byte of memory.
         | 
          | [0] https://google.github.io/comprehensive-rust/smart-pointers/b...
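          | 
          | A toy illustration of both points (my own example, not from
          | the post):
          | 
          |     use std::mem::size_of;
          | 
          |     trait Speak {
          |         fn speak(&self) -> &'static str;
          |     }
          | 
          |     struct Dog;
          |     impl Speak for Dog {
          |         fn speak(&self) -> &'static str { "woof" }
          |     }
          | 
          |     // Monomorphized per concrete T: no vtable lookup at
          |     // runtime, but one compiled copy per type.
          |     fn speak_static<T: Speak>(t: &T) -> &'static str {
          |         t.speak()
          |     }
          | 
          |     // One compiled copy, but a vtable lookup at runtime.
          |     fn speak_dyn(t: &dyn Speak) -> &'static str {
          |         t.speak()
          |     }
          | 
          |     fn main() {
          |         assert_eq!(speak_static(&Dog), "woof");
          |         assert_eq!(speak_dyn(&Dog), "woof");
          |         // Niche optimization: bool uses only 2 of its 256
          |         // bit patterns, so None fits in a spare pattern.
          |         assert_eq!(size_of::<bool>(), 1);
          |         assert_eq!(size_of::<Option<bool>>(), 1);
          |     }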
        
       | nikeee wrote:
       | Is there a plan to adopt more of the VSCode ecosystem like VSC
       | plugins? This would tremendously drive adoption.
        
       | nialv7 wrote:
        | I feel they only explained the "what" but not the "why", i.e.,
        | you do this and this and suddenly your code is 250x faster. But I
        | want to know _why_ this is faster, or _why_ the old version was
        | slow, which is not explained.
        
       | diimdeep wrote:
        | I know it is a joke [1], but they also built it in Rust and
        | open-sourced it.
       | 
       | [1] Interview with Misled Core Developer 2024
       | https://youtu.be/A_iyxlB3-Iw
        
       ___________________________________________________________________
       (page generated 2024-02-18 23:01 UTC)