[HN Gopher] Zed Editor: All Occurrences Search from 1s to 4ms
___________________________________________________________________
Zed Editor: All Occurrences Search from 1s to 4ms
Author : drakerossman
Score : 113 points
Date : 2024-02-18 10:16 UTC (12 hours ago)
(HTM) web link (registerspill.thorstenball.com)
(TXT) w3m dump (registerspill.thorstenball.com)
| tiffanyh wrote:
| Here's the PR:
|
| https://github.com/zed-industries/zed/pull/6700
| adamtaylor_13 wrote:
| This blog post about how they built Zed from the learnings of
| Atom was fascinating to me. They mention this upgrade in the
| interview: https://zed.dev/blog/we-have-to-start-over
| CharlesW wrote:
| FYI this was discussed yesterday:
| https://news.ycombinator.com/item?id=39408288
| avinassh wrote:
| This is a fun post!
|
| I installed Zed recently and I indeed find it fast. I was able to
| quickly open lots of large files / projects.
|
| I am trying to hack on SQLite and unfortunately, I couldn't
| figure out how to set up the project (which comes with a
| Makefile) in Zed for code completion and go-to-definition. I
| couldn't find any help in the documentation either. VSCode was
| able to do it automatically once I installed the C plugin. I
| will keep an eye on the project and probably try again in a few
| months.
| elbear wrote:
| I had a similar experience while trying to use Zed with Python.
| I expected there to be a setup step, or at least to be told what
| to install, like VSCode does. Since neither of those things
| happened, I left it for now and returned to vim.
| canistel wrote:
| Don't have Mac. Won't buy Mac. Please, Linux.
| nitinreddy88 wrote:
| https://lapce.dev/
|
| This is the Zed replacement for me. Cross-platform, the same
| performance as Zed, and written in Rust with all its benefits.
| canistel wrote:
| I have to say I have used Lapce and was quite impressed. A
| little rough around the edges, but it is still in alpha (if I am
| not wrong).
| itishappy wrote:
| How's it perform at the task at hand: finding all instances
| of `<span` in Poe's poetry?
|
| https://github.com/standardebooks/edgar-allan-
| poe_poetry/blo...
| zokier wrote:
| Do we really need to have this discussion in every single Zed
| thread? Support for other platforms is planned and some work
| towards that is already happening.
| azeemba wrote:
| I mean, Zed is clearly making a marketing push recently.
|
| That attention brings in many new people who find it, but who
| also discover that they can't use it.
| jsheard wrote:
| It doesn't help that their frontpage sales pitch never
| mentions that it's currently Mac-only, so it's not until
| you get invested and go to download it that you realise you
| can't actually use it.
| jasode wrote:
| _> Please, Linux._
|
| If you skim through the comments on the GitHub ticket about Zed
| being ported to Linux, the TLDR is... _they're currently working
| on it_: https://github.com/zed-industries/zed/issues/7015
|
| It's a small set of contributors, so expect that it will be a
| while.
| diimdeep wrote:
| Has anyone tried to build it from source? I bet what will kill
| this editor in the end is slow Rust build times and the
| complexity of the language.
| soap- wrote:
| Why would that kill it? Have you ever tried to compile a
| browser?
| diimdeep wrote:
| No browser is fully built in Rust, and that is why.
| shakow wrote:
| > I bet what would kill this editor in the end is slow rust
| build time
|
| I take it you don't know how long it takes to build
| Linux/LLVM/macOS/GCC/Chrome/Firefox/...
| 149765 wrote:
| A full debug build on my 7-year-old PC takes 1m 33s with all
| dependencies already downloaded.
| itishappy wrote:
| VS Code cannot be built from (public) sources at all, and that
| doesn't seem to have affected its market share.
|
| > Building VS Code application is proprietary and hence this is
| not shared, but you can always build your own open source
| version of VS Code by cloning it.
|
| https://github.com/Microsoft/vscode/issues/18083
| jonstewart wrote:
| I think the conclusion, about Rust having zero-cost abstractions,
| is completely wrong. The before code and the after code are
| functionally equivalent. Find all the matches, do some checking,
| select all of them -- if Rust's abstractions were zero-cost, the
| performance differences between the versions would be negligible.
|
| Obviously, it's a straw man for me to attack the zero-cost
| abstraction claim this way. But the point here, about which the
| blog is curiously silent, is that contemporary CPUs are fickle
| beasts, and they can run 250X faster, without resorting to
| exotic techniques, if you structure your code just so. This is
| the complete opposite of the zero-cost abstraction claim. I
| don't mean to be pedantic, but it's worth stating bald-faced,
| because it's what performance engineers have learned: the answer
| is always "it depends."
| karolist wrote:
| I don't like my editors being fast and pushing the burden of
| productivity onto me. I need something to blame for my output,
| so no thanks!
| jsnell wrote:
| I'm kind of confused by this post. It talks about how this is an
| example of zero-cost abstractions, while actually showing the
| opposite.
|
| The original implementation was grotesquely inefficient; the
| input is just a 172kB file! To fix that, they ended up manually
| inlining and duplicating 70 lines of code. And the optimized
| version is still 50x slower than grep --color would be.
|
| _Why_ was the original version so slow? It can't be due to the
| searching, given they were _4 orders of magnitude off_ from
| where they should be. It can't be due to this kind of batch-
| oriented programming having more efficient memory access
| patterns; the data set is so tiny that it's all going to fit in
| caches. Something in the editor's internal data structures
| working better when the stages are not interleaved? That would
| be my first guess. Maybe making changes to the editor data
| structures invalidated the search state, forced each search step
| to restart from the start of the buffer, and you ended up with
| an accidentally quadratic algorithm?
|
| There's clearly a real performance problem in their system that
| was papered over by copy-pasting some code, but that they'll
| inevitably keep hitting over and over as people write the
| obvious code, see it apparently work fine, and then get bad
| performance in practice. The right thing to do is to either fix
| that underlying problem, or to make it as easy to write the
| well-performing version as the bad one. (E.g. have some kind of
| scoped "batch context" abstraction, which queues up the edits
| and then applies them in a batch when leaving the scope.)
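|
| Something like this minimal Rust sketch (the types and names
| here are made up for illustration, not Zed's actual API):
|
|     // Hypothetical scoped batch: edits are queued while the
|     // context is alive and applied in one pass when it drops.
|     struct Editor { text: String }
|
|     struct Edit { offset: usize, insert: String }
|
|     struct BatchContext<'a> {
|         editor: &'a mut Editor,
|         pending: Vec<Edit>,
|     }
|
|     impl<'a> BatchContext<'a> {
|         fn queue(&mut self, edit: Edit) {
|             self.pending.push(edit); // cheap: no editor mutation yet
|         }
|     }
|
|     impl<'a> Drop for BatchContext<'a> {
|         fn drop(&mut self) {
|             // apply back-to-front so earlier offsets stay valid
|             // (assumes edits were queued in ascending offset order)
|             for edit in self.pending.drain(..).rev() {
|                 self.editor.text.insert_str(edit.offset, &edit.insert);
|             }
|         }
|     }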
|
| Anyway, all of this makes the conclusion quite odd. The real
| moral of the story should be that CPUs are really fast, not that
| Rust is magical and will change the way one thinks.
| jonstewart wrote:
| First, I agree with your overall point, that this post
| illustrates the opposite of what it claims.
|
| However, the discrepancy can be due to cache misses (and likely
| is, in part). One thing people get wrong about caches is
| looking at some smallish data set and thinking, "oh that fits
| in cache."
|
| But cache is divided into small 64-byte chunks ("lines"), and
| there are typically three levels of cache, each using lines.
| Hardware may detect sequential access and start pulling in the
| next lines for you in advance, but that depends a lot on access
| patterns.
|
| With the original loop, the access patterns are such that
| there's no guarantee the next bit of data will be in cache. It's
| proceeding sequentially, perhaps, but through several different
| areas of memory. I'd love to instrument the code to compare
| dcache miss rates, and L1 vs L2 vs L3, though not enough to deal
| with building zed myself. My guess is that the before version
| has a worse hit rate than the after version.
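|
| (On Linux, perf can answer this directly; event names vary by
| CPU, but something along the lines of
|
|     perf stat -e instructions,cycles,L1-dcache-load-misses,LLC-load-misses <binary>
|
| reports IPC and cache-miss counts in one shot.)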
|
| What's more, there's also the instruction cache. The original
| version pulls a lot of code into that loop. My guess is that the
| icache hit rate is also worse in the original. That's deadly --
| the CPU does nothing. And it might stall enough that the OS
| context switches (after all, the times reported in the blog post
| are probably wall-clock time; reducing idle enough that the OS
| doesn't context switch goes a long way towards explaining a 250X
| speedup).
|
| Finally, with the batched loops, there's not that much code in
| each one, and they're classic for-loops. The compiler can apply
| unrolling; even if it doesn't, the CPU's reorder
| buffer/dispatcher basically will. My bet is that the after
| version has a significantly higher instructions-per-cycle (IPC)
| score, i.e. better instruction-level parallelism.
|
| I'm agnostic about your conclusion that the main data structures
| in zed doom it to similar performance problems. I'm not sure; it
| may have reasonable data structures. But as you point out with
| the grep comparison, the competition for these operations is
| tools and techniques that have been honed to an unreasonable
| degree over decades, to the point where their codebases and CPU
| architectures have co-evolved. It's a steep hill to climb.
| jsnell wrote:
| It really can't be just cache effects. You might not be
| appreciating just how slow that original program was, or how
| little data this is.
|
| To illustrate, I wrote a Python program[0] that does the search
| using the naive algorithm, implemented in pure interpreted
| Python rather than calling out to any optimized builtin or
| native code libraries, with each character being a separate
| object, each stored in its own 128-element array just to waste
| more cache. So, really absurdly inefficient.
|
| It executes the benchmark search on the sample file in 2ms on
| a Ryzen 2700.
|
| Sure, cache effects would have contributed a bit to the
| difference between the original and optimized version, but
| the contribution just has to be minor.
|
| [0] https://www.snellman.net/tmp/39417829.py
| jonstewart wrote:
| As context, I've implemented a specialized grep engine in C++
| with decent performance (not the best, but better than classic
| PCRE). So I'm not saying you're wrong, simply that I have some
| background/intuition about searching, too.
|
| Why is the original Rust code slow, though? You say it's bad
| code -- what's bad? Why are the data structures bad? I haven't
| looked further into zed than this blog post, so I'm curious
| whether you have more insight.
|
| Your Python script is a straightforward state machine that just
| counts the number of matches. Now, Python is slow, but this
| should be fairly simple, small bytecode all the same, and it
| doesn't compare to everything the original code is doing (i.e.
| manipulating the offsets and making selections).
|
| Do you have insight into what their selection code is doing? Is
| it drawing at the same time? Or does it somehow have complicated
| nonlinear access times? I'm just learning Rust and don't yet
| have good intuition for its pitfalls.
|
| I find it curious that they didn't present any info from a
| profiler. I suspect it's not their search code that's slow, but,
| again, I'm not clear on Rust or the surrounding code/context of
| zed and its data structures. Appreciate your comments.
| jsnell wrote:
| First, just to be clear, I didn't call their code or data
| structures bad. The only context where I used that word
| was "make it as easy to write the well-performing version
| as the bad one", which was talking about the performance,
| not the code.
|
| I know nothing about their code or data structures beyond
| what was in this blog post. I'm just working from the
| numbers, which don't add up.
|
| In the interleaved version they're spending about 430
| microseconds on each of the editor operations. In the batch
| version they're spending <2 microseconds. As per the blog post,
| the order of operations is the same. (The bit about removing
| overlapping search results is irrelevant: their search term
| can't self-overlap, so doing that portion in batch doesn't
| matter.)
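|
| (Working backwards from the post's numbers: a 172kB file with a
| span on average every 74 bytes is roughly 2,300 matches, so 1s /
| 2,300 ≈ 430 microseconds per match and 4ms / 2,300 ≈ 1.7
| microseconds per match.)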
|
| So, given the order of operations is the same, how much cache
| pressure is there between two calls into the editor data
| structures? The file is tiny and the results are dense; there's
| a span element on average every 74 bytes. There's basically no
| pressure. And still the interleaved version is taking 430
| microseconds longer. Given the minimal cache pressure between
| the calls, it is just impossible for that to be due to cache
| misses. It'd basically require that having accessed a couple of
| cachelines worth of (easily prefetchable) data would have the
| side effect of turning thousands of memory accesses from L1 hits
| to L3 misses in the next function call. Seems pretty
| implausible.
|
| Why do you think that this is a more likely explanation than
| second-order effects from the interleaving? (It doesn't need to
| be my initial guess of edits resetting the search state; there
| are plenty of other similar issues where a leaky abstraction
| causes minor changes in the order of operations to have an
| outsized performance impact. For example, maybe they don't
| commit edits immediately, but only on the next read operation.)
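|
| That last guess is easy to sketch (hypothetical Rust, nothing
| from Zed's actual code): if every read commits pending edits
| with a full O(buffer) rebuild, then N interleaved edit/read
| pairs pay for N rebuilds, while queueing all N edits and reading
| once pays for only one.
|
|     // Hypothetical buffer that defers edits until the next read.
|     struct LazyBuffer {
|         text: String,
|         pending: Vec<(usize, String)>,
|     }
|
|     impl LazyBuffer {
|         fn edit(&mut self, at: usize, s: &str) {
|             self.pending.push((at, s.to_string())); // cheap: just queued
|         }
|
|         fn read(&mut self) -> &str {
|             if !self.pending.is_empty() {
|                 // commit point: one O(buffer) rebuild, no matter
|                 // how many edits were queued
|                 self.pending.sort_by_key(|(at, _)| *at);
|                 let mut rebuilt = String::with_capacity(self.text.len());
|                 let mut last = 0;
|                 for (at, s) in self.pending.drain(..) {
|                     rebuilt.push_str(&self.text[last..at]);
|                     rebuilt.push_str(&s);
|                     last = at;
|                 }
|                 rebuilt.push_str(&self.text[last..]);
|                 self.text = rebuilt;
|             }
|             &self.text
|         }
|     }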
| kibibu wrote:
| To me, the order of magnitude suggests that the select function
| used in the first version did UI updates for each occurrence,
| which the new version deferred and did only once.
| KTibow wrote:
| It's worth noting that this isn't a find command, it's a find-
| and-select command.
| nightowl_games wrote:
| Would love to know more about why the old version was slow. Would
| love to see profiler output.
| jasonjmcghee wrote:
| My understanding of zero-cost abstractions is that Rust doesn't
| pay the runtime overhead -- things like vtable lookups when
| calling a function during execution -- that many other languages
| do.
|
| Isn't that concept completely unrelated to this?
|
| Unless I'm misunderstanding the issue and the fix, this is a
| performance improvement due to batched operations.
|
| Search for all occurrences, perform necessary processing.
|
| Vs.
|
| if not complete, find next occurrence, perform processing,
| repeat.
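|
| In rough Rust terms (my own sketch; the editor calls are made
| up, not Zed's actual API):
|
|     // batched: one search pass, then one bulk operation
|     let ranges: Vec<_> = buffer
|         .match_indices(needle)
|         .map(|(i, m)| i..i + m.len())
|         .collect();
|     editor.select_ranges(&ranges); // hypothetical editor call
|
|     // interleaved: the editor is touched after every match
|     let mut pos = 0;
|     while let Some(i) = buffer[pos..].find(needle) {
|         let start = pos + i;
|         editor.select_range(start..start + needle.len()); // hypothetical
|         pos = start + needle.len();
|     }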
|
| What am I missing?
| noam_k wrote:
| I didn't delve too deeply into the code in the post, but no, I
| wouldn't say that this is an example of zero-cost abstractions
| at all.
|
| Zero cost can come in two forms (that come to my mind right now;
| feel free to comment with more): runtime and memory. You
| mentioned dynamic dispatch (calling a method through a vtable),
| which _can_ be zero cost when using generics: the compiler
| "inlines" the type and knows statically which method is being
| called. Note that there is a compile-time trade-off here, and
| the binary size would be larger than when using `&dyn` (a
| pointer to a vtable).
|
| To illustrate memory zero-cost, consider an optional boolean
| argument to a program. A boolean has two valid states but uses 8
| bits of memory, so `Option<bool>` would naively use one byte for
| the discriminant and another for the value. Rust uses the niche
| optimization[0] to promise that this type will only use one byte
| of memory.
|
| [0]https://google.github.io/comprehensive-rust/smart-
| pointers/b...
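|
| Both halves are easy to check in a standalone snippet (my own
| toy example, not from the post):
|
|     use std::mem::size_of;
|
|     trait Greet { fn greet(&self) -> &'static str; }
|     struct En;
|     impl Greet for En { fn greet(&self) -> &'static str { "hello" } }
|
|     // generics: monomorphized per type, so the call can be
|     // resolved and inlined at compile time (zero runtime cost)
|     fn greet_static<T: Greet>(g: &T) -> &'static str { g.greet() }
|
|     // &dyn: one copy of the function, but every call goes
|     // through the vtable
|     fn greet_dyn(g: &dyn Greet) -> &'static str { g.greet() }
|
|     fn main() {
|         assert_eq!(greet_static(&En), "hello");
|         assert_eq!(greet_dyn(&En), "hello");
|         // niche optimization: None fits in bool's unused bit
|         // patterns, so no separate discriminant byte is needed
|         assert_eq!(size_of::<bool>(), 1);
|         assert_eq!(size_of::<Option<bool>>(), 1);
|     }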
| nikeee wrote:
| Is there a plan to adopt more of the VSCode ecosystem, like VSC
| plugins? That would tremendously drive adoption.
| nialv7 wrote:
| I feel they only explained the "what" but not the "why". I.e.,
| you do this and this, and suddenly your code is 250x faster. But
| I want to know _why_ this is faster, or _why_ the old version
| was slow, and that is not explained.
| diimdeep wrote:
| I know it is a joke [1], but they also built it in Rust and
| open-sourced it.
|
| [1] Interview with Misled Core Developer 2024
| https://youtu.be/A_iyxlB3-Iw
___________________________________________________________________
(page generated 2024-02-18 23:01 UTC)