[HN Gopher] Static search trees: faster than binary search
       ___________________________________________________________________
        
       Static search trees: faster than binary search
        
       Author : atombender
       Score  : 547 points
       Date   : 2025-01-01 00:08 UTC (22 hours ago)
        
 (HTM) web link (curiouscoding.nl)
 (TXT) w3m dump (curiouscoding.nl)
        
       | westurner wrote:
       | suffix-array-searching/static-search-tree/src/s_tree.rs:
       | https://github.com/RagnarGrootKoerkamp/suffix-array-searchin...
       | partitioned_s_tree.rs:
       | https://github.com/RagnarGrootKoerkamp/suffix-array-searchin...
        
       | ryao wrote:
       | This would have been easier for a larger number of people to read
       | if it had been in C or Python.
        
         | kubb wrote:
         | Maybe, but should the author really cater to people who can't
         | overcome a language barrier? They probably won't understand the
         | concept anyway.
        
           | ryao wrote:
           | The ACM for a time only accepted code examples in ALGOL
           | because it was felt that there should be a standard language
           | that everyone understood. ALGOL eventually gave way to C and
           | others, but there is definite value in ensuring things are
           | accessible to the largest possible audience.
        
             | dxbydt wrote:
             | usaco explicitly nudges you towards C++ on the first page
             | itself. problems are graded on running time, even with
             | moderately large datasets, Python simply doesn't make the
             | cut. So most high schoolers end up using C++ for usaco, and
             | Java for AP Comp Science.
        
             | kubb wrote:
             | You're free to use ALGOL in your posts if you want. It
             | reads like pseudocode used in algorithmic books, and should
             | be familiar to anyone who used the Pascal family.
             | 
             | ALGOL gave way to C, and C is now giving way to Rust, Zig
             | and others. You're playing the role of the people who
             | wanted to keep the example snippets in ALGOL.
        
               | ryao wrote:
               | The point was that examples should be in widely
               | understood languages. The niche languages you list do not
               | qualify as widely understood.
               | 
               | Estne melius scribere exempla latine?
        
             | mplanchard wrote:
             | This is someone's blog, not an ACM publication. Why hold it
             | to the same standard? They can write in whatever they like.
             | I have a hard time reading Haskell, but I don't sit and
             | kvetch about it anytime someone wants to use it in a blog
             | post.
        
           | mangamadaiyan wrote:
           | Erm, the concepts that you're referring to are independent of
           | the language being used. The author is free to use whatever
           | language they want. The GP (I think) was merely trying to
           | make a point about making the article accessible to a wider
           | audience that is capable of appreciating the concepts
           | involved.
        
             | kubb wrote:
             | If you can't understand the code snippets in this post, the
             | concepts will most likely be lost on you, no matter what
             | other language would be used for the code.
        
               | mangamadaiyan wrote:
               | Isn't that precisely the point that the GP was trying to
               | make, about the accessibility of the article?
               | 
               | Edit: Eytzinger layout, Cache lines, SIMD, et cetera are
               | independent of the particular language used in the
               | article. They're just as valid in C, for example. I don't
               | understand your point about them being tied to Rust.
        
             | ryao wrote:
             | You are correct.
        
         | Rendello wrote:
         | I wonder what percentage of Rust users have significant
         | experience with C. The difficulty curve of Rust was too high
         | for me when I was coming from Python, but after doing HolyC and
         | Zig, I find it fairly straightforward to conceptualize, though
         | still difficult.
         | 
         | I've been working with real C today and have been having fun,
         | but it's very different for me to step back into the C world
         | again after working with more modern tools.
        
         | barbegal wrote:
         | Neither of those languages gives you portable SIMD. Rust is
         | rapidly becoming the language of choice for high performance
         | code.
        
           | npalli wrote:
           | google has a mature C++ library for portable SIMD. The
           | original article seems to be a translation of the excellent
           | algorithmica site which had it in C++.
           | 
           | https://github.com/google/highway
        
             | ryao wrote:
             | You can do portable SIMD in GNU C/C++ using the vector
             | extension:
             | 
             | https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html
        
               | swiftcoder wrote:
               | I'm not sure that "portable" and "only works on one
               | specific compiler vendor" are very compatible concepts
        
               | bigpingo wrote:
               | gcc is available on and targets more platforms than rust.
               | So if GNU C is not considered portable you can forget
               | about 'portable rust'.
        
               | swiftcoder wrote:
               | Portability in the C and C++ worlds generally means
               | across standard-conforming compilers, not just across
               | platforms
        
         | wolfgangK wrote:
         | I don't think that Python would be the right language for such
         | low-level performance maxxing endeavor. I would have picked C++
         | but t was eye opening for me to see how rust enabled such low
         | level optimization, so I'm grateful for the choice.
        
           | curiouscoding wrote:
           | Yeah, I don't think python is the right tool here. C++
           | definitely would be an option though.
           | 
           | Anyway very happy that this is also showing off what rust can
           | do
        
             | ryao wrote:
             | As far as I can tell, the community is largely divided into
             | three major groups. Those that can read C, those that can
             | read Python and those that can read both. Using either of
             | them would have dodged criticism that your code examples
             | are not accessible to much of the community.
             | 
             | That said, you are right that Python is not the best
             | language to use when you need to use intrinsics, as you
             | would be writing it in another language and using the
             | Python FFI to access it.
        
             | oguz-ismail wrote:
             | > this is also showing off what rust can do
             | 
             | To people who already know Rust, yes. To others, not so
             | much
        
           | ryao wrote:
           | C would be useful to the broadest audience. C++ programmers
           | can read C while C programmers cannot always read C++,
           | especially when the newer language constructs are used. I
           | mentioned Python because of its popularity.
           | 
           | Interesting, the Algorithmica article the author cited is in
           | C:
           | 
           | https://en.algorithmica.org/hpc/data-structures/s-tree/
        
             | npalli wrote:
             | C++ not C. The author explains his choices elsewhere in the
             | site.
        
               | ryao wrote:
               | The code snippets should be accepted by a C compiler and
               | therefore are valid C code.
        
               | npalli wrote:
               | If you want to be pedantic, there is a constexpr in the
               | snippets, wont be accepted by the C compiler. More
               | broadly though it is a C++ tutorial. I insisted on it
               | because the author of algorithmica actually explains he
               | chose C++ and perhaps sometime later C/Rust or something
               | might be better. Should respect his POV.
        
               | ryao wrote:
               | I had not seen the use of constexpr. You are correct that
               | those are in C++.
        
               | lubutu wrote:
               | C23 has constexpr, albeit only for objects, not
               | functions. The code also uses namespaces, as in
               | `std::aligned_alloc(P, T)`.
        
               | ryao wrote:
               | You have to be 1/3 down the page to see any of this in
               | the code examples. By that point, you would already have
               | seen a number of examples that are valid C code and it is
               | reasonable to expect the rest to be. At least, that was
               | my experience.
        
         | pxmpxm wrote:
         | I suspect like 9/10 of these type of articles are really just
         | meant to be PR for rust. Like someone that wants to write Bart
         | Simpsonensque "rust is great" over and over, but have to hide
         | it into something more interesting.
        
           | curiouscoding wrote:
           | Rust _is_ great : ")
           | 
           | But no, this is actually part of my PhD research. The next
           | step will be to use this in a fast suffix-array search
           | algorithm.
        
             | jackschultz wrote:
             | Great stuff. We get told O notation is what matters for
             | data structures / algorithms, but improvements low level
             | with memory and storage with things like rust is much more
             | where improves can be made. These types of tricks for
             | anctual run times are so valuable, and interesting to
             | follow.
        
               | curiouscoding wrote:
               | Ohh yeah, don't get me started in big-O...
               | 
               | It was great while computers were not really a thing yet,
               | but these days it's often so meaningless. We see papers
               | with 2x speedup with a lot of novel algorithmic stuff
               | that sell better than 10x speedup just by exploiting CPUs
               | to the fullest.
               | 
               | Even myself I kinda think theoretical contributions are
               | cooler, and we really need to get rid of that (slightly
               | exaggerating).
        
               | Andys wrote:
               | You're really just exploiting hidden O(?) implementations
               | inside the CPU, so it still applies, just less visibly.
        
               | ryao wrote:
               | He is improving the constant factor in big O notation.
               | University algorithm classes tend to ignore cases where
               | the constant factor difference is significant enough to
               | favor a asymptomatically slower algorithm. Matrix
               | multiplication is the quintessential example of this,
               | since a good implementation of the O(n^3) algorithm will
               | outperform asymptotically faster algorithms, such as the
               | famous O(n^2 * log(n) * log(log(n))) one that uses the
               | FFT. At least, it outperforms it on matrix
               | multiplications people actually do in practice.
        
               | ryao wrote:
               | Your article led me to wonder if our b-trees would be
               | faster if I switched the intra node operations to use
               | Eytzinger ordered arrays:
               | 
               | https://github.com/openzfs/zfs/commit/677c6f8457943fe5b56
               | d7a...
               | 
               | There are two ways to look at this Big O wise. One is
               | that insertions and deletions would be asymptomatically
               | faster since memmove() is a linear operation while bubble
               | up/down are logarithmic operations. Look ups would not be
               | any different asymptotically, but the constant factor
               | might improve from being able to do prefetch. The other
               | way is that the N is bounded, such that it is all O(1)
               | and the difference is how big the constant factor is.
               | 
               | I imagine I could implement it and benchmark it. However,
               | my intuition is that the end result have lookups be
               | marginally faster to the point of splitting hairs while
               | insertions and deletions would be slower. While memmove()
               | is a technically a linear time operation, it is a
               | sequential operation that has a very low constant factor.
               | The bubble up and bubble down operations needed to do
               | insertions and deletions in a Eytzinger ordered array are
               | technically random access, which has a higher constant
               | factor. At some point, the Eytzinger ordered array
               | operations should win, but that point is likely well
               | beyond the size of a b-tree node.
               | 
               | My reason for saying this is to say that Big O notation
               | still matters, but understanding when the constant factor
               | is significant is important.
        
               | jackschultz wrote:
               | Same with async and throwing threads at a problem. People
               | love do those and think it's the right answer, but you
               | can do a ton with smarter memory management and actually
               | looking at what the code is doing lower level rather than
               | abstractions.
               | 
               | Video about this that was very interesting to follow and
               | somewhat related to what you're doing:
               | https://www.youtube.com/watch?v=5rb0vvJ7NCY
        
             | quotemstr wrote:
             | Rust is... okay. I still prefer C++'s template system to
             | Rust generics, in part because Rust doesn't have
             | specialization and won't for a while. Memory safety by
             | default is a big enough win to make me overlook these nits.
             | Really, though, most people most of the time should be
             | writing managed code.
        
           | remram wrote:
           | Do you think _this article_ was written specifically as just
           | PR for Rust?
           | 
           | Who do you think is behind this? The government?
        
             | estebarb wrote:
             | Clearly it is Big Compiler propaganda! </bazinga>
        
           | stevenhuang wrote:
           | You suspect wrong. People are interested in things you're
           | not. Get over it.
        
         | plasticeagle wrote:
         | As a computer science article, it should have mostly been
         | pseudocode. Concrete implementations can be given in an
         | appendix, where they belong.
        
           | curiouscoding wrote:
           | The problem with pseudocode is that it's completely
           | underspecified. And how would I ever write intrinsics in
           | pseudocode? Much easier to do a proper language directly so
           | people can actually Google things and read their official
           | documentation.
        
             | sigbottle wrote:
             | My former roommate works in a similar domain;
             | https://dl.acm.org/doi/pdf/10.1145/3448016.3452841 is an
             | example of a paper implementing intrinsics in pseudocode.
             | They "unroll" the intrinsics with a comment saying that
             | this is implemented with an intrinsic. Of course though,
             | your blog isn't a paper, don't know why people are getting
             | up in arms about it.
             | 
             | Also the other comment saying that "psuedocode is not
             | concerned with intrinsics" is false. You can get "great"
             | theoretical speedups (the shaveoffs are tiny but hey,
             | they're better) with "simple" intrinsic operations - that's
             | my roommate's entire research lol. The external memory
             | model, for example, formalizes caching and allows all these
             | low level optimizations to flourish. I'm not sure how
             | intrinsics tied into it, but he's published so I'm not
             | gonna question it :)
             | 
             | ---
             | 
             | Speaking of which, I noticed that you did competitive
             | programming. How does CP compare to research? I loved data
             | structure manipulation problems in CP when they were clever
             | - often because they involved noticing that you can take a
             | "common" model, but then optimize it significantly because
             | you only needed to support a subset of the operations
             | through a clever mathematical proof based on the structure
             | of the problem - but as I got to the higher levels it felt
             | more and more that a lot of them became really obscure
             | "support 413 operations on a tree, and yes, you really need
             | to support all 413 operations on a tree" and that's kind of
             | my opinion of data structure research unfortunately as well
             | :( I guess because solving broad general instances is more
             | important. I'd love to hear your perspective though.
        
               | ryao wrote:
               | Pseudo code can be whatever you want it to be. You can do
               | SIMD pseudo code, but most generally don't as that is
               | often an implementation detail.
        
           | kccqzy wrote:
           | That's unrealistic given that the author is not just doing
           | computer science but also engineering using SIMD. The kind of
           | available SIMD instructions actually affects code choice.
        
           | nostradumbasp wrote:
           | I strongly prefer actual implementations and adore the way
           | this was written. Maybe its all of the "to get this to run in
           | O(N) time we will leave the trick for the reader to discover"
           | I've bumped into. Or the classic "use this esoteric pay-
           | walled reference for instructions for this important
           | subalgorithm" then when you get that reference you realize
           | its doing the same sort of thing. Super cool for quizzing
           | masters students, pretty ridiculous for people in industry or
           | hobbyists who learn by playing with code. Unfortunately when
           | that happens there is almost never an OSS implementation with
           | the "trick" either.
        
         | quotemstr wrote:
         | Rust is now common enough in the industry that everyone should
         | be expected to at least read it, just like every programmer
         | should be able to fizzbuzz in Python and JavaScript just as a
         | matter of general technical literacy. There's no advanced Rust
         | here and no concepts specific to the language.
         | 
         | It's a good article.
         | 
         | If Heinlein were in our industry, he might write this:
         | 
         | A programmer should be able to write a shell, parse binary
         | data, design a protocol (and reverse engineer one too),
         | authenticate a password, not fumble multi-code-point graphemes,
         | wrangle semaphores, plug memory leaks, compress data, migrate
         | databases, automate builds, git rebase a tree, profile tail
         | latency, write a simple design doc for a complex system, draw a
         | triangle, multiply tensors, and do fizzbuzz in every popular
         | language on GitHub. Specialization is for insects.
        
           | ryao wrote:
           | Here is a list of languages people can be generally expected
           | to be able to read:                 * C       * C++       *
           | Java       * JavaScript       * Python
           | 
           | It is the top 6 of the TIOBE index, minus C#:
           | 
           | https://www.tiobe.com/tiobe-index/
           | 
           | Rust is not very high on the TIOBE index. It has rank 14 with
           | a 1.29% rating. That is not much higher than COBOL.
        
             | quotemstr wrote:
             | Rust is where the action is though. It's not just about
             | cumulative SLOC weight, but about mindshare among the
             | "movers and shakers" of the industry. Think of it as a
             | prestige language, an acrolect, you should at least vaguely
             | know so that you can participate in the most challenging
             | conversations in the field.
        
               | ryao wrote:
               | That is an interesting perspective. Here is my
               | perspective. There are no popular projects that use Rust
               | as a primary language. The current fashionable area of
               | computers is machine learning. As far as high level
               | languages go, you will see a mix of Python, C++ and
               | perhaps a tiny bit of C there. Rust is nowhere to be
               | seen.
               | 
               | Your description would be better applied to C, C++ and
               | Python. That is what the majority of popular software
               | uses and people having challenging conversations are
               | often using one of those languages.
        
               | tyre wrote:
               | > There are no popular projects that use Rust as a
               | primary language.
               | 
               | Servo is pretty notable and visible. ripgrep, influxdb,
               | wasmer, deno. uv and ruff in the python ecosystem are
               | written in rust.
               | 
               | AWS, Cloudflare, Discord (iirc), and Microsoft have all
               | adopted Rust extensively.
               | 
               | As of late 2023 there are committed drivers in the Linux
               | kernel written in rust. (Linux is pretty popular.) prior
               | to those, only C and assembly were allowed.
        
               | ryao wrote:
               | None of this contradicts what I said.
        
               | quotemstr wrote:
               | Don't forget Pydantic
        
               | dwattttt wrote:
               | > There are no popular projects that use Rust as a
               | primary language
               | 
               | You may be surprised to learn, Python's `cryptography`
               | project is written (EDIT: maybe not primarily, it's hard
               | to compare given dependencies) in Rust, and saw around
               | 8.1 million downloads from PyPI yesterday. That puts it
               | close to the top 20 (the 20th project comes in at 8.5m).
        
               | nostradumbasp wrote:
               | I would like to also point out that rust is boring as
               | hell. Lots of otherwise ordinary people write super
               | simple non-fancy programs using it too :)
        
             | bschwindHN wrote:
             | You forgot the part where no one cares about TIOBE though.
        
               | ryao wrote:
               | As the TIOBE index says "The ratings are based on the
               | number of skilled engineers world-wide, courses and third
               | party vendors". Very many people do care about that. Is
               | there any reason I should think your remark is from some
               | legitimate issue rather than you not liking the index
               | results?
        
               | bschwindHN wrote:
               | I would argue most people working software jobs either
               | haven't heard of TIOBE or don't care about it. I only
               | occasionally hear about it from strange internet comments
               | that are far too focused on language particulars.
        
               | ryao wrote:
               | That is surprising. I heard about the TIOBE index often
               | about 15 years ago, but now that I think about it, I have
               | not heard about it lately. I wonder if the rise of Python
               | and ML has caused people to stop asking "what language
               | should I learn". That is the question the TIOBE index
               | often was cited to answer.
        
             | zahlman wrote:
             | As an aside: why is Java winning out over C# these days?
             | When I first experienced C# I didn't want to go back.
        
             | forrestthewoods wrote:
             | If you know C/C++ you could have learned to read Rust in
             | the time you spent commenting in this thread. The number of
             | new syntaxes you need to learn is quite minimal.
        
         | unrealhoang wrote:
         | What a weird thing to complain about. The Rust code in TFA are
         | purely arithmetic and imperative function calls. Any C
         | developer that is good enough to care about the algorithm
         | mentioned will have no issue reading such Rust code, heck, they
         | could even throw the code to chatgpt to translate for them no
         | problem.
         | 
         | Such a good and detailed article that packed with techniques
         | that are applicable everywhere yet the complaints are: "but
         | rust".
         | 
         | Regarding Python, how could such optimizations be implemented
         | in Python? Generating LLVM bytecodes directly aside.
        
           | quotemstr wrote:
           | > Regarding Python, how could such optimizations be
           | implemented in Python? Generating LLVM bytecodes directly
           | aside.
           | 
           | You don't. You instead use your favorite Python extension
           | system (regular Python modules, Cython, Numba, whatever) to
           | implement this tree algorithm and expose it to Python already
           | shaped into a container.
        
           | ryao wrote:
           | I have not studied Rust and find its syntax alien. It looks
           | like a cross between a functional language and C++. Trying to
           | read it is like forcing myself to read Italian. I don't know
           | Italian, but have studied Spanish and Latin. I can sort of
           | figure things out if I put in effort, but there is always
           | uncertainty when I do. That is the problem with reading Rust.
           | 
           | This could be addressed if I just learned Rust (much like my
           | difficulty with Italian could be addressed by studying
           | Italian). However, I already know at least 10 programming
           | languages. I am unwilling to learn the latest fashionable
           | language unless it can promise me that the language will not
           | change and there will never be another fashionable language
           | for me to learn. Given that Rust rejected the standardization
           | process that C and C++ embraced, and Zig is expected to
           | replace Rust one day, I doubt Rust can make such assurances.
           | 
           | As for Python, it is often used for teaching CS and
           | everything except the assembly code is doable in it. Using
           | low level assembly from Python would require using the FFI.
           | Also, Python has nothing to do with LLVM.
        
             | quotemstr wrote:
             | > Zig is expected to replace Rust one day
             | 
             | That's a bold prediction considering that Zig is unsafe and
             | Rust is safe.
             | 
             | If Zig's idioms make it safe, then modern C++ is safe too.
        
               | ryao wrote:
               | Zig replacing Rust is not my prediction. It is what I am
               | hearing from the Zig enthusiast I know. I have also seen
               | others express the sentiment that the next fashionable
               | language after Rust will be Zig. Regardless of whether it
               | is Zig, there will be another. Rust is just one part of a
               | never ending cycle of people trying to reinvent
               | programming.
        
               | nicoburns wrote:
               | > Rust is just one part of a never ending cycle of people
               | trying to reinvent programming.
               | 
               | This is true of every single programming language.
        
               | ryao wrote:
               | That is my complaint against the next fashionable
               | language. Once you have some level of proficiency in
               | around 10 languages, adding more to the list is a chore.
               | It lacks the enjoyment that came from the first few. At
               | least, that is my experience. That is why I am resistant
               | to learning new languages unless software I want to
               | modify is already written in it. Then, I am forced to
               | learn and time spent on it is time that actually needed
               | to be spent, unlike my brief time spent teaching myself
               | FORTRAN in college, which was largely a waste of time.
        
               | Sharlin wrote:
               | > Zig enthusiasts
               | 
               | Yeah, don't you think they might be perhaps just slightly
               | biased? I have some news for you about your average
               | technology enthusiast...
               | 
               | Anyone who claims Zig is going to replace Rust knows
               | nothing about either Zig, Rust, or both. Zig is designed
               | to solve none of the problems that Rust solves. Trillion-
               | dollar corporations are rewriting things in Rust because
               | it solves problems that really matter to them.
               | 
               | The _US government_ strongly recommends writing all new
               | mission-critical code in memory-safe languages. Guess
               | which one of Rust and Zig is memory-safe?
               | 
               | I have nothing against Zig, it looks like a very nice
               | better C, and the comptime stuff is both cool and useful.
               | But its priorities are not Rust's priorities.
        
               | nottorp wrote:
               | Of course, Rust evangelists are also more than slightly
               | biased.
        
               | kllrnohj wrote:
               | There's approximately a 0% chance of Zig seeing non-
               | trivial adoption anywhere other than maybe the embedded
               | world.
               | 
               | It otherwise solves basically nobodies problems in the
               | real world and is just a love letter to C. It's cute,
               | kinda, but that's about it.
        
               | quotemstr wrote:
               | Zig is like the Vulcan Centaur next to Starship.
        
               | rurban wrote:
               | Both are unsafe, just zig is usually safer than rust.
               | 
               | And also so more readable, ie. maintainable
        
               | Sharlin wrote:
               | That's one crazy claim to make.
        
             | unrealhoang wrote:
             | Suck to be you, I guess? If you don't care enough then you
             | don't care. You wouldn't bitch when there's an article
             | written in Italian with good knowledge inside: "but
             | Italian". You either trying to read using translation tool
             | or ignore it entirely.
             | 
             | I didn't say python has anything to do with LLVM, it's just
             | a technique, read about it, or not.
        
           | shawn_w wrote:
           | Fluent in C and C++ among others, know nothing about Rust.
           | It's mostly easy to follow, but I'm having trouble with
           | notation like `&[u32]` and `&[u32; P]`. Arrays that hold
           | unsigned 32 bit integers? But there's `Vec<u32>` which also
           | seems like an array type? Is it like the difference between
           | C++ primitive arrays and std::vector?
        
             | dwattttt wrote:
             | For `&[u32]` and `&[u32; P]`, those are both references to
             | an array of unsigned 32 but integers, however the former is
             | to an array of any size (and carries its size at runtime),
             | whereas the latter is to an array with an explicitly known
             | size at compile time. It's unusual in C, I think you can
             | express it as `int[24] *` (or possibly some other
             | permutation, I'm away from a compiler).
             | 
             | Vec<u32> is pretty much like std::vector compared to them,
             | yeah.
        
             | loeg wrote:
             | `&[u32]` is essentially `std::span<const uint32_t>`.
             | 
             | `&[u32; P]` is similarly something like `std::span<const
             | uint32_t, P>`.
             | 
             | `[u32; P]` (no `&`) is essentially `std::array<uint32_t,
             | P>`.
             | 
             | `Vec<u32>` is essentially `std::vector<uint32_t>`, yes.
        
               | shawn_w wrote:
               | Thanks
        
         | FridgeSeal wrote:
         | I have a far easier time reading Rust than C. Probably because
         | it's been years since I've had to do C, and I find some of the
         | syntax and patterns quite C-specific and confusing. Python also
         | does not express some of the important implementation details
         | well enough in its standard syntax IMO: there's no obvious
         | delineation between references and pass by value in Python,
         | borrowing, etc.
         | 
         | So realistically, this is just the other half of the coin for
         | all the articles where the code examples were written in C and
         | everyone who didn't really read C just had to put up with it.
        
           | ryao wrote:
           | The TIOBE index suggests that you are in a much smaller group
           | than those who have difficulty reading Rust:
           | 
           | https://www.tiobe.com/tiobe-index/
        
             | coliveira wrote:
             | Yes, he is in a bubble.
        
         | bawolff wrote:
         | I've never programmed in rust and i can understand find.
         | 
         | At the end of the day rust is just another imperative
         | programming language. You shouldn't need to know the language
         | at all to understand the very simple examples written in rust.
        
         | peutetre wrote:
         | Well, you can do the C implementation then.
         | 
         | Don't see it as a problem, see it as an opportunity.
        
         | tmtvl wrote:
         | Oh no, it would've been better if it was written using the most
         | readable language: Scheme.
        
       | wolfgangK wrote:
       | Amazingly thorough ! I love how the author leaves no stone
       | unturned. I had no idea you could do the kind of low level
       | efficiency shaving in Rust. I wonder how a C++ implementation
       | with https://github.com/jfalcou/eve would compare.
        
         | curiouscoding wrote:
         | Thanks! It's somewhat tiring to not have loose ends, but I
         | agree it pays off :)
         | 
         | Doing this stuff in Rust is absolutely possible, and I'd do it
         | again since my C++ days are now past me, but the endless
         | transmuting between portable-simd types, plain rust arrays, and
         | intrinsic types _is_ quite annoying.
         | 
         | Also rust is not made for convenient raw pointer arithmetic.
         | There plain C would be much more to the point.
        
           | ant6n wrote:
           | You kind of lost me towards the end, so I'm not sure whether
           | you attempted it, but I was wondering whether it would be
           | possible for the internal nodes to store only the upper 8/16
           | bits (that are nonzero for the current subtree). This implies
           | that one 64 byte cache line stores 32 or 64 entries instead
           | of 16 (or better: 31 or 62/63, since u may need some
           | bookkeeping dat).
           | 
           | The next level would be to keep track of how many prefix bits
           | are implied by the parents, so leaf nodes could perhaps also
           | only use 8/12/16 bits, if the the higher bits are implied by
           | the parent. or instead of bit masks, use offsets (i.e. leaves
           | store k bits offset from parent value).
           | 
           | That may screw around with the branch predictor, and may work
           | very good with evenly distributed data vs uneven data.
        
             | curiouscoding wrote:
             | Ohh yes some good ideas there! I've thought about pretty
             | much exactly this at some point but then didn't end up
             | including it. But I do think it's quite promising, and most
             | of the bookkeeping can be made to be branchless.
             | 
             | On the other hand, most of the latency is in the last few
             | layers, and probably there isn't as much to be saved there.
             | 
             | The biggest problem might be that the bottom layer will
             | anyway have to store full-width numbers, since we must be
             | sure to have the low-order bits somewhere in case they
             | weren't covered yet in earlier layers. Or we could have
             | variable width encoding per node maybe (instead of per
             | layer) but that does sound a bit iffy on the branch
             | predictor.
             | 
             | In the end I guess I kinda like the simplicity and hence
             | reliability of the 'just do the full tree and the first
             | layers are cheap anyway' approach. Probably another factor
             | 2 speedup is possible on specific datasets, but anything
             | beyond this may not be reliably good on worst case inputs
             | (like is the issue for the prefix partitioning methods).
        
               | ant6n wrote:
               | The lowest level could be encoded with whatever number of
               | bits is required for the numeric distance between the
               | leafs. With proper book keeping, the full 32 bit values
               | don't need to be stored. The value for each node should
               | be something like (parent << shift + node), where node
               | has ideally 8 Bits, or maybe 16 bits.
               | 
               | It kind of comes down to how to deal with bad
               | distribution of values, like large differences between
               | adjacent values. One could for example deal with this by
               | deliberately leaving ,,gaps" in the tree, like using a
               | 59-ary node on places (make the last values in the array
               | large, so they will never get visited). With dynamic
               | programming, perhaps an optimal distribution of the leaf
               | level could be had - although it requires more
               | bookkeeping bits of one needs to have the index of the
               | found value, not just whether the value is in the tree.
               | 
               | The question could also be whether it could be
               | interesting to store delta valeues, so that 63 8-bit
               | valeues gives a larger numeric range - this means
               | computing some sort of cumulative sum on on the 63
               | values, not sure whether there is a simd instruction for
               | that.
               | 
               | One more thought: if there is different ways different
               | layers of the tree are stored, but they are always the
               | same for each layer, the code be unrolled, with every
               | layer having different code to deal with it.
               | 
               | Last thought: if the index is not important to know, just
               | whether the value is in the set or not, one could use a
               | bitmap as a data structure. It would require 256MB for
               | the 32bit space, but large spans of zeros imply empty
               | pages.
        
         | npalli wrote:
         | Yeah, Compare to highway as well
         | 
         | https://github.com/google/highway
        
         | secondcoming wrote:
         | Have you used Eve successfully for anything? It's very
         | confusing, it took me far too long to uppercase a string with
         | it.
        
         | rurban wrote:
         | There are stones still unturned. For typical unicode table
         | lookup's you'd need batches for all characters of a word. It
         | would be much faster to lookup eg 16 chars at once, to optimize
         | cache misses.
         | 
         | But when I tested this even Eytzinger was too slow.
        
       | estebarb wrote:
       | Recently I tried to optimize a set implementation and found that
       | interpolation search works quite well, being competitive with
       | HashSet in C# (both single digit ns for my use case), with zero
       | memory overhead. Unfortunately, it only works well with uniform
       | data, so I guess I'll give static search trees a try. This
       | explanation was clear and extremelly well detailed.
        
         | curiouscoding wrote:
         | At some point I was playing around with interpolation search on
         | the human genome dataset, and it was really really terrible. It
         | had some very bad worst case behaviour when the input data has
         | big plateaus and you're searching for something around the
         | edges of the plateau. This can be fixed by only using
         | interpolation for the first few iterations and doing binary
         | search for as long as needed afterwards, but that kills a lot
         | of the predictability of the current approach. So while I
         | really like it in theory, in practice I'm not quite convinced
         | it can be made to work efficiently.
        
       | koverstreet wrote:
       | Nifty thing about eytzinger trees is that there's a formula for
       | converting from a node index to its position in an inorder
       | traversal.
       | 
       | This is useful if your primary keystore is a normal sorted set of
       | keys - easier to work with - you then don't need to store
       | explicit pointers.
       | 
       | Also, he didn't mention that with eytzinger you can prefetch
       | multiple loop iterations in advance - use 1-based indexing so
       | that sibling nodes line up nicely on cachelines.
        
         | curiouscoding wrote:
         | You're right. While I wasn't very explicit about this (because
         | algorithmica and the linked paper spend plenty of words on it
         | already), the code snippet for Eytzinger does indeed do both
         | the prefetching and the formula.
         | 
         | In fact, I'm pretty sure a similar formula could be made to
         | work for higher branching factors, although it would surely be
         | slower. (Probably it depends on the number of times 17 divides
         | the final index it so, which is not great, but with B=15 it
         | would be the number of factors of 16 which is easy again.) The
         | more annoying issue with not storing everything in the final
         | layer is that we have to keep a running-answer for each query
         | as we go down the tree, which I suspect will add measurable
         | runtime overhead. But maybe that's a tradeoff one is willing to
         | make to avoid the 6.25% overhead.
        
       | sujayakar wrote:
       | this is unbelievably cool. ~27ns overhead for searching for a u32
       | in a 4GB set in memory is unreal.
       | 
       | it's interesting that the wins for batching start diminishing at
       | 8. I'm curious then how the subsequent optimizations fare with
       | batch size 8 (rather than 128).
       | 
       | smaller batch sizes are nice since it determines how much request
       | throughput we'd need to saturate this system. at batch size 8, we
       | need 1s / ~30ns * 8 = 266M searches per second to fully utilize
       | this algorithm.
       | 
       | the multithreading results are also interesting -- going from 1
       | to 6 threads only improves overhead by 4x. curious how this fares
       | on a much higher core count machine.
        
         | curiouscoding wrote:
         | Just fyi: the throughput numbers with batching are per _query_,
         | not per _batch_, so I think the *8 is too optimistic ")
         | 
         | I suspect that at higher core counts, we can still saturate the
         | full RAM bandwidth with only 4-5 cores, so that the marginal
         | gains with additional cores will be very small. That's good
         | though, because that gives CPU time to work on the bigger
         | problem to determine the right queries, and to deal with the
         | outputs (as long as that is not too memory bound in itself,
         | although it probably is).
        
       | orlp wrote:
       | One dimension that is not explored is partitioning the _queries_
       | in batches. The primary cost is doing lookups on the out-of-cache
       | table, so if you have a sufficiently large amount of queries you
       | can resolve a couple layers of the tree in one step while those
       | layers are in cache, grouping them based on where they land
       | deeper in the tree, and then resolving all those queries that
       | touch the same deeper part of the tree in one batch as well.
       | 
       | In theory, with an infinitely large amount of input queries you
       | will never have to do an out-of-cache lookup, it can be
       | completely amortized away into linear scans.
       | 
       | That said, you now end up with a bunch of results that need to be
       | put back into the correct output order, which likely makes it not
       | worth it. But if the operation can be fused into a reduction
       | (e.g. find the _sum_ of the binary search results) or the output
       | order does not matter in some other way then all of a sudden it
       | might make sense.
        
         | mikepurvis wrote:
         | Interesting stuff, definitely the kind of real world
         | optimization that happens when you're able to look at actual
         | access characteristics rather highly abstracted models.
         | 
         | At one level you could just be saving the results into a
         | hashtable to bubble them out again, but at the extreme end the
         | lookup is actually spread across multiple machines in a
         | cluster, so it's more like throwing the query to the proper one
         | with the right chunk of the index, along with instructions
         | about where the final RPC is supposed to go with the result.
        
         | koverstreet wrote:
         | That presumes there's both locality in your queries, and it's
         | not an online system - result latency doesn't matter much.
         | 
         | That's not terribly common.
        
         | tmyklebu wrote:
         | There are theory papers on "buffer trees"---B-trees where each
         | node is augmented with an O(B)-length array of pending updates
         | and queries. I believe there were also some attempts at
         | building SQL databases based on them. It sounds like you're
         | reaching for the same trick.
        
           | koverstreet wrote:
           | that's a hybrid compacting data structure: compacting within
           | the btree node, normal btree topology otherwise.
           | 
           | And it works much better than pure compacting (i.e. the
           | leveldb lineage), because you avoid lock contention at the
           | root on multithreaded update workloads, and the
           | compaction/resort is much lower overhead when it fits in l2.
           | 
           | incidentally, there's a filesystem that uses this technique.
        
             | loeg wrote:
             | > incidentally, there's a filesystem that uses this
             | technique.
             | 
             | BetrFS?
        
         | bobmcnamara wrote:
         | This works really well for subset testing, especially if sets
         | sorted.
         | 
         | Walk the larger tree, using the smaller tree.
        
         | natmaka wrote:
         | It enhances the throughput (on average everyone waits less) at
         | the price of a higher max latency (some, who posted a request
         | mobilizing a very-recently-out-of-cache-index, will wait way,
         | way more...), isn't it? In the real world those worst cases
         | quite often kill such optimization.
         | 
         | The (data size / cache size) ratio and queries local (in time)
         | dispersion are key.
        
         | mlochbaum wrote:
         | I have explored it, see https://mlochbaum.github.io/BQN/impleme
         | ntation/primitive/sor....
         | 
         | I implemented this method in Dyalog 18.0 with BlockQuicksort-
         | like partitioning, using vectorized comparison with bit-boolean
         | output. It's faster than you'd expect, maybe twice as slow as
         | regular binary search when searched values are in cache, and
         | better once they fall out of L2. But Dyalog reverted to 17.1
         | for later versions so you won't see it in a new download. It'll
         | probably make it into CBQN eventually, perhaps with radix
         | partitioning. Note that both quicksort and radix partitioning
         | can be done and undone in a cache-friendly way.
         | 
         | Unlike quicksort, there's no issue of pivot selection since you
         | always choose the midpoint of the searched values. However,
         | there's a complementary issue of memory if the partitions
         | become unbalanced, because the larger partition can require
         | saved memory of roughly the depth times the number of elements.
         | With a bit per comparison it's bounded by the size of the
         | input.
        
           | mlochbaum wrote:
           | Wasn't the best section link, last paragraph here has more
           | detail: https://mlochbaum.github.io/BQN/implementation/primit
           | ive/sor...
        
         | curiouscoding wrote:
         | Ah yes, I did consider sorting the queries at some point but I
         | guess I then forgot about it again. If the queries are random
         | and much less than the input size, probably most queries will
         | hit different cache lines in the final layer, and I suspect
         | benefits will be limited at maybe 2x best case since the last
         | layer is where the bottleneck is.
         | 
         | Also (radix) sorting is very memory bound usually, and we
         | probably need to sort in at 16^2=256 buckets or so to get
         | sufficient reusing of the higher layers, but I don't have the
         | numbers of what a round of radix sort takes. (My guess is order
         | 1ns per query? Maybe I'll find time to investigate and add it
         | to the post.)
        
         | hinkley wrote:
         | There was some work a while back on streaming data past
         | queries, but you need a fairly bounded data set for that to
         | work. Having ten years of historical data in a data set would
         | gum that up severely.
        
       | NooneAtAll3 wrote:
       | I wish graph did not show 2 values with the same blue color
        
         | jasonjmcghee wrote:
         | It wasn't only the one graph - there's another graph that has a
         | whole spectrum of blue colors - nearly impossible to tell which
         | is which without going in with an eye drop tool.
         | 
         | I found this to really detract from an otherwise interesting
         | post.
        
           | curiouscoding wrote:
           | You're absolutely right. At some point I spent so long
           | working on the plotting code that I really couldn't be
           | bothered anymore to make it prettier. Hopefully I eventually
           | find some motivation to fix this.
        
       | owenthejumper wrote:
       | Elastic Binary Trees
       | https://wtarreau.blogspot.com/2011/12/elastic-binary-trees-e...
        
       | purple-leafy wrote:
       | cant understand the comp sci stuff yet, but love the way this
       | site looks
        
         | curiouscoding wrote:
         | Thanks! I did spend some time fiddling with CSS to make the
         | yellow highlights so that the titles stand out a bit more, and
         | to improve the code blocks, but otherwise it's a relatively
         | common theme for Hugo. (I think it's linked in the footer.)
        
       | shawn_w wrote:
       | All this nattering about rust and C++, and then there's me
       | thinking about how to do it in Common Lisp, possibly keeping the
       | simd bits.
        
         | wk_end wrote:
         | FWIW, I'd love to read a blog post about that, too.
        
       | colesantiago wrote:
       | This is a great new interview question for junior and senior
       | software engineering candidates.
       | 
       | Definitely going to use this for our next round of interviews.
        
         | butterlettuce wrote:
         | Please don't. Just test us on how to center a div.
        
           | animal531 wrote:
           | "Go ahead Junior, please implement this world class work and
           | explain it to me as you go along. Oh, starting salary? No no,
           | this is an unpaid 6 month internship."
           | 
           | Fun aside, asking someone to work through a problem with you
           | is fine, but when it diverges so much from the actual work
           | then it just becomes ridiculous.
        
       | openquery wrote:
       | This is really interesting and a thorough write up. Thanks to the
       | author for sharing their work.
       | 
       | Whenever I read about super low-level optimisation, my immediate
       | feeling is that of gratitude to the author for spending so much
       | time shaving off nanoseconds which the entire SE community gets
       | to enjoy.
       | 
       | I wonder how much time humanity has collectively saved simply as
       | a result of how all these optimisations stack up.
        
         | curiouscoding wrote:
         | Assuming this is going to save someone 10ns per query in the
         | end and I spent 150h on coding and writing this up, we only(?)
         | need around 10^14 queries to make it worth it!
         | 
         | But yes, as new technologies stack up, we achieve exponential
         | growth in throughput in the end, which usually enables new
         | science :)
        
       | DerSaidin wrote:
       | I think there is an error in 1.7 figure 3: Layer 1 should have a
       | 10 instead of a 12.
       | 
       | Edit: Also, 1.3 figure 1: Should the y-axis label be "Inverse
       | throughput" to match later figures?
        
         | curiouscoding wrote:
         | You're right about the mistake in the figure. Will fix soon,
         | thanks :)
         | 
         | And yeah, maybe I should just label all of then with throughout
         | for consistency.
        
       | vlovich123 wrote:
       | Is binary search on su ch large integer data sets common? I guess
       | maybe time series data. But I would think that this isn't
       | amenable to representing strings?
        
         | throw-qqqqq wrote:
         | Hash the string to convert to integer -> double hash or
         | truncate to lowest 32 bits and this technique works for strings
         | as well.
         | 
         | You can replace string with any data structure you can hash or
         | index really.
        
           | vlovich123 wrote:
           | Ah but this then absolutely fails for range queries but makes
           | sense if you want to build an index for point lookups.
        
             | throw-qqqqq wrote:
             | Definitely! If the integers are (truncated) hashes, ranges
             | are completely meaningless.
             | 
             | But what would a range over strings even represent?
             | 
             | Perhaps if you e.g. use string indices into an array of
             | sorted strings (by length, by prefix or whatever), you
             | could use it for something. Depends on the application I
             | guess :)
        
       | swiftcoder wrote:
       | Absolutely love the level of detail here! Not often you see the
       | process of optimising something spelled out in so much depth
        
       | astronautas wrote:
       | Cool! Thanks for the investigation.
        
       | topbanana wrote:
       | Interesting. If you need to support occasional writes, you could
       | have a large static search tree and a small writable tree that
       | you check afterwards. The when you have enough changes, you could
       | occasionally ship those changes into a new version of the static
       | tree
        
       | skrebbel wrote:
       | Totally off topic, but I've started to notice that more and more
       | algorithmic low-ish level content assumes Rust by default. My
       | entire life I've been used to anything algorithmic being either
       | written in sciency pseudocode (invariably in latex-generated
       | PDFs), or plain old C(++), the vulgar latin of computer
       | programming. I know C and I don't really know Rust but
       | nevertheless I love that this is changing! I'd never advise a new
       | programmer interested in low-level/fast stuff to begin with C
       | now, so much good content is written around Rust. Like the old C
       | examples, the post doesn't even say "this is Rust". You're
       | expected to expect it to be Rust.
       | 
       | And, well the language choice doesn't really matter for the core
       | subject here. I could follow this post fine even though I don't
       | really know Rust (just like I assume a Rust programmer could
       | follow a well written algorithms deep dive with examples in C
       | with a bit of dedication). EDIT okok that was a lie, I could
       | follow the first half of this post, ish, and then it got way too
       | deep for me.
       | 
       | But anyway, standardize on Rust, I like it. I'd love if a bit
       | more Zig was thrown into the mix here but either of them feel
       | like a better default than C at this point to me. After decades
       | of C being _The Standard_ for this stuff, I love that this is
       | finally changing.
        
         | uecker wrote:
         | This assumes that Rust is actually a better language. IMHO it
         | isn't.
        
           | norman784 wrote:
           | I would argue that in some sense it is, just because of the
           | enforced safety and correctness.
        
             | peoplefromibiza wrote:
             | So it's Ada since 1980, before C++ was even created.
        
               | sroussey wrote:
               | I wrote my first optimizing compiler for Ada in C++
               | around 1990.
        
             | uecker wrote:
             | It does neither enforce correctness nor safety. It enforces
             | memory safety in programs that stick to the safe subset.
             | But don't get me wrong, Rust certainly has many good ideas
             | and the focus on memory safety is good. I still do not
             | think it is a great language.
        
           | p2detar wrote:
           | I'm still unsure of how to feel about Rust. But if it has
           | made it into the Linux Kernel, then I assume adoption is
           | ongoing.
           | 
           | Still, Zig seems somewhat more intuitive to me, even though
           | I've used it as little as Rust thus far.
           | 
           | A nice write up on the topic Zig and Rust:
           | https://matklad.github.io/2023/03/26/zig-and-rust.html
           | 
           | edit: typos
        
             | deaddodo wrote:
             | Generally Zig is going to be more intuitive for C
             | developers (people who prefer more low-level and procedural
             | style coding), while Rust would be more intuitive/conducive
             | to C++ developers (those who prefer more abstractions and
             | language level features/guarantees).
             | 
             | Which is totally fine, there's a reason C++ never
             | completely subsumed C; both are valid avenues.
        
           | jltsiren wrote:
           | I think it's almost irrelevant how good Rust is as a
           | language. Rust is winning in many contexts, because you can
           | use it to replace C++, but the standard library and tools are
           | substantially better.
        
             | nostradumbasp wrote:
             | The entire user experience is so much nicer that even if
             | Rust was slightly worse I would prefer it. I don't miss
             | header files, makefiles, not having a package manager,
             | maintaining all the crazy code people wrote themselves as
             | pet projects instead of using an existing project in an
             | ecosystem. Thankfully its not worse, so I don't even have
             | to weigh the odds.
        
               | tomcam wrote:
               | Wait isn't cargo a package manager? Not a rust expect so
               | I'm probably missing something
        
               | Tanjreeve wrote:
               | I think their point was that Rust has a batteries
               | included package manager (cargo). Another advantage of
               | which being that all rust projects are pretty standard in
               | that particular aspect.
        
           | 110jawefopiwa wrote:
           | I think the main problem is that Rust doesn't allow for basic
           | data structures like graphs or doubly-linked lists without
           | extra effort, which is why I've generally always preferred
           | pseudocode.
        
             | fpoling wrote:
             | Well, one can always emulate that in Rust with array
             | indexes. And it will be closer how the actual CPU works
             | where a pointer is just an offset into a memory chunk.
             | 
             | Perhaps this is the reason performance oriented articles
             | prefer Rust. In allows to skip a lot of memory allocation
             | while still catching memory-safety bugs, which is a hard
             | problem with C++.
        
               | szundi wrote:
               | How could something to implement a pointer be more
               | straightforward than a pointer? People seem to like Rust
               | in excessive, sometimes botheringly excessive
               | proportions.
        
             | emilsayahi wrote:
             | I often hear this and am confused; not only are things like
             | 'object soup'[0] possible and straightforward (putting
             | things in collections and referring to them by indices), I
             | never concretely hear why a graph or doubly-linked list
             | becomes uniquely difficult to implement in Rust (and would
             | genuinely be curious to learn why you feel this way). If
             | you needed such data structures anyway, they're either in
             | the standard library or in the many libraries ('crates' in
             | Rust-lingo) available on Rust's package registry[1]---using
             | dependencies in Rust is very straightforward & easy.
             | 
             | [0]: https://jacko.io/object_soup.html
             | 
             | [1]: https://crates.io/
        
             | thayne wrote:
             | It doesn't though.
             | 
             | Those are fairly straightforward to implement if you are ok
             | using a reference counted type (along with weak
             | references), although that will hurt performance. That
             | would be similar to using a shared_pointer in c++.
             | 
             | And you can implement code similar to what you would in c
             | or c++ if you use raw pointers and unsafe code.
             | 
             | Where you run into difficulties is if you want something
             | that is fast _and_ safe. And then you need a more
             | complicated design that probably involves using indices
             | into a Vec instead of pointers.
        
         | Narew wrote:
         | I don't know if it's good or bad, with pseudo code you put no
         | constraints on how it should be implemented. It's known that
         | some kind of algorithm are really hard to implement in Rust
         | (every one use the link list data structure as an exemple). So
         | having the article that use rust is good to see it can fit to
         | rust constraints but at the same time does this constraint
         | limit the algorithm itself ?
        
         | curiouscoding wrote:
         | Anything in particular that threw you off? I'd be happy to add
         | a few words to briefly explain some of the less intuitive rust
         | syntax.
        
           | skrebbel wrote:
           | Nah just the final bits, stuff like `Simd::<u32,
           | 8>::splat(q)`, I'm not sure what splatting is or how Rust's
           | Simd library works or, admittedly, how SIMD works in detail
           | at all. So eg I'm not sure what that 8 is doing there in the
           | type parameters, details like that. Maybe this isn't a Rust
           | thing but a SIMD thing, btw, I don't know much Rust but I
           | also never wrote any SIMD code ever. I don't know how the
           | article could be clearer, IMO once you go into the deep nitty
           | gritty optimization stuff you just gotta assume the reader
           | knows the tools you're using. I'm OK with dropping out
           | halfway cause the core idea is there even before you squeeze
           | out all the perf.
        
             | burntsushi wrote:
             | `Simd::<u32, 8>` is describing a vector with 8 lanes, where
             | the width of each lane is 32-bits. For targets that support
             | it, that should get you a 256-bit register.
             | 
             | The `splat(q)` constructor is just saying, "populate each
             | lane with `q`." That is, the value is "splatted" across
             | every lane in the vector.
             | 
             | The main idea of SIMD is simple: you represent multiple
             | points of data in the same register and use specialized CPU
             | instructions to operate on those points simultaneously. The
             | difficulty of SIMD is coming up with clever algorithms that
             | use the available instructions in an efficient way.
        
               | skrebbel wrote:
               | Ahh right, I've been doing too much TypeScript lately. I
               | thought a type parameter couldn't impact behavior but
               | only typechecking but clearly in Rust (like in C++) it
               | can. Thanks for the explanation!
               | 
               | Ah so splat is like a memset for the entire vector to the
               | same value. OK makes sense, I bet that wasn't actually a
               | Rust-ism at all, just basic SIMD lingo I didn't know.
               | Thanks again!
        
               | bennythomsson wrote:
               | Nice to see other real experts (and a bit of celebrities)
               | in here.
               | 
               | (For the uninitiated: Burntsushi of ripgrep fame.)
        
               | skrebbel wrote:
               | yeah tbh i was a bit starstruck that burntsushi would
               | reply to _my_ comment explaining what i assume is utter
               | SIMD basics. that must be how swifties feel when she (or,
               | likelier, her PR drones) click  "like" on their insta
               | comments.
        
               | freeopinion wrote:
               | Does burntsushi have PR drones (who can explain SIMD)? :)
        
         | npalli wrote:
         | Quite honestly, doesn't seem like Rust is a win here over C++.
         | In fact, C++ has fewer sigils that make it somewhat easier to
         | read sort of striking the balance between being simple enough
         | (C-like) and abstractions (templates). Readability wise, I
         | would have preferred Julia as that would have been the cleanest
         | explanation of the code through the different topics at
         | sufficient level of detail. Alas, it seems to have stalled
         | somewhat (relatively speaking). It also doesn't help that every
         | Rust article has Rust advocates jumping on you with "WAHT ABOUT
         | SAFETY" which is off-putting since not every program needs to
         | deal with all the complexity of the Rust memory/lifetime model
         | to enable safety (simple bounds checking etc. might be
         | sufficient).
        
           | fpoling wrote:
           | The article is about how to squeeze absolute maximum in
           | performance from the hardware. So you need a language that
           | allows a relatively low-level access to hardware. And big
           | plus of Rust is that it has standard SIMD libraries while C++
           | will get them only in C++26
        
         | User23 wrote:
         | Fun fact: a lot of that "sciency pseudocode" is actually Algol.
        
           | mananaysiempre wrote:
           | Because that's the original problem statement for Algol! As
           | in, we've been publishing all that pseudocode for quite a bit
           | and it seems like conventions have emerged, let's formalize
           | them and program in that. ... People were markedly more naive
           | back then, but then that's sometimes what it takes.
        
       | gorset wrote:
       | The number of unique int32 values is not that great, and a full
       | bitset would only be 512MB. Hard to bit a dense bitset in
       | performance.
       | 
       | As a general purpose data structure, I would look at roaring
       | bitmaps for this purpose, which has good trade-offs with great
       | performance for most use-cases.
       | 
       | If only simple lookups are needed over a static set, then it's
       | worth looking at minimal perfect hash functions
       | (https://sux.di.unimi.it).
        
         | curiouscoding wrote:
         | Hmm, bitmaps is an interesting idea! If the data is dense
         | enough, then yeah I guess a quick linear scan would work.
         | 
         | There's also the extreme of simply storing the answer for each
         | possible query as a u32 and just index the array, but there the
         | overhead is much larger.
         | 
         | Rank-select is also interesting, but I doubt it comes anywhere
         | close in performance.
         | 
         | I happen to also be working on a minimal perfect hashing
         | project that has way higher throughput than other methods
         | (mostly because batching), see
         | https://curiouscoding.nl/posts/ptrhash-paper/ and the first
         | post linked from there.
        
           | gorset wrote:
           | Nice work! I love the care and discussion around low level
           | optimizations, as such things are often ignored.
           | 
           | There are a lot of interesting variants of rank/select and
           | other succinct data structures which has interesting
           | tradeoffs. Maybe most useful when compression is critical. At
           | least my own data structures can't compete with the query
           | performance you are showing, but they are great when the
           | memory footprint matters.
        
       | jules wrote:
       | Nice post. Would the larger amount of code result in different
       | performance in a scenario where other code is being run as well,
       | or would the instruction cache be large enough to make this a
       | non-issue?
        
       | nielsole wrote:
       | I've never had the use case for this amount of throughput. What
       | are use-cases for this. From the website I presume this is gene-
       | matching related?
        
         | curiouscoding wrote:
         | Jup. I've added to the future work section now that the plan is
         | to use this to speed up suffix array searching. Suffix arrays
         | are pretty regularly used to build indices over say a human
         | genome, that can then be queried.
         | 
         | And since DNA sequencers are still increasing there throughput
         | and accuracy, it's always good to have faster algorithms as
         | well.
        
       | jokoon wrote:
       | so a tree when the data cannot be updated?
       | 
       | What is the point?
        
         | curiouscoding wrote:
         | I've added a little motivation section on this :)
         | 
         | The final goal is to index DNA, say a human genome, or a bunch
         | of them, and this is static data.
         | 
         | Then as new DNA comes in (eg is read by a DNA sequencer) we can
         | efficiently query against the index built on the reference
         | genome, and this reference is often fixed.
        
       | xpe wrote:
       | I predict that more and more of these kinds of optimized
       | algorithms will be designed / written / assisted by machines.
       | 
       | I wonder when the next such malicious algorithm(s) passes under
       | our noses, and what its hidden purpose might be.
        
       ___________________________________________________________________
       (page generated 2025-01-01 23:00 UTC)