[HN Gopher] Why Aren't We Sieve-Ing?
       ___________________________________________________________________
        
       Why Aren't We Sieve-Ing?
        
       Author : xyzzy_plugh
       Score  : 219 points
       Date   : 2024-01-08 13:39 UTC (9 hours ago)
        
 (HTM) web link (brooker.co.za)
 (TXT) w3m dump (brooker.co.za)
        
       | kevindamm wrote:
       | Better analysis here than in the SIEVE paper, imho.
       | 
       | Ultimately, the right eviction policy depends on the application
       | behavior, and you'll only find the right answer through profiling
       | against actual data from service logs... and the difference
       | between miss rates isn't large so I really don't think it should
        | even be the first thing that gets profiled. Use LRU or LFU
       | (based on intuition about usage) and move on. But I do love to
       | see newly proposed algorithms, more tools for the bag.
        
         | remram wrote:
         | This is one thing you can probably easily test on historical
         | data. In fact I wonder if there is an automated tool/framework
         | that can consume some structured log of accessed keys to select
         | and tune the optimal policy.
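          | 
          | The core of such a tool is tiny, for what it's worth. A
          | minimal sketch (assuming a log with one key per line; the
          | function and file names are made up):
          | 
          |     from collections import OrderedDict
          | 
          |     def lru_miss_rate(keys, capacity):
          |         cache, misses = OrderedDict(), 0
          |         for k in keys:
          |             if k in cache:
          |                 cache.move_to_end(k)   # refresh recency
          |             else:
          |                 misses += 1
          |                 if len(cache) >= capacity:
          |                     cache.popitem(last=False)  # drop LRU
          |                 cache[k] = True
          |         return misses / len(keys)
          | 
          |     # keys = [l.strip() for l in open("access.log")]
          |     # print(lru_miss_rate(keys, 10_000))
          | 
          | Replay the same log through each candidate policy and keep
          | whichever gives the best miss rate for your trace.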
        
           | kevindamm wrote:
            | Easy to measure, sure, but that's different from the
            | impact on performance. You're much more likely to get a
            | bigger win from analyzing the database schema, divide-and-
            | conquering the workload, tuning the algorithms in the
            | critical path, or even choosing _what_ gets cached. If
            | tuning the cache eviction is the main thing left, it's
            | probably because that's the only thing left.
        
           | danbruc wrote:
            | This should be doable. On each access, append the key to
            | a list of recent keys, which would also not be terribly
            | expensive if the keys are much smaller than the values.
            | Every so often you can then look through the list of
            | recent keys, analyze the access patterns, and maybe
            | change the eviction algorithm.
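            | 
            | For example, a bounded deque keeps the recording itself
            | cheap (a sketch; the window size and helper names are
            | made up):
            | 
            |     from collections import Counter, deque
            | 
            |     recent_keys = deque(maxlen=1_000_000)
            | 
            |     def on_access(key):
            |         # O(1) per access, stores only the key
            |         recent_keys.append(key)
            | 
            |     def hottest(n=10):
            |         # offline analysis over the recorded window
            |         return Counter(recent_keys).most_common(n)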
           | 
           | But I think this might also not be too helpful overall. I
           | think in many cases there is an obvious choice depending on
            | how and for what the cache is used. And then there are
            | caches with more unpredictable access patterns; a file
            | system cache, which experiences the access patterns of
            | all the different applications, might be a good
            | candidate.
           | 
           | In that case there is probably no best choice as several
           | applications with different access patterns concurrently hit
           | the cache. Or if there is a best choice, it might change very
           | quickly as different applications or components of
           | applications execute. In such a scenario it seems more
            | promising to me to try to categorize the access patterns
            | on a per-source basis and treat them accordingly instead
            | of trying to find one best policy for all sources
            | combined.
           | 
           | On the other hand something like this might quickly become
           | prohibitively expensive.
        
           | NovaX wrote:
           | The problem with that approach is not only the time and
           | effort to capture and analyze the trace, but that the
           | workload changes over the application's lifetime. When the
           | workload is expensive and important then cache analysis is
           | worth doing, and there are cache simulators that are pretty
           | easy to work with.
           | 
           | For the general case the ideal is an eviction policy that
           | adapts to fit the given workload. Unfortunately adaptivity is
           | not a very well explored area so most policies have heuristic
           | settings that default to work well in a target range of
           | workloads, but this has to be tuned for others. In this and
           | the author's other paper, S3-FIFO, their lower hit rates with
           | TinyLFU turned out to be due to the implementation. When
           | running it with Caffeine, a productionized version that adds
           | hill climbing and jitter, it had equivalent hit rates [1].
           | That cache is pretty robust across a wide range of workloads,
           | but also much more complicated to implement. I'd really like
           | to see more research on adaptivity because I think all
           | policies are converging on similar structures and goals, but
            | self-tuning is still an error-prone art.
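            | 
            | The hill-climbing part is conceptually tiny, by the way.
            | A sketch of the flavor of adaptation I mean (the knob
            | and the names are made up, not Caffeine's actual code):
            | 
            |     def step_knob(knob, hits, prev_hits, step, sign):
            |         # if the last nudge hurt the hit rate, reverse
            |         # direction; otherwise keep going the same way
            |         if hits < prev_hits:
            |             sign = -sign
            |         return max(1, knob + sign * step), sign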
           | 
           | [1] https://github.com/Yiling-J/theine/issues/21#issuecomment
           | -18...
        
         | TheCondor wrote:
         | In college we included the optimal case in understanding
         | caching. It's a very worthwhile process. Sieve's claim is
         | simplicity and lower eviction latency. That should be measured
         | against hits. A tiny fraction more hits will almost always
         | erase the differences between various eviction algorithms.
         | Depending on cache size and the size of what you're caching,
         | LRU can add a nontrivial amount of memory.
         | 
          | It is promising to see more experimentation with these
          | algorithms.
        
           | titzer wrote:
            | I mostly agree with this, except that for hardware (CPU)
            | caches, hit latency is absolutely critical, and all logic
            | must be implemented with circuits. On top of the latency
            | requirement, organizing the data structures and handling
            | coherency are really tough design problems.
        
         | contravariant wrote:
         | If you don't want to think about it, use LRU.
         | 
          | The optimal algorithm, using half the memory, can _at
          | best_ be twice as good as LRU. And yes, that includes an
          | algorithm that knows _exactly_ which memory is going to be
          | requested. If LRU is not enough you need to be clever or
          | use more memory.
         | 
          | Something like LFU has no such guarantee. Prime the cache
          | by reading a big block of data 10 times, then keep reading
          | random new bits of data 10 times after each other: the
          | newcomers never out-count the primed block, so you can get
          | close to 100% cache misses. LRU would only have 10%, and I
          | think that's optimal.
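          | 
          | A toy version of that failure mode (one naive LFU variant;
          | the new working set is interleaved so the newcomers never
          | out-count the primed keys; illustrative only):
          | 
          |     from collections import OrderedDict, defaultdict
          | 
          |     def miss_rate(policy, trace, capacity=10):
          |         cache = OrderedDict()
          |         freq, misses = defaultdict(int), 0
          |         for key in trace:
          |             freq[key] += 1
          |             if key in cache:
          |                 cache.move_to_end(key)   # track recency
          |             else:
          |                 misses += 1
          |                 if len(cache) >= capacity:
          |                     if policy == "lru":
          |                         cache.popitem(last=False)
          |                     else:   # naive LFU: evict min count
          |                         victim = min(cache,
          |                                      key=freq.__getitem__)
          |                         del cache[victim]
          |                 cache[key] = True
          |         return misses / len(trace)
          | 
          |     # phase 1: ten hot keys, read ten times each
          |     trace = [k for _ in range(10) for k in range(10)]
          |     # phase 2: the working set shifts to fresh key pairs
          |     for p in range(125):
          |         trace += [100 + 2 * p, 101 + 2 * p] * 4
          | 
          |     print(miss_rate("lru", trace))   # roughly 0.24
          |     print(miss_rate("lfu", trace))   # roughly 0.92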
        
       | rpigab wrote:
       | So I guess SIEVE stands for SImple EViction? Couldn't find the
       | definition. Because it's all caps, I thought it was an apronym.
        
         | waynecochran wrote:
         | I thought it was inspired by the Sieve of Eratosthenes which
         | scans for composite numbers.
        
           | HankB99 wrote:
            | I thought the article might be discussing the impact of
            | CPU caches on the Sieve of Eratosthenes when used as a
            | processor benchmark. I'm pretty sure I thought wrong.
            | 
            | Next I thought it might be an algorithm for managing the
            | CPU caches. Wrong again (I think).
           | 
           | I think it's a discussion of managing application caches or
           | perhaps things like disk caches. Not sure if I'm right.
        
             | bee_rider wrote:
             | It is a cache eviction algorithm. Discussed recently here:
             | 
             | https://news.ycombinator.com/item?id=38850202
        
         | tromp wrote:
         | The acronym is sort of suggested by this sentence in the
         | Introduction of their paper [1]:
         | 
         | > The new algorithm is called SIEVE: a simple and efficient
         | turn-key cache eviction policy
         | 
         | [1] https://junchengyang.com/publication/nsdi24-SIEVE.pdf
        
         | wgjordan wrote:
          | I don't think it's an acronym; it just seems to be a
          | convention in the literature to capitalize eviction-
          | algorithm names (cf. "CLOCK").
         | 
         | According to the footnote in the paper where the name is first
         | mentioned, "SIEVE sifts out unpopular objects from cache over
         | time".
        
       | Aardwolf wrote:
        | Not really on topic, but a question about papers: when viewing
        | the linked paper, the top gives the title and the names of the
        | authors and universities, but no date. The way to guesstimate
        | the age is to scroll down to the references and see the most
        | recent date visible there. But of course that method is not
        | foolproof if the references happen to all be older.
       | 
        | Putting author names but no date at the top seems to be some
        | sort of standard template or convention that every single
        | paper author and institution everywhere has used since
        | forever, since I never see dates at the tops of papers.
       | 
        | Why is the date not part of the standard template? Knowing
        | the date is so incredibly important for putting the paper in
        | context and relating it to other science. Especially for
        | science, I just don't understand why they don't have a
        | convention to date their stuff.
       | 
       | ---
       | 
        | Anyway, the reason I was interested in the date is that the
        | article asks "Why aren't we all SIEVE-ing?". Well, it turns
        | out the paper is from 2023 or newer, so that could explain
        | why.
        
         | paulluuk wrote:
         | I never noticed this before, but you're right, it is very
         | weird!
        
         | junon wrote:
         | I was thinking this same thing the other day when reading a
         | paper. Seems like a critical piece of information that is often
         | left out.
        
         | xyst wrote:
         | I just use the publication date.
         | 
         | Just looking at NSDI '23, I can see the first published paper
         | in the Spring:
         | https://www.usenix.org/conference/nsdi23/presentation/liu-ti...
         | 
         | Looking at the BibTex metadata can give you a better idea of
         | how recent/relevant the paper is:
         | 
         | year = {2023}, ... month = apr
         | 
          | But since this one is not yet published (it is to appear
          | in NSDI 2024), I use a method similar to the one you
          | describe. Most of the time you can also infer it from the
          | paper itself (sometimes in the abstract or the
          | discussion).
        
         | xyzzy_plugh wrote:
         | > Well turns out the paper is from 2023 or newer, so that could
         | explain why.
         | 
         | TFA and the paper both wonder why this extremely simple
         | algorithm was not discovered sooner. Both posit that no paper
         | should be prerequisite to applying this modification.
         | 
         | The question in TFA is less about the paper and more about the
         | concept -- is there some reason why this has evaded us? Is it
         | too good to be true?
         | 
          | The article does a better job than the paper of digging
          | into the details, IMO.
        
           | ImPostingOnHN wrote:
           | Asking why humanity didn't invent an invention earlier, given
           | that we had all the prerequisites, is kind of a banal
           | question. Why didn't we invent cars sooner? Why didn't we
           | invent the internet sooner? We had all the prerequisites, or
           | else they wouldn't have been invented. Why didn't we invent
           | them _faster_?
           | 
           | The answer to all these questions is that having the
           | prerequisites for inventing something doesn't mean we
           | instantly invent that thing, a la Sid Meier's Civilization.
            | The expectation that we would is actually what's kind of
            | odd here. Humans don't even all share the same info,
            | much less automatically know all the possibilities of
            | all combinations of all humanity's knowledge.
           | 
           | Another factor might be that Yet Another Eviction Algorithm
           | isn't necessarily one of humanity's top focuses for
           | inventions at the moment. We could ask any of us here on HN,
            | for example, why _we_ didn't think of it, and sooner. The
           | answer is likely that we were all working on something else
           | we found more important than YAEA.
        
             | xyzzy_plugh wrote:
              | That banal question is in the original paper. The
              | paper spends significant time explaining that, because
              | this is such a simple modification, they find it
              | difficult to believe it has been overlooked until now.
             | 
             | The article's question "why aren't we SIEVE-ing?" follows
             | this theme.
             | 
             | No one is asking the equivalent of "why didn't we invent
             | cars sooner?" here.
        
               | ImPostingOnHN wrote:
                | I don't doubt that the banal question in question is
                | in the original paper. My point was how banal the
                | question was, and how easily answered it is, not
                | where it was.
                | 
                | Many inventions in humanity were "simple
                | modifications" of other things. Why didn't we invent
                | them _sooner_? The answer is in the post you replied
                | to: because it's totally normal for "obvious"
                | inventions to not be invented for a long time, even
                | when somebody thinks, after the fact, that it's a
                | "simple modification".
               | Indeed, what percentage of humanity is currently
               | searching for Yet Another Eviction Algorithm, and what
               | percentage of that percentage is looking in this
               | direction, instead of one of the hundreds of other
               | directions?
               | 
                | So yes, the question is indeed equivalent to _"why
                | didn't we invent [X] sooner"_, replacing X with any
                | invention that at least 1 person out of billions in
                | the world finds "obvious" or a "simple modification".
        
               | xyzzy_plugh wrote:
               | Not everything is an invention.
               | 
               | Do you have any routines that you've refined over the
               | years? Is every step you take in a routine something you
               | were taught or learned elsewhere? Or did you "invent"
               | part of it yourself?
               | 
               | I know how to make tea or coffee. Should someone take the
               | credit for the fact that I know to start boiling water
               | first, while I get everything else together, because this
               | results in the most efficient use of my time?
               | 
               | My perception of the paper is this analogy isn't too far
               | off. It's not an invention, it's a discovery.
        
               | ImPostingOnHN wrote:
               | Perhaps not everything is an invention, but I think
               | everything we're talking about here is, including
               | algorithms. In any case, that's just semantics: you can
               | call it a "discovery" if you feel better about it,
                | perhaps even run _s/invention/discovery/g_ on my post.
               | 
               | The points therein still stand: [inventions/discoveries]
               | which at least 1 person in the world thinks _(after the
                | fact)_ are obvious and just a "small modification" of
               | other [inventions/discoveries], usually go
               | [uninvented/undiscovered] for quite some time.
               | 
                | The paper and TFA ask, _"why?"_ But the better
                | question is, _"why wouldn't they?"_ There's no reason
                | to think they _would be_ immediately
                | [invented/discovered] as soon as all the
                | prerequisites are [invented/discovered] (again, even
                | if someone thinks they're similar), for the reasons
                | described in my above posts.
        
             | shermantanktop wrote:
             | Or: why did we invent X so early, far earlier in history
             | than seems logical given the "tech tree" view we get from
             | sim games?
             | 
             | The fax machine and the electric car come to mind.
        
         | chaosite wrote:
          | Papers usually use the publication date, and this is a
          | preprint - it is to be published in NSDI '24, three months
          | from now.
         | 
         | Usually the final paper PDF, from the conference proceedings,
         | contains the publication information.
        
           | Aardwolf wrote:
            | Articles are being written about it now; cool that it
            | will be in NSDI '24 in 3 months, but that seems almost
            | irrelevant if we're talking about it now.
           | 
           | Wouldn't the target audience of a preprint also benefit from
           | knowing a rough date?
        
         | jamessb wrote:
         | > Putting author names but no date at the top seems to be some
         | sort of standard template or convention that every single paper
         | author or institution everywhere uses since forever, since I
         | never see dates at tops of papers.
         | 
         | The year usually appears in the bottom left of the first page,
         | and in the header or footer of subsequent pages.
         | 
         | The first linked PDF [1] says only "14th USENIX Symposium on
         | Operating Systems Design and Implementation" (without the
         | year), but does have the year on an initial cover page. The
         | second [2] says "HotOS '23, June 22-24, 2023, Providence, RI,
         | USA". The third [3] says "SOSP '23, October 23-26, 2023,
         | Koblenz, Germany" in the bottom-left of the first page.
         | 
          | However, this information may be missing from author-
          | produced files/preprints, as it may be inserted along with
          | page numbers when the conference proceedings or journal
          | issue gets created. This seems to be the case for the
          | fourth linked PDF [4].
         | 
         | [1]: https://www.usenix.org/system/files/osdi20-yang.pdf
         | 
         | [2]: https://jasony.me/publication/hotos23-qdlp.pdf
         | 
         | [3]: https://dl.acm.org/doi/pdf/10.1145/3600006.3613147
         | 
         | [4]: https://junchengyang.com/publication/nsdi24-SIEVE.pdf
        
           | Aardwolf wrote:
            | A link like
            | https://junchengyang.com/publication/nsdi24-SIEVE.pdf is
            | the one we can actually read, though. Relying on a third
            | party to add this critical piece of information, rather
            | than making the paper self-contained, seems suboptimal.
           | 
            | I noticed that arxiv at least puts the date vertically
            | on the left side of its PDFs, but it seems that too is
            | something arxiv puts there, not the paper itself.
           | 
            | If it's about subtleties like "what is the exact
            | publication date", at least put _some_ date; at least
            | let us know whether it's from the 2020s or the 2000s,
            | since that's not guaranteed to be easy to tell. It
            | should be possible to put down the year during which the
            | text is being typed.
        
         | vidarh wrote:
         | > or newer, so that could explain why.
         | 
          | The title is more raising the question of why an algorithm
          | that performs this decently and _is so simple_ doesn't
          | seem to have been invented earlier.
        
       | annowiki wrote:
       | This is not really "sieve-ing" per the article, but what prevents
       | me from running another process that periodically queries the
        | data in a cache? Like just running a celery queue in Python
        | that continually checks the cache for out-of-date
        | information, constantly updating it? Is there a word for
        | this? Is this a common technique?
        
         | ImPostingOnHN wrote:
         | You're describing cache maintenance (and cache eviction), a
         | practice for which there are many algorithms (FIFO, LRU, LFU,
         | etc.), including the algorithm the article describes (SIEVE)
        
           | aleksiy123 wrote:
           | I think this is orthogonal to cache maintenance and cache
           | eviction. Instead this is having a background process
           | periodically refreshing the data in the cache to keep it hot.
        
             | ImPostingOnHN wrote:
              | Refreshing the cache to keep it hot, and deciding how
              | to do it - e.g. which parts to do in the caching layer
              | directly, which parts to do with an external process,
              | what to evict to make room, etc. - are subtopics of
              | cache management.
             | 
             | If I understand you correctly, you're asking if this is
             | different because an external process is involved. I don't
             | see a use in drawing a distinction, and as far as I know,
             | there's no special term for that pattern.
             | 
             |  _Update: after looking into it, it looks like this cache
             | /architecture pattern is called "refresh-ahead"_
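              | 
              | A rough sketch of the shape of it (hypothetical
              | fetch() and a dict-like cache; real implementations
              | key off per-entry TTLs rather than a fixed interval):
              | 
              |     import threading, time
              | 
              |     def refresh_ahead(cache, fetch, interval=30):
              |         def loop():
              |             while True:
              |                 for key in list(cache):  # snapshot
              |                     cache[key] = fetch(key)
              |                 time.sleep(interval)
              |         threading.Thread(target=loop,
              |                          daemon=True).start()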
        
         | leoqa wrote:
          | I think this is not as simple, because to achieve good
          | metrics (latency, cache hit rate) you would need to
          | predict the actual incoming query load, which is quite
          | hard. Letting the query load itself set the values is the
          | state of the art.
          | 
          | In some ways, caching can be seen as a prediction problem,
          | with the cache hit rate as the error as we lag the
          | previous history at time T. Blending load over time is
          | effectively what these various cache algorithms do to
          | avoid overfitting.
        
         | aleksiy123 wrote:
          | If you have an idea of what you need to cache, or can fit
          | everything into the cache, it's extremely effective.
          | 
          | Though potentially just refreshing out-of-date data in the
          | cache could increase effectiveness, given that the general
          | assumption of a cache is that what's in it will probably
          | be used again.
          | 
          | I called it a periodically refreshing cache when I wrote
          | one. Not sure if there is a more formal name.
        
         | toast0 wrote:
         | You might call that prefetching. That's what unbound calls it
          | when it returns a near-expired cached entry to a client and
         | also contacts upstream to update its cache. I remember having a
         | similar option in squid, but it might have been only in an
         | employer's branch (there were a lot of nice extensions that
         | unfortunately didn't make it upstream)
        
       | onetoo wrote:
       | I'm sad that this ends on a cliffhanger, I was excited to see the
       | results for the circular array version. Hopefully I don't lose
       | track of this story when (if) someone who knows what they're
       | doing does that test (I am not such a person, or I'd do it
       | myself).
        
         | vidarh wrote:
          | I haven't tested an actually fast version of this, but I
          | did hack together two versions that use rings of nodes
          | instead of lists. There are two separate issues, IMHO.
          | Rings - be it in an array or using linked nodes - make the
          | logic even simpler, because you don't need to test for the
          | start/end:
         | 
         | https://gist.github.com/vidarh/b72615c16673cb38630cab5148190...
         | 
         | Implementing it using arrays isn't _hard_. You have two obvious
         | options:
         | 
          | One is to _still_ use a linked ring. In that case your
          | array just saves you dynamic allocations and allocator
          | overhead (and, in a gc'd language, gc pressure).
         | 
          | The other is to copy to compact the ring on eviction.
          | Whether that is a horrible idea or might work will depend
          | a great deal on your performance characteristics and the
          | _number of entries_ in your cache.
         | 
          | Personally I think the "linked ring in an array" approach
          | is a nice middle ground: you're likely to get better (CPU)
          | cache locality than with a naive linked list (whether
          | enough to matter really depends on your implementation
          | language's underlying allocator), and the singly linked
          | version has quite low extra overhead vs. just an array
          | (and it can be made smaller if size is a concern - e.g.
          | the link doesn't need to be a pointer but can be an array
          | index).
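          | 
          | Concretely, the wrap test is what disappears. With an
          | explicit head and tail, advancing toward the head looks
          | something like
          | 
          |     node = node.prev if node.prev is not None else tail
          | 
          | whereas in a ring the links never run out, so the same
          | step is just
          | 
          |     node = node.prev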
        
       | leoc wrote:
       | "We are always getting ready to SIEVE, but never SIEVEing." --
       | Ralph Waldo Emerson. (
       | https://books.google.ie/books?id=MpNaAAAAMAAJ&q=%22We+are+al... )
        
       | gumby wrote:
       | I love this kind of work. I've had a (40+ year old) belief that
       | LRU is probably the right approach, and since I don't pay much
       | attention to cache algos that belief has settled comfortably in
       | the back of my mind.
       | 
        | It's so great to see that this area is the subject of
        | theoretical and empirical work, and in fact my default view
        | is likely wrong in most cases.
        
       | benlivengood wrote:
       | https://arxiv.org/abs/2301.11886 for storage and CDN caches
       | achieves pretty remarkable improvements over standard heuristics
       | by using common heuristics as a filter for ML decisions. I
        | imagine even TLB caches might achieve some benefit from ML
        | models with batched pre-prediction as done in MAT, but L1-L3
        | are almost certainly too high-throughput.
        
       | rdtsc wrote:
       | Original HN discussion about SIEVE
       | https://news.ycombinator.com/item?id=38850202
        
       | tangentstorm wrote:
       | Is this something you can actually configure on modern
       | processors, or is it more hypothetical advice for future chip
       | makers?
        
         | vidarh wrote:
          | No, it's not. For starters, it's new; and second, it's not
          | a given it'd be useful for CPUs at all. Caches are used in
          | all kinds of places, not just CPUs: operating systems,
          | content delivery networks, databases, reverse web caches.
          | There's a huge number of places for different kinds of
          | cache algorithms, and the tradeoffs that work best vary a
          | great deal depending on workloads and requirements (e.g.
          | whether worst-case latency, average latency, hit rate, or
          | any number of other factors matters most).
        
       | o11c wrote:
       | Wait, what's wrong with removal? I assumed the algorithm was
       | talking about "copy to compact as you go" ...
        
         | vidarh wrote:
         | Copy to compact _might_ be fine if your cache is _small_. The
         | larger the cache is, the more expensive copy to compact gets
         | relative to the rest of your logic. Hence why the original in
         | the paper uses a linked list.
         | 
          | You can still get closer to what an array would buy you.
          | E.g. the example code in the paper uses a doubly linked
          | list, and there's no reason to, so you can reduce overhead
          | right away there.
         | One of the other benefits of an array is you can wrap around
         | just by using a modulo of an index, but you can avoid the
         | conditionals for their list implementation by linking the list
         | together into a ring instead. Those bits I've implemented in a
         | quick and dirty Ruby version (the singly linked ring is last
         | here:
         | https://gist.github.com/vidarh/b72615c16673cb38630cab5148190...
         | ).
         | 
          | You can then also do a "linked ring in an array" version,
          | and use that to reduce the memory overhead (both from
          | individual node allocations and from the size of the
          | pointers in the linked list) by placing the linked
          | list/ring nodes _in_ a statically sized array. You just
          | need to relink the node you evict into where you want to
          | insert. You can save on pointer size, if that matters to
          | you, by using array indices instead of pointers in that
          | case.
         | 
         | Whether any of these _matters_ is something you really should
         | measure in your preferred implementation language.
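          | 
          | Roughly this shape, if it helps (just the memory layout,
          | not the full eviction logic; the names are made up):
          | 
          |     from dataclasses import dataclass
          | 
          |     @dataclass
          |     class Slot:
          |         key: object = None
          |         value: object = None
          |         visited: bool = False
          |         nxt: int = 0   # array index instead of a pointer
          | 
          |     def make_ring(capacity):
          |         # preallocated ring: slot i links to i + 1, and
          |         # the last slot wraps back to slot 0
          |         return [Slot(nxt=(i + 1) % capacity)
          |                 for i in range(capacity)]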
        
       | shermantanktop wrote:
       | Do people really look for the best possible general-purpose cache
       | eviction approach?
       | 
       | It appears evident that there is no single approach that will
       | work for all workloads (i.e. combinations of access patterns,
       | read/write ratios, scans, processor/cache parameters, etc.).
       | Perhaps I am wrong, and there's a holy grail out there, but...
       | 
       | If there is no single approach, shouldn't the focus be on
       | matching algorithms to workloads? Better selection approaches
       | would seem to yield far more value than yet another boutique
       | algorithm.
        
         | jandrewrogers wrote:
         | The optimal algorithm is not computable. Approximately optimal
         | algorithms are computationally intractable for all practical
          | systems. This is why practical real-world algorithms are
          | all narrowly optimized; it is required to make the
          | algorithm usefully tractable. It can also work relatively
          | well if you can very narrowly define the workload
          | characteristics you need to optimize for.
         | 
         | Cache replacement algorithms are in the family of _universal
         | sequence prediction_ problems. These problems are unfortunately
         | related to the problem of generally optimal AI /learning
         | algorithms, which gives you a sense of why the task is so
         | difficult.
        
           | dist-epoch wrote:
            | What if you kept statistics on what access patterns you
            | encounter every day? A lot of workloads do the same
            | thing day in, day out. Some workloads almost never scan,
            | others only scan; it would be a quick win to
            | automatically optimize these cases.
        
             | willcipriano wrote:
             | The JVM feels like it could do that sort of thing.
        
               | shermantanktop wrote:
               | That's pretty much what the self-tuning elements of JVM
               | GCs are doing. I think we might label the overall problem
               | as "non-computable" but we have real-world systems doing
               | it all day long.
        
           | jedberg wrote:
           | > The optimal algorithm is not computable
           | 
            | No, but it's easily findable. If you have a group of,
            | say, 10 cache machines, you can set each one to a
            | different eviction algorithm, and then track misses,
            | hits, and eviction rates. You can even use a multi-armed
            | bandit algo over the algos if you want to be really
            | thorough.
           | 
           | But the main question would be why? In 99.9% of cases, the
           | cache eviction algorithm is not going to be your bottleneck.
           | So might as well just use LRU and move on to more important
           | problems.
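            | 
            | (If you ever do want the bandit version, the selection
            | part is a few lines - an epsilon-greedy sketch, assuming
            | each machine reports its hit rate somewhere:
            | 
            |     import random
            | 
            |     def pick_policy(hit_rates, epsilon=0.1):
            |         # hit_rates: {"lru": 0.82, "sieve": 0.85, ...}
            |         if random.random() < epsilon:
            |             return random.choice(list(hit_rates))
            |         return max(hit_rates, key=hit_rates.get)
            | 
            | and you re-pick per machine every few hours.)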
        
             | onetoo wrote:
             | Strictly speaking, that approach would help you find a
             | _heuristically good_ algorithm, but not necessarily the
             | _optimal_ one. Those are very different.
        
               | jedberg wrote:
               | But does it matter to anyone other than an academic? In
               | practice the heuristically good == optimal.
        
               | onetoo wrote:
               | We should use the right words for things.
               | 
               | Insertion sort is heuristically good in many practical
               | circumstances. You wouldn't call it optimal, nor choose
               | to use it as your general-purpose sorting algorithm.
        
             | Jabbles wrote:
             | If you have 1e4 machines and run a global video
             | distribution network it might be worth an engineer's salary
             | to look into it.
             | 
             | Or if you're implementing a feature in the standard library
             | of a widely used language, the effects of your choice may
             | be enormous.
        
               | jedberg wrote:
               | > If you have 1e4 machines and run a global video
               | distribution network it might be worth an engineer's
               | salary to look into it.
               | 
                | Sure, and that would be the 0.1% of the time where
                | it makes sense.
               | 
               | > if you're implementing a feature in the standard
               | library of a widely used language
               | 
               | I would contend that unless your library is very
               | opinionated on exactly how data will be accessed, you
               | can't possibly optimize cache access in a way that makes
               | sense for every user.
        
               | onetoo wrote:
               | You can't possibly optimize sorting in a way that makes
               | sense for every user. Let us use insertion sort in our
               | standard library and call it a day. Why are you bothering
               | to discuss general-case sorting algorithms?
        
             | jandrewrogers wrote:
              | > No, but it's easily findable
             | 
             | It really isn't. This has been known for things like
             | silicon design since at least the 1980s, and there are a
             | few different seemingly basic properties that make an
             | otherwise simple data model have intrinsically poor cache
             | characteristics. One of the best known is the use of high-
             | dimensionality access methods (e.g. graphics, spatial,
             | graph, etc processing).
             | 
             | The traditional solution to cache replacement
             | intractability is latency hiding architectures (e.g. GPUs,
             | barrel processors, etc) for applications that are mostly
             | about throughput. In fact, this is a reasonable working
             | definition of the difference between latency-oriented and
             | throughput-oriented design (caching versus latency hiding).
             | 
             | > In 99.9% of cases, the cache eviction algorithm is not
             | going to be your bottleneck.
             | 
             | What types of cases are we talking about here? For data
             | intensive applications like database engines, using a naive
             | algorithm like LRU will definitely show up in the profiler
             | unless the rest of your code is similarly naive. You don't
             | notice it for high-throughput workhorse systems because
             | someone else has already spent a lot of time tweaking and
             | optimizing this, using some combination of caching and
             | latency hiding.
        
         | onetoo wrote:
         | Having a strong default starting point is nothing to scoff at.
         | You wouldn't want a standard library to use insertion sort
         | because "no singular approach works optimally in all cases, so
         | why bother trying for good general case?"
        
       | erellsworth wrote:
       | Personally, I haven't been using SIEVE because I didn't know what
       | it was. However, after reading this article I still don't know
       | what it is.
        
         | scottlamb wrote:
         | I know what it is because I saw/skimmed a related article the
         | other day: https://news.ycombinator.com/item?id=38850202
         | 
         | ...but a good part of what this article is discussing is why
         | you don't know what it is; why it hasn't been in textbooks for
         | 40+ years:
         | 
         | > if it's so effective, and so simple, and so closely related
         | to an algorithm that's been around forever, why has nobody
         | discovered it already?
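          | 
          | For anyone else in the same boat, the core is small enough
          | to sketch. This is my rough, untested Python reading of
          | the paper's description (not the authors' code): a FIFO
          | list with one "visited" bit per entry, plus a hand that
          | survives evictions.
          | 
          |     class Node:
          |         def __init__(self, key, value):
          |             self.key, self.value = key, value
          |             self.visited = False
          |             self.prev = self.next = None  # prev = newer
          | 
          |     class SieveCache:
          |         def __init__(self, capacity):
          |             self.capacity = capacity
          |             self.map = {}                 # key -> Node
          |             self.head = self.tail = None  # head = newest
          |             self.hand = None
          | 
          |         def get(self, key):
          |             node = self.map.get(key)
          |             if node is None:
          |                 return None
          |             node.visited = True   # a hit sets one bit
          |             return node.value
          | 
          |         def put(self, key, value):
          |             if key in self.map:
          |                 node = self.map[key]
          |                 node.value, node.visited = value, True
          |                 return
          |             if len(self.map) >= self.capacity:
          |                 self._evict()
          |             node = Node(key, value)   # insert at head
          |             node.next = self.head
          |             if self.head:
          |                 self.head.prev = node
          |             self.head = node
          |             if self.tail is None:
          |                 self.tail = node
          |             self.map[key] = node
          | 
          |         def _evict(self):
          |             # the hand scans from tail toward head,
          |             # clearing "visited" bits; unlike CLOCK it
          |             # keeps its position across evictions
          |             node = self.hand or self.tail
          |             while node.visited:
          |                 node.visited = False
          |                 node = node.prev or self.tail  # wrap
          |             self.hand = node.prev
          |             if node.prev:         # unlink the victim
          |                 node.prev.next = node.next
          |             else:
          |                 self.head = node.next
          |             if node.next:
          |                 node.next.prev = node.prev
          |             else:
          |                 self.tail = node.prev
          |             del self.map[node.key]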
        
       | Dwedit wrote:
        | SIEVE is a terrible name, because it's too similar to the
        | Sieve of Eratosthenes for generating prime numbers.
        
         | pmarreck wrote:
         | Right. You google that word and "algorithm" and you have to
         | minus Eratosthenes for sure
        
       | scotty79 wrote:
        | The idea of using a circular array, copying the tail item
        | into the hand, and moving all pointers left to make room at
        | the head for a new item sounds great.
        | 
        | I wonder how messing up the order of items between hand and
        | tail influences the benchmarks.
        
       | neilkk wrote:
        | I'm having a hard time understanding why this is better than
        | CLOCK. It seems like a hack to make updates easier, one
        | which, however, makes both practical performance and
        | conceptual justification much worse.
       | 
       | The elements which are between 'hand' and 'tail' are going to
       | stay there for a long time, until the hand pointer cycles. If
       | there are lots of elements which get put in the cache but don't
       | really deserve it, that could take a long time. But that's
       | exactly the case where you wouldn't want to keep the elements
        | behind the hand around for a long time. Remember, they got
        | 'promoted' for appearing twice within roughly a cache-size's
        | worth of accesses. Not a great indication that they should
        | be kept for a long time.
       | On the other hand, when the pointer cycles, they could get thrown
       | away quite easily.
       | 
       | On the other hand CLOCK converges to a situation where elements
       | which have appeared a lot are at the front. They are much less
       | likely to be thrown out even if they have a temporary run of
       | scarcity. The elements which don't appear much quickly bubble to
       | the back and become much more likely to be thrown out in the next
       | few evictions. This is accentuated if there are lots of elements
       | which are getting hit relatively more frequently.
       | 
       | After a long time of running SIEVE, there's not really anything
       | about the state which will improve future performance compared to
       | when it starts cold. In many cases it seems it will be locked in
       | to a suboptimal state by overadapting to past observations.
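        | 
        | To make the comparison concrete, a rough, untested CLOCK
        | sketch: the newcomer takes the evicted slot and the hand
        | keeps circling, instead of new entries piling up at the
        | head ahead of a parked hand.
        | 
        |     class ClockCache:
        |         def __init__(self, capacity):
        |             # each slot holds (key, value, visited)
        |             self.slots = [None] * capacity
        |             self.index = {}   # key -> slot number
        |             self.hand = 0
        | 
        |         def get(self, key):
        |             i = self.index.get(key)
        |             if i is None:
        |                 return None
        |             k, v, _ = self.slots[i]
        |             self.slots[i] = (k, v, True)  # mark on hit
        |             return v
        | 
        |         def put(self, key, value):
        |             if key in self.index:
        |                 i = self.index[key]
        |                 self.slots[i] = (key, value, True)
        |                 return
        |             n = len(self.slots)
        |             while True:   # advance past visited slots
        |                 slot = self.slots[self.hand]
        |                 if slot is None or not slot[2]:
        |                     break   # empty or cold: our victim
        |                 k, v, _ = slot
        |                 self.slots[self.hand] = (k, v, False)
        |                 self.hand = (self.hand + 1) % n
        |             old = self.slots[self.hand]
        |             if old is not None:
        |                 del self.index[old[0]]
        |             self.slots[self.hand] = (key, value, False)
        |             self.index[key] = self.hand
        |             self.hand = (self.hand + 1) % n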
        
       ___________________________________________________________________
       (page generated 2024-01-08 23:00 UTC)