[HN Gopher] How we tracked down a Go 1.24 memory regression
       ___________________________________________________________________
        
       How we tracked down a Go 1.24 memory regression
        
       Author : gandem
       Score  : 99 points
       Date   : 2025-07-17 20:02 UTC (2 days ago)
        
 (HTM) web link (www.datadoghq.com)
 (TXT) w3m dump (www.datadoghq.com)
        
       | nitinreddy88 wrote:
       | I am more interested in learning about Swiss tables than in the
       | bug fix :)
       | 
       | What are the best places to learn about modern implementations of
       | traditional data structures? Many of these use SIMD to get the
       | last mile of performance out of modern hardware.
        
         | skavi wrote:
         | could read one of the implementations. there's the original
         | abseil implementation and rust's in the hashbrown crate.
         | probably many more.
        
         | gandem wrote:
         | OP here, I wrote another blog post that explains how Swiss
         | Tables work, see https://news.ycombinator.com/item?id=44597562
        
         | woadwarrior01 wrote:
         | I'd recommend reading the Swiss table design notes[1] in the
         | Abseil documentation. You might also like F14 maps[2] from
         | Folly.
         | 
         | [1]: https://abseil.io/about/design/swisstables
         | 
         | [2]: https://engineering.fb.com/2019/04/25/developer-tools/f14/
        
         | SkiFire13 wrote:
          | In addition to the resources in the sibling comments, I also
          | suggest this really good CppCon presentation on Swiss tables:
          | https://www.youtube.com/watch?v=ncHmEUmJZf4
        
       | neuroelectron wrote:
       | Great write up. It almost made me miss my old DevOps job.
        
       | dh2022 wrote:
       | I am somewhat surprised to see the bucket memory layout, which is
       | [k1,v1],[k2,v2],[k3,v3] etc., where k1,k2,k3 are keys and v1,v2,v3
       | are values. A CPU cache line will not contain more than one [k,v]
       | pair, because a cache line is about 64 bytes and the size of a
       | [k,v] pair was about 56 bytes.
       | 
       | So iterating through the bucket looking for a key will require
       | each iteration to fetch the next [k,v] pair from RAM.
       | 
       | Compare this with the following layout: k1,k2,k3,... followed by
       | v1,v2,v3,... Looking up the first key in the bucket will end up
       | loading at least one more key into the same cache line, and this
       | should make iteration faster.
       | 
       | The downside of this approach shows up when lookups almost always
       | hit the first key in the bucket. Then [k1,v1],[k2,v2],[k3,v3]
       | packing is better, because the value is also in the same cache
       | line.
       | 
       | I am wondering if people on this forum know more about this
       | trade-off. Thanks!!
        
       ___________________________________________________________________
       (page generated 2025-07-19 23:00 UTC)