[HN Gopher] Exploring How Cache Memory Works
___________________________________________________________________
Exploring How Cache Memory Works
Author : imadj
Score : 98 points
Date : 2024-06-21 18:04 UTC (5 days ago)
(HTM) web link (pikuma.com)
(TXT) w3m dump (pikuma.com)
| emschwartz wrote:
| In a similar vein, Andrew Kelley, the creator of Zig, gave a nice
| talk about how to make use of the different speeds of different
| CPU operations in designing programs: Practical Data-Oriented
| Design https://vimeo.com/649009599
| wyldfire wrote:
| Drepper's "What Every Programmer Should Know About Memory" [1] is
| a good resource on a similar topic. Not so long ago, there was an
| analysis done on it in a series of blog posts [2] from a more
| modern perspective.
|
| [1] https://people.freebsd.org/~lstewart/articles/cpumemory.pdf
|
| [2] https://samueleresca.net/analysis-of-what-every-programmer-s...
| seany62 wrote:
| Super interesting. Thank you!
| eikenberry wrote:
| In case you are wondering about your cache-line size on a Linux
| box, you can find it in sysfs.. something like..
| cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
| Hello71 wrote:
| grep . /sys/devices/system/cpu/cpu*/cache/index*/coherency_line_size
|
| would be better, but lscpu -C
|
| is more useful.
| eikenberry wrote:
| Didn't know about 'lscpu -C'.. thanks!
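The sysfs file mentioned above can also be read from a program. The following is a minimal sketch, assuming a Linux box that exposes that path; the function names `read_line_size` and `cpu0_line_size` are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Reads one positive integer from a sysfs-style text file.
 * Returns -1 if the file cannot be opened or holds no positive number. */
long read_line_size(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long size = -1;
    if (fscanf(f, "%ld", &size) != 1 || size <= 0)
        size = -1;
    fclose(f);
    return size;
}

/* Line size in bytes for cpu0's first cache index, or -1 when the
 * sysfs entry is absent (non-Linux systems, some containers). */
long cpu0_line_size(void)
{
    return read_line_size(
        "/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size");
}
```

Callers should treat -1 as "unknown" and fall back to a conservative default rather than assume any particular size.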
| dangoldin wrote:
| Really cool stuff and a nice introduction but curious how much
| modern compilers do for you already. Especially if you shift to
| the JIT world - what ends up being the difference between code
| where people optimize for this vs write in a style optimized
| around code readability/reuse/etc.
| tux1968 wrote:
| JIT compilers can't compensate for poorly organized data.
| Ultimately, understanding these low-level concepts affects
| high-level algorithm design and selection.
|
| Watching the Andrew Kelley video mentioned above really drives
| home the point that even if your compiler automatically
| optimizes structure ordering to minimize padding and alignment
| issues, it can't fix other higher-level decisions. One example
| is using two separate lists of structs to maintain their state
| data, rather than a single list with each struct having an enum
| to record its state.
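The two-lists idea above can be sketched in C. This is an illustration only: the `monster` names and the two states are assumptions, not taken from the talk or the article. The point is that a per-element tag costs its own bytes plus tail padding, while separate lists encode the state in which list an element lives in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout A: one list, each element carries its own state tag. */
enum monster_state { MONSTER_ALIVE, MONSTER_DEAD };

struct tagged_monster {
    uint64_t id;
    enum monster_state state;   /* the tag, usually followed by padding */
};

/* Layout B: the state is implied by which list holds the element. */
struct monster {
    uint64_t id;
};

struct monster_lists {
    struct monster *alive;
    size_t alive_count;
    struct monster *dead;
    size_t dead_count;
};

/* The tag always makes the element strictly larger. */
static_assert(sizeof(struct tagged_monster) > sizeof(struct monster),
              "a per-element tag adds bytes to every element");

/* Bytes pulled through the cache when walking n elements of each layout. */
size_t tagged_bytes(size_t n)   { return n * sizeof(struct tagged_monster); }
size_t untagged_bytes(size_t n) { return n * sizeof(struct monster); }
```

On a typical 64-bit ABI the tagged element is 16 bytes against 8 for the untagged one, so a walk over the tagged list touches about twice as many cache lines. No compiler will make this change for you, because it alters the data structure rather than the layout of one struct.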
| kllrnohj wrote:
| JIT languages tend to have the worst language-provided locality
| as they are often accompanied by GCs and lack of value types
| (there are exceptions to this, but it's broadly the case). And
| a JIT cannot re-arrange heap memory layout of objects as it
| must be hot-swappable. This is why, despite incredibly huge
| investments in them, such languages just never reach AOT
| performance, however much theoretical advantage a JIT could
| have.
|
| AOT'd languages _could_ re-arrange a struct for better locality;
| however, the majority of (if not all) languages rigidly require
| that fields be laid out in the order defined, for various reasons.
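C is one language that fixes field order: members are laid out in declaration order, so any padding-saving reorder has to be done by hand. A minimal sketch, with hypothetical struct and function names:

```c
#include <assert.h>
#include <stddef.h>

/* Fields in the order a programmer might naturally write them. */
struct declared_order {
    char   tag;
    double value;
    char   flag;
};

/* Same fields, manually sorted from strictest to loosest alignment. */
struct sorted_order {
    double value;
    char   tag;
    char   flag;
};

/* The C compiler keeps members in declaration order. */
static_assert(offsetof(struct declared_order, tag) <
              offsetof(struct declared_order, value),
              "tag precedes value");
static_assert(offsetof(struct declared_order, value) <
              offsetof(struct declared_order, flag),
              "value precedes flag");

/* Bytes of real data in either struct. */
size_t payload_bytes(void)
{
    return sizeof(char) + sizeof(double) + sizeof(char);
}

/* Bytes lost to padding in each layout. */
size_t padding_declared(void)
{
    return sizeof(struct declared_order) - payload_bytes();
}

size_t padding_sorted(void)
{
    return sizeof(struct sorted_order) - payload_bytes();
}
```

On a typical 64-bit ABI the first struct occupies 24 bytes and the second 16, though the exact numbers depend on the platform's alignment rules.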
| hinkley wrote:
| Wait wait wait.
|
| M2 processors have 128 byte wide cache lines?? That's a big deal.
| We've been at 64 bytes since what, the Pentium?
| monocasa wrote:
| Yeah, 64 bytes is kind of an unstated x86 thing. It'd be hell
| for them to change that; a lot of perf-conscious code aligns to
| 64-byte boundaries to combat false sharing.
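Aligning to 64-byte boundaries to combat false sharing usually looks something like the sketch below. The constant `ASSUMED_LINE_SIZE` and the other names are hypothetical, and the hard-coded 64 is exactly the assumption this thread is questioning: on a machine with 128-byte lines, two of these counters would share a line again.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte lines. Not true on every CPU. */
#define ASSUMED_LINE_SIZE 64

/* One counter per thread, each padded out to its own cache line so
 * that writes from one thread do not invalidate its neighbours. */
struct padded_counter {
    alignas(ASSUMED_LINE_SIZE) unsigned long value;
};

static_assert(alignof(struct padded_counter) == ASSUMED_LINE_SIZE,
              "each counter starts on a line boundary");
static_assert(sizeof(struct padded_counter) == ASSUMED_LINE_SIZE,
              "each counter fills exactly one line");

struct padded_counter per_thread_counters[4];

/* True when both addresses fall in the same ASSUMED_LINE_SIZE block. */
int on_same_line(const void *a, const void *b)
{
    return (uintptr_t)a / ASSUMED_LINE_SIZE ==
           (uintptr_t)b / ASSUMED_LINE_SIZE;
}
```

The cost is memory: each 8-byte counter now occupies a full line, which is why this is normally reserved for data that several threads write heavily.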
| kllrnohj wrote:
| All ARM-designed cores are also 64 bytes. It's not _just_ an
| x86 thing.
| monocasa wrote:
| The Cortex A9 had 32 byte cache lines for one prominent
| counterexample.
|
| But my point was more that the size is baked into x86 in a
| pretty deep way these days. You'd be looking at new
| releases from all software that cares about such things on
| x86 to support a different cache line size without major
| perf regressions. So all of the major kernels, probably the
| JVM and CLR, game engines (and good luck there).
|
| IMO Intel should stick a "please query the size of the
| cache line if you care about its length" clause into APX,
| to push code today to stop #defining CACHE_LINE_SIZE (64)
| on x86.
| jcranmer wrote:
| > IMO Intel should stick a "please query the size of the
| cache line if you care about its length" clause into
| APX, to push code today to stop #defining CACHE_LINE_SIZE
| (64) on x86.
|
| CPUID EAX=1, bits 8-15 (i.e., second byte) of EBX in the
| result tell you the cache line size (strictly, the CLFLUSH
| line size, reported in units of 8 bytes). It's been there
| since Pentium 4, apparently.
|
| You can also get line size for each cache level with
| CPUID EAX=4, along with the set-associativity and other
| low-level cache parameters.
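A minimal sketch of the CPUID EAX=1 query, assuming GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid`. The function name `cpuid_line_size_bytes` is hypothetical. The field in EBX is the CLFLUSH line size in 8-byte units and is only meaningful when the CLFSH feature bit (EDX bit 19) is set; on non-x86 targets the sketch simply reports 0.

```c
#include <assert.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header, not available on MSVC */
#endif

static_assert(sizeof(unsigned int) >= 4, "CPUID registers are 32 bits wide");

/* Returns the CLFLUSH line size in bytes, or 0 when it cannot be
 * determined (non-x86 target, leaf 1 unsupported, or CLFSH not set). */
unsigned int cpuid_line_size_bytes(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(edx & (1u << 19)))          /* CLFSH feature flag */
        return 0;

    /* EBX bits 8-15 hold the line size divided by 8. */
    return ((ebx >> 8) & 0xffu) * 8u;
#else
    return 0;
#endif
}
```

A program that cares about the value would call this once at startup and size its padding from the result, keeping a fixed fallback for the 0 case.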
| kllrnohj wrote:
| > The Cortex A9 had 32 byte cache lines for one prominent
| counterexample.
|
| Ok, all arm-designed cores for the last 15 years then :)
| 201984 wrote:
| Some Cortex-A53s have 16-byte cachelines, which I found out
| the hard way recently.
| CyberDildonics wrote:
| In practice, Intel CPUs have for a very long time pulled down
| at least 128 bytes when you access memory.
|
| 64-byte cache lines are there and are part of other alignment
| boundaries for things like atomics, but accessing memory pulls
| down two cache lines at a time.
| boshalfoshal wrote:
| I think cache coherency protocols are less intuitive and less
| talked about when people discuss caching, so it would be nice
| to have some discussion on that too.
|
| But otherwise this is a good general overview of how caching is
| useful.
| branko_d wrote:
| Why is the natural alignment of structs equal to the size of
| their largest member?
| kllrnohj wrote:
| To ensure that member is itself still aligned properly in
| "global space". The start of the struct is assumed to be
| universally aligned (malloc, etc.. make that a requirement in
| fact) or aligned for the requirements of the struct itself (eg,
| array). Thus any offset into the struct only needs to be
| aligned to the requirements of the largest type.
|
| https://www.kernel.org/doc/html/latest/core-api/unaligned-me...
| has a lot more general context on alignment and why it's
| important
| jcranmer wrote:
| It's not. It's equal to the maximum alignment of their members.
| For primitive types (like integers, floating-point types and
| pointers), size == alignment on most machines nowadays
| (although on 32-bit machines, it can be a toss-up whether a
| 64-bit integer is 64-bit aligned or 32-bit aligned), so it can
| look like it's based on size.
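The difference between "largest member" and "strictest member alignment" shows up as soon as a struct contains an array. A minimal sketch with a hypothetical `record` struct: the largest member is 16 bytes, yet the struct is only as aligned as its `int`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

struct record {
    char name[16];   /* largest member: 16 bytes, alignment of char */
    int  count;      /* smaller member, but the strictest alignment */
};

/* Guaranteed: a struct is at least as aligned as each of its members. */
static_assert(alignof(struct record) >= alignof(int),
              "struct alignment covers its int member");

size_t record_alignment(void)
{
    return alignof(struct record);
}

size_t largest_member_size(void)
{
    return sizeof(((struct record *)0)->name);
}

/* The maximum of the members' alignments. */
size_t strictest_member_alignment(void)
{
    size_t a = alignof(char[16]);   /* an array aligns like its element */
    size_t b = alignof(int);
    return a > b ? a : b;
}
```

On common ABIs `record_alignment()` matches `strictest_member_alignment()` (typically 4), not `largest_member_size()` (16).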
| ThatNiceGuyy wrote:
| Great article. I have always had an open question in my mind
| about struct alignment and this explained it very succinctly.
___________________________________________________________________
(page generated 2024-06-26 23:00 UTC)