[HN Gopher] The Memory Wall: Past, Present, and Future of DRAM
___________________________________________________________________
The Memory Wall: Past, Present, and Future of DRAM
Author : klelatti
Score : 54 points
Date : 2024-09-03 07:19 UTC (2 days ago)
(HTM) web link (www.semianalysis.com)
(TXT) w3m dump (www.semianalysis.com)
| deater wrote:
| If Sally McKee, who coined the term "Memory Wall", had a nickel
| for each time it gets mentioned, she'd have a lot of nickels.
| didgetmaster wrote:
| I think that 'memory wall' is completely different from 'memory
| hole', but I forget how.
| magicalhippo wrote:
| Back when I was young, which isn't _that_ long ago, one tried to
| put as much pre-computed stuff into memory as possible, as memory
| was much faster than the CPU. Lookup tables left, right and center.
|
| These days you can do thousands of calculations waiting for a few
| bytes of memory. And not only is the speed difference getting
| worse, but memory sizes aren't keeping up.
|
| Guess we're not far from the point where compressing stuff before
| putting it in memory is something you'd want to do most of the
| time. LZ4 decompression[1] is already just a factor of a few away
| from memcpy speed.
|
| [1]: https://github.com/lz4/lz4
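A minimal sketch of the "compress before putting it in memory" idea from the comment above. It uses Python's standard `zlib` at its fastest level purely as a stand-in for LZ4, which is not in the standard library; the `CompressedBlockStore` class and its method names are hypothetical and only illustrate the trade of CPU cycles for a smaller memory footprint.

```python
import zlib


class CompressedBlockStore:
    """Hypothetical store that keeps every block compressed in RAM."""

    def __init__(self, level: int = 1) -> None:
        # Level 1 favours speed over ratio, the same trade-off LZ4 targets.
        self._level = level
        self._blocks = {}

    def put(self, key: str, data: bytes) -> None:
        # Pay CPU time once on write to shrink what sits in memory.
        self._blocks[key] = zlib.compress(data, self._level)

    def get(self, key: str) -> bytes:
        # Pay CPU time on every read instead of fetching more bytes.
        return zlib.decompress(self._blocks[key])

    def stored_size(self, key: str) -> int:
        return len(self._blocks[key])


store = CompressedBlockStore()
table = bytes(range(256)) * 4096  # 1 MiB of highly repetitive data
store.put("table", table)
restored = store.get("table")
```

Whether this wins in practice depends on how compressible the data is and how fast the decompressor runs relative to memory bandwidth, so it is worth measuring on the real workload.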
| danieldk wrote:
| On the latter point, macOS has had compressed memory for a
| long time now, and some Linux distributions also use it out
| of the box (I don't know anything about Windows).
| wongogue wrote:
| Windows has also had Compressed Memory since one of the
| Windows 10 releases. You can see it in Task Manager.
| hinkley wrote:
| One of the time series databases streams compressed blocks in
| memory when doing searches, doing distinct blocks per core.
| For some scenarios it's faster to do a table scan than keep
| extra indexes hot.
| vardump wrote:
| I think the grandparent was talking about transparent memory
| bus data compression. CPU would fetch some bag of bytes and
| decompress to cache.
| jarbus wrote:
| I think quantization for large language models already does
| something like this - the parameters are compressed in memory
| and then decompressed when performing the forward passes.
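The quantization idea in the comment above can be sketched roughly as follows. This is a simplified illustration that assumes symmetric 8-bit quantization with a single scale factor for the whole weight list; real LLM quantization schemes typically use per-group scales and often fewer bits. The function names `quantize` and `dot_quantized` are made up for this example.

```python
from array import array


def quantize(weights):
    """Store floats as signed 8-bit integers plus one shared scale factor."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    # array("b") holds one signed byte per weight instead of a 4- or 8-byte float.
    return array("b", (round(w / scale) for w in weights)), scale


def dot_quantized(q, scale, activations):
    """Dequantize on the fly while computing a dot product (the "forward pass")."""
    return sum(v * scale * a for v, a in zip(q, activations))


weights = [0.5, -1.0, 0.25, 0.75]
activations = [1.0, 2.0, 3.0, 4.0]

q, scale = quantize(weights)
approx = dot_quantized(q, scale, activations)
exact = sum(w * a for w, a in zip(weights, activations))
```

The weights occupy one byte each in memory, and the multiplication by `scale` is the extra compute spent to avoid moving wider values across the memory bus.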
| hinkley wrote:
| I was re-reminded of this when watching a performance analysis
| video that may or may not have been posted here (sometimes I
| get things here, or reddit, but sometimes the real story is in
| the related videos). It doesn't take a very big lookup table
| for it to be faster to rerun the calculations.
|
| Especially when you throw multiprocessing in. We need better
| benchmarking tools that load up competing workloads in the
| background so you can tell how your optimization really works
| in production instead of in your little toy universe in the
| benchmark.
| DaoVeles wrote:
| Reminded of Kaze Emanuar's work on optimising N64 software.
| (https://www.youtube.com/channel/UCuvSqzfO_LV_QzHdmEj84SQ)
|
| The RAM was so terrible that essentially you try to keep the
| processors running in cache for as long as possible. RAM access
| is painful.
|
| There is a performance profiling tool built into F3DEX3 that
| now shows that approximately 70% of the time the system is idle
| while running Zelda OoT. It is just waiting for memory
| transfers. The folks at SGI/RAMBUS cut corners a little too
| hard building that system.
|
| But it turns out this kind of performance profile is just prep for
| where we are heading, apparently.
| hulitu wrote:
| What happened to SRAM ?
| Tuna-Fish wrote:
| Still in wide use in CPU caches. It does not scale well to
| smaller sizes, and inherently requires more transistors than
| DRAM, which means it's not cost-effective for main memory.
| DaoVeles wrote:
| I forgot where I saw it, but it was an analysis of cache to die
| size and how it has effectively stopped scaling. Larger cache
| just means larger chips now.
|
| It sort of makes sense because an SRAM cell cannot have the same
| transient nature as a transistor used for calculation. It has to
| hold its state for longer than one three-billionth of a second,
| so it has to be a little more robust.
|
| This is my intuitive take so it could be completely wrong.
| Dylan16807 wrote:
| It's huge compared to DRAM. It's worth making a few megabytes
| of SRAM per core, but not a lot more than that.
___________________________________________________________________
(page generated 2024-09-05 23:00 UTC)