[HN Gopher] The Memory Wall: Past, Present, and Future of DRAM
       ___________________________________________________________________
        
       The Memory Wall: Past, Present, and Future of DRAM
        
       Author : klelatti
       Score  : 54 points
       Date   : 2024-09-03 07:19 UTC (2 days ago)
        
 (HTM) web link (www.semianalysis.com)
 (TXT) w3m dump (www.semianalysis.com)
        
       | deater wrote:
       | if Sally Mckee, who coined the term "Memory Wall", had a nickel
       | for each time it gets mentioned, she'd have a lot of nickels
        
       | didgetmaster wrote:
       | I think that 'memory wall' is completely different from 'memory
       | hole', but I forget how.
        
       | magicalhippo wrote:
        | Back when I was young, which isn't _that_ long ago, you tried
        | to put as much pre-computed stuff into memory as possible,
        | because memory was much faster than the CPU. Lookup tables
        | left, right and center.
       | 
       | These days you can do thousands of calculations waiting for a few
       | bytes of memory. And not only is the speed difference getting
       | worse, but memory sizes aren't keeping up.
       | 
        | Guess we're not far from the point where compressing stuff
        | before putting it in memory is something you'd want to do most
        | of the time. LZ4 decompression[1] is already within a small
        | factor of memcpy speed.
       | 
       | [1]: https://github.com/lz4/lz4
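The compress-before-storing idea above can be sketched in a few lines. LZ4 itself is not in the Python standard library, so this uses zlib at its fastest setting purely as a stand-in codec (an editor's assumption for illustration, not what the commenter benchmarked):

```python
import zlib

# Keep a large, compressible buffer in memory in compressed form,
# decompressing only when the data is actually needed.
raw = b"precomputed lookup data " * 10_000  # ~240 KB of repetitive data

packed = zlib.compress(raw, 1)  # level 1: fast, lower ratio
print(f"stored {len(packed)} bytes instead of {len(raw)}")

# On access, decompress back to the original bytes.
assert zlib.decompress(packed) == raw
```

The same pattern applies with any fast codec: trade a little CPU on access for a smaller resident working set.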
        
         | danieldk wrote:
          | On the latter point, macOS has had compressed memory for a
          | long time now, and some Linux distributions also enable it
          | out of the box (I don't know anything about Windows).
        
           | wongogue wrote:
            | Windows has also had compressed memory since a version of
            | Windows 10. You can see it in Task Manager.
        
           | hinkley wrote:
            | One of the time series databases streams compressed blocks
            | in memory when doing searches, assigning distinct blocks to
            | each core. For some scenarios it's faster to do a table
            | scan than to keep extra indexes hot.
        
           | vardump wrote:
            | I think the grandparent was talking about transparent
            | memory bus data compression: the CPU would fetch some bag
            | of bytes and decompress it into cache.
        
         | jarbus wrote:
          | I think quantization for large language models already does
          | something like this - the parameters are compressed in memory
          | and decompressed when performing the forward passes.
        
         | hinkley wrote:
         | I was re-reminded of this when watching a performance analysis
         | video that may or may not have been posted here (sometimes I
         | get things here, or reddit, but sometimes the real story is in
          | the related videos). It doesn't take a very big lookup table
          | before it becomes faster to just rerun the calculations.
         | 
         | Especially when you throw multiprocessing in. We need better
         | benchmarking tools that load up competing workloads in the
         | background so you can tell how your optimization really works
         | in production instead of in your little toy universe in the
         | benchmark.
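The lookup-table-versus-recompute trade-off above can be measured directly. A minimal sketch with stdlib `timeit` (the table size and workload are the editor's arbitrary choices); as the commenter notes, results on an idle benchmark machine may not match production, where competing workloads evict the table from cache:

```python
import math
import timeit

# Precomputed sine table vs. calling math.sin directly.
N = 4096
table = [math.sin(2 * math.pi * i / N) for i in range(N)]

def via_table(i):
    return table[i % N]

def via_compute(i):
    return math.sin(2 * math.pi * (i % N) / N)

lut_time = timeit.timeit(lambda: via_table(1234), number=100_000)
calc_time = timeit.timeit(lambda: via_compute(1234), number=100_000)
print(f"lookup: {lut_time:.4f}s  recompute: {calc_time:.4f}s")
# Which wins depends on table size, cache pressure, and competing workloads.
```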
        
         | DaoVeles wrote:
         | Reminded of Kaze Emanuars work on optimising N64 software.
         | (https://www.youtube.com/channel/UCuvSqzfO_LV_QzHdmEj84SQ)
         | 
         | The RAM was so terrible that essentially you try to keep the
         | processors running in cache for as long as possible. RAM access
         | is painful.
         | 
         | There is a performance profiling tool built into F3DEX3 that
         | now shows that approximately 70% of the time the system is idle
         | while running Zelda OoT. It is just waiting for memory
         | transfers. The folks at SGI/RAMBUS cut corners a little too
         | hard building that system.
         | 
          | But it turns out this kind of performance profile is just
          | prep for where we are heading, apparently.
        
       | hulitu wrote:
       | What happened to SRAM ?
        
         | Tuna-Fish wrote:
          | Still in wide use in CPU caches. It does not scale well to
          | newer process nodes, and it inherently requires more
          | transistors per bit than DRAM, which means it's not
          | cost-effective for main memory.
        
           | DaoVeles wrote:
           | I forgot were I saw it but it was an analysis of cache to die
           | size and how it has effectively stopped scaling. Larger cache
           | just means larger chips now.
           | 
            | It sort of makes sense, because SRAM cannot have the same
            | transient nature as a logic transistor doing calculations.
            | It has to hold its state for longer than one
            | three-billionth of a second, so it has to be a little more
            | robust.
           | 
           | This is my intuitive take so it could be completely wrong.
        
         | Dylan16807 wrote:
          | An SRAM cell is huge compared to a DRAM cell. It's worth
          | making a few megabytes of SRAM per core, but not a lot more
          | than that.
        
       ___________________________________________________________________
       (page generated 2024-09-05 23:00 UTC)