[HN Gopher] The Future of Memory
       ___________________________________________________________________
        
       The Future of Memory
        
       Author : PaulHoule
       Score  : 52 points
       Date   : 2024-01-20 13:33 UTC (1 day ago)
        
 (HTM) web link (semiengineering.com)
 (TXT) w3m dump (semiengineering.com)
        
       | ksec wrote:
       | Not a single word on the actual BOM cost of DRAM. I could only
       | wish we have technology to make the current $1/GB sustainable and
       | profitable.
        
         | PaulHoule wrote:
         | Isn't it a chronic problem (since at least the 1980s) that
         | memory has cycles of gluts and shortages?
        
           | ksec wrote:
            | Yes. But I am referring to the actual production cost of
            | DRAM, not its selling price, which goes through boom-and-
            | bust cycles. BOM cost has been pretty much constant for
            | the past 10-15 years, although one could argue that,
            | adjusted for inflation, it has still gotten a bit cheaper.
        
             | cmrdporcupine wrote:
              | At the same time that RAM costs have plateaued, and
              | even seem to be going up, non-volatile storage prices
              | and speeds have been getting better and better.
             | 
             | We need more software engineering progress on paging &
             | persistent storage systems.
        
       | spintin wrote:
        | Above 64GB you need registered RAM, which increases latency.
       | 
       | So as you increase bandwidth you reduce program speed.
       | 
       | Higher frequency results in more heat.
       | 
        | We are fast approaching the need for a Wii-like Broadway
        | architecture, where the program runs in "fast" SRAM and the
        | data lives in "slow" DDR.
        
         | lmz wrote:
         | The 7800X3D already has 96MB of L3. Surely that's enough for a
         | lot of programs.
        
           | spintin wrote:
            | Yes, but the costs in terms of watts and manufacturing
            | are not scalable.
           | 
           | Also most programs have cache-misses.
        
             | _a_a_a_ wrote:
              | Why not have a 'pin-to-cache' functionality, then?
        
               | smolder wrote:
               | What would it do when you pin more things than your cache
               | can hold? Trigger an interrupt? It basically becomes
               | another memory layer you'd need to manage.
        
               | _a_a_a_ wrote:
               | Well duh. How is using SRAM going to be any different
               | when you run out of that?
        
               | smolder wrote:
               | I was not here arguing in favor of explicitly tiered
               | memory. The implied answer to your original question,
               | "why not have a pin to cache functionality?" is that it's
               | effectively the same as having another OS managed memory
               | layer, which is _bad_ since it complicates the
                | architecture. I'll take some cache misses over having
                | to manage it explicitly.
        
               | AnthonyMouse wrote:
               | Not only that, if you had enough cache to fit everything
               | then there wouldn't be cache misses, and if you didn't,
               | cache misses are pretty unavoidable.
               | 
                | It's like the existing APIs for pinning things in
                | memory
               | so they can't get paged out. They have very specific uses
               | and normal programs generally don't use them and
               | shouldn't.
        
               | kimixa wrote:
               | Much of the cache "management" can be done with
               | specialist load/store instructions that skip the cache
               | rather than being OS managed like a mapping.
        
               | foobiekr wrote:
               | They certainly have this. A lot of embedded boot loaders
               | run entirely from cache until they can bring main memory
               | up and check it.
        
         | anonymousDan wrote:
         | Sorry can you explain what registered RAM is and why it
         | increases latency?
        
           | kimixa wrote:
            | Registered memory has a buffer between the DRAM and the
            | memory controller: the DDR bus is attached to an
            | intermediate buffer chip, rather than directly to the
            | DRAM chips on the DIMM.
           | 
            | This gives the bus better electrical characteristics: the
            | path from the buffer chip to the DIMM connector can have
            | simplified routing and higher-power signaling without
            | putting more load on the DRAM chips, and the buffer
            | chip's design can focus on interface signaling rather
            | than compromising between that and the actual DRAM cells.
           | 
           | It's a bit more expensive, being an extra chip on each DIMM,
           | and has a latency penalty, as the buffer chip means
           | everything on the DDR bus is effectively 1 clock behind what
           | the DRAM chips themselves provide. But it's often necessary
           | if you have a large number of DIMMs on a single channel or
           | very long traces required for packing lots of DIMMs around a
           | CPU, as that increases the electrical capacitance and noise
           | of each path, which many DRAM chips can struggle to drive,
           | especially at higher speeds.
           | 
            | As DRAM chip density increases you can get higher
            | capacities without the longer bus traces and extra DIMMs
            | per channel that might require registered RAM. There's
            | nothing "fundamental" about 64GB needing registered RAM,
            | and you are already seeing 48GB DDR5 DIMMs that work on
            | consumer platforms, which often have no issues running 4
            | DIMMs without registered RAM.
        
       | ilaksh wrote:
       | I wonder if it's possible to design the next AI systems along
       | with the hardware at the same time. For example, maybe by
       | focusing on more approaches like mixture of experts or similar,
       | there are ways to keep much of the data close to the cores that
       | operate on it.
        
         | CyberDildonics wrote:
          | That's called CPU cache. It doesn't require "mixture of
          | experts" (whatever that would mean); it just needs
          | transistors for SRAM.
        
           | ilaksh wrote:
           | That's one example of the more general category of what I am
           | talking about. But I was trying to get just a little more
           | specific.
        
             | CyberDildonics wrote:
             | Can you give another example and explain how "mixture of
             | experts" gets data closer to a CPU?
        
       ___________________________________________________________________
       (page generated 2024-01-21 23:01 UTC)