[HN Gopher] The Future of Memory
___________________________________________________________________
The Future of Memory
Author : PaulHoule
Score : 52 points
Date : 2024-01-20 13:33 UTC (1 day ago)
(HTM) web link (semiengineering.com)
(TXT) w3m dump (semiengineering.com)
| ksec wrote:
| Not a single word on the actual BOM cost of DRAM. I can only
| wish we had the technology to make the current $1/GB sustainable
| and profitable.
| PaulHoule wrote:
| Isn't it a chronic problem (since at least the 1980s) that
| memory has cycles of gluts and shortages?
| ksec wrote:
| Yes. But I am referring to the actual production cost of
| DRAM, not its selling price, which goes through boom and bust
| cycles. BOM cost has been pretty much constant for the past
| 10-15 years, although one could argue that, adjusted for
| inflation, it has still gotten a bit cheaper.
| cmrdporcupine wrote:
| At the same time that RAM costs have plateaued and even seem
| to be going up, non-volatile storage prices and speeds have
| kept getting better.
|
| We need more software engineering progress on paging &
| persistent storage systems.
| spintin wrote:
| Above 64GB you need registered RAM, which increases latency.
|
| So as you increase capacity you reduce program speed.
|
| Higher frequency results in more heat.
|
| We are fast approaching the need for a Wii-like Broadway
| architecture where the program runs in "fast" SRAM and the
| data sits in "slow" DDR.
| lmz wrote:
| The 7800X3D already has 96MB of L3. Surely that's enough for a
| lot of programs.
| spintin wrote:
| Yes, but the costs in terms of power and manufacturing don't
| scale.
|
| Also, most programs have cache misses.
| _a_a_a_ wrote:
| Why not have a 'pin-to-cache' functionality then?
| smolder wrote:
| What would it do when you pin more things than your cache
| can hold? Trigger an interrupt? It basically becomes
| another memory layer you'd need to manage.
| _a_a_a_ wrote:
| Well duh. How is using SRAM going to be any different
| when you run out of that?
| smolder wrote:
| I was not here arguing in favor of explicitly tiered
| memory. The implied answer to your original question,
| "why not have a pin-to-cache functionality?", is that it's
| effectively the same as having another OS-managed memory
| layer, which is _bad_ since it complicates the
| architecture. I'll take some cache misses over having to
| manage it explicitly.
| AnthonyMouse wrote:
| Not only that, if you had enough cache to fit everything
| then there wouldn't be cache misses, and if you didn't,
| cache misses are pretty unavoidable.
|
| It's like the existing APIs for pinning things in memory
| so they can't get paged out. They have very specific uses,
| and normal programs generally don't use them and
| shouldn't.
| kimixa wrote:
| Much of the cache "management" can be done with specialist
| load/store instructions that bypass the cache, rather than
| being OS-managed like a mapping.
| foobiekr wrote:
| They certainly have this. A lot of embedded boot loaders
| run entirely from cache until they can bring main memory
| up and check it.
| anonymousDan wrote:
| Sorry can you explain what registered RAM is and why it
| increases latency?
| kimixa wrote:
| Registered memory has a buffer for communication between the
| DRAM and the memory controller. So the DDR bus is attached to
| an intermediate buffer chip, rather than directly to the DRAM
| chips on the DIMM.
|
| This can give the bus better electrical characteristics: the
| buffer chip to the DIMM connector can have simplified routing
| and higher-power signaling without putting more load on the
| DRAM chips, and the buffer chip design can focus on this
| interface signaling rather than compromising between that and
| the actual DRAM cells.
|
| It's a bit more expensive, being an extra chip on each DIMM,
| and has a latency penalty, as the buffer chip means
| everything on the DDR bus is effectively 1 clock behind what
| the DRAM chips themselves provide. But it's often necessary
| if you have a large number of DIMMs on a single channel or
| very long traces required for packing lots of DIMMs around a
| CPU, as that increases the electrical capacitance and noise
| of each path, which many DRAM chips can struggle to drive,
| especially at higher speeds.
|
| As DRAM chip density increases you can get higher capacities
| without the longer bus traces and more DIMMs per channel that
| might require registered RAM. There's nothing "fundamental"
| about 64GB needing registered RAM, and you are already seeing
| 48GB DDR5 DIMMs that work on consumer platforms, which often
| have no issues running four DIMMs without registered RAM.
| ilaksh wrote:
| I wonder if it's possible to design the next AI systems along
| with the hardware at the same time. For example, maybe by
| focusing on more approaches like mixture of experts or similar,
| there are ways to keep much of the data close to the cores that
| operate on it.
| CyberDildonics wrote:
| That's called CPU cache. It doesn't require "mixture of
| experts" (whatever that would mean); it just needs transistors
| for SRAM.
| ilaksh wrote:
| That's one example of the more general category of what I am
| talking about. But I was trying to get just a little more
| specific.
| CyberDildonics wrote:
| Can you give another example and explain how "mixture of
| experts" gets data closer to a CPU?
___________________________________________________________________
(page generated 2024-01-21 23:01 UTC)