[HN Gopher] JEDEC Extends DDR5 Memory Spec to 8800 MT/S, Adds An...
       ___________________________________________________________________
        
       JEDEC Extends DDR5 Memory Spec to 8800 MT/S, Adds Anti-Rowhammer
       Features
        
       Author : zdw
       Score  : 93 points
       Date   : 2024-04-22 14:51 UTC (8 hours ago)
        
 (HTM) web link (www.anandtech.com)
 (TXT) w3m dump (www.anandtech.com)
        
       | theandrewbailey wrote:
       | > Unfortunately, the laws of physics driving DRAM cells have not
       | improved much over the last couple of years (or decades, for that
       | matter), so memory chips still must operate with similar absolute
       | latencies, driving up the relative CAS latency. In this case 14ns
       | remains the gold standard, with CAS latencies at the new speeds
       | being set to hold absolute latencies around that mark.
       | 
       | Some gaming memory kits can do 10ns or less latency. Though I
       | guess if memory latency is your bottleneck, you should look at
       | HBM.
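
The quoted point about relative versus absolute CAS latency can be checked with a line of arithmetic. This is a minimal sketch, assuming the usual DDR convention that the clock runs at half the transfer rate. The function name and the sample CL values are illustrative assumptions, not figures taken from the article.

```python
def cas_latency_ns(cl_cycles: int, data_rate_mts: int) -> float:
    """Absolute CAS latency in nanoseconds.

    DDR moves two transfers per clock, so the clock in MHz is
    data_rate / 2 and one clock cycle lasts 2000 / data_rate ns.
    """
    return cl_cycles * 2000.0 / data_rate_mts


# Illustrative bins: the cycle count climbs with speed while the
# latency in nanoseconds stays close to the 14 ns mark.
for rate, cl in [(4800, 34), (6400, 46), (8800, 62)]:
    print(f"DDR5-{rate} CL{cl}: {cas_latency_ns(cl, rate):.2f} ns")
```

The same formula applies to kit-sticker timings, which only cover the DRAM's own CAS delay and not the rest of the path to the CPU.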
        
         | spintin wrote:
         | HBM is slower than DDR per pin; the speed gain comes from a
         | hugely parallel bus.
         | 
         | Doesn't parallel mean latency if your tasks aren't
         | "embarrassingly parallelizable"?
        
           | Tuna-Fish wrote:
           | The smallest transfer done from memory is a single cache
           | line, which on most desktop machines is 64 bytes, or 512
           | bits. You could imagine a memory bus that was 512 bits wide
           | and transferred a cache line per clock, and this would
           | improve latency when compared to a serial bus with higher
           | clock speed. HBM doesn't do that, though. Instead, every HBM3
           | module has 16 individual 64-bit channels with 8n prefetch
           | (that is, when you send a single request to a single channel,
           | it will respond with 512 bits over 8 cycles).
        
             | dist-epoch wrote:
             | DDR5 has 2 independent 32-bit subchannels per DIMM. Multiple
             | transfers are required for 64 bytes.
        
               | Tuna-Fish wrote:
               | DDR5 has a 16n prefetch, so a single request to a
               | 32-bit-wide channel moves 64 bytes.
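
The prefetch figures in this subthread reduce to channel width times burst length. This is a minimal sketch using only the widths and prefetch depths stated in the comments above. The function name is an illustrative assumption.

```python
def bytes_per_burst(channel_width_bits: int, burst_length: int) -> int:
    """Bytes delivered by one read burst on one channel."""
    return channel_width_bits * burst_length // 8


CACHE_LINE_BYTES = 64

ddr5 = bytes_per_burst(32, 16)  # 32-bit DDR5 channel, 16n prefetch
hbm3 = bytes_per_burst(64, 8)   # 64-bit HBM3 channel, 8n prefetch

# Both configurations fill exactly one 64-byte cache line per request.
print(ddr5 == CACHE_LINE_BYTES, hbm3 == CACHE_LINE_BYTES)
```

In both cases one request fills one cache line, but the data still arrives over several beats rather than in a single 512-bit-wide clock.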
        
         | moffkalast wrote:
         | I don't think they make HBM RAM kits. /s
        
         | nsteel wrote:
         | As others have said, there is nothing low latency about HBM.
         | 
         | Renesas did have a special Low Latency HBM thing at one point,
         | but I don't think it ever saw the light of day.
        
         | jeffbee wrote:
         | > Some gaming memory kits can do 10ns or less latency
         | 
         | Without a thorough analysis by real engineers my interpretation
         | of this statement is "DRAM marketers can print anything they
         | want on the sticker".
        
         | NavinF wrote:
         | > Some gaming memory kits can do 10ns or less latency
         | 
         | Source? My overclocked desktop RAM shows 45ns in benchmarks. I
         | call bullshit on 4.5x faster RAM. Most people fight for an
         | extra 5% latency reduction.
        
           | wmf wrote:
           | That's probably 10 ns for the DRAM and 35 ns for the caches
           | and memory controller.
        
       | Night_Thastus wrote:
       | I'm a bit confused, DDR5 products are already out - as are CPUs
       | and motherboards that support them.
       | 
       | How can this change happen retroactively? Would motherboard
       | manufacturers just need to update the BIOS to enable new XMP
       | configurations? (For when this new, higher transfer rate RAM
       | becomes available)
        
         | braiamp wrote:
         | > while leaving the spec open to further expansions with faster
         | memory as technology progressed
         | 
         | They only set the current standard, but left room so that, as
         | technology progresses, other speeds/timings can also be
         | JEDEC-compliant rather than being some kind of XMP profile.
         | Motherboard manufacturers do not need to upgrade their previous
         | models if the hardware doesn't meet the required S/N ratios, or
         | whatever. But they _could_ if they believe they have the
         | hardware to support it.
        
         | tadfisher wrote:
         | Not even that; this just sets standard speed/latency values for
         | memory modules without XMP. You can already exceed these
         | numbers with XMP.
         | 
         | PRAC would need handling in the memory controller, so that
         | would require a CPU update if I understand correctly.
        
           | doikor wrote:
           | PRAC should happen automatically in the background when
           | possible. When it really needs to stop the controller from
           | accessing something while waiting for the bits to refresh, it
           | uses the already existing ALERTn signal.
           | 
           | https://stefan.t8k2.com/rh/PRAC/index.html
           | 
           | > Panopticon retrofits an existing signal in the DDR
           | specification, called ALERTn, to effectively "trick" the
           | memory controller to pause issuing new DDR commands. DRAM
           | uses ALERTn to signal errors to the memory controller. Upon
           | receiving this signal, the memory controller stops issuing
           | new DRAM commands and instead re-issues the old memory
           | access. By making use of ALERTn, Panopticon requires no
           | modifications to any hardware other than DRAM itself.
           | 
           | (As I understand it, PRAC uses the same design as Panopticon
           | for this part.)
        
         | dist-epoch wrote:
         | The spec is just a bunch of numbers which are already
         | configurable and motherboards can already be set at much higher
         | frequencies.
         | 
         | It doesn't mean that any particular combination of
         | CPU/motherboard/RAM will work.
        
         | imtringued wrote:
         | It doesn't. If you buy a DDR5-6400 DIMM it doesn't get updated
         | to 8800. It will stay at 6400. This just means that
         | manufacturers will be able to brand their tested DDR5 DIMMs as
         | supporting 8800. You still need a CPU and mainboard that have
         | been validated at those speeds. You're going to need an 8700G
         | if you actually want to hit those speeds, by the way.
        
       | londons_explore wrote:
       | I'd like to see the spec tackle latency with a "send then
       | confirm" approach.
       | 
       | I.e., the RAM can reply to a read request with data, then a
       | couple of clock cycles later it can confirm (via a flag) that the
       | data it originally sent was correct.
       | 
       | This is helpful because it means the timing can be tightened to
       | the typical access time, rather than the worst-case access time
       | (e.g. the slowest preamp on the highest capacitance memory
       | row/column).
       | 
       | Things like CPUs already have provisions for handling
       | not-yet-confirmed information, and can roll back state if
       | delivered info turns out to be wrong.
       | 
       | Yes, it adds complexity to the whole system, but it seems worth
       | it for a 30% reduction in memory latency.
        
         | pshirshov wrote:
         | And potentially opens a whole new family of side channels.
        
           | touisteur wrote:
           | I wish we could mix 'I don't care about side channels, use
           | them all' with 'I'm paranoid about side channels, plug them
           | all' on the same machine. Disable speculative execution on
           | one core, no frequency adjustment, no prefetching, sr-
           | io/pcie-bypass some devices... E-cores but for the side-
           | channel-paranoid (in a good way).
        
             | smallmancontrov wrote:
             | Bring back EIEIO, like on old Macs, but perhaps with a
             | slightly expanded definition of what constitutes I/O:
             | 
             | > Enforce In-order Execution of I/O (EIEIO) is an assembly
             | language instruction used on the PowerPC central processing
             | unit (CPU) which prevents one memory or input/output (I/O)
             | operation from starting until the previous memory or I/O
             | operation completed. This instruction is needed as I/O
             | controllers on the system bus require that accesses follow a
             | particular order, while the CPU reorders accesses to
             | optimize memory bandwidth usage.
        
               | colejohnson66 wrote:
               | You mean memory fences? The big architectures (x86, ARM,
               | RISC-V) all have instructions for them.
        
               | touisteur wrote:
               | I mean permanently disable all speculative execution on a
               | specific core and reduce/disable all side-channels of the
               | kind. If you're saying I can do that through injection of
               | fence instructions between every instruction, coupled
               | with isolcpus... I might have a fun weekend coming
               | playing with Intel Pin. But I'm guessing the performance
               | hit might be worse than 'just' disabling speculative
               | execution on a core - if it was possible at all - or that
               | the fence instructions might not be enough there? Haven't
               | thought it through.
               | 
               | But it would be a fun question to ask the likes of Daniel
               | Gruss...
        
           | bee_rider wrote:
           | I mean nobody really believes there aren't already countless
           | side channels in existing hardware, right? No reason to give
           | up performance for nothing.
        
             | thfuran wrote:
             | Things are bad, so make no attempt to better or even avoid
             | worsening them?
        
               | frutiger wrote:
               | The front door is already open. Let's open the bedroom
               | window if we want more fresh air there.
        
               | bee_rider wrote:
               | Anything can be made to sound wrong or right if you get
               | abstract and vague enough.
               | 
               | We shouldn't sacrifice something for nothing.
        
               | thfuran wrote:
               | A considerable amount of effort goes into mitigating side
               | channels precisely because it isn't for nothing.
        
               | gosub100 wrote:
               | How about "One size doesn't fit all." ?
        
         | mungoman2 wrote:
         | How many cycles could this actually save? I would assume the
         | latency to actually get data from DDR is only a small part of
         | the whole round trip in an L1 miss. The actual savings would be
         | much smaller than 30%.
        
           | foota wrote:
           | Most of the cost of an L3 miss comes after the miss itself,
           | for most architectures I've seen.
           | 
           | E.g., on Skylake an L3 hit is 80 cycles (~20ns) whereas a RAM
           | access is 80 cycles plus 50 nanos (~70 nanos). See
           | https://www.7-cpu.com/cpu/Skylake_X.html
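
The question of how much a faster DRAM access saves end to end can be estimated from the Skylake figures above. This is a rough sketch under stated assumptions: a 4 GHz clock (implied by 80 cycles being about 20 ns), and a 30% cut applied either to the whole 50 ns beyond the L3 (a generous upper bound) or only to the roughly 14 ns CAS portion mentioned in the article. The function names are illustrative.

```python
def ns_from_cycles(cycles: int, clock_ghz: float) -> float:
    """Convert core clock cycles to nanoseconds."""
    return cycles / clock_ghz


def total_saving_fraction(fixed_ns: float, part_ns: float,
                          part_reduction: float,
                          other_ns: float = 0.0) -> float:
    """Fraction of the whole round trip saved when only part_ns shrinks."""
    before = fixed_ns + part_ns + other_ns
    saved = part_ns * part_reduction
    return saved / before


l3_ns = ns_from_cycles(80, 4.0)  # 80 cycles at an assumed 4 GHz = 20 ns

# Upper bound: the full 50 ns after the L3 miss gets 30% faster.
upper = total_saving_fraction(l3_ns, 50.0, 0.30)

# Narrower case: only an assumed 14 ns CAS portion gets 30% faster,
# and the remaining 36 ns of the post-L3 path is unchanged.
narrow = total_saving_fraction(l3_ns, 14.0, 0.30, other_ns=36.0)

print(f"upper bound: {upper:.1%}, CAS only: {narrow:.1%}")
```

Under these assumptions even the generous case lands well under 30% of the full round trip, which is the shape of the objection raised above.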
        
       | AzzyHN wrote:
       | I'm always a fan of bigger numbers, but I wish more
       | time/money/whatever was put into letting DDR5 run at those
       | XMP/EXPO speeds when using 4 DIMMs.
        
         | imtringued wrote:
         | How do you expect that to happen? By sharing memory channels
         | you are no longer using a point to point connection and are now
         | prone to reflections in the PCB traces where you have split the
         | signal. There is no "money" that can be put into this, that
         | won't also improve the performance of the single DIMM per
         | channel setup disproportionately. I don't even understand what
         | your point is. Quad channel support would be a much better idea
         | since it doubles your memory bandwidth, while remaining a point
         | to point connection, but you're going to complain that you
         | can't add eight DIMMs then.
        
           | zrm wrote:
           | What if you give each slot independent command pins but not a
           | complete memory channel?
           | 
           | What if you use a similar technology to registered or load-
           | reduced memory, but put the register on the system board
           | instead of the DIMM so it's in front of multiple DIMMs that
           | then share the channel into the processor but not the traces
           | on the system board? This may also allow higher capacity
           | DIMMs in consumer systems.
        
           | atlas_hugged wrote:
           | You doing ok dude?
        
         | Aurornis wrote:
         | 4 DIMMs on a consumer board means 2 DIMMs per channel. This is
         | inherently a compromise in signal integrity that must come with
         | a speed tradeoff, unfortunately. We're dealing with laws of
         | physics.
         | 
         | In the past some motherboards tried a T-topology for RAM slots
         | to optimize for 2 DIMMs per channel, but this would cause
         | problems with 1 DIMM per channel usage. Not worth it for the
         | average consumer.
        
         | smolder wrote:
         | The way to do that would be with a chip & socket that has 4
         | independent memory channels (Threadripper I think has eight
         | now, but maybe used to have four?) and a many-layered
         | motherboard that optimizes the routing and placement of each
         | DIMM slot, ideally with only 1 DIMM slot per channel for
         | maximum speed. The high end stuff with many memory channels
         | generally isn't designed for pushing RAM clocks to gaming
         | desktop speeds, though. You'd probably need to skip ECC too,
         | or overclock and set some timings manually.
        
       | oneplane wrote:
       | The article doesn't mention much about chip-to-controller
       | distance or path length. Presumably this suffers from the same
       | issue we currently see, where low power devices (and some
       | desktop configurations as well) can't really ever reach those
       | speeds unless the DRAM chips are near or on top of the CPU
       | substrate.
       | 
       | It's nearly impossible to hit those numbers in modern mobile form
       | factors; even CAMM is having a hard time getting there with
       | modularised memory.
        
         | Aurornis wrote:
         | Generally the highest speeds aren't intended for low power
         | devices. They're targeted at applications where performance is
         | the most important goal and the power tradeoffs are not an
         | issue.
         | 
         | Enthusiast motherboards and RAM kits can already exceed these
         | speeds. Having official JEDEC timings just makes these speeds a
         | more universal target for long-term high end designs.
        
           | oneplane wrote:
           | While that's true, it's also true that mobile devices tend to
           | have a rather static configuration during their lifetime, and
           | if you're going to have a fleet of those, having the best
           | performance during that lifetime is a nice bonus. So I
           | believe that form factor specific considerations are still
           | worth writing about.
        
       | snvzz wrote:
       | Please make ECC mandatory.
        
         | wmf wrote:
         | It's never going to happen because Dell counts every penny.
        
           | snvzz wrote:
           | JEDEC could standardize only ECC modules.
           | 
           | Microsoft, Intel or AMD can, anytime, require ECC for their
           | certification/logo programs.
           | 
           | Intel and AMD could even make their new chips only boot with
           | ECC.
           | 
           | And FCC could make ECC a requirement for certification.
           | 
           | All these parties (and more) are enabling non-ECC memory, to
           | the detriment of mankind.
        
       | transpute wrote:
       | https://stefan.t8k2.com/rh/PRAC/index.html
       | Chapter 16: "DDR5 Per Row Activation Counting (PRAC)". PRAC
       | introduces two key mechanisms for comprehensive Rowhammer
       | defenses: an Activation Counter for every DRAM row and a
       | mechanism that triggers when an Activation Counter reaches a
       | specific threshold. This allows the DRAM to pause the memory
       | controller from issuing new commands, giving it time to refresh
       | potential victim rows. In the words of a DRAM industry veteran
       | who will remain nameless, PRAC is the biggest change to DRAM in
       | decades. Thus, I thought I should write up a brief article
       | summarizing the change and its potential to solve Rowhammer once
       | and for all.
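
The two mechanisms described above, a counter per row and a threshold that triggers a pause, can be sketched as a toy model. This is not the JEDEC-specified behaviour: the class name, the threshold value, the reset-on-alert policy, and the choice to refresh only the two adjacent rows are all simplifying assumptions made for illustration.

```python
from collections import defaultdict


class ToyPracBank:
    """Toy model: one activation counter per row, alert at a threshold."""

    def __init__(self, threshold: int):
        self.threshold = threshold        # hypothetical value, not from the spec
        self.counters = defaultdict(int)  # row -> activations since last reset
        self.refreshed = []               # log of victim rows refreshed

    def activate(self, row: int) -> bool:
        """Count one activation; return True if the bank raises an alert."""
        self.counters[row] += 1
        if self.counters[row] < self.threshold:
            return False
        # Alert: the controller would pause here while the DRAM
        # refreshes the assumed victim rows (immediate neighbours).
        for victim in (row - 1, row + 1):
            if victim >= 0:
                self.refreshed.append(victim)
        self.counters[row] = 0
        return True


bank = ToyPracBank(threshold=4)
alerts = [bank.activate(7) for _ in range(8)]
print(alerts)          # every fourth activation of row 7 raises an alert
print(bank.refreshed)  # neighbours 6 and 8 are refreshed on each alert
```

The point of the sketch is that the DRAM itself tracks hammering and asks for time, rather than the memory controller having to guess which rows are at risk.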
        
       ___________________________________________________________________
       (page generated 2024-04-22 23:00 UTC)