[HN Gopher] Comparing DDR5 Memory from Micron, Samsung, SK Hynix
___________________________________________________________________
Comparing DDR5 Memory from Micron, Samsung, SK Hynix
Author : JoachimS
Score : 47 points
Date : 2022-02-15 19:59 UTC (3 hours ago)
(HTM) web link (www.eetimes.com)
(TXT) w3m dump (www.eetimes.com)
| bcrl wrote:
| If anyone thinks that on-die ECC is a good thing as the
| manufacturers are touting, please go read the discussions on
| this topic over in the forums at www.realworldtech.com. The goal
| of on-die ECC is purely to ensure that DRAM manufacturers are
| able to obtain better yields by reducing the impact of defects,
| which is not the same as ensuring data integrity. This means it
| fails the "trust but verify" tenet. Even worse, some failures
| may not even get reported to the system, as is the case with ECC
| implemented in the memory controllers and caches of modern CPUs.
| The industry is trying to make this look like a good thing, but
| I'm on the same side as Linus Torvalds: all modern systems
| should ship with ECC memory. IBM got it right with parity memory
| in the IBM PC.
| grue_some wrote:
| DDR5 contains two forms of ECC. The first is standard ECC, which
| is used to correct bit flips in transmission. The second, on-die
| ECC, is used to correct bit flips on the die, hence the name.
| The world has already accepted that standard ECC on high speed
| interfaces is a good idea, so why would on-die ECC be a bad
| idea? Yes, they correct different error types, but they both
| attempt to correct corrupted bits, and they do so in a
| mathematically similar way.
|
| All that said, there are still ECC DIMMs (with extra ECC memory
| chips) and non-ECC DIMMs for DDR5. So if the on-die ECC is
| concerning for anyone, they can still get a DIMM with separate
| ECC memory. But the ECC happening at the interface between the
| DIMM and the CPU will always exist and you will have to trust
| it.
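|
| As a minimal sketch of the shared math (a toy Hamming(7,4) code
| in Python; real DDR5 on-die ECC reportedly uses a larger 128+8
| single-error-correcting code, but the idea of locating and
| flipping back a single bad bit is the same):
|
|   # Toy Hamming(7,4): 4 data bits protected by 3 parity bits.
|   def encode(d):                     # d = [d1, d2, d3, d4]
|       p1 = d[0] ^ d[1] ^ d[3]
|       p2 = d[0] ^ d[2] ^ d[3]
|       p3 = d[1] ^ d[2] ^ d[3]
|       # codeword positions 1..7: p1 p2 d1 p3 d2 d3 d4
|       return [p1, p2, d[0], p3, d[1], d[2], d[3]]
|
|   def correct(c):
|       # Each syndrome bit re-checks one parity group; the sum is
|       # the 1-based position of the flipped bit (0 = no error).
|       s = (c[0] ^ c[2] ^ c[4] ^ c[6]) \
|           + 2 * (c[1] ^ c[2] ^ c[5] ^ c[6]) \
|           + 4 * (c[3] ^ c[4] ^ c[5] ^ c[6])
|       if s:
|           c[s - 1] ^= 1              # flip the located bit back
|       return c
|
|   cw = encode([1, 0, 1, 1])
|   cw[4] ^= 1                         # one bit flips in the array
|   assert correct(cw) == encode([1, 0, 1, 1])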
| bcrl wrote:
| Again, going back to the discussions over on RWT: some of the
| less robust forms of ECC that DRAM manufacturers typically
| implement can end up amplifying the problem by turning double-
| bit flips into silent multi-bit flips, which makes the memory
| controller's job much harder. DRAM manufacturing process tech
| is not optimized for logic the way CPU processes are, and those
| limitations really do constrain how much logic (and thus how
| good) the ECC implemented on DRAM chips can be. I trust CPU
| manufacturers to get memory controllers right more than I trust
| DRAM manufacturers to get ECC right for one simple reason: row
| hammer.
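|
| To make the amplification concrete, here is a minimal sketch in
| Python with a toy Hamming(7,4) single-error-correcting code (not
| the actual on-die code): a plain SEC code cannot tell one flip
| from two, so its "fix" flips a third bit and hands back a clean-
| looking but wrong word.
|
|   # With two bits flipped, the syndrome still points at *some*
|   # position, so the decoder "corrects" a third bit and returns
|   # a valid-looking but wrong codeword (silent corruption).
|   def encode(d):
|       p1, p2, p3 = d[0]^d[1]^d[3], d[0]^d[2]^d[3], d[1]^d[2]^d[3]
|       return [p1, p2, d[0], p3, d[1], d[2], d[3]]
|
|   def correct(c):
|       s = (c[0]^c[2]^c[4]^c[6]) + 2*(c[1]^c[2]^c[5]^c[6]) \
|           + 4*(c[3]^c[4]^c[5]^c[6])
|       if s:
|           c[s - 1] ^= 1
|       return c
|
|   good = encode([1, 0, 1, 1])
|   bad = list(good)
|   bad[2] ^= 1; bad[5] ^= 1                         # two bit flips
|   fixed = correct(bad)
|   print(fixed != good)                             # True: wrong data
|   print(sum(a != b for a, b in zip(fixed, good)))  # 3 bits differ now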
| kimixa wrote:
| It's entirely possible that on-die ECC is still a good thing
| for the end user - to really judge, you'd need to compare the
| error rate (and the proportion the ECC corrected) of dies that
| would have previously failed validation. It may be that it's
| good for both - i.e. more dies can be used (so higher supply
| and lower prices to the consumer), yet the uncorrected error
| rate is still lower than that of dies that would have
| previously passed validation but lack on-die ECC.
|
| I doubt any manufacturer would make that public, but an
| estimate may be possible if error rates actually start
| increasing in the real world due to DDR5 allowing this.
|
| I agree that end-to-end ECC really should be the default for
| consumer products these days, but so long as the big players
| see it as a "Professional User" product differentiation point
| it'll always be more expensive than it should be.
| deckard1 wrote:
| > so long as the big players see it as a "Professional User"
| product differentiation point it'll always be more expensive
| than it should be.
|
| Right. The more important Linus to speak up for ECC isn't
| Torvalds. It's Linus Sebastian, of Linus Tech Tips. He's made
| a few videos on ECC targeted towards gamers. Gamers drive the
| enthusiast PC market and when they start caring, more ECC
| gets made which will drive the cost down a bit. Last time I
| bought 32GB DDR4 UDIMM ECC there was literally one SKU. Not
| manufacturer. Not brand. _SKU_. One single item in production
| in the entire world. 16GB wasn't much better off, either.
|
| It's a hard sell, though. Non-ECC will always be cheaper
| because it costs less to produce. Gamers don't really care
| that ECC prevents one crash in years because they are used to
| frequent crashes already. They are largely being fed dogshit
| from the AAA gaming industry and they have learned to just
| deal with it. Crashes are just part of being on the bleeding
| edge of gaming and Nvidia/Radeon drivers. One less crash in a
| sea of crashes isn't something gamers are lining up for. But
| a better model GPU or bigger SSD? It's an obvious choice.
| kimixa wrote:
| > Gamers don't really care that ECC prevents one crash in
| years because they are used to frequent crashes already.
|
| I work on GPU drivers for one of those companies.
|
| We regularly get reports and backtraces that cannot be
| reproduced, or "Cannot Happen" without some external factor
| (e.g. some other bit of code poking around our memory
| space). Often they're just silently dropped or ignored on
| the long tail of issues that nobody can get any traction
| on.
|
| My understanding is that the stats from hyperscalers show
| that ECC correction events happen a lot more often than
| "common knowledge" may imply - I wonder just what proportion
| of things that are blamed on software may actually be due to
| hardware issues like this?
|
| Again, without a significant change in the market (i.e.
| enough gamers start using ECC to actually be statistically
| relevant and compare stability) this cannot really be
| tested, but I've wondered.
| bcrl wrote:
| Except that anyone using Intel desktop CPUs pretty much
| can't use ECC thanks to marketing deciding that ECC is a
| market segmentation feature.
|
| The real way to make ECC happen industry wide is for OS
| vendors like Microsoft to make it a platform requirement.
| A no ECC, no boot policy would change things overnight.
| Sadly, we can't even get DRAM manufacturers to fix row
| hammer properly, so the likelihood of this happening is
| pretty much nil.
| sliken wrote:
| If people cared, they would buy ECC-capable chips. In fact
| my desktop is a Xeon E3-1230 v5, which was cheaper and
| slightly slower (3.4 vs 3.6 GHz or something) than the
| equivalent i7. It was $50 more for the motherboard and $100
| more for the RAM. I'm sure if the market flocked to ECC-
| capable chips (the silicon is the same) Intel would sell
| them.
|
| So many people grumble, but I'm not really sure Intel
| should push ECC if desktop users aren't willing to pay a
| modest premium for it.
|
| Many cheer AMD, which does not disable ECC on desktop
| chips, but neither do they promise ECC will actually work.
| It's a confusing mess between physical capacity (RAM
| increases by 16GB when you add a 16GB ECC DIMM) and
| actually correcting errors and telling the OS about the
| event. Only on EPYC does AMD test and certify that ECC will
| work.
| wmf wrote:
| You can use ECC by buying the Xeon version which is only
| slightly more expensive.
| g42gregory wrote:
| Maybe I am not understanding something, but I thought that total
| memory bandwidth is critical for Deep Learning applications. This
| is where HBM on-die would shine, no? I am deferring the purchase
| of a new desktop/server until processors with HBM come to market.
| I think AMD is shipping EPYC engineering samples with some
| version of this memory, and Intel's release is slated for the
| end of the year. Am I wrong about this?
| wmf wrote:
| The only CPU with HBM is Sapphire Rapids and it may cost $20K;
| for that money you're probably better off buying an H100.
| hulitu wrote:
| The article seems to imply that all DDR5 chips have ECC. Is this
| true?
| sliken wrote:
| Yes, as discussed in other threads here, the on-die ECC helps
| increase chip yields, but does not prevent off-chip errors. So
| it's not equivalent to what people normally mean by ECC memory,
| which stores extra check bits that will correct single-bit
| errors and detect 2-bit errors anywhere along the path: chip,
| DIMM, DIMM slot, motherboard, socket, or CPU.
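|
| A minimal sketch of that SECDED behaviour in Python (toy sizes,
| not the actual 72-bit DIMM code: a Hamming(7,4) code plus one
| overall parity bit):
|
|   # Toy SECDED: single-bit errors are corrected, double-bit
|   # errors are flagged as uncorrectable instead of being
|   # silently "fixed" into the wrong data.
|   def encode(d):
|       p1, p2, p3 = d[0]^d[1]^d[3], d[0]^d[2]^d[3], d[1]^d[2]^d[3]
|       c = [p1, p2, d[0], p3, d[1], d[2], d[3]]
|       return c + [sum(c) % 2]            # overall parity bit
|
|   def decode(c):
|       s = (c[0]^c[2]^c[4]^c[6]) + 2*(c[1]^c[2]^c[5]^c[6]) \
|           + 4*(c[3]^c[4]^c[5]^c[6])
|       overall = sum(c) % 2               # 1 iff an odd number of flips
|       if s == 0 and overall == 0:
|           return c, "ok"
|       if overall == 1:                   # odd count: assume a single flip
|           c[s - 1] ^= 1                  # s == 0 -> c[-1], the parity bit
|           return c, "corrected"
|       return c, "uncorrectable"          # even count, bad syndrome: 2 flips
|
|   cw = encode([1, 0, 1, 1])
|   one = list(cw); one[4] ^= 1
|   two = list(cw); two[2] ^= 1; two[5] ^= 1
|   print(decode(one)[1])                  # corrected
|   print(decode(two)[1])                  # uncorrectable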
| tester756 wrote:
| >DDR5 provides both data and clock rates that double the
| performance up to at least 7,200 MB/s. Additionally, DDR5 lowers
| the operating voltage to 1.1V.
|
| hmm? 7GB/s is the performance that modern disks achieve
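|
| (Presumably the article means 7,200 MT/s, i.e. transfers per
| second, not MB/s. A quick back-of-the-envelope in Python,
| assuming a single 64-bit DDR5 channel:)
|
|   # Peak bandwidth = transfers/second * bytes per transfer.
|   transfer_rate = 7200e6           # DDR5-7200: 7,200 MT/s
|   bytes_per_transfer = 64 // 8     # one 64-bit channel (2x32-bit subchannels)
|   peak = transfer_rate * bytes_per_transfer
|   print(f"{peak / 1e9:.1f} GB/s")  # 57.6 GB/s, well beyond any NVMe disk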
| kamilner wrote:
| Why is it that LPDDR has recently been faster than DDR of the
| same 'generation'? I thought LPDDR was purely a lower-voltage
| version of DDR, so I naively would have expected worse
| performance. Is it because it's typically closer (physically)
| to the CPU?
| bcrl wrote:
| DDR is typically a bus with more than one DIMM slot per channel.
| LPDDR is typically point-to-point. Electrically, it's a lot
| easier to meet signal integrity requirements on a point-to-point
| trace than it is to make a multi-drop bus work properly.
| grue_some wrote:
| LPDDR uses a wider bus so, at a similar clock rate, it is
| faster.
| dhdc wrote:
| More importantly, because of the low-power requirement, LPDDR
| typically has better-binned dies than DDR.
| sliken wrote:
| I believe it's just the advantages you get from very short trace
| lengths. DIMM slots are usually inches away, so you end up with
| long traces from CPU -> DIMM slot, pay the overhead of the DIMM
| slot connection, and then more traces within the DIMM.
|
| LPDDR, on the other hand, moves the individual DRAM chips as
| close as possible to the CPU and doesn't have any connector.
| This also makes it much easier to have wider memory. A 13" MBP
| can have a 512-bit-wide memory system with at least 16 channels
| in a thin/light laptop that is quite power efficient. To get
| something similar with DIMMs you'd have to buy a dual-socket
| server motherboard with 8 channels per socket, and you'd be
| lucky to fit that in an ATX-size motherboard in a 1.75"-thick
| chassis.
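|
| A rough comparison in Python (illustrative widths and transfer
| rates, not tied to any specific product):
|
|   # Peak bandwidth scales with bus width times transfer rate.
|   def peak_gb_s(bus_width_bits, mega_transfers_s):
|       return bus_width_bits / 8 * mega_transfers_s * 1e6 / 1e9
|
|   print(peak_gb_s(128, 6400))   # 2-channel DDR5 desktop: ~102 GB/s
|   print(peak_gb_s(512, 6400))   # 512-bit LPDDR5 package: ~410 GB/s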
___________________________________________________________________
(page generated 2022-02-15 23:01 UTC)