[HN Gopher] ECC RAM should be a human right
___________________________________________________________________
ECC RAM should be a human right
Author : zdw
Score : 49 points
Date : 2023-01-21 21:44 UTC (1 hour ago)
(HTM) web link (dmitrybrant.com)
(TXT) w3m dump (dmitrybrant.com)
| greenbit wrote:
| Commodity PCs back in the 80s and 90s didn't have error
| correction, but they did have parity (iirc). Parity costs one
| extra bit per byte, while correcting a flipped bit needs more: a
| Hamming code over a single byte takes four check bits, which is
| why ECC is usually computed over wider words (8 check bits per
| 64 data bits). I recall around 1990 you could get your 30-pin
| SIMMs as 9-bit (parity) or 8-bit (no parity), and virtually all
| of the PCs at the time wanted the 9-bit modules. Parity can't
| correct errors, but at least it can cause an exception when you
| read something that's had a bit flip.
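The parity-versus-correction trade-off above can be sketched in Python. This is a minimal illustration assuming even parity over one byte (real chipsets may have used odd parity) and the textbook bound for a single-error-correcting Hamming code; the function names are hypothetical.

```python
def parity_bit(byte: int) -> int:
    """Even parity: the 9th bit makes the total number of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte: int, stored_parity: int) -> bool:
    """A parity check can only say 'something flipped', not which bit."""
    return parity_bit(byte) == stored_parity


def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1, the bound for a
    single-error-correcting Hamming code."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


value = 0b1011_0010
p = parity_bit(value)
print(parity_ok(value ^ 0b0000_1000, p))  # False: one flipped bit is detected
print(parity_ok(value ^ 0b0000_0011, p))  # True: two flipped bits slip through
print(hamming_check_bits(8))   # 4 check bits to correct one bit in a byte
print(hamming_check_bits(64))  # 7 (plus 1 for double-error detection = 8)
```

Widening the protected word is what makes correction affordable: 8 check bits per 64 data bits is the same one-bit-per-byte overhead as parity, but buys single-bit correction and double-bit detection.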
| Felger wrote:
| Yes, I can indeed recall systems being shipped with 4x 9-bit
| sticks.
|
| And god, the horrific price of those sticks...
| phkahler wrote:
| Unfortunately DDR5 is going to complicate rather than fix the
| story.
| zdw wrote:
| It's very strange that DDR5 mandates internal ECC within each
| physical package, but not on the longer and possibly more EMI
| sensitive connections between the memory chips and controller.
|
| I would have thought that the additional cost would be minimal
| (additional wiring on the logic board in some cases), but maybe
| this is just more artificial market segmentation?
| ilyt wrote:
| They have internal ECC because it lets them get higher
| yields: what would have been considered a faulty chip in DDR4
| can now be sold as DDR5. So it is effectively a cost-reducing
| measure for them. Exposing that to the user would not only
| cost extra pennies, but could also have users go "hey, this
| stick is shit, look at how many correctable errors it is
| producing, please replace it"
| arp242 wrote:
| Why is that?
| loeg wrote:
| DDR5 will have some minimal ECC on the stick but critically
| does not mandate full runs to the CPU. Or in Wikipedia's
| words:
|
| > Unlike DDR4, all DDR5 chips have on-die ECC, where errors
| are detected and corrected before sending data to the CPU.
| This, however, is not the same as true ECC memory with an
| extra data correction chip on the memory module. DDR5's on-
| die error correction is to improve reliability and to allow
| denser RAM chips which lowers the per-chip defect rate. There
| still exist non-ECC and ECC DDR5 DIMM variants; the ECC
| variants have extra data lines to the CPU to send error-
| detection data, letting the CPU detect and correct errors
| that occurred in transit.
|
| So in some ways it is better than previous generations, but
| it gives vendors another excuse not to implement full-
| coverage ECC. That's my guess of why GP said it complicates
| things.
|
| https://en.wikipedia.org/wiki/DDR5_SDRAM
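The distinction in the Wikipedia quote (errors corrected inside the chip versus errors caught end to end by the CPU's memory controller) can be modeled with a toy simulation. This sketch assumes a textbook single-error-correcting Hamming code over 8-bit words for both layers, which is not the code DDR5 actually uses; `on_die_read` and `side_band_read` are hypothetical names.

```python
from typing import List, Optional, Tuple


def hamming_encode(data: int, k: int = 8) -> List[int]:
    """Encode k data bits as a SEC Hamming codeword. Index 0 is unused;
    check bits sit at the power-of-two positions 1, 2, 4, 8, ..."""
    r = 0
    while 2 ** r < k + r + 1:
        r += 1
    n = k + r
    code = [0] * (n + 1)
    bit = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two: a data position
            code[pos] = (data >> bit) & 1
            bit += 1
    for i in range(r):
        p = 1 << i
        # each check bit makes the parity of the positions it covers even
        code[p] = sum(code[q] for q in range(1, n + 1) if q & p and q != p) % 2
    return code


def hamming_decode(code: List[int]) -> Tuple[int, bool]:
    """Correct at most one flipped bit; return (data, corrected)."""
    code = code[:]
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions points at the bad bit
    if 0 < syndrome <= n:
        code[syndrome] ^= 1
    data, bit = 0, 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):
            data |= code[pos] << bit
            bit += 1
    return data, syndrome != 0


def on_die_read(stored: List[int], flip_bit: Optional[int] = None) -> int:
    """On-die ECC: the chip corrects internally, then sends bare data bits.
    flip_bit is a data-bit index (0-7) corrupted on the bus."""
    data, _ = hamming_decode(stored)
    if flip_bit is not None:
        data ^= 1 << flip_bit  # flip in transit: nothing left to detect it
    return data


def side_band_read(stored: List[int], flip_pos: Optional[int] = None) -> Tuple[int, bool]:
    """Side-band ECC: the whole codeword crosses the bus and the memory
    controller decodes it. flip_pos is a codeword position (1-12)."""
    on_wire = stored[:]
    if flip_pos is not None:
        on_wire[flip_pos] ^= 1
    return hamming_decode(on_wire)


word = 0b0110_1001
cw = hamming_encode(word)
print(on_die_read(cw, flip_bit=2) == word)  # False: silent corruption
print(side_band_read(cw, flip_pos=3))       # (105, True): caught and fixed
```

In real DDR5 ECC DIMMs both layers coexist; the toy only shows that on-die correction says nothing about what happens on the way to the CPU.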
| adhoc32 wrote:
| The latest Intel desktop CPUs (e.g. the i9-13900KF) support ECC
| with the W680 chipset.
| fortran77 wrote:
| One of the main reasons I buy Xeon desktops is the ECC. With
| 128 GB of memory, and 1 bitflip/GB/year average error rate, it
| seems too risky to not use ECC for production work.
| Retric wrote:
| Real world numbers are closer to 1 bitflip/GB/hour than year
| because bit flips are highly correlated.
|
| "A large-scale study based on Google's very large number of
| servers was presented at the SIGMETRICS/Performance '09
| conference.[6] The actual error rate found was several orders
| of magnitude higher than the previous small-scale or
| laboratory studies, with between 25,000 (2.5 x 10^-11
| error/bit*h) and 70,000 (7.0 x 10^-11 error/bit*h, or 1 bit
| error per gigabyte of RAM per 1.8 hours) errors per billion
| device hours per megabit. More than 8% of DIMM memory modules
| were affected by errors per year."
| https://en.wikipedia.org/wiki/ECC_memory
|
| A random stick of non-ECC memory might be far better than
| average, but you don't know.
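The units in the quoted figures are easy to misread, so here is the arithmetic as a short Python sketch. It assumes 1 GB = 8 x 10^9 bits, the convention that reproduces the quoted "1 bit error per gigabyte of RAM per 1.8 hours"; the 128 GB figure comes from fortran77's comment above, and the function names are hypothetical.

```python
def fit_per_mbit_to_per_bit_hour(fit_per_mbit: float) -> float:
    """Errors per 10**9 device-hours per Mbit -> errors per bit per hour."""
    return fit_per_mbit / 1e9 / 1e6


BITS_PER_GB = 8e9  # decimal gigabyte, matching the quoted 1.8-hour figure


def errors_per_gb_hour(rate_per_bit_hour: float) -> float:
    return rate_per_bit_hour * BITS_PER_GB


for fit in (25_000, 70_000):
    per_gb_h = errors_per_gb_hour(fit_per_mbit_to_per_bit_hour(fit))
    print(f"{fit}: {per_gb_h:.2f} errors/GB/h, "
          f"one per {1 / per_gb_h:.1f} h per GB, "
          f"{128 * per_gb_h:.0f}/h across 128 GB")
# 25000: 0.20 errors/GB/h, one per 5.0 h per GB, 26/h across 128 GB
# 70000: 0.56 errors/GB/h, one per 1.8 h per GB, 72/h across 128 GB
```

These are fleet-wide averages. Per the same quote, only a bit over 8% of DIMMs saw any error in a year, so the errors are concentrated in a minority of modules rather than spread evenly, which is why an individual non-ECC stick is an unknown.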
| skunkworker wrote:
| I wish those motherboards didn't cost $450+. I've contemplated
| building a home server with a 13th gen + ECC because you also
| get Quick Sync onboard.
| coder543 wrote:
| Exactly. $450 for a motherboard just to get ECC support is
| ridiculous. I don't know how it is with AM5, but on AM4, you
| could use ECC memory with many normally-priced motherboards.
|
| Mentioning W680 feels pointless. You've _always_ been able to
| buy high end motherboards and stick ECC in them. The entire
| point of the article is that _all_ computers should be using
| ECC RAM, not just the expensive, workstation class computers.
| Dylan16807 wrote:
| It's worth keeping in mind that the chipset has zero
| involvement in ECC. The CPU is directly attached to the memory
| slots. They're using the chipset as an expensive dongle.
| NelsonMinar wrote:
| Still wild to me we lost ECC RAM. It used to be standard in PCs.
|
| Does Apple hardware come with ECC RAM? If anyone could make it
| make sense as a business, it's them.
| pram wrote:
| The Xeon-based Macs had ECC, of course. None of the ARM ones do
| (yet).
| [deleted]
| MBCook wrote:
| When was it standard? It's been the high-end extra thing for as
| long as I can remember.
| MisterTea wrote:
| I know the Pentium Pro/2/3 chipsets and motherboards all(?)
| supported it. Unsure of the Pentium 1 as the 430TX on my Tyan
| Tomcat IV doesn't, and that is a dual processor board. 486
| and earlier likely depended on the chipset as there were
| many.
|
| At work I have two working Slot 1 PIII 800s, each with 1GB
| ECC (4x 256MB DIMMs) on a regular Asus board (doing nothing
| but waiting to go home with me one day). The board reports
| the RAM is in fact ECC and that it is enabled.
| NelsonMinar wrote:
| I was thinking of 386 era computers and strictly speaking it
| was just parity RAM, not ECC. Which often led to annoyances
| when a single parity error would cause your whole computer to
| halt.
|
| Wikipedia says "By the mid-1990s, most DRAM had dropped
| parity checking as manufacturers felt confident that it was
| no longer necessary.".
| https://en.wikipedia.org/wiki/RAM_parity
|
| I'd love to read a technical deep dive on RAM reliability
| over time. You'd think with increasing memory cell density
| and overall larger RAM the number of absolute errors on a
| desktop computer would be going up over time.
| Felger wrote:
| I can remember 486 motherboards in Packard Bell systems (quite
| the entry-level brand...) frequently using 36-bit ECC FPM SIMMs.
|
| Printers and plotters from this era used ECC modules most of
| the time.
|
| But by the end of the century, they were replaced by
| unbuffered, unregistered 16/32/64-bit modules.
|
| Every mid-range server still uses ECC. Entry HPE servers use
| ECC UREG (unregistered, 9 chips) modules, while mid-range and
| higher use ECC REG modules (9 chips + interface controller
| onboard). Ironically, UREG modules are more expensive than ECC
| REG ones.
|
| Also, most workstations used ECC modules, though less
| frequently over the last 4-5 years.
| [deleted]
| dale_glass wrote:
| ECC RAM would actually be a boon to everyone, including gamers.
|
| ECC means not only that you know precisely when you've gone too
| far with overclocking, but potentially allows overclocking a bit
| further, relying on the fact that some amount of trouble can now
| be tolerated.
|
| It also means you're not going to break your OS by playing with
| this stuff. Memory corruption carries a huge risk of disk
| corruption, which can mean things like corrupt data, random
| crashes or an unbootable system that persists even after
| reverting everything to defaults.
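On Linux, knowing "precisely when you've gone too far" usually means watching the kernel's EDAC counters. A minimal sketch, assuming an EDAC driver for your memory controller is loaded and exposes the standard `/sys/devices/system/edac/mc/mcN/ce_count` (corrected) and `ue_count` (uncorrected) files; the helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Tuple

EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_ROOT) -> Dict[str, Tuple[int, int]]:
    """Return {controller: (corrected, uncorrected)} for each mcN directory.
    An empty dict means no EDAC driver is exposing counters."""
    if not root.is_dir():
        return {}
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        ce = int((mc / "ce_count").read_text().strip())
        ue = int((mc / "ue_count").read_text().strip())
        counts[mc.name] = (ce, ue)
    return counts


def new_errors(before: Dict[str, Tuple[int, int]],
               after: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, int]]:
    """Per-controller increase in (corrected, uncorrected) counts."""
    return {mc: (after[mc][0] - before[mc][0], after[mc][1] - before[mc][1])
            for mc in after if mc in before}
```

Take a snapshot with `read_edac_counts()`, run a memory stress test at the new settings, take a second snapshot, and pass both to `new_errors`; any increase in corrected errors suggests the settings are already past the edge.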
| p1necone wrote:
| The sweet spot for overclocking ECC ram is still before it
| starts malfunctioning. If it's clocked higher but is correcting
| for errors it will still be slower.
| ilyt wrote:
| Entirely depends on error rate
| [deleted]
| RealityVoid wrote:
| I doubt that would actually be useful with overclocking. I
| don't know the arch of the modern PC well enough to say with
| 100% confidence, but on embedded arches, the RAM has the parity
| bits checked when they get placed on the bus. If the error
| happens on data retrieval (or was already present), then the
| ECC saves you, but if it happens anywhere else... not really? I
| don't know if ALUs, for example, automatically include the
| parity bits in the computation.
| p1mrx wrote:
| You're talking about overclocking the CPU. ECC is more
| relevant when overclocking the RAM itself, which also affects
| gaming performance.
| Dylan16807 wrote:
| They specifically mean overclocking the memory.
| jjtheblunt wrote:
| Totally, embarrassingly naive question: why bother
| overclocking?
| dale_glass wrote:
| I think it's mostly pointless in this day and age.
|
| I'm just saying that it has a potential appeal for gamers
| too, so it's not just a datacenter type of technology that
| some nerds want to play with.
|
| At the very least it'd make overclocking safer and easier, so
| any manufacturer making gamer type boards with a lot of
| overclocking settings in the BIOS should like the idea of it.
| eric__cartman wrote:
| Some people prefer to trade off stability for a slight
| performance improvement. With modern hardware I don't think
| it's worth it to be honest. I want my computer to work day in
| and day out even if it means a 2% lower score in some
| benchmark.
| jjeaff wrote:
| There are also a lot of cases where you can overclock
| without sacrificing stability. The standard clock speed for
| any line of processors is simply the minimum it is tested
| for. But you sometimes get lucky and get a better chip with
| more viable transistors, so you can boost the clock on those
| and reap the benefits without any drawbacks.
|
| There are sites and services that do "binning" where they
| test the specific chips and you can buy ones that have been
| vetted to clock higher.
| [deleted]
| LanternLight83 wrote:
| I'm with you, but it's worth noting that errant bit flips are
| also the most convincing argument for vertically integrated
| filesystems like ZFS and BTRFS.
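End-to-end checksumming helps because the checksum is stored apart from the data and verified on every read, so silent corruption is caught and, given a redundant copy, repaired. Below is a toy Python sketch of that idea, not how ZFS or BTRFS are actually implemented; the class and method names are hypothetical, and SHA-256 stands in for whatever checksum a real filesystem is configured to use.

```python
import hashlib
from typing import Dict, Tuple


class ChecksummedMirror:
    """Toy two-way mirror: checksums live apart from the data blocks and
    are checked on every read (hypothetical names throughout)."""

    def __init__(self) -> None:
        self.disks: Tuple[Dict[int, bytes], Dict[int, bytes]] = ({}, {})
        self.sums: Dict[int, bytes] = {}  # block id -> SHA-256 digest

    def write(self, block_id: int, data: bytes) -> None:
        self.sums[block_id] = hashlib.sha256(data).digest()
        for disk in self.disks:
            disk[block_id] = data

    def read(self, block_id: int) -> bytes:
        want = self.sums[block_id]
        for disk in self.disks:
            data = disk[block_id]
            if hashlib.sha256(data).digest() == want:
                for other in self.disks:  # self-heal any bad copy
                    other[block_id] = data
                return data
        raise OSError(f"block {block_id}: every copy fails its checksum")


store = ChecksummedMirror()
store.write(7, b"important bytes")
store.disks[0][7] = b"importent bytes"  # simulate silent corruption on one disk
print(store.read(7))                    # b'important bytes', and disk 0 is repaired
```

A checksum only protects data from the moment it is computed. If a RAM bit flips before the write path hashes the block, the filesystem will faithfully store and verify the corrupted bytes, which is why checksumming filesystems complement ECC rather than replace it.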
| whitepoplar wrote:
| At least make it user-configurable! I'd trade off a bit of memory
| capacity for ECC protection in a heartbeat.
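How much capacity that trade would cost depends on how wide a word each code protects. A rough Python sketch, assuming a textbook SECDED Hamming code (single-error correction plus one overall parity bit for double-error detection) carved out of ordinary memory; the function names are hypothetical, and real implementations may use different codes or layouts.

```python
def secded_check_bits(data_bits: int) -> int:
    """Hamming SEC bound 2**r >= data_bits + r + 1, plus one overall parity bit."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


def usable_fraction(data_bits: int) -> float:
    """Share of raw capacity left for data when check bits share the same cells."""
    return data_bits / (data_bits + secded_check_bits(data_bits))


for width in (8, 64, 128):
    print(f"{width}-bit words: {secded_check_bits(width)} check bits, "
          f"{usable_fraction(width):.1%} of raw capacity usable")
# 8-bit words: 5 check bits, 61.5% of raw capacity usable
# 64-bit words: 8 check bits, 88.9% of raw capacity usable
# 128-bit words: 9 check bits, 93.4% of raw capacity usable
```

With a (72,64) layout, 32 GB of raw memory would leave about 28.4 GB for data. Standard ECC DIMMs avoid this loss by adding extra chip width (a 72-bit bus) instead of reducing capacity.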
| PaulKeeble wrote:
| ECC has been used as an artificial market segmentation mechanism
| for a long time and it needs to come to an end. RAM, just like
| SSDs and HDDs, ought to have some amount of self-protection
| against basic errors; every place where data is stored, even
| for short periods, needs this.
| thinking001001 wrote:
| Digital privacy should be a human right. ECC RAM is just another
| iteration.
| theandrewbailey wrote:
| * * *
___________________________________________________________________
(page generated 2023-01-21 23:00 UTC)