[HN Gopher] AMD EPYC 7C13 Is a Surprisingly Cheap and Good CPU
___________________________________________________________________
AMD EPYC 7C13 Is a Surprisingly Cheap and Good CPU
Author : PaulHoule
Score : 131 points
Date : 2024-03-27 15:12 UTC (7 hours ago)
(HTM) web link (www.servethehome.com)
(TXT) w3m dump (www.servethehome.com)
| jeffbee wrote:
| This is some fell-off-a-truck stuff. Aren't the weird part
| numbers with infix letters custom made for large customers
| (Amazon, Google, et al.)?
| astrodust wrote:
| Large cloud providers dump their gear in bulk all the time, and
| these parts get picked, tested and packaged for resale.
|
| I'm not sure these custom parts are barred from resale like the
| "ES" (Engineering Sample) type chips are.
| jeffbee wrote:
| I don't think they are barred from sale, but I do think that
| if you're selling secondhand CPUs on Newegg, the used nature
| of the hardware should be prominently stated. That is, for
| customers who are still willing to risk their money on
| Newegg.
|
| With those caveats it's a great deal for something like a
| build box. You could put this into an existing ATX case with
| $1000 worth of RAM (that you may already own?) for less than
| the price of a new Threadripper CPU.
| CogitoCogito wrote:
| Is newegg bad now? It's been a long time since I've ordered
| from them, but I only had positive experiences with them.
| radicality wrote:
| It's become a marketplace ever since it was bought out,
| so lots of sellers of varying qualities. As long as you
| always filter by "sold and shipped by Newegg" you should
| be fine.
| mindcrime wrote:
| Who are the recommended vendors for purchasing PC parts
| these days then? That is, who (if anybody) fills New
| Egg's previous niche? I've actually just bought a bunch
| of stuff from New Egg, after not doing any PC building
| for 15+ years, and didn't initially realize how much
| they had switched to the "marketplace" model.
| jeffbee wrote:
| A good online retailer is B&H Photo. As far as I have
| seen, everything they sell is first-party. It's not a
| marketplace like Amazon or Newegg.
| kjs3 wrote:
| They don't have to be used; there's overstock and grey market
| possibilities.
|
| I've many times seen (usually smaller) runs of products with
| house-marked or otherwise oddly identified chips, and when
| asked, the producer said "the OEM didn't use them so they
| sold them to us cheap". And I've certainly bought a couple
| brand new big-box labeled motherboards that were really (and
| obviously) minor variations of existing Asus, Gigabyte or
| Supermicro motherboards. Shoot, somewhere I've got a NiB
| Intel Phi card with a weird part number only because it was
| made for (I think) Dell and now that Phi is dead they were
| being fire-saled.
| pengaru wrote:
| Or whole-system vendors like Lenovo/HP?
| JonChesterfield wrote:
| Nah, it's just variation on price of older stock. Relatively
| few buyers and relatively low stock pushes the variance up.
| E.g. I see a 7763 listed at 3k in one store and 4k in another.
|
| If you can find a motherboard to match it's a lot of computer
| for the price.
| derefr wrote:
| Some maybe-interesting observations about my experiences over
| the last few years, as someone who uses both cloud-provisioned
| (GCP N2D) and dedicated-server (e.g. OVH HGR-HCI-class) AMD
| EPYC-based machines at $work.
|
| * GCP N2D instances always had a strict per-AZ allocation
| quota. This allocation quota has _not_ increased over time. And
| when we asked to have it bumped up, it was the only time a
| quota-increase request of ours has ever been denied.
|
| * When OVH was offering their HGR-HCI-6 machine type (2x EPYC
| 7532), we provisioned a few of them. The first few, leased ~2
| years back, each took a few days to provision -- presumably,
| OVH doesn't buy these expensive CPUs until a customer asks for
| a machine to be stood up with one in them. More recently,
| though (~6mo ago), for the same machine type, they gave us a
| provisioning lead time of more than a month, due to supply
| difficulties for the CPU.
|
| * These chips were buggy! Again on OVH, when allocating these
| HGR-HCI-6 machines, we were allocated two separate machines
| that ended up having CPU faults. (Symptoms: random reboots
| after an hour or two of heavily utilizing the native AES-NI
| instructions; and spurious "PCI-e link training errors" in
| talking to the network card and/or NVMe drives.) They were
| replaced quickly, but I've never seen this kind of CPU fault on
| a hardware-managed system before or since.
|
| * Just a month ago, the high-end dedicated-server hosters (OVH,
| but also Hetzner and so forth) seem to have removed all their
| SKUs that use 2nd- and 3rd-gen EPYC 7xxx CPUs. (Except for one
| baseline SKU on OVH, which is probably there because they have
| a big pile of them.) Everything suddenly switched over to 4th-
| gen 9xxx EPYCs just a month or two ago. It might just be that
| availability of these 9xxx EPYCs is finally reaching levels
| where these providers think they can meet demand with them --
| but everyone switching over simultaneously, _and_ dropping
| their old SKUs at the same time?
|
| * GCP recently launched the storage-optimized Z3 instance type.
| They chose to build this instance type on an Intel platform
| (Sapphire Rapids.) That's even though AMD EPYCs have had enough
| PCIe lanes to deliver equivalent performance to this Z3
| platform -- ignoring the "Titanium offload" part, which isn't
| CPU-platform-specific -- for years. (In fact, the need for a
| huge pool of fast NVMe is in part _why_ we switched some of our
| base load from GCP over to those OVH HGR-HCI-6 instances --
| which satisfied our needs quite well.) GCP could _in theory_
| have launched something akin to this instance type, with the
| same 36TiB storage pool size (but PCIe 4.0 speeds rather than
| 5.0) three years ago, using EPYC 7xxxs. Customers have been
| asking for something like that for years now -- wondering why
| GCP instances are all limited to 8.8TiB of local NVMe. (We
| actually asked them ourselves, back then, where "the instance
| type with more local NVMe" was. They gave a very handwave-y
| response, which in retrospect, may have been a "we're trying,
| but it's not looking good for delivering this at scale right
| now" response.)
|
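| (A rough sanity check on that lane math, as a Python sketch --
| the drive size and per-drive bandwidth are assumptions, not
| anything GCP actually published:)
|
|     # single-socket EPYC 7002/7003: 128 PCIe 4.0 lanes
|     lanes_total, lanes_for_nic, lanes_per_nvme = 128, 16, 4
|     max_drives = (lanes_total - lanes_for_nic) // lanes_per_nvme
|     tib_per_drive = 1.5      # assumed drive size
|     gb_s_per_drive = 7       # rough GB/s for a PCIe 4.0 x4 NVMe
|     print(max_drives, "drives,",
|           max_drives * tib_per_drive, "TiB,",
|           max_drives * gb_s_per_drive, "GB/s aggregate")
|     # -> 28 drives, 42.0 TiB, 196 GB/s on a single socket
|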
| These points all lead me to believe that something _weird_
| happened with the EPYC 7xxx rollout. Supply didn't grow to
| meet demand over time.
|
| And then, suddenly, _after_ the seeming EOL of these chips --
| but long _before_ cloud providers would normally cycle them out
| -- we're seeing 7xxxs ending up on the open market, in enough
| bulk to make them affordable? Bizarre.
|
| ---
|
| My own vague hypothesis at this point is that at some point
| during the generation, AMD discovered a fatal flaw in the
| silicon of the entire EPYC 7xxx platform. Maybe it was the
| hardware crypto instructions, like I saw. Or maybe it was some
| capability more specific to cloud-computing customers (SEV-
| SNP?) that turned out to not work right (which would make more
| sense given that Threadrippers didn't see the same problems.)
| So the big cloud customers immediately halted their purchase
| orders (keeping only what they had already installed so as to
| not disturb existing customer workloads); and AMD responded by
| scaling down production.
|
| This resulted in two things: a supply shock of AMD EPYC-based
| machines/VMs that lasted for a while; but also, negotiated
| settlements with the cloud vendors, where AMD was now obligated
| to fulfill existing POs for 7xxx parts _with 9xxxs_, as they
| ramped up production of those. Which is why 9xxxs have taken so
| long (2 years!) to make it onto the open market: the lines have
| been dedicated to fulfilling not just 9xxx bulk purchase-
| orders, but also 7xxx purchase-orders.
|
| (And which is why the switchover to 9xxx among smaller players
| is so immediate: such a switchover has been on every hosting
| company's roadmap for a long time now, having been repeatedly
| delayed by supply issues due to the huge volume of 9xxx parts
| required to satisfy the clouds' backlogged demand. They've had
| a stock of 9xxx-compatible motherboards + memory + PSUs +
| chassis just sitting there for months/years now, waiting for
| 9xxx CPUs to slot into them.)
|
| Perhaps we're seeing these cloud-customer 7Cxx parts on the
| open market now, because the clouds have finally received
| enough 9xxxs to satisfy their actual demand for 9xxxs, _and_
| their backlogged demand for 7xxxs; and the clouds are now
| finally at the point where they can replace their initial
| _actual_ (faulty / feature-disabled) 7xxx parts they were sent
| with 9xxxs, selling off the 7xxx parts.
|
| My guess is that, now that they have "fixed" AMD chips in
| place, we'll soon see the cloud providers heavily hyping up
| some particular AMD-silicon-enabled feature that they had been
| _starting_ to market four years ago, but then went radio-silent
| on. ("Confidential computing", maybe.)
|
| ---
|
| I'd love to hear what someone with more insider knowledge
| thinks is happening here.
| AnthonyMouse wrote:
| CPU faults on individual machines aren't that rare. The
| machine has a dodgy power supply that almost works but has
| voltage drop under load etc. Sometimes this can be caused by
| environmental factors. The rack is positioned poorly and has
| thermal issues, the UPS is supplying bad power etc. Then you
| can see issues with multiple machines, or replace the machine
| without fixing the issue. Vendors often put machines for the
| same customer in the same rack for various reasons, e.g.
| because they might send a lot of traffic to each other and
| put less load on their network if connected to the same
| switch, but then if there is a problem in that rack it
| affects more of your machines.
|
| The Epyc 7000 series was popular. There have been enough of
| them in private hands for long enough that if there were
| widespread issues they would be well-known.
|
| It's possible that AMD didn't order enough capacity from TSMC
| to meet demand, and couldn't get more during the COVID supply
| chain issues. For the 9000 series they learned from their
| mistake, or there is otherwise more fab capacity available
| now, so customers can get them. Meanwhile cloud providers
| really like Zen4c because they can sell "cores" that cost
| less and use less power, so they're buying it and replacing
| their existing hardware as they tend to do regardless. That
| is typically how they expand their business: If you add more
| servers you need more real estate and power and cooling. If
| you replace older servers with faster ones, you don't.
| derefr wrote:
| To be clear, it was a CPU fault that doesn't occur at all
| when running e.g. stress-ng, but _only_ (as far as I know)
| when running our particular production workload.
|
| And only after _several hours_ of running our production
| workload.
|
| But then, once it's known to be provokeable for a given
| machine, it's extremely reliable to trigger it again -- in
| that it seems to take the same number of executed
| instructions that utilize the faulty part of the die, since
| power on. (I.e. if I run a workload that's 50% AES-NI and
| 50% something else, then it takes exactly twice as long to
| fault as if the workload was 100% AES-NI.)
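|
| (A toy model of that proportionality in Python -- the
| "budget" and throughput numbers below are invented, only the
| shape of the relationship matters:)
|
|     # time-to-fault scales inversely with the AES-NI share of
|     # the mix, as if there were a fixed budget of AES-NI ops
|     # since power-on before the fault hits
|     FAULT_BUDGET_OPS = 1e15     # made up
|     AESNI_OPS_PER_SEC = 1e11    # made up
|
|     def hours_to_fault(aesni_fraction):
|         ops_per_sec = AESNI_OPS_PER_SEC * aesni_fraction
|         return FAULT_BUDGET_OPS / ops_per_sec / 3600
|
|     print(hours_to_fault(1.0))  # ~2.8 h
|     print(hours_to_fault(0.5))  # ~5.6 h: exactly twice as long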
|
| And it _isn't_ provoked any more quickly, by having just
| provoked it and then running the same workload again --
| i.e. there's no temporal locality to it. Which would make
| both "environmental conditions" and "CPU is overheating /
| overvolting" much less likely as contributing factors.
|
| > There have been enough of them in private hands for long
| enough that if there were widespread issues they would be
| well-known.
|
| Our setup is likely a bit unusual. These machines that
| experienced the faults have every available PCIe lane
| (other than the few given to the NIC) dedicated to NVMe;
| where we've got the NVMe sticks stuck together in
| extremely-wide software RAID0 (meaning that every disk read
| fans in as many almost-precisely-parallel PCIe packets
| contending for bus time to DMA their way back into the
| kernel BIO buffers.) On top of this, we then have every
| core saturated with parallel CPU-bottlenecked activity,
| with a heavy focus on these AES-NI instructions; and a high
| level of rapid allocation/deallocation of multi-GB per-
| client working arenas, contending against a very large
| _and_ very hot disk page cache, for a working set that's
| far, far larger than memory.
|
| I'll put it like this: _some_ of these machines are "real-
| time OLAP" DB (Postgres) servers. And under load, our PG
| transactions sit in WAIT_LWLOCK waiting to start up,
| because they're actually (according to our profiling)
| _contending over acquiring the global in-memory pg_locks
| table_ in order to write their per-table READ_SHARED locks
| there (in turn because they're dealing with wide joins
| across N tables in M schemas where each table has hundreds
| of partitions and the query is an aggregate so no
| constraint-exclusion can be used. Our pg_locks Prometheus
| metrics look _crazy_.) Imagine the TLB havoc going on, as
| those forked-off heavy-workload query workers also all
| fight to memory-map the same huge set of backing table heap
| files.
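|
| (For anyone who wants to watch for the same pattern: a
| minimal sketch in Python with psycopg2 -- the DSN is made up
| and this isn't our exact tooling -- that surfaces the LWLock
| waits and how big pg_locks gets:)
|
|     import psycopg2
|
|     # hypothetical connection string
|     conn = psycopg2.connect("dbname=app host=db.internal")
|     with conn, conn.cursor() as cur:
|         # what are non-idle backends waiting on?
|         cur.execute("""
|             SELECT wait_event_type, wait_event, count(*)
|             FROM pg_stat_activity
|             WHERE state <> 'idle'
|             GROUP BY 1, 2 ORDER BY 3 DESC
|         """)
|         for wtype, wevent, n in cur.fetchall():
|             print(f"{n:5d}  {wtype}/{wevent}")
|
|         # one lock row per partition per backend adds up fast
|         cur.execute("SELECT count(*) FROM pg_locks")
|         print("pg_locks rows:", cur.fetchone()[0])
|     conn.close()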
|
| It's to the point that if we don't either terminate our
| long-lived client connections (even when _not_ idle), or
| restart our PG servers at least once a month, we actually
| see per-backend resource leaks that eventually cause PG to
| get OOMed!
|
| The machines that _aren't_ DB servers, meanwhile -- but
| are still set up the same on an OS level -- are blockchain
| nodes, running https://github.com/ledgerwatch/erigon, which
| likes to do its syncing work in big batches: download N
| blocks, then execute N blocks, then index N blocks. The
| part that reliably causes the faults is "hashing N blocks",
| for sufficiently large values of N that you only ever
| really hit during a backfill sync, not live sync.
|
| In neither case would I expect many others to have hit on
| just the right combination of load to end up with the same
| problems.
|
| (Which is why I don't really believe that whatever problem
| AMD might have seen is related to this one. This seems
| more like a single-batch production error than anything,
| where OVH happened to acquire multiple CPUs from that
| single batch.)
|
| ---
|
| > It's possible that AMD didn't order enough capacity from
| TSMC to meet demand, and couldn't get more during the COVID
| supply chain issues.
|
| Yes, but that doesn't explain why they weren't able to ramp
| up production at _any_ point in the last four years. Even
| now, there are still likely some smaller hosts that would
| like to buy EPYC 7xxxs at more-affordable prices, if AMD
| would make them.
|
| You need an additional factor to explain this lack of ramp-
| up _post_-COVID; and to explain why the cloud providers
| never _started_ receiving more 7xxxs (which they _would_
| normally do, to satisfy legacy clients who want to
| replicate their exact setup across more AZs/regions.)
| Server CPUs don't normally have 2-year purchase
| commitments! It's normally more like 6!
|
| Sure, maybe Zen4c was super-marketable to the clouds'
| customers and saved them a bunch of OpEx -- so they
| negotiated with AMD to _drop_ all their existing spend
| commitments on 7xxx parts purchases in favor of committing
| to 9xxx parts purchases.
|
| But why would AMD agree to that, without anything the
| clouds could hold over their head to force them into it? It
| would mean shutting down many of the 7xxx production lines
| early, translating to the CapEx for those production lines
| not getting paid off! Being able to pay off the production
| lines is why CPU vendors negotiate these long purchase
| commitments in the first place!
|
| And if the clouds _are_ replacing capacity, then where are
| all those _used_ CPUs going?
|
| Take notice that the OP article isn't talking about a used
| CPU, but a "new server" -- namely (I think) this one:
| https://www.newegg.com/tyan-s8030gm4ne-2t-supports-amd-
| epyc-...
|
| This server was never in an IaaS datacenter. This is a
| motherboard straight from the motherboard vendor, with an
| EPYC 7C13 prepopulated into it.
|
| This isn't the sort of thing you get when a cloud resells.
| This is the sort of thing you get when a cloud (or other
| hosting provider) _stops buying unexpectedly_ -- and
| upstream suppliers /manufacturers/integrators are left
| holding the bag, of preconfigured-to-spec hardware they no
| longer have a pre-committed buyer for.
| 486sx33 wrote:
| NOT shipped by newegg but very interesting
| https://www.newegg.com/tyan-s8030gm4ne-2t-supports-amd-epyc-...
|
| Seems like good value per dollar for a Monero mining rig:
| approximately 4x the performance of a 5950X on the
| monerobenchmark site. Given that the EPYC has about 4 times
| the cache (256MB vs 64MB), this makes sense in the Monero
| world. In a real-world side-by-side comparison I'd assume
| the EPYC would get even further past 4x of a 5950X, since
| the 5950X requires a lot of tweaking to get anywhere close
| to the monerobenchmark numbers. I'd expect the EPYC runs
| better out of the box.
| bethekind wrote:
| I've always wondered how the epycs with huge amounts of L3
| would perform on monero.
|
| Is there an optimal core-to-L3 ratio? Or is more always
| better?
| 486sx33 wrote:
| here is a quick blurb from someone on reddit which sums up
| the general advice in a way that matches my experience
|
| "most mining algorithms targeted for CPUs require certain
| amount of L3 cache per thread (core), usually 1-4MB, so just
| divide your total amount of CPU L3 cache by this number and
| the result is how many threads can you run max on your cpu.
| For example if an algorithm requires 2MB of cache per thread
| and you have a 10-core 16MB L3 cache cpu, you can run at most
| 16/2=8 threads, 9 or 10 threads will result in worse
| performance as cores will be kicking out each other's data
| from the cache. " https://www.reddit.com/r/MoneroMining/comme
| nts/jurv6j/proces...
|
| Below are two real world examples of monero mining I run
| myself
|
| An example is my i9-10850K: exact same performance using 8
| cores or 10 cores (16 threads or 20 threads). In fact,
| performance goes slightly down at 20 threads. Given that it
| has 20MB of cache, it's an example of the bottom limit not
| being optimal. Using this example I'd say the minimum is
| around 1.12MB per thread, or 2.25MB per core.
|
| Another machine I have, the 5950X, crunches away all day and
| night using all threads (32) on all cores (16) with 64MB of
| cache, no problem. That correlates to 4MB per core / 2MB per
| thread, and it seems like more than it needs, because I can
| use the machine all day for daily tasks with no hiccups
| while mining full out. If you have a desktop at work and
| need to kill it all day long, the 5950X will absolutely take
| anything and everything you throw at it. Samsung B-die RAM
| helps Monero mining as well; I run only 32GB, but in 4 8GB
| B-die sticks.
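|
| That rule of thumb is easy to sanity-check. A rough Python
| sketch (the ~2MB-per-thread figure follows the quoted
| guideline above, not something I measured):
|
|     # usable RandomX threads: capped by L3 first, then by SMT
|     def max_mining_threads(l3_mb, cores, mb_per_thread=2.0):
|         cache_limit = int(l3_mb // mb_per_thread)
|         return min(cache_limit, cores * 2)
|
|     for name, l3_mb, cores in [("i9-10850K", 20, 10),
|                                ("5950X", 64, 16),
|                                ("EPYC 7C13", 256, 64)]:
|         print(name, max_mining_threads(l3_mb, cores))
|     # -> 10, 32 and 128 threads: roughly the 4x gap over a 5950X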
| nsbk wrote:
| Is Monero CPU mining profitable nowadays or does it need
| free energy to be?
|
| Edit: I have a spare 5950x collecting dust
| bethekind wrote:
| Profitable, yes but pennies per day. 5950X should make
| ~$5/month after all is said and done iirc.
|
| If you live in cold climates where space heaters are
| used, it makes sense, as you would've been heating the
| house anyways.
| pclmulqdq wrote:
| I think ~2 MB (a bit more) is that ratio, at least as
| designed. It's possible that you can hyperthread Monero with
| 4-5 MB of cache per core.
| mise_en_place wrote:
| It's good, but I have a feeling anything you find on the
| aftermarket will be used and abused. These chips are
| designed to handle high thermal load, but if a chip has been
| in a DC or server room, that may impact its longevity.
| wmf wrote:
| "Abused" chips are mostly a myth and Milan is not that old so
| these chips should have plenty of life left in them.
| londons_explore wrote:
| Agreed - with CPUs, if it's working the day you buy it, it
| will most likely still be working in 10 years with typical
| desktop use cases, no matter the past life it had.
| paulmd wrote:
| that's not true at all, XMP is very much within the
| wheelhouse of "typical desktop use-cases" and can
| absolutely damage a CPU from electromigration within a
| matter of years.
|
| (or rather, the overclocked/out-of-spec memory controller
| usually requires the board to kick up the CPU memory
| controller (VCCSA/VSOC) voltages, and that's what does the
| damage.)
|
| https://youtu.be/HLNk0NNQQ8s?t=510
|
| https://www.youtube.com/watch?v=uMHUz16MuYA
|
| People have generally convinced themselves that it's safe,
| but the rate of CPU failures is _incredibly high_ among
| enthusiasts compared to the general enterprise fleet, and
| the reason is XMP. This has been "out there" for a long
| time if you know to look for it. But enthusiasts fall into
| that classic "can't make a man understand when his salary
| depends on not understanding it" thing - everyone has every
| reason to convince themselves it doesn't, because it would
| affect "their lifestyle".
|
| But electromigration exists. Electromigration affects parts
| on consumer-relevant timescales, if you overclock.
| Electromigration particularly affects memory
| controllers/system fabric nowadays. And yes, you can
| absolutely burn out a memory controller with just XMP (and
| the aggressive CPU voltages it applies) and this is not new
| or a secret. And the problem of electromigration/lifespan
| is accelerating as the operating range becomes narrower on
| newer nodes etc.
|
| https://semiengineering.com/aging-problems-at-5nm-and-
| below/
|
| https://semiengineering.com/3d-ic-reliability-degrades-
| with-...
|
| https://semiengineering.com/on-chip-power-distribution-
| model...
|
| Similarly: "24/7 safe" fabric overclocks are really not.
| Not on the order of years. Everyone is already incentivized
| to push the "official" limit as much as is safe/reliable -
| AMD/Intel know about the impact on benchmark scores too,
| they want their parts to look as good as they can. There is
| no "safe" increase above the official spec, not really.
|
| The unique thing about Asus wasn't that they killed a chip
| from XMP - it's that they put _so much_ voltage into it
| that it went into immediate runaway and popped instantly,
| explosively, and visibly. And it's not surprising it was
| Asus (those giant memory QVLs come from just throwing
| voltage at the problem) but low-key everyone has been
| applying _at least some_ additional voltage for a long
| time. Eventually it kills chips. It's overclocking/out-of-
| spec and very deliberately and specifically excluded from
| the warranty (AMD GD-106/GD-112).
|
| It's completely understandable why AMD wants to make some
| fuses/degradation canary cells to monitor whether the CPU
| has operated out-of-spec as far as warranty coverage. This
| is a serious failure mode that probably causes a large % of
| overall/total "CPU premature failure" warranty returns etc.
| And essentially it continues to get worse on every new node
| and with every new DDR standard, and with the increased
| thermals that currently are characteristic of stacked
| solutions etc.
|
| https://www.amd.com/en/legal/claims/gaming-details.html
|
| https://www.extremetech.com/computing/amds-new-
| threadripper-...
| wmf wrote:
| Fortunately servers don't have XMP.
| paulmd wrote:
| true, I am just pushing back on the idea that "abusing a
| chip is mostly a myth" and "if a CPU is working on the
| day you buy it, it's fine for desktop use-cases". For
| server parts that can't be OC'd - true, I guess. For
| regular CPUs? Absolutely not true, enthusiasts abuse the
| shit out of them and even if you do no further damage
| yourself, the degradation can continue over time etc as
| parts of the circuit just become critically unstable from
| small routine usage etc.
|
| (people treat their CPUs like gamer piss-jugs, big
| deferred cost tomorrow for a small benefit today.)
|
| But yes - ironically this means surplus server CPUs are
| actually way more reliable than used enthusiast CPUs. In
| some cases they are drop-in compatible in the consumer
| platform (although not so much in newer stuff), and the
| server stuff got the better bins in the first place, and
| it's cheaper (because they sold a lot more units), and
| also hasn't been abused by an enthusiast for 5 years etc.
| If you are on a platform like Z97 or X99 that supports
| the Xeon chips, the server chips are a complete no-
| brainer.
|
| And some xeons are even multiplier unlocked etc - used to
| be a thing, back in the day.
|
| ("server bins are binned for leakage and don't compete
| with gaming cpus" is another myth that is not really true
| except for XOC binning - server CPUs are better binned
| than enthusiast ones for ambient use-cases.)
| londons_explore wrote:
| But are there many cases of a CPU being overclocked (&
| overheated & overvolted), then later not being
| overclocked (and working fine), but then failing shortly
| afterwards?
|
| Yes, I understand it is theoretically possible. But I
| think it is just super rare - I've never heard of a
| single case.
| irusensei wrote:
| It kinda feels like the "mining GPUs are bad" myth, when in
| fact miners nursed those GPUs like babies because their
| income depended on those devices.
| namibj wrote:
| Tbf the fans may be broken on them, or at least not far
| from being broken. I.e., plan to waterblock it.
| epolanski wrote:
| People forget there are decade-old servers out there
| working 24/7.
|
| Anyway, there's a Microsoft research paper on silicon which
| essentially says that failure rates of CPUs increase with
| mostly two factors:
|
| - cycles: the more calculations, the higher the rate of
| failure
|
| - temperature/power: I'll let you guess this one by
| yourself. Even minor slips, overvoltages and overclocks can
| increase failure rates by orders of magnitude.
|
| Getting back to your comment: I would've chosen a GPU used
| for mining over years (if properly cleaned during its life
| span, far from a given) rather than one used by some kid
| benchmarking and overclocking, any day. Years of crunching
| calculations did very little damage in comparison to a kid
| trying to find the overclock limits for a few days. Most
| mining GPUs were run undervolted and underclocked
| (especially as Ethereum mining was memory- rather than
| core-intensive).
| usefulcat wrote:
| I'd much rather have something that came from a server room.
| Lots of cool, dust-free air--far better than a machine that's
| been sitting under someone's desk, clogged with dust and
| exposed to who knows what temperatures.
| jmole wrote:
| I recently picked up a 64-core AMD EPYC Genoa QS (eng sample) on
| ebay for $1600, and have been very pleased with the performance.
| mhuffman wrote:
| Agreed! I have an AMD EPYC 7702P in an ASRock mobo and have
| been very pleased with performance in a homelab.
| naked-ferret wrote:
| What app are they using in this screenshot?
| https://www.servethehome.com/amd-epyc-7c13-is-a-surprisingly...
|
| Looks very neat!
| qwertox wrote:
| s-tui: https://github.com/amanusk/s-tui
| MenhirMike wrote:
| Still rocking my EPYC 7282 in my home server, which really
| sits in a sweet spot: 16 cores, about $700, 120W TDP
| (because of the reduced memory bandwidth).
|
| Looks like the 7303 fills that same niche in the Milan generation
| (and should be compatible with any ROME mainboard, possibly after
| a BIOS update), or if you're building a new system you can get
| the 32-Core Siena 8324PN for about 130W TDP.
|
| (While it may be silly to look at TDP for a server CPU, it does
| matter for home servers if you want a regular PC Chassis and not
| a 1U/2U case with a 12W Delta cooling fan that is audible three
| cities over. In fact, you can get the 8-Core 80W 8324PN and still
| get all those nice PCIe lanes to connect NVMe SSDs to, and of
| course ECC RAM that doesn't require hunting down a specific
| motherboard and hoping for the best.)
| semi-extrinsic wrote:
| With these constraints, what is the benefit of Epyc over
| Threadripper? I've been running a 3970x in my workstation for
| several years now. Sure it's about twice the TDP, but with
| water cooling it stays quite silent even on full load.
| MenhirMike wrote:
| I wanted remote management (IPMI), which none of the
| Threadripper boards offered. I went with the ASRock Rack
| ROMED8-2T, which also has 2x 10G Ethernet on board, which was
| another nice thing I didn't have to sacrifice a PCIe slot
| for. It does require a Tower Case with space for fans on top
| though, because the CPU socket is rotated 90 degrees compared
| to Threadripper boards, so the airflow is different.
|
| The EPYC CPU was also quite a bit cheaper than the then-
| equivalent Threadripper 2950X (though the mainboard being
| $600 made up for that). This is even more true today because
| AMD really jacked up the prices for Threadripper to the point
| that EPYC is actually a good budget alternative. I guess that
| making 16 Core Ryzen made low-end Threadrippers less
| attractive, but it's the PCIe slots that were so great about
| those!
|
| Also, I do believe that it was much easier to find 64 GB
| RDIMMs whereas 64 GB ECC UDIMMs were not available or much
| more expensive, though my memory (ha!) is hazy on that, I
| just remember it being a PITA.
|
| So that EPYC system was just much more compelling.
| paulmd wrote:
| ROMED8-2T is one of the all-star boards of the modern era
| imo. Like that's literally "ATX-maxxed" in the Panamax
| sense - you can't go bigger than that in a traditional ATX
| layout, and there is no point to having a bigger CPU (even
| if you do not use all the pins) because it starts to eat up
| the space for the Other Stuff. It's a local optimum in
| board design.
|
| EEB/EE-ATX can push things a little farther (like
| GENOAD8X-2T) but you can't pull any more PCIe slots off, so
| it has to be MCIO/oculink instead. And imo this is the
| absolute limit of what can be done with single-socket Epyc.
|
| And you can't really get more than 8 memory slots without
| moving the CPU over to the other side of the board, like
| MZ32-AR0 or MZ33-AR0, which means it overhangs the PCIe
| slots etc. IIRC you can _sorta_ do 16-dimm SP3 if you don't
| do OCP 2.0 (gigabyte or asus might have some of these
| iirc) and you drop to like 5 pcie slots or something. But
| it's really hard to get 2DPC on epyc at all, the layouts
| get very janky very quickly.
|
| You can fit more RAM slots into EEB/EE-ATX with a smaller
| socket (dual 2011-3 with 3DPC goes up to 24 slots in EE-
| ATX) but 2DPC is as big as you can go with epyc in a
| commodity form-factor. In SP5 this gets fully silly,
| MZ33-AR0 is an example of 2DPC 12-channel SP5, and it's
| like, oops all memory slots, _even with EEB and completely
| overlapping every single pcie slot_.
|
| And of course dual-socket epyc gets very cramped even on
| EEB/EE-ATX even with only 8 slots per socket (MZ72-HB0).
| You just are throwing away a tremendous amount of board
| space and you lose pcie, MCIO, everything. SP3 is already a
| honkin big socket let alone SP5, let alone two SP3, let
| alone two SP5 (me when I see a honkin pair), etc... they
| are big enough that you have to make Tough Choices about
| what parts of the platform you are going to exploit, or
| accept a non-"standard" form factor (it's not standard for
| anyone except home users/beige boxes). Servers don't use
| EEB/EE-ATX form factors anymore, because it just isn't the
| right shape for these platforms. And you need to be pulling
| a significant amount of the IO off in high-density
| formfactors (MCIO, Oculink, SlimSAS, ...) already, and your
| case ecosystem needs to support that riser-based paradigm,
| etc. ATX is dying and enthusiasts are not even close to
| being ready for the ground to shift underneath them like
| this.
|
| There's still good AM4, AM5, and LGA1700 server boards
| (with ECC) btw - check out AM5D5ID-2T, X570D4I-2T, X470D4U,
| W680 ACE IPMI, W680D4U-2L2T/G5, X11SAE-M, X11SAE-F,
| IMB-X1231, IMB-X1314, X300TM-ITX, etc. And Asrock Rack and
| Supermicro do make threadripper boards too, although I
| think they're not viable since threadripper is leaning
| farther and farther into the OEM market and it just doesn't
| make cost sense unless you really need the clocks. It's not
| like the X99 days where HEDT was just "better platform for
| enthusiasts", there is a big penalty to choosing HEDT right
| now if you don't need it.
|
| Unregistered DDR4 tops out at 32GB per stick (UDIMM or
| SODIMM), registered can go larger. DDR5 unregistered will
| go larger, and actually a few 48GB sticks do exist already,
| but generally you can't use all four slots without a
| massive hit to clocks (current LGA1700/AM5 drop to 3600
| MT/s) so consumers/prosumers have to consider that one
| carefully.
|
| (this generally means that drop-in upgrades are not viable
| for DDR5 memory btw - 4-stick configs suck, you should plan
| on just buying 2 new sticks when you need more. And the
| slots on the mobo are worse than useless, since the empty
| slots worsen the signal integrity compared to 2-slot
| configurations without the extra parasitics...)
| MenhirMike wrote:
| I agree, the ROMED8-2T has everything I want and
| compromises almost nothing. One of the PCI Express slots
| is shared with one of the on-board M.2 slots, SATA, and
| Oculink, but even then, you get to choose: Run the slot
| at x16 and turn off M2/Sata/Oculink? Run the Slot in x8
| and get M2/Sata but lose Oculink? Or disable the slot and
| get M2/Sata/Oculink? I think that's a great compromise (I
| run the slot at x8 and use it for a Fibre Channel card to
| my backup tape drive). Lovely block diagram in the manual
| as well.
|
| Plenty of Fan headers as well, and using SFF-8643
| connectors for the SATA ports makes so much sense (though
| it's an extra cost for the cables). They even put a power
| header on the board in case you run too many high-powered
| PCIe cards (since PCIe AFAIK allows pulling up to 75W from
| the slot).
|
| They really put every feature that makes sense onto that
| board, and yeah, if you want Dual CPUs or 16 DIMM Slots,
| chances are that a proper vendor server is more what you
| want.
|
| I can't think of anything that I don't like about the
| board. Well, I wish the built-in Ethernet ports weren't
| RJ45 but SFP+, but that's really the only thing I wish to
| change.
| z8 wrote:
| It should be noted that non-vendorlocked 7282s can be had for
| as little as 80 bucks on eBay. Bought one just a few weeks ago.
| Lovely piece of silicon.
| gigatexal wrote:
| What issues come with the vendor locking?
| MenhirMike wrote:
| Only works on the original motherboard (or maybe only
| motherboards made by the specific vendor the CPU was locked
| to) - so if you buy a used vendor-locked CPU, there's a
| risk it's basically just a nice looking paperweight. Serve
| The Home has a pretty good video:
| https://www.youtube.com/watch?v=kNVuTAVYxpM
| tiffanyh wrote:
| Last Generation
|
| Am I mistaken, or is this AMD's last-generation server proc?
|
| The current generation is 7xx4 / 9xx4.
|
| Which should make it unsurprising that it's cheaper.
| wmf wrote:
| Yes, it's the previous generation. Homelabbers mostly buy older
| used equipment at deep discounts.
| dheera wrote:
| This naming is confusing. Is 7C13 > 7950X? Why can't companies
| stick to simple conventions of "higher numbers are better" ...
|
| Even NVIDIA ... A800 > A100 > A10 but A6000 < A100
| Osiris wrote:
| Completely different platform. The Ryzen 7950X is a consumer
| CPU. The 7C13 is a server CPU and follows a separate naming
| convention.
|
| It shouldn't be confusing because you really wouldn't be
| comparing them to each other.
___________________________________________________________________
(page generated 2024-03-27 23:01 UTC)