[HN Gopher] AMD 3rd Gen EPYC Milan Review
       ___________________________________________________________________
        
       AMD 3rd Gen EPYC Milan Review
        
       Author : pella
       Score  : 165 points
       Date   : 2021-03-15 15:11 UTC (7 hours ago)
        
 (HTM) web link (www.anandtech.com)
 (TXT) w3m dump (www.anandtech.com)
        
       | zhdc1 wrote:
        | It looks like Zen 2 processors are about to become even more
        | of a bargain than they already are.
       | 
       | I'll take a 7702P at $2-3K over a 7713P at $5K ten times out of
       | ten.
        
         | dragontamer wrote:
          | Considering both are made on the same 7nm TSMC process, AMD
          | probably isn't going to make any more Zen 2 processors at
          | this point.
          | 
          | I think you're right that buying a generation or so old can
          | offer gross cost savings. But that's only true for the time
          | period when those chips are available.
        
           | IanCutress wrote:
           | AMD is going to keep Zen 2 EPYC sales going for a good while
           | yet. Both families will co-exist in the market.
        
             | blagie wrote:
             | I suspect so. A lot of the commercial market wants
             | stability. Once I've validated a server config for a
             | particular use, I want to be able to continue building
             | those servers for a long time (often long past
             | obsolescence).
             | 
              | That may seem odd, but a lot of safety-critical
              | applications (e.g. medical, military, aerospace, etc.)
              | require spending tens of thousands, hundreds of
              | thousands, or even millions of dollars (not to mention
              | months of time) re-validating a system after any
              | substantive change.
             | 
             | Even for less critical applications, spending $2000 extra
             | on each CPU is a bargain compared to re-validating a
             | system.
             | 
              | If AMD wants to be a credible presence in those markets,
              | and I'm pretty sure it does, it needs to offer chips
              | with many-year lifespans before EOL.
             | 
             | Some companies manage this by having a subset of devices or
             | of software which is LTS.
        
               | derefr wrote:
               | Rather than buying new-old-stock CPUs, why not just buy
               | all the CPUs the long-term program will ever need when
               | they're still cheap, and stockpile them? It's not like
               | they go bad.
        
       | bryanlarsen wrote:
        | Hopefully they can fix their idle power consumption with a
        | firmware tweak or a new stepping; that's a massive regression.
        | It looks like it causes a significant performance degradation
        | too -- more of the power budget going to IO means less for the
        | compute cores.
        
       | blinkingled wrote:
        | INVLPGB - New instruction to use instead of inter-core
        | unterrupts to broadcast page invalidates; requires
        | OS/hypervisor support
        | 
        | VAES / VPCLMULQDQ - AVX2 instructions for
        | encryption/decryption acceleration
        | 
        | SEV-ES - Limits the interruptions a malicious hypervisor may
        | inject into a VM/instance
        | 
        | Memory Protection Keys - Application control for
        | access-disable and write-disable settings without TLB
        | management
        | 
        | Process Context ID (PCID) - Process tags in TLB to reduce
        | flush requirements
       | 
        | Interruptions (Instructions) and Unterrupts (Interrupts) aside
        | (the article obviously was pushed out as fast as AT could lol)
        | - these additions seem like they would help with performance
        | when it comes to mitigating all the speculation
        | vulnerabilities in a hypervisor env?
        
       | stillbourne wrote:
        | I'm waiting for news on Genesis Peak; I'd love to get 4th gen
        | Threadripper on my next box.
        
       | fvv wrote:
       | Benchmarks
       | https://www.phoronix.com/scan.php?page=article&item=epyc-700...
        
         | IanCutress wrote:
         | The article linked at the top has pages of benchmarks. Did....
         | you miss them?
        
       | modzu wrote:
        | How is a 300W CPU cooled in a server environment? Just high
        | RPMs and good environmentals? I've stayed on Intel with my
        | workstation so I can keep a virtually passive and quiet
        | heatsink without having to go water.
        
         | [deleted]
        
         | numpad0 wrote:
          | By an array of fans specced like 8cm 10W 10krpm, like six
          | fans wide and two fans deep, blowing into a passive
          | heatsink, with the help of air ducts inside the chassis.
          | 
          | Intel or AMD or SPARC or ARM, the setup is all the same for
          | rackmount hardware: pure copper passive heatsinks and
          | high-power axial fans.
        
         | rodgerd wrote:
         | Servers are horrifyingly loud, and datacentres will destroy
         | your hearing in no time flat. Airflow management in datacentres
         | is quite the art form, as well: proper packing of the racks for
         | airflow, height of raised floors to accommodate blown air, and
         | so on; some people are starting to go back to the future with
         | things like liquid-cooled racks as well.
        
         | formerly_proven wrote:
          | Typical servers are 1U or 2U high boxes. 1U = 44.45 mm, and
          | each box is "full depth", so around 700 mm long (sometimes
          | more) and ~450 mm wide. The 1U boxes typically just have a
          | block of copper with a dense bunch of fins on it as a CPU
          | cooler, while 2U designs usually incorporate heatpipes.
         | 
         | Lots of airflow.
         | 
         | High-density systems are usually 2U, but with four dual-socket
         | systems per chassis.
        
         | ben-schaaf wrote:
          | To put it in perspective, the Intel Xeon 6258R is rated at
          | 205W. Servers have oodles of (likely air-conditioned)
          | airflow, so this isn't a problem.
        
         | folago wrote:
          | When I worked in HPC in Tromsoe, arctic Norway, one of the
          | local supercomputers was liquid cooled and the heat was used
          | to warm up some buildings on the university campus.
        
         | Latty wrote:
         | Yeah, noise is relatively unimportant so they tend to use a ton
         | of extremely high RPM fans and big hunks of copper, from what
         | I've seen.
        
           | wffurr wrote:
           | And the DC staff wears hearing protection when they're
           | working among the racks.
        
             | myself248 wrote:
             | That still blows my mind. Coming from telecom where
             | everything prior to the #5 ESS was convection cooled, a
             | happy office is a quiet office.
             | 
             | Data got weird.
        
               | rbanffy wrote:
                | I'm assuming telecom had very different volume-power
                | requirements. Where I grew up there were many mid-city
                | phone switches that were large, concrete-exterior,
                | almost windowless buildings.
        
             | singlow wrote:
              | I wish that hearing protection had been required, or at
              | least offered, when I used to visit data centers
              | frequently. They made a big deal of the fire suppression
              | training, but never even suggested ear plugs.
              | 20-year-old me had no idea how bad that noise was for my
              | ears. I hope the staff there were wearing plugs, but it
              | was never apparent.
        
               | mhh__ wrote:
               | You only get one set of ears.
               | 
                | I suspect I still have the record for most expletives
                | used in front of the headmaster at my old school,
                | because someone turned on a few-kW speaker while I was
                | wiring something under the stage - i.e. I wasn't
                | pleased.
        
         | willis936 wrote:
          | Servers are in 4U rack-mounted enclosures with (relatively)
          | low-height heatsinks and huge amounts of airflow. Intake in
          | front, exhaust in rear. Most clients will have beefy air
          | conditioning to keep the ambient intake temp and humidity
          | low.
        
           | wtallis wrote:
           | I think 4U servers are quite rare these days, except for
           | models designed to accommodate large numbers of either GPUs
           | or 3.5" hard drives. Most 2-socket servers with up to a few
           | dozen SSDs are 2U designs.
        
             | touisteur wrote:
              | Y'all can try to pry my quad-CPU 3-UPI 4U HPE DL580 from
              | my cold dead hands.
        
         | [deleted]
        
         | Out_of_Characte wrote:
          | A cooling unit provides a delta between surface-area
          | temperature and ambient temperature. Epyc chips are
          | significantly larger and more spread out with the chiplets,
          | so the heat density of these chips is relatively similar. So
          | the cooler doesn't have to provide a larger temperature
          | delta and only has to be slightly larger, if at all.
          | 
          | Overall heat density on chips has increased due to
          | lithography changes, so the chiplet architecture is in a way
          | only a stopgap.
        
         | epmaybe wrote:
          | Performance per watt is better on AMD right now, at least
          | last I heard.
        
           | unicornfinder wrote:
           | Performance per watt _is_ better but unfortunately the idle
           | power consumption is fairly high.
        
             | epmaybe wrote:
              | Oh, I see that in the article now. It also seems like
              | the performance gains you saw on other Zen3 chips don't
              | carry over to EPYC, due to the sheer number of cores and
              | other components on chip.
        
             | kllrnohj wrote:
              | Epyc's idle power consumption is fairly high but Ryzen's
              | isn't. The more workstation-focused Threadripper &
              | Threadripper Pro are also still significantly better
              | than Epyc here.
        
       | whatshisface wrote:
       | That logarithmic curve fit to the samples from a step function...
        
       | rbanffy wrote:
       | Why so little L3 cache on the competition?
        
         | xirbeosbwo1234 wrote:
         | First off, it's not a direct comparison. The Epyc has one L3
         | cache per chiplet. This means that latency is not uniform
         | across the entire L3 cache. This was a serious concern on the
         | first generation of Epyc, where accessing L3 could take
         | anywhere from zero to three hops across an internal network.
          | AMD has greatly reduced the problem on the more recent
          | generations by switching to a star topology with more
          | predictable latency.
         | 
         | That said, there are two major reasons:
         | 
          | 1. Epyc is on a chiplet architecture. Large chips are harder
          | to make than small ones: building two 200mm^2 chips is
          | cheaper than building one 400mm^2 chip. Since Epyc uses
          | chiplets, AMD can put more silicon into a package for the
          | same price. This means that Epyc can be just plain _bigger_
          | than the competition. This comes with some complexity and
          | inefficiency but has, so far, paid off in spades for them.
         | 
          | 2. Epyc is on a newer process. This means AMD can fit more
          | transistors into the same area. Intel has had serious
          | problems with their newer processes, so this is not an
          | advantage AMD expected to have when designing the part. The
          | use of a cutting-edge process was, in part, enabled by the
          | chiplet architecture. It is possible to fabricate several
          | small chips on a 7nm process even though one large chip
          | would be prohibitively expensive, and AMD has been able to
          | cut costs by using a 14nm process for the parts of the CPU
          | that wouldn't benefit from 7nm.
         | 
         | The first point is serious cleverness on the part of AMD. The
         | second point is mostly that Intel dropped the ball.
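          | 
          | To put rough numbers on the yield argument, here is a
          | minimal sketch using the classical Poisson yield model
          | Y = exp(-D*A). The defect density is a made-up illustrative
          | value, not a real foundry figure:
          | 
          |   /* Toy yield model: Y = exp(-D * A), D = defects per cm^2,
          |    * A = die area in cm^2. D = 0.1/cm^2 is assumed purely
          |    * for illustration. Build with: cc -O2 yield.c -lm */
          |   #include <math.h>
          |   #include <stdio.h>
          |   
          |   int main(void) {
          |       double d = 0.1;               /* assumed defect density */
          |       double big = exp(-d * 4.0);   /* one 400mm^2 = 4cm^2 die */
          |       double small = exp(-d * 2.0); /* one 200mm^2 = 2cm^2 die */
          |       printf("400mm^2 die yield: %.1f%%\n", 100 * big);
          |       printf("200mm^2 die yield: %.1f%%\n", 100 * small);
          |       /* Good silicon per wafer scales with yield, so two
          |        * small dies beat one big die even before salvaging
          |        * partially defective parts via binning. */
          |       printf("good-silicon advantage: %.2fx\n", small / big);
          |       return 0;
          |   }
          | 
          | With those assumed numbers, the 400mm^2 die yields ~67% and
          | the 200mm^2 die ~82%, so the small dies deliver roughly 22%
          | more good silicon per wafer.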
        
           | totalZero wrote:
           | What is the likelihood that mixed-process chiplets become the
           | state of the art?
        
             | uluyol wrote:
             | Intel already said they would use chiplets [1] and TSMC has
             | been talking about the various packaging technologies being
             | developed [2].
             | 
             | [1] https://www.anandtech.com/show/16021/intel-moving-to-
             | chiplet...
             | 
             | [2] https://www.anandtech.com/show/16051/3dfabric-the-home-
             | for-t...
        
             | Macha wrote:
             | Aren't they already for big (desktop/workstation/server)
             | chips? I'd say Zen3 is the state of the art in that market
              | and that uses a mixed process. The IO dies are
              | GlobalFoundries 12nm for AMD.
             | 
             | The mobile market cares more about efficiency than easily
             | scaling up to much bigger chips, so the M1 and other ARM
             | chips are probably going to ignore this without much
             | consequence for smaller chips.
             | 
              | Intel still tops sales for non-performance reasons:
              | refresh cycles, distrust of AMD from the last time they
              | fell apart in the server space, the ability to produce
              | chips in sufficient quantities (unlike the entire rest
              | of the industry fighting over TSMC's capacity), etc.
        
         | dragontamer wrote:
          | EPYC has a split L3 cache. Any particular core only benefits
          | from 32MBs of L3; the 33rd MB is "on another chip". (EDIT:
          | Zen2 was 16MBs, Zen3 is 32MBs. Fixed numbers for Zen3)
          | 
          | As such, AMD can make absolutely huge amounts of L3 cache
          | (well, many parallel L3 clusters), while other CPU designers
          | need to figure out how to combine the L3 so that a single
          | core can benefit from it all.
        
           | xirbeosbwo1234 wrote:
           | That's not quite accurate. Every core has access to the
           | entire L3, including the L3 on an entirely different socket.
           | CPUs communicate through caches, so if a core just plain
           | couldn't talk to another core's cache then cache coherency
           | algorithms wouldn't work. Though a core can access the entire
           | cache, the latency is higher when going off-die. It is
           | _really_ high when going to another socket.
           | 
           | The first generation of Epyc had a complicated hierarchy that
           | made latency quite hard to predict, but the new architecture
           | is simpler. A CPU can talk to a cache in the same package but
           | on a different die with reasonably low latency.
           | 
           | (I don't have numbers. Still reading.)
        
             | dragontamer wrote:
             | In Zen1, the "remote L3" caches had longer read/write times
             | than DDR4.
             | 
             | Think of the MESI messages that must happen before you can
             | talk to a remote L3 cache:
             | 
             | 1. Core#0 tries to talk to L3 cache associated with
             | Core#17.
             | 
             | 2. Core#17 has to evict data from L1 and L2, ensuring that
             | its L3 cache is in fact up to date. During this time,
             | Core#0 is stalled (or working on its hyperthread instead).
             | 
             | 3. Once done, then Core#17's L3 cache can send the data to
             | Core#0's L3 cache.
             | 
             | ----------
             | 
             | In contrast, step#2 doesn't happen with raw DDR4 (no core
             | owns the data).
             | 
              | This fact doesn't change with the new "star"
              | architecture of Zen2 or Zen3. The I/O die just makes it
              | a bit more efficient. I'd still expect remote L3
              | communications to be as slow as, or slower than, DDR4.
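              | 
              | If you want to see this on your own hardware, a minimal
              | pointer-chasing sketch (sizes and iteration counts are
              | arbitrary illustration values) will show the latency
              | steps. Every load depends on the previous one, so the
              | prefetcher can't hide anything:
              | 
              |   /* Chase a randomized pointer chain; each load depends
              |    * on the previous one, defeating the prefetcher.
              |    * Grow N past the L3 size and watch ns/load jump to
              |    * DRAM latency. Linux/glibc assumed. */
              |   #include <stdio.h>
              |   #include <stdlib.h>
              |   #include <time.h>
              |   
              |   #define N (1u << 20)      /* 1M pointers = 8MB chain */
              |   
              |   int main(void) {
              |       void **chain = malloc(N * sizeof(void *));
              |       size_t *idx = malloc(N * sizeof(size_t));
              |       for (size_t i = 0; i < N; i++) idx[i] = i;
              |       srand(42);            /* shuffle the visit order */
              |       for (size_t i = N - 1; i > 0; i--) {
              |           size_t j = (size_t)rand() % (i + 1);
              |           size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
              |       }
              |       for (size_t i = 0; i + 1 < N; i++)
              |           chain[idx[i]] = &chain[idx[i + 1]];
              |       chain[idx[N - 1]] = &chain[idx[0]];
              |   
              |       void **p = &chain[idx[0]];
              |       struct timespec t0, t1;
              |       clock_gettime(CLOCK_MONOTONIC, &t0);
              |       for (size_t i = 0; i < 8u * N; i++)
              |           p = (void **)*p;  /* dependent loads */
              |       clock_gettime(CLOCK_MONOTONIC, &t1);
              |   
              |       double ns = (t1.tv_sec - t0.tv_sec) * 1e9
              |                 + (double)(t1.tv_nsec - t0.tv_nsec);
              |       /* printing p keeps the loop from being deleted */
              |       printf("%.1f ns/load (%p)\n", ns / (8.0 * N),
              |              (void *)p);
              |       return 0;
              |   }
              | 
              | Pin it with e.g. taskset -c 0 and vary the chain size to
              | map out the L1 / L2 / L3 / DRAM plateaus.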
        
           | rbanffy wrote:
            | What AMD does is not magic and is not beyond what others
            | can do. My question is why the competition chose to have
            | just 32MB for up to 80 cores when AMD chose to have 32MB
            | per 8-core chiplet.
            | 
            | As a comparison, an IBM z15 mainframe CPU has 10 cores and
            | 256MB per socket.
        
             | dragontamer wrote:
             | > As a comparison, an IBM z15 mainframe CPU has 10 cores
             | and 256MB per socket.
             | 
             | Well, that's eDRAM magic, isn't it? Most manufacturers are
             | unable to make eDRAM on a CPU.
             | 
             | > My question is why they chose to have just 32MB for up to
             | 80 cores when AMD can choose to have 32MB per 8-core
             | chiplet.
             | 
             | From my understanding, those ARM chips are largely I/O
             | devices: read from disk -> output to Ethernet.
             | 
              | In contrast, IBM's are known for database backends,
              | which likely benefit from gross amounts of L3 cache.
              | EPYC is general purpose: you might run a database on it,
              | you might run I/O-constrained apps on it. So it's kind
              | of a middle ground.
        
               | meepmorp wrote:
               | IBM doesn't fab its own chips, right? I thought they used
               | GF.
        
               | wmf wrote:
               | It's basically IBM fabs that were "sold" to
               | GlobalFoundries. AFAIK IBM processors use a customized
               | process that isn't used by any other GF customers.
        
         | ChuckNorris89 wrote:
         | Limitations due to die size and power consumption since Intel
         | Xeon is still on the _ye olde_ 14nm++ process.
         | 
          | Also, since Xeon dies are monolithic, unlike AMD's chiplet
          | design, increasing the size of certain components on the
          | die, like cache for example, increases the risk of defects,
          | which reduces yields and makes the parts unprofitable.
        
           | rbanffy wrote:
           | True, but the ARM ones have just 32MB for up to 80 threads.
           | 
           | I wonder if we could get numbers for L3 misses and cycles
           | spent waiting for main memory under realistic workloads.
        
             | dragontamer wrote:
             | That information changes with every application. Literally
             | every single program in the world has its own cache
             | characteristics.
             | 
             | I suggest learning to read performance counters, so that
             | you can get information like this yourself! L3 cache is a
             | bit difficult for AMD processors (because many cores share
             | the L3 cache), but L2 cache is pretty easy to work with and
             | profile.
             | 
              | General memory reads / memory latency are pretty easy to
              | measure with various performance counters. Given the
              | amount of latency, you can sorta guess if it's in L3 or
              | in DDR4.
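              | 
              | On Linux, perf stat -e cache-misses ./app is the quick
              | way. If you want counters in-process, a minimal sketch
              | with the generic perf_event_open(2) events (error
              | handling kept bare) looks like this:
              | 
              |   /* Count last-level-cache misses around a region of
              |    * code with perf_event_open(2). The generic
              |    * PERF_COUNT_HW_CACHE_MISSES event is mapped by the
              |    * kernel onto the CPU's native PMC event. */
              |   #include <linux/perf_event.h>
              |   #include <stdio.h>
              |   #include <string.h>
              |   #include <sys/ioctl.h>
              |   #include <sys/syscall.h>
              |   #include <unistd.h>
              |   
              |   int main(void) {
              |       struct perf_event_attr attr;
              |       memset(&attr, 0, sizeof(attr));
              |       attr.size = sizeof(attr);
              |       attr.type = PERF_TYPE_HARDWARE;
              |       attr.config = PERF_COUNT_HW_CACHE_MISSES;
              |       attr.disabled = 1;
              |       attr.exclude_kernel = 1;
              |   
              |       /* measure this thread, on any CPU */
              |       int fd = (int)syscall(SYS_perf_event_open, &attr,
              |                             0, -1, -1, 0);
              |       if (fd < 0) { perror("perf_event_open"); return 1; }
              |   
              |       ioctl(fd, PERF_EVENT_IOC_RESET, 0);
              |       ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
              |   
              |       /* ... workload under test goes here ... */
              |   
              |       ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
              |       long long misses = 0;
              |       read(fd, &misses, sizeof(misses));
              |       printf("LLC misses: %lld\n", misses);
              |       close(fd);
              |       return 0;
              |   }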
        
       | rbanffy wrote:
       | There must be a typo on the 74F3 price. US$2900 for it is a
       | steal.
        
         | tecleandor wrote:
          | I found it in the original press release (price for 1K-unit
          | quantities, of course):
         | 
         | https://ir.amd.com/news-events/press-releases/detail/993/amd...
        
           | masklinn wrote:
           | Still seems like a typo, it doesn't make any sense that the
           | 24 / 48 would be priced between the 8 / 16 and the 16 / 32.
           | Either the prices of the 73 and 74 were swapped or the tag is
           | just plain wrong. "2900" is also very suspiciously round
           | compared to every other price on the press release.
        
             | dragontamer wrote:
             | How is it suspicious?
             | 
              | 256MB L3 (or really, 8 x 32MB L3) and 24 cores suggest
              | the bottom-of-the-barrel 3 cores active per 8-core CCX.
              | 
              | 8x CCX with 3 cores each. The yields on those chips must
              | be outstanding: it's like 62.5% of the cores could have
              | critical errors and they can still sell it at that
              | price.
             | 
             | EDIT: My numbers were wrong at first. Fixed. Zen3 is
             | double-sized CCX (32MBs / CCX instead of 16MBs/CCX)
             | 
             | ---------
             | 
             | In contrast, the 28-core 7453 is $1,570. I personally would
             | probably go with the 28-core (with only 2x32MB L3 cache, or
             | 64MBs) rather than the 24-core with 256MBs L3 cache.
             | 
              | For my applications, I bet that having 7 cores share an
              | L3 cache (and therefore able to communicate quickly) is
              | better than having 1 or 2 cores with 32MBs of L3 to
              | themselves.
             | 
             | There are also significant price savings, as well as
             | significant power / wattage savings with the 28-core /
             | 64MBs model.
        
               | [deleted]
        
               | masklinn wrote:
               | > In contrast, the 28-core 7453 is $1,570.
               | 
               | Which is cheaper than the 24c 7443 and 7413 but not the
               | 16c 7343 and 7313.
               | 
                | And it only has half the L3 compared to its siblings
                | (1/4th compared to the 7543 top end), a lower turbo
                | than every other processor in the range (whether lower
                | or higher core counts), as well as an unimpressive
                | base frequency, and a fairly high TDP by comparison
                | (as high as the 7543).
                | 
                | The 74F3 has no such discrepancy. It has the same L3
                | as every other F-series part and slots neatly into the
                | range frequency-wise: same turbo as its siblings (with
                | the 72 being 100MHz higher), a base clock 300MHz lower
                | than the 73 and 250MHz higher than the 75.
        
               | dragontamer wrote:
               | > Which is cheaper than the 24c 7443 and 7413 but not the
               | 16c 7343 and 7313.
               | 
               | 28-cores for $1570 seems to be the "cheapest per core" in
               | the entire lineup.
               | 
                | It all comes down to whether you want those cores
                | actually communicating over L3 cache, or not. Do you
                | want 7 cores per L3 cache, or do you prefer 4 cores
                | per L3 cache?
               | 
                | 4 cores per L3 cache benefits from having more overall
                | cache per core. But more cores per L3 cache means that
                | more of your threads can communicate tightly, cheaply,
                | and effectively.
               | 
               | ---------
               | 
                | More L3 cache per core probably benefits cloud
                | deployments, virtual desktops, and similar (since
                | those cores aren't communicating as much).
                | 
                | More cores per L3 cache benefits more tightly
                | integrated multicore applications.
               | 
                | EDIT: Also note that "more cores" means more L1 and L2
                | cache, which is arguably more important in
                | compute-heavy situations. L3 cache size is great of
                | course, but many applications are L1/L2 constrained
                | and will prefer more cores instead. The 24c 7443 with
                | 2x32MB L3 is probably a better chess engine than the
                | 16c 7343 with 4x32MB L3.
        
             | mrb wrote:
              | It doesn't seem to be a typo. AMD offers many variations
              | of each core configuration, with different base
              | frequencies. It's just that there are pricing overlaps
              | between some low-core high-frequency versions and some
              | higher-core lower-frequency versions. For example the
              | 7513 (32 cores) is also cheaper than the 73F3 (16
              | cores).
              | 
              |   75F3  32-core  2.95GHz  $4,860
              |   7543  32-core  2.80GHz  $3,761
              |   7513  32-core  2.60GHz  $2,840
              | 
              |   74F3  24-core  3.20GHz  $2,900
              |   7443  24-core  2.85GHz  $2,010
              |   7413  24-core  2.65GHz  $1,825
              | 
              |   73F3  16-core  3.50GHz  $3,521
              |   7343  16-core  3.20GHz  $1,565
              |   7313  16-core  3.00GHz  $1,083
              | 
              | Source: https://ir.amd.com/news-events/press-
              | releases/detail/993/amd...
        
             | coder543 wrote:
              | It makes perfect sense if you're an enterprise customer
              | and your software dependencies charge wildly different
              | prices for tiers with different maximum numbers of
              | cores. AMD is selling a license-optimized part at a
              | higher price because there will be plenty of demand for
              | it.
              | 
              | People who don't save a boatload by getting the
              | license-optimized CPU will invariably buy the 24-core
              | one, which helps AMD keep up with demand for the 16-core
              | variant, and the 16-core variant gets an unusually nice
              | profit margin. Win-win.
             | 
             | This is not the first time AMD or Intel have offered a
             | weird inverse-pricing jump like this... I highly doubt it
             | is a typo.
             | 
             | My other comment reiterates some of these points a
             | different way:
             | https://news.ycombinator.com/item?id=26469182
        
               | rodgerd wrote:
               | Yep - in the past I've done "special orders" for not-
               | publicly-advertised CPU configs from our hardware vendor
               | to get low core count, high-clock servers for products
               | like Oracle DB.
        
             | fvv wrote:
             | right, I think 3900 may be the correct price
        
         | wffurr wrote:
         | RTFA:
         | 
         | " Users will notice that the 16-core processor is more
         | expensive ($3521) than the 24 core processor ($2900) here. This
         | was the same in the previous generation, however in that case
         | the 16-core had the higher TDP. For this launch, both the
         | 16-core F and 24-core F have the same TDP, so the only reason I
         | can think of for AMD to have a higher price on the 16-core
         | processor is that it only has 2 cores per chiplet active,
         | rather than three? Perhaps it is easier to bin a processor with
         | an even number of cores active"
        
           | coder543 wrote:
           | I really don't think the article's speculation there is
           | helpful... it's really reaching.
           | 
           | As I said below the article in the comments:
           | 
           | > If I were to speculate, I would strongly guess that the
           | actual reason is licensing. AMD knows that more people are
           | going to want the 16 core CPUs in order to fit into certain
           | brackets of software licensing, so AMD charges more for those
           | to maximize profit and availability of the 16 core parts. For
           | those customers, moving to a 24 core processor would probably
           | mean paying _significantly_ more for whatever software they
           | 're licensing.
           | 
            | This is the more compelling reason to me, and it matches
            | the pattern of server processors that Intel and AMD have
            | charged more for in the past.
           | 
           | "Even vs odd" affecting the difficulty of the binning process
           | just sounds extremely arbitrary... definitely not likely to
           | affect customer prices, given how many other products are in
           | AMD's stack that don't show this same inverse pricing
           | discrepancy.
        
         | cm2187 wrote:
          | I am building a single-socket server right now, and I can't
          | really justify more than twice the price of a 7443P for a
          | marginally higher base clock and twice the cache. Does the
          | cache make that much of a difference? I thought these are
          | already very large caches vs lots of Intel CPUs.
        
           | dragontamer wrote:
           | Hmm, with AMD Threadripper, you're already looking at TLB
           | issues at these L3 sizes. So if you actually want to take
           | advantage of lots of L3, you need either many cores, or
           | hugepages.
           | 
            | Case in point: AMD Zen2 has 2048 (L2) TLB entries, under a
            | default (in Linux and Windows) of 4kB per TLB entry.
            | That's 8MB of TLB coverage before your processor starts to
            | page-walk.
            | 
            | Emphasis: your application will pagewalk while the data
            | still fits in L3 cache.
           | 
           | ------------
           | 
            | I'm looking at some of this lineup with 3 cores per CCX
            | (32MBs L3 cache), which means that under default 4kB
            | pages, those cores will always require a pagewalk just to
            | read/write their 32MBs of L3 cache effectively.
           | 
           | With that being said: 2048 TLB entries for Zen2 processors.
           | Maybe AMD has increased the TLB entries for Zen3. Either way,
           | you probably should start looking at hugepage configuration
           | settings...
           | 
            | These L3 cache sizes are absurd, to the point where it's
            | kind of unwieldy. I mean, with enough configuration /
            | programming, you can really make these things fly. But
            | it's not exactly plug-and-play.
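            | 
            | As a concrete starting point, a minimal Linux sketch of an
            | explicit 2MB-hugepage allocation (assumes hugepages were
            | reserved beforehand via /proc/sys/vm/nr_hugepages; sizes
            | here are illustrative):
            | 
            |   /* Map a buffer with 2MB hugepages so a 32MB working set
            |    * needs 16 TLB entries instead of 8192 4kB ones.
            |    * Reserve pages first:
            |    *   echo 32 > /proc/sys/vm/nr_hugepages */
            |   #define _GNU_SOURCE
            |   #include <stdio.h>
            |   #include <string.h>
            |   #include <sys/mman.h>
            |   
            |   #define SZ (32UL * 1024 * 1024)  /* match a 32MB L3 */
            |   
            |   int main(void) {
            |       void *buf = mmap(NULL, SZ, PROT_READ | PROT_WRITE,
            |                        MAP_PRIVATE | MAP_ANONYMOUS |
            |                        MAP_HUGETLB, -1, 0);
            |       if (buf == MAP_FAILED) {
            |           perror("mmap(MAP_HUGETLB)");
            |           return 1;
            |       }
            |       memset(buf, 0, SZ);          /* touch every page */
            |       printf("32MB mapped as %lu hugepages\n",
            |              SZ / (2UL * 1024 * 1024));
            |       munmap(buf, SZ);
            |       return 0;
            |   }
            | 
            | (Transparent hugepages via madvise(MADV_HUGEPAGE) is the
            | lower-friction alternative if you don't want to manage a
            | reservation.)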
        
             | justincormack wrote:
              | The 64k page size available on Arm (and Power) makes a
              | lot more sense with these kinds of cache sizes. With 2MB
              | amd64 hugepages it's only 16 different pages in that L3
              | cache, which for a cluster of up to 8 CPUs is not much
              | at all when using huge pages.
        
               | dragontamer wrote:
                | TLB misses always slow down your code, even
                | out-of-cache.
               | 
               | So having 2MB (or even 1GB) hugepages is a big advantage
               | in memory-heavy applications, like databases. No, 1GB
               | pages won't fit in L3 cache, but it still means you won't
               | have to page-walk when looking for memory.
               | 
                | 1GB pages might be too big for today's computers, but
                | 2MB pages might be good enough as a default now.
                | Historically, 4kB was needed for swap purposes (going
                | to 2MB with swap would incur too much latency if data
                | paged out), but with 32GBs of RAM + SSDs on today's
                | computers... fewer and fewer people seem to need swap.
               | 
               | There might be some kind of fragmentation-benefit for
               | using the smaller pages, but it really is a hassle for
               | your CPU's TLB to try to keep track of all that virtual
               | memory and put it back in order.
               | 
               | ---------
               | 
                | While there are performance hits associated with
                | page-walks, the page-walk process is fortunately
                | pretty fast. So most applications probably won't
                | notice a major speedup... still, the idea of tons of
                | unnecessary page-walks slowing down untold amounts of
                | code bothers me a bit for some reason.
               | 
               | Note: ARM also supports hugepages. So going up to 2MBs
               | (or bigger) on ARM is also possible.
        
       | temptemptemp111 wrote:
       | But why won't AMD let us do any secure booting?
        
       | ChuckMcM wrote:
        | Nice bump in specs. Perhaps now they will announce the Zen3
        | Threadripper :-). As others have mentioned, the TR can starve
        | itself on memory accesses when doing a lot of cache
        | invalidation (think pointer chasing through large datasets).
        | If the EPYC improvement of having the chiplet CPUs all share
        | L3 cache moves into the TR space (which one might assume it
        | will[1]), then this could be a reason to upgrade.
       | 
        | [1] I may be wrong here but the TR looks to me like an EPYC
        | chip with the multi-CPU stuff all pulled off. It would be
        | interesting to have a decap with the chiplets identified.
        
         | gameswithgo wrote:
          | Yes, TR will have the new cache configuration, just like
          | regular Ryzen and Epyc do.
        
       ___________________________________________________________________
       (page generated 2021-03-15 23:01 UTC)