[HN Gopher] Near-linear speedup for CPU compute on 20-core Mac S...
       ___________________________________________________________________
        
       Near-linear speedup for CPU compute on 20-core Mac Studio
        
       Author : selimnairb
       Score  : 130 points
       Date   : 2022-04-28 18:02 UTC (4 hours ago)
        
 (HTM) web link (hrtapps.com)
 (TXT) w3m dump (hrtapps.com)
        
       | ChrisMarshallNY wrote:
       | Theodolite is an awesome app (that I hardly ever need to use).
       | 
        | This guy definitely knows his math (and how to work with
        | Apple apps).
        
       | ac29 wrote:
        | Doesn't this just suggest they are leaving some performance on
        | the table then? The reason the Intel processors scale
        | non-linearly is that they run each core faster when there are
        | fewer cores under load.
        
         | sliken wrote:
          | Dunno, looks like a classic memory bottleneck to me. The M1
          | Ultra has 800GB/sec, but I believe a bit more than half of
          | that is available to the CPUs. The rest is for the GPU and
          | various on-chip accelerators.
          | 
          | So with about half the cores (16 vs 28) and twice the
          | bandwidth (say 420GB/sec vs 180GB/sec) it manages twice the
          | performance. Looks pretty impressive to me, and like Apple is
          | significantly less memory bottlenecked than the 6-channel
          | Xeon W.
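          | 
          | A rough back-of-the-envelope sketch of the per-core bandwidth
          | those numbers imply; the ~420GB/sec CPU share of the M1
          | Ultra's 800GB/sec is my estimate, not a published spec:
          | 
          |     /* Per-core bandwidth under the figures quoted above. */
          |     #include <stdio.h>
          | 
          |     int main(void) {
          |         const double m1_cpu = 420.0; /* GB/s, assumed CPU share */
          |         const double xeon   = 180.0; /* GB/s, 6-channel Xeon W  */
          |         printf("M1 Ultra: %.1f GB/s per core (16)\n", m1_cpu / 16);
          |         printf("Xeon W:   %.1f GB/s per core (28)\n", xeon / 28);
          |         return 0;
          |     }
          | 
          | That works out to roughly 26 GB/s per core for the M1 Ultra
          | versus about 6 GB/s per core for the Xeon W.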
        
       | bebort45 wrote:
       | I'm curious why Geekbench haven't put the Mac Studio on their Mac
       | leaderboard yet - https://browser.geekbench.com/mac-benchmarks.
       | There are plenty of benchmarks submitted
       | https://browser.geekbench.com/search?page=7&q=Apple+M1+Ultra...
        
         | 2OEH8eoCRo0 wrote:
         | Apple paid them not to because Intel handily beats them in
         | single core perf for a fraction of the price.
         | 
         | https://browser.geekbench.com/processor-benchmarks
        
           | jdlshore wrote:
           | Do you have any evidence for this statement ("Apple paid them
           | not to"), or are you just making shit up?
        
             | 2OEH8eoCRo0 wrote:
             | It was a lighthearted joke.
        
               | phs318u wrote:
               | Please don't turn HN into Slashdot.
               | 
               | I regularly downvote comments that make no points, are
               | solely there to make a gag, and add no substance to the
               | discussion.
        
               | 2OEH8eoCRo0 wrote:
               | Intel wins on single core performance as well as price
               | though.
        
               | recuter wrote:
               | Spec us out a full comparable system and show benchmarks.
        
               | sudosysgen wrote:
               | You can make a system with a 12900K, 32GB of RAM, and a
                | 1TB NVMe SSD for $1,150:
               | https://pcpartpicker.com/list/m3QjZw
        
               | recuter wrote:
               | I'll never understand religious loyalty to a corporation.
               | 
                | The computer in the OP is fully assembled and has 128GB
                | of memory, a nice GPU, and an 8TB SSD. Why be so obtuse?
               | 
               | If you're just trying to compare to the entry level Mac
               | Studio at least have the decency to throw in a full parts
               | list, like you know, with a graphics card..
        
               | [deleted]
        
               | 2OEH8eoCRo0 wrote:
               | https://browser.geekbench.com/processors/intel-
               | core-i9-12900...
               | 
               | https://browser.geekbench.com/v5/cpu/14464705
               | 
               | Intel Core i9-12900K - 1991
               | 
               | Apple M1 Ultra - 1554
        
               | sliken wrote:
                | Sure, Intel focuses on single-thread perf, high power
                | (241 watt max TDP), and automatically overclocks to 5.1
                | GHz, but only if you have enough power, cooling, and a
                | bunch of idle cores. Hence the 15% variation in
                | submitted scores. It's also rather memory-bandwidth
                | constrained, so it shows its most impressive numbers
                | with a single core running.
                | 
                | Apple, on the other hand, doesn't overclock, focuses on
                | multi-core performance, has great memory bandwidth, and
                | all the submitted scores are within 1%.
                | 
                | The M1 Ultra is also 1.32x faster in the multi-core
                | benchmark. Looks pretty impressive to me, even ignoring
                | how much less power the M1 Ultra uses.
        
               | joakleaf wrote:
               | That's kind of disingenuous.
               | 
                | Searching on Geekbench for Apple M1 Ultra single-core
                | scores returns values mostly in the 1770-1780 range. E.g.
                | https://browser.geekbench.com/v5/cpu/14597244
                | 
                | Most 12900K scores are between 1900 and 2200, but then
                | there is this outlier with a single-core score of 1252:
                | https://browser.geekbench.com/v5/cpu/14572307
                | 
                | Intel certainly wins on single core, but the M1 Ultra
                | multi-core scores are still impressive in comparison,
                | being generally 23,000-24,000, while the 12900K is
                | around 15,000-20,000.
        
               | hu3 wrote:
               | So 1900-2200 for Intel and 1770-1780 for M1?
               | 
               | Disingenuous would be to focus on the outlier.
        
               | ModernMech wrote:
               | It's amusing that you're concerned about someone making a
               | joke turning this into Slashdot, because the actual
               | discussion here comparing specs and cost of Intel vs. Mac
               | could be lifted directly from the Slashdot archives circa
               | the late 90s (adjusting for the specs).
        
               | [deleted]
        
           | ceeplusplus wrote:
            | That seems like a poorly thought-out theory considering the
            | $500 12900K needs 50 watts to beat the two-year-old M1 in
            | single-core performance. Whether you're a laptop user or an
            | enterprise server customer, you care about efficiency.
        
             | seabriez wrote:
              | For a workload that is optimized on M1, for 10 more watts
              | you get way faster performance and better functionality
              | than a two-year-old M1 that uses 40 watts and can't even
              | export H.264 video faster than a five-year-old computer.
        
               | mwint wrote:
        
           | zamadatix wrote:
            | Setting aside that the rest of the claim doesn't make sense,
            | how exactly would one compare the price of the M1 Ultra to
            | an Intel CPU in the first place? The M1 Ultra doesn't have a
            | price tag, and even if it did you'd still not have a number
            | to compare to the cost of a standalone CPU.
        
             | 2OEH8eoCRo0 wrote:
              | Apple M1 Ultra: $4,000
              | 
              | Intel i9-12900K: $589
        
               | semigroupoid wrote:
               | It is pretty useless to compare the price of a single CPU
               | with the price of an entire PC...
        
               | dekhn wrote:
                | Apple doesn't sell unbundled chips. Adding a motherboard
               | and RAM would still be less than $4K for most configs.
        
               | jrockway wrote:
                | You're definitely right there. I put together a build
                | with this CPU and chose the most expensive part available
                | in each category (except the GPU because of the chip
                | shortage, the case because there are $5000 ATX cases for
                | no good reason, the PSU because I just got the best
                | Seasonic one, and the SSD because there are $12k
                | enterprise ones): https://pcpartpicker.com/list/DYxhk9
                | 
                | So that's $3500 without a GPU; buy a $500 used GPU on
                | eBay and you're beating Apple. And nobody buys $1000
                | motherboards, so that takes $500 off. You don't need a
                | $300 case. Etc. Basically the point of the exercise is
                | that you can max everything out and still get a faster
                | computer for less money, which is what the comment was
                | trying to say.
                | 
                | Someone will reply and say that your time sourcing and
                | assembling the components isn't free, or that it doesn't
                | run OS X, etc. I get it, you don't have to say that. I'm
                | just adding an actual computer that's as expensive as
                | possible and that you could have right now, for
                | comparison.
        
               | hedgehog wrote:
                | To save the click: a 12900K machine from Dell is about
                | $3k. If you need CUDA or, say, SolidWorks, get the PC;
                | for video and multithreaded workloads the Mac would
                | probably be faster, but really only benchmarks of your
                | use case can tell you.
        
               | BolexNOLA wrote:
                | So you made a build to essentially show how expensive an
                | M1 Mac is compared to an Intel machine, but left out a
                | critical component because it's too expensive?
        
               | seabriez wrote:
                | Yeah, it's only useless to compare when the M1 is in a
                | bad light, but when it's like the article then it's
                | totally useful; even though the article is misleading AF
                | trash (calling matrix multiply "General Compute"? Yea,
                | ok). Comparing other computers against an Apple SoC that
                | has special hardware for performing an operation is a
                | lie. And then he says that somehow this extends to other
                | workloads? LMAO. It's already been proven that the M1 is
                | pretty terrible in performance for many tasks, including
                | H.264 video export and many others. If you wasted $6k on
                | a MBP that's on you; stop trying to post stupid articles
                | about some kind of "fake breakthrough performance that
                | everyone is [missing] out on." It only shows how
                | unknowledgeable Mac users are, spreading Apple's fake
                | benchmarks.
        
               | sliken wrote:
                | Umm, running a 1990s Fortran code that's a CFD simulation
                | is a "real" workload. It seems relatively likely that any
                | floating point heavy code would act similarly. Hard to
                | say if it's the matrix multiply or the memory bandwidth
                | that's giving Apple such a large lead.
                | 
                | Normally I'd discount using a 28-core Intel CPU from
                | 2019, but from what I can tell Intel hasn't improved much
                | since then. Keep in mind that Intel has specialized
                | vector units (AVX2 or AVX-512 depending on the model),
                | and the listed CPU is pretty high end (with 6 memory
                | channels) whereas the normal i5/i7/i9 has only 2.
                | 
                | So sure, it's not a video compression, gaming, or web
                | browsing benchmark, but some folks do run floating point
                | heavy codes. Unlike CUDA, which requires a rewrite, this
                | code wasn't specifically optimized for the M1.
        
               | recuter wrote:
               | created: 3 months ago karma: 64
               | 
               | ---
               | 
               | Are you a bot? I swear there's an influx of random trolls
               | as of late. Like some entity using GPT-3 to sow as much
               | disharmony as possible regardless of subject.
        
               | smoldesu wrote:
               | Ah yes, the bot accusation comment. Everyone's favorite
               | subcategory of HN musing.
        
               | recuter wrote:
                | Hey man, maybe I'm a bot too. Who knows. Blip bloop. But
                | these are just computer chips, and fresh low-karma
                | accounts with the vitriol dialed to the max are atypical
                | of HN. By all means, the M1 is the worst, but why froth
                | at the mouth about it in this manner? You explain it.
        
               | ask_b123 wrote:
                | Well, I have lower karma than seabriez, and I'm neither
                | a bot nor did I think I had low karma.
        
         | jfpoole wrote:
         | There's a bug in the Browser that we haven't been able to track
         | down yet that's preventing the Mac Studio from appearing on the
         | leaderboard.
        
       | mrjin wrote:
       | So a new discovery of Amdahl's Law?
       | 
       | https://en.wikipedia.org/wiki/Amdahl%27s_law
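        | 
        | For reference, a minimal sketch of what the law predicts; the
        | parallel fraction p below is purely illustrative, not a number
        | taken from the article:
        | 
        |     /* Amdahl's law: S(n) = 1 / ((1 - p) + p / n), where p is
        |      * the parallel fraction and n the number of cores. */
        |     #include <stdio.h>
        | 
        |     static double amdahl(double p, int n) {
        |         return 1.0 / ((1.0 - p) + p / (double)n);
        |     }
        | 
        |     int main(void) {
        |         const double p = 0.99;  /* assumed, for illustration */
        |         const int cores[] = {1, 2, 4, 8, 16, 20};
        |         for (int i = 0; i < 6; i++)
        |             printf("cores=%2d  speedup=%5.2f\n",
        |                    cores[i], amdahl(p, cores[i]));
        |         return 0;
        |     }
        | 
        | Even with 99% of the work parallel, 20 cores would top out at a
        | speedup of about 16.8x, so a near-linear curve implies a very
        | small serial fraction.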
        
       | danieldk wrote:
       | _Above, we're looking at parallel performance of the NASA USM3D
       | CFD solver as it computes flow over a classic NACA 0012 airfoil
       | section at low speed conditions._
       | 
       | If this solver relies on matrix multiplication and uses the macOS
       | Accelerate framework, you are seeing this speedup because M1 Macs
       | have AMX matrix multiplication co-processors. In single precision
        | GEMM, the M1 is faster than an 8-core Ryzen 3700X and a bit
        | slower than a 12-core Ryzen 5900X. The M1 Pro doubles the GFLOPS
        | of the M1 (due to having AMX co-processors for both performance
        | core clusters). And the M1 Ultra again doubles the GFLOPS (4
       | performance core clusters, each with an AMX unit).
       | 
       | Single-precision matrix multiplication benchmark results for the
       | Ryzen 3700X/3900X and Apple M1/M1 Pro/M1 Ultra are here:
       | 
       | https://twitter.com/danieldekok/status/1511348597215961093?s...
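        | 
        | In case it's useful, a minimal sketch of what "uses Accelerate"
        | means in practice: a single-precision GEMM call through
        | Accelerate's CBLAS interface, which on M1-family chips is the
        | path that reaches the AMX units (the matrix size and the gemm.c
        | file name are arbitrary):
        | 
        |     /* Build on macOS with: clang gemm.c -framework Accelerate */
        |     #include <Accelerate/Accelerate.h>
        |     #include <stdlib.h>
        | 
        |     int main(void) {
        |         const int n = 1024;
        |         float *a = calloc((size_t)n * n, sizeof(float));
        |         float *b = calloc((size_t)n * n, sizeof(float));
        |         float *c = calloc((size_t)n * n, sizeof(float));
        | 
        |         /* C = 1.0 * A * B + 0.0 * C, row-major n x n matrices */
        |         cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
        |                     n, n, n, 1.0f, a, n, b, n, 0.0f, c, n);
        | 
        |         free(a); free(b); free(c);
        |         return 0;
        |     }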
        
         | stephencanon wrote:
         | If it were taking advantage of Accelerate, the performance
         | would be much higher, but also the scaling would be quite
         | different. Look at the scaling in the tweet you linked--it's
         | anything but linear in the number of cores used.
        
         | bee_rider wrote:
          | Just for anyone as out of touch with the macOS ecosystem as
          | me: Accelerate includes a BLAS implementation, so it at least
          | seems plausible (depending on how this library was compiled)
          | that their special instructions might have been used.
        
         | torginus wrote:
          | But that makes me think: what prevents people from running
          | these calculations on the GPU? Even the memory is shared - the
          | few hundred GFLOPS they get out of the M1 Ultra are pocket
          | change in GPU terms.
        
           | the_svd_doctor wrote:
            | Well, that particular NASA code is from the '90s and written
            | in Fortran, maybe with MPI. That's probably part of the
            | reason why.
        
           | CamperBob2 wrote:
           | If a CPU doesn't run the OS and software I need to run, it
           | might as well _be_ a GPU or a Cray 9 or something dug out of
           | the wreckage at Area 51. All of which are good descriptions
           | of a CPU available only on the increasingly-proprietary Mac
           | platform.
           | 
           | So this entire thread is kind of pointless with regard to
           | many real-world use cases.
        
       | Reason077 wrote:
       | > _" Nano-texture glass gives up a little bit of the sharp
       | vibrant look you get with a glossy screen, but it's worth the
       | trade in usability, to be able to see the screen without
       | distractions all day long."_
       | 
       | "Nano-texture glass" is pretty much just what all screens were
       | like back in the days of CRTs and pre-glossy flat screens. Now
       | Apple are charging $300 for it!
        
         | astrange wrote:
         | "Nano-texture" is different from matte - matte LCDs don't have
         | reflective glare, but they also have much lower contrast and
         | you can see the grain if you look closely. Nanotexture doesn't
         | have those issues, but it's expensive.
        
         | numpad0 wrote:
         | It is a marketing name, but refers to a special procedure used
         | to create the matte surface, not the fact that it's matte. By
         | the way, CRTs were glossy. We wiped them with wet towels.
        
           | Reason077 wrote:
           | Most CRTs weren't glossy in the same way that modern flat
           | panels are. They diffused reflections: you couldn't see a
           | crystal-clear mirror image of yourself on a dark screen like
           | you can with today's screens.
        
           | ChrisMarshallNY wrote:
           | _> By the way, CRTs were glossy. We wiped them with wet
           | towels._
           | 
            | I remember the crackling if you wiped them within a few
            | minutes of them last being on.
           | 
           | Nowadays, there's lots of folks that think "CRT" is a
           | political hotbutton topic.
        
             | toast0 wrote:
             | > Nowadays, there's lots of folks that think "CRT" is a
             | political hotbutton topic.
             | 
             | Well, who wants big government shoving their noses into how
             | efficient our screens are, and whether they have leaded
             | glass or not. :P
        
         | jjtheblunt wrote:
         | the pixels are WAY smaller nowadays, though; perhaps that
         | constrains the nanotexture fabrication process in an expensive
         | way?
        
         | olliej wrote:
          | That was my thought as well ("yay marketing"), but apparently
          | it's actually structurally different so that it maintains
          | contrast. I ordered one in early March, and if it ever actually
          | arrives I'll try to remember to reply about the visible
          | difference :D (not kidding, it was due late March, then slipped
          | to April 22-27, and today moved to May 23-??)
        
       | ProllyInfamous wrote:
       | https://en.wikipedia.org/wiki/Gustafson's_law
        
       | cglong wrote:
       | Editorialized title. Original was "2022 Mac Studio (20-core M1
       | Ultra) Review".
        
         | pvg wrote:
         | email these in
        
           | cglong wrote:
           | Done! Thank you for the reminder :)
        
           | selimnairb wrote:
           | Could you explain what "these" are that need to be emailed
           | in?
        
             | electroly wrote:
             | Title corrections. @dang can't read every thread to see
             | these posts, but if you email him, he can take a look.
        
               | selimnairb wrote:
               | Makes sense. Thank you for clarifying.
        
         | MBCook wrote:
          | I didn't submit it, but in this case the original title is so
          | generic that no one would have looked at it, so I'm kind of
          | happy they put the important part in the headline here.
        
       | sydthrowaway wrote:
       | I bet you can build an AMD system that beats this handily and
       | costs half as much.
        
       ___________________________________________________________________
       (page generated 2022-04-28 23:00 UTC)