[HN Gopher] How M1 Macs feel faster than Intel models: it's abou...
___________________________________________________________________
How M1 Macs feel faster than Intel models: it's about QoS
Author : giuliomagnifico
Score : 465 points
Date : 2021-05-17 11:23 UTC (11 hours ago)
(HTM) web link (eclecticlight.co)
(TXT) w3m dump (eclecticlight.co)
| CitizenKane wrote:
| This certainly makes a lot of sense, and it's a brilliant
| strategy to improve overall performance. But it doesn't account
| for the full picture either. My M1 Macbook Pro feels about as
| responsive as my desktop system with an AMD Ryzen 3900X, which is
| impressive to say the least. It doesn't quite have the raw
| computing power of the 3900X, but given that it's a portable
| device that's something else.
| [deleted]
| nottorp wrote:
| Well, they did it first in iOS which is more responsive than
| Android even when compared to devices that win benchmarks vs
| the iPhone.
|
| It's just doable when you care enough to optimize the OS for
| responsiveness.
| agloeregrets wrote:
| Much like the M1, I don't think anything actually wins
| benchmarks vs the iPhone. Display refresh rate and a few
| multicore scores are closer, but in any real-world test,
| single-core dominance wins out. (Plus the big/little design
| in the A-series chips is better than the three-tier layout
| seen in QC chips: QC backfills the single large core with a
| ton of little cores to make the multicore numbers look
| competitive, when in reality any multithreaded task on two
| cores is much faster and you don't end up with mismatched
| race conditions.)
| nottorp wrote:
| Are you sure? iOS has been more responsive from day 1 (I
| mean the first 2-3 iphone versions), when they were using
| an off the shelf CPU that everyone else had.
| smoldesu wrote:
| I think the commenter is saying that iOS provides no
| discernible performance advancements over Android. If you
| ran Android on an M1, it likely would score just as well
| as an iOS/MacOS machine.
| CitizenKane wrote:
| It would be an interesting comparison. Non-apple silicon
| has definitely not been great and lags behind
| significantly. Would be very interesting to see what
| Android would be like on more modern CPU tech.
| smoldesu wrote:
| Non-Apple silicon has been perfectly fine. I mean, AMD was
| putting higher-performing APUs in laptops half the price
| of the M1 Macbook Air, and they did that 18 months before
| the M1 even hit the market. If you wanted an 8-core, 4GHz
| CPU with the GPU power of a 1050 Ti, the Ryzen 7 4800U was
| prevalent in notebooks as cheap as $400, and was even
| lauded for a short period as one of the best budget
| _gaming_ options on the laptop market.
|
| If you're exclusively referring to the handheld market
| though, then you're mostly right. Apple's big lead in
| this generation is mostly because of how fast they were
| able to buy out the world's 5nm node. The only real
| "Android" manufacturer doing the same thing is Samsung,
| which is only because they have the means to fab their
| own chips.
| GeekyBear wrote:
| > AMD was putting higher-performing APUs in laptops half
| the price of the M1 Macbook Air, and they did that 18
| months before the M1 even hit the market
|
| APUs with Zen and Zen 2 cores were absolutely not
| outperforming the M1's Firestorm cores.
|
| >In SPECint2006, we're now seeing the M1 close the gap to
| AMD's Zen3, beating it in several workloads now, increasing
| the gap to Intel's new Tiger Lake design as well as their
| top-performing desktop CPU, which the M1 now beats in the
| majority of workloads.
|
| In the fp2006 workloads, we're seeing the M1 post very
| large performance boosts relative to the A14, meaning
| that it now is able to claim the best performance out of
| all CPUs being compared here.
|
| https://www.anandtech.com/show/16252/mac-mini-
| apple-m1-teste...
| smoldesu wrote:
| How does this surprise anyone at all? You're comparing a
| 5nm-based core to a 7nm-based core, of course there will
| be a direct disparity in performance. If anything, I'm
| surprised they weren't able to eke more performance out
| of it. The GPU is frankly pathetic, and the memory
| controller and IO subsystems are both obviously gimped
| too. As a matter of fact, the current M1 Mac Mini quite
| literally could not connect to all of my hardware with
| all the dongles in the world. There's just not enough
| bandwidth to run it all. Even if I splurged for the "Pro"
| device, it will still have the same bottlenecks as the
| consumer product.
|
| You can't spend a decade pretending that performance
| doesn't matter, only to rear your head again when Moore's
| law is sputtering out and gasping for air. Apple's power
| over the industry is unchecked right now, meaning they
| can play all sorts of petty tricks to gaslight mainstream
| manufacturers into playing a different game. ARM is the
| first step in that direction, and I for one couldn't be
| more happy with the hardware I own. Well, maybe if I sold
| my Macbook Air...
| GeekyBear wrote:
| > How does this surprise anyone at all? You're comparing
| a 5nm-based core to a 7nm-based core, of course there
| will be a direct disparity in performance
|
| Given that Apple took the power/heat drop with the
| transition from 7nm to 5nm instead of a performance
| increase, that's just not as relevant as it could have
| been if they had gone the other way.
|
| However, that M1 is still running with a significant
| frequency and power draw deficit vs the Ryzen as well.
|
| >During average single-threaded workloads on the 3.2GHz
| Firestorm cores, such as GCC code compilation, we're
| seeing device power go up to 10.5W with active power at
| around 6.3W.
|
| https://www.anandtech.com/show/16252/mac-mini-
| apple-m1-teste...
|
| However, you compete with the performance you have today,
| not some theoretical performance you might have if your
| chip was made differently.
| smoldesu wrote:
| > Given that Apple took the power/heat drop with the
| transition from 7nm to 5nm instead of a performance
| increase, that's just not as relevant as it could have
| been if they had gone the other way.
|
| I still don't understand your argument here though. It's
| impressive because Apple was able to get first dibs on
| next generation technology? Should I be applauding their
| engineers or fearing their supply chain managers?
|
| > However, you compete with the performance you have
| today, not some theoretical performance you might have if
| your chip was made differently.
|
| Sure, but there's not much fun in bragging about how your
| V8 demolished a 4-stroke. Plus, as the market for silicon
| fabs continues to heat up, Apple's lead on logistics
| isn't going to be so noticeable. They've also painted
| themselves into a bit of a scaling issue, too: the M1
| occupies a bit of an ARM sweet spot right now. As far as
| I can tell, Apple's only choice for making a more
| powerful machine is to use better silicon, which won't be
| available until 4nm/3nm starts to hit production in
| Q3 2022, at the earliest.
| GeekyBear wrote:
| >I still don't understand your argument here though. It's
| impressive because Apple was able to get first dibs on
| next generation technology?
|
| The M1's performance lead wasn't created by a move from
| TSMC 7nm to TSMC 5nm.
|
| See, here: https://news.ycombinator.com/item?id=27184236
|
| The Firestorm cores in the M1 get their performance from
| (among other things) having a very wide core that executes
| more instructions in a single clock cycle, despite running
| the cores much more slowly than AMD does.
|
| There's a good breakdown of that and other design
| advantages here:
|
| https://www.anandtech.com/show/16226/apple-
| silicon-m1-a14-de...
| lupire wrote:
| Is that AMD example running at similar wattage?
| smoldesu wrote:
| It's pretty close. Its lowest TDP (before tinkering)
| rests at around 8-10w, and can get to 20-25w if you're
| running at full load (upper bounds normally come into
| play if the GPU is also running full blast).
|
| So no, it's not the _same_ wattage, but it's similar
| enough to be a valid point of comparison, especially for
| a chip that appears in laptops half the price of the base
| Macbook Air. Plus, as time goes on, AMD seems to be
| getting better about power management. If you take a look
| at the latest reviews for the Surface Laptop 4 with
| Ryzen, many reviewers were surprised to find that it
| lasted nearly 17 hours of video playback.
| sod wrote:
| Yes, it's a bummer that Apple has a deathgrip on 5nm and
| after that 3nm.
|
| Zen2 would be incredible on 5nm. IMHO the improvements we
| saw with geforce 30XX series and zen 2 were 80% the
| switch to 7nm tsmc. I'd fully expect another jump if
| nvidia and amd were able to get their hands on 5nm, but
| it's apple exclusive for now :(.
|
| Well, at least we get huge investments in chip
| manufacturing thanks to apple throwing money at tsmc.
| tokamak-teapot wrote:
| Yes - I sold my Ryzen desktop because I no longer feel the need
| to go to the 'more powerful' machine. The Air is just as fast.
| CitizenKane wrote:
| That's true, although I do miss the memory! And
| realistically, when it comes to heavy compute or multi-
| tasking the 3900X really shows its power. I'm honestly very
| interested to see AMD tech at 5nm. I think the next few years
| are going to be fairly mind-blowing in terms of CPU
| technology.
| ahartmetz wrote:
| Try a low latency kernel on your desktop if you use Linux, that
| should improve responsiveness some more. Apparently the
| increase in power consumption is minimal - I use (K)Ubuntu's
| low-latency kernel even on laptops these days.
| jorvi wrote:
| I feel it's kind of sad this doesn't happen automatically for
| distros specifically geared to be installed on workstations. You
| sacrifice a tiny bit of throughput for keeping the computer
| continually smooth even under heavy load.
|
| This, automatically turning on battery optimizations when an
| internal battery is detected, and automatically applying a
| 'small speaker' EQ to internal speakers are a few changes
| that would make Linux feel so much better on a laptop.
| StavrosK wrote:
| I run KUbuntu, how can I enable this?
|
| EDIT: I searched a bit and, although I haven't found
| instructions anywhere, I did find people saying that the low-
| latency kernel decreases throughput as a tradeoff to lower
| latency. Have you found this to be the case?
|
| Also, another source says that the preempt kernel might be
| better for some workloads: https://itectec.com/ubuntu/ubuntu-
| choose-a-low-latency-kerne...
|
| Can anyone comment on these?
| ahartmetz wrote:
| Basically you install linux-lowlatency and reboot. Grub
| will prefer it over the generic kernel. Make sure that you
| also get linux-modules-extra in the lowlatency version or
| you might boot into a system with missing drivers. Happened
| on my laptop when I switched to lowlatency.
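|
| Roughly, something like this works on (K)Ubuntu (a sketch;
| package names can differ a little between releases, so double-
| check with apt first):
|       sudo apt install linux-lowlatency
|       # make sure the matching extra modules came along too, e.g.
|       # linux-modules-extra-<version>-lowlatency, then reboot
|       dpkg -l 'linux-modules-extra-*lowlatency*'
|       sudo reboot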
| cout wrote:
| I have not tried the preempt kernel. I have tried the
| lowlatency kernel and found that it improved overall
| performance, not just latency.
|
| Turns out that the cost of NUMA balancing
| (https://www.kernel.org/doc/Documentation/sysctl/kernel.txt)
| was outweighing
| any benefit we might get from it. The problem is that NUMA
| balancing is implemented by periodically unmapping pages,
| triggering a page fault if the page is accessed later. This
| lets the kernel move the page to the NUMA node that is
| accessing the memory. The page fault is not free, however;
| it has a measurable cost, and happens even for processes
| that have never migrated to another core. The lowlatency
| kernel turns off NUMA balancing by default.
|
| Instead of switching to the lowlatency kernel, we set
| `kernel.numa_balancing = 0` in `sysctl.conf` and accepted
| the consequences.
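|
| For reference, the knob itself looks like this (assuming a
| reasonably recent kernel; these are the standard sysctl paths):
|       # 1 = NUMA balancing on, 0 = off
|       cat /proc/sys/kernel/numa_balancing
|       # disable it for the running system; persist it in
|       # /etc/sysctl.conf as described above
|       sudo sysctl -w kernel.numa_balancing=0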
| pizza234 wrote:
| There's an interesting Stack Overflow question/answers:
| https://askubuntu.com/q/126664.
|
| In general terms, one either just installs the specifically
| recompiled package (-preempt, -lowlatency etc.), or
| recompiles their own. The related parameters can't be
| enabled, because they're chosen as kernel configuration,
| and compiled in.
|
| I'd take such changes with a big grain of salt, because
| they're very much subject to perceptual bias.
| adrianN wrote:
| If there were one simple trick to make your computer feel
| faster without downsides, defaults would change.
| [deleted]
| ahartmetz wrote:
| True in general, but there is seriously little downside
| to using a low-latency kernel. Sometimes it even improves
| throughput, usually the effect is tiny either way.
|
| I think most people just don't care or don't know that
| they care about responsiveness.
| jerf wrote:
| You can also do some "nice"-level management. My XMonad
| environment is simpler than most, so it's easier for me than
| someone running a full desktop environment with dozens of
| processes running around, but I got a lot of responsiveness
| improvements by nice-ing and ionice-ing my browser down in
| priority. It does mean that occasionally I switch back to it
| and it hiccups, but on the flip side it has stopped consuming
| so many resources and everything else feels much more
| responsive.
|
| I'm actually not sure why it has the effect it does,
| entirely, because at least in terms of _raw_ resource use the
| browser doesn't seem to be eating _that_ much stuff, but the
| effect in practice has been well beyond what can be explained
| by perceptual placebo. (Maybe there's too many things
| competing to get the processor at the VBlank interval or
| something? Numerically, long before the CPUs are at 100% they
| act contended, even in my rather simple setup here.)
|
| Or, to perhaps put it another way, Linux already has this
| sort of prioritization built in, and it works (though it's
| obviously not identical since we don't have split cores like
| that), but it seems underutilized. It's split into at least
| two parts, the CPU nice and the IO nice. CPU nice-ing can
| happen somewhat automatically with some heuristics on a
| process over time, but doing some management of ionice can
| help too, and in my experience, ionice is very effective. You
| can do something like a full-text index on the "idle" nice
| level and the rest of the system acts like it's hardly even
| happening, even on a spinning-rust hard drive (which is
| pretty impressive).
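|
| Concretely, the sort of thing I mean (just a sketch; "indexer"
| here stands in for whatever heavy background job you run):
|       # start the browser at lower CPU and I/O priority
|       nice -n 10 ionice -c 2 -n 7 firefox &
|       # run the heavy job in the "idle" I/O class so it barely
|       # competes with interactive work
|       ionice -c 3 nice -n 19 ./indexer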
| CitizenKane wrote:
| Oh very cool! I had no idea about this. I'm traveling for
| awhile but I'll definitely check that out once I'm back and
| working with my desktop machine again!
| whywhywhywhy wrote:
| Worth keeping in mind that all Mac users are coming from
| underpowered Intel processors running in a thermal environment
| that is terrible for them, too.
|
| My 2018 USB-C MBP honestly feels like trash at times, just feels
| like every part of it is choking to hold the thing up.
| dboreham wrote:
| All this reminds me of when the HP PA machines were first
| introduced. Loads of posts about how compilation was now so fast
| that the screen turned red due to time dilation, etc. Of course
| now nobody remembers PA.
| jmull wrote:
| You're talking about HP's old Precision Architecture,
| right?
|
| https://en.wikipedia.org/wiki/PA-RISC
|
| That's an interesting comparison because it's another case
| where a big player created their own chips for a competitive
| advantage.
|
| There are surely some lessons in there for Apple silicon, but I
| don't think there's any particular reason to think it will
| follow the same path as HP PA. PA started in the '80s. It was
| aimed at high-end workstations and servers. My impression was
| that the value proposition was narrow: it was sold to customers
| who needed speed and had a pretty big budget. Until checking
| now, I was unaware, but it seems to have carved out a niche and
| was able to live there for a while, so it wasn't exactly a
| failure.
|
| Time will tell how apple silicon fares. Today the M1 is a fast,
| power-efficient, inexpensive chip, with a reasonable (and
| rapidly improving) software compatibility story, and is
| available in some nice form factors... so basically a towering
| home run. But we'll just have to see how Apple sustains it over
| the long run. Frankly, they can drop a few balls and still have
| a strong story, but it's hard to project out 5, 10, 20 years
| and understand where this might end up.
|
| (Personally, I'm almost surely going to get an Apple silicon
| Mac, as soon as they release the one for non-low-end machines.
| The risk looks small, the benefits look big.)
| bombcar wrote:
| The real key for Apple is they already had a huge customer
| for their new chip - the iPhones and iPads have been using it
| for years.
|
| This means they do not have to be dependent on Mac sales to
| support M1 development - they can amortize it over the
| largest phone manufacturer in the world.
|
| And every time they do a successful architecture switch they
| make it easier to do it again in the future. If the IntAMD386
| becomes the best chip of 2030 they can just start using it.
| jackcviers3 wrote:
| So, like nice[1], except the developer sets the level of
| importance ahead of time? Makes sense, as long as I can
| reprioritize if needed.
|
| 1. https://en.wikipedia.org/wiki/Nice_(Unix)
| bombcar wrote:
| It's nice but with additional limitations - on a normal Unix
| system even if you nice everything to the lowest level possible
| they'll still take over all cores. This limits things below a
| certain nice level to the efficiency cores, meaning that the
| performance cores are _always_ available for foreground action.
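|
| You can poke at the macOS side from a shell too; if I remember
| the flags right (check taskpolicy(8) before trusting this), it's
| roughly:
|       # run a job with the background policy applied; on an M1
|       # that should keep it on the efficiency cores
|       taskpolicy -b ./long_batch_job.sh
|       # or demote an already-running process by pid
|       taskpolicy -b -p 12345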
| damagednoob wrote:
| So nice + taskset[1]?
|
| 1: https://linux.die.net/man/1/taskset
| archi42 wrote:
| Seems so. Just that it works "automagically" and reserves
| some cores for certain priority groups. Maybe one can write
| a script/daemon that `taskset`s all tasks with a certain
| niceness range (or by additional rules) to a few
| "background cores" and everything else to the "interactive
| cores"?
|
| On Linux you might also get some problems with kworker
| threads landing in the wrong group, but I'm not that deep
| into kernel internals, I have to admit.
| jamespo wrote:
| Could do it with SystemD AllowedCPUs functionality, but
| that's not dynamic
| wmf wrote:
| These days you'd probably want to use cpuset cgroups so
| you don't have to worry about individual processes.
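|
| Rough sketch of both approaches (exact cgroup layout and
| property names vary a bit by distro, so treat it as approximate):
|       # pin an existing noisy process onto two "background" cores
|       taskset -cp 6,7 <pid>
|       # or launch it in a transient unit restricted via systemd
|       systemd-run --user --scope -p AllowedCPUs=6-7 -- ./long_batch_job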
| ape4 wrote:
| Of course the applications (programmers of the applications)
| need to declare their "nice" levels correctly - they can't be
| greedy
| stefan_ wrote:
| Nothing has ever stopped Apple from pinning indexing work to
| specific cores. They just didn't give a shit.
| bombcar wrote:
| I suspect they did it less for performance and more for
| energy savings - this prevents the scheduler from moving a
| background process to a performance core (which a naive one
| would do if it saw the performance core underutilized).
| astrange wrote:
| It's always done this but the effect wasn't visible on x86.
| ytch wrote:
| Maybe like the big.LITTLE cores of many ARM CPUs in Android
| phones?
|
| I'm not familiar with Android app development; maybe there are
| similar APIs in Android?
| agloeregrets wrote:
| It's the same big.Little concept, slightly tweaked to allow
| this QoS work, I believe most of it was then used on Android
| but I doubt that Android uses the APIs well if they exist.
| bryanlarsen wrote:
| These kinds of optimizations often harm benchmarks. Benchmarks
| are batch jobs, even the ones that try and mimic interactive
| sessions. Apple has made a latency/throughput optimization that
| often harms throughput slightly. This is the right call, but
| benchmarks are the reason why this low hanging fruit isn't being
| picked on the Intel side.
| fpoling wrote:
| This should not harm benchmarks unless they mark their threads
| as background.
|
| As for a particular strategy, I presume Apple could, for
| example, use 6 high-performance cores instead of the 4+4 hybrid.
| But then thermal management would be an issue. So the choice was
| about throughput/latency/energy efficiency. Using simple low-
| performance cores for tasks that can wait is a very good strategy,
| as one can fit more of those cores, as opposed to running the
| performance cores at a low frequency.
| bryanlarsen wrote:
| I was thinking about things like the HZ setting in Linux.
| Lowering the setting will get you a slight throughput
| advantage but interactive use will not be as smooth.
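|
| (You can check what a given kernel was built with, e.g.
|       grep 'CONFIG_HZ=' /boot/config-$(uname -r)
| on distros that ship the config there.)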
| MrFoof wrote:
| What's impressive is Apple's first "high-end" custom silicon is
| about equal in performance to an AMD Ryzen 5 5600X.
|
| That's not bad!
|
| However the M1 is only about 20-25% of the power consumption of
| the Ryzen 5 5600X.
|
| That's _crazy_!
|
| It'll be very interesting to see what the M3 or M4 look like
| (especially in X or other higher end variants) several years from
| now. We already have tastes of it from Marvell, Altera, Amazon
| and others, though it looks as if we will see ARM make huge
| headway in the data center (it already is) and even on desktops
| before 2025.
| robotresearcher wrote:
| I wonder how much of this nice result could be achieved on a
| homogenous CPU if everyone used an API that expressed priorities
| on a fine grained task level. Process priority and CPU affinity
| is already a thing. But I don't think I've ever set the 'nice'
| level for a single thread inside my code.
|
| My point is, the snappiness is a result of decent system-wide
| adherence to a priority API more than the heterogeneous chip
| design. The low power consumption of course relies on both.
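|
| (On Linux you can in fact nice a single thread, since threads are
| schedulable tasks with their own TIDs, e.g.:
|       renice -n 19 -p <tid>
| but hardly anyone wires that into their code or scripts.)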
| cmsj wrote:
| I'm curious if there is some architectural limit that prevents
| Apple from doing something similar on Intel Macs, at least ones
| with 8+ cores. The scheduler could arbitrarily nominate 4 cores
| as "efficiency" cores and keep them for background tasks, leaving
| the rest free for "performance".
| archi42 wrote:
| Without knowing the inner workings of Apple's scheduler: I
| don't think there is any reason this wouldn't work on x86
| hardware. All you need is some API to tell the scheduler
| whether a task should be done "asap" or "whenever".
|
| But you then get a huge tradeoff: if you reserve over 50% of
| the computational power for "asap" stuff, then all the
| "whenever" stuff will take twice as long as usual (assuming the
| tasks are limited by compute power). On an 8C Intel that means
| that, for a lot of stuff, you're now essentially running on a
| 4C Intel. Given the price of these chips that MIGHT be a
| difficult sell, even for Apple. On the M1 they don't seem to
| care.
| kingsuper20 wrote:
| >All you need is some API to tell the scheduler whether a
| task should be done "asap" or "whenever".
|
| We could call it ::SetThreadPriority()
| twoodfin wrote:
| I think the issue is that thermals, and thus overall
| efficiency, are optimized on the Intel chips when long-running
| work is split evenly across the cores.
| agloeregrets wrote:
| For Intel that would cause some serious thermal issues. The
| four E cores in the M1 use about the same amount of power as
| one big core (5w total). In an Intel chip that would mean the
| four 'E' cores would each use probably 10 watts; that's 40
| watts cooking a chip.
|
| Intel does have a chip for you though; they have a version with
| mixed ULV cores and big cores.
| Jiokl wrote:
| Afaik, that isn't launching until later in 2021.
| https://en.wikipedia.org/wiki/Alder_Lake_(microprocessor) Or
| do you mean another CPU?
| gsnedders wrote:
| Lakefield launched in Q2 2020 with the i5 L16G7 and the i3
| L13G4.
| lars512 wrote:
| I would kill to pin Dropbox to efficiency cores like this. Half
| the time when my fans are whirring I discover it's just Dropbox
| frantically trying to sync things.
| jtriangle wrote:
| There are plenty of self-hosted dropbox alternatives that don't
| do that.
|
| I get that not everyone is a "take matters into their own
| hands" kinda person, but it's worthwhile to at least look into
| it and see if it's something worth pursuing for you.
| nikanj wrote:
| My 16 inch pro Intel is plenty fast - for the first 45 seconds
| before thermal throttling kicks in. I have a feeling the M1 is
| faster because it can actually do work for more than just the
| briefest bursts
| post_break wrote:
| The problem I ran into the other day: an app crashed and sat taking
| 50% of the CPU. I was on the couch for hours on battery. Only
| noticed when my battery life was at 60% when normally it would
| be at 85%. My intel machine would have alerted me by burning my
| legs and trying to create lift with the fans.
| geerlingguy wrote:
| Ha, speaking of burning legs, I was doing some video editing
| in FCPX last night, and I had to keep changing positions
| trying to optimize for not burning my legs too badly and not
| blocking the air flow through the vents on the bottom/sides
| causing even more throttling.
| hinkley wrote:
| Bamboo cutting boards make great, cheap lap desks.
| MAGZine wrote:
| I was sitting cross legged during a long civ session and
| using my left hand to balance part of my MBP up.
|
| It wasn't until the next day that I discovered a painful
| red spot on my hand. That was a few months ago. While the
| spot is no longer painful, it is still there today. The
| macbook slowly cooked a portion of my hand.
|
| The thermals on those old machines are just terrible, and I
| have evidence of it :-)
| deergomoo wrote:
| The heat and noise was what led me to sell my 16" MBP in favour
| of a 13" Air. I was working from home the whole time I owned it
| and my partner could often hear it from a different room.
|
| No idea how anyone works in an office with one of those things
| without irritating the hell out of their coworkers.
| yurishimo wrote:
| Most likely because everyone already wears headphones to
| isolate themselves. There is outside traffic noise and aircon
| to compete with, as well as general drone of conversations.
| It's only really an issue in environments that are already
| quiet.
| bjoli wrote:
| Apple's Intel MBP thermals are pretty abysmal wrt noise. My off-
| brand 7th gen i7 sounds about the same as my colleague's
| similarly specced MBP, but never throttles.
|
| There were even people saying they deliberately sabotaged their
| intel thermals to make the up and coming M1 look better.
| gjsman-1000 wrote:
| Those conspiracy theories were never confirmed. Linus thought
| the MacBook Air Early 2020 model's cooling was clearly
| designed for an M1, but then the M1 launched without a fan at
| all.
| jpraamstra wrote:
| Would it be possible to achieve something like this in Windows by
| pinning certain processes to specific cores? I've googled a bit
| and came across this article below, which shows how to set the
| affinity for a process to specific cores, and it looks like a
| fun experiment to do this with the system processes to see
| whether the performance of other applications will improve. :)
|
| It would be a bit of a hassle to figure out which processes would
| all need to be changed, and whether these changes persist after
| rebooting, and there's probably a lot more that I didn't think of
| though..
|
| https://www.windowscentral.com/assign-specific-processor-cor...
| andrekandre wrote:
| > The Time Machine backup pictured above ran ridiculously slowly,
| taking over 15 minutes to back up less than 1 GB of files. Had I
| not been watching it in Activity Monitor, I would have been
| completely unaware of its poor performance. Because Macs with
| Intel processors can't segregate their tasks onto different cores
| in the same way, when macOS starts to choke on something it
| affects user processes too.
|
| this is really smart, and makes one wonder why intel didn't have
| the insight to do something like that already.... is this the
| result of their "missing the boat" on smartphone processors
| (a.k.a giving up xscale)??
| kryps wrote:
| It does not just feel faster, in many cases it _is_ faster.
|
| E.g. compiling Rust code is so much faster that it is not even
| funny. _cargo install -f ripgrep_ takes 22 seconds on a MacBook
| Air with an M1; the same on a hexacore 2020 Dell XPS 17 with 64GB
| of RAM takes 34 seconds.
| alkonaut wrote:
| Are you compiling for ARM in both cases (or x86 in both cases)?
| Otherwise you are building 2 completely different programs, one
| is ripgrep for ARM and the other is ripgrep for Intel.
| thefourthchime wrote:
| It's been amazing for me.... EXCEPT for when resuming from
| sleep, for some reason it would be very slow for a minute.
|
| I have since FIXED THIS! The solution is to just not let it
| sleep. I'm using Amphetamine and since then it has been
| amazingly fast all the time.
| RankingMember wrote:
| > I'm using Amphetamine and since then it has been amazingly
| fast all the time.
|
| I think these app makers might need to start focus grouping
| some of these app names. Not only is it going to be harder to
| search for this app, but it'll inevitably also lead to some
| humorous misunderstandings.
| blinkingled wrote:
| What OS are you running on the Dell though? Windows is
| notoriously slower for file system operations even without AV
| and the default there is to prefer foreground apps vs something
| like compile tasks.
| kryps wrote:
| Yep, see below. With WSL and an ext4 virtual disk it takes
| only 29 seconds. Still quite a bit more than the 22 seconds
| on the Macbook.
| blinkingled wrote:
| WSL2 right? Given the virtualization overhead for IO I
| would expect native Linux to be faster.
| kryps wrote:
| WSL2, correct.
| elcomet wrote:
| Is 22 seconds instead of 34 seconds _so much faster_? The
| difference does not feel huge to me
| leetcrew wrote:
| yes, it compiled the project in 35% less time. doesn't really
| matter much for such a small project, but imagine if on a
| larger project the intel part took an hour and the m1 did it
| in 39 minutes. that's a substantial speedup. of course, we
| might see different results after leaving the turbo window.
| cptskippy wrote:
| You're assuming that the scaling is linear though. What if
| there's a fixed 10 second ding on x86? That 39 minute
| compile on the m1 would only be 39:10 on the x86.
| coder543 wrote:
| > What if there's a fixed 10 second ding on x86?
|
| There isn't.
| [deleted]
| [deleted]
| [deleted]
| pedrocr wrote:
| In previous discussions about this it was pointed out that LLVM
| may be significantly faster at producing ARM code versus x86.
| The comparison may still be the one that actually matters to
| you, but it may at least in part be an advantage of ARM in
| general and not just the M1. Rust is very good at cross-compiling,
| so compiling to ARM on the Dell and to x86 on the M1 may add some
| interesting comparisons.
| mumblemumble wrote:
| So then the next question is, is this really an apples-to-
| apples comparison? It wouldn't surprise me at all if the x86
| back-end takes more time to run because it implements more
| optimizations.
| Nullabillity wrote:
| Hard to compare without mentioning the actual CPU in it. FWIW
| my Ryzen 3900X completes that compile in 15s.
|
| To be fair, that is a relatively high-end desktop CPU, but it's
| also a massive margin for a last-gen model.
| sundvor wrote:
| My water-cooled 3900XT is locked to 4.4ghz all cores with
| relatively fast double banked 2x16GB c15 3600mhz memory. (It
| never drops under 4.4 on any of the 12/24 cores.)
|
| Also has a Samsung 980 1TB drive, and a 3070 GPU; the system
| truly is extremely quick. (Win10+WSL2).
|
| (I'll have to try that compile in the next day or so; will
| reply to myself here).
| throwaway5752 wrote:
| We also have to consider the i7 and M1 run at about 1/7 the
| TDP of the Ryzen. Just underscores the good design behind the
| QoS and judicious use of the Performance cores vs Efficiency
| cores.
| cbsmith wrote:
| Yeah, but that's previous generation. Get a 5850U or even a
| 5980HX...
| vetinari wrote:
| For me, it takes 34 seconds on M1 (MBP13):
|
|       Finished release [optimized + debuginfo] target(s) in 34.50s
|       ...
|       cargo install -f ripgrep  137,66s user 13,00s system 435% cpu 34,632 total
|
| For comparison, TR1900x (yeah, desktop, power-guzzler, but also
| several years old), Fedora 34:
|
|       Finished release [optimized + debuginfo] target(s) in 25.15s
|       ...
|       real    0m25,271s
|       user    4m41,553s
|       sys     0m7,216s
| Fergusonb wrote:
| Thank you for the perspective.
|
| Here's r7 4800HS (35w) on linux:
|
| Finished release [optimized + debuginfo] target(s) in 24.98s
|
| real 0m25.151s
|
| user 4m54.987s
|
| sys 0m6.764s
| agloeregrets wrote:
| In a native TypeScript compile of a very large Angular app I see
| an even more dramatic drop, from 1:20 to 40s, compared to a
| desktop i9. I feel as if the M1 may have been designed around a
| detailed and careful look at how computers and compilers work,
| with the CPU then designed around that, rather than the other way
| around.
|
| It's like buying a car and modifying it to take it racing vs
| buying a race car: the race car was designed to do this.
| jeswin wrote:
| Typescript compilation doesn't use all the cores available on
| the CPU; I think it maxes out at 4 as of now [1]. This might
| be working well for M1, which has 4 high performance cores
| and 4 efficiency cores.
|
| [1]: https://github.com/microsoft/TypeScript/issues/30235
| zerkten wrote:
| >> I feel as if the M1 may have been designed around a
| detailed and careful look at how computers and compliers work
| and then designed a CPU around that rather than the other way
| around.
|
| This is the position that Apple have set up for themselves
| with their philosophy and process. It would seem that Intel
| and AMD have to play a very conservative game with
| compatibility and building a product that increments support
| for x86 and x64. They can't make some sweeping change because
| they have to think about Linux and Windows.
|
| Apple own their ecosystem and can move everything at once (to
| a large degree.) This also gives an opportunity to design how
| the components should interact. Incompatibility won't be
| heavily penalized unless really important apps get left
| behind. The improvements also incentivize app makers to be
| there since their developer experience will improve.
| karmelapple wrote:
| App developers are incentivized in another way: software
| that takes advantage of new features or performance are
| often what Apple chooses to promote in a keynote or in an
| App Store's featured section.
| apozem wrote:
| Exactly this. Apple has spent decades drilling a message
| into third-party developers: Update your apps regularly or
| get left behind.
|
| Everyone who develops for one of their platforms is just
| used to running that treadmill. An ARM transition is just
| another thing to update your apps for.
| cbsmith wrote:
| > It would seem that Intel and AMD have to play a very
| conservative game with compatibility and building a product
| that increments support for x86 and x64.
|
| Talk to people who design chips. The compatibility barely
| impacts the chip transistor budget these days, and since
| the underlying CPU isn't running x86 or x64 instructions,
| it really doesn't impact the CPU design. There may be some
| intrinsic overhead coming from limitations of the ISA
| itself, but even there they keep adding new instructions
| for specialized operations when opportunities allow.
| Keyframe wrote:
| Are you counting download time as well? It took 8.30 sec on
| hexacore now when I tried it.
| kryps wrote:
| Nope, pure compile time on a Core i7-10750H with Windows and
| Rust 1.52.1, antivirus disabled. WSL did not make much of a
| difference though (29 seconds).
| Filligree wrote:
| The Windows filesystem is extremely slow, even without WSL.
| It can be mitigated to some extent by using completion
| ports, but I doubt the Rust compiler is architected that
| way.
|
| You should benchmark against Linux as well.
| kryps wrote:
| cargo install on WSL2 uses the non-mapped home directory,
| which is on an ext4 virtual disk. That is probably one
| reason why it is six seconds faster.
| otabdeveloper4 wrote:
| > Windows
|
| There's your problem.
|
| Even Microsoft concedes that Windows is legacy technology
| now.
| tobyhinloopen wrote:
| How long does it take on a Ryzen 9 5950X
| [deleted]
| InvaderFizz wrote:
| Ran a few cycles on my 5950x hackintosh:
|
|       invaderfizz@FIZZ-5950X:~$ hyperfine 'cargo -q install -f ripgrep'
|       Benchmark #1: cargo -q install -f ripgrep
|         Time (mean +- s):    19.190 s +- 0.392 s  [User: 294.890 s, System: 17.144 s]
|         Range (min ... max): 18.352 s ... 19.803 s  10 runs
|
| The CPU was averaging about 50% load, the dependencies go
| really fast when they can all parallel compile, but then the
| larger portions are stuck with single-thread performance.
| V1ndaar wrote:
| Just for reference (5950x too): ran 3 times just measuring
| with `time` on void linux using cargo from xbps. Mean of
| the last two runs (first was syncing and downloading) was
| ~13.8 s.
| peteri wrote:
| My 5950X came in faster running in WSL2, but then I do have
| a Samsung 980 PRO running in a PCIe 4.0 slot.
|
|       Benchmark #1: cargo -q install -f ripgrep
|         Time (mean +- s):    11.585 s +- 0.474 s  [User: 176.733 s, System: 5.677 s]
|         Range (min ... max): 11.271 s ... 12.867 s  10 runs
|
| Which I suspect leans towards an IO bottleneck.
| Synaesthesia wrote:
| Note that's a 16 core CPU with 105W TDP compared to a quad
| core < 10W M1
| cbsmith wrote:
| 5850U is an 8 core CPU with a TDP that is adjustable
| between 10-25w. If you want to keep it to 4 cores just so
| it is a fair fight, there's the 5400U/5450U. AMD does
| pretty well...
| jbverschoor wrote:
| Yup, and the new macbook pro will probably be twice as fast
| ChuckNorris89 wrote:
| Source?
| jbverschoor wrote:
| Gut feeling ;-)
| mousepilot wrote:
| I'd gladly spend 14 seconds if it helps me avoid Apple
| products!
| chmod775 wrote:
| OS matters for this comparison. If your Dell XPS was running
| Windows, that might explain the discrepancy.
|
| For instance Windows on NTFS etc. is notoriously slow at
| operations that involve a lot of small files compared to Linux
| or whatever on ext4.
|
| Unless your Dell was also running OSX, you're probably not
| comparing hardware here.
| hawski wrote:
| And here I thought that having a desktop CPU like Ryzen 5 2400G
| with loads of RAM would take me somewhere. It took the machine
| around 71 seconds.
|
| EDIT: could you measure C compilation? For example:
|       git clone --depth 1 --branch v5.12.4 "git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git"
|       cd linux
|       make defconfig
|       time make -j$(getconf _NPROCESSORS_ONLN)
|
| For me it's slightly below 5 minutes.
| danieldk wrote:
| _It took the machine around 71 seconds._
|
| Are you including download time? On my Ryzen 7 3700X, it's
| done in 17 seconds.
| hawski wrote:
| Specifically I did it two times to not see the
| download/syncing time. Also to a fresh home. 3700X has 16
| threads, 2400G has 8, that probably makes up for the most
| of the difference. But M1 has 8 cores (4x high-performance
| + 4x high-efficiency). Maybe the Rust version also counts? I'm
| on rust 1.48.
|
| EDIT: I checked on 1.52.1, which is latest stable and it
| went down to 54 seconds. So that also makes up a
| significant difference.
| adriancr wrote:
| I just did a test on my old Lenovo laptop... Intel i7-8565U CPU
| @ 1.80GHz
|
| # git clone https://github.com/BurntSushi/ripgrep && cd ripgrep
|
| # cargo clean
|
| # time cargo build
|
| real 0m23,805s
|
| user 1m11,260s
|
| sys 0m3,806s
|
| How is my crappy laptop on par with your M1? :)
| kryps wrote:
| You are not doing a release build.
| adriancr wrote:
| thanks, problem solved - now it's 1 minute...
| bscphil wrote:
| Wow. The fact that my Sandy Bridge laptop from 2011 does
| it in only 1:41 is pretty indicative of how badly
| processor improvements stalled out last decade. My
| processor (like yours, 4c8t):
| https://ark.intel.com/content/www/us/en/ark/products/52219/i...
| jfrunyon wrote:
| Okay, compiling is faster. But you compile once and run many
| times. What about running it?
| josephg wrote:
| I did a side-by-side comparison with my friend's M1 macbook and
| my (more expensive, nearly brand new) ryzen 5800x workstation
| compiling rust. The ryzen was faster - but it was really close.
| And the macbook beats my ryzen chip in single-threaded
| performance. For reference, the ryzen compiles ripgrep in 19.7
| seconds.
|
| The comparison is way too close given the ryzen workstation
| guzzles power, and the macbook is cheaper, portable and lasts
| 22 hours on a single charge. If Apple can keep their momentum
| going year over year with CPU improvements, they'll be
| unstoppable. For now it looks like its not a question of if
| I'll get one, but when.
| postalrat wrote:
| How long is the battery life on both if compiling non-stop?
| Assuming both keep similar compile from start of battery to
| end it would be interesting to see if the ryzen is truly
| guzzling batteries.
| gpm wrote:
| I'm guessing the 5800x work station is a desktop, not
| battery powered...
| postalrat wrote:
| You are right. Missed that when I was angrily responding.
| NicoJuicy wrote:
| Are you sure Rust compilation can use multiple cores to the
| fullest?
|
| I don't use Rust, but a quick search returned for example
| this issue:
|
| https://github.com/rust-lang/rust/issues/64913
| josephg wrote:
| My cpu usage graph shows all cores at 100% for most of the
| compilation. But near the end it uses fewer and fewer
| cores. Then linking is a single core affair. It seems like
| a reasonable all-round benchmark because it leans on both
| single- and multi- core performance.
|
| And I really care about rust compile times because I spend
| a lot of small moments waiting for the compiler to run.
| omegabravo wrote:
| I've done a side by side with my Ryzen 3700x compiling a Go
| project.
|
| 6 seconds on the Ryzen, vs 9 seconds on the M1 air.
|
| `time go build -a`, so not very scientific. Could be
| attributed to the multicore performance of the Ryzen.
|
| Starting applications on the M1 seems to have significant
| delays too, but I'm not sure if that's a Mac OSX thing.
| Overall it's very impressive, I just don't see the same lunch
| eating performance as everyone else.
|
| The battery life and lack of fans is wonderful.
|
| edit: Updated with the arm build on OSX. 16s -> 9 seconds.
| yaktubi wrote:
| I'd like to point out that compiling stuff is usually
| disk/io intensive. Could this not just be that the Apple
| machine has a faster hard drive/memory?
| vetinari wrote:
| It doesn't. It has the same 4 lanes, PCIe 3.0 flash as
| everyone else.
| monocasa wrote:
| They have an extremely heavily integrated flash
| controller. Part of it is what they bought from Anobit.
| vetinari wrote:
| I know, but the flash controller isn't the slow part
| (usually ;) ).
| monocasa wrote:
| It does for real world tasks, rather than benchmarks. The
| FTL plays a big part in real world, small access latency.
| vetinari wrote:
| We still don't know how good the FTL in the Apple
| controller is; all the devices are still too new and haven't
| been dragged over the coals like all the other
| controllers. It is still in the "easy job" part of its
| lifecycle, with brand new flash cells.
|
| However, to quote @Dylan16807 from similar discussion few
| weeks ago
| (https://news.ycombinator.com/item?id=26118415):
|
| > The analog parts are the slow parts.
| monocasa wrote:
| They've been using custom controllers for over a decade.
|
| And the Anobit IP includes the analog parts as a large
| piece of their value add.
| vetinari wrote:
| We are getting too deep into irrelevant things. We don't
| know how much of the Anobit IP was used in M1 Macs; they may
| own it, but they might not use it all. They purchase
| their NAND, and it may not be compatible with the current
| gen, just like when it was not compatible with Samsung's
| V-NAND/3D TLC.
|
| In practice, the I/O performance of M1-based Macs is
| comparable to a random PCIe 3.0 NVMe drive. (I'm typing
| this comment on an M1 MBP; I'm well aware how it performs.)
| [deleted]
| [deleted]
| cbsmith wrote:
| > Starting applications on the M1 seems to have significant
| delays too, but I'm not sure if that's a Mac OSX thing.
|
| That's a Mac OSX thing. It's verifying the apps.
|
| https://appletoolbox.com/why-is-macos-catalina-verifying-
| app...
| kemayo wrote:
| If it's Intel-compiled apps, it's also that Rosetta is
| translating them to run on M1 when they're first run, as
| I understand it.
|
| Both of these are first-time launch issues, so comparing
| launch times on the second launch is probably more
| reasonable.
| cbsmith wrote:
| Good point. I'm really surprised they don't translate
| during installation (ahead of time).
| coder543 wrote:
| > `time go build -a`, so not very scientific.
|
| A tool I have enjoyed using to make these measurements more
| accurate is hyperfine[0].
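| For example, something like (flags per hyperfine's README; the
| build command is just a stand-in for whatever you're measuring):
|       hyperfine --warmup 1 --prepare 'go clean -cache' 'go build -a ./...'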
|
| In general, the 3700X beating the M1 should be the expected
| result... it has double the number of high performance
| cores, several times as much TDP, and access to way more
| RAM and faster SSDs.
|
| The fact that the M1 is able to be neck and neck with the
| 3700X is impressive, and the M1 definitely can achieve some
| unlikely victories. It'll be interesting to see what
| happens with the M1X (or M2, whatever they label it).
|
| [0]: https://github.com/sharkdp/hyperfine
| ngngngng wrote:
| M1 S Pro Max
| brailsafe wrote:
| Faster SSDs on the Ryzen machine? The SSD on even my 2019
| Macbook pro is ridic
| ideamotor wrote:
| I find M1 optimized apps load quickly while others vary.
| But what really varies is website load times. Sometimes
| instant, sometimes delayed.
|
| All this said, there really is no comparison. I don't even
| think about battery for the M1, I leave the charger at
| home, and it's 100% cool and silent. It's a giant leap for
| portable creative computing.
| omegabravo wrote:
| update:
|
| time GOARCH=amd64 GOOS=linux go build -a
|
| 6s for the M1 too!
| jrockway wrote:
| Do keep in mind that go heavily caches for "go build":
|
| https://golang.org/cmd/go/#hdr-Build_and_test_caching
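|
| So for a cold-cache comparison, something along these lines is
| fairer (sketch):
|       go clean -cache
|       time go build -a ./...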
| f6v wrote:
| > If Apple can keep their momentum going year over year with
| CPU improvements
|
| I've been blown away with games running on M1. If Apple could
| up their GPU game as well, that'd be really cool.
| barbazoo wrote:
| Power consumption is a good point. I wonder what the M1's
| power consumption is during those 19.7 seconds of compiling
| ripgrep compared to other platforms.
| minimaul wrote:
| Low.
|
| For me, SoC power draw when compiling apps is usually
| around 15 to 18W on a M1 mac mini.
|
| My example app (deadbeef) compiles in about 60 seconds on
| that mac mini.
|
| On a 2019 i7 16" MBP (six core), it takes about 90 seconds,
| and draws ~65W for that period.
|
| So... radically more power efficient.
|
| edit: this is the same version of macOS (big sur), xcode,
| and both building a universal app (intel and ARM).
| anw wrote:
| > macbook ... and lasts 22 hours on a single charge
|
| I keep hearing this but have not experienced this in person.
| I usually get about 5 hours of life out of it. My usual open
| programs are CLion, WebStorm, Firefox (and Little Snitch
| running in the background).
|
| However, even with not having IDEs open all the time, and
| switching over from Firefox to Safari, I'm only seeing about
| 8 hours of battery life (which is still nice compared with my
| 2013 MBP that has about 30 minutes of battery life).
| jhickok wrote:
| The Macbook Air does not get 22 hours but the Macbook Pro
| gets kinda close under the conditions specified in the
| test.
| spideymans wrote:
| >However, even with not having IDEs open all the time, and
| switching over from Firefox to Safari, I'm only seeing
| about 8 hours of battery life (which is still nice compared
| with my 2013 MBP that has about 30 minutes of battery
| life).
|
| I would consider getting a warranty replacement. Something
| is wrong.
|
| For reference, my M1 Air averages exactly 12 hours of
| screen-on time (yes, I've been keeping track), and the
| absolute worst battery life I've experienced is 8.5 hours,
| when I was doing some more intense dev workflows.
| therockhead wrote:
| I'm seeing a big difference when compiling our project, with
| the M1 Macbook Pro beating the iMac Pro when building our
| projects (23 min vs 38 min).
|
| Is it all CPU though, or is building ARM artefacts less
| resource-intensive than building ones for Intel?
| ben-schaaf wrote:
| iMac Pro has a large variety of performance, going from 8
| to 18 cores and it uses a 4+ year old Xeon CPU.
| Unsurprisingly the 18 core handily beats the M1 in multi-
| core benchmarks:
|
| 18 core iMac Pro 13339:
| https://browser.geekbench.com/macs/imac-pro-
| late-2017-intel-...
|
| M1 Mac mini 7408:
| https://browser.geekbench.com/macs/mac-mini-late-2020
| yaktubi wrote:
| I think these performance metrics are somewhat limited in
| their usefulness. A Ryzen workstation might not have the same
| single-core performance or energy efficiency--however, a
| ryzen workstation can have gobs of memory for massive data-
| intensive workloads for the same cost as a baseline M1
| device.
|
| In addition: let's talk upgradability or repairability. Oh
| wait, Apple doesn't play that game. You'll get more mileage
| on the workstation hands-down.
|
| The only win for those chips, I think, is battery
| efficiency for a laptop. But then why not just VNC into a
| beastmode machine on a netbook and compile remotely? After
| all, that's what a CI/CD pipeline is for.
| zepto wrote:
| > But, then why not just VNC into a beastmode machine on a
| netbook and compile remotely? After all, that's what CI/CD
| pipeline is for.
|
| Is this how you work?
| wayneftw wrote:
| My work takes place at a beefy desktop machine. I
| wouldn't want it any other way... I get to plug in as
| many displays as I need, I get all the memory I need, I
| can add internal drives, there's no shortage of USB ports
| or expansion - and I get them cheap. For meetings or any
| kind of work away from my desk I'll remote in from one of
| my laptops.
|
| All that and my preferred OS (Manjaro/XFCE), which runs
| on anything, has been more stable than any Mac I've ever
| owned. Every update to macOS has broken something or
| changed the UI drastically and in a way I have no control
| over...
|
| If I ever switch away from desktops, it will be for a
| Framework laptop or something similar.
| zepto wrote:
| This is interesting - in the sense that you are someone
| who doesn't want the UI to change, but it's really not
| clear what this has to do with the question or the
| article.
| coupdejarnac wrote:
| I'm not the guy above, but I concur with the sentiments.
| After a while, adjusting to trivial UI changes becomes a
| huge chore and unnecessary cognitive overhead. It's
| relevant, because in order to use the M1, you have to buy
| into Apple's caprice.
| zepto wrote:
| Caprice seems like a weird way to characterize an aspect
| of the Mac a lot of people like.
|
| I think it's valid to want not to have to deal with the
| cognitive overhead of UI evolution.
|
| It's equally valid not to want to deal with the cognitive
| overhead of various attributes of Linux.
|
| What's not obvious is why people sneer about it.
| usefulcat wrote:
| I've worked that way for 10 years. My current desktop is
| a 5 year old Intel i3 NUC with a paltry 8G of memory.
| Granted, it uses all that memory (and a bit more) for a
| browser and slack, and the fan spins up any time a video
| plays. But usually it's silent, can drive a 4k monitor,
| and most of the time I'm just using mosh and a terminal,
| which require nearly nothing.
|
| OTOH, the machine that I'm connecting to has 32c/64t,
| half a terabyte of RAM and dozens of TB of storage.
| hiq wrote:
| > the machine that I'm connecting to has 32c/64t, half a
| terabyte of RAM
|
| Ok I'll bite, what do you do? Do you think halving the
| number of cores / RAM would impact your productivity?
| monoideism wrote:
| > Is this how you work?
|
| This is exactly how I've worked for a number of years
| now, for my home/personal/freelance work. Usually using a
| Chromebook netbook ssh'ing into my high spec home server.
| I'd do the same for work, but work usually requires using
| a work laptop (MacBook).
| yaktubi wrote:
| Well, actually I have a beastmode mobile workstation that
| gets maybe 3 hours of battery life on high intensity. And
| when the battery is depleted I find a table with an
| outlet and I plug it in.
|
| Everything in the machine can be upgraded/fixed so it
| should be good for a while.
|
| I'm not saying this to be snarky. I just want to
| emphasize that while M1 is great innovation, I put
| repairability/maintainability and longevity on a higher
| pedestal than other things. I also highly value many
| things a computer has to offer: disk, memory, CPU, GPU,
| etc. I want to be able to interchange those pieces; and I
| want to have a lot of each category at my disposal. Given
| this, battery life is not as important as the potential
| functionality a given machine can provide.
| emteycz wrote:
| Which 2021 laptop has replaceable CPU?
| w0m wrote:
| That's 90% how I've worked in ~15 years as an SWE.
| simonh wrote:
| > The only win for those chips
|
| I suspect the number of people, even developers, for whom
| 16GB memory is plenty probably greatly exceeds the number
| who need a beast mode Ryzen. But even then, a large
| proportion of the devs who might need a Build farm on the
| back end would be doing that anyway so they might as well
| have an M1 Mac laptop regardless.
|
| Anyway Mac Pro models will come.
| harikb wrote:
| granted they charge exorbitant prices for their hardware,
| but I can't believe how my 2010 MacBook Pro is still
| functioning perfectly fine.. except for them making it
| unsupported. I can't say that about any other pc/laptop I
| have had. Not even desktops
| apexalpha wrote:
| I don't know, I feel other laptops _at the same price
| point_ as Apple Macbooks do this too, sometimes even
| better. I bought a HP 8530w in 2009 or so and it still
| works. Replacing the DVD drive for a SSD required just a
| common Philips screwdriver and battery replacements are
| sold by HP themselves or many others.
| salamandersauce wrote:
| I was using my Dad's old ThinkPad 385XD from 1998 in
| 2009. Battery was unsurprisingly dead but every other
| piece was stock and worked although at some point I
| swapped the worn down trackpoint nub with one of the
| included spares we still had.
| eldaisfish wrote:
| The point is that with mid-2010s apple laptops, >5 year
| lifespans are the norm. With the majority of other, even
| comparably priced laptops, that is the exception.
|
| There are other laptops that are similar or superior
| build quality to those from Apple (N.B. - older MacBooks,
| not the newer ones) but those are also easy to spot.
| They'll usually be ThinkPads or some XPS models from
| dell.
| salamandersauce wrote:
| Oh that's baloney. There's nothing special about Apple
| laptops besides the metal case. Arguably they have worse
| cooling than most PC laptops. My 2018 MBP runs like it's
| trying to cook an egg and has since day one. My brother's
| 2012 MBP suffered complete logic board failure after 4 or
| 5 years.
|
| If it wasn't for the replacement keyboard warranty
| offered by Apple a good chunk of butterfly keyboard Macs
| would be useless junk due to the fact it's so hard to
| replace them. Frayed MagSafe adapters were a regular
| occurrence. And swollen batteries pushing up the top case
| not that rare either.
|
| I think maybe people keep MacBooks longer, but it
| probably has more to do with the fact they spent so much
| on them that they feel it's worthwhile to repair/pay for
| AppleCare than them actually being magically more
| durable.
| kogepathic wrote:
| _> With the majority of other, even comparably priced
| laptops, that is the exception._
|
| Consumer grade PC hardware has terrible build quality,
| and regardless of the price of your unit, the consumer
| build spec is just inferior to the business/professional
| lines. Asus, MSI, Sony, Acer, etc laptops all have
| consumer grade build quality and they just aren't
| designed to last a decade.
|
| _> They'll usually be ThinkPads or some XPS models from
| dell._
|
| Precision/XPS and Thinkpad models (with the exception of
| the L and E series) are almost always in the same price
| range as a MacBook. Any business-class machine (Thinkpad,
| Precision/Latitude, Elitebook) should easily last >5
| years. These are vendors which will sell you 3-5 year on-
| site warranties for their laptops.
|
| This is why you can find so many off-lease corporate
| laptops on eBay from any model year in the last 10 years
| or so. The hardware doesn't break, it just becomes
| obsolete.
| supercheetah wrote:
| For Dell, at least the business class desktops, they're
| trash, and are barely usable after 2-3 years, and usually
| have some kind of problem long before that. I'm pretty
| sure Dell expects most businesses to buy new ones in that
| time frame.
| spideymans wrote:
| I really want to like Dell's XPS line. I really do. But
| their technical support is atrocious. My XPS trackpad
| stopped working months after purchase, and getting them
| to repair it was an utter nightmare. Their tech support
| seemingly hasn't improved at all in the past decade
| (which is when I last vowed to never buy a Dell again due
| to their horrible tech support). They may have fooled me
| twice, but never again.
|
| (I do hear that their business support is pretty good
| though)
| w0m wrote:
| > and getting them to repair it was an utter nightmare
|
| ~8 years ago; within 48h of the laptop breaking - had a
| Dell repair tech sitting at my kitchen table replacing
| mainboard on an XPS laptop. Has turnaround when you have
| the proper support contracts gotten that much worse?
|
| (admittedly, we did pay for the top support tier for a
| personal device as it was expensed for work. I wouldn't
| do anything else from any manufacturer though unless I
| had on-site tech support/replacement.)
| varjag wrote:
| Not sure about consumer side, but as a business we have
| 24h turnaround service with Dell.
| [deleted]
| [deleted]
| davidy123 wrote:
| Exactly. Too many people compare a $400 cheap Windows
| laptop to a $1200 Macbook. Compare like for like, and the
| thing is likely to last until it's absolutely obsolete.
| And, while I don't really support this, some might find
| it an advantage to replace the computer three times for
| the same price. But people should be comparing to a well
| built, upgradable laptop (especially those that support
| not just RAM and disk but also display upgrades and
| adequate ports), running an operating system that has no
| arbitrary end of life.
| simonh wrote:
| I'd love to see an actual serious comparison between an
| M1 Mac and a $400 laptop. That would be hilarious. Since
| there are so many of them, can you direct me to one, or
| even a few?
| emteycz wrote:
| Lenovo or Dell displays are much worse than Apple
| displays even though the machine costs the same.
| davrosthedalek wrote:
| Sure, but my MacBook Pro has cooked its display /twice/
| now. Didn't go to sleep properly, and it overheated in my
| laptop bag.
|
| No good way to check for it, because no LEDs on the
| outside. Only way to check is to see if the fans switched
| on after five minutes in the bag.
| emteycz wrote:
| You don't want to know what the Dells are capable of
| doing. My XPS 15 2020 literally caught fire somewhere on
| the motherboard - not even a battery thing. Then I
| decided to go Apple only.
| sleepybrett wrote:
| I just had storage fail on my first gen touchbar macbook.
| It's a PITA, the storage is soldered onto a board. They
| replace the board, they can't recover the data (didn't
| expect them to). I'd pay the extra mm or two it would
| require them to just use a standard like M.2. SSD storage
| just fails after a while, especially if you do lots of
| things that thrash the disk.
| eropple wrote:
| My "writing desk" PC is a Thinkpad X201 tablet from 2010,
| with the same SSD upgrade I put in my own 2010 Macbook
| Pro (a dedicated Logic Pro machine these days). There
| have always been manufacturers for whom that's the case
| on the PC side of things--you just kinda had to pay for
| it up front.
| FpUser wrote:
| I have an old gaming ASUS laptop from 2010. It still works
| like a charm after the hard drive was switched to an SSD. I
| have an even older Asus netbook (a 15-year-old Eee PC, I
| think) that still works. The netbook is too slow for modern
| software and I do not really use it, but it works.
| KozmoNau7 wrote:
| My two main PCs are a Phenom II-based desktop and a
| Thinkpad X220i (with the lowly Core i3, even!). Both are
| perfectly functional and usable today, with a few minor
| upgrades here and there, the usual SSDs, more RAM and a
| Radeon RX560 for the desktop.
|
| The Thinkpad is obviously no powerhouse, but still works
| great for general desktop use, ie. browsing, email,
| document editing, music, video (1080p h264 is no
| problem). The desktop plays GTA V at around 40-50 FPS at
| 1080p with maximum settings. And this isn't some premium
| build, it's a pretty standard Asrock motherboard with
| Kingston ValueRAM and a Samsung SSD.
|
| Decade-old hardware is still perfectly viable today.
| xfer wrote:
| I'm using a 2011 Sandy Bridge motherboard with a Xeon 1230 I
| bought in 2012. I had to replace two HDDs and started using
| an SSD for the OS partition. It's working great; I just need
| to replace my Nvidia GPU, which is EOL but still works fine.
| ryanmarsh wrote:
| After buying my M1 and benchmarking it against the top of the
| line i9 I considered shorting Intel's stock, alas they're so
| large it'll take a while for the decline to catch up with
| them.
| cbsmith wrote:
| I dunno... the Ryzen 5800x laptops seem to be able to stay
| ahead of the M1's for most tasks.
| josephg wrote:
| Huh? 5800x laptop? The 5800x is a desktop chip.
| cbsmith wrote:
| The 'X' was meant to be "various letter extensions on the
| 5800 series", such as 5800U, 5800H, 5800HS. I probably
| should have used different terminology from the model
| number, as there are other Zen 3 mobile processors like
| the 5900HX, 5980HS and 5980HX that if anything make the
| point stronger.
| atq2119 wrote:
| Did both machines compile to the same target architecture? If
| you did native compiles, then perhaps LLVM's ARM backend is
| simply faster than its x86 backend...
| bombcar wrote:
| I think this is potentially a huge part of it - I'd do the
| benchmark by doing a native compile and a cross compile of
| each, and also do the same on a RAM disk instead of SSD (a
| large compile can end up just being a disk speed test if
| the CPU is waiting around for the disk to find files).
| outworlder wrote:
| Maybe that would have been the case if only compilation
| times were reported to be good. But no, this is across
| many different kinds of workloads. Even Blender running
| in Rosetta can beat native x86 processors, which is
| bonkers.
| ArcMex wrote:
| >For now it looks like it's not a question of if I'll get one,
| but when.
|
| My exact sentiments. I've been looking for a gateway into
| Apple and the M1 Air seems like it. It has now become a
| matter of time and not just a fleeting thought.
| snarfy wrote:
| I'm still unconvinced it's the M1's design and not TSMC's fab
| process.
| GeekyBear wrote:
| When you move to a smaller process node, you have a choice
| between improving performance or cutting power. (or some
| mix of both)
|
| Apple seems to have taken the power reduction with the A14
| and M1 on TSMC 5nm, not the performance increase.
|
| >The one explanation and theory I have is that Apple might
| have finally pulled back on their excessive peak power draw
| at the maximum performance states of the CPUs and GPUs, and
| thus peak performance wouldn't have seen such a large jump
| this generation, but favour more sustainable thermal
| figures.
|
| https://www.anandtech.com/show/16088/apple-
| announces-5nm-a14...
| cbsmith wrote:
| I think the latest Ryzen 5800x CPUs kind of prove it's the
| TSMC fab process. You've now got M1s, Graviton2s, and
| Ryzens all crushing it to similar levels.
| JudasGoat wrote:
| I have a Ryzen 7 4700g. I wanted to compare this to the GPU
| side of the M1. On Geekbench OpenGL test, it was slightly
| faster than the M1. I would like to find a better test.
| bvm wrote:
| > These [QOS levels] turn out to be integer values spread evenly
| between 9 and 33, respectively.
|
| Anyone know why this is? Seems fairly arbitrary... why not have
| 1, 2, 3, 4, 5?
| stan_rogers wrote:
| They're a bit mask, with the LSB always set. The values are
| 100001 (33), 011001 (25), 010001 (17), and 001001 (9). And
| they're probably ORed with other values so multiple parameters
| can be set with a single word.
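| For reference, these are the qos_class_t constants from macOS's
| <sys/qos.h> (there's also a fifth, default class, 0b010101 = 21,
| sitting in the middle of the range). A quick check, assuming a
| Mac toolchain:
|
|       #include <stdio.h>
|       #include <sys/qos.h>
|
|       int main(void) {
|           /* The QoS classes, printed as plain integers. */
|           printf("%d\n", QOS_CLASS_USER_INTERACTIVE); /* 0b100001 = 33 */
|           printf("%d\n", QOS_CLASS_USER_INITIATED);   /* 0b011001 = 25 */
|           printf("%d\n", QOS_CLASS_DEFAULT);          /* 0b010101 = 21 */
|           printf("%d\n", QOS_CLASS_UTILITY);          /* 0b010001 = 17 */
|           printf("%d\n", QOS_CLASS_BACKGROUND);       /* 0b001001 =  9 */
|           return 0;
|       }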
| oddevan wrote:
| Best guess: either (1) it maps to internal values that macOS
| uses, or (2) they wanted to build in the potential to have more
| granular values later.
| martin_drapeau wrote:
| I love the fact that Apple prioritizes the human over the
| machine. I always felt that iPhones and iPads felt faster than
| desktops. I reckon the QoS strategy is the reason.
| bjackman wrote:
| FWIW Android (and surely iOS, although I don't know anything
| about that in particular) have had systems like this for a long
| time. Aside from all the task scheduler smarts in the kernel
| itself, there are all kinds of things that can be marked as low
| priority so they run on the LITTLE (what Apple calls
| "efficiency") cores, while e.g. a UI rendering thread might go
| straight to a "big" ("performance") core even if its historical
| load doesn't show it using much CPU bandwidth, because it's
| known (and marked as such by Android platform devs) to be a
| latency-sensitive element of the workload.
|
| Been a few years since I worked on this so I'm a bit fuzzy on the
| details but it's interesting stuff! Without this kind of tuning,
| modern flagship devices would run like absolute garbage and have
| terrible battery life.
| grishka wrote:
| While yes, the hardware had been there for a long time, Android
| hasn't been taking much of an advantage of it and I'm not quite
| sure it does now.
|
| Google Play likes to do shit in the background. A lot of it.
| All the time. Even if you have auto-update (a.k.a. "deliberate
| RCE") disabled in settings, it will still update itself and
| Google services, silently, in the background, without any way
| to disable that. And while it's installing anything, let alone
| something as clumsy as Google services, your device grinds to a
| halt. It sometimes literally takes 10 seconds to respond to
| input, it's this bad. It's especially bad when you turn on a
| device that you haven't used in a long time. So it definitely
| wasn't scheduling background tasks on low-power cores, despite
| knowing which tasks are background and which are not (that's
| what ActivityManager is for). My understanding from looking at
| logcat was that this was caused by way too many apps having
| broadcast receivers that get triggered when something gets
| installed.
|
| Now, in Android 11 (on Pixel 4a), this was partly alleviated by
| limiting how apps see other apps, so those broadcasts don't
| trigger much of anything, and apps install freakishly fast. That,
| and maybe they've finally started separating tasks between
| cores like this. Or maybe they started doing that long ago but
| only in then-current kernel versions, and my previous phone was
| stuck with the one it shipped with despite system updates.
| wiremine wrote:
| Sounds like the consensus here is that the M1 feels faster, and
| is actually faster, for a lot of tasks. So, for a lot of
| engineers, the M1 is the better choice over Intels. (Faster
| compile times are a good thing for devs, among other things)
|
| In that regard, the moment feels similar to the mid 2000s, when
| you suddenly saw a huge uptick in Macbooks at technical
| conferences: software developers seemed to gravitate in droves to
| the Mac platform (over windows or linux). Over the last 5+ years,
| that's cooled a bit. But I wonder if the M1 will solidify or grow
| Apple's (perceived) dominance among developers?
| smoldesu wrote:
| > So, for a lot of engineers, the M1 is the better choice over
| Intels.
|
| I don't know which engineers you hang out with, but the only
| "engineer" I know who uses an M1 Mac does web design. Besides
| that, the M1 doesn't support most of the software that most
| engineers are using (unless you're working in a field already
| focused on ARM), and the fragility of a Macbook isn't really
| suited for a workshop environment. Among them, Thinkpads still
| reign supreme (though one guy has a Hackintosh, so maybe you
| were right all along?)
|
| > But I wonder if the M1 will solidify or grow Apple's
| (perceived) dominance among developers?
|
| Developers are going to be extremely split on this one. ARM is
| still in its infancy right now, especially for desktop
| applications, so unless your business exclusively relies on
| MacOS customers, there's not much of a value proposition in
| compiling a special version of your program for 7% of your
| users. Because of that, MacOS on ARM has one of the worst
| package databases in recent memory. Besides, there's a
| lightning round full of other reasons why MacOS is a pretty
| poor fit for high-performance computing:
|
| - BSD-styled memory management causes frequent page faults and
| thrashes memory/swap
|
| - Quartz and other necessary kernel processes consume
| inordinate amounts of compute
|
| - MacPorts and Brew are both pretty miserable package managers
| compared to the industry standard options.
|
| - Macs have notoriously terrible hardware compatibility
|
| - Abstracting execution makes it harder to package software,
| harder to run it, and harder to debug it when something goes
| wrong
|
| ...and many more!
|
| In other words, I doubt the M1 will do much to corroborate
| Apple's perceived superiority among the tech-y crowd. Unless
| people were really that worried about Twitter pulling up a few
| dozen milliseconds faster, I fail to see how it's any better of
| an "engineering" laptop than its alternatives. If anything,
| its GPU is significantly weaker than those of most other
| laptops on the market.
| kortex wrote:
| I develop machine learning data pipelines. My dev environment
| of choice is a Macbook running JetBrains suite, Alfred,
| BetterTouchTool, and an iTerm pane or 3 with Mosh and Tmux
| running on an Ubuntu Server. HID is a tricked out Ergodox EZ,
| right hand mouse, left hand Magic Trackpad. Prior to that I
| developed directly on Ubuntu 18/20, and windows well before
| that. The experience unparalleled. The OS/windowed desktop is
| smoother and less buggy than Gnome 3, more responsive than
| Win 7 or 10.
|
| I'm talking a 16GB Intel i7 8-thread Macbook vs a 32 thread
| twin-socket 72GB RTX2080 beast running Ubuntu 20. The Mac
| crushes it in terms of feel, fit and finish. I haven't tried
| M1 yet but I bet it'll one-up my current Intel macbook. I'm
| quite eager to get one.
|
| > Besides that, the M1 doesn't support most of the software
| that most engineers are using...
|
| ??? Other than CUDA, the MacBook meets my needs 95% of the
| time. The main thing I miss is native Docker networking, and
| only a small minority of the time. I need a responsive
| GUI head to act as my development environment. All the heavy
| lift compute is done on servers/on-prem/in-cloud.
|
| > - BSD-styled memory management causes frequent page faults
| and thrashes memory/swap
|
| Only under super heavy memory demand. I close some browser
| tabs or IDE panes (I normally have dozens of both).
|
| > - Abstracting execution makes it harder to package
| software, harder to run it, and harder to debug it when
| something goes wrong
|
| Almost everything I do is containerized anyways, so this is
| moot.
|
| I was squarely one of those "why would anyone use mac? It's
| overpriced, lock in, $typical_nerd_complaints_about_mac"
| until COVID happened and it became my daily driver. Now I
| can't go back.
|
| > - MacPorts and Brew are both pretty miserable package
| managers compared to the industry standard options.
|
| No snark intended - Like what? I'm not exactly blown away by
| Brew, but it's been generally on par with Apt(get). Aptitude
| is marginally better. There's not a single package manager
| that doesn't aggravate me in some way.
| thebean11 wrote:
| My company seems intent on switching to M1 as soon as we can.
| I can't see it being very long until it's possible to develop
| and run our backend services locally (Go, Docker, IntelliJ,
| maybe a couple other small things).
| astrange wrote:
| > - BSD-styled memory management causes frequent page faults
| and thrashes memory/swap
|
| macOS doesn't use the BSD VM.
|
| > If anything, it's GPU is significantly weaker than most
| other laptops on the market.
|
| You meant to say "stronger" at the task of not burning your
| lap.
| 8fGTBjZxBcHq wrote:
| This is a lot of really well expressed information that
| completely fails to grapple with the fact that most
| developers write code for the web and apple computers are
| extremely popular among web developers.
|
| All of the things you mentioned are also true of intel macs,
| which again are wildly popular among web devs of all kinds.
|
| If you can't explain the popularity of those machines in
| spite of those limitations, I don't see why I should accept
| those as reasons why apple arm computers won't be popular.
| astrange wrote:
| > This is a lot of really well expressed information
|
| Well, no, it's all totally made up.
| bombcar wrote:
| And developers will often be willing to take a grab at a
| new thing - and even if you're not doing web dev lots of
| your code likely runs on build farms anyway - the chance to
| play around with a workable ARM machine might be
| attractive.
| dimgl wrote:
| This is pretty disconnected from reality.
|
| > BSD-styled memory management causes frequent page faults
| and thrashes memory/swap
|
| You're basically pulling this out of nowhere. Not once in six
| years of using MacOS has this ever happened to me.
|
| > Quartz and other necessary kernel processes consume
| inordinate amounts of compute
|
| Yes, because Windows is so much better. This is sarcasm. Just
| pull `services.msc` and take a look at everything that's
| running.
|
| > MacPorts and Brew are both pretty miserable package
| managers compared to the industry standard options.
|
| In your opinion, what is industry standard? `apt-get` and
| `yum`? I have yet to come across a better package manager
| than Brew. Brew just works. Additionally, most binaries that
| are installed in Brew don't require elevation. Which is
| fantastic because almost every program installation requires
| elevation in Windows.
|
| > Macs have notoriously terrible hardware compatability
|
| Hardware compatibility in what sense? As in, plug-and-play
| devices on a MacOS powered machine?
|
| I'd argue the inverse; I often just plug in devices to my
| MacBook without having to install a single driver. Imagine my
| shock years ago when I plugged in a printer to my MacBook and
| I was able to immediately start printing without installing a
| single driver. Same with webcams, mice, etc.
|
| Do you mean hardware compatibility in terms of build targets?
| I think here you might be correct, but even then you can
| compile for different operating systems from within MacOS...
| so again, I'm not entirely sure what you mean here.
|
| I guess if you're talking about legacy devices where the
| hardware manufacturer hasn't bothered to create drivers for
| anything other than Windows, then your point might be valid,
| but how often does this happen...?
|
| > Abstracting execution makes it harder to package software,
| harder to run it, and harder to debug it when something goes
| wrong
|
| ...more disingenuous statements. What do you mean by this?
| Under the hood, MacOS is Unix. Everything that runs on a
| MacOS machine is a process. You can attach to processes just
| as you would on a Windows machine. Similarly, if you have the
| debugging information for a binary you can inspect the
| running code as well.
|
| MacOS is not a perfect operating system; for one, I do wish
| that it was better for game development. But I'm really
| struggling to understand your points here. Every single one
| is either not applicable or just straight up wrong.
| wiremine wrote:
| I'm a VP of a small software shop focused on IoT. We do
| embedded, cloud, mobile, web and ML. My entire department is
| 100% mac.
| artursapek wrote:
| I switched away from Macs to Thinkpads when the butterfly
| keyboards came out. Just happily switched back to Macs and
| loving my M1 Air.
| aidenn0 wrote:
| That really confused me until I discovered that Macs had
| something called a "butterfly keyboard" as well. At first I
| thought you switched to Thinkpads in the mid 90s:
|
| https://en.wikipedia.org/wiki/IBM_ThinkPad_701
| simondotau wrote:
| It's amazing how much of a dumpster fire the butterfly
| keyboard was, but perhaps more bewildering is how long they
| persisted with it. That keyboard might well have done more
| than any other product design decision in the last decade to
| push people away from the Mac platform.
| buu700 wrote:
| If the M1 MBA (or early 2020 Intel MBA) didn't exist, when
| my 2013 MBP started dying last month I would have dropped
| Apple for an Ubuntu XPS.
|
| I don't understand how a company of that size kept up a
| blunder of that magnitude for half a decade, but a half-
| broken keyboard with an entire missing row of keys was just
| a nonstarter.
| artursapek wrote:
| I think it did, and the touch bar was a close second.
| Thankfully, whoever was in charge of those decisions seems
| to have lost decision-making power and Apple has walked
| back both.
| totalZero wrote:
| > whoever was in charge of those decisions seems to have
| lost decision-making power
|
| I guess we're both speculating here, but even if the same
| people were in charge, I think they breathed a sigh of
| relief when they realized that the personal computing
| segment had spiced up again. From about 2014 to 2019, Mac
| revenue basically plateaued in line with the entire
| laptop market. People were crazy about phones but laptops
| had hit a wall.
|
| When you have to sell laptops to people whose yesteryear
| laptops do everything they need, you start adding random
| bullshit to the product because you have to capture the
| market's attention somehow. I think this is how we ended
| up with the touch bar. It's a step backward, but it's
| flashy and made the product look fresh(er) despite the
| form factor being identical to what they were selling in
| 2013.
| Arcuru wrote:
| Interesting, the article isn't clear how it's implemented and
| whether they dynamically adjust the QoS/Core, so I wonder if they
| solved the inversion problem. Basically you don't want a high-
| priority process to be dependent on a lower-priority process
| that's locked on a LITTLE core.
|
| Windows solves this by placing basically everything onto the
| LITTLE cores, and user interactions serve as a priority bump up
| to the big cores. The key (which is difficult to implement in
| practice) is that they pass that priority along to every other
| called thread in the critical path so that every path necessary
| to respond to the user can be placed on the big cores. This means
| that if you are waiting for a response from the computer, every
| thread necessary to provide that response, including any normally
| low-priority background services, is able to run on the big
| cores.
|
| I'd expect all the other OS's do a similar optimization for the
| big.LITTLE architectures. It seems to be the natural solution if
| you've worked in that space long enough.
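| On macOS, as I understand it, GCD handles the serial-queue case
| with a QoS override: when a higher-QoS thread waits with
| dispatch_sync, the queue's work is temporarily boosted to the
| waiter's class. A minimal C sketch (the queue label is just
| illustrative):
|
|       #include <dispatch/dispatch.h>
|       #include <stdio.h>
|
|       int main(void) {
|           /* A serial queue created at utility QoS, i.e. eligible
|              for the efficiency cores. */
|           dispatch_queue_attr_t attr =
|               dispatch_queue_attr_make_with_qos_class(
|                   DISPATCH_QUEUE_SERIAL, QOS_CLASS_UTILITY, 0);
|           dispatch_queue_t work =
|               dispatch_queue_create("com.example.work", attr);
|
|           /* The caller here (the main thread) runs above utility
|              QoS. Because this wait is synchronous, GCD temporarily
|              raises the queue's effective QoS so the caller isn't
|              stuck behind low-priority scheduling -- the inversion
|              case described above. */
|           dispatch_sync(work, ^{
|               puts("runs with the waiter's QoS, not plain utility");
|           });
|
|           dispatch_release(work);
|           return 0;
|       }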
| dep_b wrote:
| It would be interesting to see if there are also optimisations to
| keep the main (UI) thread clear as much as possible. I mean if I
| just clicked a button then probably that is the most interesting
| thing for me to happen, so responding to that click should not be
| blocked by other high priority tasks.
|
| I often have the feeling that my i9 machine is almost idle, but
| the specific core running my UI is working on something else
| that I don't particularly care about at the moment I'm
| interacting with the computer.
| varispeed wrote:
| People say that Ryzen is faster - but if it has a fan, then this
| is a deal breaker. I have become so annoyed by fans I often just
| stop work when they start whirring, go make a coffee, etc. It
| seems like the M1 is now the only laptop that lets you work in
| silence and be productive.
| CyberDildonics wrote:
| If fans are keeping you from working it might be time for water
| cooling or ear plugs.
| deepdmistry wrote:
| username checks out :)
| varispeed wrote:
| Wow I am fluttered
| theresAt wrote:
| M1 will soon have a fan due to the overheating issue.
|
| Although I imagine given how slow the response was on the
| butterfly keyboard issue it might be a few years out.
| gjsman-1000 wrote:
| What overheating issue? There isn't one.
| toast0 wrote:
| When was the last time Apple made anything with good thermal
| design?
|
| The M1 is designed for this. Wide cores that do a lot per
| clock, but can't be clocked high is perfect, because Apple
| would rather not clock high. Intel and AMD are still fighting
| the MHz wars, and can't release a core design that doesn't
| approach 5 GHz, even if the lower power chips won't.
| coldtea wrote:
| What overheating issue?
|
| There isn't one in M1.
| kllrnohj wrote:
| There aren't any thermal slowdowns on the M1 devices with a
| fan, no. The only M1 device without a fan shows thermal
| limitations, though.
|
| Not an overheating issue, no, but the M1 still benefits
| from a fan, and obviously whatever Apple does in a Mac Pro
| class machine will also have a fan. They aren't going to
| keep at ~25W in a full size desktop, that'd be nonsense.
| gjsman-1000 wrote:
| Only if you run it at 100% for like half an hour and even
| then the throttling is so minor that most reviewers said
| not to buy the MBP just for its fan.
| kortex wrote:
| Question for the peanut gallery: in a *nix system, let's say a
| single-threaded process is running on core 0. It makes a syscall.
| Does core 0 have to context-switch in order to handle the
| syscall, or could the kernel code run on core 1? This would allow
| a process to run longer without context switching and cache
| dumping, which would improve performance (as well as potentially
| security).
|
| Corollary question: could one make a kernel-specific core and
| would there be benefit to it? Handle all the I/O, interrupts and
| such, all with a dedicated, cloistered root-level cache.
| benlivengood wrote:
| Cross-core interrupts and syscalls are slow. The caches for any
| memory to be transferred have to be flushed on the calling core
| and its APIC would have to interrupt the destination core to
| start executing the syscall, which means entering kernel mode
| on the calling core to get access to the APIC in the first
| place.
|
| If the goal is low latency then staying on a single core is
| almost always better. To effectively use multiple cores
| requires explicit synchronization such as io_uring which uses a
| lock-free ring buffer to transfer opcodes and results using
| shared buffers visible from userspace and the kernel. io_uring
| has an option to dedicate a kernel thread to servicing a
| particular ring buffer, and this can also be limited/expanded
| to a set of cores. I have zero experience with io_uring in
| practice and so I don't know what a good tradeoff is between
| servicing a ring buffer from multiple or single cores. The
| entries are lightweight, so cache coherency probably isn't too
| expensive, and for a high-CPU workload that also needs high
| throughput, allowing other cores to service I/O probably makes
| sense.
|
| I think newer x86_64 chips also allow assigning interrupts from
| specific hardware to specific cores to effectively run kernel
| drivers mostly on a subset of cores, or to spread it to all
| cores under heavy I/O.
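| And for the dedicated-kernel-thread case, the liburing setup
| looks roughly like this (a sketch; SQPOLL wants a fairly recent
| kernel, and before 5.11 it also needed extra privileges; the file
| being read is just an arbitrary example):
|
|       /* gcc sqpoll.c -luring */
|       #include <liburing.h>
|       #include <fcntl.h>
|       #include <stdio.h>
|       #include <string.h>
|
|       int main(void) {
|           struct io_uring ring;
|           struct io_uring_params p;
|           memset(&p, 0, sizeof(p));
|           p.flags = IORING_SETUP_SQPOLL; /* kernel thread polls the SQ */
|           p.sq_thread_idle = 2000;       /* it naps after 2s idle */
|
|           int ret = io_uring_queue_init_params(8, &ring, &p);
|           if (ret < 0) {
|               fprintf(stderr, "init: %s\n", strerror(-ret));
|               return 1;
|           }
|
|           int fd = open("/etc/hostname", O_RDONLY); /* any file will do */
|           char buf[256];
|           struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
|           io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
|           io_uring_submit(&ring); /* no syscall while the poller is awake */
|
|           struct io_uring_cqe *cqe;
|           io_uring_wait_cqe(&ring, &cqe);
|           printf("read %d bytes\n", cqe->res);
|           io_uring_cqe_seen(&ring, cqe);
|           io_uring_queue_exit(&ring);
|           return 0;
|       }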
| LightG wrote:
| "A peanut gallery was, in the days of vaudeville, a nickname
| for the cheapest and ostensibly rowdiest seats in the theater,
| the occupants of which were often known to heckle the
| performers.[1] The least expensive snack served at the theatre
| would often be peanuts, which the patrons would sometimes throw
| at the performers on stage to convey their disapproval. Phrases
| such as "no comments from the peanut gallery" or "quiet in the
| peanut gallery" are extensions of the name.[1]"
|
| https://en.wikipedia.org/wiki/Peanut_gallery
| mousepilot wrote:
| Posts like these are the gems that I come to HN for, as much
| as I've used the phrase, I never considered the source of it.
| I'd imagine I'd be in the cheap seats as well. Thanks!
| kccqzy wrote:
| If the system call is being handled by core 1, what does core 0
| do? It must somehow wait for the core 1 system call to return.
| I'm not sure the waste created by waiting is outweighed by
| cache benefits.
| apexalpha wrote:
| "Because Macs with Intel processors can't segregate their tasks
| onto different cores in the same way, when macOS starts to choke
| on something it affects user processes too."
|
| Why is this? I understand that Intel chips don't use BIG/little
| but couldn't you assign all OS/background tasks to one core and
| leave the other cores for user tasks? Shouldn't the scheduler be
| able to do this?
| jtriangle wrote:
| Because they didn't design the OS to do that. You very well
| could lock a core or two at low freq/power and assign
| background tasks to that. It's not really a new or exotic idea.
|
| macOS doing this is a side effect of them needing to do so,
| with the added benefit of it actually making things nicer all
| around. They could very well do this on intel/amd chips to the
| same effect.
|
| The bummer is, we really don't know how good the silicon is,
| because nothing but macOS runs on it natively, so you can't
| really get an apples to apples comparison.
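| For what it's worth, you can do a crude version of this by hand
| on Linux today. A sketch, assuming core 0 is the one you want to
| sacrifice to background work:
|
|       #define _GNU_SOURCE
|       #include <sched.h>
|       #include <stdio.h>
|
|       int main(void) {
|           /* Pin the calling (background) process to core 0 only,
|              leaving the other cores free for interactive work --
|              a manual stand-in for what big.LITTLE plus QoS does
|              automatically. */
|           cpu_set_t set;
|           CPU_ZERO(&set);
|           CPU_SET(0, &set);
|           if (sched_setaffinity(0, sizeof(set), &set) == -1) {
|               perror("sched_setaffinity");
|               return 1;
|           }
|           /* ...exec or run the background workload here... */
|           return 0;
|       }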
| cogman10 wrote:
| Probably a fight with CPU frequency ramping. You could put all
| the tasks on one core, but then you risk the CPU frequency
| spiking when it shouldn't, causing more power to be drawn than
| necessary.
|
| With slow light power cores, that's less of an issue.
| wmf wrote:
| The OS also controls the frequency so it could lock the
| background core(s) at low frequency.
| sscarduzio wrote:
| Yeah cool, it's fast. But why so little RAM? Virtualisation and
| other professional software must already be a pain to run due to
| the ARM architecture, so why add a memory handicap on top?
| jtbayly wrote:
| This strategy has had a strange negative effect for me a couple
| of times.
|
| It used to be that I would notice rogue background tasks eating
| up my processor because I could feel the UI acting slow and the
| fans would kick on.
|
| Now the only indication I get is my battery depleting more
| rapidly than expected. The UI is completely responsive still, and
| there are no fans. But I'm never checking my battery level
| anymore, so it takes much longer to catch these processes.
| bombcar wrote:
| I have Pluto running on a side monitor and sometimes catch the
| processors going insane when I don't expect it.
|
| https://news.ycombinator.com/item?id=25624718
| yepthatsreality wrote:
| Zoom! Zoom! Look at my first world computation speeds!
| EricE wrote:
| Ha - from another HN thread:
| https://fabiensanglard.net/lte/index.html
| selimnairb wrote:
| I was wondering how macOS scheduled the efficiency cores but was
| too lazy to dig into the docs. Thanks!
| artursapek wrote:
| The other half of the story for the efficiency cores seems to be
| the battery life. Most of these non-urgent background tasks are
| put on efficiency cores not only to separate them from my
| applications, but also to enable this insanely good battery life.
| jws wrote:
| There is also a thermal benefit. Using the high efficiency
| cores puts a lower load on the cooling system. So even if the
| computer is plugged in and the performance cores are idle it
| may be best not to use them and let the cooling system gain
| some headroom for when the performance cores are needed.
| artursapek wrote:
| Great point. Overall it seems like a very well designed
| system.
| dsabanin wrote:
| Finally, setting all these QoS priorities on DispatchQueues is
| paying off big time.
| xbar wrote:
| A universally used, opinionated API with enough flexibility to
| let developers express their ideas about priority really helps.
| astrange wrote:
| Most people set way too many of those. Especially "background",
| where you're in for a surprise if you hit the actual background
| performance guarantees, since it may not run for minutes.
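| In plain libdispatch terms (DispatchQueue is the Swift wrapper
| over the same machinery), the choice looks something like this --
| just a sketch, not a recommendation of which class to pick:
|
|       #include <dispatch/dispatch.h>
|       #include <stdio.h>
|
|       int main(void) {
|           /* Non-urgent work: utility (or background, with the
|              caveat above that background work may be deferred for
|              a long time). On an M1 this is what lets the scheduler
|              park it on the efficiency cores. */
|           dispatch_async(
|               dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^{
|                   puts("indexing / syncing / cleanup goes here");
|               });
|
|           /* Latency-sensitive work stays at user-initiated or
|              user-interactive. */
|           dispatch_async(
|               dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
|                   puts("respond to the user here");
|               });
|
|           dispatch_main(); /* parks the main thread; never returns */
|       }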
| sizzle wrote:
| Will M1 silicon make the jump to iPhone/iPad in the future?
| eyesee wrote:
| The latest iPad Pro does use M1.
|
| It's more accurate to say that M1 made the jump _from_ the A14
| chips already in iPhones, as it's an enhanced variation of
| that design.
| zelon88 wrote:
| That is all well and good. Great actually. And it makes sense.
| This high-efficiency, reduced-instruction-set computer has a
| highly optimized architecture. But we all knew
| that deep down. It's like the Ford Tempo. A mere 80 horsepower
| mated to a well designed and highly optimized transmission gave
| levels of performance that were higher than other cars in the
| same market.
|
| What I can't stand are all the people saying the M1 is capable of
| replacing high end x86 computers. It's like owning a Tempo and
| mistakenly thinking your highly specialized transmission means
| you can walk up and challenge a 6.7L Dodge Challenger to a race.
| It's completely ridiculous and demonstrates a stunning lack of
| self awareness.
| spideymans wrote:
| >What I can't stand are all the people saying the M1 is capable
| of replacing high end x86 computers.
|
| Whether or not this is possible completely depends on their
| unique workload. The M1 could probably obviate any x86 chip for
| the average programmer. But that wouldn't be the case for
| gaming workloads, for example.
| zelon88 wrote:
| Is it really about workload though?
|
| A high end gaming PC will compile typescript, play games with
| high quality graphics, and run your 100 browser tabs. Let's
| be honest, an Intel Core 2 Duo would obviate most programmer
| workloads just fine.
|
| By your logic the Apple M1 is comparable to high-end x86
| chips, but you go on in the same breath with the disclaimer
| that it is "only certain workloads."
|
| That is my point. A high-end PC doesn't care about your
| workload. It is objectively fast. Period. You are the Tempo
| driver claiming that you would have beaten that Challenger if
| the temperature of the drag strip were just a little bit
| hotter....
|
| But that Challenger doesn't care about operating conditions
| or workload or whatever. It will win today, tomorrow, and the
| next day while you're claiming that the Tempo is "comparable
| in the snow on Tuesdays."
| spideymans wrote:
| >Let's be honest, an Intel Core 2 Duo would obviate most
| programmer workloads just fine.
|
| That's absolutely not the case. At least for my workflows.
| My work-issued five year old Intel i5 XPS would get
| incredibly slow, and incredibly hot when running unit tests
| or compiling code in Angular. Goes without saying that a
| Core 2 Duo would be simply unusable.
|
| My personal machine was a 2018 i5 MacBook Pro. On that
| machine I could open just _one_ iOS Simulator or Android
| Emulator and expect to have merely decent performance. And
| by decent, I mean the system would be running hot, the fans
| would be loud and the system would be noticeably slower.
| Having multiple simulators or emulators open is simply not
| a thing this Intel notebook was willing to do. Meanwhile on
| my M1, I've run six iOS Simulators / Android Emulators at
| once, and the system did not slow down, _at all_ (for
| reasons discussed in this article).
|
| In a futile attempt to push my M1 to its limits in a dev
| workflow, I even trained TensorFlow models on it. Surprise,
| surprise: not only was the model trained in a decent time
| (for a notebook without a dGPU; I wasn't expecting
| miracles), but it did all this without any perceptible
| slowdown in system performance. The absolute worst thing I
| could say about the M1's performance here was that the
| MacBook Air's chassis got _a little_ hot. And keep in mind
| that when I say _a little_ hot, I mean that it still
| produced less heat than my Intel notebooks often produce
| when watching a 4K YouTube video (and it does this all
| without a fan!).
|
| So in short, no, the Intel Core 2 Duo would absolutely not
| be fine for my programmer workloads. And, yes, the M1 has
| pretty much obviated Intel's lineup for my work. Dare I
| say, the M1 MacBook Air has been the single biggest
| quality-of-life upgrade I've had as a developer. It's the
| first notebook I've used where I've never been bottlenecked
| by system performance. It's a fantastic machine through and
| through.
| LegitShady wrote:
| I was thinking hard about getting an m1 mini but the upgrade
| pricing for ssd/ram is offensive and you can't upgrade them
| later. I will likely end up getting a newer Ryzen system; it
| will cost much more for the CPU+motherboard, but it will have
| access to dedicated graphics and I can actually just move my
| current 32GB of DDR4, video card, and SSDs over.
| proxyon wrote:
| I think you're right on this. I've also been irritated at the
| "eGPUs / dGPUs are no longer necessary" crowd. If an SoC GPU
| with the performance / size of an RTX could fit into a notebook
| we'd actually have them in notebooks... but we don't. The fact
| of the matter is that this technology doesn't exist. So the
| allegations that GPUs are no longer necessary because SoCs have
| already wholesale replaced them are comical.
| agloeregrets wrote:
| ...This is partially true in that the M1 is overhyped by
| some...
|
| But the M1 is also literally, by the numbers, the fastest
| machine in the world for some workloads. For us, my M1 Macbook
| Air compiles our large TypeScript Angular app in half the time
| of a current i9 with 64GB RAM on Node v15. These are real-
| world, tested, and validated results. Nothing comes
| close... because the chip is designed to run JS as fast as
| physically possible. It's insanely fast at some workloads. Not
| all of them, of course, but it's no Tempo. I would say it's more
| like a Tesla: faster than it should be considering the HP (a
| Model S is much faster than cars with way more HP), but faster
| due to 'how' it makes the power rather than how much.
|
| That said, I replaced a $2000 iMac 4K with a base model Macbook
| Air and it was a huge upgrade for my day to day work. It really
| is perfectly fine to replace some workstations.
| postalrat wrote:
| What environment are you compiling your large TypeScript app
| in? Windows with WSL on a 7200 rpm disk drive?
|
| I think you have other things going on that are slowing down
| the compilation on the i9.
|
| Maybe if you run Linux and install a fast SSD your i9 will be
| faster than your M1. Then by your logic you should ditch the
| M1.
| agloeregrets wrote:
| Ironically it was a Samsung PCIe drive in non-WSL Windows.
| A 16-inch MacBook Pro i9 also put out roughly similar times
| in macOS. Node compilation is a single-core workload with a
| very strong focus on small, fast translated instructions;
| the M1 is actually that fast at that specific task.
| postalrat wrote:
| https://doesitarm.com/tv/m1-macbook-vs-intel-i9-in-
| continuou...
|
| That dude says both his intels are faster.
| king_magic wrote:
| > That said, I replaced a $2000 iMac 4K with a base model
| Macbook Air and it was a huge upgrade for my day to day work.
|
| Essentially the same here, I just happened to replace my
| $2-3k Win10/Ubuntu desktop with my 8GB(!) M1 Air and it truly
| has been a huge upgrade for my day to day work. It feels like
| I'm working in the future.
|
| That said - I primarily use this Air to ssh to all of my
| servers that do my heavy lifting for me. But that doesn't
| stop me from driving this thing hard - dozens of open tabs,
| multiple heavyweight apps left open (e.g. Office apps),
| multiple instances of VS Code, Slack, Teams, all running -
| zero slowdown. Zero fan sound.
|
| It's black magic good.
| riggsdk wrote:
| I'm curious what you think of all the benchmarks that say
| otherwise. In a race between a motorized lawn mower and a
| racecar - if the lawn mower wins, which one is ultimately the
| best "racecar"?
| steve76 wrote:
| More like, I need a racecar, because I live in a Mad Max
| death race.
|
| vs
|
| I ride a quiet lawn mower at dawn and do my chores in a happy
| culture where people help each other.
|
| My computer is 10 years old. I paid $100 for it when I bought
| it used. It runs fine because I know what I'm doing, and now
| I feel guilty about that. I wish I could run Slack on it now. I
| bought it because of 4K streaming, but then shortly after all
| the home theater devices came out.
|
| Should we really be encouraging borrowing and debt so people
| can buy new TVs and phones they throw away every year?
| Wouldn't that capital be better spent on something like
| medical research? You can borrow all the money you want. Good
| luck getting a job with them.
| nicoburns wrote:
| This is a pretty smart strategy. I'd say the majority of the time
| when something is slow on my machine, it's because there's a
| resource intensive background process (that I don't really need
| to complete quickly) eating up all my system resources.
|
| It seems like the same strategy would also make sense on Intel
| processors, although it probably requires at least 4 cores to
| make sense?
| scoopertrooper wrote:
| > It seems like the same strategy would also make sense on
| Intel processors
|
| Intel agrees!
|
| https://en.wikipedia.org/wiki/Alder_Lake_(microprocessor)
| draw_down wrote:
| > Intel expects to launch Alder Lake in the second half of
| 2021
|
| Let's wait for this to actually ship first, shall we
| heftig wrote:
| You really want those "efficiency" cores in addition to your
| regular ones, otherwise dividing your processor costs you a lot
| of throughput.
|
| You also want them to be low-power even when saturated,
| otherwise you gain responsiveness from the "performance" cores
| but your "efficiency" cores aren't actually efficient.
|
| It seems at least Linux's x86_energy_perf_policy tool lets you
| set multiplier ranges and some performance-vs-power values per-
| core, which means such a setup doesn't seem impossible on
| current Intel hardware.
| wmf wrote:
| You could clock down the background cores and make them run
| more efficiently even if they're identical.
| lupire wrote:
| You can do it on one processor with a process/thread scheduler.
| This is what makes multitasking possible in general.
| nicoburns wrote:
| Well yes, but presumably this impacts responsiveness. The
| whole point of this strategy is to keep cores free for
| interactive tasks.
| mnw21cam wrote:
| This is extraordinarily basic stuff. We knew how to do this
| kind of multitasking, with priorities, back in the 80s (or
| earlier). Yet people still don't understand it.
|
| As an example, a while back I ran a multi-threaded process on
| a shared work server with 24 cores. It used zero I/O and
| almost no RAM, but had to spend a couple of days with 24
| threads to get the result. I ran it inside "chrt -i", which
| makes Linux only run it when there is absolutely nothing else
| it could do. I had someone email and complain about how I was
| hogging the server, because something like 90% of the CPU
| time was being spent on my process. That's because their
| processes spent 90% of their time waiting for disc/network.
| My process had zero impact on theirs, but it took some
| explaining.
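| For the curious, chrt -i is essentially a thin wrapper around
| sched_setscheduler() with the SCHED_IDLE policy -- roughly
| (Linux-specific):
|
|       #define _GNU_SOURCE
|       #include <sched.h>
|       #include <stdio.h>
|       #include <unistd.h>
|
|       int main(int argc, char **argv) {
|           if (argc < 2) {
|               fprintf(stderr, "usage: %s cmd [args...]\n", argv[0]);
|               return 1;
|           }
|           /* SCHED_IDLE: only run when the CPU would otherwise sit
|              idle, which is why the 24-thread job above had no real
|              impact on anyone else. */
|           struct sched_param sp = { .sched_priority = 0 }; /* must be 0 */
|           if (sched_setscheduler(0, SCHED_IDLE, &sp) == -1) {
|               perror("sched_setscheduler");
|               return 1;
|           }
|           execvp(argv[1], &argv[1]); /* run the real workload */
|           perror("execvp");
|           return 1;
|       }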
| StavrosK wrote:
| That's a useful command to know, thanks! However, shouldn't
| `nice` be handling that sort of stuff? Do you know why
| there are two commands for the same thing (as far as I can
| see)?
| mnw21cam wrote:
| I would love to know why nice doesn't do this, and why
| top/ps doesn't show something more sensible when you do.
| adriancr wrote:
| I'm still assuming it's faster due to the unified memory
| architecture and latest-generation lithography, and less due
| to behavior tailored for their OS.
|
| An AMD/Intel using same soldered RAM next to CPU and same process
| node would give Apple a run for its money.
|
| Still, the optimizations on OS side are interesting here.
| Toutouxc wrote:
| I thought the "unified memory" meme had been debunked multiple
| times here.
| SigmundA wrote:
| Memory is unified, it's just not on-die.
|
| The CPU, GPU, and AI engine share unified memory, which means
| zero-copy transfers between them.
| icedchai wrote:
| I laughed when they called it "unified memory." Amazing
| what some marketing can do. In previous years, that was
| called "shared graphics memory" and it was only for low end
| systems.
| floatboth wrote:
| Well, any iGPU systems. Interestingly, game consoles too,
| where GDDR is the only memory, so it's kinda "inverted"
| in a sense from the laptop setup.
| Aaargh20318 wrote:
| Shared graphics-memory is not the same as unified memory.
|
| In a shared graphics-memory system, a part of the system
| RAM is reserved for the GPU. The amount of RAM reported
| to the OS is the total RAM minus the chunk reserved for the
| GPU. The OS can only use its own part, and cannot
| access the GPU memory (and vice versa). If you want to
| make something available to the GPU, it still has to be
| copied to the reserved GPU part.
|
| In unified memory both the OS and the GPU can access the
| entire range of memory, no need for a (slow) copy.
| Consoles use the same strategy (using fast GDDR for both
| system and GPU), and it's one of the reasons consoles
| punch above their weight graphically.
|
| The main reason that high-end GPUs use discrete memory is
| because they use high-bandwidth memory that has to live
| very close to the GPU. The user-replaceable RAM modules
| in a typical PC are much too far away from the GPU to use
| the same kind of high bandwidth memory. If you drop the
| 'user replaceable' requirement and place everything close
| together, you can have the benefits of both high
| bandwidth and unified memory.
| floatboth wrote:
| > If you want to make something available to the GPU, it
| still has to be copied to the reserved GPU part.
|
| Or allocate on the GPU side and get direct access to it
| from the CPU, achieving zero-copy:
|
| https://software.intel.com/content/www/us/en/develop/arti
| cle...
| kllrnohj wrote:
| > If you drop the 'user replaceable' requirement and
| place everything close together, you can have the
| benefits of both high bandwidth and unified memory
|
| Rather, if you drop the "big GPU" requirement then you
| can place everything close together. So called APUs have
| been unified memory for years & years now (so more or
| less Intel's entire lineup, and AMD's entire laptop SKUs
| & some desktop ones).
|
| It still ends up associated with low-end because there's
| only so much die space you can spend on an iGPU, and the
| M1 is no exception there. It doesn't come close to a mid-
| range discrete GPU and it likely won't ever unless Apple
| goes chiplets so that multiple dies can share a memory
| controller.
|
| With "normal" (LP)DDR you still run into severe memory
| bottlenecks on the GPU side as things get faster, so that
| becomes another issue with unified memory. Do you
| sacrifice CPU performance to feed the GPU by using
| higher-latency GDDR? Or do you sacrifice GPU performance
| with lower-bandwidth DDR?
| SigmundA wrote:
| From the Anand article you linked further up:
|
| "The first Apple-built GPU for a Mac is significantly
| faster than any integrated GPU we've been able to get our
| hands on, and will no doubt set a new high bar for GPU
| performance in a laptop. Based on Apple's own die shots,
| it's clear that they spent a sizable portion of the M1's
| die on the GPU and associated hardware, and the payoff is
| a GPU that can rival even low-end discrete GPUs."
|
| This is their first low end offering, they seem to be
| taking full advantage of UMA more so than anyone to this
| point. It will be interesting to see if they continue
| this with a higher "pro" offering or stick with a
| discrete GPU to stay competitive.
|
| My guess is Apple will be the one to make UMA integrated
| graphics rival discrete GPU's, it will be interesting to
| see if that happens.
| Pulcinella wrote:
| Yeah, I believe the M1 GPU (2.6 TFLOPS) falls between a PS4
| (1.8 TFLOPS) and PS4 Pro (4.2 TFLOPS). Yes, the original
| PS4 came out in 2013, but I still find it impressive that
| a mobile integrated GPU has that much computational power
| with no fan and that power budget.
|
| I do wonder what they are going to do with the higher end
| MBP, iMacs, and Mac Pro (if they make one). Will they
| have an "M1X" with more GPU cores or will they offer a
| discrete option with AMD GPUs. I do think we could
| potentially see an answer at WWDC. I wouldn't be
| surprised if eGPU support was announced for ARM Macs at
| WWDC.
| bombcar wrote:
| Part of the issue appears to be the number of PCI lanes
| the M1 can support (which is why it's limited to two
| monitors).
|
| I'm not sure they'll improve that by making a M1X or just
| gluing multiple M1s together.
| kllrnohj wrote:
| None of those games were designed for UMA nor do they
| benefit as the API design for graphics forces copies to
| happen anyway.
|
| The M1's GPU is good (for integrated), but UMA isn't the
| reason why. I don't know why people seem so determined to
| reduce Apple's substantial engineering in this space to a
| mostly inconsequential minor architecture tweak that
| happened a decade ago.
| SigmundA wrote:
| From the article: "Meanwhile, unlike the CPU side of this
| transition to Apple Silicon, the higher-level nature of
| graphics programming means that Apple isn't nearly as
| reliant on devs to immediately prepare universal
| applications to take advantage of Apple's GPU. To be
| sure, native CPU code is still going to produce better
| results since a workload that's purely GPU-limited is
| almost unheard of, but the fact that existing Metal (and
| even OpenGL) code can be run on top of Apple's GPU today
| means that it immediately benefits all games and other
| GPU-bound workloads."
|
| https://developer.apple.com/documentation/metal/setting_r
| eso...
|
| https://developer.apple.com/documentation/metal/synchroni
| zin...
|
| "Note In a discrete memory model, synchronization speed
| is constrained by PCIe bandwidth. In a unified memory
| model, Metal may ignore synchronization calls completely
| because it only creates a single memory allocation for
| the resource. For more information about macOS memory
| models and managed resources, see Choosing a Resource
| Storage Mode in macOS."
|
| I am not trying to minimize the other engineering
| improvements, however I do believe there may be less
| credit being given to the UMA than deserved due to past
| lackluster UMA offerings. As I said it will be
| interesting to see how far Apple can scale UMA; I am not
| sure they can catch discrete graphics, but I am starting
| to think they are going to try.
| kllrnohj wrote:
| To leverage shared memory in Metal you have to target
| Metal. Otherwise take for example glTexImage2D:
| https://www.khronos.org/registry/OpenGL-
| Refpages/gl4/html/gl...
|
| Apple can't just hang onto that void* that's passed in as
| the developer is free to re-use for something else after
| the call. It _must_ copy, even on a UMA system. And even
| if it was adjusted such that glTexImage2D took ownership
| of the pointer instead, there'd still be an internal
| copy anyway to swizzle it as linear RGBA buffers are not
| friendly to typical GPU workloads. This is why for
| example per Apple's docs above when it gets to the
| texture section it's like "yeah just copy & use private."
| So even though in theory Metal's UMA exposure would be
| great for games that stream textures, it still isn't
| because you still do a copy anyway to convert it to the
| GPU's internal optimal layout.
|
| Similarly, the benefits of UMA only help if transferring
| data is actually a significant part of the workload,
| which is not true for the vast majority of games. For
| things like gfxbench it _may_ help speed up the load time,
| but during the benchmark loop all the big objects are
| only used on the GPU (like textures & models)
| SigmundA wrote:
| I believe most of the benchmarks were Metal-based in the
| Anandtech article; also, PBOs have been around for quite a
| while in OpenGL:
|
| https://developer.apple.com/library/archive/documentation
| /Gr...
|
| Any back and forth between CPU and GPU will be faster
| with unified memory especially with a coherent on die
| cache.
|
| This is the same model from iOS so just about anyone
| doing metal will already be optimizing for it same with
| any other mobile development.
|
| It doesn't seem like a minor architectural difference to
| me:
|
| "Comparing the two GPU architectures, TBDR has the
| following advantages:
|
| It drastically saves on memory bandwidth because of the
| unified memory architecture. Blending happens in-register
| facilitated by tile processing. Color, depth and stencil
| buffers don't need to be re-fetched."
|
| https://metalkit.org/2020/07/03/wwdc20-whats-new-in-
| metal.ht...
| kllrnohj wrote:
| > I believe most of the benchmarks were Metal-based in
| the Anandtech article
|
| But that doesn't tell you anything. Being Metal-based
| doesn't mean they were designed for, nor benefit from, UMA.
|
| Especially since, again, Apple's own recommendation on
| big data (read: textures) is to copy it.
|
| > Any back and forth between CPU and GPU will be faster
| with unified memory especially with a coherent on die
| cache.
|
| Yes, but games & gfxbench _don't do this_ which is what
| I keep trying to get across. There _are_ workloads out
| there that will benefit from this, but the games &
| benchmarks that were run & being discussed aren't them.
| It's like claiming the sunspider results are from wifi 6
| improvements. There _are_ web experiences that will
| benefit from faster wifi, but sunspider ain't one of
| them.
|
| Things like GPGPU compute can benefit tremendously here,
| for example.
|
| > also, PBOs have been around for quite a while in OpenGL:
|
| PBOs reduce the number of copies from 2 to 1 in some
| cases, not from 1 to 0. You still copy from the PBO to
| your texture target, but it can potentially avoid a CPU
| to CPU copy first. When you call glTexImage2D it doesn't
| necessarily do the transfer right then, it instead may
| copy to a different CPU buffer to later be copied to the
| GPU.
|
| > "Comparing the two GPU architectures, TBDR has the
| following advantages:
|
| > It drastically saves on memory bandwidth because of the
| unified memory architecture. Blending happens in-register
| facilitated by tile processing. Color, depth and stencil
| buffers don't need to be re-fetched."
|
| > https://metalkit.org/2020/07/03/wwdc20-whats-new-in-
| metal.ht...
|
| Uh, that blogger seems rather confused. TBDR has nothing
| to do with UMA, nor are Nvidia or AMD immediate-mode
| anymore.
|
| Heck, Mali was doing TBDR long before it was ever used on
| a UMA SoC.
| astrange wrote:
| The modern approach to textures is to precompile them, so
| you can hand the data straight over. It's not as common
| to have to convert a linear to swizzled texture, though
| it can happen.
|
| Also, the Apple advice for OpenGL textures was always
| focused on avoiding unnecessary copies. (for instance,
| there's another one that could happen CPU side if your
| data wasn't aligned enough to get DMA'd)
|
| One reason M1 textures use less memory is the prior
| systems had AMD/Intel graphics switching and so you needed
| to keep another copy of everything in case you switched
| GPUs.
| bombcar wrote:
| As SigmundA points out a huge advantage Apple has is
| control of the APIs (Metal, etc) and the ability to
| structure them years ago so that the API can simply skip
| entire things (even when ordered to do them) as it's
| known it's not needed. An analogy would be a copy-on-
| write filesystem (or RAM!) that doesn't actually do a
| copy when asked to, it returns immediately with a
| pointer, and only copies if asked to write to it.
| icedchai wrote:
| It's basically the same thing. It's the same address
| space. If you want to get technical about it, the Amiga
| had "unified memory" in 1985.
| SigmundA wrote:
| Yes the Amiga had a form of UMA as did many other
| systems, the term UMA seems more widely used than "shared
| memory" its definite not just a marketing term.
|
| I don't believe Apple claimed to invent unified memory
| only that they are taking maximum advantage of the
| architecture more so than anyone to this point.
|
| Federighi:
|
| "We not only got the great advantage of just the raw
| performance of our GPU, but just as important was the
| fact that with the unified memory architecture, we
| weren't moving data constantly back and forth and
| changing formats that slowed it down. And we got a huge
| increase in performance."
|
| This seems to be talking about the 16MB SLC on-die cache
| that the CPU, GPU, and other IP cores share:
|
| "Where old-school GPUs would basically operate on the
| entire frame at once, we operate on tiles that we can
| move into extremely fast on-chip memory, and then perform
| a huge sequence of operations with all the different
| execution units on that tile. It's incredibly bandwidth-
| efficient in a way that these discrete GPUs are not. And
| then you just combine that with the massive width of our
| pipeline to RAM and the other efficiencies of the chip,
| and it's a better architecture."
| SigmundA wrote:
| It's not a new term; copying data between CPU and GPU
| memory has always been expensive:
|
| https://docs.microsoft.com/en-
| us/windows/win32/direct3d11/un...
|
| https://patents.justia.com/patent/9373182
| kllrnohj wrote:
| > I'm still assuming it's faster due to unified memory
| architecture
|
| AMD, Intel, ARM, & Qualcomm have all been shipping unified
| memory for 5+ years. I'd assume all the A* SoCs have been
| unified memory for that matter too unless Apple made the
| weirdest of cost cuts.
|
| Moreover literally none of the benchmarks out there include
| anything at all that involves copying/moving data between the
| CPU, GPU, and AI units. They are almost always strictly-CPU
| benchmarks (which the M1 does great in), or strictly-GPU
| benchmarks (where the M1 is good for integrated but that's
| about it)
|
| > An AMD/Intel using same soldered RAM next to CPU and same
| process node would give Apple a run for its money.
|
| AMD's memory latency is already better than the M1's. Apple's
| soldered RAM isn't a performance choice:
|
| "In terms of memory latency, we're seeing a (rather expected)
| reduction compared to the A14, measuring 96ns at 128MB full
| random test depth, compared to 102ns on the A14." source:
| https://www.anandtech.com/show/16252/mac-mini-apple-m1-teste...
|
| "In the DRAM region, we're measuring 78.8ns on the 5950X versus
| 86.0ns on the 3950X." https://www.anandtech.com/show/16214/amd-
| zen-3-ryzen-deep-di...
| adriancr wrote:
| > AMD's memory latency is already better than the M1's.
| Apple's soldered RAM isn't a performance choice:
|
| Careful what you are comparing: in your examples the
| other CPU is also faster.
|
| The 3950X is a desktop CPU and is faster than the M1 ->
| https://gadgetversus.com/processor/apple-m1-vs-amd-
| ryzen-9-3...
|
| The 5950X is even faster ->
| https://gadgetversus.com/processor/apple-m1-vs-amd-
| ryzen-9-5...
|
| Lower latencies are likely due to higher clock.
|
| For an equivalent laptop-specific CPU, you will get a
| speedup from on-package RAM vs. user-replaceable RAM
| placed further away; even desktops would benefit, but it
| would not be a welcome change there.
| kllrnohj wrote:
| > Lower latencies are likely due to higher clock.
|
| That's not really how DRAM latency works. In basically
| all CPUs the memory controller runs at a different clock
| than the CPU cores do, typically at the same clock as the
| DRAM itself, but not always.
|
| If you meant the DRAM was running faster on the AMD
| system, then also no. The M1 is using 4266 MHz modules
| while the AMD system was running 3200 MHz RAM.
|
| > For equivalent laptop specific CPU, you will get a
| speedup from on-package RAM vs user replaceable RAM placed
| further away, even desktops would benefit but it would not
| be a welcome change there.
|
| Huge citation needed. There's currently no real-world
| product that matches that claim, nor a theoretical one:
| the physical trace length makes a minimal latency
| difference and is far from the major factor.
| skohan wrote:
| As far as I understand, it has a lot to do with the actual
| design of the processor[1], and not so much to do with the on-
| chip memory or the software integration.
|
| [1]: https://news.ycombinator.com/item?id=25257932
| whatever1 wrote:
| Launching apps takes ages on the M1 mini. Reminded me of
| the good Windows Phone times.
| jimueller wrote:
| Ah, Windows Phones were very responsive.
| mlvljr wrote:
| Still are.
| toast0 wrote:
| They were and they weren't. UI was pretty responsive, but
| starting programs was often slow. In 8.1 and above, starting
| the home screen was sometimes slow. So many dots crossing the
| screen.
| AJRF wrote:
| Are you sure they aren't being run through Rosetta? If I
| remember correctly, x86 apps run through a translation
| process on first launch, which obviously takes time. An
| acceptable trade-off given that the alternative is to not
| have the app at all on Apple Silicon.
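|
| (For anyone checking, a minimal Swift sketch of the
| documented sysctl for detecting Rosetta 2 translation --
| the helper name is mine:)
|
|     import Foundation
|
|     // Returns true if this process is translated by Rosetta 2,
|     // false if it runs natively, nil if it can't be determined.
|     func isRosettaTranslated() -> Bool? {
|         var translated: Int32 = 0
|         var size = MemoryLayout<Int32>.size
|         if sysctlbyname("sysctl.proc_translated",
|                         &translated, &size, nil, 0) == -1 {
|             // The sysctl doesn't exist on Intel Macs running
|             // older macOS versions.
|             return errno == ENOENT ? false : nil
|         }
|         return translated == 1
|     }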
| tylerfontaine wrote:
| Strange. With the exception of Electron apps (which are
| slow everywhere), I've found my M1 mini to be extremely
| responsive. Much more so than my 16" MBP.
| wayneftw wrote:
| VS Code launches instantly on my i7-4770 under Linux with
| XFCE. Same under Windows.
|
| So do Slack and Spotify...
|
| That's an 8-year-old CPU! If you're having issues, the
| problem lies elsewhere, not with Electron.
| postalrat wrote:
| I think half the people in this thread are posting what
| they want to see rather than what reflects reality.
| whatever1 wrote:
| Cold start of Electron, MS Office, and Affinity (all ARM
| versions) takes me north of 15 seconds.
|
| 8GB M1 Mini, so maybe that makes the situation worse.
| Toutouxc wrote:
| Affinity Photo starts in 15 seconds on my M1, which IMO is
| not okay. It's just as slow as on my Haswell Intel Mac.
| Everything else launches in a second or so. Spotify
| (Electron) draws a window in three seconds, but takes
| another four to display all content.
| drcongo wrote:
| For comparison, I have a maxed out current gen 16" Intel
| MBP and Affinity takes 25 seconds to launch.
| ziml77 wrote:
| Not sure what's causing the difference but it launches in
| 4 seconds for me. I have the base model 16" MBP.
| jermaustin1 wrote:
| I have the 8GB M1 as my daily driver; the only apps that
| take a long time for me (but NOWHERE NEAR 15 seconds -
| maybe close to 5, but I'm not counting) are Visual Studio
| for Mac (which is probably Rosetta), Unity (which IS
| Rosetta), and Blender (also Rosetta).
| celsoazevedo wrote:
| Can it be related to this?
|
| https://news.ycombinator.com/item?id=23273247
| whatever1 wrote:
| That was my first Mac, so maybe it is a macOS issue, not
| an M1 one.
|
| I just described my bad experience with the M1 on Big
| Sur. Not sure who is to blame.
| coldtea wrote:
| Launching apps is lightning fast on the M1 mini, and so
| much faster than on Intel that it's not even funny.
|
| Perhaps you have in mind some specific problematic app?
|
| Or do you include Intel apps that are translated to ARM
| on the fly on their first launch?
| akmarinov wrote:
| Weird, are they Rosetta apps? I haven't seen that on my M1.
| HatchedLake721 wrote:
| That's a bit disingenuous, comparing the blazingly fast
| M1 to Windows Phones. What are you launching?
|
| It's well known that the M1 is super fast at launching
| apps compared to Intel MacBooks -
| https://www.youtube.com/watch?v=vKRDlkyILNY
|
| There are tons of video comparisons on YouTube.
| AshamedCaptain wrote:
| While I have my doubts whether this is the main reason for the
| perceived performance, it reminds me of how BeOS was able to also
| show incredible "apparent" performance on even m68k-level
| hardware.
| CyberDildonics wrote:
| Even Windows prioritizes programs that have focus over
| those that don't, to increase interactivity.
| dboreham wrote:
| Typo in title. Doesn't parse as a result.
| opan wrote:
| "is" should be "it's" probably.
| giuliomagnifico wrote:
| Ok, corrected, sorry.
| giuliomagnifico wrote:
| ?! Sorry but I haven't understood.
| KMnO4 wrote:
| Thankfully my human brain is robust enough to understand by the
| context.
| astrange wrote:
| The only people who think human brains run on strict grammar
| rules are K-12 language teachers.
| api wrote:
| It's much faster even if you max out all cores. This may be part
| of it but the chip is objectively faster.
|
| Lots of Intel damage control these days. AMD is kicking their
| butt from one side and ARM from the other.
| valbaca wrote:
| I have a 2020 Intel Macbook from work and a M1 Macbook for
| personal use.
|
| Granted, I put my work laptop under more load (compiling,
| video chats at work), but it just feels like a normal
| laptop. I can feel it taking its time loading large
| programs (IDEs), and the fan clicks on when I'm doing big
| compiles.
|
| I use my personal laptop for occasional coding, playing
| games, and also video chats. But the M1 feels amazing.
| It's fast, snappy, with long battery life, and I've had
| it for several weeks and I _never_ hear the fan. Even
| playing Magic Arena AND a Parallels VM runs silent. Arena
| alone turns my wife's MacBook Air into a jet turbine. It
| makes video chats so much nicer because it's dead silent.
|
| I've run into occasional compatibility issues with the M1. sbt
| not working was the latest bummer.
|
| So while my two laptops are technically a year apart, they feel
| like 5 years apart. The M1 laptop lives up to the hype and I'm
| glad to just have a laptop that's fast, quiet, has great battery,
| and is reliable.
|
| Edit: More context: I had bought a maxed-out MacBook Air
| last year hoping it would be my "forever laptop", but it
| was just not giving me the speed I wanted, and the noise
| was just too much. I couldn't play any game or do any
| real coding in the living room without disrupting my wife
| enjoying her own games or shows. I'm so glad I traded up
| to the M1.
| gketuma wrote:
| "I never hear the fan. Even playing Magic Arena AND a Parallels
| VM runs silent."
|
| That is because there is no fan.
| losteric wrote:
| The M1 MacBook Pro does have a fan. The M1 Air doesn't,
| but throttles the CPU to avoid overheating.
| ralusek wrote:
| In the MacBook Air there isn't. In the MBP there is.
| alien_ wrote:
| I've had an M1 MacBook Pro as my work laptop for almost a
| month now, and I've only heard the fan once so far, while
| doing a multi-file search/replace in VSCode. For whatever
| reason that maxed out the performance cores for a few
| minutes.
|
| Other than that it's been entirely silent and blazingly fast
| and I had no major issues with it.
| coldtea wrote:
| Well, they don't just "feel" faster. They also complete
| lots of real-world tasks in shorter times, often
| significantly shorter than the last Intel MacBook Pro.
| oregontechninja wrote:
| I was an unbeliever, but we got some M1 Mac Minis and
| this is the result. They even beat the dedicated
| workstations many of our people have set up. I get it
| when people say it won't quite match a new Ryzen for
| compute power, but in all of our tests the M1 beat out
| all our workstations.
| spockz wrote:
| What workloads do you run? I've got a 16GiB M1 MBP and a
| maxed-out 2019 Intel, and when compiling Java/sbt
| projects the Intel is significantly quicker, albeit also
| much louder and more power hungry.
| spockz wrote:
| Wow, based on the comments here I decided to try out the
| native M1 builds of Azul.
|
| I see a 4.1x speedup in compiling (including tests),
| excluding dependency resolution:
|
|     Azul Native M1 java JDK11: ./sbt "clean;compile;test:compile"
|       254.86s user 11.84s system 519% cpu 51.375 total
|
|     Azul java JDK11: ./sbt "clean;compile;test:compile"
|       490.04s user 59.48s system 269% cpu 3:23.81 total
| alienalp wrote:
| That compiler is most probably not native to the M1 ARM
| processor.
| spockz wrote:
| Indeed, I was using the GraalVM JDK 11 build, which
| wasn't available in a native version.
| alecthomas wrote:
| As the sibling comment mentions, if you're running Intel
| JDK on M1 it will be slow. You can find M1 native JDK
| builds here:
| https://www.azul.com/downloads/?version=java-11-lts&os=macos...
| hugi wrote:
| Just for fun, Oracle has also started providing natively
| compiled EA builds of JDK 17: https://jdk.java.net/17/
| chrisseaton wrote:
| > when compiling Java/sbt projects
|
| Are you comparing a binary being run under dynamic binary
| translation with a native binary?
|
| Not really an honest comparison, if that's the case.
| formerly_proven wrote:
| No idea if that's the case, but I wouldn't have expected
| Java of all things to be run under binary translation.
| biehl wrote:
| Native support in the regular builds will arrive in
| September.
|
| https://openjdk.java.net/projects/jdk/17/
| chrisseaton wrote:
| > I wouldn't have expected Java of all things to be run
| under binary translation
|
| Why? The Java community has only just been working on
| ARM64 support at all over the last few years, and it's
| still a little limited; macOS ARM64 support has only been
| out a couple of weeks, and only from non-Oracle third-
| party builders, I believe.
| nicoburns wrote:
| > I wouldn't have expected Java of all things to be run
| under binary translation
|
| Depends which version you have installed. It's taken a
| while for the native versions to reach the mainstream
| channels, so unless you've specifically installed an M1
| build you probably have an x86 build being run under
| translation.
| throwaway4good wrote:
| You have to install a Java VM compiled for ARM, such as
| the one made by Azul. If you just get OpenJDK from the
| main website, it is compiled for Intel and will be much
| slower.
| Eric_WVGG wrote:
| The author's analysis sounds less like an explanation of how
| the M1 is faster, and more like an explanation of how it gets
| such amazing battery life.
|
| If someone could figure out a way to get all macOS apps
| -- including the system -- to use the performance cores,
| perhaps the battery life would be back down to Intel
| levels?
| EricE wrote:
| > If someone could figure out a way to get all macOS apps
| -- including the system -- to use the performance cores,
| perhaps the battery life would be back down to Intel
| levels?
|
| Other than to prove an utterly useless point - why would you
| want to even remotely do that?!?
|
| If my Time Machine backup takes four times as long but my
| battery still lasts longer, why would I care? The overall
| experience is a HUGE net improvement.
|
| That's the point being glossed over by the majority of
| commenters, and the point the original author is making:
| benchmarks are interesting, but nothing beats real-world
| experience, and in real-world experience there is a suite
| of factors contributing to the M1 and Apple's SoC
| approach spanking the crap out of their competitors.
|
| There is more to life than rAw PoWeR ;)
| Eric_WVGG wrote:
| curiosity? proving the thesis? knowledge that could improve
| competing laptops?
|
| yah never mind I don't care why these computers are
| impressive let's just play macos chess and browse facebook
| jfc if you're not interested why did you even join this
| conversation
| EricE wrote:
| I don't care? "That's the point being glossed over by the
| majority of commenters, and the point the original author
| is making: benchmarks are interesting, but nothing beats
| real-world experience, and in real-world experience there
| is a suite of factors contributing to the M1 and Apple's
| SoC approach spanking the crap out of their competitors."
|
| I don't know what point would be proved by maxing out the
| cores other than confirming what seems to be pretty
| obvious - a hybrid approach has multiple benefits - not
| just in power efficiency but the user experience as well.
|
| It's not just one aspect of the design choices - but all
| of them in concert.
| bombcar wrote:
| I'm sure someone has run the battery dead using benchmarks
| that exercise the performance cores - I suspect that Apple
| designed a fast chip and an efficient chip and then married
| them together - that an M1Max with only performance cores
| wouldn't be that interesting power-budget-wise (though likely
| still an improvement).
| Spooky23 wrote:
| It's funny how many articles that just won't say the obvious -
| it's a faster computer.
|
| I have two laptops side by side: one is an M1, the other
| an HP at a slightly higher price point (more memory,
| bigger SSD). The challenges Intel has in this form factor
| are obvious -- it's a 1.2GHz chip that turbos to almost
| 2x as heat allows.
|
| In any dimension that it can do what you need, the Apple wins.
| Cheaper, faster, cooler, longer battery endurance. The
| detriments are the things it cannot do -- run Windows/Linux or
| VMs or tasks that need more memory than is currently available.
| rsfinn wrote:
| It can't run Intel-based OSs or VMs, of course, but the
| latest version of Parallels Desktop runs ARM-based Linux
| guests on M1 Macs, as well as the Windows 10 ARM preview.
| (VMware has implied that the next release of Fusion will
| support Apple silicon as well.)
| _kbh_ wrote:
| QEMU already supports x86_64 on Apple silicon. It was
| just slow last time I checked; not sure of the
| performance now.
| txdv wrote:
| Linux is coming.
| [deleted]
| defaultname wrote:
| Interesting, enjoyable read.
|
| The conclusions seem a bit off, though.
|
| - Low-QoS tasks are limited to certain cores. This
| doesn't necessitate efficiency cores (though it makes
| sense if you want power efficiency in a mobile
| configuration); they could just as easily be performance
| cores. The core facet is that the OS has core affinity
| for low-priority tasks and quarantines them to a subset
| of cores, and that it has properly configured low-
| priority tasks as such.
|
| - It also has nothing to do with ARM (as the original
| title surmised). It's all in the operating system and,
| again, core affinity; Windows can do this on Intel.
| macOS/iOS has heavily pushed QoS priorities as meaningful
| and important, so now, with the inclusion of efficiency
| cores, the ecosystem is already set up to use them widely
| (a sketch of the relevant GCD calls follows below).
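|
| A minimal sketch of those GCD calls, assuming nothing
| beyond the public API (the work functions are
| placeholders):
|
|     import Dispatch
|
|     // Background/utility work: on Apple silicon, macOS steers
|     // this toward the efficiency cores; on other CPUs the same
|     // call only lowers scheduling priority.
|     DispatchQueue.global(qos: .background).async {
|         runMaintenanceWork()   // placeholder: indexing, backups
|     }
|
|     // Work the user is actively waiting on keeps running on the
|     // performance cores, with UI updates back on the main queue.
|     DispatchQueue.global(qos: .userInitiated).async {
|         let result = renderPreview()                // placeholder
|         DispatchQueue.main.async { show(result) }   // placeholder
|     }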
| lupire wrote:
| If it has nothing to do with ARM, why doesn't Mac do it at
| similar speed on Intel?
| bombcar wrote:
| There's no point in shuffling background processes to a
| subset of cores if it doesn't provide power-savings.
|
| It sounds like upcoming Intel chips will have the ability
| to run efficiency-mode on some cores, at which point OS-
| level code to do the shuffling makes sense.
| defaultname wrote:
| The focus of the linked article is responsiveness/user
| interaction, attributing that to efficiency cores. Apple
| _can_ achieve exactly the same responsiveness benefit on
| their Intel devices by, as mentioned, limiting low priority
| tasks to a subset of cores. Say 2 of 6 real cores on a MBP
| with an i7.
|
| Apple actually pushes low priority tasks to the efficiency
| cores for power efficiency reasons (per the name), not user
| responsiveness. So that MBP runs for longer, cooler on a
| battery because things that can take longer run on more
| efficient, slower cores. They do the same with the efficiency
| cores on modern iOS devices.
|
| The "feeling faster" is a side effect. I am talking about
| that side effect which isn't ARM specific.
|
| And FWIW, Intel is adding efficiency/"little" cores to their
| upcoming architectures. Again for efficiency reasons.
| twoodfin wrote:
| I wonder if shunting background processes to the slower
| efficiency cores has a salutary effect on other elements of
| system load. I'm thinking of tasks like Time Machine, which
| presumably are throwing out IOPS at a commensurately lower rate
| when effectively throttled by the efficiency cores.
| pjc50 wrote:
| Interesting solution to the problem of app and OS developers
| wasting CPU power at greater rates than Moore's law can deliver
| it: just shunt all the stuff the user isn't looking at (and
| therefore doesn't care about the timeliness of) onto a slower
| processor.
| [deleted]
| Lvl999Noob wrote:
| What about priority inversion? If a high QoS thread waits on a
| low QoS one? It seems like it would be a bigger issue here as the
| low QoS threads are only ever run on half the cores.
| ejdyksen wrote:
| Grand Central Dispatch (the OS-level framework that handles
| these QoS classes) elevates the priority of a queue if there is
| high priority work waiting on low priority work to finish.
|
| https://developer.apple.com/library/archive/documentation/Pe...
| astrange wrote:
| Importantly, it doesn't do this for
| dispatch_async/dispatch_semaphore/dispatch_block_wait.
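|
| Roughly, in Swift (the queue label and the work are
| illustrative):
|
|     import Dispatch
|
|     let workQueue = DispatchQueue(label: "example.maintenance",
|                                   qos: .utility)
|
|     // Waiting with .sync gives GCD ownership information, so it
|     // can temporarily boost the utility queue to the caller's
|     // QoS and avoid the inversion.
|     DispatchQueue.global(qos: .userInitiated).async {
|         workQueue.sync {
|             // runs boosted while the user-initiated caller waits
|         }
|     }
|
|     // A semaphore carries no ownership information, so no boost
|     // happens: the waiter can stall behind .utility work confined
|     // to the slower cores.
|     let semaphore = DispatchSemaphore(value: 0)
|     workQueue.async {
|         // long-running low-priority work ...
|         semaphore.signal()
|     }
|     semaphore.wait()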
| hyperluz wrote:
| Does having the actual best single-thread performance
| available on the market have anything to do with it?
| https://www.cpubenchmark.net/singleThread.html
___________________________________________________________________
(page generated 2021-05-17 23:01 UTC)