[HN Gopher] Apple's M1 Pro, M1 Max SoCs Investigated
       ___________________________________________________________________
        
       Apple's M1 Pro, M1 Max SoCs Investigated
        
       Author : defaultname
       Score  : 245 points
       Date   : 2021-10-25 13:01 UTC (10 hours ago)
        
 (HTM) web link (www.anandtech.com)
 (TXT) w3m dump (www.anandtech.com)
        
       | jb1991 wrote:
       | > GPU-accelerated machine learning, and more have all been
       | developed or first implemented by Apple
       | 
       | Did not realize Apple was first in that area in previous decades.
        
       | tromp wrote:
       | > I've had extreme trouble to find workloads that would stress
       | the GPU sufficiently to take advantage of the available
       | bandwidth.
       | 
       | Would love to see how well the GPU runs a memory-bound Proof-of-
       | Work like Cuckatoo Cycle [1].
       | 
       | [1] https://github.com/tromp/cuckoo
        
       | xqcgrek2 wrote:
       | Seems the CPU cluster saturates at about 240 GB/s and can't
       | utilize the full memory bandwidth. This bodes well for future
       | clusters with double the number of CPU cores at a node shrink (M2
       | Max?) or for a Mac Pro (Mac Quadra?).
        
         | makomk wrote:
         | Maybe. This seems like a cluster-wide limitation - the
         | individual CPU cores can utilize enough memory bandwidth that
         | together they should be able to saturate the bus, but there's
         | some kind of bottleneck on the entire CPU section of the SoCs
         | and who knows how easy or difficult it would be to alleviate
         | that.
        
         | sliken wrote:
          | Sure, but keep in mind that most competing laptops max out at
          | about 70GB/sec (a theoretical, never-to-exceed number), and most
          | desktops are slower than that with 2 channels of DDR4-3200 to
          | 4200 (41-67GB/sec).
         | 
         | So while it's "only" 240, that's an excellent number. Keep in
         | mind that you generally never see 100% of theoretical
         | bandwidth.
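          | 
          | As a rough sketch of where those theoretical ceilings come from
          | (the helper below and the bus widths / transfer rates are the
          | commonly cited figures, so treat them as illustrative):
          | 
          |   def peak_bw_gbs(bus_bits, mt_per_s):
          |       # theoretical ceiling: bus width (bits) * transfers per
          |       # second, divided by 8 bits per byte
          |       return bus_bits * mt_per_s / 8 / 1000
          | 
          |   print(peak_bw_gbs(128, 3200))  # dual-channel DDR4-3200: 51.2
          |   print(peak_bw_gbs(256, 6400))  # M1 Pro, 256-bit LPDDR5: 204.8
          |   print(peak_bw_gbs(512, 6400))  # M1 Max, 512-bit LPDDR5: 409.6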
        
       | whatever1 wrote:
       | Performance-wise they seem to be on par with the high-end intel
       | 11th gen mobile processors and nvidia 3060.
       | 
       | Power-wise these chips look like they landed from a different
       | planet. 50% less power draw for most workloads.
       | 
       | Will we get such efficiencies when intel hits 5nm?
        
         | uluyol wrote:
         | I don't think these efficiencies are just from the node
         | advantage. The fact is that Apple chips follow mobile designs
         | and are highly integrated SoCs where Apple can optimize every
         | aspect of the system in exchange for losing flexibility (no
         | mixing and matching of components).
         | 
         | In contrast to mobile processors, x86 processors live in a
         | world where flexibility is demanded. I need to be able to pick
         | how much RAM I want, which WIFI modem, which graphics, and so
         | on (where I is a combination of the consumer and laptop
         | manufacturer). Sure, laptop processors have gotten more
         | integrated lately, but it's not to the same degree. Competition
         | from Apple might pressure Intel and AMD to integrate much more
         | and sacrifice this flexibility in order to squeeze out better
         | power efficiencies.
        
           | cududa wrote:
            | As a teenager in 2005/2006, I had an obsession with
            | overclocking AMD Opteron 165's. DFI motherboards allowed you
           | to set the ratios for FSB/HTT, LDT/FSB, CPU/FSB, etc.
           | 
           | I'd hunt like crazy for specific OCZ DDR2 RAM modules from
           | specific batches that had the tolerances I was looking for.
            | At a few points, I had the highest perf overclocks (even among
           | liquid cooling - mine was passive with a polished heatsink)
           | on various leaderboards and my 4GB DDR2 system frequently
           | could beat out 8GB DDR3 systems (with full stability via
            | MemTest x86) on GeekBench-like tests.
           | 
           | WRT Apple Silicon I think about those days a lot - just
           | thinking about the perf AMD, OCZ, and DFI could have
           | repeatedly squeezed out if they all were one company setting
           | the same tolerances on all silicon and power delivery.
           | 
           | I have to imagine a large amount of the perf wins come from
           | having consistent FSB, HTT, and LDT channels that can have
           | the channel relay ratios optimally configured instead of
           | buffering up the "lowest common denominator" silicon
           | manufacturing tolerances.
        
         | ksec wrote:
         | >Will we get such efficiencies when intel hits 5nm?
         | 
          | Judging from everything we know, even in the most optimistic
          | scenario, the answer is no.
          | 
          | Note: Intel doesn't have a 5nm node; they go to 4nm and then
          | 3nm. But the answer is still the same.
          | 
          | Edit: For those wondering how this conclusion was arrived at:
          | take a look at the Alder Lake SPEC_INT and Geekbench scores and
          | the power usage per core ( forget MT benchmarks ), then scale
          | by the target IPC improvement and node improvement. You should
          | see that in terms of _efficiency_ Intel is still behind.
        
         | gigatexal wrote:
         | Yeah I am satisfied with this performance.
         | 
          | I'm not a gamer. And even if the CPU can't take advantage of
          | the full 400GB/s, 250 or so is very good indeed.
         | 
         | This is all low hanging fruit for future revisions of this
         | chip. The A15 based cores will further improve single core IPC
         | and in turn make MT workloads even better. Basically if this is
         | the floor then the sky won't be high enough to contain where we
         | go next.
        
         | GeekyBear wrote:
         | For a laptop chip, single threaded integer performance is on
         | par.
         | 
         | Multi-threaded integer and floating point performance is not.
         | 
         | >In the aggregate scores - there's two sides. On the SPECint
         | work suite, the M1 Max lies +37% ahead of the best competition,
         | it's a very clear win here and given the power levels and TDPs,
         | the performance per watt advantages is clear. The M1 Max is
         | also able to outperform desktop chips such as the 11900K, or
         | AMD's 5800X.
         | 
         | In the SPECfp suite, the M1 Max is in its own category of
         | silicon with no comparison in the market. It completely
         | demolishes any laptop contender, showcasing 2.2x performance of
         | the second-best laptop chip. The M1 Max even manages to
         | outperform the 16-core 5950X - a chip whose package power is at
         | 142W, with rest of system even quite above that. It's an
         | absolutely absurd comparison and a situation we haven't seen
         | the likes of.
         | 
         | https://www.anandtech.com/print/17024/apple-m1-max-performan...
         | 
         | We'll see what happens when they make a desktop chip and are no
         | longer so constrained on thermals and power draw.
         | 
         | The unreleased Mac Pro chip is said to have the resources of
         | either two or four M1 Pro chips glued together.
        
           | lrem wrote:
            | The laptop ones are already about 20mm on a side. That likely
            | isn't very good for yield. Going even larger would likely be
            | ruinous in the
           | cost department. Wouldn't putting multiple M1 Max be a better
           | idea?
        
           | simonh wrote:
            | The Mac Pro chip will likely use the next generation of core
            | architecture as well.
        
             | GeekyBear wrote:
             | Yes, this year's iPhone chip did get a newer version of the
             | performance core.
             | 
             | >with a score of 7.28 in the integer suite, Apple's A15
             | P-core is on equal footing with AMD's Zen3-based Ryzen
             | 5950X with a score of 7.29, and ahead of M1 with a score of
             | 6.66
             | 
             | https://www.anandtech.com/show/16983/the-apple-a15-soc-
             | perfo...
             | 
             | You'll have to look in the charts, but on single threaded
             | floating point the scores are 10.15 for the A15, 9.58 for
             | the 11900K, and 9.79 for the 5950X.
             | 
             | Having your phone chip match or beat Intel and AMD's
             | desktop variants on single core performance (with a phone's
             | memory bandwidth) is fairly impressive in itself.
        
               | merb wrote:
                | Is it? I thought that gluing memory onto the die will
                | always yield better memory bandwidth? Also the new
                | phone uses DDR5, which is not possible on the desktop
                | (yet)
        
           | samgranieri wrote:
           | Where did you hear the Mac Pro chip is supposed to have the
           | resources of 2 or 4 M1 chips glued together?
        
             | MR4D wrote:
             | Leak by Mark Gurman. John Siracusa put together an
             | illustrative diagram showing how it works (functionally,
             | not actual layout).
             | 
              | https://twitter.com/siracusa/status/1395706013286809600?s=21
        
             | GeekyBear wrote:
             | Bloomberg's Gurman, about half a year ago.
             | 
             | >Codenamed Jade 2C-Die and Jade 4C-Die, a redesigned Mac
             | Pro is planned to come in 20 or 40 computing core
             | variations, made up of 16 high-performance or 32 high-
             | performance cores and four or eight high-efficiency cores.
             | The chips would also include either 64 core or 128 core
             | options for graphics.
             | 
             | https://www.macrumors.com/2021/05/18/bloomberg-mac-
             | pro-32-hi...
        
             | djrogers wrote:
              | Gurman at Bloomberg - his article nailed the M1 Pro/Max (in
             | March!), and this is what it says about the next stage:
             | 
             | "Codenamed Jade 2C-Die and Jade 4C-Die, a redesigned Mac
             | Pro is planned to come in 20 or 40 computing core
             | variations, made up of 16 high-performance or 32 high-
             | performance cores and four or eight high-efficiency cores.
             | The chips would also include either 64 core or 128 core
             | options for graphics."
             | 
             | [1]
             | https://www.bloomberg.com/news/articles/2021-05-18/apple-
             | rea...
        
           | ChuckNorris89 wrote:
           | _> In the SPECfp suite, the M1 Max is in its own category of
           | silicon with no comparison in the market._
           | 
           | How much of that performance is due to the M1 Pro/Max having
           | way more memory bandwidth than the Intel/AMD chips, and also
           | being specifically designed from the ground up to make use of
           | all that bandwidth? AFAIK the RAM used by the M1 Pro/Max is
           | more similar in performance to the GDDR used in graphics
           | cards vs the slow-ish ageing DDR4 used in Intel/AMD systems
           | that are designed to prioritize compatibility with RAM of
           | varying quality, speeds and latencies instead of raw
           | performance at a specific high speed.
           | 
           | So I'm curious to know how an X64 chip would perform if we
           | even the playing field not just in node size but also if
           | Intel and AMD would adapt their X64 designs from the ground
           | up with a memory controller, cache architecture and
           | instruction pipeline tuned to feed the CPU with data from
           | such fast RAM.
           | 
           | I'm asking this since AFAIK, Ryzen is very sensitive to
           | memory bandwidth, the more you give it the better it performs
           | to the point where if you take two identical laptops with the
           | same Ryzen chip but one has 33% faster RAM, then that laptop
           | will perform nearly 33% better in most CPU/GPU intensive
           | benchmarks, all things being equal.
        
             | namibj wrote:
             | Should be fine to compare to an EPYC 72F3 or 73F3 (Zen3,
             | 8/16 cores, 8 channel DDR4-3200 RDIMM (204.8 GB/s
             | theoretical ceiling), $2500/$3500).
             | 
             | If memory latency is that important, one would AFAIK have
             | to compare to Zen3 Threadripper, because EPYC (and afaik
             | Threadripper Pro, the other 8-channel variant of Zen3) is
             | rather locked down in regards to memory "overclock"s.
             | 
             | (Note that this is more of a latency/efficiency tradeoff,
                | as any moderate OC is trivial to cool.)
        
               | dragontamer wrote:
               | > 8 channel DDR4-3200 RDIMM (204.8 GB/s theoretical
               | ceiling)
               | 
               | But the M1 Max is 8-channel LPDDR5 400GBps, or literally
               | twice the bandwidth of that EPYC
        
               | gigatexal wrote:
               | Is anyone else finding it hilarious people are comparing
               | these mid-tier consumer chips to server grade EPYCs?
               | :rofl:
        
               | ChuckNorris89 wrote:
               | Not if you actually understand the technical constraints
               | vs the business applications of each product in their
               | specific market segments.
               | 
               | A Ford F-150 can tow more weight than a Lamborghini
               | despite having less power and being significantly cheaper
               | but each is geared towards a different use case so direct
               | comparisons are just splitting hairs.
        
               | dragontamer wrote:
               | That's also a good point. EPYC has 2TB or 4TB of DDR4 RAM
               | support.
               | 
                | That being said: it's amusing to me to see the x86 market
               | move into the "server-like" arguments of the 90s. x86
               | back then was the "little guy" and all the big server
               | folks talked about how much bigger the DEC Alpha was and
               | how that changes assumptions.
               | 
               | It seems like "standard" servers / systems have grown to
               | outstanding sizes. The big question in my mind is if 64GB
               | RAM is large enough?
               | 
               | Moana scene
               | (https://www.disneyanimation.com/resources/moana-island-
               | scene...) is 131 GBs of RAM for example. It literally
               | wouldn't fit on the M1 Max. And that's a 2016-era movie,
               | more modern movies will use even more space. The amount
               | of RAM modern 3d artists need is ridiculous!!
               | 
               | Instinctively, I feel like 64GB is enough for power-
               | users, but not in fact, the digital artists who have
               | primarily been the "Pro" level customers of Apple.
        
               | gigatexal wrote:
               | I am waiting for the photography pros of YouTube to weigh
               | in on that last bit but the Disney bit about 131GB of ram
               | usage is intense. Surely a speedy disk can page but
               | likely not enough to make the 64GB of ram a bottleneck.
                | Maybe things like Optane or SSDs will get so much quicker
                | that we'll see a further fusion of IO down to disks, and
                | one day we'll really have a chip that thinks all its
                | storage is RAM and doesn't really distinguish between
                | them. Sure it's unlikely SSDs will get to 400GB/s
               | in speed but if they could get to 10GB/s or more
               | sustained that latency could be handled by smart software
               | probably.
               | 
               | I think for that cohort any future iMac Pro or Mac Pro
               | with the future revisions of these chips will surely
               | increase the ram to 128GB maybe even 256GB or more.
               | 
                | I am super curious how Apple will tackle those folks who
                | want to put 1TB of RAM or more into their systems -
                | whether they'll do an SoC with some RAM plus extra
                | slotted RAM as another layer?
        
               | dragontamer wrote:
               | > Sure it's unlikely SSDs will get to 400GB/s in speed
               | but if they could get to 10GB/s or more sustained that
               | latency could be handled by smart software probably.
               | 
               | Yeah, I'm not part of the field, but the 3d guys always
               | point to "Moana" to show off how much RAM they're using
               | on their workstations. Especially since Disney has given
               | away the Moana scene as a free-download, so anyone can
               | analyze it.
               | 
                | The 131GB is the animation data (ie: trees swaying in the
                | wind). Roughly 93GB is needed per frame. The 131GB can be
                | effectively paged, since it takes several seconds (or
                | much, much longer) to render a single frame. So really,
                | over 220GB of data is needed for the whole scene.
               | 
               | In practice, a computer would generate the wind and
               | calculate the effects on the geometry. So the 131 GB
               | animation data could very well be procedural and "not
               | even stored".
               | 
               | The 93GB "single frame" data however, is where all the
               | rays are bouncing (!!!) and likely needs to be all in
               | RAM.
               | 
               | That's the thing: that water-wave over there will reflect
               | your rays basically anywhere in the scene. Your rays are,
               | in practice, bouncing around randomly in that 93GB of
               | scene data. Someone managed to make an out-of-core GPU
               | raytracer using 8GB of GPU-VRAM (they were using a very
               | cheap GPU) to cache where the rays are going, but it
               | still required keeping all 93GB of scene data in the CPU-
               | RAM.
        
               | giantrobot wrote:
               | > Moana scene
               | (https://www.disneyanimation.com/resources/moana-island-
               | scene...) is 131 GBs of RAM for example. It literally
               | wouldn't fit on the M1 Max. And that's a 2016-era movie,
               | more modern movies will use even more space. The amount
               | of RAM modern 3d artists need is ridiculous!!
               | 
               | I doubt there was any laptop available in 2016 that could
               | be loaded with enough RAM to handle those Moana scenes. I
               | doubt such beasts exist in 2021.
               | 
               | It seems the M1s are showing that Apple can just increase
               | the number of cores and memory interfaces to beef up the
               | performance. While there's obviously practical limits to
               | such horizontal scaling, a theoretical M1 Pro Max Plus
               | for a Mac Pro could have another doubling of memory
                | interfaces (over the Max) or add in an interconnect to do
               | multi-socket configurations.
               | 
               | That's all just horizontal scaling before new cores or a
               | smaller node process becomes available. A 3NM process
               | could get roughly double the current M1 Max circuitry
               | into the same footprint as today's Max.
        
               | dragontamer wrote:
               | > That's all just horizontal scaling before new cores or
               | a smaller node process becomes available. A 3NM process
               | could get roughly double the current M1 Max circuitry
               | into the same footprint as today's Max.
               | 
               | I/O / off-chip SERDES doesn't scale very easily.
               | 
               | If you need more pins, you need to go to advanced
               | packaging like HBM or whatnot. 512-bit means 512-pins on
               | the CPU, that's a lot of pins. Doubling to 16-channel
               | (1024-bit bus) means more pins.
               | 
                | You'll run out of pins on your chip without the micro-
                | bumps that HBM has. That's why HBM can be 1024-bit or
                | 4096-bit, because it uses advanced packaging / microbumps
                | to communicate across a substrate.
        
               | dragontamer wrote:
               | EPYC almost certainly has more compute power though.
               | 
                | Honestly, the memory bandwidth is imbalanced. It's mostly
                | there to support the GPU, but the CPU also gains benefits
                | from it. It's hard enough to push an EPYC to use all
                | 200GBps in practice.
                | 
                | EDIT: For workstation tasks however, 64GB is huge for a
                | GPU, while 400GBps is huge for a CPU. Seems like win/win
                | for the CPU and GPU. It's a very intriguing combination.
                | GPU devs usually have to work with much less VRAM, while
                | CPU devs usually have to work with much less bandwidth.
                | 
                | 64GB is small for CPU workstation tasks however. It's
                | certainly a strange tradeoff.
        
               | hajile wrote:
               | The article in question shows that a mere 8 big cores and
                | 2 little ones can use 243GB/s.
               | 
               | I'm guessing Apple will go with HBM3/4 before too long
               | due to the lower power consumption and great performance.
        
               | gigatexal wrote:
               | I thought HBM was a power sucking tech?
        
               | dragontamer wrote:
               | HBM is very low-clock speed and super efficient.
               | 
               | HBM's downside is that it requires many, many, many pins.
               | Each channel is 1024-pins of communications (and more
               | pins for power). In practice, the only thing that can
               | make HBM work are substrates. (Typical chips have 4x to
               | 6x HBM stacks, for well over 4096 pins to communicate,
               | plus more pins for power / other purposes)
               | 
               | But HBM is among the lowest power technologies available.
               | Turns out that clocking every pin at like 500MHz (while
               | LPDDR5 is probably a 3200 MHz clock) saves a lot on
               | power. Because DRAM has such high latency, the channel
                | speed is more about parallelism than anything else.
               | (DDR4 parallelizes RAM into 4-bank groups, each with
               | 4-banks. All 16 can be accessed in parallel across the
               | channel).
               | 
               | HBM just does this parallel access thing at a lower clock
               | rate, to save on power. But spends way more pins to do
               | so.
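                | 
                | To put rough numbers on that tradeoff (the pin counts and
                | per-pin rates below are ballpark illustrations, not exact
                | specs for any particular part):
                | 
                |   def bw_gbs(data_pins, gbit_per_pin):
                |       # total bandwidth = data pins * per-pin rate / 8
                |       return data_pins * gbit_per_pin / 8
                | 
                |   print(bw_gbs(1024, 2.4))  # one HBM2-ish stack: ~307 GB/s
                |   print(bw_gbs(512, 6.4))   # M1 Max LPDDR5 bus: ~410 GB/s
                | 
                | Same order of bandwidth, but HBM gets there with a far
                | lower per-pin rate (and so less power per bit) at the
                | cost of far more pins.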
        
             | AnthonyMouse wrote:
             | > So I'm curious to know how an X64 chip would perform if
             | we even the playing field not just in node size but also if
             | Intel and AMD would adapt their X64 designs from the ground
             | up with a memory controller, cache architecture and
             | instruction pipeline tuned for that kind of fast RAM.
             | 
             | We can get a pretty good idea about this by looking at
             | Threadripper, which has more memory channels:
             | 
             | https://www.anandtech.com/bench/product/2842?vs=2666
             | 
             | (This is Zen 2 Ryzen vs. Zen 2 Threadripper because Zen 3
             | Threadripper isn't out yet.)
             | 
             | In nearly everything it's about the same, because most
             | workloads aren't memory bandwidth limited. But then you get
             | into the multi-threaded SPEC tests where the M1 Max is
             | doing really well, and Threadripper does really well. And
             | this is still with less memory bandwidth than the M1 Max
             | because it's using DDR4 instead of LPDDR5.
             | 
             | The lesson here is that memory bandwidth limited workloads
             | are limited by memory bandwidth.
        
         | dangus wrote:
         | > Will we get such efficiencies when intel hits 5nm?
         | 
         | I think the answer is "maybe," "probably," or "sort of."
         | 
         | But I also wonder if x86 can ever truly outdo ARM on power
         | efficiency.
         | 
         | If we want potential evidence, we could look at what AMD is
         | able to do on TSMC's manufacturing: better than Intel, but
         | still short of Apple.
         | 
         | Then again, AMD is tiny compared to Apple and Intel.
         | 
         | Granted, I think I'm vastly oversimplifying processor
         | architecture. I know it's way more complicated than "x86 vs.
         | ARM."
        
           | jjoonathan wrote:
           | Isn't AMD still a node behind Apple? They both use TSMC, but
           | my impression was that Apple was the largest customer
           | bankrolling the leading node and therefore got first crack at
           | it.
        
             | eightysixfour wrote:
             | Yes, AMD is still a full TSMC node behind.
        
             | laydn wrote:
             | I am surprised that Apple sources more wafers than AMD from
             | TSMC. Are they really the largest customer in terms of
             | wafers, or are they getting better deals thanks to their
             | enormous cash reserves and financing abilities?
        
               | neogodless wrote:
               | AMD sources a double-digit market share of PC/server
               | CPUs, plus their GPUs, and APU chips for consoles.
               | 
                | On the flip side, Apple sources chips for iPhones, iPads
                | and other devices, plus this new line of Apple Silicon.
        
               | KptMarchewa wrote:
               | Even server market is small compared to mobile.
        
               | IOT_Apprentice wrote:
               | How many units is AMD shipping/Quarter vs Apple?
               | 
               | Heck, Sony & Microsoft can't even get PS5s and Xboxes in
               | customer hands 1 year after release. I would expect it is
               | a combination of Apple's volume of phones shipped/quarter
               | and cash reserves.
        
               | jjoonathan wrote:
               | Yep, Apple is the biggest customer. Apple is 25% of
               | TSMC's revenue, AMD is only 10% -- and that's after the
               | recent growth spurt. Just one year ago, AMD was behind
                | Apple, Huawei, Qualcomm, Broadcom, and Nvidia.
        
               | Someone wrote:
               | That's a different metric than "number of wafers". Apple
               | likely pays quite a premium for using the newer tech.
        
               | jjoonathan wrote:
               | Of course, but TSMC pays its shareholders in money, not
               | wafers, so money is the correct metric for influence.
        
             | hajile wrote:
             | Even when you compare A12 on 7nm, the numbers don't look
             | very good for AMD.
        
           | api wrote:
           | My understanding is that X86 can never outdo ARM on power
           | because of the difficulty of parallel instruction decoding.
        
             | dahfizz wrote:
             | CISC was always a mistake, it just took a company the size
             | of Apple to overcome the inertia of the established x86.
        
               | api wrote:
               | Sort of... RISC and CISC are really misnomers. The
               | problem with X86 is not the number of instructions (ARM
               | has a lot too!) but the variable length and difficult to
               | decode instruction format. It's fine to have tons of
               | instructions if they are trivial to decode and decoding
               | can be easily parallelized.
        
               | dahfizz wrote:
               | CISC == Complex Instruction Set, not Large Instruction
               | Set. As you say, the issue with x86 is how complicated it
               | is, not necessarily how large it is.
        
               | tenebrisalietum wrote:
               | CISC was not a mistake when RAM was hundreds of dollars
                | per _kilobyte_ in the '70s and early '80s. Made sense to
               | get as much out of a byte of memory instruction-wise as
               | possible.
        
             | dragontamer wrote:
             | That doesn't seem too hard to me frankly.
             | 
             | Intel has shown that you can just store the sizes of
             | instructions in a new cache (in their new architecture) for
             | example.
             | 
             | But even then: it shouldn't be much more than O(n) work /
             | O(log(n)) depth to determine the length of an instruction.
             | Once lengths are known, you can perform parallel
             | instruction decoding rather easily.
             | 
             | Ex: given "Instruction at X, X+1, X+5, and X+10", decode
             | the instructions. Well, simple. Just shove a decoder at X,
             | X+1, X+5, and X+10. 4 instructions, decoded in parallel.
             | 
             | Even with "Dynamic" length (ex: X, X+4, X+6, and X+7
                | afterwards), it's clear how to process these 4 instructions
             | in parallel. Really not a problem IMO.
             | 
             | --------
             | 
             | So solving the length thing is a bit harder to see, but
             | clearly could be implemented as a parallel-regex (which is
             | O(n) work and O(log(n)) depth).
             | 
             | I seriously doubt that decoding is really a problem. I'd
             | expect that Apple just has made many small efficiency gains
             | across the chip, especially the uncore.
             | 
             | I'm personally looking at the L1 / L2 cache hierarchies
             | more so than anything on this Apple chip.
        
               | Veliladon wrote:
               | The problem is how do you know if your just decoded
               | instruction isn't just the operands for another legal
               | instruction?
        
               | dragontamer wrote:
                | When you get to instruction X, "size cache" says "size
                | 2", and this allows you to process instruction X+2 in
                | parallel.
                | 
                | You look at instruction X+2, and "size cache" says "size
                | 4", which allows you to look at instruction X+6 in
                | parallel. X+6 says "size 4", pointing you at X+10.
                | Finally, you look at instruction X+10, and it says "size
                | 8", which ends with +18 as where the instruction pointer
                | ends.
                | 
                | This was sequentially described at first, but the
                | parallel version is called Prefix Sum:
                | https://en.wikipedia.org/wiki/Prefix_sum . This allows
                | you to take a set of sizes (like say 2, 4, 4, 8) and in
                | parallel, figure out [2, 6, 10, 18], with 18 being the
                | new location of the instruction pointer, and [0, 2, 6,
                | 10] being the 4 instructions you process this clock tick.
               | 
               | A parallel adder across say, 32-bytes would be able to
               | perform this prefix sum very quickly, probably within a
               | clock tick. These sorts of parallel structures (aka:
               | butterfly circuits) are extremely common in practice,
               | your carry-lookahead adders need them, as well as PDEP /
               | PEXT. Intel's single-cycle PDEP/PEXT is way more
               | complicated than what I'm proposing here, I kid you not.
                | (Seriously, the dude who decided to make single-clock
                | cycle PDEP/PEXT or single-clock cycle AESRound would have
                | spent more time on that than on the size-cache that Intel
                | is now using for instruction decoding)
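                | 
                | As a toy software illustration of the idea (a made-up
                | helper, nothing to do with how Intel actually wires it
                | up):
                | 
                |   def decode_starts(base, sizes):
                |       # prefix-sum the cached instruction sizes to get
                |       # each instruction's start offset plus the next IP
                |       starts, offset = [], 0
                |       for size in sizes:
                |           starts.append(base + offset)
                |           offset += size
                |       return starts, base + offset
                | 
                |   starts, next_ip = decode_starts(0, [2, 4, 4, 8])
                |   print(starts, next_ip)  # [0, 2, 6, 10] 18
                | 
                | In hardware the same sums can be formed in O(log n)
                | depth, so all four decoders get pointed at their bytes at
                | once.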
        
               | Veliladon wrote:
               | The problem is that you don't know the instruction length
               | until you've decoded it so where is the size cache
               | getting the size?
        
               | dragontamer wrote:
               | > The problem is that you don't know the instruction
               | length until you've decoded it so where is the size cache
               | getting the size?
               | 
               | You are aware that x86 _CURRENTLY_ (ie: without size-
               | cache) decodes 4-instructions per clock tick, right?
               | That's in parallel, as currently implemented.
               | 
               | Intel just seems to think the size-cache is a potential
               | solution for going faster. I've given it some thought and
               | it seems like it could very well be worth the 4-bits (or
               | so) per byte it'd cost.
               | 
               | ----------
               | 
               | A parallel size-calculator would also be O(n) work and
               | O(log(n)) depth, by using a parallel regex/finite
               | automata on calculating the sizes for arbitrary lengths
               | upwards.
        
               | Veliladon wrote:
               | That's only second stage decoding. In the fetch/pre-
               | decode stage there is a specific piece of hardware that
               | partially decodes 16 byte chunks of instruction streams
               | before inserting instructions into the instruction queue
               | for the decode from macro-op to uop. It can only handle 6
               | instructions per clock and 16 bytes per clock. If you
               | have 7 instructions in a 16 byte block it takes two
               | cycles to process that block. If you have only 2
               | instructions in that 16 byte block you only get those two
               | instructions. It also only looks for instruction length
               | and branching to feed the branch predictor, spitting the
               | same instructions back out albeit fused and tagged with
               | index numbers for insertion into the instruction queue
               | ready for the second stage decoders.
               | 
               | This is the length that Intel has to go to in order to
               | keep the EUs fed. Apple/ARM? Every 4 bytes there's an
               | instruction.
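                | 
                | A very rough software model of those two limits (a
                | made-up helper that ignores alignment, fusion and
                | everything else), just to show the throughput cliff:
                | 
                |   def predecode_cycles(lengths, window=16, max_inst=6):
                |       # consume at most 'window' bytes and 'max_inst'
                |       # instructions per cycle, roughly as described above
                |       assert all(n <= window for n in lengths)
                |       cycles, i = 0, 0
                |       while i < len(lengths):
                |           taken = used = 0
                |           while (i < len(lengths) and taken < max_inst
                |                  and used + lengths[i] <= window):
                |               used += lengths[i]
                |               taken += 1
                |               i += 1
                |           cycles += 1
                |       return cycles
                | 
                |   print(predecode_cycles([2] * 12))  # dense code: 2 cycles
                |   print(predecode_cycles([8] * 12))  # long insns: 6 cycles
                | 
                | A fixed 4-byte ISA, by contrast, just fetches N bytes and
                | knows it has N/4 instructions.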
        
               | dragontamer wrote:
               | I'm talking 1st stage decoding actually. I'm ignoring the
               | uOp cache.
               | 
               | The 2nd stage uop can go 6-per or 7-per clock tick. But
               | 1st stage (which executes in practice when the uop cache
               | is thrashing) would still go 4-instructions per clock
               | tick just fine.
               | 
               | > This is the length that Intel has to go to in order to
               | keep the EUs fed.
               | 
               | Yeah. And the task described is O(n) total work and
               | O(log(n)) depth. So... not a big difference? I'd have my
               | doubts that the instruction-length portion of the decoder
               | was taking up a significant amount of power.
        
               | rbanffy wrote:
               | > That doesn't seem too hard to me frankly.
               | 
               | Consider Intel has a more or less infinite amount of
               | money and they don't seem to be able to do that. And they
               | have tried (I even had an Atom-based Android phone for a
               | while).
               | 
               | If you want an easy way to build a reorder buffer, you'll
                | need to push every instruction into a structure that fits,
               | IIRC, 15 bytes, which is the longest x86 instruction
               | possible (for now - mwahahaha). This alone will make it
               | twice as large as a similar arm64 one. Now factor in that
               | the dependencies between instructions are defined in bits
               | that can pretty much be all over the place in those 15
               | bytes and you end up with a nightmare most engineers
               | would consider suicide before having to work on it.
        
               | dragontamer wrote:
               | Or maybe, the problem isn't as hard as you think it is.
               | 
               | Look, I started programming in GPUs a year or two ago.
               | I've begun to "think in parallel", and now I'm beginning
               | to see all sorts of efficient patterns all over the
               | place.
               | 
               | The actual CPU-architects have known about kogge-stone
               | carry-lookahead longer than I have. I'm still a newbie to
               | this mindset of parallel computations... but I enjoy
               | reading the papers on PDEP / PEXT / other parallel
               | structures these CPU designers are doing (and these
               | structures have gross implications to how GPU code should
               | be structured).
               | 
               | But I've had enough practice with Kogge-stone / Carry-
                | lookahead / Prefix-sum / scan pattern (yeah, it's __all__
               | the same parallelism), and this pattern has been well
               | published since the 1970s. I have to assume that
               | engineers know about this stuff.
               | 
               | Instruction length decoding is very clearly a kogge-stone
               | pattern / prefix sum / scan problem to me. Now, I'm not a
               | chip architect and maybe there's some weird fanout / chip
               | level thing going on that my ignorance is keeping me out
               | of... but... based on my understanding of parallel
               | systems + very, very common patterns well known to that
               | community, I'd expect that chip-designers would just
               | Kogge-stone their way out of this decoding problem.
               | 
               | -------
               | 
               | Like, I'm coming in from the reverse here. I suddenly
               | realized that chip-designers have incredibly active minds
               | about the layout and structure of parallel computing
               | mechanisms, and have now taken an interest in studying
               | some CPU-level parallelism techniques to apply to my GPU
               | code.
               | 
               | The CPU-designers are way ahead of us in "parallel
               | thinking". I'm a visitor to their subject, they do this
               | stuff for breakfast every day. They have to see the
               | Kogge-stone solution to the decoding problem. If not,
               | they've thought of something better.
        
               | api wrote:
               | The words "just" and "cache" should never appear in the
               | same sentence.
        
               | dragontamer wrote:
               | Why not?
               | 
               | L1 $I cache is already read-only / Harvard architecture.
               | There's only two states here: size == unknown, and size
               | == known. This is a simple size = 0 (default), and size=X
               | (where X is the known size of the instruction) situation.
               | 
               | x86 architecture states that if you write to those
               | instructions, you need to flush L1 cache (ex: JIT Java)
               | before the state is updated. L1 instructions are non-
               | cohesive, so it isn't very hard. Upon the flush, set
               | sizes back to 0 and you're done.
        
               | codedokode wrote:
               | This complicated decoding makes a pipeline longer. This
               | means that in case of branch misprediction there would be
               | a large penalty.
        
               | dragontamer wrote:
               | > This complicated decoding makes a pipeline longer. This
               | means that in case of branch misprediction there would be
               | a large penalty.
               | 
               | PDEP / PEXT are single-clock tick instructions and are
               | far more complex than what I'm proposing here. As is
               | AESRound.
               | 
               | I think you're underestimating the number of gates you
               | can put in parallel and execute in a single stage of the
               | pipeline. 64-bit PDEP / PEXT are more complicated than
               | say... a 64-byte parallel adder in terms of depth. (PDEP
               | / PEXT need both a butterfly circuit forward + inverse
               | butterfly back + a decoder in parallel. 64-byte prefix
               | sum is just one butterfly forward).
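                | 
                | For a feel of the depth argument, a Hillis-Steele style
                | scan in software: all 64 prefix sums fall out after
                | log2(64) = 6 passes (a sketch of the parallel structure,
                | not of any real decoder):
                | 
                |   import random
                | 
                |   def scan_inclusive(sizes):
                |       # each pass adds the value from 'step' positions
                |       # back, doubling 'step' -> ceil(log2(n)) passes
                |       vals = list(sizes)
                |       step = 1
                |       while step < len(vals):
                |           vals = [v + (vals[i - step] if i >= step else 0)
                |                   for i, v in enumerate(vals)]
                |           step *= 2
                |       return vals
                | 
                |   sizes = [random.randint(1, 15) for _ in range(64)]
                |   ends = scan_inclusive(sizes)  # end offset of each insn
                |   starts = [0] + ends[:-1]      # start offset of each
                |   assert ends[-1] == sum(sizes)
                | 
                | Each pass is just 64 adders working side by side, which
                | is why the circuit depth stays shallow even for a wide
                | fetch window.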
        
             | AnthonyMouse wrote:
             | How much of the power budget is actually going to
             | instruction decoding?
        
           | intricatedetail wrote:
           | Now all ARM and Apple need to do is to persuade governments
           | to ban x86 as not being energy efficient. Just like they want
           | to ban diesel cars etc. Could be a tough battle for Intel.
        
       | [deleted]
        
       | lmilcin wrote:
       | > Power Behaviour: No Real TDP, but Wide Range
       | 
       | Actually, TDP stands for "Thermal Design Power" and is not a
        | range. It means "I, the designer, designed it so that this is
        | the maximum amount of waste heat it can safely produce
        | continuously in normal use". It is mainly limited by the physical
        | package and the maximum temperature at which the internal
        | components can run.
        | 
        | That you can't observe that max power is because those various
        | applications stress the CPU in different ways and aren't always
        | able to exercise all internal structures to their maximum
        | potential at the same time.
       | 
       | > One should probably assume a 90% efficiency figure in the AC-
       | to-DC conversion chain from 230V wall to 28V USB-C MagSafe to
       | whatever the internal PMIC usage voltage of the device is.
       | 
       | (This was regarding idle power usage)
       | 
       | Highly unlikely. I design AC switching power supplies from first
       | principles (and stacks of books). Efficiencies above 90% are
        | normal for newer designs, but PSUs are designed to achieve these
        | efficiencies above a significant percentage of their design
        | power. High efficiency at design power is important because it
        | limits worst-case waste heat, which in turn makes it possible to
        | create a smaller PSU. But as a PSU is a pile of tradeoffs, one of
        | the tradeoffs that is taken is lower efficiency at lower power,
        | where it doesn't matter as much.
        | 
        | Typically, the lower the load on the PSU as a portion of its
        | design power, the lower the efficiency. If the PSU is designed
        | for 90% efficiency at 140 watts, I would expect that at 7 watts
        | it is actually much less efficient, probably somewhere between
        | 70 and 80 percent.
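        | 
        | As a back-of-the-envelope sketch (the 90% and 75% figures are
        | illustrative numbers in line with the above, not measurements of
        | Apple's adapter):
        | 
        |   def wall_draw_w(dc_load_w, efficiency):
        |       # power pulled from the wall for a given DC-side load
        |       return dc_load_w / efficiency
        | 
        |   full = wall_draw_w(140, 0.90)  # ~155.6 W in, ~15.6 W wasted
        |   idle = wall_draw_w(7, 0.75)    # ~9.3 W in, ~2.3 W wasted
        |   print(full - 140, idle - 7)
        | 
        | Which is why assuming a flat 90% for the idle numbers likely
        | overstates the actual DC-side draw.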
        
       | lnxg33k1 wrote:
       | I would really love to experiment with one of those CPUs, too bad
        | Apple just really sucks as a company, ethics-wise
        
         | gjsman-1000 wrote:
         | Except for, say, Purism and Framework and a few others, _every_
         | company sucks with ethics.
         | 
         | Except Purism even had the idea to make their own products
         | which were based on free, open-source code, charge for them,
         | and then give no attribution before apologizing after blowback.
         | 
         | So even for those companies their Ethics are suspect.
        
           | tcbawo wrote:
           | Honest question: for someone whose priority is maintaining an
           | ecosystem of general computing devices with freedom to run
           | software of my choice (rather than privacy paranoia), what
           | companies should I get behind and/or avoid? Between custom
           | processors, comprehensive SoC, and secure boot, I'm a little
            | afraid of getting caged in over the long term with practically
           | any offerings out there right now.
        
             | rsanheim wrote:
             | https://frame.work is getting good reviews and seems very
              | reputable. Of course it's just a laptop, so you have to
             | figure out the rest of your computing world then.
        
           | mrtranscendence wrote:
           | > Except for, say, Purism and Framework and a few others,
           | every company sucks with ethics.
           | 
           | Basically. If I wanted to stay ethically pure in all my
           | purchases I'd be in trouble, unfortunately. It's not like
           | Apple is ethically "worse" than Google or Samsung or other
           | major phone manufacturers.
        
             | noizejoy wrote:
             | Arguably approaching 100% ethical purity would also
              | involve not being born in the first place.
             | 
             | Related note: I think that leading large numbers of people
             | as well as having very large numbers of customers also
             | reduces your chances of doing well for all of them.
        
           | masterof0 wrote:
           | > Except Purism even had the idea to make their own products
           | which were based on free, open-source code, charge for them
           | 
           | And then charging 2000 dollars for a barely functional brick.
           | I think the only ethical hardware companies I know would be
           | System76, Framework, and Fairphone.
        
         | throwaway4good wrote:
         | Why would you say that? Of US big tech they seem to be the
         | least evil.
        
           | AlexCoventry wrote:
           | I guess their on-phone CSAM scanner has fallen off the news
           | cycle.
        
             | marcellus23 wrote:
             | Or maybe no one was ever that upset, beyond the small but
             | vocal group of HN commenters who will take any opportunity
             | to bash Apple anyway.
        
             | aserdf wrote:
              | since these new machines come pre-installed with Monterey
             | I guess CSAM scanning is present from first boot?
        
             | artificialLimbs wrote:
             | As I understand it, they delayed rolling out CSAM scanning
             | on-device due to the backlash.
        
       | rowanG077 wrote:
       | A shame that thermal/power limitations aren't investigated. That
        | is the deciding factor for me in getting a Pro or Max. And
       | something Apple has historically had a lot of trouble with.
        
         | GeekyBear wrote:
         | >A shame that thermal/power limitations aren't investigated.
         | 
         | It's covered in the comments, along with when the "crank up the
         | fans" mode would be useful.
         | 
         | >Any pure CPU or GPU workload doesn't come close to the thermal
         | limits of the machine. And even a moderate mixed workload like
         | Premiere Pro didn't benefit from High Power mode.
         | 
         | It has a reason to exist, but that reason is close to rendering
         | a video overnight - as in a very hard and very sustained total
         | system workload.
         | 
         | https://www.anandtech.com/comments/17024/apple-m1-max-perfor...
        
         | perardi wrote:
         | Did you even bother to read the article? They have an entire
         | page on power consumption under various workloads.
         | 
         | https://www.anandtech.com/show/17024/apple-m1-max-performanc...
        
           | rowanG077 wrote:
           | Can you please quote the part which discusses power and
           | thermal limitations backed up with tests because I don't see
           | any of it on that page.
           | 
            | As far as I read it, these tests only report package power and
           | wall power used under certain loads. It doesn't say anything
           | about any limitations. No long term tests or temperature
           | graphs. No information at what temperature throttling kicks
           | in. Is CPU temperature truly the only limiting factor or are
           | the VRMs also a pain point? I could go on but I think this
           | illustrates enough of what I want to see.
        
             | GeekyBear wrote:
             | Their stories span multiple pages, but for some reason
             | people on mobile frequently seem to miss the other pages.
             | 
             | Here's the whole story on one page.
             | 
             | https://www.anandtech.com/print/17024/apple-m1-max-
             | performan...
        
               | rowanG077 wrote:
               | Yes I read everything. There is no concrete information
               | or tests about thermal or power limit throttling.
        
               | GeekyBear wrote:
               | Here you go:
               | 
               | >https://www.anandtech.com/comments/17024/apple-m1-max-
               | perfor...
        
               | rowanG077 wrote:
                | Someone saying that without any data to back it up
                | doesn't exactly inspire confidence. Like I said, nothing
                | concrete.
               | 
               | > Any pure CPU or GPU workload doesn't come close to the
               | thermal limits of the machine.
               | 
               | So does that mean it is thermally limited on a CPU + GPU
               | workload? What about a CPU + GPU + Media engine workload.
               | What about using the NPU? Does SSD load have an impact?
                | SSDs nowadays can consume 15+W of power. Dozens of
               | questions, unanswered by such a short sentence. Please
               | just test it out and give us the data.
        
               | GeekyBear wrote:
               | >Someone saying without any data to back it up doesn't
               | exactly inspire confidence.
               | 
               | That "someone" is the editor in chief for Anandtech.
               | 
               | He says you need a workload that stresses both the CPU
               | and GPU as much as possible and let that run overnight
               | before the ability to crank the fans higher than normal
               | is handy.
        
               | rowanG077 wrote:
               | > That "someone" is the editor in chief for Anandtech.
               | 
               | I don't see how that is relevant.
               | 
               | > He says you need a workload that stresses both the CPU
               | and GPU as much as possible and let that run overnight
               | before the ability to crank the fans higher than normal
               | is handy.
               | 
               | Then why is there no data? This is much more interesting
                | than some of the tests done in the report.
        
             | ksec wrote:
              | What you are looking for is a Laptop Review: how the M1 Pro
              | / Max behaves under the MacBook Pro 14 / 16 cooling with
              | the maximum possible TDP.
              | 
              | But this isn't a Laptop Review, it is an SoC review. My
              | guess is Dave2D will look into this sort of thing, since he
              | cares about it, before _anyone_ else on the internet
              | actually tests these things for laptops. ( Possibly due to
              | various PR restrictions )
        
         | neogodless wrote:
          | If I'm understanding you correctly, you're thinking of previous
          | issues with thermals and throttling. That has been an issue
          | over the past several years because Intel fell behind AMD and
          | TSMC and thus drove power through their chips in order to stay
          | competitive, which generates heat and ultimately ends up
          | triggering throttling.
         | 
         | If you read about these particular chips, it should be
         | startlingly clear that they are much more efficient than the
         | Intel chips they replace.
         | 
         | In this article:
         | 
         | > Apple doesn't advertise any TDP for the chips of the devices
         | - it's our understanding that simply doesn't exist, and the
         | only limitation to the power draw of the chips and laptops are
         | simply thermals. As long as temperature is kept in check, the
         | silicon will not throttle or not limit itself in terms of power
         | draw.
         | 
         | > The perf/W differences here are 4-6x in favour of the M1 Max,
         | all whilst posting significantly better performance
         | 
         | Read page 3 of this article. They really do cover a lot of
         | this.
        
           | rowanG077 wrote:
           | It is much more efficient. It's also more powerful. You can
           | see in one of their benchmarks they hit 90+W of package
           | power. I doubt it can sustain this. The "turbo-mode" Apple
           | has announced for the 16-inch version also indicates that it
           | will be significantly thermally limited.
        
             | neogodless wrote:
             | In my Lenovo Legion 5, in performance mode, the CPU is
             | configured to draw up to 70W, and the GPU is configured to
             | draw up to 115W. It's able to do this just fine for gaming
             | sessions. Yes, the fans are quite audible while doing this.
             | I think in contrast, having to handle about half that power
             | draw overall should be attainable. For sure, it's now a
             | very large SoC, so the heat might be a bit more
             | concentrated and require some engineering to cool. But it
             | doesn't seem like it should be a showstopping concern. Of
             | course, you can wait for additional reviews and see if
             | anyone addresses longer, more sustained load testing.
        
               | rowanG077 wrote:
               | I don't doubt it's physically possible. I doubt Apple
                | implemented it. So I'd rather wait till there is some hard
               | data. Apple could also have implemented proper cooling
               | for their Intel laptops but they didn't.
        
               | neogodless wrote:
               | That's perfectly understandable. It's a major purchase
               | decision.
               | 
               | But also, I'm not aware of anyone that implemented
               | "proper cooling" sufficient to handle the last few
               | generations of Intel chips (at least at the high end.) I
               | read reviews of a variety of machines. All of them had
               | issues with throttling.
               | 
               | I was so happy when the Ryzen 4000 mobile chips were
               | reviewed and did not require elaborate cooling systems
               | just to perform their regular duties. I would be shocked
               | if the 2021 Macbook Pro 14/16" have issues with thermal
               | throttling.
        
               | holmium wrote:
               | I haven't seen too many benchmarks yet, but Dave2D ran a
               | "Cinebench R23" test in a both a single benchmark and
               | thirty minute loop. [1] He saw that the score remained
               | the same after the 30 minutes.
               | 
               | He also reported that the loudest fan noise he could get
               | was 38dB, with typical loads under 30dB. [2]
               | 
               | -----
               | 
               | [1] - https://youtu.be/IhqCC70ZfDM?t=360
               | 
               | [2] - https://youtu.be/IhqCC70ZfDM?t=438
        
           | capableweb wrote:
           | > As long as temperature is kept in check, the silicon will
           | not throttle or not limit itself in terms of power draw.
           | 
           | Apple laptops have, for as long as I can remember, had issues
           | with thermals. Sometimes they get so hot you can't even have
           | them in your lap, so they are just "tops" at that point.
           | 
           | Has this issue been solved with these new models?
        
             | zepto wrote:
             | Yes
        
             | friedman23 wrote:
             | The M1 Macs do not have this problem; they draw a fraction
             | of the power of the old laptops even when boosting.
        
             | MikusR wrote:
             | It was solved with M1 last year
        
       | Leherenn wrote:
       | If people are wondering why some people in the comments are
       | reporting 3080 vs 3060 levels of performance, it depends on the
       | workload. On synthetic (native, I assume) benchmarks, the M1 Max
       | reaches 3080 levels, but in gaming benchmarks (using x86) it
       | reaches 3060 levels.
        
         | Thaxll wrote:
         | I don't think you've read the comments:
         | 
         | > However gaming is a poorer experience, as the Macs aren't
         | catching up with the top chips in either of our games.
         | 
         | It's far, far from a 3060 for gaming.
        
           | mrtranscendence wrote:
           | I haven't read _all_ the comments, but one of the first
           | comments that shows up says this:
           | 
           | > When it comes to the actual Gaming Performance, the M1X is
           | slightly slower than the RTX-3060.
           | 
           | Edit: also, the 3060 isn't a "top chip", particularly on a
           | laptop.
        
         | marricks wrote:
         | It's interesting especially since it sounds like the reason it
         | doesn't reach "close to 3080" in many games is because it's CPU
         | bound, specifically because it's emulating x86.
         | 
         | Once we get more benchmarks with non-rosetta apps the picture
         | may be rosier? That said, it's not like Apple was ever the
         | company for gaming machines, so perhaps that will just be the
         | state of things.
        
           | lowbloodsugar wrote:
           | TFA also compares games at 4k, where it is very much GPU
           | bound, and it is about half the speed of a laptop 6800. Which
           | is not great. (And I am speaking as someone whose M1 Max
           | arrives tomorrow).
           | 
           | The M1 GPU is vastly different than an AMD or nVidia GPU, and
           | I suspect it will have not-great scores until someone writes
           | a game and optimizes it specifically for the M1. Which is
           | most likely never.
        
             | xoa wrote:
             | > _and I suspect it will have not-great scores until
             | someone writes a game and optimizes it specifically for the
             | M1. Which is most likely never._
             | 
             | Don't forget that these days few demanding games are
             | "optimized for Platform XYZ", they're generally using one a
             | small set of middleware/engine they license. So it's more a
             | question of if Unreal Engine and Unity and so on get
             | optimizations aimed at Apple's M-series chips. That isn't
             | out of the question at all, given that they definitely have
             | optimization aimed at Apple's A-series chips. Once they do,
             | everything going forward that uses them will be better "for
             | free". Even if they don't hit the performance of something
             | truly hand tuned just to make max use of the arch it won't
             | be entirely ignored either. May not even be that much work.
             | 
             | That's another potential non-technical performance
             | advantage in moving the Mac off x86: they get to piggyback
             | in some areas off the enormously higher market share of the
             | iDevices. We'll see how it works out, of course.
        
       | GeekyBear wrote:
       | The money quote from testing vs Intel's 11980HK:
       | 
       | > The perf/W differences here are 4-6x in favour of the M1 Max, all
       | whilst posting significantly better performance, meaning the
       | perf/W at ISO-perf would be even higher than this.
       | 
       | and
       | 
       | >On the GPU side, the GE76 Raider comes with a GTX 3080 mobile.
       | On Aztec High, this uses a total of 200W power for 266fps, while
       | the M1 Max beats it at 307fps with just 70W wall active power.
       | 
       | https://www.anandtech.com/print/17024/apple-m1-max-performan...
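       | 
       | A quick back-of-the-envelope check of those fps-per-watt figures
       | (my arithmetic, not from the article), sketched in Python:
       | 
       |   # Aztec High numbers quoted above: fps and wall power.
       |   m1_max   = 307 / 70    # ~4.4 fps per watt
       |   rtx_3080 = 266 / 200   # ~1.3 fps per watt (mobile part)
       |   print(f"perf/W ratio: {m1_max / rtx_3080:.1f}x")  # ~3.3x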
        
         | AnthonyMouse wrote:
         | The sad thing is that what you really want to compare is how
         | their GPU is doing against nVidia, but then they pair it with
         | Intel's CPU which is known to have very poor power efficiency
         | vs. AMD.
        
           | dragontamer wrote:
           | 400GB/s is very high for CPU bandwidth, but it is less than
           | the 760GB/s bandwidth of NVidia's RTX 3080. Assuming you care
           | about 32-bit precision, of course.
           | 
           | I don't expect the M1 Pro to have very good double-precision
           | GPU-speeds.
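           | 
           | If you want to sanity-check bandwidth numbers yourself, a
           | crude single-threaded copy probe (a rough sketch in Python;
           | real STREAM-style benchmarks are multi-threaded and far more
           | careful) looks something like this:
           | 
           |   import numpy as np, time
           |   src = np.ones(1 << 27, dtype=np.float32)  # 512 MiB buffer
           |   dst = np.empty_like(src)
           |   t0 = time.perf_counter()
           |   np.copyto(dst, src)                       # read + write
           |   dt = time.perf_counter() - t0
           |   print(f"~{2 * src.nbytes / dt / 1e9:.0f} GB/s copy bandwidth")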
        
             | defaultname wrote:
             | It's pretty remarkable that now we're not only comparing
             | Apple's SoC to the best CPUs from dedicated makers, we're
             | comparing it to the best GPUs.
             | 
             | Could you qualify what you mean regarding double precision,
             | though? nvidia consumer GPUs have pretty terrible double
             | precision (usually in the range of 1/64th single
             | precision). And FWIW, the normal cores in the M1 (Max|Pro)
             | have fantastic double precision performance, and comprise
             | the bulk of the SPECfp dominance.
        
               | dragontamer wrote:
               | > It's pretty remarkable that now we're not only
               | comparing Apple's SoC to the best CPUs from dedicated
               | makers, we're comparing it to the best GPUs.
               | 
               | Is it? Apple has 5nm on lockdown right now. Process is
               | nearly everything in performance/watt.
               | 
               | If you want to compare architectures, you compare it on
               | the same process. 5nm vs 5nm is only fair. 5nm vs 7nm is
               | going to be 2x more power efficient from a process level.
               | 
               | When every transistor uses 1/2 the power at the same
               | speed, of course you're going to have a performance/watt
               | advantage. That's almost... not a surprise at all. It is
               | this process advantage that Intel wielded for so long
               | over its rivals.
               | 
               | Now that TSMC owns the process advantage, and now that
               | Apple is the only one rich enough to get "first dibs" on
               | the leading node, it's no surprise to me that Apple
               | has the most power-efficient chips. If anything, it shows
               | off how efficient the 7nm designs are that they can
               | compete against a 5nm design.
        
               | GeekyBear wrote:
               | > Process is nearly everything in performance/watt.
               | 
               | Not really. Apple's A15 and A14 phone chips are on the
               | same process node.
               | 
               | >Apple A15 performance cores are extremely impressive
               | here - usually increases in performance always come with
               | some sort of deficit in efficiency, or at least flat
               | efficiency. Apple here instead has managed to reduce
               | power whilst increasing performance, meaning energy
               | efficiency is improved by 17%
               | 
               | >The efficiency cores of the A15 have also seen massive
               | gains, this time around with Apple mostly investing them
               | back into performance, with the new cores showcasing
               | +23-28% absolute performance improvements
               | 
               | https://www.anandtech.com/print/16983/the-apple-a15-soc-
               | perf...
        
               | dragontamer wrote:
               | > Not really. Apple's A15 and A14 phone chips are on the
               | same process node.
               | 
               | Yeah, you're talking about 20% performance changes on the
               | same node.
               | 
               | Meanwhile, advancing a process from 7nm to 5nm TSMC is
               | something like 45% better density (aka: 45% more
               | transistors per mm^2) and 50% to 100% better power-
               | efficiency at the same performance levels, and closer to
               | the 100%-side of power-efficiency if you're focusing on
               | the idle / near-zero-GHz side of performance. (Pushing to
               | 3GHz is less of a power difference, but lower idle power
               | does have a sizable contribution in practice.)
               | 
               | -----
               | 
               | Oh right: and TSMC N5P is 10% less power and 5% speed
               | improvement over TSMC N5 (aka: what TSMC figured out in a
               | year). There's the bulk of your 17% difference from A14
               | to A15.
               | 
               | Yeah, process matters. A LOT.
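               | 
               | To make that arithmetic explicit (my sketch of how the
               | 15.5% figure can be built from the 10%-power / 5%-clock
               | N5 -> N5P deltas above):
               | 
               |   clock = 1.05   # +5% clock
               |   power = 0.90   # -10% power
               |   # stacking the two gains gives the 15.5% figure:
               |   print(clock * (2 - power) - 1)   # 0.155
               |   # as a strict perf-per-watt ratio it's a bit more:
               |   print(clock / power - 1)         # ~0.167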
        
               | musicale wrote:
               | Are you saying that if another company, say AMD, had
               | access to TSMC's 5nm process than it would easily achieve
               | comparable performance/watt to what Apple has done with
               | the M1 series?
        
               | dragontamer wrote:
               | I'm saying that 15.5% of the 17% difference from Apple
               | A14 to Apple A15 is accounted for in the TSMC N5 to TSMC
               | N5p upgrade (Aka: 10% fewer watts at 5% higher clock
               | rates).
               | 
               | The bulk of efficiency gains has been, and for the
               | foreseeable future will be, the efficiency of the
               | underlying manufacturing process itself.
               | 
               | There's still a difference in efficiency above-and-beyond
               | the 15.5% from the A14 to the A15. But it's a small
               | fraction of what the __process__ has given.
               | 
               | ---------
               | 
               | Traditionally, AMD never was known for very efficient
               | designs. AMD is more well known for "libraries", and more
               | a plug-and-play style of chip-making. AMD often can
               | switch to different nodes faster and play around with
               | modular parts (see the Zen "chiplets"). I'd expect AMD to
               | come around with some kind of chiplet strategy (or
               | something along those lines) before I expect them in
               | particular to take the efficiency crown.
               | 
               | NVidia probably would be better at getting high-
               | efficiency designs. They're on a weaker 8nm Samsung
               | process yet still have extremely good power/efficiency
               | curves.
               | 
               | I like AMD's chiplet strategy, though, as a business and
               | as a customer. It's a bit of a softer benefit, and AMD
               | clearly has made the "Infinity Fabric" more efficient
               | than anyone expected it could get.
        
               | rbanffy wrote:
               | > Process is nearly everything in performance/watt.
               | 
               | ARM has consistently beat x86 in performance/watt at
               | larger node sizes since the beginning. The first
               | Archimedes had better floating point performance without
               | a dedicated FPU than the then market-leading Compaq 386
               | WITH an 80387 FPU.
               | 
               | A lot of the extra performance of the M1 family has
               | nothing to do with node, but with the fact the ARM ISA is
               | much more amenable to a lot of optimizations that allow
               | these chips to have a surreally large reorder buffer,
               | which, in turn, keeps more of the execution ports busy at
               | any given time, resulting in a very high IPC. Less
               | silicon used to deal with a complicated ISA also leaves
               | more space for caches, which are easier to manage
               | (remember the more regular instructions), putting less
               | stress on the main memory bus (which is insanely wide
               | here, BTW). On top of that, the M1 family has some
               | instructions that help make JavaScript code faster.
               | 
               | So, assume that Intel and AMD, when they get 5nm designs,
               | will have to use more threads and cores to extract the
               | same level of parallelism that the M1 does with an arm
               | (no pun intended) tied behind its back.
        
               | dragontamer wrote:
               | > optimizations that allow these chips to have surreally
               | large reordering buffer
               | 
               | But only Apple's chip has a large reordering buffer. ARM
               | Neoverse V1 / N1 / N2 don't have it, no one else is doing
               | it.
               | 
               | Apple made a bet and went very wide. I'm not 100% sure if
               | that bet is worth the tradeoffs. I'm certain that if
               | other companies thought that a larger reordering buffer
               | was useful, they'd have done it.
               | 
               | I'll give credit to Apple for deciding that width still
               | had places to grow. But it's a very weird design. Despite
               | all that width, Apple CPUs don't have SMT, so I'd expect
               | that a lot of the performance is "wasted" with idle
               | pipelines, and that SMT would really help out the design.
               | 
               | Like, who makes an 8-wide chip that supports only 1
               | thread? Apple but... no one else. IBM's 8-wide decode is
               | on an SMT4 chip (4 threads per core).
        
               | rbanffy wrote:
               | SMT is a good way to extract parallelism when your ISA
               | makes it more difficult to do (with speculative
               | execution/register renaming). ARM, it seems, makes it
               | easier to the point I don't think any ARM CPU has been
               | using multiple threads per core.
               | 
               | I would expect POWER to be more amenable to it, but x86
               | borrows heavily from the 8085 ISA and was designed at a
               | time when the best IPC you could hope to get was 1.
        
               | defaultname wrote:
               | > Apple has 5nm on lockdown right now
               | 
               | Qualcomm has loads of 5nm chips. They're pretty solidly
               | beaten by Apple's entrants, but they've been using them
               | for over a year now. Huawei, Marvell, Samsung and others
               | have 5nm products too.
               | 
               | This notion that Apple just bullied everyone out of 5nm
               | is not backed by fact. For that matter, Apple's
               | efficiency holds even at the same node.
               | 
               | There is this weird thing where some demand that we put
               | an asterisk on everything Apple does. I remember the
               | whole "sure it's faster but that's just because of a big
               | cache" (as if that negated the whole faster / more
               | efficient thing, or as if competing makers were somehow
               | forbidden from using larger caches so it was all so
               | unfair). Now it's all waved away as just a node
               | advantage, when any analysis at all reveals that to be
               | nonsensical.
        
               | dragontamer wrote:
               | > Qualcomm has loads of 5nm chips.
               | 
               | I think we all know that TSMC 5nm is quite a bit better
               | than Samsung 5nm.
               | 
               | Samsung is "budget" 5nm. It ain't as good as the best-of-
               | the-best that Apple is buying here.
        
               | laserlight wrote:
               | > Process is nearly everything in performance/watt.
               | 
               | > TSMC 5nm is quite a bit better than Samsung 5nm.
               | 
               | These two statements conflict.
        
               | dragontamer wrote:
               | > These two statements conflict.
               | 
               | TSMC 5nm is not the same process as Samsung 5nm though?
               | 
               | All the processes are the company's secret sauce. They
               | aren't sharing the details. Ultimately, Samsung comes out
               | and says "5nm technology", but that doesn't mean its
               | necessarily competitive with TSMC 5nm.
               | 
               | Indeed, Intel 10nm is somewhat competitive against TSMC
               | 7nm. The specific "nm" is largely a marketing thing at
               | this point... and Intel is going through a rebranding
               | effort. (Don't get me wrong: Intel is still far behind
               | because it tripped up in 2016. But the Intel 14nm process
               | was the best in the world in that timeframe.)
        
               | codedokode wrote:
               | You can compare transistor count per mm^2 instead of
               | nanometers.
        
               | dragontamer wrote:
               | But you compare power-efficiency by how efficient each
               | transistor is.
               | 
               | TSMC N5p is 10% more efficient and 5% higher clocks than
               | TSMC N5. The same 5nm __BY THE SAME COMPANY__ can change
               | 15.5% in just a year, as manufacturing issues are figured
               | out.
               | 
               | Making every transistor draw 10% less power and clock 5%
               | higher across the entire chip, while keeping the same
               | size, is a huge bonus that cannot be ignored. I don't know
               | what magic these chip engineers are doing, but they're
               | surely spending supercomputer time brute-forcing all sorts of
               | shapes/sizes of transistors to find the best
               | density/power/speed tradeoffs per transistor.
               | 
               | This is part of the reason why Intel stuck with 14nm for
               | so long. 14nm+++++++ kept increasing clock speeds,
               | yields, and power efficiency (but not density), so it
               | never really was "worth it" for Intel to switch to 10nm
               | (which Intel had some customer silicon taped out for
               | years, but only at low clock speeds IIRC).
               | 
               | It isn't until recently that Intel seems to have figured
               | out the clock speed issue and has begun offering
               | mainstream chips at 10nm.
        
               | ac29 wrote:
               | > This notion that Apple just bullied everyone out of 5nm
               | is not backed by fact.
               | 
               | In the context of laptops, it's true. Neither Intel nor AMD
               | has chips being built on TSMC N5 or a comparable process.
               | AMD is on TSMC N7, and Intel is currently on their own 10
               | nm process, moving to "Intel 7" with Alder Lake which is
               | getting formally introduced in 2 days.
        
               | defaultname wrote:
               | "In the context of laptops, its true"
               | 
               | Intel wasn't in competition for TSMC's processes at all,
               | and AMD was in absolutely no hurry to 5nm (especially
               | given that they were targeting cost effectiveness). The
               | fact that Apple readied a 5nm design, and decided that it
               | was worth it for their customers, in no way indicates
               | that they "bullied" to the front.
               | 
               | Quite the contrary: for years Intel made their mobile / "low
               | power" parts on some of their older processes. It was a
               | low profit part for them and they saved the best for
               | their high end Xeons and so on (where the process benefit
               | was entirely spent on speed -- note that there is a lot
               | of BS about the benefit of process nodes where people
               | claim ridiculous benefits when in reality you can have a
               | small efficiency improvement, or a small performance
               | improvement, but not both. The biggest _real_ benefit is
               | that you can pack more onto a given silicon area, in
               | Apple's case loads of cores, a fat GPU, big caches, etc.).
               | If Apple upset their business model, well, tough beans for
               | them.
               | 
               | As an aside, note that the other initial customer of 5nm
               | was HiSilicon (a subsidiary of Huawei) with the Kirin
               | 9000. That's a pretty sad day when AMD and Intel are
               | supposedly sad also-rans to Huawei. Or, more reality
               | based, they simply weren't even in competition for that
               | space, had zero 5nm designs ready, and didn't prioritize
               | the process.
        
               | rbanffy wrote:
               | Well... Intel not having 5nm is entirely Intel's fault.
               | They used process to their advantage and, well, when they
               | messed up their process cadence, the advantage
               | evaporated.
               | 
               | AMD could, but they seem to be very happy where they are.
               | They also have to decide on which fronts they want to
               | outcompete Intel and, it seems, process isn't one of
               | them.
        
             | saberience wrote:
             | > I don't expect the M1 Pro to have very good double-
             | precision GPU-speeds.
             | 
             | Compared to what? There are no laptops quite like these new
             | Apple laptops. Anything with faster graphics also uses
             | LOADS more power and runs WAY hotter.
        
               | dragontamer wrote:
               | > Compared to what? There are no laptops quite like these
               | new Apple laptops. Anything with faster graphics also
               | uses LOADS more power and runs WAY hotter.
               | 
               | Using 2x the power for 2x the bandwidth (on top of
               | significantly more compute power) is a good tradeoff,
               | when the NVidia chip is 8nm Samsung vs Apple's 5nm TSMC.
               | 
               | In any case, the actual video game performance is much
               | much worse on the M1 Pro. The benchmarks show that the
               | chip has potential, but games need to come to the system
               | first before Apple can decisively claim victory.
        
               | GeekyBear wrote:
               | > the actual video game performance is much much worse on
               | the M1 Pro
               | 
               | Well, no. The emulated x86 gaming performance is.
               | 
               | They didn't test a game with a native version.
        
               | dragontamer wrote:
               | > They didn't test a game with a native version.
               | 
               | If the native version doesn't exist then... gamers don't
               | care?
               | 
               | Gotta get those games ported over
        
               | rbanffy wrote:
               | > If the native version doesn't exist then... gamers
               | don't care?
               | 
               | I don't think it's a fair assessment of the machine
               | capabilities. Also, games WILL be ported to the platform
               | AND if you really need your games running at full speed,
               | you can keep the current computer and postpone the
               | purchase of your Mac until the games you need are
               | available.
        
               | dragontamer wrote:
               | No.
               | 
               | Next-generation games will be made on the platform.
               | Current-generation and last-generation games no longer
               | have much support / developers, and no sane company will
               | spend precious developer time porting over a year-old or
               | 5-year-old game to a new platform in the hopes of a slim
               | set of sales. (Except maybe Skyrim. Apparently those
               | ports keep making money)
               | 
               | Your typical game studio doesn't work on Skyrim though.
               | They put a bunch of developer work into a game, then
               | by the time the game is released, all the developers are
               | on a new project.
        
               | GeekyBear wrote:
               | Have you seen how terrible the x86 emulated performance
               | is on a Surface Pro X?
               | 
               | https://www.youtube.com/watch?v=OhESSZIXvCA
        
               | dragontamer wrote:
               | And that's why gamers are buying the Surface Book
               | instead?
               | 
               | The "gamer" community (or really, community-of-
               | communities) only cares if their particular game runs
               | quickly on a particular platform.
               | 
               | Gamers don't really care about the advanced technology
               | details, aside from the underlying "which system will run
               | my game faster, with higher-quality images" (4k /
               | raytracing / etc. etc.)?
        
               | GeekyBear wrote:
               | No, that's why having x86 emulation performance be this
               | good is a minor miracle.
               | 
               | Native performance would be expected to be in line with
               | what the benchmarks are showing.
               | 
               | The MacBook Pro with the M1 Max would beat the 100 watt
               | mobile variant of the 3080, especially if you unplug both
               | laptops from the wall, where the 3080 has to throttle down
               | and the MacBook does not.
        
               | dragontamer wrote:
               | > No, that's why having x86 emulation performance be this
               | good is a minor miracle.
               | 
               | No gamer is going to pay $3000+ for a laptop with
               | emulation when $2000+ gamer laptops are faster at the
               | task (aka: video games are faster on the $2000 laptop).
               | 
               | ------
               | 
               | Look, gamers don't care about all games. They only care
               | about that one or two games that they play. If you want
               | to attract Call of Duty players, you need to port Call-
               | of-Duty over to the Mac, native, so that the game
               | actually runs faster on the system.
               | 
               | It doesn't need to be an all-or-nothing deal. Emulation
               | is probably good enough for casuals / non-gamers who
               | maybe put in 20 hours or less into any particular game.
               | But anyone putting 100-hours or more into a game will
               | probably want the better experience.
        
               | GeekyBear wrote:
               | > No gamer is going to pay $3000+ for a laptop with
               | emulation
               | 
               | They pay $3000 for a laptop whose fans hit 55 decibels at
               | load and that has to throttle way down, to slower than the
               | MacBook, if you use it like a laptop and go somewhere
               | without a power outlet.
               | 
               | https://www.anandtech.com/show/16928/the-msi-ge76-raider-
               | rev...
        
               | dragontamer wrote:
               | The Mac doesn't even do raytracing, does it? So you're
               | already looking at a sizable quality downgrade over AMD,
               | NVidia, PS5, and XBox Series X.
               | 
               | I think the eSports gamers will prefer FPS over graphical
               | fidelity, so maybe that's the target audience for this
               | chip ironically.
               | 
               | But adventure gamers who want to explore raytraced worlds
               | / prettier games will prefer the cards with raytracing,
               | better shadows, etc. etc. (See the Minecraft RTX demo for
               | instance: https://www.youtube.com/watch?v=1bb7wKIHpgY)
        
               | pjmlp wrote:
               | It does,
               | 
               | https://developer.apple.com/videos/play/wwdc2021/10149/
               | 
               | https://developer.apple.com/videos/play/wwdc2021/10150/
        
               | dragontamer wrote:
               | Look, my Vega64 raytraces all the time when I hit the
               | "Render" button on Blender.
               | 
               | But video-game raytracing is about hardware-dedicated
               | raytracing units. Software (even GPU-software rendering)
               | is an order of magnitude slower. It's still useful to
               | implement, but what PS5 / XBox Series X / AMD / NVidia
               | has implemented are specific raytracing cores (or in
               | AMD's case: raytracing instructions) to traverse a BVH-
               | tree and accelerate the raytracing process.
               | 
               | "Can do Raytracing" or "Has an API for GPU-software that
               | does raytracing" is just not the same as "we built a
               | raytracing core into this new GPU". I'm sure Apple is
               | working on their raytracing cores, but I haven't seen
               | anything yet that suggests they're ready.
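               | 
               | For a sense of what those dedicated units accelerate,
               | here is a toy Python sketch (illustration only, assuming
               | a simple node object with .lo/.hi bounds, a .tris leaf
               | payload and .left/.right children; not Apple's, AMD's or
               | NVidia's actual implementation) of walking a BVH so one
               | box test can cull a whole subtree of triangles:
               | 
               |   # Toy BVH walk: one slab test can cull a whole
               |   # subtree; RT cores do this in fixed function.
               |   def hits_box(o, inv, lo, hi):
               |       tn, tf = -1e30, 1e30
               |       for oi, ii, l, h in zip(o, inv, lo, hi):
               |           t0, t1 = (l - oi) * ii, (h - oi) * ii
               |           tn = max(tn, min(t0, t1))
               |           tf = min(tf, max(t0, t1))
               |       return tn <= tf and tf >= 0
               | 
               |   def traverse(node, o, inv, out):
               |       if node is None:
               |           return
               |       if not hits_box(o, inv, node.lo, node.hi):
               |           return                 # cull entire subtree
               |       if node.tris is not None:
               |           out.extend(node.tris)  # leaf: candidates
               |           return
               |       traverse(node.left, o, inv, out)
               |       traverse(node.right, o, inv, out)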
        
               | rbanffy wrote:
               | > the actual video game performance is much much worse on
               | the M1 Pro
               | 
               | This is a workstation. For games one should look for a
               | Playstation ;-)
               | 
               | 2x power also means half the battery life. Remember this
               | is a portable computer that's thin and light beyond what
               | would be reasonable considering its performance. Also,
               | remember the GPU has full 400GBps access to all of the
               | RAM, which means models of up to 64GB never need to be
               | passed over a PCIe bus.
        
         | kcb wrote:
         | GFXBench is meaningless.
        
           | dsr_ wrote:
           | A pointer to an article on why it is meaningless is better
           | than the raw assertion.
        
             | ac29 wrote:
             | This article shows why - actual performance in games isn't
             | great. Yes, it's partially held back by x86->ARM
             | translation, but if you are a gamer, this isn't a
             | particularly compelling system.
        
               | Aissen wrote:
               | This part of the article was particularly cringe. People
               | don't seem to realize how much of a technical feat it is
               | that x86 builds of those two games are even _running at
               | an acceptable framerate_.
               | 
               | That said, it's been a year since the M1 release, and
               | Apple could have paid a few hundred thousand (or a few
               | million) dollars for a few AAA game ports. They didn't,
               | and that says how much they
               | care about gaming on these devices.
        
               | GeekyBear wrote:
               | >This part of the article was particularly cringe. People
               | don't seem to realize how much of a tech prowess it is
               | that x86 builds of those two games are even running at an
               | acceptable framerate.
               | 
               | Spoken like someone who has never seen how poorly
               | Microsoft's x86 to ARM emulation works.
               | 
               | https://www.youtube.com/watch?v=OhESSZIXvCA
               | 
               | They had difficulty getting programs to work at all, much
               | less have acceptable performance under emulation.
        
               | rbanffy wrote:
               | > Spoken like someone who has never seen how poorly
               | Microsoft's x86 to ARM emulation works.
               | 
               | Sometimes I wonder why they are still a software company.
               | I really like their keyboards and mice. Their software,
               | not so much.
        
               | musicale wrote:
               | > that says how much they care about gaming on these
               | devices
               | 
               | Apple's bread and butter is iOS gaming (where it takes in
               | more game profit than Sony, Microsoft, Nintendo and
               | Activision combined) rather than Windows PC game ports to
               | macOS.
        
               | throwaway946513 wrote:
               | The reasons for it not being a compelling system aren't
               | inherently obvious to most people:
               | 
               | Games are currently developed to run on x86 Windows
               | machines. Predominantly Windows machines. If a game is
               | designed to run on macOS, you're more likely to see a
               | performant experience. The issue isn't inherently the
               | architecture or Apple's chips; it's the lack of software
               | choices available on the platform. Though you can now
               | argue that Apple's library has grown significantly
               | due to App Store compatibility bringing
               | games from mobile to the desktop.
               | 
               | Gamers would like to play the games they've bought (or
               | free to play that they've dedicated time to) on platforms
               | those games support, and most of those games do not
               | support MacOS, or Linux. However, with Proton emulation,
               | and EasyAC & BattlEye now working with Valve to improve
               | Anti-Cheat on Linux, we may see a greater compatibility
               | with the aforementioned systems enabling cross-platform
               | play.
        
               | musicale wrote:
               | > free to play
               | 
               | If only you could run iOS games on the M1 Mac, you'd be
               | all set...
        
       | sz4kerto wrote:
       | Looks like it isn't worth buying the Max if you don't need a
       | powerful GPU.
        
         | sylens wrote:
         | or if you want to have 3-4 monitors
        
           | joshstrange wrote:
           | Yep, this is what pushed me to the Max. I think the GPU is
           | going to be overkill for my other needs (though I'm looking
           | forward to trying some gaming) but first and foremost I
           | needed support for my monitors (the only thing that kept me
           | from the first M1s).
        
             | dmix wrote:
             | I'm curious, what kind of monitor setup needs that kind of
             | power?
        
               | joshstrange wrote:
               | 3x2K (2560x1440) and 1x1080p. I have my 3 2K monitors in
               | a "tie-fighter" formation which really just means 1 2K in
               | landscape and the other 2 2Ks are portrait mode on either
               | side of the middle 2K. Then I have my 1080p monitor just
               | on top of my middle 2K. The 1080 is really just used for
               | my Home Assistant (home automation software) dashboard
               | and occasionally reference material (Adobe XD/Figma-type
               | stuff).
               | 
               | I split my 2 portrait monitors into 3rds and have hotkeys
               | to resize windows to snap them to one of the 3
               | "positions" (also hot keys to expand them to 2/3rd for
               | things like my code editor).
               | 
               | This setup works really well for me personally and at some
               | point I'll update to 4K+ but I just couldn't afford it
               | when I first set this all up.
               | 
               | My normal usage is (`+` separates 3rds, `/` represents 1
               | of these apps in this "slot"):
               | 
               | Left 2K: Discord + DataGrip/Android Studio/Drafts + Slack
               | 
               | Middle 2K: Chrome w/ dev tools right-docked
               | 
               | Right 2K: iMessage/Spotify + IDEA (2/3rds)
               | 
               | Top 1080p: Home Assistant/Adobe XD/Figma
               | 
               | Here is a (rough) ascii art of my setup:
               |                +-------+
               |                | 1080p |
               |  +----+  +-------------+  +----+
               |  |    |  |             |  |    |
               |  | 2K |  | 2K (center) |  | 2K |
               |  |    |  |  landscape  |  |    |
               |  |    |  +-------------+  |    |
               |  |    |                   |    |
               |  +----+                   +----+
               | 
               | EDIT: Just to note, I know I'm not going to be scratching
               | the surface of what's possible with the M1 Max monitor-
               | wise since it can do 3x6K (Pro Display XDR) + 1x4K
        
               | gigatexal wrote:
               | Pics?! I would love to see this setup. I am so jelly.
               | I've just got a single 32inch 4k dell and a U2719DC on
               | the right.
        
               | joshstrange wrote:
               | Ok, here is a picture [0] but don't make fun of my LEDs
               | lol. It's one of the few pictures I have of a semi-clean
               | desk so I'm going to run with it.
               | 
               | [0] https://imgur.com/ILTcpG0
               | 
               | EDIT: Because I forgot to mention and my above post isn't
               | editable: My 3 2K monitors are all 27" and my 1080p is
               | 24"
        
               | _ph_ wrote:
               | That definitely looks cool - including your LEDs. Not
               | sure I would want to work in that configuration (my neck
               | would get hurt), but it definitely looks great :)
        
               | joshstrange wrote:
               | > That definitely looks cool - including your LEDs.
               | 
               | Thank you!
               | 
               | > Not sure I would want to work in that configuration (my
               | neck would get hurt)
               | 
               | Up and down issues or side to side? I tried all 3 2K
               | monitors in landscape when I first got them but that was
               | way too much side to side movement to get to the edges of
               | the far left/right screen so I begrudgingly switched them
               | to portrait. I had always thought portrait mode was silly
               | for a monitor but I absolutely love it now and I'm super
               | happy with the setup. I don't have to move my head very
               | much at all to scan across my setup and normally I am
               | pretty focused on just 2 monitors (center for Chrome and
               | right for IDEA) so there isn't a ton of movement. As for
               | up/down movement I really only ever move my head to look
               | at the top 1080p screen which is fine since I don't use
               | it regularly.
        
         | sliken wrote:
         | Or 64GB of ram.
        
           | hajile wrote:
           | Docker uses almost 6GB on my M1 Air. Even with 16GB, I have
           | to be careful to avoid swapping (which trashes your SSD).
           | 
           | I'm looking at buying a machine with more RAM. 64GB is a
           | pretty good deal (look at desktop DDR5 prices). If I spend 4k
           | on a machine, I plan on using it for at least 3-4 years
           | before I upgrade. 64GB seems to make a lot of sense if you
           | can stomach the price of the M1 Max.
        
         | kabes wrote:
         | I don't need a powerful GPU, but I do need the additional RAM
         | the Max supports.
        
         | sudhirj wrote:
         | Yeah, the Max has exactly the same CPU as the Pro. Only reason
         | I picked it is because I wanted the 32GB RAM, and only the Max
         | has that by default -- customised orders take a long time
         | to deliver in India.
        
           | glhaynes wrote:
           | I believe the Max also has twice the memory bandwidth of the
           | Pro.
           | 
           | EDIT, upon reading the bottom part of the first page of the
           | linked article: And double the cache, more cores on the
           | Neural Engine, and maybe doubled media processing, assuming
           | I'm reading all of that correctly.
        
       | crateless wrote:
       | Now that Apple has taken the lead in performance/battery life
       | tradeoff, are there any machines which come close to the M1 for
       | software dev? Specifically, compiling Rust, Android development
       | etc. without giving up too much on battery life?
       | 
       | Also, the last time I checked, CPUs were reporting high
       | performance but only under light load. Has the whole throttling
       | situation changed or should I just expect to get 2 hours battery
       | life in exchange for extreme CPU performance?
       | 
       | Edit: I should have specified machines that can run Linux.
        
         | ac29 wrote:
         | Today? Probably not.
         | 
         | Intel's Alder Lake is moving to a Performance + Efficiency core
         | setup, which should help overall with battery life. But they
         | are still behind on manufacturing process (Alder Lake is "Intel
         | 7", supposedly roughly comparable to TSMC N7), so Apple will
         | quite likely maintain their lead in power consumption.
         | 
         | Alder Lake is getting announced in 2 days, but rumors have it
         | as a desktop-first product launch, so laptops may be another
         | quarter or two out.
        
         | raydev wrote:
         | Panzerino tested WebKit compiles on the first run of M1
         | machines last year, and it seems like the battery held up
         | really well on those.
         | 
         | > After a single build of WebKit, the M1 MacBook Pro had a
         | massive 91% of its battery left. I tried multiple tests here
         | and I could have easily run a full build of WebKit 8-9 times on
         | one charge of the M1 MacBook's battery.
         | 
         | https://techcrunch.com/2020/11/17/yeah-apples-m1-macbook-pro...
         | 
         | I'm looking forward to the MBP compile benchmarks.
        
         | wayneftw wrote:
         | > ...any machines which come close to the M1 for software dev?
         | 
         | If you can stand using macOS, that is.
         | 
         | Personally, I'll continue using Linux because that's where all
         | my software gets deployed to and macOS simply can't approach
         | the value of that or the value of open source. On a Mac, you'll
         | be fighting the OS the whole time.
         | 
         | If speed was all that mattered, Mac users would have left Apple
         | a long time ago because this is the first time they're faster
         | than a PC.
        
       | OldHand2018 wrote:
       | It's been a while since I bought a "Pro" computer from Apple. I
       | am kind of wondering about the perf-per-$$$ factor. With a
       | starting price of $2000, these are expensive computers. But maybe
       | they are worth it!
       | 
       | The M1 computers seemed like an absolute bargain for the
       | performance.
        
         | pohl wrote:
         | Years ago I read someone express the maxim "the computer you
         | want is always around $5k". This has stuck with me. It's been
         | approximately true throughout my life.
        
         | heresaPizza wrote:
         | These MacBooks are definitely worth the money. They cost a lot
         | but they are not overpriced.
         | 
         | You don't have to consider just the CPU and GPU but the whole
         | SoC. The CPU is impressive and the GPU is good, but for standard
         | workloads some PCs may give you slightly better performance
         | (on the GPU side), at the cost of needing the power adapter to
         | do it. However, for some specific workloads (especially the
         | ones involving ProRes video) the custom modules in it make it
         | perform better not only than a Mac Pro, but than every other
         | machine on the market. There is also the Neural Engine, which
         | could be more important in the future.
         | 
         | You may not need those modules, but it seems like we are
         | forgetting these are laptops with screens, inputs and more.
         | These machines have some of the best screens, with high DPI,
         | high refresh rate and, most importantly, miniLED technology,
         | which brings true HDR. And that's something very pricey.
         | 
         | Far from defending Apple, they could sell these laptops for less
         | and we would all be happier, but at the end of the day these
         | machines are worth it in every aspect (specific cases aside).
        
         | neogodless wrote:
         | I liked Dave2D's (youtube) take on this.
         | 
         | They are tools. If you have specific workloads that these excel
         | at in your job/hobby/money-making venture, then the price
         | shouldn't be a concern.
         | 
         | Depending on workload, they are comparable to $1000 PC laptops
         | in CPU performance... or $3000 PCs. Or PCs that don't exist
         | yet!
         | 
         | As someone who uses a laptop for gaming, my $1000 laptop is
         | infinitely superior to a $6000 Macbook Pro (for the games that
         | I play). For almost every other use, the Macbook Pro is likely
         | far superior!
         | 
         | If you do Final Cut or Xcode work, these are the best tool
         | available to you.
        
           | smoldesu wrote:
           | Yep, I reached pretty much the same conclusion as Dave: these
           | are machines for _very specific people_ running _very
           | specific software_. Apple got their wish: they made the
           | computer disappear, and now the Macbook Pro is a tool. For
           | better or worse, this is the best way to experience the Apple
           | ecosystem.
           | 
           | But also, if you're a developer without any interest in Apple
           | (and maybe someone who wants to play games), the case for
           | using Linux for general-purpose computing is stronger than
           | ever. It will be interesting to see how Apple addresses their
           | own issues in macOS over the next few months; I've really got
           | my fingers crossed for Vulkan support or 32-bit libraries
           | making a return.
        
           | throwawaywindev wrote:
           | Yea, I originally ordered an M1 Max model after the
           | presentation, then cancelled it when I realized that, for
           | what I would use the GPU for (gaming and 3D development),
           | an RTX 3080 laptop would be a much better choice. I also
           | don't care about
           | performance/watt as much since I use my iPad for non-work
           | stuff.
           | 
           | But the technology nerd in me still wants to buy one for
           | completely irrational reasons.
        
             | kzrdude wrote:
             | Me too. I really want an M1* mac but I also realize that I
             | just want to run Linux, so it's kind of pointless right now
             | (I'm not a Linux kernel level developer, so yeah).
        
         | hajile wrote:
         | AMD 5950X: $750
         | Nvidia RTX 3060: $750
         | Midrange X570 motherboard: $300
         | Samsung 980 Pro 2TB: $370
         | 64GB DDR5: $700
         | Decent 24" display: $400
         | Case, keyboard, mouse, cooling: $400
         | 
         | That's $3670 -- if I build it myself. I'd expect to pay much
         | more from a big box store for a prebuilt.
         | 
         | A new 14" MacBook with M1 Max, 2TB SSD, and 64GB of RAM is
         | around $4100.
         | 
         | That's a great deal IMO
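         | 
         | For what it's worth, the itemized prices above do sum to that
         | figure (quick Python check, prices as listed):
         | 
         |   parts = [750, 750, 300, 370, 700, 400, 400]
         |   print(sum(parts))  # 3670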
        
           | AnthonyMouse wrote:
           | 64GB memory, $292:
           | 
           | https://www.newegg.com/crucial-64gb-288-pin-
           | ddr4-sdram/p/N82...
           | 
           | 2TB NVMe SSD, $199:
           | 
           | https://www.newegg.com/western-digital-2tb-blue-
           | sn550-nvme/p...
           | 
           | X570 board, $154:
           | 
           | https://www.newegg.com/asrock-x570-phantom-
           | gaming-4/p/N82E16...
           | 
           | Case, keyboard, mouse, fans:
           | 
           | https://www.newegg.com/petrol-blue-fractal-design-focus-g-
           | at... https://www.newegg.com/logitech-mk120-920-002565-usb-
           | wired/p...
           | https://www.newegg.com/p/13C-0007-001M0?Item=9SIABW9EME4036
           | 
           | Together ~$100.
           | 
           | You're off by about $1000.
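           | 
           | That gap roughly checks out if you difference the substituted
           | parts against the prices quoted above (my arithmetic):
           | 
           |   savings = {"RAM": 700 - 292, "SSD": 370 - 199,
           |              "board": 300 - 154, "case/kb/mouse": 400 - 100}
           |   print(sum(savings.values()))  # 1025, i.e. ~$1000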
        
             | csomar wrote:
             | Good. Now you have an NVMe drive but a motherboard that
             | doesn't support NVMe. That cheap-ass cooler is unlikely to
             | be able to handle the 5950X; these usually should be coupled
             | with water-cooling as they get stupidly hot.
             | 
             | Also, if you are getting a $15 mouse/keyboard, why are you
             | even gaming/doing graphic work in the first place? Might as
             | well get a raspberry-pi.
        
               | AnthonyMouse wrote:
               | That motherboard has two NVMe slots.
               | 
               | That cheap-ass cooler isn't that bad. It's pretty middle
               | of the road. Personally I'd have the Hyper 212 Evo for
               | $39 because it's not that much more expensive, but that
               | plus a ~$60 case and a ~$20 keyboard and mouse still
               | isn't $400.
               | 
               | The main difference between the plain keyboard and your
               | average "gamer" keyboard is what, RGB LEDs? You can pay
               | the money for that if you like, but there are no RGB LEDs
               | on the Mac.
        
               | csomar wrote:
               | > That motherboard has two NVMe slots.
               | 
               | I fail to find that anywhere in the specs. It says 8
               | SATA3 slots and 2 PCIe 4 slots. It is not clear if you
               | can boot/configure them from the BIOS.
               | 
               | > That cheap-ass cooler isn't that bad. It's pretty
               | middle of the road. Personally I'd have the Hyper 212 Evo
               | for $39 because it's not that much more expensive
               | 
               | The cooler is critical if you are doing CPU intensive
               | work. A hot CPU will get throttled; so you'd better get a
               | _very_ good one if you are already paying a lot for your
               | CPU.
               | 
               | > The main difference between the plain keyboard and your
               | average "gamer" keyboard is what, RGB LEDs? You can pay
               | the money for that if you like, but there are no RGB LEDs
               | on the Mac.
               | 
               | No. The grip and precision are night and day. Even more
               | so for mice; I can't move back to normal ones (I
               | had the G900 and now have the Razer Viper Ultimate; this
               | thing made carpal tunnel syndrome a thing of the past
               | and I use the mouse for 10+ hours per day).
        
             | hajile wrote:
             | That RAM is DDR4 while the MBP uses DDR5 (low power). MSI
             | has stated that DDR5 will cost at least 60% more. Known
             | prices are actually 3x higher.
             | 
             | https://www.pcmag.com/news/msi-ddr5-ram-will-cost-60-more-
             | th...
             | 
             | That SSD is 2600MB/s while the MacBook is 7400MB/s. The
             | Samsung Pro is the only SSD in that territory.
             | 
             | Buy a cheap motherboard and you'll pay the price later. I
             | didn't spec a $800 motherboard (though those are amazing).
             | I went quite middle-of-the-road for an x570.
             | 
             | A keyboard with a fingerprint reader will set you back at
             | least $50 with a Surface keyboard costing $100. A
             | comparable trackpad would be over $100, but even a midrange
             | $50 mouse.
             | 
             | A non-garbage case will be around $100 plus or minus a
             | little.
             | 
             | A decent air CPU cooler that will keep your CPU from
             | throttling way back is going to run close to $80-120. I
             | also didn't bother to price out all the little things.
             | 
             | I forgot to add a PSU by the way. A name-brand, modular PSU
             | with midrange internals that is just big enough (around
             | 500-600W) is another $100-120.
        
               | AnthonyMouse wrote:
               | > That RAM is DDR4 while the MBP uses DDR5 (low power).
               | MSI has stated that DDR5 will cost at least 60% more.
               | Known prices are actually 3x higher.
               | 
               | Naturally it's DDR4. The 5950X only supports DDR4. If it
               | had DDR5 it would be twice as fast on all the things the
               | M1 Max is doing well on as a result of having more memory
               | bandwidth.
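               | 
               | Back-of-the-envelope on the bandwidth gap (my numbers,
               | assuming dual-channel DDR4-3200 against a hypothetical
               | dual-channel DDR5-6400 board; peak bandwidth is simply
               | transfer rate x 8 bytes per DIMM x channels):
               | 
               |   # theoretical peak = MT/s * 8 bytes * channels
               |   ddr4 = 3200e6 * 8 * 2 / 1e9   # ~51.2 GB/s
               |   ddr5 = 6400e6 * 8 * 2 / 1e9   # ~102.4 GB/s
               |   # both well short of the M1 Max's ~400 GB/s LPDDR5 bus
               |   print(f"{ddr4:.1f} GB/s vs {ddr5:.1f} GB/s")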
               | 
               | > That SSD is 2600MB/s while the MacBook is 7400MB/s.
               | The Samsung Pro is the only SSD in that territory.
               | 
               | This is kind of fair, but then that's the other problem.
               | For most workloads a 2600MB/s read speed is already
               | going to move the bottleneck somewhere else, especially
               | on a machine with 64GB of memory to use as cache. If
               | you're the rare exception who actually benefits from it,
               | the Samsung one is available, but everybody else gets to
               | save $170 by not coupling the fast CPU they actually
               | want to an expensive SSD that wasn't their bottleneck
               | and has a poor cost-benefit ratio.
               | 
               | > Buy a cheap motherboard and you'll pay the price later.
               | 
               | How do you mean? At worst it won't have some ports you
               | eventually want, and then you buy the add-in card later
               | and spend half as much on it because by then the price
               | is lower.
               | 
               | $155 is a fairly high price for a motherboard. $300 is a
               | steep one. The most common ones are like $70.
               | 
               | > A keyboard with a fingerprint reader will set you back
               | at least $50 with a Surface keyboard costing $100. A
               | comparable trackpad would be over $100, but even a
               | midrange $50 mouse.
               | 
               | The Logitech keyboard and mouse are perfectly
               | serviceable and on par with anything you get when you
               | buy a complete PC from the store. I would take them over
               | the chiclet thing that Apple makes.
               | 
               | You can have a $100 keyboard and a trackpad to use with
               | your desktop, but now you have an advantage over the
               | MacBook, because you get to buy it once and use it
               | forever instead of it being permanently attached to a
               | machine that will be obsolete before the keyboard is. So
               | you get to amortize the cost over several hardware
               | generations.
               | 
               | The same goes for the monitor for that matter.
               | 
               | > A non-garbage case will be around $100 plus or minus a
               | little.
               | 
               | How is the included case a garbage case? What am I not
               | getting from it that I actually care about?
               | 
               | > A decent air CPU cooler that will keep your CPU from
               | throttling way back is going to run close to $80-120.
               | 
               | That was a decent air CPU cooler. It has copper heat
               | pipes and a 92mm fan. The crappy ones are like $13:
               | 
               | https://www.newegg.com/cooler-master-air-cooler-series-
               | rh-a3...
               | 
               | A really great one is $39:
               | 
               | https://www.amazon.com/dp/B005O65JXI
               | 
               | I went to see what you would get for $120 and for that
               | price some of the coolers included a CPU.
        
             | kitsunesoba wrote:
             | There are definitely cheaper options for the various
             | components, but I would personally choose a nicer
             | motherboard if I were doing a 5950X build. One of the new
             | ASUS boards with no chipset fan and 2x Thunderbolt 4
             | ports, like the ProArt X570-Creator WiFi, is a likely
             | choice.
        
               | AnthonyMouse wrote:
               | I mean this is the other advantage of the PC. If you want
               | to pay more and get the Thunderbolt ports, you can. If
               | you don't need them for anything, you don't have to pay
               | for them.
               | 
               | The specs we're looking at here are pretty general. Most
               | workloads are either going to be CPU bound _or_ GPU
               | bound, not both. Do you need the 5950X? Then you're CPU
               | bound and can save $500 with another GPU. Do you need the
               | RTX 3060? Then you're GPU bound and can save $500 with
               | another CPU.
               | 
               | If you need the fast GPU in the Mac, it comes as one
               | piece in a machine that starts at $3500.
        
             | friedman23 wrote:
             | You went and found the cheapest versions of components you
             | could find to make a false comparison. This has always
             | been the difference between PC laptops and Mac laptops.
             | People say "I can get this CPU with this graphics card in
             | a PC laptop for WAAYY cheaper" ignoring the memory,
             | quality of the chassis, quality of the screen, quality of
             | the speakers, the battery, and just about everything else.
        
               | AnthonyMouse wrote:
               | No, I didn't. You can get AM4 boards for ~$50. Here's
               | 64GB of memory for $226:
               | 
               | https://www.newegg.com/g-skill-64gb-288-pin-
               | ddr4-sdram/p/N82...
               | 
               | Here's a 2TB SSD for $143:
               | 
               | https://www.newegg.com/leven-2tb/p/1B0-016A-00002?Item=9S
               | IAW...
               | 
               | But then you don't have PCIe 4.0 and it's 4x16GB instead
               | of 2x32GB and it's SATA instead of NVMe.
               | 
               | Here's how you spend >$700 on 64GB of memory:
               | 
               | https://www.newegg.com/ballistix-64gb-288-pin-
               | ddr4-sdram/p/N...
               | 
               | $894!
               | 
               | But wait, here's the same thing for $310:
               | 
               | https://www.newegg.com/ballistix-64gb-288-pin-
               | ddr4-sdram/p/N...
               | 
               | It's a different color. Apparently that's what you get
               | for the extra $584.
               | 
               | There's buying cheaper stuff, and then there's just
               | paying a different amount of money for the same thing.
        
               | lowbloodsugar wrote:
               | The NVMe drive you found is 2400MB/s. The MacBook Pro
               | peaks at 7400MB/s. So your pick is 1/3 the speed.
               | 
               | The $143 SSD you found is 540MB/s. So < 1/10th the
               | speed.
        
               | AnthonyMouse wrote:
               | The NVMe drive I found is 2600MB/s. It also isn't the
               | bottleneck in general.
               | 
               | Both the Samsung and the Apple SSDs are just a bad fit
               | for a machine like this. For a non-I/O bound workload
               | it's paying money for nothing. For a read-bound workload,
               | the machine has 64GB of RAM, so you'd need a working set
               | larger than that to have to care, and that's pretty rare.
               | For a write-bound workload, at those speeds, you're going
               | to melt a consumer-grade SSD and need an enterprise one.
               | So you don't _want_ the faster one in this machine;
               | either it's not worth the price or it won't survive that
               | amount of write load.
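               | 
               | If you want to see the read-bound point for yourself,
               | here's a minimal Python sketch (assuming some multi-GB
               | file, called testfile.bin here, already sitting on the
               | drive): the first pass has to hit the SSD, the second
               | is usually served out of the page cache at RAM speed,
               | so the drive's rated MB/s stops mattering once the
               | working set is warm.
               | 
               |   import time
               | 
               |   PATH = "testfile.bin"  # hypothetical multi-GB file
               |   CHUNK = 1 << 20        # read in 1 MiB chunks
               | 
               |   def read_gbps(path):
               |       total = 0
               |       start = time.perf_counter()
               |       with open(path, "rb") as f:
               |           while True:
               |               buf = f.read(CHUNK)
               |               if not buf:
               |                   break
               |               total += len(buf)
               |       elapsed = time.perf_counter() - start
               |       return total / elapsed / 1e9
               | 
               |   # drop the OS page cache first for a truly cold number
               |   print("cold read: %.2f GB/s" % read_gbps(PATH))
               |   print("warm read: %.2f GB/s" % read_gbps(PATH))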
        
           | gigatexal wrote:
           | And it's portable, with a high-DPI screen and 120Hz refresh.
        
             | kitsunesoba wrote:
             | > ...with a high-DPI screen and 120Hz refresh
             | 
             | Having shopped for one recently, the desktop monitor
             | market is abysmal right now. To get a monitor remotely
             | comparable to the new MBP internal displays you're looking
             | at spending $3k on one of a tiny handful of FALD displays,
             | which run hot enough to need a fan and have worse DPI, or
             | rearranging your desk setup to accommodate a 48" LG OLED
             | TV, which is subject to burn-in with serious usage.
             | 
             | Apple can't release those lower-priced 27" displays that've
             | been rumored soon enough.
        
               | _ph_ wrote:
               | I hope the next large iMac is at least a little bit
               | above 27"; after all, the entry-level one went from
               | 21.5" to 24". A 30" iMac with mini-LED and 120Hz would
               | be just too sweet.
        
               | kitsunesoba wrote:
               | That would be pretty great, but unless they restore
               | display passthrough capabilities with the M-series 27"+
               | iMacs, I'm really hoping for a standalone display that
               | can be used both with a laptop and the rumored upcoming
               | "Mac Pro Mini"/G4 Cube MKII.
        
           | r00fus wrote:
           | Given how difficult it's been to get GPUs recently due to
           | supply chain crunch, are you sure you can actually get your
           | order fulfilled without a significant wait period?
        
             | AnthonyMouse wrote:
             | You can get them. The supply crunch is why the GPU is ~$750
             | instead of ~$300.
        
         | paxys wrote:
         | Completely depends on what your use case is. For me personally
         | a desktop + mid-range laptop combo is cheaper, more powerful
         | and an all-round better fit than a single $2K+ laptop (which
         | realistically will become $3K+ after adding a few options).
        
         | mrtranscendence wrote:
         | They're definitely pricey, but CPU performance is out of this
         | world. GPU's pretty good too, though not as impressive, I
         | understand. Alas, it'll probably be a few years before I can
         | get one (I use my work machine for almost everything and I
         | _just_ got this one).
        
           | f6v wrote:
           | It's not that the GPU isn't impressive; rather, you can't
           | put it to use in games.
        
             | mrtranscendence wrote:
             | Fair enough! It doesn't really bother me either way, since
             | I don't really make use of the GPU on my laptop. Curious to
             | try out Tensorflow on one of the new machines, though.
        
               | rezahussain wrote:
               | Yes, I just want to see some TensorFlow-specific
               | benchmarks on the M1 Max with 32 GPU cores vs a 3080.
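               | 
               | Something like this would be a start - a minimal matmul
               | throughput check, not a real training benchmark, and it
               | assumes the tensorflow-macos + tensorflow-metal packages
               | on the Mac side (stock TF + CUDA on the 3080):
               | 
               |   import time
               |   import tensorflow as tf
               | 
               |   N, ITERS = 4096, 50
               |   a = tf.random.normal((N, N))
               |   b = tf.random.normal((N, N))
               | 
               |   tf.matmul(a, b)  # warm-up (placement/kernel compile)
               | 
               |   start = time.perf_counter()
               |   for _ in range(ITERS):
               |       c = tf.matmul(a, b)
               |   _ = c.numpy()  # force the work to finish
               |   elapsed = time.perf_counter() - start
               | 
               |   flops = 2 * N ** 3 * ITERS
               |   print(f"~{flops / elapsed / 1e12:.2f} TFLOP/s")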
        
         | f6v wrote:
         | They're also compact, well-built (the keyboards are fixed
         | now), and have good battery life. Last time I was shopping, I
         | found the direct competition (XPS 13 and the like) to be
         | about the same price.
        
           | OldHand2018 wrote:
           | Oh absolutely, they're great. My last MacBook Pro was around
           | $2500 or $3000, if I remember correctly, and it was a very
           | good price for what you got.
           | 
           | Don't think that I'm being overly negative here. These look
           | like outstanding computers that are worth a premium price. My
           | question here is just how much of a premium, and I'm not
           | trying to give an opinion with a question - I really don't
           | know the answer :)
        
         | cududa wrote:
         | Depends what you need them for. On a music video set, my 2010
         | MBP survived a drop from a second-story balcony (three
         | stories in a normal home) onto a marble floor. It got dented
         | as hell, but the only functional damage was to the Ethernet
         | port. I'm excited to get one of these new ones. I imagine
         | it'll last at least 5 years.
        
       ___________________________________________________________________
       (page generated 2021-10-25 23:02 UTC)