[HN Gopher] Apple's M1 Pro, M1 Max SoCs Investigated
___________________________________________________________________
Apple's M1 Pro, M1 Max SoCs Investigated
Author : defaultname
Score : 245 points
Date : 2021-10-25 13:01 UTC (10 hours ago)
(HTM) web link (www.anandtech.com)
(TXT) w3m dump (www.anandtech.com)
| jb1991 wrote:
| > GPU-accelerated machine learning, and more have all been
| developed or first implemented by Apple
|
| Did not realize Apple was first in that area in previous decades.
| tromp wrote:
| > I've had extreme trouble to find workloads that would stress
| the GPU sufficiently to take advantage of the available
| bandwidth.
|
| Would love to see how well the GPU runs a memory-bound Proof-of-
| Work like Cuckatoo Cycle [1].
|
| [1] https://github.com/tromp/cuckoo
| xqcgrek2 wrote:
| Seems the CPU cluster saturates at about 240 GB/s and can't
| utilize the full memory bandwidth. This bodes well for future
| clusters with double the number of CPU cores at a node shrink (M2
| Max?) or for a Mac Pro (Mac Quadra?).
| makomk wrote:
| Maybe. This seems like a cluster-wide limitation - the
| individual CPU cores can utilize enough memory bandwidth that
| together they should be able to saturate the bus, but there's
| some kind of bottleneck on the entire CPU section of the SoCs
| and who knows how easy or difficult it would be to alleviate
| that.
| sliken wrote:
| Sure, keep in mind that most competing laptops max out at
| 70GB/sec (a theoretical never-to-exceed number), and most
| desktops are slower than that with 2 channels of DDR4-3200
| to 4200 (41-67GB/sec).
|
| So while it's "only" 240, that's an excellent number. Keep in
| mind that you generally never see 100% of theoretical
| bandwidth.
| whatever1 wrote:
| Performance-wise they seem to be on par with the high-end intel
| 11th gen mobile processors and nvidia 3060.
|
| Power-wise these chips look like they landed from a different
| planet. 50% less power draw for most workloads.
|
| Will we get such efficiencies when intel hits 5nm?
| uluyol wrote:
| I don't think these efficiencies are just from the node
| advantage. The fact is that Apple chips follow mobile designs
| and are highly integrated SoCs where Apple can optimize every
| aspect of the system in exchange for losing flexibility (no
| mixing and matching of components).
|
| In contrast to mobile processors, x86 processors live in a
| world where flexibility is demanded. I need to be able to pick
| how much RAM I want, which WIFI modem, which graphics, and so
| on (where I is a combination of the consumer and laptop
| manufacturer). Sure, laptop processors have gotten more
| integrated lately, but it's not to the same degree. Competition
| from Apple might pressure Intel and AMD to integrate much more
| and sacrifice this flexibility in order to squeeze out better
| power efficiencies.
| cududa wrote:
| As a teenager in 2005/2006, I had an obsession with
| overclocking AMD Opteron 165s. DFI motherboards allowed you
| to set the ratios for FSB/HTT, LDT/FSB, CPU/FSB, etc.
|
| I'd hunt like crazy for specific OCZ DDR2 RAM modules from
| specific batches that had the tolerances I was looking for.
| At a few points, I had the highest perf overclocks (even
| among liquid-cooled rigs - mine was passive with a polished
| heatsink) on various leaderboards, and my 4GB DDR2 system
| frequently could beat out 8GB DDR3 systems (with full
| stability via MemTest86) on GeekBench-like tests.
|
| WRT Apple Silicon I think about those days a lot - just
| thinking about the perf AMD, OCZ, and DFI could have
| repeatedly squeezed out if they all were one company setting
| the same tolerances on all silicon and power delivery.
|
| I have to imagine a large amount of the perf wins come from
| having consistent FSB, HTT, and LDT channels that can have
| the channel relay ratios optimally configured instead of
| buffering up the "lowest common denominator" silicon
| manufacturing tolerances.
| ksec wrote:
| >Will we get such efficiencies when intel hits 5nm?
|
| Judging from everything we know, even in the most optimistic
| scenario, the answer is no.
|
| Note: And Intel doesn't have 5nm, they go to 4nm and then 3nm.
| But the answer is still the same.
|
| Edit: For those wondering how the conclusion was arrived at:
| take a look at the Alder Lake SPECint and Geekbench scores,
| look at the power usage per core (forget MT benchmarks), then
| scale by the target IPC improvement and node improvement. You
| should see that the gap in terms of _efficiency_ is still
| there.
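|
| As a rough sketch of that scaling in Python (every number
| here is a placeholder, not a measurement -- plug in real
| per-core SPECint and watts figures):
|
|     # hypothetical per-core numbers, normalized
|     adl_perf, adl_watts = 1.00, 20.0   # assumed Alder Lake
|     m1_perf, m1_watts = 1.05, 5.0      # assumed M1 Max
|
|     ipc_uplift = 1.10        # assumed future IPC gain
|     node_power_scale = 0.70  # assumed new-node power scaling
|
|     future_perf = adl_perf * ipc_uplift
|     future_watts = adl_watts * node_power_scale
|     print("scaled x86 perf/W:", future_perf / future_watts)
|     print("M1 Max perf/W:    ", m1_perf / m1_watts)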
| gigatexal wrote:
| Yeah I am satisfied with this performance.
|
| I'm not a gamer. And even if the CPU can't take advantage of
| the full 400GB/s, 250GB/s or so is very good indeed.
|
| This is all low hanging fruit for future revisions of this
| chip. The A15 based cores will further improve single core IPC
| and in turn make MT workloads even better. Basically if this is
| the floor then the sky won't be high enough to contain where we
| go next.
| GeekyBear wrote:
| For a laptop chip, single threaded integer performance is on
| par.
|
| Multi-threaded integer and floating point performance is not.
|
| >In the aggregate scores - there's two sides. On the SPECint
| work suite, the M1 Max lies +37% ahead of the best competition,
| it's a very clear win here and given the power levels and TDPs,
| the performance per watt advantages is clear. The M1 Max is
| also able to outperform desktop chips such as the 11900K, or
| AMD's 5800X.
|
| In the SPECfp suite, the M1 Max is in its own category of
| silicon with no comparison in the market. It completely
| demolishes any laptop contender, showcasing 2.2x performance of
| the second-best laptop chip. The M1 Max even manages to
| outperform the 16-core 5950X - a chip whose package power is at
| 142W, with rest of system even quite above that. It's an
| absolutely absurd comparison and a situation we haven't seen
| the likes of.
|
| https://www.anandtech.com/print/17024/apple-m1-max-performan...
|
| We'll see what happens when they make a desktop chip and are no
| longer so constrained on thermals and power draw.
|
| The unreleased Mac Pro chip is said to have the resources of
| either two or four M1 Max chips glued together.
| lrem wrote:
| The laptop ones are already around 20mm on a side. That
| likely isn't very good for yield. Going even larger would
| likely be ruinous in the cost department. Wouldn't putting
| multiple M1 Maxes together be a better idea?
| simonh wrote:
| The Mac pro chip will also likely use the next generation of
| core architecture as well.
| GeekyBear wrote:
| Yes, this year's iPhone chip did get a newer version of the
| performance core.
|
| >with a score of 7.28 in the integer suite, Apple's A15
| P-core is on equal footing with AMD's Zen3-based Ryzen
| 5950X with a score of 7.29, and ahead of M1 with a score of
| 6.66
|
| https://www.anandtech.com/show/16983/the-apple-a15-soc-
| perfo...
|
| You'll have to look in the charts, but on single threaded
| floating point the scores are 10.15 for the A15, 9.58 for
| the 11900K, and 9.79 for the 5950X.
|
| Having your phone chip match or beat Intel and AMD's
| desktop variants on single core performance (with a phone's
| memory bandwidth) is fairly impressive in itself.
| merb wrote:
| Is it? I thought that gluing memory onto the die will
| always yield better memory bandwidth? Also the new
| phone uses DDR5, which is not possible on the desktop
| (yet).
| samgranieri wrote:
| Where did you hear the Mac Pro chip is supposed to have the
| resources of 2 or 4 M1 chips glued together?
| MR4D wrote:
| Leak by Mark Gurman. John Siracusa put together an
| illustrative diagram showing how it works (functionally,
| not actual layout).
|
| https://twitter.com/siracusa/status/1395706013286809600?s=2
| 1
| GeekyBear wrote:
| Bloomberg's Gurman, about half a year ago.
|
| >Codenamed Jade 2C-Die and Jade 4C-Die, a redesigned Mac
| Pro is planned to come in 20 or 40 computing core
| variations, made up of 16 high-performance or 32 high-
| performance cores and four or eight high-efficiency cores.
| The chips would also include either 64 core or 128 core
| options for graphics.
|
| https://www.macrumors.com/2021/05/18/bloomberg-mac-
| pro-32-hi...
| djrogers wrote:
| Gurman at Bloomberg - his article nailed the M1 Pro/Max (in
| May!), and this is what it says about the next stage:
|
| "Codenamed Jade 2C-Die and Jade 4C-Die, a redesigned Mac
| Pro is planned to come in 20 or 40 computing core
| variations, made up of 16 high-performance or 32 high-
| performance cores and four or eight high-efficiency cores.
| The chips would also include either 64 core or 128 core
| options for graphics."
|
| [1]
| https://www.bloomberg.com/news/articles/2021-05-18/apple-
| rea...
| ChuckNorris89 wrote:
| _> In the SPECfp suite, the M1 Max is in its own category of
| silicon with no comparison in the market._
|
| How much of that performance is due to the M1 Pro/Max having
| way more memory bandwidth than the Intel/AMD chips, and also
| being specifically designed from the ground up to make use of
| all that bandwidth? AFAIK the RAM used by the M1 Pro/Max is
| more similar in performance to the GDDR used in graphics
| cards vs the slow-ish ageing DDR4 used in Intel/AMD systems
| that are designed to prioritize compatibility with RAM of
| varying quality, speeds and latencies instead of raw
| performance at a specific high speed.
|
| So I'm curious to know how an X64 chip would perform if we
| even the playing field not just in node size but also if
| Intel and AMD would adapt their X64 designs from the ground
| up with a memory controller, cache architecture and
| instruction pipeline tuned to feed the CPU with data from
| such fast RAM.
|
| I'm asking this since AFAIK, Ryzen is very sensitive to
| memory bandwidth, the more you give it the better it performs
| to the point where if you take two identical laptops with the
| same Ryzen chip but one has 33% faster RAM, then that laptop
| will perform nearly 33% better in most CPU/GPU intensive
| benchmarks, all things being equal.
| namibj wrote:
| Should be fine to compare to an EPYC 72F3 or 73F3 (Zen3,
| 8/16 cores, 8 channel DDR4-3200 RDIMM (204.8 GB/s
| theoretical ceiling), $2500/$3500).
|
| If memory latency is that important, one would AFAIK have
| to compare to Zen3 Threadripper, because EPYC (and afaik
| Threadripper Pro, the other 8-channel variant of Zen3) is
| rather locked down in regards to memory "overclock"s.
|
| (Note that this is more of a latency/efficiency tradeoff,
| as any moderate OC is trivial to cool.)
| dragontamer wrote:
| > 8 channel DDR4-3200 RDIMM (204.8 GB/s theoretical
| ceiling)
|
| But the M1 Max is 8-channel LPDDR5 at 400GB/s, or literally
| twice the bandwidth of that EPYC.
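|
| Both peak numbers fall out of the same back-of-the-envelope
| formula (channels x bytes per transfer x transfer rate); a
| quick sketch, treating the M1 Max's 512-bit interface as
| 8 x 64-bit for comparison:
|
|     def peak_gb_s(channels, bus_bits, mt_s):
|         """Theoretical peak bandwidth in GB/s."""
|         return channels * (bus_bits / 8) * mt_s / 1000
|
|     print(peak_gb_s(8, 64, 3200))  # 204.8, 8ch DDR4-3200
|     print(peak_gb_s(8, 64, 6400))  # 409.6, LPDDR5-6400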
| gigatexal wrote:
| Is anyone else finding it hilarious people are comparing
| these mid-tier consumer chips to server grade EPYCs?
| :rofl:
| ChuckNorris89 wrote:
| Not if you actually understand the technical constraints
| vs the business applications of each product in their
| specific market segments.
|
| A Ford F-150 can tow more weight than a Lamborghini
| despite having less power and being significantly cheaper
| but each is geared towards a different use case so direct
| comparisons are just splitting hairs.
| dragontamer wrote:
| That's also a good point. EPYC has 2TB or 4TB of DDR4 RAM
| support.
|
| That being said: it's amusing to me to see the x86 market
| move into the "server-like" arguments of the 90s. x86
| back then was the "little guy" and all the big server
| folks talked about how much bigger the DEC Alpha was and
| how that changes assumptions.
|
| It seems like "standard" servers / systems have grown to
| outstanding sizes. The big question in my mind is if 64GB
| RAM is large enough?
|
| Moana scene
| (https://www.disneyanimation.com/resources/moana-island-
| scene...) is 131 GBs of RAM for example. It literally
| wouldn't fit on the M1 Max. And that's a 2016-era movie,
| more modern movies will use even more space. The amount
| of RAM modern 3d artists need is ridiculous!!
|
| Instinctively, I feel like 64GB is enough for power-
| users, but not in fact, the digital artists who have
| primarily been the "Pro" level customers of Apple.
| gigatexal wrote:
| I am waiting for the photography pros of YouTube to weigh
| in on that last bit, but the Disney bit about 131GB of RAM
| usage is intense. Surely a speedy disk can page, but likely
| not fast enough to keep 64GB of RAM from being a bottleneck.
| Maybe things like Optane or SSDs will get so much quicker
| that we'll see a further fusion of IO down to disks, and
| one day we'll really have a chip that thinks all its
| storage is RAM and doesn't really distinguish
| between the two. Sure, it's unlikely SSDs will get to
| 400GB/s in speed, but if they could get to 10GB/s or more
| sustained, that latency could probably be handled by smart
| software.
|
| I think for that cohort any future iMac Pro or Mac Pro
| with the future revisions of these chips will surely
| increase the ram to 128GB maybe even 256GB or more.
|
| I am super curious how Apple will tackle those folks who
| want to put 1TB of ram or more into their systems if
| they'll do an SOC with some ram plus extra slotted ram as
| another layer?
| dragontamer wrote:
| > Sure it's unlikely SSDs will get to 400GB/s in speed
| but if they could get to 10GB/s or more sustained that
| latency could be handled by smart software probably.
|
| Yeah, I'm not part of the field, but the 3d guys always
| point to "Moana" to show off how much RAM they're using
| on their workstations. Especially since Disney has given
| away the Moana scene as a free-download, so anyone can
| analyze it.
|
| The 131GB is the animation data (ie: trees swaying in the
| wind). 93GB is needed per frame (roughly). So the 131GB
| can be effectively paged, since it takes several seconds
| (or much much longer) to render a single frame. So really,
| over 220GB of data is needed for the whole scene.
|
| In practice, a computer would generate the wind and
| calculate the effects on the geometry. So the 131 GB
| animation data could very well be procedural and "not
| even stored".
|
| The 93GB "single frame" data however, is where all the
| rays are bouncing (!!!) and likely needs to be all in
| RAM.
|
| That's the thing: that water-wave over there will reflect
| your rays basically anywhere in the scene. Your rays are,
| in practice, bouncing around randomly in that 93GB of
| scene data. Someone managed to make an out-of-core GPU
| raytracer using 8GB of GPU-VRAM (they were using a very
| cheap GPU) to cache where the rays are going, but it
| still required keeping all 93GB of scene data in the CPU-
| RAM.
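|
| Rough paging-time arithmetic, using the 10GB/s sustained SSD
| figure floated upthread (both numbers are from this thread,
| not measurements):
|
|     ssd_gb_s = 10        # assumed sustained SSD throughput
|     animation_gb = 131   # Moana animation data
|     per_frame_gb = 93    # scene data touched per frame
|
|     print(animation_gb / ssd_gb_s)  # ~13 s to stream it all
|     print(per_frame_gb / ssd_gb_s)  # ~9 s if a frame's data
|                                     # had to be re-paged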
| giantrobot wrote:
| > Moana scene
| (https://www.disneyanimation.com/resources/moana-island-
| scene...) is 131 GBs of RAM for example. It literally
| wouldn't fit on the M1 Max. And that's a 2016-era movie,
| more modern movies will use even more space. The amount
| of RAM modern 3d artists need is ridiculous!!
|
| I doubt there was any laptop available in 2016 that could
| be loaded with enough RAM to handle those Moana scenes. I
| doubt such beasts exist in 2021.
|
| It seems the M1s are showing that Apple can just increase
| the number of cores and memory interfaces to beef up the
| performance. While there's obviously practical limits to
| such horizontal scaling, a theoretical M1 Pro Max Plus
| for a Mac Pro could have another doubling of memory
| interfaces (over the Mac) or add in an interconnect to do
| multi-socket configurations.
|
| That's all just horizontal scaling before new cores or a
| smaller node process becomes available. A 3nm process
| could get roughly double the current M1 Max circuitry
| into the same footprint as today's Max.
| dragontamer wrote:
| > That's all just horizontal scaling before new cores or
| a smaller node process becomes available. A 3nm process
| could get roughly double the current M1 Max circuitry
| into the same footprint as today's Max.
|
| I/O / off-chip SERDES doesn't scale very easily.
|
| If you need more pins, you need to go to advanced
| packaging like HBM or whatnot. 512-bit means 512-pins on
| the CPU, that's a lot of pins. Doubling to 16-channel
| (1024-bit bus) means more pins.
|
| You'll run out of pins on your chip without the micro-
| bumps that HBM uses. That's why HBM can be 1024-bit or
| 4096-bit: it uses advanced packaging / microbumps
| to communicate across a substrate.
| dragontamer wrote:
| EPYC almost certainly has more compute power though.
|
| Honestly, the memory bandwidth is imbalanced. It's mostly
| there to support the GPU, but the CPU also gains benefits
| from it. It's hard enough to push an EPYC to use all
| 200GB/s in practice.
|
| EDIT: For workstation tasks however, 64GB is huge for a
| GPU, while 400GB/s is huge for a CPU. Seems like win/win
| for the CPU and GPU. It's a very intriguing combination.
| GPU devs usually have to work with much less VRAM, while
| CPU devs usually have to work with much less bandwidth.
|
| 64GB is small for CPU workstation tasks however. It's
| certainly a strange tradeoff.
| hajile wrote:
| The article in question shows that a mere 8 big cores and
| 2 little ones can use 243GB/s.
|
| I'm guessing Apple will go with HBM3/4 before too long
| due to the lower power consumption and great performance.
| gigatexal wrote:
| I thought HBM was a power sucking tech?
| dragontamer wrote:
| HBM is very low-clock speed and super efficient.
|
| HBM's downside is that it requires many, many, many pins.
| Each channel is 1024-pins of communications (and more
| pins for power). In practice, the only thing that can
| make HBM work are substrates. (Typical chips have 4x to
| 6x HBM stacks, for well over 4096 pins to communicate,
| plus more pins for power / other purposes)
|
| But HBM is among the lowest power technologies available.
| Turns out that clocking every pin at like 500MHz (while
| LPDDR5 is probably a 3200 MHz clock) saves a lot on
| power. Because DRAM has such high latency, the channel
| speed is there more for parallelism than anything else.
| (DDR4 parallelizes RAM into 4-bank groups, each with
| 4-banks. All 16 can be accessed in parallel across the
| channel).
|
| HBM just does this parallel access thing at a lower clock
| rate, to save on power. But spends way more pins to do
| so.
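|
| The tradeoff in rough numbers (illustrative figures only,
| matching the clocks mentioned above):
|
|     def peak_gb_s(bus_bits, mt_s):
|         return (bus_bits / 8) * mt_s / 1000
|
|     # one HBM stack: 1024 data pins at ~1000 MT/s (500MHz DDR)
|     print(peak_gb_s(1024, 1000))  # 128 GB/s per stack
|     # LPDDR5-6400: 512-bit total bus at 6400 MT/s (3200MHz DDR)
|     print(peak_gb_s(512, 6400))   # ~410 GB/s, half the pins
|                                   # at 6.4x the transfer rate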
| AnthonyMouse wrote:
| > So I'm curious to know how an X64 chip would perform if
| we even the playing field not just in node size but also if
| Intel and AMD would adapt their X64 designs from the ground
| up with a memory controller, cache architecture and
| instruction pipeline tuned for that kind of fast RAM.
|
| We can get a pretty good idea about this by looking at
| Threadripper, which has more memory channels:
|
| https://www.anandtech.com/bench/product/2842?vs=2666
|
| (This is Zen 2 Ryzen vs. Zen 2 Threadripper because Zen 3
| Threadripper isn't out yet.)
|
| In nearly everything it's about the same, because most
| workloads aren't memory bandwidth limited. But then you get
| into the multi-threaded SPEC tests where the M1 Max is
| doing really well, and Threadripper does really well. And
| this is still with less memory bandwidth than the M1 Max
| because it's using DDR4 instead of LPDDR5.
|
| The lesson here is that memory bandwidth limited workloads
| are limited by memory bandwidth.
| dangus wrote:
| > Will we get such efficiencies when intel hits 5nm?
|
| I think the answer is "maybe," "probably," or "sort of."
|
| But I also wonder if x86 can ever truly outdo ARM on power
| efficiency.
|
| If we want potential evidence, we could look at what AMD is
| able to do on TSMC's manufacturing: better than Intel, but
| still short of Apple.
|
| Then again, AMD is tiny compared to Apple and Intel.
|
| Granted, I think I'm vastly oversimplifying processor
| architecture. I know it's way more complicated than "x86 vs.
| ARM."
| jjoonathan wrote:
| Isn't AMD still a node behind Apple? They both use TSMC, but
| my impression was that Apple was the largest customer
| bankrolling the leading node and therefore got first crack at
| it.
| eightysixfour wrote:
| Yes, AMD is still a full TSMC node behind.
| laydn wrote:
| I am surprised that Apple sources more wafers than AMD from
| TSMC. Are they really the largest customer in terms of
| wafers, or are they getting better deals thanks to their
| enormous cash reserves and financing abilities?
| neogodless wrote:
| AMD sources a double-digit market share of PC/server
| CPUs, plus their GPUs, and APU chips for consoles.
|
| On the flip side, Apple sources iPhones, iPads and other
| chips, plus this new line of Apple Silicon.
| KptMarchewa wrote:
| Even the server market is small compared to mobile.
| IOT_Apprentice wrote:
| How many units is AMD shipping/Quarter vs Apple?
|
| Heck, Sony & Microsoft can't even get PS5s and Xboxes in
| customer hands 1 year after release. I would expect it is
| a combination of Apple's volume of phones shipped/quarter
| and cash reserves.
| jjoonathan wrote:
| Yep, Apple is the biggest customer. Apple is 25% of
| TSMC's revenue, AMD is only 10% -- and that's after the
| recent growth spurt. Just one year ago, AMD was behind
| Apple, Huawei, Qualcomm, Broadcom, and NVidia.
| Someone wrote:
| That's a different metric than "number of wafers". Apple
| likely pays quite a premium for using the newer tech.
| jjoonathan wrote:
| Of course, but TSMC pays its shareholders in money, not
| wafers, so money is the correct metric for influence.
| hajile wrote:
| Even when you compare A12 on 7nm, the numbers don't look
| very good for AMD.
| api wrote:
| My understanding is that X86 can never outdo ARM on power
| because of the difficulty of parallel instruction decoding.
| dahfizz wrote:
| CISC was always a mistake, it just took a company the size
| of Apple to overcome the inertia of the established x86.
| api wrote:
| Sort of... RISC and CISC are really misnomers. The
| problem with X86 is not the number of instructions (ARM
| has a lot too!) but the variable length and difficult to
| decode instruction format. It's fine to have tons of
| instructions if they are trivial to decode and decoding
| can be easily parallelized.
| dahfizz wrote:
| CISC == Complex Instruction Set, not Large Instruction
| Set. As you say, the issue with x86 is how complicated it
| is, not necessarily how large it is.
| tenebrisalietum wrote:
| CISC was not a mistake when RAM was hundreds of dollars
| per _kilobyte_ in the '70s and early '80s. It made sense to
| get as much out of a byte of memory instruction-wise as
| possible.
| dragontamer wrote:
| That doesn't seem too hard to me frankly.
|
| Intel has shown that you can just store the sizes of
| instructions in a new cache (in their new architecture) for
| example.
|
| But even then: it shouldn't be much more than O(n) work /
| O(log(n)) depth to determine the length of an instruction.
| Once lengths are known, you can perform parallel
| instruction decoding rather easily.
|
| Ex: given "Instruction at X, X+1, X+5, and X+10", decode
| the instructions. Well, simple. Just shove a decoder at X,
| X+1, X+5, and X+10. 4 instructions, decoded in parallel.
|
| Even with "Dynamic" length (ex: X, X+4, X+6, and X+7
| afterwards), it's clear how to process these 4 instructions
| in parallel. Really not a problem IMO.
|
| --------
|
| So solving the length thing is a bit harder to see, but
| clearly could be implemented as a parallel-regex (which is
| O(n) work and O(log(n)) depth).
|
| I seriously doubt that decoding is really a problem. I'd
| expect that Apple just has made many small efficiency gains
| across the chip, especially the uncore.
|
| I'm personally looking at the L1 / L2 cache hierarchies
| more so than anything on this Apple chip.
| Veliladon wrote:
| The problem is how do you know if your just decoded
| instruction isn't just the operands for another legal
| instruction?
| dragontamer wrote:
| When you get to instruction X, "size cache" says "size
| 2", and this allows you to process instruction X+2 in
| parallel.
|
| You look at instruction X+2, and "size cache" says "size
| 4", which allows you to look at instruction X+6 in
| parallel. Finally, you look at instruction X+10, and it
| says "size 8", which ends with +18 as where the
| instruction pointer ends.
|
| This was sequentially described at first, but the
| parallel version is called Prefix Sum:
| https://en.wikipedia.org/wiki/Prefix_sum . This allows
| you to take a set of sizes (like say 2, 4, 4, 8) and in
| parallel figure out [2, 6, 10, 18], with 18 being the
| new location of the instruction pointer, and [0, 2, 6,
| 10] being the 4 instructions you process this clock tick.
|
| A parallel adder across say, 32-bytes would be able to
| perform this prefix sum very quickly, probably within a
| clock tick. These sorts of parallel structures (aka:
| butterfly circuits) are extremely common in practice,
| your carry-lookahead adders need them, as well as PDEP /
| PEXT. Intel's single-cycle PDEP/PEXT is way more
| complicated than what I'm proposing here, I kid you not.
| (Seriously, the dude who decided to make single-clock
| cycle PDEP/PEXT or single-clock cycle AESRound would have
| spent more time than the size-cache that Intel is now
| using on instruction decoding)
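|
| A minimal software model of that prefix-sum step (Python,
| sequential -- real hardware would use a parallel scan tree,
| and the sizes here would come from a size cache):
|
|     from itertools import accumulate
|
|     def decode_starts(ip, sizes):
|         """Offsets of each instruction plus the next IP,
|         given per-instruction sizes."""
|         sums = list(accumulate(sizes, initial=ip))
|         return sums[:-1], sums[-1]
|
|     # the example above: sizes 2, 4, 4, 8 starting at X = 0
|     starts, next_ip = decode_starts(0, [2, 4, 4, 8])
|     print(starts)   # [0, 2, 6, 10] -> feed 4 decoders at once
|     print(next_ip)  # 18            -> next fetch address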
| Veliladon wrote:
| The problem is that you don't know the instruction length
| until you've decoded it so where is the size cache
| getting the size?
| dragontamer wrote:
| > The problem is that you don't know the instruction
| length until you've decoded it so where is the size cache
| getting the size?
|
| You are aware that x86 _CURRENTLY_ (ie: without size-
| cache) decodes 4-instructions per clock tick, right?
| That's in parallel, as currently implemented.
|
| Intel just seems to think the size-cache is a potential
| solution for going faster. I've given it some thought and
| it seems like it could very well be worth the 4-bits (or
| so) per byte it'd cost.
|
| ----------
|
| A parallel size-calculator would also be O(n) work and
| O(log(n)) depth, using a parallel regex/finite automaton
| to calculate the sizes for arbitrary lengths upwards.
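|
| The storage cost of the size-cache idea in rough numbers
| (the cache size is illustrative, not Intel's actual design):
|
|     l1i_bytes = 32 * 1024   # assumed 32KB L1 I-cache
|     bits_per_byte = 4       # size-cache bits per byte, as above
|     print(l1i_bytes * bits_per_byte / 8 / 1024)  # 16.0 KB extra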
| Veliladon wrote:
| That's only second stage decoding. In the fetch/pre-
| decode stage there is a specific piece of hardware that
| partially decodes 16 byte chunks of instruction streams
| before inserting instructions into the instruction queue
| for the decode from macro-op to uop. It can only handle 6
| instructions per clock and 16 bytes per clock. If you
| have 7 instructions in a 16 byte block it takes two
| cycles to process that block. If you have only 2
| instructions in that 16 byte block you only get those two
| instructions. It also only looks for instruction length
| and branching to feed the branch predictor, spitting the
| same instructions back out albeit fused and tagged with
| index numbers for insertion into the instruction queue
| ready for the second stage decoders.
|
| This is the length that Intel has to go to in order to
| keep the EUs fed. Apple/ARM? Every 4 bytes there's an
| instruction.
| dragontamer wrote:
| I'm talking 1st stage decoding actually. I'm ignoring the
| uOp cache.
|
| The 2nd stage uop can go 6-per or 7-per clock tick. But
| 1st stage (which executes in practice when the uop cache
| is thrashing) would still go 4-instructions per clock
| tick just fine.
|
| > This is the length that Intel has to go to in order to
| keep the EUs fed.
|
| Yeah. And the task described is O(n) total work and
| O(log(n)) depth. So... not a big difference? I'd have my
| doubts that the instruction-length portion of the decoder
| was taking up a significant amount of power.
| rbanffy wrote:
| > That doesn't seem too hard to me frankly.
|
| Consider Intel has a more or less infinite amount of
| money and they don't seem to be able to do that. And they
| have tried (I even had an Atom-based Android phone for a
| while).
|
| If you want an easy way to build a reorder buffer, you'll
| need to push every instruction into a structure that fits,
| IIRC, 15 bytes, which is the longest x86 instruction
| possible (for now - mwahahaha). This alone will make it
| twice as large as a similar arm64 one. Now factor in that
| the dependencies between instructions are defined in bits
| that can pretty much be all over the place in those 15
| bytes and you end up with a nightmare most engineers
| would consider suicide before having to work on it.
| dragontamer wrote:
| Or maybe, the problem isn't as hard as you think it is.
|
| Look, I started programming in GPUs a year or two ago.
| I've begun to "think in parallel", and now I'm beginning
| to see all sorts of efficient patterns all over the
| place.
|
| The actual CPU-architects have known about kogge-stone
| carry-lookahead longer than I have. I'm still a newbie to
| this mindset of parallel computations... but I enjoy
| reading the papers on PDEP / PEXT / other parallel
| structures these CPU designers are doing (and these
| structures have gross implications to how GPU code should
| be structured).
|
| But I've had enough practice with Kogge-stone / Carry-
| lookahead / Prefix-sum / scan pattern (yeah, it's __all__
| the same parallelism), and this pattern has been well
| published since the 1970s. I have to assume that
| engineers know about this stuff.
|
| Instruction length decoding is very clearly a kogge-stone
| pattern / prefix sum / scan problem to me. Now, I'm not a
| chip architect and maybe there's some weird fanout / chip
| level thing going on that my ignorance is keeping me out
| of... but... based on my understanding of parallel
| systems + very, very common patterns well known to that
| community, I'd expect that chip-designers would just
| Kogge-stone their way out of this decoding problem.
|
| -------
|
| Like, I'm coming in from the reverse here. I suddenly
| realized that chip-designers have incredibly active minds
| about the layout and structure of parallel computing
| mechanisms, and have now taken an interest in studying
| some CPU-level parallelism techniques to apply to my GPU
| code.
|
| The CPU-designers are way ahead of us in "parallel
| thinking". I'm a visitor to their subject, they do this
| stuff for breakfast every day. They have to see the
| Kogge-stone solution to the decoding problem. If not,
| they've thought of something better.
| api wrote:
| The words "just" and "cache" should never appear in the
| same sentence.
| dragontamer wrote:
| Why not?
|
| L1 $I cache is already read-only / Harvard architecture.
| There's only two states here: size == unknown, and size
| == known. This is a simple size = 0 (default), and size=X
| (where X is the known size of the instruction) situation.
|
| x86 architecture states that if you write to those
| instructions, you need to flush L1 cache (ex: JIT Java)
| before the state is updated. L1 instruction caches are not
| coherent, so it isn't very hard. Upon the flush, set
| sizes back to 0 and you're done.
| codedokode wrote:
| This complicated decoding makes a pipeline longer. This
| means that in case of branch misprediction there would be
| a large penalty.
| dragontamer wrote:
| > This complicated decoding makes a pipeline longer. This
| means that in case of branch misprediction there would be
| a large penalty.
|
| PDEP / PEXT are single-clock tick instructions and are
| far more complex than what I'm proposing here. As is
| AESRound.
|
| I think you're underestimating the number of gates you
| can put in parallel and execute in a single stage of the
| pipeline. 64-bit PDEP / PEXT are more complicated than
| say... a 64-byte parallel adder in terms of depth. (PDEP
| / PEXT need both a butterfly circuit forward + inverse
| butterfly back + a decoder in parallel. 64-byte prefix
| sum is just one butterfly forward).
| AnthonyMouse wrote:
| How much of the power budget is actually going to
| instruction decoding?
| intricatedetail wrote:
| Now all ARM and Apple need to do is to persuade governments
| to ban x86 as not being energy efficient. Just like they want
| to ban diesel cars etc. Could be a tough battle for Intel.
| [deleted]
| lmilcin wrote:
| > Power Behaviour: No Real TDP, but Wide Range
|
| Actually, TDP stands for "Thermal Design Power" and is not a
| range. It means "I, the designer, designed it so that this is
| the maximum amount of waste heat it can safely produce
| continuously in normal use". It is mainly limited by the
| physical package and the maximum temperature at which the
| internal components can run.
|
| That you can't observe that max power is due to the fact that
| the various applications stress the CPU in different ways, and
| aren't always able to exercise all internal structures to
| their maximum potential at the same time.
|
| > One should probably assume a 90% efficiency figure in the AC-
| to-DC conversion chain from 230V wall to 28V USB-C MagSafe to
| whatever the internal PMIC usage voltage of the device is.
|
| (This was regarding idle power usage)
|
| Highly unlikely. I design AC switching power supplies from first
| principles (and stacks of books). Efficiencies above 90% are
| normal for newer designs, but PSUs are designed to achieve
| these efficiencies above a significant percentage of their
| design power. High efficiency at design power is important
| because it limits worst-case waste heat, which in turn makes it
| possible to build a smaller PSU. But since PSU design is a pile
| of tradeoffs, one of the tradeoffs taken is lower efficiency at
| low power, where it doesn't matter as much.
|
| Typically, the lower the load on the PSU as a portion of its
| design power, the lower the efficiency. If the PSU is designed
| for 140 watts at 90% efficiency, I would expect that at 7 watts
| it is actually much less efficient, probably somewhere between
| 70 and 80 percent.
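|
| To see how much the efficiency assumption moves the numbers
| at idle (the 7 watt wall draw is the idle figure discussed
| above; the efficiencies are guesses):
|
|     wall_watts_idle = 7.0
|     for eff in (0.90, 0.80, 0.70):
|         # inferred DC-side power at each assumed efficiency
|         print(eff, round(wall_watts_idle * eff, 2))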
| lnxg33k1 wrote:
| I would really love to experiment with one of those CPUs, too bad
| Apple just really sucks as a company / ethics
| gjsman-1000 wrote:
| Except for, say, Purism and Framework and a few others, _every_
| company sucks with ethics.
|
| Except even Purism had the idea to make their own products
| based on free, open-source code, charge for them, and then
| give no attribution before apologizing after the blowback.
|
| So even for those companies their Ethics are suspect.
| tcbawo wrote:
| Honest question: for someone whose priority is maintaining an
| ecosystem of general computing devices with freedom to run
| software of my choice (rather than privacy paranoia), what
| companies should I get behind and/or avoid? Between custom
| processors, comprehensive SoC, and secure boot, I'm a little
| afraid getting caged in over the long term with practically
| any offerings out there right now.
| rsanheim wrote:
| https://frame.work is getting good reviews and seems very
| reputable. Of course its just a laptop, so you have to
| figure out the rest of your computing world then.
| mrtranscendence wrote:
| > Except for, say, Purism and Framework and a few others,
| every company sucks with ethics.
|
| Basically. If I wanted to stay ethically pure in all my
| purchases I'd be in trouble, unfortunately. It's not like
| Apple is ethically "worse" than Google or Samsung or other
| major phone manufacturers.
| noizejoy wrote:
| Arguably, approaching 100% ethical purity would also
| involve not being born in the first place.
|
| Related note: I think that leading large numbers of people
| as well as having very large numbers of customers also
| reduces your chances of doing well for all of them.
| masterof0 wrote:
| > Except Purism even had the idea to make their own products
| which were based on free, open-source code, charge for them
|
| And then charging 2000 dollars for a barely functional brick.
| I think the only ethical hardware companies I know would be
| System76, Framework, and Fairphone.
| throwaway4good wrote:
| Why would you say that? Of US big tech they seem to be the
| least evil.
| AlexCoventry wrote:
| I guess their on-phone CSAM scanner has fallen off the news
| cycle.
| marcellus23 wrote:
| Or maybe no one was ever that upset, beyond the small but
| vocal group of HN commenters who will take any opportunity
| to bash Apple anyway.
| aserdf wrote:
| since these new machines come pre-installed with Monterey,
| I guess CSAM scanning is present from first boot?
| artificialLimbs wrote:
| As I understand it, they delayed rolling out CSAM scanning
| on-device due to the backlash.
| rowanG077 wrote:
| A shame that thermal/power limitations aren't investigated. That
| is the most deciding factor for me getting a Pro or Max. And
| something Apple has historically had a lot of trouble with.
| GeekyBear wrote:
| >A shame that thermal/power limitations aren't investigated.
|
| It's covered in the comments, along with when the "crank up the
| fans" mode would be useful.
|
| >Any pure CPU or GPU workload doesn't come close to the thermal
| limits of the machine. And even a moderate mixed workload like
| Premiere Pro didn't benefit from High Power mode.
|
| It has a reason to exist, but that reason is close to rendering
| a video overnight - as in a very hard and very sustained total
| system workload.
|
| https://www.anandtech.com/comments/17024/apple-m1-max-perfor...
| perardi wrote:
| Did you even bother to read the article? They have an entire
| page on power consumption under various workloads.
|
| https://www.anandtech.com/show/17024/apple-m1-max-performanc...
| rowanG077 wrote:
| Can you please quote the part which discusses power and
| thermal limitations backed up with tests because I don't see
| any of it on that page.
|
| As far as I read it these test only report package power and
| wall power used under certain loads. It doesn't say anything
| about any limitations. No long term tests or temperature
| graphs. No information at what temperature throttling kicks
| in. Is CPU temperature truly the only limiting factor or are
| the VRMs also a pain point? I could go on but I think this
| illustrates enough of what I want to see.
| GeekyBear wrote:
| Their stories span multiple pages, but for some reason
| people on mobile frequently seem to miss the other pages.
|
| Here's the whole story on one page.
|
| https://www.anandtech.com/print/17024/apple-m1-max-
| performan...
| rowanG077 wrote:
| Yes I read everything. There is no concrete information
| or tests about thermal or power limit throttling.
| GeekyBear wrote:
| Here you go:
|
| >https://www.anandtech.com/comments/17024/apple-m1-max-
| perfor...
| rowanG077 wrote:
| Someone saying that without any data to back it up doesn't
| exactly inspire confidence. Like I said, nothing concrete.
|
| > Any pure CPU or GPU workload doesn't come close to the
| thermal limits of the machine.
|
| So does that mean it is thermally limited on a CPU + GPU
| workload? What about a CPU + GPU + Media engine workload.
| What about using the NPU? Does SSD load have an impact?
| SSDs nowadays can consume 15+W of power. dozens of
| questions, unanswered by such a short sentence. Please
| just test it out and give us the data.
| GeekyBear wrote:
| >Someone saying without any data to back it up doesn't
| exactly inspire confidence.
|
| That "someone" is the editor in chief for Anandtech.
|
| He says you need a workload that stresses both the CPU
| and GPU as much as possible and let that run overnight
| before the ability to crank the fans higher than normal
| is handy.
| rowanG077 wrote:
| > That "someone" is the editor in chief for Anandtech.
|
| I don't see how that is relevant.
|
| > He says you need a workload that stresses both the CPU
| and GPU as much as possible and let that run overnight
| before the ability to crank the fans higher than normal
| is handy.
|
| Then why is there no data? This is much more interesting
| than some of the tests done in the report.
| ksec wrote:
| What you are looking for is a laptop review: how the M1 Pro
| / Max behaves under the MacBook Pro 14 / 16 cooling with
| the maximum possible TDP.
|
| But this isn't a laptop review, it is an SoC review. My
| guess is Dave2D will look into this sort of thing, since he
| cares about it, before _anyone_ else on the internet
| actually tests these things for laptops. (Possibly due to
| various PR restrictions.)
| neogodless wrote:
| If I'm understanding you correctly, you're thinking of previous
| issues with thermals and throttling. But that has been an issue
| over the past several years because Intel fell behind AMD and
| TSMC and had to drive more power through its chips to stay
| competitive; that generates heat and ultimately ends up
| triggering throttling.
|
| If you read about these particular chips, it should be
| startlingly clear that they are much more efficient than the
| Intel chips they replace.
|
| In this article:
|
| > Apple doesn't advertise any TDP for the chips of the devices
| - it's our understanding that simply doesn't exist, and the
| only limitation to the power draw of the chips and laptops are
| simply thermals. As long as temperature is kept in check, the
| silicon will not throttle or not limit itself in terms of power
| draw.
|
| > The perf/W differences here are 4-6x in favour of the M1 Max,
| all whilst posting significantly better performance
|
| Read page 3 of this article. They really do cover a lot of
| this.
| rowanG077 wrote:
| It is much more efficient. It's also more powerful. You can
| see in one of their benchmarks they hit 90+W of package
| power. I doubt it can sustain this. The "turbo-mode" Apple
| has announced for the 16-inch version also indicates that it
| will be significantly thermally limited.
| neogodless wrote:
| In my Lenovo Legion 5, in performance mode, the CPU is
| configured to draw up to 70W, and the GPU is configured to
| draw up to 115W. It's able to do this just fine for gaming
| sessions. Yes, the fans are quite audible while doing this.
| I think in contrast, having to handle about half that power
| draw overall should be attainable. For sure, it's now a
| very large SoC, so the heat might be a bit more
| concentrated and require some engineering to cool. But it
| doesn't seem like it should be a showstopping concern. Of
| course, you can wait for additional reviews and see if
| anyone addresses longer, more sustained load testing.
| rowanG077 wrote:
| I don't doubt it's physically possible. I doubt Apple
| implemented it. So I rather wait till there is some hard
| data. Apple could also have implemented proper cooling
| for their Intel laptops but they didn't.
| neogodless wrote:
| That's perfectly understandable. It's a major purchase
| decision.
|
| But also, I'm not aware of anyone that implemented
| "proper cooling" sufficient to handle the last few
| generations of Intel chips (at least at the high end.) I
| read reviews of a variety of machines. All of them had
| issues with throttling.
|
| I was so happy when the Ryzen 4000 mobile chips were
| reviewed and did not require elaborate cooling systems
| just to perform their regular duties. I would be shocked
| if the 2021 Macbook Pro 14/16" have issues with thermal
| throttling.
| holmium wrote:
| I haven't seen too many benchmarks yet, but Dave2D ran a
| "Cinebench R23" test in a both a single benchmark and
| thirty minute loop. [1] He saw that the score remained
| the same after the 30 minutes.
|
| He also reported that the loudest fan noise he could get
| was 38dB, with typical loads under 30dB. [2]
|
| -----
|
| [1] - https://youtu.be/IhqCC70ZfDM?t=360
|
| [2] - https://youtu.be/IhqCC70ZfDM?t=438
| capableweb wrote:
| > As long as temperature is kept in check, the silicon will
| not throttle or not limit itself in terms of power draw.
|
| Apple laptops have, for as long as I can remember, had issues
| with thermals. Sometimes they get so hot you can't even have
| them in your lap, so they are just "tops" at that point.
|
| Has this issue been solved with these new models?
| zepto wrote:
| Yes
| friedman23 wrote:
| The M1 Macs do not have this problem; they draw a fraction
| of the power of the old laptops even when boosting.
| MikusR wrote:
| It was solved with M1 last year
| Leherenn wrote:
| If people are wondering about why some people in the comments are
| reporting 3080 vs 3060 levels of performance, it's based on the
| workload. On synthetic (native I assume) benchmarks, the M1 Max
| reaches 3080 levels, but in gaming benchmarks (using x86) it
| reaches 3060 levels.
| Thaxll wrote:
| I don't think you've read the comments:
|
| > However gaming is a poorer experience, as the Macs aren't
| catching up with the top chips in either of our games.
|
| It's far far from a 3060 for gaming.
| mrtranscendence wrote:
| I haven't read _all_ the comments, but one of the first
| comments that shows up says this:
|
| > When it comes to the actual Gaming Performance, the M1X is
| slightly slower than the RTX-3060.
|
| Edit: also, the 3060 isn't a "top chip", particularly on a
| laptop.
| marricks wrote:
| It's interesting especially since it sounds like the reason it
| doesn't reach "close to 3080" in many games is because it's CPU
| bound, specifically because it's emulating x86.
|
| Once we get more benchmarks with non-rosetta apps the picture
| may be rosier? That said, it's not like Apple was ever the
| company for gaming machines so perhaps that will just be the
| state of things.
| lowbloodsugar wrote:
| TFA also compares games at 4k, where it is very much GPU
| bound, and it is about half the speed of a laptop 6800. Which
| is not great. (And I am speaking as someone whose M1 Max
| arrives tomorrow).
|
| The M1 GPU is vastly different than an AMD or nVidia GPU, and
| I suspect it will have not-great scores until someone writes
| a game and optimizes it specifically for the M1. Which is
| most likely never.
| xoa wrote:
| > _and I suspect it will have not-great scores until
| someone writes a game and optimizes it specifically for the
| M1. Which is most likely never._
|
| Don't forget that these days few demanding games are
| "optimized for Platform XYZ", they're generally using one a
| small set of middleware/engine they license. So it's more a
| question of if Unreal Engine and Unity and so on get
| optimizations aimed at Apple's M-series chips. That isn't
| out of the question at all, given that they definitely have
| optimization aimed at Apple's A-series chips. Once they do,
| everything going forward that uses them will be better "for
| free". Even if they don't hit the performance of something
| truly hand tuned just to make max use of the arch it won't
| be entirely ignored either. May not even be that much work.
|
| That's another potential non-technical performance
| advantage in moving the Mac from x86, they get to piggyback
| in some areas off the enormously higher market share
| iDevices. We'll see how it works out of course.
| GeekyBear wrote:
| The money quote from testing vs Intel's 11980HK:
|
| >The perf/W differences here are 4-6x in favour of the M1 Max,
| all whilst posting significantly better performance, meaning the
| perf/W at ISO-perf would be even higher than this.
|
| and
|
| >On the GPU side, the GE76 Raider comes with a GTX 3080 mobile.
| On Aztec High, this uses a total of 200W power for 266fps, while
| the M1 Max beats it at 307fps with just 70W wall active power.
|
| https://www.anandtech.com/print/17024/apple-m1-max-performan...
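|
| Dividing out the quoted Aztec High numbers gives the GPU-side
| perf/W gap (fps per watt, straight from the figures above):
|
|     ge76_fps, ge76_watts = 266, 200   # GE76 Raider, 3080 mobile
|     m1_fps, m1_watts = 307, 70        # M1 Max, wall active power
|
|     print(ge76_fps / ge76_watts)  # ~1.3 fps/W
|     print(m1_fps / m1_watts)      # ~4.4 fps/W, roughly 3.3x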
| AnthonyMouse wrote:
| The sad thing is that what you really want to compare is how
| their GPU is doing against nVidia, but then they pair it with
| Intel's CPU which is known to have very poor power efficiency
| vs. AMD.
| dragontamer wrote:
| 400GB/s is very high for CPU bandwidth, but is less than
| NVidia's RTX 3080's 760GB/s bandwidth. Assuming you care
| about 32-bit of course.
|
| I don't expect the M1 Pro to have very good double-precision
| GPU-speeds.
| defaultname wrote:
| It's pretty remarkable that now we're not only comparing
| Apple's SoC to the best CPUs from dedicated makers, we're
| comparing it to the best GPUs.
|
| Could you qualify what you mean regarding double precision,
| though? nvidia consumer GPUs have pretty terrible double
| precision (usually in the range of 1/64th single
| precision). And FWIW, the normal cores in the M1 (Max|Pro)
| have fantastic double precision performance, and comprise
| the bulk of the SPECfp dominance.
| dragontamer wrote:
| > It's pretty remarkable that now we're not only
| comparing Apple's SoC to the best CPUs from dedicated
| makers, we're comparing it to the best GPUs.
|
| Is it? Apple has 5nm on lockdown right now. Process is
| nearly everything in performance/watt.
|
| If you want to compare architectures, you compare it on
| the same process. 5nm vs 5nm is only fair. 5nm vs 7nm is
| going to be 2x more power efficient from a process level.
|
| When every transistor uses 1/2 the power at the same
| speed, of course you're going to have a performance/watt
| advantage. That's almost... not a surprise at all. It is
| this process advantage that Intel wielded for so long
| over its rivals.
|
| Now that TSMC owns the process advantage, and now that
| Apple is the only one rich enough to get "first dibs" on
| the leading node, it's no surprise to me that Apple
| has the most power-efficient chips. If anything, it shows
| off how efficient the 7nm designs are that they can
| compete against a 5nm design.
| GeekyBear wrote:
| > Process is nearly everything in performance/watt.
|
| Not really. Apple's A15 and A14 phone chips are on the
| same process node.
|
| >Apple A15 performance cores are extremely impressive
| here - usually increases in performance always come with
| some sort of deficit in efficiency, or at least flat
| efficiency. Apple here instead has managed to reduce
| power whilst increasing performance, meaning energy
| efficiency is improved by 17%
|
| The efficiency cores of the A15 have also seen massive
| gains, this time around with Apple mostly investing them
| back into performance, with the new cores showcasing
| +23-28% absolute performance improvements
|
| https://www.anandtech.com/print/16983/the-apple-a15-soc-
| perf...
| dragontamer wrote:
| > Not really. Apple's A15 and A14 phone chips are on the
| same process node.
|
| Yeah, you're talking about 20% performance changes on the
| same node.
|
| Meanwhile, advancing a process from 7nm to 5nm TSMC is
| something like 45% better density (aka: 45% more
| transistors per mm^2) and 50% to 100% better power-
| efficiency at the same performance levels, and closer to
| the 100%-side of power-efficiency if you're focusing on
| idle / near-zero-GHz side of performance. (Pushing to
| 3GHz is less power of a power difference, but lower idles
| do have a sizable contribution in practice)
|
| -----
|
| Oh right: and TSMC N5P is 10% less power and 5% speed
| improvement over TSMC N5 (aka: what TSMC figured out in a
| year). There's the bulk of your 17% difference from A15
| and A14.
|
| Yeah, process matters. A LOT.
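|
| Rough arithmetic for that N5 -> N5P step, combining TSMC's
| stated ~10% lower power and ~5% higher clocks:
|
|     power_scale = 0.90   # power at the same clock
|     clock_scale = 1.05   # clock at the same power
|
|     # combined perf-per-watt factor if you bank both gains
|     print(clock_scale / power_scale)   # ~1.17, i.e. most of
|                                        # the 17% A14 -> A15 gap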
| musicale wrote:
| Are you saying that if another company, say AMD, had
| access to TSMC's 5nm process than it would easily achieve
| comparable performance/watt to what Apple has done with
| the M1 series?
| dragontamer wrote:
| I'm saying that 15.5% of the 17% difference from Apple
| A14 to Apple A15 is accounted for in the TSMC N5 to TSMC
| N5p upgrade (Aka: 10% fewer watts at 5% higher clock
| rates).
|
| The bulk of efficiency gains has been, and for the
| foreseeable future will be, the efficiency of the
| underlying manufacturing process itself.
|
| There's still a difference in efficiency above-and-beyond
| the 15.5% from the A14 to the A15. But its a small
| fraction of what the __process__ has given.
|
| ---------
|
| Traditionally, AMD never was known for very efficient
| designs. AMD is more well known for "libraries", and more
| a plug-and-play style of chip-making. AMD often can
| switch to different nodes faster and play around with
| modular parts (see the Zen "chiplets"). I'd expect AMD to
| come around with some kind of chiplet strategy (or
| something along those lines) before I expect them in
| particular to take the efficiency crown.
|
| NVidia probably would be better at getting high-
| efficiency designs. They're on a weaker 8nm Samsung
| process yet still have extremely good power/efficiency
| curves.
|
| I like AMD's chiplet strategy though as a business, and
| as a customer. It's a bit of a softer benefit, and AMD
| clearly has made the "Infinity Fabric" more efficient
| than anyone expected it could get.
| rbanffy wrote:
| > Process is nearly everything in performance/watt.
|
| ARM has consistently beaten x86 in performance/watt at
| larger node sizes since the beginning. The first
| Archimedes had better floating point performance without
| a dedicated FPU than the then market-leading Compaq 386
| WITH an 80387 FPU.
|
| A lot of the extra performance of the M1 family has
| nothing to do with node, but with the fact the ARM ISA is
| much more amenable to a lot of optimizations that allow
| these chips to have surreally large reordering buffer,
| which, in turn, keep more of the execution ports busy at
| any given time, resulting in a very high ICP. Less
| silicon used to deal with a complicated ISA also leaves
| more space for caches, which are easier to manage
| (remember the more regular instructions), putting less
| stress on the main memory bus (which is insanely wide
| here, BTW). On top of that, the M1 family has some
| instructions that help make JavaScript code faster.
|
| So, assume that Intel and AMD, when they get 5nm designs,
| will have to use more threads and cores to extract the
| same level of parallelism that the M1 does with an arm
| (no pun intended) tied behind its back.
| dragontamer wrote:
| > optimizations that allow these chips to have surreally
| large reordering buffer
|
| But only Apple's chip has a large reordering buffer. ARM
| Neoverse V1 / N1 / N2 don't have it, no one else is doing
| it.
|
| Apple made a bet and went very wide. I'm not 100% sure if
| that bet is worth the tradeoffs. I'm certain that if
| other companies thought that a larger reordering buffer
| was useful, they'd have done it.
|
| I'll give credit to Apple for deciding that width still
| had places to grow. But its a very weird design. Despite
| all that width, Apple CPUs don't have SMT, so I'd expect
| that a lot of the performance is "wasted" with idle
| pipelines, and that SMT would really help out the design.
|
| Like, who makes an 8-wide chip that supports only 1
| thread? Apple but... no one else. IBM's 8-wide decode is
| on a SMT4 chip (4-threads per core).
| rbanffy wrote:
| SMT is a good way to extract parallelism when your ISA
| makes it more difficult to do (with speculative
| execution/register renaming). ARM, it seems, makes it
| easier to the point I don't think any ARM CPU has been
| using multiple threads per core.
|
| I would expect POWER to be more amenable to it, but x86
| borrows heavily from the 8085 ISA and was designed at a
| time the best IPC you could hope to get was 1.
| defaultname wrote:
| > Apple has 5nm on lockdown right now
|
| Qualcomm has loads of 5nm chips. They're pretty solidly
| beaten by Apple's entrants, but they've been using them
| for over a year now. Huawei, Marvell, Samsung and others
| have 5nm products too.
|
| This notion that Apple just bullied everyone out of 5nm
| is not backed by fact. For that matter, Apple's
| efficiency holds even at the same node.
|
| There is this weird thing where some demand that we put
| an asterisk on everything Apple does. I remember the
| whole "sure it's faster but that's just because of a big
| cache" (as if that negated the whole faster / more
| efficient thing, or as if competing makers were somehow
| forbidden from using larger caches so it was all so
| unfair). Now it's all waved away as just a node
| advantage, when any analysis at all reveals that to be
| nonsensical.
| dragontamer wrote:
| > Qualcomm has loads of 5nm chips.
|
| I think we all know that TSMC 5nm is quite a bit better
| than Samsung 5nm.
|
| Samsung is "budget" 5nm. It ain't as good as the best-of-
| the-best that Apple is buying here.
| laserlight wrote:
| > Process is nearly everything in performance/watt.
|
| > TSMC 5nm is quite a bit better than Samsung 5nm.
|
| These two statements conflict.
| dragontamer wrote:
| > These two statements conflict.
|
| TSMC 5nm is not the same process as Samsung 5nm though?
|
| All the processes are the company's secret sauce. They
| aren't sharing the details. Ultimately, Samsung comes out
| and says "5nm technology", but that doesn't mean its
| necessarily competitive with TSMC 5nm.
|
| Indeed, Intel 10nm is somewhat competitive against TSMC
| 7nm. The specific "nm" is largely a marketing thing at
| this point... and Intel is going through a rebranding
| effort. (Don't get me wrong: Intel is still far behind
| because it tripped up in 2016. But the Intel 14nm process
| was the best in the world in that timeframe.)
| codedokode wrote:
| You can compare transistor count per mm^2 instead of
| nanometers.
| dragontamer wrote:
| But you compare power-efficiency by how efficient each
| transistor is.
|
| TSMC N5p is 10% more efficient and 5% higher clocks than
| TSMC N5. The same 5nm __BY THE SAME COMPANY__ can change
| 15.5% in just a year, as manufacturing issues are figured
| out.
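|
| (Presumably that 15.5% is just the two gains compounded; a
| quick sanity check:)
|
|       # 10% better power with 5% higher clocks, compounded.
|       combined = (1 + 0.10) * (1 + 0.05) - 1
|       print(f"{combined:.1%}")   # -> 15.5%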
|
| Making every transistor draw 10% less power and clock 5%
| higher across the entire chip, while keeping the same size,
| is a huge bonus that cannot be ignored. I don't know what
| magic these chip engineers are doing, but they're surely
| spending supercomputer time brute-forcing all sorts of
| shapes/sizes of transistors to find the best
| density/power/speed tradeoffs per transistor.
|
| This is part of the reason why Intel stuck with 14nm for
| so long. 14nm+++++++ kept increasing clock speeds,
| yields, and power efficiency (but not density), so it
| never really was "worth it" for Intel to switch to 10nm
| (which Intel had some customer silicon taped out for
| years, but only at low clock speeds IIRC).
|
| It isn't until recently that Intel seems to have figured
| out the clock speed issue and has begun offering
| mainstream chips at 10nm.
| ac29 wrote:
| > This notion that Apple just bullied everyone out of 5nm
| is not backed by fact.
|
| In the context of laptops, it's true. Neither Intel nor AMD
| has chips being built on TSMC N5 or a comparable process.
| AMD is on TSMC N7, and Intel is currently on their own 10
| nm process, moving to "Intel 7" with Alder Lake which is
| getting formally introduced in 2 days.
| defaultname wrote:
| "In the context of laptops, its true"
|
| Intel wasn't in competition for TSMC's processes at all,
| and AMD was in absolutely no hurry to move to 5nm (especially
| given that they were targeting cost effectiveness). The
| fact that Apple readied a 5nm design, and decided that it
| was worth it for their customers, in no way indicates
| that they "bullied" to the front.
|
| Quite contrary, for years Intel made their mobile/ "low
| power" parts on some of their older processes. It was a
| low profit part for them and they saved the best for
| their high end Xeons and so on (where the process benefit
| was entirely spent on speed -- note that there is a lot
| of BS about the benefit of process nodes where people
| claim ridiculous benefits when in reality you can have a
| small efficiency improvement, or a small performance
| improvement, but not both. The biggest _real_ benefit is
| that you can pack more onto a given silicon area; in
| Apple's case loads of cores, a fat GPU, big caches, etc.) If
| Apple upset their business model, well tough beans for
| them.
|
| As an aside, note that the other initial customer of 5nm
| was HiSilicon (a subsidiary of Huawei) with the Kirin
| 9000. It would be a pretty sad state of affairs if AMD and
| Intel really were sad also-rans to Huawei. Or, more
| realistically, they simply weren't even in competition for that
| space, had zero 5nm designs ready, and didn't prioritize
| the process.
| rbanffy wrote:
| Well... Intel not having 5nm is entirely Intel's fault.
| They used process to their advantage and, well, when they
| messed up their process cadence, the advantage
| evaporated.
|
| AMD could, but they seem to be very happy where they are.
| They also have to decide on which fronts they want to
| outcompete Intel and, it seems, process isn't one of
| them.
| saberience wrote:
| > I don't expect the M1 Pro to have very good double-
| precision GPU-speeds.
|
| Compared to what? There are no laptops quite like these new
| Apple laptops. Anything with faster graphics also uses
| LOADS more power and runs WAY hotter.
| dragontamer wrote:
| > Compared to what? There are no laptops quite like these
| new Apple laptops. Anything with faster graphics also
| uses LOADS more power and runs WAY hotter.
|
| Using 2x the power for 2x the bandwidth (on top of
| significantly more compute power) is a good tradeoff,
| when the NVidia chip is 8nm Samsung vs Apple's 5nm TSMC.
|
| In any case, the actual video game performance is much
| much worse on the M1 Pro. The benchmarks show that the
| chip has potential, but games need to come to the system
| first before Apple can decisively claim victory.
| GeekyBear wrote:
| > the actual video game performance is much much worse on
| the M1 Pro
|
| Well, no. The emulated x86 gaming performance is.
|
| They didn't test a game with a native version.
| dragontamer wrote:
| > They didn't test a game with a native version.
|
| If the native version doesn't exist then... gamers don't
| care?
|
| Gotta get those games ported over
| rbanffy wrote:
| > If the native version doesn't exist then... gamers
| don't care?
|
| I don't think it's a fair assessment of the machine
| capabilities. Also, games WILL be ported to the platform
| AND if you really need your games running at full speed,
| you can keep the current computer and postpone the
| purchase of your Mac until the games you need are
| available.
| dragontamer wrote:
| No.
|
| Next-generation games will be made on the platform.
| Current-generation and last-generation games no longer
| have much support / developers, and no sane company will
| spend precious developer time porting over a year-old or
| 5-year-old game to a new platform in the hopes of a slim
| set of sales. (Except maybe Skyrim. Apparently those
| ports keep making money)
|
| Your typical game studio doesn't work on Skyrim though.
| They put in a bunch of developer work into a game, then
| by the time the game is released, all the developers are
| on a new project.
| GeekyBear wrote:
| Have you seen how terrible the x86 emulated performance
| is on a Surface Pro X?
|
| https://www.youtube.com/watch?v=OhESSZIXvCA
| dragontamer wrote:
| And that's why gamers are buying the Surface Book
| instead?
|
| The "gamer" community (or really, community-of-
| communities) only cares if their particular game runs
| quickly on a particular platform.
|
| Gamers don't really care about the advanced technology
| details, aside from the underlying "which system will run
| my game faster, with higher-quality images" (4k /
| raytracing / etc. etc.)?
| GeekyBear wrote:
| No, that's why having x86 emulation performance be this
| good is a minor miracle.
|
| Native performance would be expected to be in line with
| what the benchmarks are showing.
|
| The M1 Max MacBook Pro would beat the 100-watt mobile
| variant of the 3080, especially if you unplug both
| laptops from the wall, where the 3080 has to throttle down
| and the MacBook does not.
| dragontamer wrote:
| > No, that's why having x86 emulation performance be this
| good is a minor miracle.
|
| No gamer is going to pay $3000+ for a laptop with
| emulation when $2000+ gamer laptops are faster at the
| task (aka: video games are faster on the $2000 laptop).
|
| ------
|
| Look, gamers don't care about all games. They only care
| about that one or two games that they play. If you want
| to attract Call of Duty players, you need to port Call-
| of-Duty over to the Mac, native, so that the game
| actually runs faster on the system.
|
| It doesn't need to be an all-or-nothing deal. Emulation
| is probably good enough for casuals / non-gamers who
| maybe put in 20 hours or less into any particular game.
| But anyone putting 100-hours or more into a game will
| probably want the better experience.
| GeekyBear wrote:
| > No gamer is going to pay $3000+ for a laptop with
| emulation
|
| They pay $3000 for a laptop whose fans hit 55 decibels
| under load and that throttles down to well below the
| MacBook's performance if you use it like a laptop and go
| somewhere without a power outlet.
|
| https://www.anandtech.com/show/16928/the-msi-ge76-raider-
| rev...
| dragontamer wrote:
| The Mac doesn't even do raytracing, does it? So you're
| already looking at a sizable quality downgrade over AMD,
| NVidia, PS5, and XBox Series X.
|
| I think the eSports gamers will prefer FPS over graphical
| fidelity, so maybe that's the target audience for this
| chip ironically.
|
| But adventure gamers who want to explore raytraced worlds
| / prettier games will prefer the cards with raytracing,
| better shadows, etc. etc. (See the Minecraft RTX demo for
| instance: https://www.youtube.com/watch?v=1bb7wKIHpgY)
| pjmlp wrote:
| It does,
|
| https://developer.apple.com/videos/play/wwdc2021/10149/
|
| https://developer.apple.com/videos/play/wwdc2021/10150/
| dragontamer wrote:
| Look, my Vega64 raytraces all the time when I hit the
| "Render" button on Blender.
|
| But video-game raytracing is about hardware-dedicated
| raytracing units. Software (even GPU-software rendering)
| is an order of magnitude slower. It's still useful to
| implement, but what the PS5 / Xbox Series X / AMD / NVidia
| have implemented are specific raytracing cores (or in
| AMD's case: raytracing instructions) to traverse a BVH-
| tree and accelerate the raytracing process.
|
| "Can do Raytracing" or "Has an API for GPU-software that
| does raytracing" is just not the same as "we built a
| raytracing core into this new GPU". I'm sure Apple is
| working on its own raytracing cores, but I haven't seen
| anything that suggests they're ready yet.
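|
| (For the curious, here is a minimal sketch of the traversal
| loop those dedicated units accelerate; the node layout is
| made up purely for illustration and is not Apple's, AMD's,
| or anyone's actual implementation.)
|
|       # Toy BVH traversal in Python: walk the tree, keep the
|       # triangles whose bounding boxes the ray hits. This is
|       # the inner loop RT cores implement in hardware.
|       from collections import namedtuple
|
|       Node = namedtuple("Node",
|                         "bbox_min bbox_max left right tris")
|
|       def hits_box(origin, inv_dir, bmin, bmax):
|           # Standard slab test, ray vs axis-aligned box.
|           tmin, tmax = 0.0, float("inf")
|           for o, inv, lo, hi in zip(origin, inv_dir,
|                                     bmin, bmax):
|               t1, t2 = (lo - o) * inv, (hi - o) * inv
|               tmin = max(tmin, min(t1, t2))
|               tmax = min(tmax, max(t1, t2))
|           return tmin <= tmax
|
|       def traverse(root, origin, direction):
|           # Assumes no zero components in `direction`.
|           inv_dir = tuple(1.0 / d for d in direction)
|           stack, candidates = [root], []
|           while stack:
|               node = stack.pop()
|               if not hits_box(origin, inv_dir,
|                               node.bbox_min, node.bbox_max):
|                   continue
|               if node.tris is not None:      # leaf node
|                   candidates.extend(node.tris)
|               else:                          # interior node
|                   stack.extend((node.left, node.right))
|           return candidates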
| rbanffy wrote:
| > the actual video game performance is much much worse on
| the M1 Pro
|
| This is a workstation. For games one should look for a
| Playstation ;-)
|
| 2x power also means half the battery life. Remember this
| is a portable computer that's thin and light beyond what
| would be reasonable considering its performance. Also,
| remember the GPU has full 400GBps access to all of the
| RAM, which means models of up to 64GB won't need to pass
| over the PCIe bus.
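|
| (A back-of-the-envelope illustration of what skipping the
| PCIe hop buys you; the bandwidth figure is approximate and
| the working-set size is a made-up example.)
|
|       # Time to stage a large working set into a discrete
|       # GPU over PCIe 4.0 x16 (~32 GB/s theoretical peak)
|       # vs. a unified-memory GPU that addresses it in place.
|       working_set_gb = 48
|       pcie4_x16_gbps = 32
|       secs = working_set_gb / pcie4_x16_gbps
|       print(f"PCIe copy: ~{secs:.1f} s before work starts")
|       print("Unified memory: no staging copy needed")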
| kcb wrote:
| GFXBench is meaningless.
| dsr_ wrote:
| A pointer to an article on why it is meaningless is better
| than the raw assertion.
| ac29 wrote:
| This article shows why - actual performance in games isn't
| great. Yes, it's partially held back by x86->ARM
| translation, but if you are a gamer, this isn't a
| particularly compelling system.
| Aissen wrote:
| This part of the article was particularly cringeworthy.
| People don't seem to realize what a technical feat it is
| that x86 builds of those two games are even _running at
| an acceptable framerate_.
|
| That said, it's been a year since the M1 release, and
| Apple could have paid a few hundred thousand or a few
| million dollars for a few AAA game ports. They didn't, and
| that says how much they care about gaming on these devices.
| GeekyBear wrote:
| >This part of the article was particularly cringeworthy.
| People don't seem to realize what a technical feat it is
| that x86 builds of those two games are even running at an
| acceptable framerate.
|
| Spoken like someone who has never seen how poorly
| Microsoft's x86 to ARM emulation works.
|
| https://www.youtube.com/watch?v=OhESSZIXvCA
|
| They had difficulty getting programs to work at all, much
| less have acceptable performance under emulation.
| rbanffy wrote:
| > Spoken like someone who has never seen how poorly
| Microsoft's x86 to ARM emulation works.
|
| Sometimes I wonder why they are still a software company.
| I really like their keyboards and mice. Their software,
| not so much.
| musicale wrote:
| > that says how much they care about gaming on these
| devices
|
| Apple's bread and butter is iOS gaming (where it takes in
| more game profit than Sony, Microsoft, Nintendo and
| Activision combined) rather than Windows PC game ports to
| macOS.
| throwaway946513 wrote:
| The reasons for it not being a compelling system aren't
| obvious to most people:
|
| Games are currently developed to run on x86 Windows
| machines -- predominantly Windows machines. If a game is
| designed to run on macOS, you're more likely to see a
| performant experience. The issue isn't inherently the
| architecture or Apple's chips, it's the lack of software
| choices available on the platform. Though you can now
| argue that Apple's catalog has grown significantly
| thanks to App Store compatibility bringing games from
| mobile to the desktop.
|
| Gamers would like to play the games they've bought (or
| free-to-play titles they've dedicated time to) on the
| platforms those games support, and most of those games do
| not support macOS or Linux. However, with Proton, and with
| Easy Anti-Cheat and BattlEye now working with Valve to
| improve anti-cheat support on Linux, we may see greater
| compatibility with the aforementioned systems, enabling
| cross-platform play.
| musicale wrote:
| > free to play
|
| If only you could run iOS games on the M1 Mac, you'd be
| all set...
| sz4kerto wrote:
| Looks like it isn't worth buying the Max if you don't need a
| powerful GPU.
| sylens wrote:
| or if you want to have 3-4 monitors
| joshstrange wrote:
| Yep, this is what pushed me to the Max. I think the GPU is
| going to be overkill for my other needs (though I'm looking
| forward to trying some gaming) but first and foremost I
| needed support for my monitors (the only thing that kept me
| from the first M1s).
| dmix wrote:
| Im curious, what kind of monitor set up needs that kind of
| power?
| joshstrange wrote:
| 3x2K (2560x1440) and 1x1080p. I have my 3 2K monitors in
| a "tie-fighter" formation which really just means 1 2K in
| landscape and the other 2 2Ks are portrait mode on either
| side of the middle 2K. Then I have my 1080p monitor just
| on top of my middle 2K. The 1080 is really just used for
| my Home Assistant (home automation software) dashboard
| and occasionally reference material (Adobe XD/Figma-type
| stuff).
|
| I split my 2 portrait monitors into thirds and have hotkeys
| to resize windows and snap them to one of the 3
| "positions" (also hotkeys to expand them to 2/3rds for
| things like my code editor).
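|
| (The snapping math itself is trivial; a hypothetical sketch
| for one 1440x2560 portrait panel, with coordinates made up
| for illustration:)
|
|       # Rect (x, y, w, h) for one of the three stacked
|       # "slots" on a portrait panel, plus the 2/3 variant.
|       def third_rect(slot, width=1440, height=2560):
|           h = height // 3
|           return (0, slot * h, width, h)
|
|       def two_thirds_rect(width=1440, height=2560):
|           return (0, 0, width, 2 * (height // 3))
|
|       print(third_rect(1))       # middle third
|       print(two_thirds_rect())   # e.g. for the code editor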
|
| This setup works really well for me personally, and at some
| point I'll upgrade to 4K+, but I just couldn't afford it
| when I first set this all up.
|
| My normal usage is (`+` separates thirds, `/` represents
| one of these apps in this "slot"):
|
| Left 2K: Discord + DataGrip/Android Studio/Drafts + Slack
|
| Middle 2K: Chrome w/ dev tools right-docked
|
| Right 2K: iMessage/Spotify + IDEA (2/3rds)
|
| Top 1080p: Home Assistant/Adobe XD/Figma
|
| Here is a (rough) ASCII sketch of my setup (not to scale):
|
|                        +------------+
|                        |   1080p    |
|       +------+    +----+------------+----+    +------+
|       |      |    |                      |    |      |
|       |  2K  |    |     2K landscape     |    |  2K  |
|       | port.|    |                      |    | port.|
|       |      |    +----------------------+    |      |
|       |      |                                |      |
|       +------+                                +------+
|
| EDIT: Just to note, I know I'm not going to be scratching
| the surface of what's possible with the M1 Max monitor-
| wise since it can do 3x6K (Pro Display XDR) + 1x4K
| gigatexal wrote:
| Pics?! I would love to see this setup. I am so jelly.
| I've just got a single 32-inch 4K Dell and a U2719DC on
| the right.
| joshstrange wrote:
| Ok, here is a picture [0] but don't make fun of my LEDs
| lol. It's one of the few pictures I have of a semi-clean
| desk so I'm going to run with it.
|
| [0] https://imgur.com/ILTcpG0
|
| EDIT: Because I forgot to mention and my above post isn't
| editable: My 3 2K monitors are all 27" and my 1080p is
| 24"
| _ph_ wrote:
| That definitely looks cool - including your LEDs. Not
| sure I would want to work in that configuration (my neck
| would get hurt), but it definitely looks great :)
| joshstrange wrote:
| > That definitely looks cool - including your LEDs.
|
| Thank you!
|
| > Not sure I would want to work in that configuration (my
| neck would get hurt)
|
| Up and down issues or side to side? I tried all 3 2K
| monitors in landscape when I first got them but that was
| way too much side to side movement to get to the edges of
| the far left/right screen so I begrudgingly switched them
| to portrait. I had always thought portrait mode was silly
| for a monitor but I absolutely love it now and I'm super
| happy with the setup. I don't have to move my head very
| much at all to scan across my setup and normally I am
| pretty focused on just 2 monitors (center for Chrome and
| right for IDEA) so there isn't a ton of movement. As for
| up/down movement I really only ever move my head to look
| at the top 1080p screen which is fine since I don't use
| it regularly.
| sliken wrote:
| Or 64GB of ram.
| hajile wrote:
| Docker uses almost 6GB on my M1 Air. Even with 16GB, I have
| to be careful to avoid swapping (which trashes your SSD).
|
| I'm looking at buying a machine for higher RAM. 64GB is a
| pretty good deal (look at desktop DDR5 prices). If I spend 4k
| on a machine, I plan on using it for at least 3-4 years
| before I upgrade. 64GB seems to make a lot of sense if you
| can stomach the price of the M1 Max.
| kabes wrote:
| I don't need a powerful GPU, but I do need the additional RAM
| the max supports..
| sudhirj wrote:
| Yeah, the Max has exactly the same CPU as the Pro. Only reason
| I picked it is because I wanted the 32GB RAM, and only the Max
| has that by default -- customised orders take a long time to
| deliver in India.
| glhaynes wrote:
| I believe the Max also has twice the memory bandwidth of the
| Pro.
|
| EDIT, upon reading the bottom part of the first page of the
| linked article: And double the cache, more cores on the
| Neural Engine, and maybe doubled media processing, assuming
| I'm reading all of that correctly.
| crateless wrote:
| Now that Apple has taken the lead in performance/battery life
| tradeoff, are there any machines which come close to the M1 for
| software dev? Specifically, compiling Rust, Android development
| etc. without giving up too much on battery life?
|
| Also, the last time I checked, CPUs were reporting high
| performance but only under light load. Has the whole throttling
| situation changed or should I just expect to get 2 hours battery
| life in exchange for extreme CPU performance?
|
| Edit: I should have specified machines that can run Linux.
| ac29 wrote:
| Today? Probably not.
|
| Intel's Alder Lake is moving to a Performance + Efficiency core
| setup, which should help overall with battery life. But they
| are still behind on manufacturing process (Alder Lake is "Intel
| 7", supposedly roughly comparable to TSMC N7), so Apple will
| quite likely maintain their lead in power consumption.
|
| Alder Lake is getting announced in 2 days, but rumors have it
| as a desktop-first product launch, so laptops may be another
| quarter or two out.
| raydev wrote:
| Panzarino tested WebKit compiles on the first run of M1
| machines last year, and it seems like the battery held up
| really well on those.
|
| > After a single build of WebKit, the M1 MacBook Pro had a
| massive 91% of its battery left. I tried multiple tests here
| and I could have easily run a full build of WebKit 8-9 times on
| one charge of the M1 MacBook's battery.
|
| https://techcrunch.com/2020/11/17/yeah-apples-m1-macbook-pro...
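|
| (The 8-9 figure follows directly from the 91% number, with
| some margin left for the battery's last few percent:)
|
|       # One WebKit build used ~9% of the battery, so a full
|       # charge covers roughly eleven builds in theory.
|       per_build = 0.09
|       print(f"~{1.0 / per_build:.0f} builds per charge")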
|
| I'm looking forward to the MBP compile benchmarks.
| wayneftw wrote:
| > ...any machines which come close to the M1 for software dev?
|
| If you can stand using macOS, that is.
|
| Personally, I'll continue using Linux because that's where all
| my software gets deployed to and macOS simply can't approach
| the value of that or the value of open source. On a Mac, you'll
| be fighting the OS the whole time.
|
| If speed was all that mattered, Mac users would have left Apple
| a long time ago because this is the first time they're faster
| than a PC.
| OldHand2018 wrote:
| It's been a while since I bought a "Pro" computer from Apple. I
| am kind of wondering about the perf-per-$$$ factor. With a
| starting price of $2000, these are expensive computers. But maybe
| they are worth it!
|
| The M1 computers seemed like an absolute bargain for the
| performance.
| pohl wrote:
| Years ago I read someone express the maxim "the computer you
| want is always around $5k". This has stuck with me. It's been
| approximately true throughout my life.
| heresaPizza wrote:
| These MacBooks are definitely worth the money. They cost a lot
| but they are not overpriced.
|
| You don't have to consider just the CPU and GPU but the whole
| SoC. CPU is impressive and GPU is good, but for standard
| workloads some PCs may give you slightly better performance
| (on the GPU side), at the cost of needing the power adapter to
| achieve it. However, for some specific workloads (especially
| the ones involving ProRes video) the custom modules in it make
| it perform better not only than a Mac Pro, but than every
| other machine on the market. There is also the Neural Engine that
| could be more important in the future.
|
| You may not need those modules, but it seems like we are
| forgetting these are laptops with screens, inputs and more.
| These machines have some of the best screens, with high DPI
| and a high refresh rate, but most importantly mini-LED
| technology, which brings true HDR. And that's something very
| pricey.
|
| I'm far from defending Apple; they could sell these laptops
| for less and we would all be happier, but at the end of the
| day these machines are worth it in every respect (specific
| cases aside).
| neogodless wrote:
| I liked Dave2D's (youtube) take on this.
|
| They are tools. If you have specific workloads that these excel
| at in your job/hobby/money-making venture, then the price
| shouldn't be a concern.
|
| Depending on workload, they are comparable to $1000 PC laptops
| in CPU performance... or $3000 PCs. Or PCs that don't exist
| yet!
|
| As someone who uses a laptop for gaming, my $1000 laptop is
| infinitely superior to a $6000 Macbook Pro (for the games that
| I play). For almost every other use, the Macbook Pro is likely
| far superior!
|
| If you do Final Cut or Xcode work, these are the best tool
| available to you.
| smoldesu wrote:
| Yep, I reached pretty much the same conclusion as Dave: these
| are machines for _very specific people_ running _very
| specific software_. Apple got their wish: they made the
| computer disappear, and now the Macbook Pro is a tool. For
| better or worse, this is the best way to experience the Apple
| ecosystem.
|
| But also, if you're a developer without any interest in Apple
| (and maybe someone who wants to play games), the case for
| using Linux for general-purpose computing is stronger than
| ever. It will be interesting to see how Apple addresses its
| own issues in macOS over the next few months; I've really got
| my fingers crossed for Vulkan support or 32-bit libraries
| making a return.
| throwawaywindev wrote:
| Yeah, I originally ordered an M1 Max model after the
| presentation, then cancelled it when I realized that, for what
| I would use a GPU for (gaming and 3D development), an RTX 3080
| laptop would be a much better choice. I also don't care about
| performance/watt as much since I use my iPad for non-work
| stuff.
|
| But the technology nerd in me still wants to buy one for
| completely irrational reasons.
| kzrdude wrote:
| Me too. I really want an M1* mac but I also realize that I
| just want to run Linux, so it's kind of pointless right now
| (I'm not a Linux kernel level developer, so yeah).
| hajile wrote:
|       AMD 5950X: $750
|       Nvidia RTX 3060: $750
|       Midrange X570 motherboard: $300
|       Samsung 980 Pro 2TB: $370
|       64GB DDR5: $700
|       Decent 24" display: $400
|       Case, keyboard, mouse, cooling: $400
|
| That's $3670 -- if I build it myself. I'd expect to pay much
| more from a big box store for a prebuilt.
|
| A new 14" MacBook with M1 Max, 2TB SSD, and 64GB of RAM is
| around $4100.
|
| That's a great deal IMO
| AnthonyMouse wrote:
| 64GB memory, $292:
|
| https://www.newegg.com/crucial-64gb-288-pin-
| ddr4-sdram/p/N82...
|
| 2TB NVMe SSD, $199:
|
| https://www.newegg.com/western-digital-2tb-blue-
| sn550-nvme/p...
|
| X570 board, $154:
|
| https://www.newegg.com/asrock-x570-phantom-
| gaming-4/p/N82E16...
|
| Case, keyboard, mouse, fans:
|
| https://www.newegg.com/petrol-blue-fractal-design-focus-g-
| at... https://www.newegg.com/logitech-mk120-920-002565-usb-
| wired/p...
| https://www.newegg.com/p/13C-0007-001M0?Item=9SIABW9EME4036
|
| Together ~$100.
|
| You're off by about $1000.
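|
| (Summing the swaps from the prices quoted in these two
| comments gives roughly that figure:)
|
|       # "Off by about $1000": difference between the parts
|       # priced above and the cheaper picks linked here.
|       original = {"RAM": 700, "SSD": 370,
|                   "board": 300, "case/peripherals": 400}
|       cheaper = {"RAM": 292, "SSD": 199,
|                  "board": 154, "case/peripherals": 100}
|       saved = sum(original[k] - cheaper[k] for k in original)
|       print(f"${saved} less")   # -> $1025 less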
| csomar wrote:
| Good. Now you have an NVMe drive but a motherboard that
| doesn't support NVMe. That cheap-ass cooler is unlikely to
| be able to handle the 5950X; these usually should be paired
| with water cooling, as they get stupidly hot.
|
| Also, if you are getting a $15 mouse/keyboard, why are you
| even gaming/doing graphics work in the first place? Might as
| well get a Raspberry Pi.
| AnthonyMouse wrote:
| That motherboard has two NVMe slots.
|
| That cheap-ass cooler isn't that bad. It's pretty middle
| of the road. Personally I'd have the Hyper 212 Evo for
| $39 because it's not that much more expensive, but that
| plus a ~$60 case and a ~$20 keyboard and mouse still
| isn't $400.
|
| The main difference between the plain keyboard and your
| average "gamer" keyboard is what, RGB LEDs? You can pay
| the money for that if you like, but there are no RGB LEDs
| on the Mac.
| csomar wrote:
| > That motherboard has two NVMe slots.
|
| I fail to find that anywhere in the specs. It says 8
| SATA3 slots and 2 PCIe 4 slots. It is not clear if you
| can boot/configure them from the BIOS.
|
| > That cheap-ass cooler isn't that bad. It's pretty
| middle of the road. Personally I'd have the Hyper 212 Evo
| for $39 because it's not that much more expensive
|
| The cooler is critical if you are doing CPU intensive
| work. A hot CPU will get throttled, so you'd better get a
| _very_ good one if you are already paying a lot for your
| CPU.
|
| > The main difference between the plain keyboard and your
| average "gamer" keyboard is what, RGB LEDs? You can pay
| the money for that if you like, but there are no RGB LEDs
| on the Mac.
|
| No. The grip and precision are night and day. Even more
| so for mice; I can't go back to normal ones (I had the
| G900 and now have the Razer Viper Ultimate; this thing
| made carpal tunnel syndrome a thing of the past, and I
| use the mouse for 10+ hours per day).
| hajile wrote:
| That RAM is DDR4 while the MBP uses DDR5 (low power). MSI
| has stated that DDR5 will cost at least 60% more. Known
| prices are actually 3x higher.
|
| https://www.pcmag.com/news/msi-ddr5-ram-will-cost-60-more-
| th...
|
| That SSD is 2600MB/s while the MacBook's is 7400MB/s. The
| Samsung 980 Pro is the only SSD in that territory.
|
| Buy a cheap motherboard and you'll pay the price later. I
| didn't spec a $800 motherboard (though those are amazing).
| I went quite middle-of-the-road for an x570.
|
| A keyboard with a fingerprint reader will set you back at
| least $50, with a Surface keyboard costing $100. A
| comparable trackpad would be over $100, and even a midrange
| mouse is $50.
|
| A non-garbage case will be around $100 plus or minus a
| little.
|
| A decent air CPU cooler that will keep your CPU from
| throttling way back is going to run close to $80-120. I
| also didn't bother to price out all the little things.
|
| I forgot to add a PSU by the way. A name-brand, modular PSU
| with midrange internals that is just big enough (around
| 500-600W) is another $100-120.
| AnthonyMouse wrote:
| > That RAM is DDR4 while the MBP uses DDR5 (low power).
| MSI has stated that DDR5 will cost at least 60% more.
| Known prices are actually 3x higher.
|
| Naturally it's DDR4. The 5950X only supports DDR4. If it
| had DDR5 it would be twice as fast on all the things the
| M1 Max is doing well on as a result of having more memory
| bandwidth.
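|
| (Rough dual-channel peak numbers, which is where the "twice
| as fast" intuition comes from; the M1 Max figure is the one
| quoted elsewhere in this thread:)
|
|       # Peak bandwidth, two 64-bit channels:
|       # transfer rate (MT/s) * 8 bytes * 2 channels.
|       def dual_channel_gb_s(mt_per_s):
|           return mt_per_s * 8 * 2 / 1000
|
|       print(f"DDR4-3200: ~{dual_channel_gb_s(3200):.0f} GB/s")
|       print(f"DDR5-6400: ~{dual_channel_gb_s(6400):.0f} GB/s")
|       print("M1 Max: ~400 GB/s (per the article)")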
|
| > That SSD is 2600MB/s while the MacBook's is 7400MB/s. The
| > Samsung 980 Pro is the only SSD in that territory.
|
| This is kind of fair, but then that's the other problem.
| For most workloads a 2600MBps read speed is already going
| to move the bottleneck somewhere else, especially on a
| machine with 64GB memory to use as cache. If you're the
| rare exception who actually benefits from it, the Samsung
| one is available, but for everybody else they get to save
| $170 by not coupling the fast CPU they actually want with
| an expensive SSD that wasn't their bottleneck and has a
| poor cost benefit ratio.
|
| > Buy a cheap motherboard and you'll pay the price later.
|
| How do you mean? At best it won't have some ports you
| eventually want and then you buy the add-in card later
| and spend half as much on it because by then the price is
| lower.
|
| $155 is a fairly high price for a motherboard. $300 is a
| severe price. The most common ones are like $70.
|
| > A keyboard with a fingerprint reader will set you back
| at least $50 with a Surface keyboard costing $100. A
| comparable trackpad would be over $100, but even a
| midrange $50 mouse.
|
| The logitech keyboard and mouse are perfectly serviceable
| and on par with anything you get when you buy a complete
| PC from the store. I would take them over the chiclet
| thing that Apple makes.
|
| You can have a $100 keyboard and a trackpad to use with
| your desktop, but now you have an advantage over the
| Macbook, because you get to buy it once and use it
| forever instead of it being permanently attached to a
| machine that will be obsolete before the keyboard is. So
| you get to amortize the cost over several hardware
| generations.
|
| The same goes for the monitor for that matter.
|
| > A non-garbage case will be around $100 plus or minus a
| little.
|
| How is the thing provided a garbage case? What am I not
| getting from it that I actually care about?
|
| > A decent air CPU cooler that will keep your CPU from
| throttling way back is going to run close to $80-120.
|
| That was a decent air CPU cooler. It has copper heat
| pipes and a 92mm fan. The crappy ones are like $13:
|
| https://www.newegg.com/cooler-master-air-cooler-series-
| rh-a3...
|
| A really great one is $39:
|
| https://www.amazon.com/dp/B005O65JXI
|
| I went to see what you would get for $120 and for that
| price some of the coolers included a CPU.
| kitsunesoba wrote:
| There's definitely cheaper options for the various
| components, but I would personally choose a nicer
| motherboard if I were doing a 5950X build. One of the new
| ASUS boards with no chipset fan and 2x Thunderbolt 4 ports,
| like the ProArt X570 Creator Wifi, is a likely choice.
| AnthonyMouse wrote:
| I mean this is the other advantage of the PC. If you want
| to pay more and get the Thunderbolt ports, you can. If
| you don't need them for anything, you don't have to pay
| for them.
|
| The specs we're looking at here are pretty general. Most
| workloads are either going to be CPU bound _or_ GPU
| bound, not both. Do you need the 5950X? Then you're CPU
| bound and can save $500 with another GPU. Do you need the
| RTX 3060? Then you're GPU bound and can save $500 with
| another CPU.
|
| If you need the fast GPU in the Mac, it comes as one
| piece in a machine that starts at $3500.
| friedman23 wrote:
| You went and found the cheapest versions of components you
| could find to make a false comparison. This has always been
| the difference between pc laptops and mac laptops. People
| say "I can get this cpu with this graphics card in a pc
| laptop for WAAYY cheaper" ignoring the memory, quality of
| the chassis, quality of the screen, quality of the
| speakers, the battery and just about everything else.
| AnthonyMouse wrote:
| No, I didn't. You can get AM4 boards for ~$50. Here's
| 64GB of memory for $226:
|
| https://www.newegg.com/g-skill-64gb-288-pin-
| ddr4-sdram/p/N82...
|
| Here's a 2TB SSD for $143:
|
| https://www.newegg.com/leven-2tb/p/1B0-016A-00002?Item=9S
| IAW...
|
| But then you don't have PCIe 4.0 and it's 4x16GB instead
| of 2x32GB and it's SATA instead of NVMe.
|
| Here's how you spend >$700 on 64GB of memory:
|
| https://www.newegg.com/ballistix-64gb-288-pin-
| ddr4-sdram/p/N...
|
| $894!
|
| But wait, here's the same thing for $310:
|
| https://www.newegg.com/ballistix-64gb-288-pin-
| ddr4-sdram/p/N...
|
| It's a different color. Apparently that's what you get
| for the extra $584.
|
| There's getting cheaper stuff and then there's just
| paying a different amount of money.
| lowbloodsugar wrote:
| The NVMe drive you found is 2400MB/s. The MacBook Pro peaks
| at 7400MB/s, so your pick is 1/3 the speed.
|
| The $143 SSD you found is 540MB/s, so less than 1/10th the speed.
| AnthonyMouse wrote:
| The NVMe drive I found is 2600MB/s. It also isn't the
| bottleneck in general.
|
| Both the Samsung and the Apple SSDs are just a bad fit
| for a machine like this. For a non-I/O bound workload
| it's paying money for nothing. For a read-bound workload,
| the machine has 64GB of RAM, so you'd need a working set
| larger than that to have to care, and that's pretty rare.
| For a write-bound workload, at those speeds, you're going
| to melt a consumer-grade SSD and need an enterprise one.
| So you don't _want_ the faster one in this machine;
| either it 's not worth the price or it won't survive that
| amount of write load.
| gigatexal wrote:
| And it's portable with a high dpi screen and 120hz refresh.
| kitsunesoba wrote:
| > ...with a high dpi screen and 120hz refresh
|
| Having shopped it recently, the desktop monitor market is
| abysmal right now. To get a monitor remotely comparable to
| the new MBP internal displays you're looking at spending
| $3k on one of a tiny handful of FALD displays which run hot
| enough to need a fan and have worse DPI, or rearranging
| your desk setup to accommodate a 48" LG OLED TV which is
| subject to burn-in with serious usage.
|
| Apple can't release those lower-priced 27" displays that've
| been rumored soon enough.
| _ph_ wrote:
| I hope the next large iMac is at least a little bit above
| 27", after all, the entry-level one went from 21.5 to
| 24". A 30" iMac with mini-led and 120Hz would be just too
| sweet.
| kitsunesoba wrote:
| That would be pretty great, but unless they restore
| display passthrough capabilities with the M-series 27"+
| iMacs, I'm really hoping for a standalone display that
| can be used both with a laptop and the rumored upcoming
| "Mac Pro Mini"/G4 Cube MKII.
| r00fus wrote:
| Given how difficult it's been to get GPUs recently due to
| supply chain crunch, are you sure you can actually get your
| order fulfilled without a significant wait period?
| AnthonyMouse wrote:
| You can get them. The supply crunch is why the GPU is ~$750
| instead of ~$300.
| paxys wrote:
| Completely depends on what your use case is. For me personally
| a desktop + mid-range laptop combo is cheaper, more powerful
| and an all-round better fit than a single $2K+ laptop (which
| realistically will become $3K+ after adding a few options).
| mrtranscendence wrote:
| They're definitely pricey, but CPU performance is out of this
| world. GPU's pretty good too, though not as impressive, I
| understand. Alas, it'll probably be a few years before I can
| get one (I use my work machine for almost everything and I
| _just_ got this one).
| f6v wrote:
| It's not that GPU isn't impressive. Rather that you can't put
| it to use in games.
| mrtranscendence wrote:
| Fair enough! It doesn't really bother me either way, since
| I don't really make use of the GPU on my laptop. Curious to
| try out Tensorflow on one of the new machines, though.
| rezahussain wrote:
| Yes, I just want to see some tensorflow specific
| benchmarks on the m1 max with 32 gpu cores vs 3080.
| f6v wrote:
| They're also compact, well built (keyboards are fixed now),
| and have good battery life. Last time I was shopping, I found
| the direct competition (XPS 13 and the like) to be about the
| same price.
| OldHand2018 wrote:
| Oh absolutely, they're great. My last MacBook Pro was around
| $2500 or $3000, if I remember correctly, and it was a very
| good price for what you got.
|
| Don't think that I'm being overly negative here. These look
| like outstanding computers that are worth a premium price. My
| question here is about how much of a premium and I'm not
| trying to give an opinion with a question - I really don't
| know the answer :)
| cududa wrote:
| Depends what you need them for. On a music video set, my 2010
| MBP survived a drop from a second-story balcony (three stories
| in a normal home) onto a marble floor. It got dented to hell,
| but the only functional damage was to the Ethernet port. I'm
| excited to get one of these new ones. I imagine it'll last at
| least 5 years.
___________________________________________________________________
(page generated 2021-10-25 23:02 UTC)