[HN Gopher] M1 Ultra About 3x Bigger Than AMD's Ryzen CPUs
       ___________________________________________________________________
        
       M1 Ultra About 3x Bigger Than AMD's Ryzen CPUs
        
       Author : Brajeshwar
       Score  : 96 points
       Date   : 2022-03-20 16:12 UTC (6 hours ago)
        
 (HTM) web link (www.tomshardware.com)
 (TXT) w3m dump (www.tomshardware.com)
        
       | slayerjain wrote:
       | I wonder about the size comparison between the M1 Ultra and
       | surface area of the equivalent 'Ryzen CPU + Radeon GPU + RAM :D'
        
         | Brian_K_White wrote:
         | But would you even buy the equivalent when the gpu and ram were
         | "hard-coded"?
        
           | kutenai wrote:
           | yeah, because it does not operate as a space heater. I can
           | buy one of those for like $100, and that's a high end model.
        
         | ohgodplsno wrote:
          | A 5950X is 285mm2 in total (and made with TSMC 7nm). An RTX
          | 3080 (GA102) is 628mm2 (and made with Samsung 8nm). A 16GB DDR5
          | die seems to
         | be about 75mm2.
         | 
          | RAM aside (which is Apple's actual performance differentiator),
          | all of these eat the M1 Ultra alive in their respective
          | categories.
        
           | lhl wrote:
           | Going a bit further, since the chips are all on different
           | processes (Apple on TSMC N5, AMD on TSMC N7, and Nvidia on
           | Samsung 8N), it might also be worth looking at transistor
           | counts. The M1 Ultra has 114B transistors, a 5950X has 19.2B
           | transistors, and a 3080/3090 (GA102) has 28.3B transistors.
           | 
           | (at 864mm2 (2 x 432mm2 M1 Max's), that's a density of
           | ~130MTr/mm2, which is in the ballpark for TSMC's max density
           | for N5 (est. 170MTr/mm2) - N7 max density for reference is
           | ~90MTr/mm2)
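            | 
            | A quick back-of-the-envelope in Python, using the numbers
            | above (just a sketch; the N5/N7 ceilings are the quoted
            | estimates, not official figures):
            | 
            |     transistors = 114e9       # M1 Ultra
            |     area_mm2 = 2 * 432        # two M1 Max dies
            |     print(transistors / area_mm2 / 1e6)   # ~132 MTr/mm2
            |     # vs quoted ceilings: ~170 MTr/mm2 (N5), ~90 (N7)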
        
       | sofixa wrote:
       | > The drives use a proprietary form-factor, which is logical as
       | the SSD controller resides in the SoC, and the drives themselves
       | do not need to carry one
       | 
       | No, it's not logical, they still could have used a common form
       | factor like M.2.
        
         | addaon wrote:
         | Besides how malicious it would be to use an M.2 physical
         | connector for a non-M.2-compatible protocol, the pin count on
         | the Apple connector is much higher for a reason. Adding a chip
         | on to the SSD to mux between the different flash signals and
         | retransmit them over a reduced number of high speed transceiver
         | lanes would be cost-wise most of the way to putting a
         | traditional PCIe controller on the SSD, without the benefits.
        
         | Eric_WVGG wrote:
         | What, make a socket matching the M.2, that M.2s would be
         | incompatible with?
        
           | formerly_proven wrote:
           | Intel actually does just that with wifi cards (CNVio/2 cards
           | only work with specific chipsets and don't use PCIe/USB,
           | while using the same keying as normal wifi cards).
        
             | wmf wrote:
             | An M.2 2230 slot should be able to take Wi-Fi or NVMe since
             | they're both PCIe devices. In practice there aren't many
             | 2230 SSDs.
        
               | Tijdreiziger wrote:
               | CNVi isn't PCIe; non-CNVi platforms can't take CNVi Wi-Fi
               | cards.
               | 
               | > CNVi or CNVio ("Connectivity Integration", Intel
               | Integrated Connectivity I/O interface) is a proprietary
               | connectivity interface by Intel for Wi-Fi and Bluetooth
               | radios to lower costs and simplify their wireless
               | modules. In CNVi, the network adapter's large and usually
               | expensive functional blocks (MAC components, memory,
               | processor and associated logic/firmware) are moved inside
               | the CPU and chipset (Platform Controller Hub). Only the
               | signal processor, analog and Radio frequency (RF)
               | functions are left on an external upgradeable CRF
               | (Companion RF) module which, as of 2019 comes in M.2 form
               | factor (M.2 2230 and 1216 Soldered Down). Therefore, CNVi
               | requires chipset and Intel CPU support. Otherwise the Wi-
               | Fi + Bluetooth module has to be the traditional M.2 PCIe
               | form factor.
               | 
               | https://en.wikipedia.org/wiki/CNVi
        
               | wmf wrote:
               | True; I was thinking of normal Wi-Fi cards.
        
         | aenis wrote:
         | Right, but that would hardly serve any practical purpose, as
          | one still couldn't plug in an off-the-shelf M.2 device and
          | expect it to work. So - why constrain yourself with someone
          | else's connector design if it adds no value?
        
         | rowanG077 wrote:
          | That would make 0 sense. It would even be confusing, since no
          | M.2 device would work in that slot.
        
       | dpedu wrote:
        | Isn't the M1 Ultra more comparable to a SoC than the discrete
        | Ryzen CPUs are?
        
         | rfoo wrote:
          | Ryzen chips are, more or less, SoCs.
        
           | Groxx wrote:
           | Fair. I think we can probably consider them to be _less of a_
           | SoC than the M1 Ultra though - Ryzens do have a lot in them,
            | but not all the system's RAM, for example.
           | 
           | (which is a good thing. all-in-one for specialized stuff is
           | fine, but I don't want it to become the norm - upgrading one
           | component is cheaper than upgrading multiple, and tends to
           | support longer hardware cycles)
        
           | DiabloD3 wrote:
           | AMD internally refers to the finished product as the SoC, and
           | the chipset (no longer a northbridge, the NB is on die) as a
           | socket extender (a term also used by most SoC vendors) in
           | internal documentation.
           | 
           | Also, arguably, the IO Die is the SoC, the chiplets are
           | external to the SoC even though they all live on the same
           | interposer.
        
       | whatever1 wrote:
       | I am very curious about the production yields of these monsters.
       | Things can go wrong at the cpu, the gpu, the memory etc and any
       | of these reasons will lead a chip to the trash bin.
        
         | ch_123 wrote:
         | The M1 Ultra is essentially two M1 Max joined together. I
         | suspect most of the things which can go wrong are at the M1 Max
          | level. I'm sure there's a certain percentage where fusing two
          | dies together kills one or both, but I suspect that's probably
          | a manageable percentage.
        
         | grishka wrote:
         | This is the reason there are lower-end models of all M1 chips
         | with some CPU and GPU cores disabled. And the M1 Ultra is just
         | two M1 Max glued together.
        
         | sysbot wrote:
          | In general, the lower-yield ones end up being the lesser model,
          | such as the M1 Pro using the same die.
        
         | deergomoo wrote:
         | On the latest episode of the Accidental Tech Podcast they were
         | discussing a patent related to this fancy interconnect stuff,
         | and it sounds like not only do they need two fully functioning
         | M1 Max's, but they also need to be adjacent on the wafer.
         | Although I guess at least they can still use the non-defective
         | M1 Max in any pair.
        
           | Applejinx wrote:
           | That would be an interesting approach: don't know if it's
           | what they're doing, but if they're like 'make M1 Ultras
           | predominantly, but then if they break you're making M1 Max
           | with an attached bit that has to be trimmed off', that's
           | pretty ingenious.
        
             | grishka wrote:
             | I think I've seen a picture of a decapped M1 Max and it had
             | this extra part on the side that never appeared in Apple
             | renders. It was back then when this speculation of it
             | having multiprocessor interconnect started. It was then
             | also confirmed by Asahi Linux people from the software side
             | of things.
             | 
             | So no, they don't even bother to trim it off.
        
       | rowanG077 wrote:
       | Without delidding both you can't really tell anything. It might
       | very well be that 80% of the area of the ultra is just SDRAM.
        
         | Flankk wrote:
         | The CPU is the same size as the thermal paste residue shown in
         | the pictures. The remaining space is for the two RAM chips. The
         | author should be embarrassed.
        
         | readams wrote:
         | SDRAM is not located on the chip die for either. The memory
         | controller is on-die for the M1 though. My understanding is
         | that making SDRAM is a pretty different process than making a
         | CPU, so it's hard to do it on the same die.
        
           | monocasa wrote:
           | This article is comparing package sizes, and DRAM is on
           | package for the M1s.
        
       | MaxMoney wrote:
        
       | trdtaylor1 wrote:
       | You know, there's a few articles out on this size difference
       | subject; if they can make my CPU 10% faster by doubling the
       | physical size -- all other things being equal I'd take that.
        
         | uluyol wrote:
         | Cost increases super-linearly with size. One reason for this is
         | defects: if a single defect ruins a whole chip, then for a
         | constant number of defects per square inch, you'll get more
         | usable square inches of silicon when you have small chips than
         | big ones. Of course you can build chips that can tolerate a few
         | defects, but the principle holds.
         | 
         | This is also why high quality TVs are harder to manufacture
         | than high quality phone displays. You have a lot more waste
         | when you need to throw out/recycle a TV screen compared to a
         | phone screen. And both are considered bad when they have just
         | one bad pixel.
        
           | adtac wrote:
           | >for a constant number of defects per square inch
           | 
           | Is this assumption true?
        
             | Qub3d wrote:
              | It's more the geometry of the process than the rate of
              | failure. The post is just saying, "if the rate were
              | constant". Basically, throw a dart at a wafer. Wherever the
              | dart hits, the whole circuit/chip containing that point is
              | now worthless. Assuming you threw the same number of darts
              | (failures) at a wafer of smaller chips and a wafer of
              | larger chips, the wafer with the larger chips loses more
              | total silicon (as a percentage of usable area).
              | 
              | With smaller chips you have a finer "grid" for your darts
              | to land on, so with the total number of failures (darts)
              | being the same, you end up with more usable silicon; the
              | coarser grid of a large chip means each failure throws
              | away a lot more surface area.
             | 
             | Take a look at this picture of a failure map, if you would
             | like some visuals: https://ars.els-
             | cdn.com/content/image/1-s2.0-S09521976120008...
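              | 
              | A minimal Python sketch of the dart analogy (made-up
              | wafer size and defect count, not real yield data):
              | 
              |     import random
              |     random.seed(0)
              |     SIDE, DARTS = 300.0, 50   # square "wafer" (mm), darts
              | 
              |     def usable_mm2(die_mm2):
              |         s = die_mm2 ** 0.5
              |         n = int(SIDE // s)    # dies per row/column
              |         dead = set()
              |         for _ in range(DARTS):
              |             c = int(random.uniform(0, SIDE) // s)
              |             r = int(random.uniform(0, SIDE) // s)
              |             if c < n and r < n:   # skip off-grid darts
              |                 dead.add((c, r))
              |         return (n * n - len(dead)) * die_mm2
              | 
              |     # same darts, very different usable area:
              |     print(usable_mm2(100), usable_mm2(800))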
        
         | jmole wrote:
         | Would you pay 2-3x the price? That's the tradeoff...
        
           | rfoo wrote:
           | Given that Apple devices already have 2-3x premium, as long
           | as Apple absorbs the extra cost, people would be fine with
           | it.
           | 
           | This is also why the approach is an "Apple-only" one.
        
       | shiftingleft wrote:
       | Does the M1 Ultra use a chiplet design like the Ryzen series?
       | Otherwise the yields have to be awful.
        
         | addaon wrote:
         | The ultra uses two identical dice in a single package, with an
         | EMIB-style interconnect. This is fewer, larger, and more
         | uniform dice than the Ryzen approach, but still not monolithic.
        
       | gbolcer wrote:
       | My favorite quote: "It's sort of like arguing that because your
       | electric car can use dramatically less fuel when driving at 80
       | miles per hour than a Lamborghini, it has a better engine --
       | without mentioning the fact that a Lambo can still go twice as
       | fast."
       | 
       | https://www.macrumors.com/2022/03/17/m1-ultra-nvidia-rtx-309...
        
         | stavros wrote:
         | Right, but if I can strap two electric cars together and go as
         | fast as the Lamborghini for less fuel, isn't that still better?
        
           | jayd16 wrote:
           | That link was about GPU power and according to the benchmarks
           | it can't go as fast with two.
        
             | stavros wrote:
             | I did read the link, this indicated to me that it sort-of
             | can:
             | 
             | > Apple's M1 Ultra is essentially two M1 Max chips
             | connected together, and as The Verge highlighted in its
             | full Mac Studio review, Apple has managed to successfully
             | get double the M1 Max performance out of the M1 Ultra,
             | which is a notable feat that other chip makers cannot
             | match.
             | 
              | Unless you're talking about performance per watt? GPU
              | workloads are generally inherently parallelizable so
              | performance per
             | watt is actually what generally matters, rather than raw
             | FLOPS.
        
               | pengaru wrote:
               | In the analogy the Lamborghini would be an RTX 3090, and
               | the macrumors article makes it abundantly clear the Ultra
               | isn't even half of the RTX 3090 performance.
        
               | stavros wrote:
               | Right, but rather than getting bogged down with the
               | analogy, do we agree that performance per watt is what
               | matters? It doesn't matter whether you need 2 or 2.43 of
               | them to match an RTX 3090, if it has better performance
               | per watt, it's better.
        
               | pengaru wrote:
               | > Right, but rather than getting bogged down with the
               | analogy, do we agree that performance per watt is what
               | matters?
               | 
               | No, I am not in agreement with you, the RTX 3090 is
               | clearly better at being a GPU.
        
               | hunterb123 wrote:
               | > clearly better
               | 
               | better is subjective, it depends on your needs.
               | 
               | sometimes performance per watt is more important
               | especially when you're mobile or trying to be
               | conservative w/ electricity.
        
               | pengaru wrote:
               | > sometimes performance per watt is more important
               | especially when you're mobile or trying to be
               | conservative w/ electricity.
               | 
               | There is an obvious implicit context when you're
               | comparing _performance_ to an RTX 3090, and it is _not_
               | mobility or your electric bill.
        
               | hunterb123 wrote:
               | There may be only one implicit context for you, which is
               | why I said "better" is subjective.
               | 
               | Personally the M1 Ultra would be better imo as a desktop
               | GPU than RTX 3090 in certain contexts. Putting it in a
               | van, traveling with it, etc.
               | 
               | I'll probably wait until the next go around of them, but
               | I'm loving nearing desktop performance w/ reduced loads.
        
               | pengaru wrote:
               | The RTX series has lower power variants. When you compare
               | against the 3090 you are establishing a context with
               | power consumption as a low priority.
               | 
               | Considering the 3090 had over double the performance, the
               | lower power parts may well be both faster and more power
               | efficient. You'd have to do the comparison to know, if
               | that's what you care more about.
        
               | hunterb123 wrote:
               | > 3090 had over double the performance
               | 
                | The FPS numbers were closer than double. It was the
                | flawed benchmark where the 3090 had double.
               | 
               | I'm not sure how the lower power RTX variants compare,
               | maybe that's a closer comparison, has there been a review
               | against those?
        
               | Applejinx wrote:
               | I would ask if the RTX 3090 is competitive in some of the
               | specialized things the Mac chip is made to do: there's a
               | hell of a lot of dedicated video processing stuff in
               | there that seems to exist to serve Final Cut Pro, not
               | gaming. Are we comparing apples and oranges, in the sense
               | that video editing at 4k, 5k and beyond might call upon a
               | different set of skills than gaming benchmarks? How much
               | would it matter if this translated into rather
               | proprietary things like ability to pump impossible
               | amounts of ProRes video compression on the fly and scrub
               | ridiculously giant frame sizes? Are there benchmarks
               | along those lines?
        
               | p1necone wrote:
               | Performance per watt is very important for devices that
               | need to run on battery and for cloud compute where you're
               | running thousands of them and saving on
               | electricity/cooling would be significant.
               | 
               | But it's almost completely unimportant for always-
               | plugged-in-to-wall-socket desktop chips. Because even the
               | power consumption of stuff like the 3090 is still pretty
               | insignificant on your power bill compared to up front
               | cost, and relative to the power consumption of all the
               | other stuff in your house.
               | 
               | Remember even the 3090 consumes very little power when
               | idle - it's only pulling the rated ~300w when you're
               | maxing it out playing heavy games or rendering or
               | whatever.
        
               | smoldesu wrote:
               | > do we agree that performance per watt is what matters?
               | 
               | No? Certainly not on desktop computers.
        
               | readams wrote:
               | It doesn't go as fast as a 3090 even with two. It does go
               | about twice as fast as an M1 Max.
        
               | jayd16 wrote:
               | First graph with geekbench scores.
               | 
               | RTX3090: 215,034
               | 
               | M1 Ultra: 102,156
        
               | mikhailt wrote:
                | Geekbench was debunked a long time ago by Anandtech as
                | not sustaining the full load long enough to show the
                | real perf.
                | 
                | The CPU and GPU tests finish too fast for the system to
                | ramp up to the full power load.
               | 
               | That's not to say that M1 Ultra is equal or better than
               | 3090, just that Geekbench isn't the right test for this.
               | 
               | You can see more detailed analysis here:
               | https://www.youtube.com/watch?v=pJ7WN3yome4
        
               | smoldesu wrote:
               | Is that an indication that the M1 has latency issues vis-
               | a-vis power delivery, then?
        
               | g42gregory wrote:
               | That's nice to know, but for me the M1 Ultra performance
               | is not very relevant. Apple has a history of introducing
                | proprietary architectures, not-quite-compatible
                | interfaces, and non-standard software.
               | 
                | macOS has a nice UI? Great, I will use it. Anything
                | computationally intensive? - Why bother? I would use
                | only Linux on the back end. It's very easy to couple a
                | mid-range Apple product with a high-end Linux
                | workstation at your desk. I don't need to run compute-
                | intensive Deep Learning load while I am at the coffee
                | shop.
        
               | jayd16 wrote:
               | >I don't need to run compute-intensive Deep Learning load
               | while I am at the coffee shop.
               | 
               | Well they're putting it in a desktop and claiming desktop
               | performance...
               | 
                | If Apple is going to put up graphs saying they beat i9
                | and RTX 3090 machines, it's worth investigating even if
                | I plan to put Linux on it.
               | 
                | In this instance it seems like the reality doesn't match
                | the hype as well as in the laptop space, but if it had I
                | would surely consider the option.
        
               | arecurrence wrote:
                | The GB Compute test is widely viewed as worthless for the
                | M1 Ultra chip because it finishes too fast for the chip
                | to ramp up. It's even too short for the M1 Max.
        
           | ohgodplsno wrote:
           | No. You have two electric cars both going at 80 mph, with
           | twice the weight. Strap them together aaaaaand.... They're
           | still two electric cars going together at 80mph. But if
           | you're counting wheel rotations, well now you have 8 spinning
           | wheels. Top speed will still not match the Lamborghini.
        
             | stavros wrote:
             | What's it called when you make an analogy for convenience
             | and someone stretches it way past its breaking point as if
             | it still applies to the original point?
        
               | ask_b123 wrote:
               | https://english.stackexchange.com/questions/305332/is-
               | there-...
               | 
               | > overstretched or overblown analogy / _reductio ad
               | absurdum_
        
         | mikhailt wrote:
         | According to Max Tech analysis here,
         | https://www.youtube.com/watch?v=pJ7WN3yome4, the GPU seems to
         | max out at 60-70w at the moment for full load, even if the heat
         | is well-managed. If Apple let it go up to 100-120w, we could
         | see much more performance out of the GPU but for now, something
         | is not set up correctly on macOS.
         | 
         | So, we may have to wait to see what Apple says or if they'll
         | unlock the full performance in future software updates. They
         | did do this in the past with M1 hardware a few times.
         | 
         | Same result here from another channel where it appears GPU
         | isn't at full power atm:
         | https://www.youtube.com/watch?v=0CUoHwMtRsE&t=1172s
        
           | GeekyBear wrote:
           | The strategy of creating a large die with lots of execution
           | units, then running the chip at a lower clock speed for power
           | efficiency has been Apple's way of doing things for a while
           | now.
           | 
           | >Apple has built a wide enough GPU that they can keep
           | clockspeeds nice and low on the voltage/frequency curve,
           | which keeps overall power consumption down. The RTX 3090, by
           | contrast, is designed to chase performance with no regard to
           | power consumption, allowing NVIDIA to get great performance
           | out of it, but only by riding high on the voltage frequency
           | curve.
           | 
           | https://www.anandtech.com/show/17306/apple-
           | announces-m1-ultr...
           | 
           | This strategy is something that Anandtech's been calling out
           | since the iPhone 5s chip.
           | 
           | >Brian and I have long been hinting at the sort of ridiculous
           | frequency/voltage combinations mobile SoC vendors have been
           | shipping at for nothing more than marketing purposes. I
           | remember ARM telling me the ideal target for a Cortex A15
           | core in a smartphone was 1.2GHz. Samsung's Exynos 5410 stuck
           | four Cortex A15s in a phone with a max clock of 1.6GHz. The
           | 5420 increases that to 1.7GHz. The problem with frequency
           | scaling alone is that it typically comes at the price of
           | higher voltage. There's a quadratic relationship between
           | voltage and power consumption, so it's quite possibly one of
           | the worst ways to get more performance.
           | 
           | https://www.anandtech.com/show/7335/the-iphone-5s-review/2
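            | 
            | A toy illustration of the "wide and slow" point (hypothetical
            | numbers only; dynamic power scales roughly as f * V^2, and
            | the higher clock needs the higher voltage):
            | 
            |     # toy numbers: dynamic power ~ units * f * V^2
            |     def power(units, ghz, volts):
            |         return units * ghz * volts ** 2
            | 
            |     # same raw throughput (units * GHz) either way:
            |     print(power(64, 1.3, 0.8))   # wide & slow  -> ~53
            |     print(power(32, 2.6, 1.1))   # narrow & fast -> ~101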
        
             | mikhailt wrote:
             | > The strategy of creating a large die with lots of
             | execution units, then running the chip at a lower clock
             | speed for power efficiency has been Apple's way of doing
             | things for a while now.
             | 
             | > This strategy is something that Anandtech's been calling
             | out since the iPhone 5s chip.
             | 
              | Not sure that applies to the M1 Ultra, which is clearly
              | marketed as a desktop SoC, not a mobile device or laptop
              | SoC. Optimizing for energy efficiency or battery life is
              | not the same as optimizing for performance first, which is
              | what the M1 Ultra is designed for. Not to mention, there's
              | a reason Windows and Linux have multiple power profiles
              | that change how schedulers and power loads work. macOS has
              | "high power mode" for M1 laptops, and yet strangely it is
              | not even available on the Mac Studio.
             | 
             | Keep in mind that Apple showed a graph that clearly shows
             | the GPU was at 100-110w. Why do that if they won't run it
             | at that wattage or even talk about 3090 in the first place?
             | Why ruin their reputation over a silly little thing?
             | Everyone on the planet will easily debunk that.
             | 
              | Also, why even bother adding a high-quality cooling system
              | that the Mac Studio clearly doesn't need, since both the
              | CPU and GPU are going to be capped at a lower wattage?
             | 
             | We'll find out eventually one way or another.
        
               | GeekyBear wrote:
               | > Not sure that applies to M1 Ultra that's clearly
               | marketed as a desktop SoC, not a mobile device or laptop
               | SoC.
               | 
               | The M1 Ultra is using the same CPU cores and GPU cores as
               | an iPhone, just more of them.
               | 
               | It's long been their strategy across the board. Throw
               | silicon die area at tons of execution units and run the
               | chip at a lower clock speed for power efficiency.
               | 
               | For example, the 2012 A5X iPad chip had about the same
               | die area as 4 core Ivy Bridge, which was huge for a
               | mobile device SOC.
               | 
               | https://www.anandtech.com/show/6330/the-iphone-5-review/4
               | 
               | Since they are selling the whole widget, they aren't as
               | dependent on minimizing die area to maximize their profit
               | margin.
        
               | addaon wrote:
               | > The M1 Ultra is using the same CPU cores and GPU cores
               | as an iPhone, just more of them.
               | 
               | Not only that, but it's running them at basically the
               | same clock speeds as the rest of the M1 family, and even
               | as the iPhone. This suggests that they're running at not
               | far above a best efficiency point, rather than way past
               | the elbow of the voltage curve as is traditional for
               | desktop chips.
        
             | jleahy wrote:
             | I would have thought this approach would cause issues for
             | leakage power? Better to run fast then sleep and power gate
             | to cut leakage, especially at smaller geometries.
             | 
             | Or do you think they are mostly using HVt cells?
        
               | GeekyBear wrote:
               | Doesn't power leakage get much, much worse as temperature
               | increases?
        
           | ohgodplsno wrote:
           | > If Apple let it go up to 100-120w
           | 
           | GPUs and CPUs are not simply "pump more power in it to go
           | faster hurr hurr". It is extremely likely that the M1's
           | maximum performance is attained already. Otherwise, the M1
           | ultra would have pumped a little bit more wattage and gotten
           | better single core perfs, but guess what, it doesn't.
        
             | mikhailt wrote:
             | > GPUs and CPUs are not simply "pump more power in it to go
             | faster hurr hurr".
             | 
             | I agree, they're not a simple "throw more power and it'll
             | go faster" problem. However, that doesn't mean they do not
             | go faster if you increase clock speed and give it a little
             | more power.
             | 
              | After all, overclocking would have been utterly pointless
              | over the past two decades, and Intel/AMD wouldn't benefit
              | from the
             | so-called turbo boost technology. (I know M1 doesn't have
             | turbo boost).
             | 
             | Just to be clear, yes, there's a limit to everything of
             | course. There's a balance where certain higher clocks would
             | cause a bottleneck in the rest of the system since this is
             | a SoC.
             | 
             | In this case, I used the 100-120w number because Apple used
             | that number in their GPU graph showing the GPU running at
             | ~105w, why bother showing that if it can't accomplish it?
             | Here's the graph:
             | https://images.anandtech.com/doci/17306/Apple-M1-Ultra-
             | gpu-p...
             | 
             | Note that Apple shows 60w for their CPU graph
             | (https://images.anandtech.com/doci/17306/Apple-M1-Ultra-
             | cpu-p...), which did actually match up and does show the
              | double CPU perf that Apple stated.
             | 
             | I didn't pull that number out of nowhere. If Apple used 60w
             | in that graph, then I wouldn't be here.
             | 
             | > Otherwise, the M1 ultra would have pumped a little bit
             | more wattage and gotten better single core perfs, but guess
             | what, it doesn't.
             | 
             | That's because of two reasons:
             | 
              | 1. There is only so much data we can "process" within the
              | same core; more power does not change the data itself. We
              | can only increase the clock speed to finish it faster, or
              | widen the amount of data that fits in the same core. Also
              | of note: Intel Turbo Boost does actually help single-core
              | perf by boosting the single-core speed beyond the baseline.
              | 
              | 2. The M1's max clock speed is set to 3.2GHz (single core);
              | it can't go faster than this (set by Apple). Throwing more
              | wattage here wouldn't change anything, but throwing more
              | power to allow all 16 P cores to hit 3.2 does improve
              | performance as long as it isn't overheating. I don't think
              | the M1 Ultra does hit 3.2 on all 16 cores (it might have
              | hit the 60w cap first; I might be wrong, but I saw mostly
              | just 3.0GHz in the Max Tech video and have to go back to
              | review it).
             | 
             | Again, this is not about single-core performance, this is
             | about GPU performance, which is extremely parallelizable
             | and more comparable to the multi-core performance instead.
        
         | mrtksn wrote:
         | You know what, Mercedes uses Renault engines in some of its
         | cars that are intended for luxury daily use.
         | 
         | Better != Faster or Better != More powerful
         | 
          | Better means closer to fulfilling the goals, and often the
          | goals pull against each other (i.e. cheaper, faster, cooler are
          | the goals, but faster means more expensive and hotter).
         | 
          | Apple might be guilty of misrepresenting raw computational
          | power, but with the M1 the experience of using computers has
          | become significantly better.
        
           | davewritescode wrote:
           | Mercedes only uses Renault engines in its absolute bottom of
           | the barrel products.
           | 
            | Mercedes, unlike BMW with Mini and Audi with the VW parts
            | bin, has to outsource the low-end stuff.
        
           | marcodiego wrote:
            | Actually, "better" depends on the criteria. A Volkswagen
            | Beetle can be better than any of the listed cars if you
            | consider the type of road and ease of repair.
        
           | homarp wrote:
           | for the curious https://luxurycarsa2z.com/which-mercedes-
           | have-a-renault-engi...
        
         | phyalow wrote:
          | I am quite happy with my MacBook Pro M1 Max, which remains
          | virtually ice cold even running heavy compute (I have yet to
          | hear the fans spin up) and has a battery life that can easily
          | accommodate 12+ hours of real work. Any Windows laptop with a
          | 3090 inside it will be a hot and loud brick in comparison.
          | Granted, I don't play video games and my ML compute runs in
          | the cloud, so it fits my needs nicely. I would argue that in
          | terms of sheer generalised productivity nothing else comes
          | close.
        
           | aenis wrote:
           | Same here. Great computer. The only thing I am missing is a
           | bit more reliable support for waking up external monitors.
            | Between my now two M1/M1 Max Macs and the 6 screens I use,
            | one (an Asus 2K display which is slow to wake up from sleep)
            | often requires unplugging to wake up. That is not an M1
            | issue - it's been like that with macOS since... forever, but
            | it's irritating.
        
             | Osiris wrote:
             | I use a 4 monitor setup for my workstation. I switched to a
             | desktop computer in part to not have to deal with this
             | exact issue with a MacBook.
        
             | flatiron wrote:
             | One big thing holding me back from jumping on one is macOS.
              | I'm sure it's OK, but I've been using Linux for ~25 years;
              | that's a lot of muscle memory to lose. I really wish Apple
              | supported Linux better.
        
               | brettdong wrote:
               | Hope Asahi Linux will get mature and be daily driver
               | usable in the near future.
        
             | srcreigh wrote:
             | What is your monitor layout?
        
               | aenis wrote:
               | I have those 6 screens plugged to two different macs, so
               | just 3 on each. Big screen in the center, and two in
               | portrait mode on the sides. One uses 43" 4k for the
               | center screen, the other uses a 34" ultrawide.
               | 
                | The M1 Max drives its displays directly, the M1 via a
                | starlink triple 4K DisplayLink hub.
               | 
               | The problems, in my case, are specific to monitors that
               | take a good few seconds to come back from standby. Macos
               | seems to be impatient and considers them dead. I am down
               | to just 1 screen with this problem, and will replace it
               | at some point.
        
           | grishka wrote:
           | The only way I was able to get mine to spin the fans (at
           | seemingly full speed!) was to build the Telegram Android app
           | in "afat" variant. The NDK (which is _still_ x86-only, shame
           | on you Google) fully loads all cores for an extended period
           | of time. The fans start several minutes in.
        
       | oblak wrote:
        | I saw this earlier, but since I can only see the main image and
        | reading didn't net an answer - what, exactly, are they measuring?
        | I am no expert, but to my understanding part of Ryzen's success is
        | high yields due to small size. Having a thing 3x its size, while
        | impressive, is not super surprising.
       | 
       | Soon it will be 6x, then more. Zen 5 small cores are supposed to
       | be just zen 4 cores iirc.
        
       | ajaimk wrote:
       | M1 Ultra has a 3x larger heat-spreader than Ryzen. That's the
       | entire package and not the silicon chip itself.
       | 
       | Also, let's not forget that this heat-spreader includes the RAM
       | under it.
       | 
       | Here's the funny bit: the silicon die for M1 Ultra is actually
       | more than 4x bigger than Ryzen 5000 series. M1 Max was 432 mm2;
        | implying 864 mm2 of silicon. Ryzen 5000 has an 84 mm2 CCD and 124
        | mm2 IO die.
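        | 
        | Working that out (a sketch using the single-CCD figures above;
        | note a 16-core part like the 5950X carries two CCDs):
        | 
        |     m1_ultra = 2 * 432        # two M1 Max dies, mm2
        |     ryzen = 84 + 124          # one CCD + IO die, mm2
        |     print(m1_ultra / ryzen)   # ~4.15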
        
         | [deleted]
        
         | loosescrews wrote:
         | I think the Ryzen 6000 laptop chips are a better comparison.
         | They are also monolithic, contain a powerful GPU, and are
         | manufactured on a more similar node. Ryzen 6000 has a die size
         | of 210mm2.
        
           | jagger27 wrote:
           | It's a slightly better comparison, however Ryzen 6000 is on
           | N6 (which is an optimized N7 process), whereas M1 Max is on
           | N5. N5 is considerably denser than N6.
           | 
           | AMD's next round of APUs on 5 nm will offer a better die area
           | comparison.
        
             | opisthenar84 wrote:
             | I remember reading that at 5nm, electron tunnelling through
             | transistor gates would be a major problem. What happened to
             | that?
        
               | ace2358 wrote:
                | I think they do funky stuff with the 3D or 2.5D geometry
               | of the gate so that the surface area (between
               | transistors) is larger for a smaller footprint.
               | 
               | Someone correct me if I'm wrong!
        
               | jagger27 wrote:
               | Yes, that tech is called FinFET and it has been in use
               | for many years. The new hot thing is GAAFET (Gate-all-
               | around FET).
        
               | aunty_helen wrote:
               | 5nm doesn't mean anything is actually 5nm. It's the "5nm
               | node"
               | 
               | Marketing dept has trumped physics.
        
           | wmf wrote:
           | M1 Pro is 251.3 mm2 with a ~5 TFLOPS GPU and Rembrandt is 210
           | mm2 for 3.4 TFLOPS so the comparison checks out.
        
         | redisman wrote:
         | I'm guessing the price scales at least linearly with the size.
         | I do hope the new trend isn't giant chips because the price and
         | availability would be horrible
        
           | mcronce wrote:
           | It's generally superlinear due to yield effects - this is a
           | bit oversimplified, but in particular, with lithography
           | defects being more or less randomly placed throughout the
           | wafer, a defect in a 100mm2 die impacts a far smaller
           | percentage of the dies in the wafer than a defect in an
           | 800mm2 die. Assuming a constant defect rate per wafer, you'll
           | end up with drastically less usable wafer area with large
           | dies than with small ones.
           | 
           | This was _a_ major reason that AMD was able to price Ryzen so
           | aggressively even at the high end - a high-end CPU is just
           | made of more of the same small dies used in low-end CPUs,
           | albeit requiring a better bin, instead of having to make a
           | much larger single die.
           | 
           | The constant defect rate isn't typically a correct assumption
           | across different products (e.g. different CPU or GPU dies) -
           | different feature designs will have differing defect rates.
           | Maybe Apple was able to design the M1 Ultra's features so
           | that defect rate is very very low, though - I don't really
           | know much about that silicon.
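            | 
            | One common way to put numbers on that is the simple Poisson
            | yield model (a sketch with a made-up defect density, not
            | real fab data):
            | 
            |     import math
            |     D0 = 0.001                 # hypothetical defects per mm2
            |     for die_mm2 in (100, 800):
            |         y = math.exp(-D0 * die_mm2)   # Poisson yield
            |         print(die_mm2, round(y, 2))   # 0.9 vs 0.45
            |     # same defect density, but the big die is scrapped
            |     # roughly half the time, so cost per good die rises
            |     # superlinearly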
        
             | kzrdude wrote:
              | This is probably why the M1 Ultra is designed to be two M1
              | Max dies glued together, more or less.
        
               | addaon wrote:
               | The M1 Ultra would be at or over the reticle limit as a
               | single die, depending on how much area could be saved by
               | removing the inter-die transceivers.
        
               | reitzensteinm wrote:
               | I don't think that's a deal breaker if you're willing to
               | lay thick interconnects, treating the connection much
               | like you'd treat something like EMIB.
               | 
               | After all, Cerebras made a functional wafer sized chip.
        
               | tuvan wrote:
               | Is it actually 2 different pieces of silicon though? I
                | thought they were 2x M1 Max as a single piece of silicon,
                | which wouldn't help with defect rates.
        
               | addaon wrote:
               | It is two separate dice, with an EMIB-style interconnect.
        
               | [deleted]
        
               | mcronce wrote:
               | That makes a lot of sense. The Max is still pretty
               | fucking big, but two of those is a lot more manageable
                | than a single monolithic die double the size
        
               | ace2358 wrote:
               | I just listened to the latest atp.fm podcast. I think
               | John said that it actually is two adjacent M1 max dies.
               | The Ultra can't be made from two random working die. They
               | have to physically be adjacent on the wafer.
               | 
                | I don't know if that means it's physically one die,
                | though.
        
               | wmf wrote:
               | Usually dies aren't mirrored on the wafer so I'm calling
               | citation needed on this one.
        
             | kllrnohj wrote:
             | Apple has both a 48 core & 64 core GPU variant of the M1
             | Ultra. Since the GPU is the largest single element of the
             | M1 Max / Ultra, that's how they are handling yields. And
             | it's then +$1000 for the 64 core variant over the 48 core,
             | suggesting probably "normal" TSMC yields and then a very
             | healthy profit margin. Apple does have the luxury of just
             | charging whatever they want for this SoC since they are a
             | market unto themselves. So even if yields are bad, they can
             | just pass that cost along to the consumer.
             | 
             | Not entirely unlike what Intel used to do with Xeons before
             | AMD re-entered the picture and what Nvidia kinda does with
             | the likes of the A100.
        
             | imachine1980_ wrote:
              | Probably this is why they decided to do it as the last M1
              | product; in general, the more you use a process, the fewer
              | errors happen in production.
        
         | Spartan22 wrote:
          | Why not spread the processor across the whole MacBook
          | motherboard? Wouldn't this bring huge cooling benefits with no
          | major downsides?
        
           | pure_simplicity wrote:
           | The downside would be latency increase / speed reduction.
        
           | 01100011 wrote:
           | Speed of light is something like one foot per nanosecond. So
           | at 2GHz you go 6" per clock, assuming you can actually go the
           | speed of light which, in a wire, you can't.
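            | 
            | Roughly (a sketch; vacuum light speed is only an upper
            | bound, on-chip signals travel well below c):
            | 
            |     c_mm_per_ns = 299.8          # mm per nanosecond
            |     for ghz in (2.0, 3.2):
            |         mm = c_mm_per_ns / ghz   # distance per clock
            |         print(ghz, round(mm))    # 2.0 GHz -> ~150 mm (~6")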
        
           | bufferoverflow wrote:
           | The whole point of going to smaller nm process is to bring
           | things closer, so it takes less time to move information.
        
           | admax88qqq wrote:
           | The shorter signals have to travel, the less time and power
           | it takes to do so.
        
         | WhitneyLand wrote:
          | Yep, it's a poor comparison; the Mac chip has up to 128 GB of
          | RAM under there, and other stuff like the SSD controller, etc.
         | 
         | It is kind of awesome to look at though.
        
       | GeekyBear wrote:
       | The technically interesting bit about the Ultra's GPU is that it
       | makes multiple GPU dies look like one physical GPU to software:
       | 
       | >if you could somehow link up multiple GPUs with a ridiculous
       | amount die-to-die bandwidth - enough to replicate their internal
       | bandwidth - then you might just be able to use them together in a
       | single task. This has made combining multiple GPUs in a
       | transparent fashion something of a holy grail of multi-GPU
       | design. It's a problem that multiple companies have been working
       | on for over a decade, and it would seem that Apple is charting
       | new ground by being the first company to pull it off.
       | 
       | https://www.anandtech.com/show/17306/apple-announces-m1-ultr...
        
         | kyriakos wrote:
         | Isn't this an OS feature?
        
           | wmf wrote:
           | No, I think the OS sees it as a single GPU.
        
             | [deleted]
        
         | smoldesu wrote:
         | You've posted this maybe 3 times in the past week, all on
         | articles that have nothing to do with the multi-die nature of
         | the M1 Ultra. What is the salient information here that we
         | couldn't ascertain from watching Apple's keynote or scrolling
         | through their landing page?
        
           | rat9988 wrote:
           | He is just enthusiastic about some specific point, and wants
           | to discuss it with HN readers. No harm in it.
        
           | GeekyBear wrote:
           | This is literally a point made in this article too.
           | 
           | How is it not relevant in this thread?
        
             | tedunangst wrote:
             | So why not quote this article instead of linking and
             | quoting a different article?
        
               | GeekyBear wrote:
               | Isn't this a terrible choice of websites if you are upset
               | by discussion that calls out a genuinely new
               | technological feat in the computer industry?
        
               | smoldesu wrote:
               | It's not new, AMD has been doing this for a while with
               | their Infinity Fabric technology (and before that,
               | technologies like SLI and Crossfire achieved the same
               | thing on a software level), and you've posted about it
               | several times already. It's also not much of a "feat",
               | since it was done by a company with 200 billion dollars
               | in liquid cash that could probably cure cancer if they
               | wanted to. Not only is it completely uninteresting from a
               | technical standpoint, it's also something you've
               | repeatedly posted on threads that outline the downsides
               | of doing this (like yesterday's benchmark results,
               | showing how strapping 2 M1 Max GPUs together doesn't
               | yield twice as much compute power).
               | 
               | Frankly, it's annoying, and pretty far from
               | intellectually stimulating. What is there to discuss
               | anyways? Do we all need to pat the world's largest tech
               | conglomerate on the back for doing the same thing as
               | other companies did 6+ years ago, with worse performance
               | results and misleading marketing material to boot?
        
               | GeekyBear wrote:
               | >It's not new, AMD has been doing this for a while with
               | their Infinity Fabric technology (and before that,
               | technologies like SLI and Crossfire achieved the same
               | thing on a software level)
               | 
               | None of those technologies make multiple GPU dies look
               | like a single physical GPU to software.
               | 
               | As covered by the Anandtech quote you dislike so much:
               | 
               | > This has made combining multiple GPUs in a transparent
               | fashion something of a holy grail of multi-GPU design.
               | It's a problem that multiple companies have been working
               | on for over a decade, and it would seem that Apple is
               | charting new ground by being the first company to pull it
               | off.
               | 
               | However, that article does go into additional detail.
               | 
               | >Unlike multi-die/multi-chip CPU configurations, which
               | have been commonplace in workstations for decades, multi-
               | die GPU configurations are a far different beast. The
               | amount of internal bandwidth GPUs consume, which for
               | high-end parts is well over 1TB/second, has always made
               | linking them up technologically prohibitive. As a result,
               | in a traditional multi-GPU system (such as the Mac Pro),
               | each GPU is presented as a separate device to the system,
               | and it's up to software vendors to find innovative ways
               | to use them together. In practice, this has meant having
               | multiple GPUs work on different tasks, as the lack of
               | bandwidth meant they can't effectively work together on a
               | single graphics task.
               | 
               | https://www.anandtech.com/show/17306/apple-
               | announces-m1-ultr...
        
               | [deleted]
        
       | olliej wrote:
        | I'd rather see the delidded version, as this is comparing a CPU
        | to a CPU+RAM package, and without delidding we don't know how much
        | space is taken by the RAM dies.
        | 
        | As it is, this is kind of a clickbaity article.
        
       ___________________________________________________________________
       (page generated 2022-03-20 23:01 UTC)