[HN Gopher] M1 Ultra About 3x Bigger Than AMD's Ryzen CPUs
___________________________________________________________________
M1 Ultra About 3x Bigger Than AMD's Ryzen CPUs
Author : Brajeshwar
Score : 96 points
Date : 2022-03-20 16:12 UTC (6 hours ago)
(HTM) web link (www.tomshardware.com)
(TXT) w3m dump (www.tomshardware.com)
| slayerjain wrote:
| I wonder how the M1 Ultra's size compares to the combined
| surface area of an equivalent 'Ryzen CPU + Radeon GPU + RAM' :D
| Brian_K_White wrote:
| But would you even buy the equivalent when the gpu and ram were
| "hard-coded"?
| kutenai wrote:
| yeah, because it does not operate as a space heater. I can
| buy one of those for like $100, and that's a high end model.
| ohgodplsno wrote:
| A 5950X is 285mm2 in total (and made with TSMC 7nm). An RTX 3080
| (GA102) is 628mm2 (and made with Samsung 8nm). A 16Gb DDR5 die
| seems to be about 75mm2.
|
| RAM aside, which is Apple's actual performance differentiator,
| all of these eat the M1 Ultra alive in their respective
| categories.
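|
| As a rough sanity check of those figures (a back-of-the-envelope
| sketch only; the M1 Ultra area uses the 2 x 432mm2 M1 Max number
| cited elsewhere in this thread, and the "equivalent" PC build is
| a hypothetical CPU + GPU + 16GB of DRAM):
|
|   # Approximate die areas in mm^2 (numbers quoted above; rough)
|   ryzen_5950x = 285        # two CCDs + IO die, TSMC N7
|   ga102 = 628              # RTX 3080/3090 die, Samsung 8N
|   ddr5_die = 75            # one ~16Gb DDR5 DRAM die
|   m1_ultra = 2 * 432       # two M1 Max dies, TSMC N5
|
|   # Hypothetical PC platform: CPU + GPU + 8 DRAM dies (~16GB)
|   pc_total = ryzen_5950x + ga102 + 8 * ddr5_die
|   print(m1_ultra, pc_total)   # 864 mm^2 vs 1513 mm^2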
| lhl wrote:
| Going a bit further, since the chips are all on different
| processes (Apple on TSMC N5, AMD on TSMC N7, and Nvidia on
| Samsung 8N), it might also be worth looking at transistor
| counts. The M1 Ultra has 114B transistors, a 5950X has 19.2B
| transistors, and a 3080/3090 (GA102) has 28.3B transistors.
|
| (at 864mm2 (2 x 432mm2 M1 Max dies), that's a density of
| ~130MTr/mm2, which is in the ballpark for TSMC's max density
| for N5 (est. 170MTr/mm2) - N7 max density for reference is
| ~90MTr/mm2)
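|
| Written out, the density arithmetic looks like this (a rough
| sketch; note the 5950X figure lumps the 12nm IO die in with the
| 7nm CCDs, so its core-logic density is understated):
|
|   transistors = {"M1 Ultra": 114e9, "5950X": 19.2e9, "GA102": 28.3e9}
|   die_area    = {"M1 Ultra": 864,   "5950X": 285,    "GA102": 628}
|
|   for chip in transistors:
|       mtr_per_mm2 = transistors[chip] / die_area[chip] / 1e6
|       print(chip, round(mtr_per_mm2))
|   # M1 Ultra ~132 (TSMC N5), 5950X ~67 (N7 + 12nm IO), GA102 ~45 (8N)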
| sofixa wrote:
| > The drives use a proprietary form-factor, which is logical as
| the SSD controller resides in the SoC, and the drives themselves
| do not need to carry one
|
| No, it's not logical, they still could have used a common form
| factor like M.2.
| addaon wrote:
| Besides how malicious it would be to use an M.2 physical
| connector for a non-M.2-compatible protocol, the pin count on
| the Apple connector is much higher for a reason. Adding a chip
| on to the SSD to mux between the different flash signals and
| retransmit them over a reduced number of high speed transceiver
| lanes would be cost-wise most of the way to putting a
| traditional PCIe controller on the SSD, without the benefits.
| Eric_WVGG wrote:
| What, make a socket matching M.2 that M.2 devices would be
| incompatible with?
| formerly_proven wrote:
| Intel actually does just that with wifi cards (CNVio/2 cards
| only work with specific chipsets and don't use PCIe/USB,
| while using the same keying as normal wifi cards).
| wmf wrote:
| An M.2 2230 slot should be able to take Wi-Fi or NVMe since
| they're both PCIe devices. In practice there aren't many
| 2230 SSDs.
| Tijdreiziger wrote:
| CNVi isn't PCIe; non-CNVi platforms can't take CNVi Wi-Fi
| cards.
|
| > CNVi or CNVio ("Connectivity Integration", Intel
| Integrated Connectivity I/O interface) is a proprietary
| connectivity interface by Intel for Wi-Fi and Bluetooth
| radios to lower costs and simplify their wireless
| modules. In CNVi, the network adapter's large and usually
| expensive functional blocks (MAC components, memory,
| processor and associated logic/firmware) are moved inside
| the CPU and chipset (Platform Controller Hub). Only the
| signal processor, analog and Radio frequency (RF)
| functions are left on an external upgradeable CRF
| (Companion RF) module which, as of 2019 comes in M.2 form
| factor (M.2 2230 and 1216 Soldered Down). Therefore, CNVi
| requires chipset and Intel CPU support. Otherwise the Wi-
| Fi + Bluetooth module has to be the traditional M.2 PCIe
| form factor.
|
| https://en.wikipedia.org/wiki/CNVi
| wmf wrote:
| True; I was thinking of normal Wi-Fi cards.
| aenis wrote:
| Right, but that would hardly serve any practical purpose, as
| one still couldn't plug in an off-the-shelf M.2 device and
| expect it to work. So why constrain yourself with someone
| else's connector design if it adds no value?
| rowanG077 wrote:
| That would make zero sense. It would even be confusing, since
| no M.2 device would work in that slot.
| dpedu wrote:
| Isn't the M1 Ultra more comparable to a SoC than the discrete
| Ryzen CPUs are?
| rfoo wrote:
| Ryzen chips are, more or less, SoCs.
| Groxx wrote:
| Fair. I think we can probably consider them to be _less of a_
| SoC than the M1 Ultra though - Ryzens do have a lot in them,
| but not all the system's RAM for example.
|
| (which is a good thing. all-in-one for specialized stuff is
| fine, but I don't want it to become the norm - upgrading one
| component is cheaper than upgrading multiple, and tends to
| support longer hardware cycles)
| DiabloD3 wrote:
| AMD internally refers to the finished product as the SoC, and
| the chipset (no longer a northbridge, the NB is on die) as a
| socket extender (a term also used by most SoC vendors) in
| internal documentation.
|
| Also, arguably, the IO Die is the SoC, the chiplets are
| external to the SoC even though they all live on the same
| interposer.
| whatever1 wrote:
| I am very curious about the production yields of these monsters.
| Things can go wrong in the CPU, the GPU, the memory, etc., and
| any of these will send a chip to the trash bin.
| ch_123 wrote:
| The M1 Ultra is essentially two M1 Max joined together. I
| suspect most of the things which can go wrong are at the M1 Max
| level. I'm sure there's a certain percentage where fusing two
| dies together kills one or both dies, but I suspect that's a
| manageable percentage.
| grishka wrote:
| This is the reason there are lower-end models of all M1 chips
| with some CPU and GPU cores disabled. And the M1 Ultra is just
| two M1 Max glued together.
| sysbot wrote:
| In general the lower-yield dies end up as the lesser model,
| such as the M1 Pro using the same die.
| deergomoo wrote:
| On the latest episode of the Accidental Tech Podcast they were
| discussing a patent related to this fancy interconnect stuff,
| and it sounds like not only do they need two fully functioning
| M1 Max's, but they also need to be adjacent on the wafer.
| Although I guess at least they can still use the non-defective
| M1 Max in any pair.
| Applejinx wrote:
| That would be an interesting approach: don't know if it's
| what they're doing, but if they're like 'make M1 Ultras
| predominantly, but then if they break you're making M1 Max
| with an attached bit that has to be trimmed off', that's
| pretty ingenious.
| grishka wrote:
| I think I've seen a picture of a decapped M1 Max and it had
| this extra part on the side that never appeared in Apple
| renders. It was back then when this speculation of it
| having multiprocessor interconnect started. It was then
| also confirmed by Asahi Linux people from the software side
| of things.
|
| So no, they don't even bother to trim it off.
| rowanG077 wrote:
| Without delidding both you can't really tell anything. It might
| very well be that 80% of the area of the ultra is just SDRAM.
| Flankk wrote:
| The CPU is the same size as the thermal paste residue shown in
| the pictures. The remaining space is for the two RAM chips. The
| author should be embarrassed.
| readams wrote:
| SDRAM is not located on the chip die for either. The memory
| controller is on-die for the M1 though. My understanding is
| that making SDRAM is a pretty different process than making a
| CPU, so it's hard to do it on the same die.
| monocasa wrote:
| This article is comparing package sizes, and DRAM is on
| package for the M1s.
| MaxMoney wrote:
| trdtaylor1 wrote:
| You know, there's a few articles out on this size difference
| subject; if they can make my CPU 10% faster by doubling the
| physical size -- all other things being equal I'd take that.
| uluyol wrote:
| Cost increases super-linearly with size. One reason for this is
| defects: if a single defect ruins a whole chip, then for a
| constant number of defects per square inch, you'll get more
| usable square inches of silicon when you have small chips than
| big ones. Of course you can build chips that can tolerate a few
| defects, but the principle holds.
|
| This is also why high quality TVs are harder to manufacture
| than high quality phone displays. You have a lot more waste
| when you need to throw out/recycle a TV screen compared to a
| phone screen. And both are considered bad when they have just
| one bad pixel.
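|
| A minimal sketch of that defect effect, using the simplest
| textbook yield model (Poisson: yield = exp(-D*A), where D is the
| defect density and A the die area; real fabs use fancier models,
| and the defect density below is made up for illustration):
|
|   import math
|
|   def poisson_yield(defects_per_mm2, die_area_mm2):
|       # fraction of dies with zero defects, defects landing at random
|       return math.exp(-defects_per_mm2 * die_area_mm2)
|
|   D = 0.001  # defects per mm^2 -- illustrative, not a real fab figure
|   for area in (100, 300, 800):
|       print(area, round(poisson_yield(D, area), 2))
|   # 100 mm^2 -> 0.9, 300 mm^2 -> 0.74, 800 mm^2 -> 0.45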
| adtac wrote:
| >for a constant number of defects per square inch
|
| Is this assumption true?
| Qub3d wrote:
| It's more the geometry of the process than the rate of
| failure. The post is just saying, "if the rate were
| constant". Basically, throw a dart at a wafer. Wherever the
| dart hits, the whole circuit/chip containing that point is
| now worthless. Assuming you threw the same number of darts
| (failures) at a wafer of smaller chips and a wafer of larger
| chips, the wafer with the larger chips loses more total
| silicon (as a percentage of usable area).
|
| With smaller chips, each dart lands in a smaller grid cell.
| So, the total number of failures (darts) being the same, the
| wafer with the bigger grid cells ends up with less usable
| silicon, since each failure throws away a lot more surface
| area.
|
| Take a look at this picture of a failure map, if you would
| like some visuals:
| https://ars.els-cdn.com/content/image/1-s2.0-S09521976120008...
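|
| The dart analogy is easy to simulate directly (a toy sketch: a
| square wafer, uniformly random "darts", and a die is scrapped if
| any dart lands on it; all numbers are made up):
|
|   import random
|
|   def usable_fraction(wafer_mm, die_mm, darts, trials=500):
|       per_side = wafer_mm // die_mm
|       total = 0.0
|       for _ in range(trials):
|           # set of die "grid cells" hit by at least one dart
|           hit = {(int(random.uniform(0, wafer_mm) // die_mm),
|                   int(random.uniform(0, wafer_mm) // die_mm))
|                  for _ in range(darts)}
|           total += (per_side**2 - len(hit)) / per_side**2
|       return total / trials
|
|   print(usable_fraction(300, 10, 20))  # ~0.98 usable with 10mm dies
|   print(usable_fraction(300, 30, 20))  # ~0.82 usable with 30mm dies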
| jmole wrote:
| Would you pay 2-3x the price? That's the tradeoff...
| rfoo wrote:
| Given that Apple devices already have 2-3x premium, as long
| as Apple absorbs the extra cost, people would be fine with
| it.
|
| This is also why the approach is an "Apple-only" one.
| shiftingleft wrote:
| Does the M1 Ultra use a chiplet design like the Ryzen series?
| Otherwise the yields have to be awful.
| addaon wrote:
| The ultra uses two identical dice in a single package, with an
| EMIB-style interconnect. This is fewer, larger, and more
| uniform dice than the Ryzen approach, but still not monolithic.
| gbolcer wrote:
| My favorite quote: "It's sort of like arguing that because your
| electric car can use dramatically less fuel when driving at 80
| miles per hour than a Lamborghini, it has a better engine --
| without mentioning the fact that a Lambo can still go twice as
| fast."
|
| https://www.macrumors.com/2022/03/17/m1-ultra-nvidia-rtx-309...
| stavros wrote:
| Right, but if I can strap two electric cars together and go as
| fast as the Lamborghini for less fuel, isn't that still better?
| jayd16 wrote:
| That link was about GPU power and according to the benchmarks
| it can't go as fast with two.
| stavros wrote:
| I did read the link, this indicated to me that it sort-of
| can:
|
| > Apple's M1 Ultra is essentially two M1 Max chips
| connected together, and as The Verge highlighted in its
| full Mac Studio review, Apple has managed to successfully
| get double the M1 Max performance out of the M1 Ultra,
| which is a notable feat that other chip makers cannot
| match.
|
| Unless you're talking about power per watt? GPU workloads
| are generally inherently parallelizable so performance per
| watt is actually what generally matters, rather than raw
| FLOPS.
| pengaru wrote:
| In the analogy the Lamborghini would be an RTX 3090, and
| the macrumors article makes it abundantly clear the Ultra
| isn't even half of the RTX 3090 performance.
| stavros wrote:
| Right, but rather than getting bogged down with the
| analogy, do we agree that performance per watt is what
| matters? It doesn't matter whether you need 2 or 2.43 of
| them to match an RTX 3090, if it has better performance
| per watt, it's better.
| pengaru wrote:
| > Right, but rather than getting bogged down with the
| analogy, do we agree that performance per watt is what
| matters?
|
| No, I am not in agreement with you, the RTX 3090 is
| clearly better at being a GPU.
| hunterb123 wrote:
| > clearly better
|
| better is subjective, it depends on your needs.
|
| sometimes performance per watt is more important
| especially when you're mobile or trying to be
| conservative w/ electricity.
| pengaru wrote:
| > sometimes performance per watt is more important
| especially when you're mobile or trying to be
| conservative w/ electricity.
|
| There is an obvious implicit context when you're
| comparing _performance_ to an RTX 3090, and it is _not_
| mobility or your electric bill.
| hunterb123 wrote:
| There may be only one implicit context for you, which is
| why I said "better" is subjective.
|
| Personally the M1 Ultra would be better imo as a desktop
| GPU than RTX 3090 in certain contexts. Putting it in a
| van, traveling with it, etc.
|
| I'll probably wait until the next go around of them, but
| I'm loving nearing desktop performance w/ reduced loads.
| pengaru wrote:
| The RTX series has lower power variants. When you compare
| against the 3090 you are establishing a context with
| power consumption as a low priority.
|
| Considering the 3090 had over double the performance, the
| lower power parts may well be both faster and more power
| efficient. You'd have to do the comparison to know, if
| that's what you care more about.
| hunterb123 wrote:
| > 3090 had over double the performance
|
| The FPS gap was less than double. It was the flawed
| benchmark where the 3090 had double.
|
| I'm not sure how the lower power RTX variants compare,
| maybe that's a closer comparison, has there been a review
| against those?
| Applejinx wrote:
| I would ask if the RTX 3090 is competitive in some of the
| specialized things the Mac chip is made to do: there's a
| hell of a lot of dedicated video processing stuff in
| there that seems to exist to serve Final Cut Pro, not
| gaming. Are we comparing apples and oranges, in the sense
| that video editing at 4k, 5k and beyond might call upon a
| different set of skills than gaming benchmarks? How much
| would it matter if this translated into rather
| proprietary things like ability to pump impossible
| amounts of ProRes video compression on the fly and scrub
| ridiculously giant frame sizes? Are there benchmarks
| along those lines?
| p1necone wrote:
| Performance per watt is very important for devices that
| need to run on battery and for cloud compute where you're
| running thousands of them and saving on
| electricity/cooling would be significant.
|
| But it's almost completely unimportant for always-
| plugged-in-to-wall-socket desktop chips. Because even the
| power consumption of stuff like the 3090 is still pretty
| insignificant on your power bill compared to up front
| cost, and relative to the power consumption of all the
| other stuff in your house.
|
| Remember even the 3090 consumes very little power when
| idle - it's only pulling the rated ~300w when you're
| maxing it out playing heavy games or rendering or
| whatever.
| smoldesu wrote:
| > do we agree that performance per watt is what matters?
|
| No? Certainly not on desktop computers.
| readams wrote:
| It doesn't go as fast as a 3090 even with two. It does go
| about twice as fast as an M1 Max.
| jayd16 wrote:
| First graph with geekbench scores.
|
| RTX3090: 215,034
|
| M1 Ultra: 102,156
| mikhailt wrote:
| Geekbench was debunked a long time ago by Anandtech as not
| sustaining the full load long enough to show the real perf.
|
| The CPU and GPU tests finish too fast for the system to ramp
| up to the full power load.
|
| That's not to say that M1 Ultra is equal or better than
| 3090, just that Geekbench isn't the right test for this.
|
| You can see more detailed analysis here:
| https://www.youtube.com/watch?v=pJ7WN3yome4
| smoldesu wrote:
| Is that an indication that the M1 has latency issues vis-
| a-vis power delivery, then?
| g42gregory wrote:
| That's nice to know, but for me the M1 Ultra performance
| is not very relevant. Apple has a history of introducing
| proprietary architectures, not-quite-compatible interfaces,
| and non-standard software.
|
| MacOS has nice UI? Great, I will use it. Anything
| computationally intensive? - Why bother? I would use only
| Linux on the back end. It's very easy to couple a mid-range
| Apple product with a high-end Linux workstation at your desk.
| I don't need to run a compute-intensive deep learning
| load while I am at the coffee shop.
| jayd16 wrote:
| >I don't need to run compute-intensive Deep Learning load
| while I am at the coffee shop.
|
| Well they're putting it in a desktop and claiming desktop
| performance...
|
| If Apple is going to put up graphs saying they beat i9
| and RTX 3090 machines, it's worth investigating even if I
| plan to put Linux on it.
|
| In this instance it seems like the reality doesn't match
| the hype as well as in the laptop space, but if it had I
| would surely consider the option.
| arecurrence wrote:
| The Geekbench compute test is widely viewed as worthless
| for the M1 Ultra because it's too fast for the chip to ramp
| up. It's even too short for the M1 Max.
| ohgodplsno wrote:
| No. You have two electric cars both going at 80 mph, with
| twice the weight. Strap them together aaaaaand.... They're
| still two electric cars going together at 80mph. But if
| you're counting wheel rotations, well now you have 8 spinning
| wheels. Top speed will still not match the Lamborghini.
| stavros wrote:
| What's it called when you make an analogy for convenience
| and someone stretches it way past its breaking point as if
| it still applies to the original point?
| ask_b123 wrote:
| https://english.stackexchange.com/questions/305332/is-there-...
|
| > overstretched or overblown analogy / _reductio ad
| absurdum_
| mikhailt wrote:
| According to Max Tech analysis here,
| https://www.youtube.com/watch?v=pJ7WN3yome4, the GPU seems to
| max out at 60-70w at the moment for full load, even if the heat
| is well-managed. If Apple let it go up to 100-120w, we could
| see much more performance out of the GPU but for now, something
| is not set up correctly on macOS.
|
| So, we may have to wait to see what Apple says or if they'll
| unlock the full performance in future software updates. They
| did do this in the past with M1 hardware a few times.
|
| Same result here from another channel where it appears GPU
| isn't at full power atm:
| https://www.youtube.com/watch?v=0CUoHwMtRsE&t=1172s
| GeekyBear wrote:
| The strategy of creating a large die with lots of execution
| units, then running the chip at a lower clock speed for power
| efficiency has been Apple's way of doing things for a while
| now.
|
| >Apple has built a wide enough GPU that they can keep
| clockspeeds nice and low on the voltage/frequency curve,
| which keeps overall power consumption down. The RTX 3090, by
| contrast, is designed to chase performance with no regard to
| power consumption, allowing NVIDIA to get great performance
| out of it, but only by riding high on the voltage frequency
| curve.
|
| https://www.anandtech.com/show/17306/apple-announces-m1-ultr...
|
| This strategy is something that Anandtech's been calling out
| since the iPhone 5s chip.
|
| >Brian and I have long been hinting at the sort of ridiculous
| frequency/voltage combinations mobile SoC vendors have been
| shipping at for nothing more than marketing purposes. I
| remember ARM telling me the ideal target for a Cortex A15
| core in a smartphone was 1.2GHz. Samsung's Exynos 5410 stuck
| four Cortex A15s in a phone with a max clock of 1.6GHz. The
| 5420 increases that to 1.7GHz. The problem with frequency
| scaling alone is that it typically comes at the price of
| higher voltage. There's a quadratic relationship between
| voltage and power consumption, so it's quite possibly one of
| the worst ways to get more performance.
|
| https://www.anandtech.com/show/7335/the-iphone-5s-review/2
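|
| For reference, the relationship being alluded to is the standard
| dynamic-power approximation P ~ C * V^2 * f (a sketch with
| made-up numbers; chasing higher frequency usually also means
| raising voltage, which is where the disproportionate cost comes
| from):
|
|   def dynamic_power(c_eff, volts, freq_hz):
|       # classic switching-power approximation: P ~ C * V^2 * f
|       return c_eff * volts**2 * freq_hz
|
|   base = dynamic_power(1e-9, 0.80, 2.0e9)   # ~1.28 W
|   fast = dynamic_power(1e-9, 1.00, 2.6e9)   # ~2.60 W
|   print(fast / base)  # ~2.0x the power for ~1.3x the clock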
| mikhailt wrote:
| > The strategy of creating a large die with lots of
| execution units, then running the chip at a lower clock
| speed for power efficiency has been Apple's way of doing
| things for a while now.
|
| > This strategy is something that Anandtech's been calling
| out since the iPhone 5s chip.
|
| Not sure that applies to M1 Ultra that's clearly marketed
| as a desktop SoC, not a mobile device or laptop SoC.
| Optimizing for energy efficiency or battery life is not the
| same as optimizing for performance first, which is what M1
| Ultra is designed for. Not to mention, there's a reason
| Windows and Linux have multiple power profiles that change
| how schedulers and power loads work. macOS has "high power
| mode" for m1 on laptops and yet strangely, it is not even
| available on the Mac Studio.
|
| Keep in mind that Apple showed a graph that clearly shows
| the GPU was at 100-110w. Why do that if they won't run it
| at that wattage or even talk about 3090 in the first place?
| Why ruin their reputation over a silly little thing?
| Everyone on the planet will easily debunk that.
|
| Also, why even bother adding a high-quality cooling system
| that the Mac Studio clearly doesn't need, since both CPU and
| GPU are going to be capped at a lower wattage?
|
| We'll find out eventually one way or another.
| GeekyBear wrote:
| > Not sure that applies to M1 Ultra that's clearly
| marketed as a desktop SoC, not a mobile device or laptop
| SoC.
|
| The M1 Ultra is using the same CPU cores and GPU cores as
| an iPhone, just more of them.
|
| It's long been their strategy across the board. Throw
| silicon die area at tons of execution units and run the
| chip at a lower clock speed for power efficiency.
|
| For example, the 2012 A5X iPad chip had about the same
| die area as 4 core Ivy Bridge, which was huge for a
| mobile device SOC.
|
| https://www.anandtech.com/show/6330/the-iphone-5-review/4
|
| Since they are selling the whole widget, they aren't as
| dependent on minimizing die area to maximize their profit
| margin.
| addaon wrote:
| > The M1 Ultra is using the same CPU cores and GPU cores
| as an iPhone, just more of them.
|
| Not only that, but it's running them at basically the
| same clock speeds as the rest of the M1 family, and even
| as the iPhone. This suggests that they're running at not
| far above a best efficiency point, rather than way past
| the elbow of the voltage curve as is traditional for
| desktop chips.
| jleahy wrote:
| I would have thought this approach would cause issues for
| leakage power? Better to run fast then sleep and power gate
| to cut leakage, especially at smaller geometries.
|
| Or do you think they are mostly using HVt cells?
| GeekyBear wrote:
| Doesn't power leakage get much, much worse as temperature
| increases?
| ohgodplsno wrote:
| > If Apple let it go up to 100-120w
|
| GPUs and CPUs are not simply "pump more power in it to go
| faster hurr hurr". It is extremely likely that the M1's
| maximum performance is attained already. Otherwise, the M1
| ultra would have pumped a little bit more wattage and gotten
| better single core perfs, but guess what, it doesn't.
| mikhailt wrote:
| > GPUs and CPUs are not simply "pump more power in it to go
| faster hurr hurr".
|
| I agree, they're not a simple "throw more power and it'll
| go faster" problem. However, that doesn't mean they do not
| go faster if you increase clock speed and give it a little
| more power.
|
| After all, overclocking would have been utterly pointless over
| the past two decades, and Intel/AMD wouldn't benefit from the
| so-called turbo boost technology. (I know M1 doesn't have
| turbo boost).
|
| Just to be clear, yes, there's a limit to everything of
| course. There's a balance where certain higher clocks would
| cause a bottleneck in the rest of the system since this is
| a SoC.
|
| In this case, I used the 100-120w number because Apple used
| that number in their GPU graph showing the GPU running at
| ~105w. Why bother showing that if it can't accomplish it?
| Here's the graph:
| https://images.anandtech.com/doci/17306/Apple-M1-Ultra-gpu-p...
|
| Note that Apple shows 60w for their CPU graph
| (https://images.anandtech.com/doci/17306/Apple-M1-Ultra-cpu-p...),
| which did actually match up and does show the double CPU perf
| that Apple stated.
|
| I didn't pull that number out of nowhere. If Apple used 60w
| in that graph, then I wouldn't be here.
|
| > Otherwise, the M1 ultra would have pumped a little bit
| more wattage and gotten better single core perfs, but guess
| what, it doesn't.
|
| That's because of two reasons:
|
| 1. There is only so much data we can "process" within the
| same core; more power does not change the data itself, we
| can only increase the clock speed to finish it faster or
| widen the amount of data that can fit in the same core.
| Also of note, Intel Turbo Boost does actually help with
| single-core perf by boosting the single-core speed beyond
| the baseline.
|
| 2. The M1's max clock speed is set to 3.2GHz (single core);
| it can't go faster than this (set by Apple). Throwing more
| wattage here wouldn't change anything, but throwing more
| power to allow all 16 P cores to hit 3.2 does improve
| performance as long as it isn't overheating. I don't think
| the M1 Ultra does hit 3.2 on all 16 cores (it might have
| hit the 60w max first; I might be wrong, but I saw mostly
| just 3.0GHz in the Max Tech video and have to go back to
| review it).
|
| Again, this is not about single-core performance, this is
| about GPU performance, which is extremely parallelizable
| and more comparable to the multi-core performance instead.
| mrtksn wrote:
| You know what, Mercedes uses Renault engines in some of its
| cars that are intended for luxury daily use.
|
| Better != Faster or Better != More powerful
|
| Better means closer to fulfilling the goals, and often the goals
| are in tension (i.e. cheaper, faster, cooler are the goals, but
| faster means more expensive and hotter).
|
| Apple might be guilty of misrepresentation of raw computational
| power but with M1 the experience of using computers has become
| significantly better.
| davewritescode wrote:
| Mercedes only uses Renault engines in its absolute bottom of
| the barrel products.
|
| Mercedes, unlike BMW with Mini and Audi with the VW parts bin,
| has to outsource the low-end stuff.
| marcodiego wrote:
| Actually, better depends on the criteria. A Volkswagen Beetle
| can be better than any of the listed cars if you consider the
| type of road and ease of repair.
| homarp wrote:
| For the curious: https://luxurycarsa2z.com/which-mercedes-have-a-renault-engi...
| phyalow wrote:
| I am quite happy with my MacBook Pro M1 Max, which remains
| virtually ice cold even running heavy compute (I have yet to
| hear the fans spin up) and has a battery life that easily
| accommodates 12+ hours of real work. Any Windows laptop with a
| 3090 inside it will be a hot and loud brick in comparison.
| Granted, I don't play video games and my ML compute runs in
| the cloud, so it fits my needs nicely. I would argue that in
| terms of sheer generalised productivity nothing else comes
| close.
| aenis wrote:
| Same here. Great computer. The only thing I am missing is a
| bit more reliable support for waking up external monitors.
| Between the two M1/M1 Max Macs and 6 screens I now use, one
| (an Asus 2K display which is slow to wake up from sleep)
| often requires unplugging to wake up. That is not an M1 issue
| - it's been like that with macOS since... forever, but it's
| irritating.
| Osiris wrote:
| I use a 4 monitor setup for my workstation. I switched to a
| desktop computer in part to not have to deal with this
| exact issue with a MacBook.
| flatiron wrote:
| One big thing holding me back from jumping on one is macOS.
| I'm sure it's OK, but I've been using Linux for ~25 years and
| that's a lot of muscle memory to lose. I really wish Apple
| supported Linux better.
| brettdong wrote:
| I hope Asahi Linux matures and becomes daily-driver usable
| in the near future.
| srcreigh wrote:
| What is your monitor layout?
| aenis wrote:
| I have those 6 screens plugged into two different Macs, so
| just 3 on each. Big screen in the center, and two in
| portrait mode on the sides. One uses a 43" 4K for the
| center screen, the other uses a 34" ultrawide.
|
| The M1 Max drives its displays directly, the M1 goes via a
| starlink triple-4K DisplayLink hub.
|
| The problems, in my case, are specific to monitors that
| take a good few seconds to come back from standby. macOS
| seems to be impatient and considers them dead. I am down
| to just 1 screen with this problem, and will replace it
| at some point.
| grishka wrote:
| The only way I was able to get mine to spin the fans (at
| seemingly full speed!) was to build the Telegram Android app
| in "afat" variant. The NDK (which is _still_ x86-only, shame
| on you Google) fully loads all cores for an extended period
| of time. The fans start several minutes in.
| oblak wrote:
| I saw this earlier, but since I can only see the main image and
| reading didn't net an answer - what, exactly, are they measuring?
| I am no expert, but to my understanding part of Ryzen's success
| is high yields due to small die size. Having a thing 3x its
| size, while impressive, is not super surprising.
|
| Soon it will be 6x, then more. Zen 5's small cores are supposed
| to be just Zen 4 cores, IIRC.
| ajaimk wrote:
| M1 Ultra has a 3x larger heat-spreader than Ryzen. That's the
| entire package and not the silicon chip itself.
|
| Also, let's not forget that this heat-spreader includes the RAM
| under it.
|
| Here's the funny bit: the silicon die for M1 Ultra is actually
| more than 4x bigger than Ryzen 5000 series. M1 Max was 432 mm2;
| implying 864 mm2 of silicon. Ryzen 5000 has an 84 mm2 CCD and a
| 124 mm2 IO die.
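|
| Spelling that out (a quick check; the CCD and IO die figures are
| the commonly cited ones for Ryzen 5000 desktop parts):
|
|   m1_ultra = 2 * 432            # two M1 Max dies, mm^2
|   ryzen_1ccd = 84 + 124         # one CCD + IO die = 208 mm^2
|   ryzen_2ccd = 2 * 84 + 124     # 16-core part = 292 mm^2
|   print(m1_ultra / ryzen_1ccd)  # ~4.2x
|   print(m1_ultra / ryzen_2ccd)  # ~3.0x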
| [deleted]
| loosescrews wrote:
| I think the Ryzen 6000 laptop chips are a better comparison.
| They are also monolithic, contain a powerful GPU, and are
| manufactured on a more similar node. Ryzen 6000 has a die size
| of 210mm2.
| jagger27 wrote:
| It's a slightly better comparison, however Ryzen 6000 is on
| N6 (which is an optimized N7 process), whereas M1 Max is on
| N5. N5 is considerably denser than N6.
|
| AMD's next round of APUs on 5 nm will offer a better die area
| comparison.
| opisthenar84 wrote:
| I remember reading that at 5nm, electron tunnelling through
| transistor gates would be a major problem. What happened to
| that?
| ace2358 wrote:
| I think they do funky stuff with the 3D or 2.5D geometry
| of the gate so that the surface area (between
| transistors) is larger for a smaller footprint.
|
| Someone correct me if I'm wrong!
| jagger27 wrote:
| Yes, that tech is called FinFET and it has been in use
| for many years. The new hot thing is GAAFET (Gate-all-
| around FET).
| aunty_helen wrote:
| 5nm doesn't mean anything is actually 5nm. It's the "5nm
| node"
|
| Marketing dept has trumped physics.
| wmf wrote:
| M1 Pro is 251.3 mm2 with a ~5 TFLOPS GPU and Rembrandt is 210
| mm2 for 3.4 TFLOPS so the comparison checks out.
| redisman wrote:
| I'm guessing the price scales at least linearly with the size.
| I do hope the new trend isn't giant chips because the price and
| availability would be horrible
| mcronce wrote:
| It's generally superlinear due to yield effects - this is a
| bit oversimplified, but in particular, with lithography
| defects being more or less randomly placed throughout the
| wafer, a defect in a 100mm2 die impacts a far smaller
| percentage of the dies in the wafer than a defect in an
| 800mm2 die. Assuming a constant defect rate per wafer, you'll
| end up with drastically less usable wafer area with large
| dies than with small ones.
|
| This was _a_ major reason that AMD was able to price Ryzen so
| aggressively even at the high end - a high-end CPU is just
| made of more of the same small dies used in low-end CPUs,
| albeit requiring a better bin, instead of having to make a
| much larger single die.
|
| The constant defect rate isn't typically a correct assumption
| across different products (e.g. different CPU or GPU dies) -
| different feature designs will have differing defect rates.
| Maybe Apple was able to design the M1 Ultra's features so
| that defect rate is very very low, though - I don't really
| know much about that silicon.
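|
| A rough sketch of why the cost is superlinear (a simple
| exponential/Poisson yield model, ignoring edge losses; the wafer
| area and defect density below are illustrative only):
|
|   import math
|
|   WAFER_AREA = 70_000   # mm^2, roughly a 300mm wafer
|   D = 0.001             # defects per mm^2, made up for illustration
|
|   for die_area in (100, 800):
|       candidates = WAFER_AREA // die_area
|       good = candidates * math.exp(-D * die_area)
|       print(die_area, candidates, round(good))
|   # 100 mm^2: 700 candidates, ~633 good dies
|   # 800 mm^2:  87 candidates,  ~39 good dies
|   # so the 8x-larger die costs ~16x as much per good die here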
| kzrdude wrote:
| This is probably why the M1 Ultra is designed to be two M1
| Max dies glued together, more or less.
| addaon wrote:
| The M1 Ultra would be at or over the reticle limit as a
| single die, depending on how much area could be saved by
| removing the inter-die transceivers.
| reitzensteinm wrote:
| I don't think that's a deal breaker if you're willing to
| lay thick interconnects, treating the connection much
| like you'd treat something like EMIB.
|
| After all, Cerebras made a functional wafer sized chip.
| tuvan wrote:
| Is it actually 2 different pieces of silicon though? I
| thought they were 2x M1 Max as a single piece of silicon,
| which wouldn't help with defect rates.
| addaon wrote:
| It is two separate dice, with an EMIB-style interconnect.
| [deleted]
| mcronce wrote:
| That makes a lot of sense. The Max is still pretty
| fucking big, but two of those is a lot more manageable
| than a single monolithic die double the size.
| ace2358 wrote:
| I just listened to the latest atp.fm podcast. I think
| John said that it actually is two adjacent M1 max dies.
| The Ultra can't be made from two random working die. They
| have to physically be adjacent on the wafer.
|
| I don't know if that means it's physically one die,
| though.
| wmf wrote:
| Usually dies aren't mirrored on the wafer so I'm calling
| citation needed on this one.
| kllrnohj wrote:
| Apple has both a 48 core & 64 core GPU variant of the M1
| Ultra. Since the GPU is the largest single element of the
| M1 Max / Ultra, that's how they are handling yields. And
| it's then +$1000 for the 64 core variant over the 48 core,
| suggesting probably "normal" TSMC yields and then a very
| healthy profit margin. Apple does have the luxury of just
| charging whatever they want for this SoC since they are a
| market unto themselves. So even if yields are bad, they can
| just pass that cost along to the consumer.
|
| Not entirely unlike what Intel used to do with Xeons before
| AMD re-entered the picture and what Nvidia kinda does with
| the likes of the A100.
| imachine1980_ wrote:
| Probably this is why they decided to do it as the last M1
| product; in general, the more you use a process, the fewer
| errors happen in production.
| Spartan22 wrote:
| Why not spread the processor across the whole MacBook
| motherboard? Wouldn't this bring huge cooling benefits with no
| major downsides?
| pure_simplicity wrote:
| The downside would be latency increase / speed reduction.
| 01100011 wrote:
| Speed of light is something like one foot per nanosecond. So
| at 2GHz you go 6" per clock, assuming you can actually go the
| speed of light which, in a wire, you can't.
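|
| Roughly, in numbers (signals on real interconnect travel closer
| to half the vacuum speed of light, so the real budget is even
| tighter):
|
|   C = 3e8                          # m/s, speed of light in vacuum
|   period_s = 1 / 2e9               # 0.5 ns per cycle at 2 GHz
|   best_case = C * period_s         # 0.15 m, ~6 inches per clock
|   realistic = 0.5 * C * period_s   # ~3 inches at ~0.5c
|   print(best_case, realistic)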
| bufferoverflow wrote:
| The whole point of going to smaller nm process is to bring
| things closer, so it takes less time to move information.
| admax88qqq wrote:
| The shorter signals have to travel, the less time and power
| it takes to do so.
| WhitneyLand wrote:
| Yep, it's a poor comparison: the Mac chip has up to 128 gigs of
| RAM under there, and other stuff like SSD controller, etc..
|
| It is kind of awesome to look at though.
| GeekyBear wrote:
| The technically interesting bit about the Ultra's GPU is that it
| makes multiple GPU dies look like one physical GPU to software:
|
| >if you could somehow link up multiple GPUs with a ridiculous
| amount die-to-die bandwidth - enough to replicate their internal
| bandwidth - then you might just be able to use them together in a
| single task. This has made combining multiple GPUs in a
| transparent fashion something of a holy grail of multi-GPU
| design. It's a problem that multiple companies have been working
| on for over a decade, and it would seem that Apple is charting
| new ground by being the first company to pull it off.
|
| https://www.anandtech.com/show/17306/apple-announces-m1-ultr...
| kyriakos wrote:
| Isn't this an OS feature?
| wmf wrote:
| No, I think the OS sees it as a single GPU.
| [deleted]
| smoldesu wrote:
| You've posted this maybe 3 times in the past week, all on
| articles that have nothing to do with the multi-die nature of
| the M1 Ultra. What is the salient information here that we
| couldn't ascertain from watching Apple's keynote or scrolling
| through their landing page?
| rat9988 wrote:
| He is just enthusiastic about some specific point, and wants
| to discuss it with HN readers. No harm in it.
| GeekyBear wrote:
| This is literally a point made in this article too.
|
| How is it not relevant in this thread?
| tedunangst wrote:
| So why not quote this article instead of linking and
| quoting a different article?
| GeekyBear wrote:
| Isn't this a terrible choice of websites if you are upset
| by discussion that calls out a genuinely new
| technological feat in the computer industry?
| smoldesu wrote:
| It's not new, AMD has been doing this for a while with
| their Infinity Fabric technology (and before that,
| technologies like SLI and Crossfire achieved the same
| thing on a software level), and you've posted about it
| several times already. It's also not much of a "feat",
| since it was done by a company with 200 billion dollars
| in liquid cash that could probably cure cancer if they
| wanted to. Not only is it completely uninteresting from a
| technical standpoint, it's also something you've
| repeatedly posted on threads that outline the downsides
| of doing this (like yesterday's benchmark results,
| showing how strapping 2 M1 Max GPUs together doesn't
| yield twice as much compute power).
|
| Frankly, it's annoying, and pretty far from
| intellectually stimulating. What is there to discuss
| anyways? Do we all need to pat the world's largest tech
| conglomerate on the back for doing the same thing as
| other companies did 6+ years ago, with worse performance
| results and misleading marketing material to boot?
| GeekyBear wrote:
| >It's not new, AMD has been doing this for a while with
| their Infinity Fabric technology (and before that,
| technologies like SLI and Crossfire achieved the same
| thing on a software level)
|
| None of those technologies make multiple GPU dies look
| like a single physical GPU to software.
|
| As covered by the Anandtech quote you dislike so much:
|
| > This has made combining multiple GPUs in a transparent
| fashion something of a holy grail of multi-GPU design.
| It's a problem that multiple companies have been working
| on for over a decade, and it would seem that Apple is
| charting new ground by being the first company to pull it
| off.
|
| However, that article does go into additional detail.
|
| >Unlike multi-die/multi-chip CPU configurations, which
| have been commonplace in workstations for decades, multi-
| die GPU configurations are a far different beast. The
| amount of internal bandwidth GPUs consume, which for
| high-end parts is well over 1TB/second, has always made
| linking them up technologically prohibitive. As a result,
| in a traditional multi-GPU system (such as the Mac Pro),
| each GPU is presented as a separate device to the system,
| and it's up to software vendors to find innovative ways
| to use them together. In practice, this has meant having
| multiple GPUs work on different tasks, as the lack of
| bandwidth meant they can't effectively work together on a
| single graphics task.
|
| https://www.anandtech.com/show/17306/apple-announces-m1-ultr...
| [deleted]
| olliej wrote:
| I'd rather see the delidded version, as this is comparing a CPU
| to CPU+RAM, and without delidding we don't know how much space
| is taken up by the RAM dies.
|
| As it is, this is a kind of clickbaity article.
___________________________________________________________________
(page generated 2022-03-20 23:01 UTC)