[HN Gopher] Apple M3 Ultra
       ___________________________________________________________________
        
       Apple M3 Ultra
        
       Author : ksec
       Score  : 726 points
       Date   : 2025-03-05 13:59 UTC (9 hours ago)
        
 (HTM) web link (www.apple.com)
 (TXT) w3m dump (www.apple.com)
        
       | nottorp wrote:
       | > support for more than half a terabyte of unified memory
       | 
       | Soldered?
        
         | simlevesque wrote:
         | As are all Apple M devices.
        
         | universenz wrote:
         | Is there a single Apple SoC where they've provided removable
         | ram? Not that I can recall.
        
           | danpalmer wrote:
           | Is there even an existing replaceable memory standard that
           | would meet the current needs of Apple's "Unified Memory"
           | architecture? I'm not an expert but I'd suspect probably not.
           | The bus probably looks a lot more like VRAM on GPUs, and I've
           | never seen a GPU with replaceable RAM.
        
             | jsheard wrote:
             | CAMM2 could kinda work, but each module is only 128-bit so
             | I think the furthest you could possibly push it is a
             | 512-bit M Max equivalent with CAMM2 modules north, east,
             | west and south of the SOC. There just isn't room to put
             | eight modules right next to the SOC for a 1024-bit bus like
             | the M Ultra.
        
               | eigenspace wrote:
               | Framework said that when they built a Strix Halo machine,
               | AMD assigned an engineer to work with them on seeing if
               | there's a way to get CAMM2 memory working with it, and
               | after a bunch of back and forth it was decided that CAMM2
               | still made the traces too long to maintain proper signal
               | integrity due to the 256 bit interface.
               | 
               | These machines have a 512 bit interface, so presumably
               | even worse.
        
               | jsheard wrote:
                | Yeah, but AMD's memory controllers are really finicky.
               | That might have been more of a Strix Halo issue than a
               | CAMM2 issue.
        
               | eigenspace wrote:
               | Entirely possible. Obviously Apple wouldn't have been
               | interested in letting you upgrade the RAM even if it was
               | doable.
               | 
               | I'd love to have more points of comparison available, but
               | Strix Halo is the most analogous chip to an M-series chip
               | on the market right now from a memory point of view, so
               | it's hard to really know anything.
               | 
               | I very much hope CAMM2 or something else can be made to
               | work with a Strix-like setup in the future, but I have my
               | doubts.
        
               | zamadatix wrote:
               | Current (individual, not counting dual socketed) AMD Epyc
               | CPUs have 576 GB/s over a 768 bit bus using socketed
               | DIMMs.
        
               | eigenspace wrote:
               | My understanding is that works out due to the lower clock
               | speeds of those RAM modules though right?
               | 
                | It's getting that bandwidth by going very wide on very
               | very very many channels, rather than trying to push a
               | gigantic amount of bandwidth through only a few channels.
        
               | zamadatix wrote:
               | Yeah, "channels" are just a roundabout way to say "wider
               | bus" and you can't get too much past 128 GB/s of memory
               | bandwidth without leaning heavily into a very wide bus
               | (i.e. more than the "standard" 128 bit we're used to on
                | consumer x86) regardless of who's making the chip. Looking
               | at it from the bus width perspective:
               | 
               | - The AI Max+ 395 is a 256 bit bus ("4 channels") of 8000
               | MHz instead of 128 bits ("2 channels") of 16000 MHz
               | because you can't practically get past 9000 MHz in a
               | consumer device, even if you solder the RAM, at the
               | moment. Max capacity 128 GB.
               | 
               | - 5th Gen Epyc is a 768 bit bus ("12 channels") of 6000
               | MHz because that lets you use a standard socketed setup.
               | Max capacity 6 TB.
               | 
               | - M3 Ultra is a 1024 bit bus ("16 channels") of "~6266
               | MHz" as it's 2x the M3 Max (which is 512 bits wide) and
               | we know the final bandwidth is ~800 GB/s. Max capacity
               | 512 GB.
               | 
               | Note: "Channels" is in quotes because the number of bits
               | per channel isn't actually the same per platform (and
               | DDR5 is actually 2x32 bit channels per DIMM instead of
               | 1x64 per DIMM like older DDR... this kind of shit is why
               | just looking at the actual bit width is easier :p).
               | 
               | So really the frequencies aren't that different even
               | though these are completely different products across
               | completely different segments. The overwhelming factor is
               | bus width (channels) and the rest is more or less design
               | choice noise from the perspective of raw performance.
        
             | hoseja wrote:
             | It's really unfortunate that GPUs aren't fully customizable
             | daughterboards, isn't it.
        
             | nottorp wrote:
             | I thought so too when they launched the M1, but I soon got
             | corrected.
             | 
             | The memory bus is the same as for modules, it's just very
             | short. The higher end SoCs have more memory bandwidth
             | because the bus is wider (i.e. more modules in parallel).
             | 
             | You could blame DDR5 (who thought having a speed
             | negotiation that can go over a minute at boot is a good
             | idea?), but I blame the obsession with thin and the ability
             | to overcharge your customers.
             | 
             | > I've never seen a GPU with replaceable RAM
             | 
             | I still have one :) It's an ISA Trident TVGA 8900 that I
             | personally upgraded from 512k VRAM to one full megabyte!!!
        
         | reaperducer wrote:
         | _Soldered?_
         | 
         | Figure out a way to make it unified without also soldering it,
         | and you'll be a billionaire.
         | 
          | Or are you just grinding a tired, 20-year-old axe?
        
           | jonjojojon wrote:
            | Like all Intel/AMD integrated graphics that use the system's
            | RAM as VRAM?
        
           | rsynnott wrote:
           | _That_, in itself, wouldn't be that difficult, and there are
           | shared-memory setups that do use modular memory. Where you'd
           | really run into trouble is making it _fast_; this is very,
           | very high bandwidth memory.
        
         | ZekeSulastin wrote:
         | Not even Framework has escaped from soldered RAM for this kind
         | of thing.
        
         | klausa wrote:
         | It's not soldered, it's _on the package_ with the SoC.
        
           | georgeburdell wrote:
           | Probably on package at best
        
             | klausa wrote:
             | Right, yes, sorry for imprecise language!
        
               | riidom wrote:
               | Thanks for clarifying
        
           | eigenspace wrote:
           | It is _not_ on die. It's soldered onto the package.
           | 
           | There's a good reason it's soldered, i.e. the wide memory
           | interface and huge bandwidth mean that the extra trace
           | lengths needed for an upgradable RAM slot would screw up the
           | memory timings too much, but there's no need to make false
           | claims like saying it's on-die.
        
             | sschueller wrote:
             | > RAM slot would screw up the memory timings
             | 
              | Existing ones, possibly, but why not build something that
              | lets you snap in a BGA package just like we snap in CPUs
              | on full-sized PC mainboards?
        
               | eigenspace wrote:
               | The longer traces are the problem. They want these
               | modules as physically close as possible to the CPU to
               | make the timings work out and maintain signal integrity.
               | 
               | It's the same reason nobody sells GPUs that have user
               | upgradable non-soldered GDDR VRAM modules.
        
         | varispeed wrote:
         | You know that memory can be "easily" de-soldered and soldered
         | at home?
         | 
          | The issue is the availability of chips, and most likely you
          | have to know which components to change so the new memory is
          | recognised. For instance, that could be changing a resistor
          | to a different value or bridging certain pads.
        
           | A4ET8a8uTh0_v2 wrote:
            | This viewpoint is interesting. It is not exactly inaccurate,
            | but it does appear to miss the point. Soldering is a valuable
            | and useful skill in itself, but you can't just get in and
            | start de-soldering willy-nilly the way you can open a box and
            | upgrade RAM by plopping it into a designated slot.
           | 
           | What if both are an issue?
        
             | varispeed wrote:
              | Do you know that "plopping stuff in a designated spot" can
              | also be out of reach for some people? I know plenty who
              | would give their computer to a tech to do the upgrade for
              | them even if they were shown in person how to do all the
              | steps. Soldering is just one step (albeit a fairly big
              | one) above that. But the fact that this can be done at
              | home with fairly inexpensive tools means a tech person
              | with reasonable skill could do it, so such an upgrade
              | could be offered by a computer/phone repair shop if parts
              | were available. What I am trying to say is that soldering
              | is not a barrier.
        
       | universenz wrote:
        | 96GB on the baseline M3 Ultra, with a max of 512GB! Looks like
        | they're leaning in hard with the AI crowd.
        
       | datadrivenangel wrote:
       | Unclear what devices this will be in outside of the mac studio.
       | Also most of the comparisons were with M1 and M2 chips, not M4.
        
         | reaperducer wrote:
         | _most of the comparisons were with M1 and M2 chips, not M4._
         | 
          | Is anyone other than a vanishingly small number of hardcore
          | hobbyists going to upgrade from an M4 to an M4 Ultra?
        
           | nordsieck wrote:
            | > Is anyone other than a vanishingly small number of
            | hardcore hobbyists going to upgrade from an M4 to an M4
            | Ultra?
           | 
           | I expect that the 2 biggest buyers of M4 Ultra will be people
           | who want to run LLMs locally, and people who want the highest
           | performance machine they can get (professionals), but are
           | wedded to mac-only software.
        
             | bredren wrote:
             | Anecdotal, and reasonable criticisms of the release aside,
             | OpenAI's gpt-4.5 introduction video was done from a hard-
             | to-miss Apple laptop.
             | 
             | It is reasonable to say many folks in the field prefer to
             | work on mac hardware.
        
         | dlachausse wrote:
         | It is a bit misleading to do that, but in fairness to Apple,
         | almost nobody is upgrading to this from an M4 Mac, so those are
         | probably more useful comparisons.
        
       | mythz wrote:
        | Ultra disappointing: they waited 2 years just to push out a
        | single-gen bump; even my last year's iPad Pro runs an M4.
        
         | heeton wrote:
         | For AI workflows that's quite a lot cheaper than the
         | alternative in GPUs.
        
           | mythz wrote:
           | Yeah VRAM option is good (if it performs well), just sad we'd
           | have to drop 10K to access it tied to a prev gen M3 when
           | they'll likely have M5 by the end of the year.
           | 
           | Hard to drop that much cash on an outdated chip.
        
       | TheTxT wrote:
       | 512GB unified memory is absolutely wild for AI stuff! Compared to
       | how many NVIDIA GPUs you would need, the pricing looks almost
       | reasonable.
        
         | InTheArena wrote:
          | A server with 512GB of high-bandwidth, GPU-addressable RAM is
          | probably a six-figure expenditure. If memory is your
          | constraint, this is absolutely the machine for you.
         | 
          | (Sorry, I should have specified that the NPU and GPU cores
          | need to access that RAM and have reasonable performance). I
          | specified it above, but people didn't read that :-)
        
           | Numerlor wrote:
            | A basic brand-new server can easily do 512GB. Not as fast
            | as soldered memory, but it should be maybe mid to high five
            | figures.
        
             | la_oveja wrote:
             | 5 figures? can be done in 6k
             | https://x.com/carrigmat/status/1884244369907278106
        
               | InTheArena wrote:
               | That's CPU only memory, not high bandwidth, and not
               | addressable by the GPU.
        
               | jeffbee wrote:
               | There isn't anything particularly high-bandwidth about
               | Apple's DDR5 implementation, either. They just have a lot
               | of channels, which is why I compared it to a 24-channel
               | EPYC system. I agree that their integrated GPU
               | architecture hits a unique design point that you don't
               | get from nvidia, who prefer to ship smaller amounts of
               | very different kinds of memory. Apple's architecture may
               | be more suited to some workloads but it hasn't exactly
               | grabbed the machine learning market.
        
               | buildbot wrote:
                | M3 Ultra has 819GB/s, and a single Epyc CPU with 12
               | channels has 460GB/s. As far as I know, llama.cpp and
               | friends don't scale across multiple sockets so you can't
               | use a dual socket Turin system to match the M3 Ultra.
               | 
                | Also, 32GB DDR5 RDIMMs are ~$200, so that's $5K for 24
                | right there. Then you need 2x CPUs at ~$1K each for the
                | cheapest, and then a motherboard that's another $1K. So
                | for $8K (more, given you need a case, power supply, and
                | cooling!), you get a system with about half the memory
                | bandwidth, much higher power consumption, and a much
                | larger footprint.
        
               | adrian_b wrote:
               | Partial correction, an Epyc CPU with 12 channels has 576
               | GB/s, i.e. DDR5-6000 x 768 bits. That is 70% of the Apple
               | memory bandwidth, but with possibly much more memory (768
               | GB in your example).
               | 
               | You do not need 2 CPUs. If however you use 2 CPUs, then
               | the memory bandwidth doubles, to 1152 GB/s, exceeding
               | Apple by 40% in memory bandwidth. The cost of the memory
               | would be about the same, by using 16 GB modules, but the
               | MB would be more expensive and the second CPU would add
               | to the price.
        
               | buildbot wrote:
               | Ah, I didn't realize they'd upped the memory bandwidth to
               | DDR5-6000 (vs 4800), thanks for the correction!
               | 
               | The memory bandwidth does not double, I believe. See this
               | random issue for a graph that has single/dual socket
               | measurements, there is essentially no difference:
               | https://github.com/abetlen/llama-cpp-python/issues/1098
               | 
               | Perhaps this is incorrect now, but I also know with 2x
               | 4090s you don't get higher tokens per second than 1x 4090
               | with llama.cpp, just more memory capacity.
               | 
                | (All of this only applies to llama.cpp; I have no
               | experience with other software and how memory bandwidth
               | may scale across sockets)
        
               | adrian_b wrote:
               | The memory bandwidth does double, but in order to exploit
               | it the program must be written and executed with care in
               | the memory placement, taking into account NUMA, so that
               | the cores should access mostly memory attached to the
               | closest memory controller and not memory attached to the
               | other socket.
               | 
               | With a badly organized program, the performance can be
               | limited not by the memory bandwidth, which is always
               | exactly double for a dual-socket system, but by the
               | transfers on the inter-socket links.
               | 
               | Moreover, your link is about older Intel Xeon Sapphire
               | Rapids CPUs, with inferior memory interfaces and with
               | more quirks in memory optimization.
        
               | buildbot wrote:
               | Yes, I believe in theory a correctly written program
               | could scale across sockets, depending on the problem at
               | hand.
               | 
               | But where is your data? For llama.cpp? For whatever dual
               | socket CPU system you want. That's all I am claiming.
        
               | adrian_b wrote:
                | Googling for what you asked immediately turned up this
                | discussion:
               | 
               | https://github.com/ggml-org/llama.cpp/discussions/11733
               | 
               | about the scaling of llama.cpp and DeepSeek on some dual-
               | socket AMD systems.
               | 
                | While it was rather tricky, after many experiments they
                | obtained almost double the speed on two sockets,
                | especially on AMD Turin.
               | 
               | However, if you look at the actual benchmark data, that
               | must be much lower than what is really possible, because
               | their test AMD Turin system (named there P1) had only two
               | thirds of the memory channels populated, i.e. performance
               | limited by memory bandwidth could be increased by 50%,
               | and they had 16-core CPUs, so performance limited by
               | computation could be increased around 10 times.
        
               | aurareturn wrote:
               | CPUs do not have enough compute typically. You'll be
               | compute bottlenecked before bandwidth if the model is
               | large enough.
               | 
               | Time to first token, context length, and tokens/s are
               | significantly inferior on CPUs when dealing with larger
               | models even if the bandwidth is the same.
        
               | adrian_b wrote:
                | One big server CPU can have a computational capability
               | similar to a mid-range desktop NVIDIA GPU.
               | 
               | When used for ML/AI applications, a consumer GPU has much
               | better performance per dollar.
               | 
               | Nevertheless, when it is desired to use much more memory
               | than in a desktop GPU, a dual-socket server can have
               | higher memory bandwidth than most desktop GPUs, i.e. more
               | than an RTX 4090, and a computational capability that for
               | FP32 could exceed an RTX 4080, but it would be slower for
               | low-precision data where the NVIDIA tensor cores can be
               | used.
        
               | KeplerBoy wrote:
               | addressable is a weird choice of words here.
               | 
               | CUDA has had managed memory for a long time now. You
               | absolutely can address the entire host memory from your
               | GPU. It will fetch it, if it's needed. Not fast, but
               | addressable.
        
               | p_ing wrote:
               | Windows has been doing this since what... the AGP era?
               | Though this is a function of the ISA rather than the OS.
        
               | Numerlor wrote:
                | Ah, seems like I was remembering the price of a
                | higher-tier CPU, which can cost 6k on its own.
                | 
                | Thinking about it, you can get a decent 256GB on
                | consumer platforms now too, but the speed will be a bit
                | crap and you would need to make sure the platform fully
                | supports ECC UDIMMs.
        
           | behnamoh wrote:
           | except that you cannot run multiple language models on Apple
           | Silicon in parallel
        
             | kevin42 wrote:
             | I'm curious why not. I am running a few different models on
             | my mac studio. I'm using llama.cpp, and it performs
             | amazingly fast for the $7k I spent.
        
               | behnamoh wrote:
               | I said in parallel.
        
           | jeffbee wrote:
           | That doesn't sound right. The marginal cost of +768GB of DDR5
           | ECC memory in an EPYC system is < $5k.
        
             | InTheArena wrote:
             | GPU accessible RAM.
        
               | numpad0 wrote:
               | moot point if tok/s benchmark results are the same or
               | worse.
        
               | DrBenCarson wrote:
               | Not moot if you care about producing those tokens with
               | the largest available models
        
               | kjreact wrote:
               | Are the benchmarks worse? Running LLMs in system memory
               | is rather painful. I am having a hard time finding
               | benchmarks for running large models using system memory.
               | Can you point me to some benchmarks you're referring to?
        
               | adrian_b wrote:
               | In a dual-socket EPYC system, the memory bandwidth is
               | higher than in this Apple system by 40% (i.e. 1152 GB/s),
               | and the memory capacity can be many times higher.
               | 
               | Like another poster said, 768 GB of ECC RDIMM DDR5-6000
               | costs around $5000.
               | 
               | Any program whose performance is limited by memory
                | bandwidth, as is frequently the case for
               | inference, will run significantly faster in such an EPYC
               | server than in the Apple system, even when running on the
               | CPU.
               | 
               | Even for computationally-limited programs, the difference
               | between server CPUs and consumer GPUs is not great. One
               | Epyc CPU may have about the same number of FP32 execution
               | units as an RTX 4070, while running at a higher clock
               | frequency (but it lacks the tensor units of an NVIDIA
               | GPU, which can greatly accelerate the execution, where
               | applicable).
        
               | aurareturn wrote:
                | > Any program whose performance is limited by memory
                | bandwidth, as is frequently the case for inference, will
                | run significantly faster in such an EPYC server than in
                | the Apple system, even when running on the CPU.
               | 
               | Source on this? CPUs would be very compute constrained.
        
               | adrian_b wrote:
               | According to Apple, the GPU of M3 Ultra has 80 graphics
               | cores, which should mean 10240 FP32 execution units, the
                | same as an NVIDIA RTX 4080 Super.
               | 
               | However Apple does not say anything about the GPU clock
                | frequency, which I assume is significantly lower than
                | that of NVIDIA.
               | 
               | In comparison, a dual-socket AMD Turin can have up to
               | 12288 FP32 execution units, i.e. 20% more than an Apple
               | GPU.
               | 
                | Moreover, the clock frequency of the AMD CPU must be much
                | higher than that of the Apple GPU, so the AMD system is
                | likely to be at least twice as fast as the Apple M3 Ultra
                | GPU at computing some graphics application.
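                | 
                | Back-of-the-envelope for those unit counts, assuming
                | 128 FP32 lanes per Apple GPU core and 2x 512-bit FMA
                | pipes (16 FP32 lanes each) per Zen 5 core:
                | 
                |     # Apple: 80 GPU cores x 128 lanes
                |     print(80 * 128)          # 10240
                |     # Dual Turin: 2 x 192 cores x 2 pipes x 16 lanes
                |     print(2 * 192 * 2 * 16)  # 12288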
               | 
               | I do not know what facilities exist in the Apple GPU for
               | accelerating the computations with low-precision data
               | types, like the tensor cores of NVIDIA GPUs.
               | 
               | While for graphic applications big server CPUs are
               | actually less compute constrained than almost all
               | consumer GPUs (except RTX 4090/5090), the GPUs can be
               | faster for ML/AI applications that use low-precision data
               | types, but this is not at all certain for the Apple GPU.
               | 
               | Even if the Apple GPU happens to be faster for some low-
               | precision data type, the difference cannot be great.
               | 
               | However a server that would beat the Apple M3 Ultra GPU
               | computationally would cost much more than $10k, because
               | it would need CPUs with many cores.
               | 
               | If the goal is only to have a system with 50% more memory
               | and 40% more memory bandwidth than the Apple system, that
               | can be done at a $10k price.
               | 
               | While such a system would become compute constrained more
               | often than an Apple GPU, it would still beat it every
               | time when the memory would be the bottleneck.
        
         | jeroenhd wrote:
          | If you're going to overhaul your entire AI workflow to use a
         | different API anyway, surely the AMD Instinct accelerator cards
         | make more sense. They're expensive, but also a lot faster, and
         | you don't need to deal with making your code work on macOS.
        
           | codedokode wrote:
           | I don't think API has any value because writing software is
           | free and hardware for ML is super expensive.
        
             | internetter wrote:
             | > writing software is free
             | 
             | says who? NVIDIA has essentially entrenched themselves
             | thanks to CUDA
        
             | knowitnone wrote:
             | I'd like to hire you to write free software
        
           | wmf wrote:
           | Doesn't AMD Instinct cost >$50K for 512GB?
        
         | chakintosh wrote:
         | 14k for a maxed out Mac Studio
        
       | varjag wrote:
       | Call me a unit fundamentalist but calling 512Gb "over half a
       | terabyte memory" irks me to no end.
        
         | klausa wrote:
          | It's over half a _tera_byte; exactly half a _tebi_byte if you
          | wanna be a fundamentalist.
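          | 
          | The arithmetic, assuming the 512GB is the usual binary 512
          | GiB of DRAM:
          | 
          |     b = 512 * 2**30       # bytes in 512 GiB
          |     print(b / 1e12)       # ~0.55 TB: over half a terabyte
          |     print(b / 2**40)      # 0.5 TiB: exactly half a tebibyte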
        
           | varjag wrote:
           | It is exactly the opposite. Every computer architecture in
           | production addresses memory in the powers of two.
           | 
           | SI has no business in memory size nomenclature as it is not
           | derived from fundamental physical units. The whole klownbyte
              | change was pushed through by hard drive marketers in the
              | 1990s.
        
             | esafak wrote:
             | Do SSD companies do the same thing? We ought to go back to
             | referring to storage capacity in powers of two.
        
               | jl6 wrote:
               | SSDs have added weirdness like 3-bit TLC cells and
               | overprovisioning. Usable storage size of an SSD is
               | typically not an exact power of 10 _or_ 2.
        
             | umanwizard wrote:
             | > Every computer architecture in production addresses
             | memory in the powers of two.
             | 
             | What does it mean to "address memory in powers of two" ?
             | There are certainly machines with non-power-of-two memory
             | quantities; 96 GiB is common for example.
             | 
             | > The whole klownbyte change was pushed through by hard
             | drive marketers in 1990s.
             | 
             | The metric prefixes based on powers of 10 have been around
             | since the 1790s.
        
               | varjag wrote:
               | > What does it mean to "address memory in powers of two"
               | ? There are certainly machines with non-power-of-two
               | memory quantities; 96 GiB is common for example.
               | 
               | I challenge you to show me any SKU from any memory
               | manufacturer that has a power of 10 capacity. Or a CPU
               | whose address space is a power of 10. This is an
               | unavoidable artefact of using a binary address bus.
               | 
               | > The metric prefixes based on powers of 10 have been
               | around since the 1790s.
               | 
               | And Babylonians used power of 60, what gives?
        
             | kstrauser wrote:
             | *bibytes are a practical joke played on computer scientists
             | by the salespeople to make it sound like we're drunk. "Tell
             | us more about your mebibytes, Fred _elbows colleague,
             | listen to this_ ".
             | 
             | If Donald Knuth and Gordon Bell say we use base-2 for RAM,
             | that's good enough for me.
        
         | transcriptase wrote:
         | Perhaps they're including the CPU cache and rounding down for
         | brevity.
        
         | kissiel wrote:
         | You're nitpicking, but then you use lowercase b for a byte ;)
        
       | okamiueru wrote:
        | Don't know what the prior extreme Apple is alluding to here.
        | But Apple marketing is what it is.
        
       | dlachausse wrote:
       | Interesting that they're releasing M3 Ultra after the M4 Macs
       | have already shipped.
       | 
       | I wonder if the plan is to only release Ultras for odd number
       | generations.
        
         | _alex_ wrote:
         | m2 ultra tho
        
         | pier25 wrote:
         | They released the M2 Ultra
        
           | dlachausse wrote:
           | Good point, I forgot about that. Maybe it just got really
           | delayed in production.
        
             | ryao wrote:
             | Reportedly Apple is using its own silicon in data centers
             | to run "Apple Intelligence" and other things like machine
            | translation in Safari. I suspect that the initial supply
             | was sent to Apple's datacenters.
        
         | jmull wrote:
          | I'm guessing it's more because "Ultra" versions, which "fuse"
          | multiple chips, take significant additional engineering work.
          | So we might expect an M4 Ultra next year, possibly after
          | non-Ultra M5s are released.
        
       | iambateman wrote:
       | People who know more than me: they're talking a lot about RAM and
       | not much about GPU.
       | 
       | Do you expect this will be able to handle AI workloads well?
       | 
       | All I've heard for the past two years is how important a beefy
       | GPU is. Curious if that holds true here too.
        
         | lynndotpy wrote:
         | VRAM is what takes a model from "can not run at all" to "can
         | run" (even if slowly), hence the emphasis.
        
           | dartos wrote:
           | You can say the same about GPU clock speed as well...
        
           | vlovich123 wrote:
            | No, with limited VRAM you could offload the model partially
            | or split it across CPU and GPU. And since the CPU has swap,
            | you could run the absolute largest model. It's just really
            | really slow.
        
             | jeffhuys wrote:
             | Really, really, really, really, really, REALLY REALLY slow.
        
             | Espressosaurus wrote:
             | The difference between Deepseek-r1:70b (edit: actually 32b)
             | running on an M4 Pro (48 GB unified RAM, 14 CPU cores, 20
             | GPU cores) and on an AMD box (64 GB DDR4, 16 core 5950X,
             | RTX 3080 with 10 GB of RAM) is more than a factor of 2.
             | 
             | The M4 pro was able to answer the test prompt twice--once
             | on battery and once on mains power--before the AMD box was
             | able to finish processing.
             | 
             | The M4's prompt parsing took significantly longer, but
             | token generation was significantly faster.
             | 
             | Having the memory to the cores that matter makes a big
             | difference.
        
               | vlovich123 wrote:
               | You're adding detail that's not relevant to anything I
               | said. I was saying this statement:
               | 
               | > VRAM is what takes a model from "can not run at all" to
               | "can run" (even if slowly), hence the emphasis.
               | 
               | Is false. Regardless of how much VRAM you have, if the
               | criteria is "can run even if slowly", all machines can
               | run all models because you have swap. It's unusably slow
               | but that's not what OP was claiming the difference is.
        
               | Espressosaurus wrote:
               | The criteria for purchase for anybody trying to use it is
               | "run slowly but acceptably" vs. "run so slow as to be
               | unusable".
               | 
               | My memory is wrong, it was the 32b. I'm running the 70b
               | against a similar prompt and the 5950X is probably going
               | to take over an hour for what the M4 managed in about 7
               | minutes.
               | 
               | edit: an hour later and the 5950 isn't even done thinking
               | yet. Token generation is generously around 1 token/s.
               | 
               | edit edit: final statistics. M4 Pro managing 4 tokens/s
               | prompt eval, 4.8 tokens/s token generation. 5950X
               | managing 150 tokens/s prompt eval, and 1 token/s
               | generation.
               | 
               | Perceptually I can live with the M4's performance. It's a
               | set prompt, do something else, come back sort of thing.
                | The 5950/RTX 3080 is too slow to be even remotely usable
               | with the 70b parameter model.
        
               | vlovich123 wrote:
               | I don't disagree. I'm just taking OP at the literal
               | statement they made.
        
         | Retr0id wrote:
         | When it comes to LLMs in particular, it comes down to memory
         | size+bandwidth more than anything else.
        
         | simlevesque wrote:
         | What's more important isn't how beefy it is, it's how much
         | memory it has.
         | 
          | These are unified memory. The M3 Ultra with 512GB has as much
          | VRAM as sixteen 5090s.
        
         | qwertox wrote:
         | A beefy GPU which can't hold models in VRAM is of very limited
         | use. You'll see 16 GB of VRAM on gamer Nvidia cards, the RTX
         | 5090 being an exception with 32 GB VRAM. The professional cards
         | have around 96 GB of VRAM.
         | 
         | The thing with these Apple chips is that they have unified
         | memory, where CPU and GPU use the same memory chips, which
         | means that you can load huge models into RAM (no longer VRAM,
         | because that doesn't exist on those devices). And while Apple's
         | integrated GPU isn't as powerful as an Nvidia GPU, it is
         | powerful enough for non-professional workloads and has the huge
         | benefit of access to lots of memory.
        
           | _zoltan_ wrote:
           | which professional card has 96GB of VRAM?
        
             | qwertox wrote:
             | Like the NVIDIA H100 NVL 94GB HBM3 PCIe 5.0 Data Center GPU
             | for 27.651,20 EUR
             | 
             | https://www.primeline-
             | solutions.com/de/nvidia-h100-nvl-94gb-...
        
               | _zoltan_ wrote:
               | that's not a professional line card, but a data center
               | card.
        
         | gatienboquet wrote:
         | LLMs are primarily "memory-bound" rather than "compute-bound"
         | during normal use.
         | 
         | The model weights (billions of parameters) must be loaded into
         | memory before you can use them.
         | 
         | Think of it like this: Even with a very fast chef (powerful
         | CPU/GPU), if your kitchen counter (VRAM) is too small to lay
         | out all the ingredients, cooking becomes inefficient or
         | impossible.
         | 
         | Processing power still matters for speed once everything fits
         | in memory, but it's secondary to having enough VRAM in the
         | first place.
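          | 
          | A rough rule of thumb for the generation phase: every token
          | has to stream the full set of weights from memory once, so
          | tokens/s is at best bandwidth divided by model size (ignoring
          | KV cache, batching, and MoE):
          | 
          |     def max_tok_s(bw_gb_s, model_gb):
          |         # memory-bound decode: one full weight read per token
          |         return bw_gb_s / model_gb
          | 
          |     print(max_tok_s(800, 400))  # ~2 tok/s, 400GB model
          |     print(max_tok_s(800, 40))   # ~20 tok/s, 40GB model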
        
           | whimsicalism wrote:
           | Transformers are typically memory- _bandwidth_ bound during
            | decoding. This chip is going to have a much worse memory b/w
           | than the nvidia chips.
           | 
           | My guess is that these chips could be compute-bound though
           | given how little compute capacity they have.
        
             | Gracana wrote:
             | It's pretty close. A 3090 or 4090 has about 1TB/s of memory
             | bandwidth, while the top Apple chips have a bit over
             | 800GB/s. Where you'll see a big difference is in prompt
             | processing. Without the compute power of a pile of GPUs,
             | chewing through long prompts, code, documents etc is going
             | to be slower.
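              | 
              | Rough sketch of why prefill favors GPUs: a dense model
              | needs roughly 2 FLOPs per parameter per prompt token, so
              | prompt time scales with compute, not bandwidth (the
              | throughput numbers below are illustrative, not specs):
              | 
              |     def prefill_s(params_b, prompt_tokens, tflops):
              |         flops = 2 * params_b * 1e9 * prompt_tokens
              |         return flops / (tflops * 1e12)
              | 
              |     # 70B dense model, 8k-token prompt
              |     print(prefill_s(70, 8000, 30))   # ~37 s
              |     print(prefill_s(70, 8000, 300))  # ~3.7 s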
        
               | whimsicalism wrote:
               | nobody in industry is using a 4090, they are using H100s
               | which have 3TB/s. Apple also doesn't have any equivalent
               | to nvlink.
               | 
               | I agree that compute is likely to become the bottleneck
               | for these new Apple chips, given they only have like
               | ~0.1% the number of flops
        
               | Gracana wrote:
               | I chose the 3090/4090 because it seems to me that this
               | machine could be a replacement for a workstation or a
               | homelab rig at a similar price point, but not a $100-250k
               | server in a datacenter. It's not really surprising or
               | interesting that the datacenter GPUs are superior.
               | 
               | FWIW I went the route of "bunch of GPUs in a desktop
               | case" because I felt having the compute oomph was worth
               | it.
        
               | _zoltan_ wrote:
               | 4.8TB/s on H200, 8TB/s on B200, pretty insane.
        
               | Gracana wrote:
               | That's wild, somehow I hadn't seen the B200 specs before
               | now. I wish I could have even a fraction of that!
        
             | gatienboquet wrote:
             | VRAM capacity is the initial gatekeeper, then bandwidth
             | becomes the limiting factor.
        
               | whimsicalism wrote:
               | i suspect that compute actually might be the limiter for
               | these chips before b/w, but not certain
        
             | cubefox wrote:
             | > Transformers are typically memory-bandwidth bound during
             | decoding.
             | 
              | Not in the case of language models, which are typically
              | bound by memory size rather than bandwidth.
        
               | whimsicalism wrote:
               | nope
        
               | cubefox wrote:
               | I assume even this one won't run on an RTX 5090 due to
               | constrained memory size:
               | https://news.ycombinator.com/item?id=43270843
        
               | whimsicalism wrote:
               | sure on consumer GPUs but that is not what is
               | constraining the model inference in most actual industry
               | setups. technically even then, you are CPU-GPU memory
               | bandwidth bound more than just GPU memory, although that
               | is maybe splitting hairs
        
         | matwood wrote:
          | I was able to run and use the 24GB DeepSeek distill on an M1
          | Max with 64GB of RAM. It wasn't speedy, but it was usable. I
          | imagine the M3/4s are much faster, especially on smaller, more
          | specific models.
        
       | chvid wrote:
       | Now make a data center version.
        
       | ksec wrote:
        | The previous M2 Ultra had a max memory of 192GB, or 128GB for
        | the Pro and some other M3 models, which I think is plenty for
        | even 99.9% of professional tasks.
        | 
        | They now bump it to _512GB_, along with an _insane_ price tag
        | of $9,499 for the 512GB Mac Studio. I am pretty sure this is
        | some AI gold rush.
        
         | dwighttk wrote:
         | Maybe .1% of tasks need this RAM, why are they charging so
         | much?
        
           | regularfry wrote:
           | The narrower the niche, the more you can charge.
        
           | rewtraw wrote:
            | because that's how much it's worth
        
             | internetter wrote:
              | It's not though. For consumer computers somewhere in the
              | 1k-4k range there's nothing better. But for the price of
              | 512GB of RAM you could buy that + a crazy CPU + 2x 5090s
              | by building your own. The market fit is "needs power;
              | needs/wants macOS; has no budget", which is so incredibly
              | niche. But in terms of raw compute output there's
              | absolutely no chance this is providing bang for buck.
        
               | jeffhuys wrote:
               | Do you understand that it's UNIFIED RAM, so it doubles as
               | vRAM? I would love to know what computer you can build
               | for <10k with 0.5TB of VRAM.
        
               | kjreact wrote:
               | 2x 5090s would only give you 64GB of memory to work with
               | re:LLM workloads, which is what people are talking about
               | in this thread. The 512GB of system RAM you're referring
               | to would not be useful in this context. Apple's unified
               | memory architecture is the part you're missing.
        
               | DrBenCarson wrote:
               | How much VRAM do you get on those 2x 5090s?
               | 
               | How much would it cost to get up to 512gb?
        
           | pier25 wrote:
           | Because the minority that needs that much RAM can't work
           | without it.
           | 
           | In the media composing world they use huge orchestral
           | templates with hundreds and hundreds of tracks with millions
           | of samples loaded into memory.
        
           | A4ET8a8uTh0_v2 wrote:
            | I think the answer is because they can (there is a market
            | for it). The benefit to a crazy person like me is that with
            | this addition, I might be able to grab the 128GB version at
            | a lower price.
        
           | znpy wrote:
            | Because they know there will be a large number of people
            | who don't need this much RAM but will buy it anyway.
        
           | cjbgkagh wrote:
           | I don't need 512GB of RAM but the moment I do I'm certain
           | I'll have bigger things to worry about than a $10K price tag.
        
             | almostgotcaught wrote:
             | This is Pascal's wager written in terms of ... RAM. The
             | original didn't make sense and neither does this iteration.
        
               | cjbgkagh wrote:
               | I would still wait until I need it before buying it...
        
           | agloe_dreams wrote:
           | Because the .1% is who will buy it? I mean, yeah, supply and
           | demand. High demand in a niche with no supply currently means
           | large margins.
           | 
           | I don't think anyone commercially offers nearly this much
           | unified memory or NPU/GPUs with anything near 512GB of
           | memory.
        
           | madeofpalk wrote:
           | Maybe because .1% of tasks need this RAM, it attracts a .1%
           | price tag
        
           | Spooky23 wrote:
           | With all things semiconductor, low volume = higher cost (and
           | margin).
           | 
           | The people who need the crazy resource can tie it to some
           | need that costs more. You'd spend like $10k running a machine
           | with similar capabilities in AWS in a month.
        
           | Sharlin wrote:
           | It enables the use of _giant_ AI models on a personal
            | computer. Might not run too fast though. But at least it's
            | possible _at all_.
        
         | InTheArena wrote:
         | Every single AI shop on the planet is trying to figure out if
         | there is enough compute or not to make this a reasonable AI
          | path. If the answer is yes, that 10k is an absolute bargain.
        
           | 827a wrote:
           | Is this actually true? Were people doing this with the 192gb
           | of the M2 Ultra?
           | 
           | I'm curious to learn how AI shops are actually doing model
           | development if anyone has experience there. What I imagined
            | was: It's all in the "cloud" (or their own infra), and the
           | local machine doesn't matter. If it did matter, the nvidia
           | software stack is too important, especially given that a
           | 512gb M3 Ultra config costs $10,000+.
        
             | DrBenCarson wrote:
             | You're largely correct for training models
             | 
             | Where this hardware shines is inference (aka developing
             | products on top of the models themselves)
        
               | 827a wrote:
               | True. But with Project Digits supposedly around the
               | corner, which supposedly costs $3,000 and supports
               | ConnectX and runs Blackwell; what's the over-under on
               | just buying two of those at about half the price of one
               | maxed M3 Ultra Mac Studio?
        
               | DrBenCarson wrote:
               | And how much VRAM will Project Digits have?
        
           | internetter wrote:
           | No AI shop is buying macs to use as a server. Apple should
           | really release some server macOS distribution, maybe even
           | rackable M-series chips. I believe they have one internally.
        
             | jerjerjer wrote:
             | Why would any business pay Apple Tax for a backend, server
             | product?
        
           | Spooky23 wrote:
           | > that 10k is a absolute bargain
           | 
           | The higher end NVidia workstation boxes won't run well on
           | normal 20amp plugs. So you need to move them to a computer
           | room (whoops, ripped those out already) or spend months
           | getting dedicated circuits run to office spaces.
        
             | magnetometer wrote:
             | Didn't really think about this before, but that seems to be
             | mainly an issue in Northern / Central America and Japan. In
             | Germany, for example, typical household plugs are 16A at
             | 230V.
        
             | someothherguyy wrote:
             | In the US, normal circuits aren't always 20A, especially in
             | residential buildings, where they are more commonly 15A in
             | bedrooms and offices.
             | 
             | https://en.wikipedia.org/wiki/NEMA_connector
        
               | hervature wrote:
               | To clarify, the circuit is almost always 20A with 15A
               | being used for lighting. However, the outlet itself is
               | almost always 15A because you put multiple outlets on a
                | single circuit. You are going to see very few 20A outlets
                | (which have a T-shaped prong) in residential.
        
               | theturtle32 wrote:
               | While technically true, the NEMA 5-15R receptacles are
               | rated for use on 20A circuits, and circuits for
               | receptacles are almost always 20A circuits, in modern
               | construction at least. Older builds may not be, of
               | course.
               | 
               | That said, if your load is going to be a continuous load
               | drawing 80% of the rated amperage, it really should be a
               | NEMA 5-20 plug and receptacle, the one where one of the
               | prongs is horizontal instead of vertical. Swapping out
               | the receptacle for one that accepts a NEMA 5-20P plug is
               | like $5.
               | 
               | If you are going to actually run such a load on a 20A
               | circuit with multiple receptacles, you will want to make
               | sure you're not plugging anything substantial into any of
               | the other receptacles on that circuit. A couple LED
               | lights are fine. A microwave or kettle, not so much.
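                | 
                | The continuous-load math on a US 120V circuit, using
                | that 80% derating:
                | 
                |     def cont_watts(amps, volts=120, derate=0.8):
                |         return amps * volts * derate
                | 
                |     print(cont_watts(15))  # 1440 W
                |     print(cont_watts(20))  # 1920 W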
        
               | andrewmcwatters wrote:
               | > and circuits for receptacles are almost always 20A
               | circuits, in modern construction at least.
               | 
               | This is not true. Standard builds (a majority) still use
               | 15-amp circuits where 20-amp is not required by NEC.
        
           | NorwegianDude wrote:
           | Not much to figure out. It's 2x M4 Max, so you need 100 of
           | these to match the TOPS of even a single consumer card like
           | the RTX 5090.
        
             | jeffhuys wrote:
             | Sure, but if you have models like DeepSeek - 400GB - that
             | won't fit on a consumer card.
        
               | NorwegianDude wrote:
               | True. But an AI shop doesn't care about that. They get
               | more performance for the money by going for multiple
               | Nvidia GPUs. I have 512 GB ram on my PC too with 8 memory
               | channels, but it's not like it's usable for AI workloads.
               | It's nice to have large amounts of RAM, but increasing
               | the batch size during training isn't going to help when
               | compute is the bottleneck.
        
             | DrBenCarson wrote:
             | Now do VRAM
        
             | wpm wrote:
             | It's 2x M3 Max
        
             | alberth wrote:
             | > It's 2x M4 Max
             | 
             | Not exactly though.
             | 
             | This can have 512GB unified memory, 2x M4 Max can only have
             | 128GB total (64GB each).
        
           | ZeroTalent wrote:
           | No, because there is no CUDA. We have fast and cheap
           | alternatives to NVIDIA, but they do not have CUDA. This is
           | why NVIDIA has 90% margins on its hardware.
        
         | HPsquared wrote:
         | LLMs easily use a lot of RAM, and these systems are MUCH, MUCH
         | cheaper (though slower) than a GPU setup with the equivalent
         | RAM.
         | 
         | A 4-bit quantization of Llama-3.1 405b, for example, should fit
         | nicely.
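          | 
          | Rough sizing, before KV cache and runtime overhead:
          | 
          |     params = 405e9
          |     print(params * 0.5 / 1e9)  # 4-bit: ~202.5 GB of weights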
        
         | segmondy wrote:
          | The question will be how it performs. I suspect DeepSeek and
          | Llama 405B demonstrated the need for larger memory. Right now
          | folks could build an Epyc system with that much RAM or more
          | to run DeepSeek at about 6 tokens/sec for a fraction of that
          | cost. However, not everyone is a tinkerer, so there's a
          | market for this for those that don't want to be bothered. You
          | say "AI gold rush" like it's a bad thing; it's not.
        
         | bloppe wrote:
         | Big question is: Does the $10k price already reflect Trump's
         | tariffs on China? Or will the price rise further still..
        
       | desertmonad wrote:
       | Time to upgrade m1 ultra I guess! M1 ultra has been pretty good
       | with deepseek locally.
        
         | _alex_ wrote:
         | what flavor of deepseek are you running? what kind of
         | performance are you seeing?
        
       | InTheArena wrote:
        | Whoa. M3 instead of M4. I wonder if this was basically binning,
        | but I thought I had read somewhere that the interposer that
        | enabled this for the M1 chips was not available.
       | 
        | That said, 512GB of unified RAM with access to the NPU is
        | absolutely a game changer. My guess is that Apple developed this
        | chip for their internal AI efforts, and they are now at the
        | point where they are releasing it publicly for others to use.
        | They really need a 2U rack form factor for this though.
       | 
       | This hardware is really being held back by the operating system
       | at this point.
        
         | klausa wrote:
         | >I had read somewhere that the interposer that enabled this for
         | the M1 chips where not available.
         | 
         | With all my love and respect for "Apple rumors" writers; this
         | was always "I read five blogposts about CPU design and now I'm
         | an expert!" territory.
         | 
          | The speculation was based on the M3 Max die shots not having
          | the interposer visible, which... implies basically nothing
          | about whether that _could have_ been supported in an M3 Ultra
          | configuration; as evidenced by today's announcement.
        
           | sroussey wrote:
            | I'm guessing it's not really an M3.
           | 
           | No M3 has thunderbolt 5.
           | 
           | This is a new chip with M3 marketing. I'd expect this from
           | Intel, not Apple.
        
             | klausa wrote:
             | Baseline M4 doesn't have Thunderbolt 5 either; only the
             | Pro/Max variants do.
             | 
             | The press-release even calls TB5 out: >Each Thunderbolt 5
             | port is supported by its own custom-designed controller
             | directly on the chip.
             | 
             | Given that they're doing the same on A-series chips (A18
             | Pro with 10Gbps USB-C; A18 with USB 2.0); I imagine it's
             | just relatively simple to swap the I/O blocks around and
             | they're doing this for cost and/or product segmentation
             | reasons.
        
               | sroussey wrote:
               | Which means this is a whole new chip. It may be M3 based,
               | but with added interposer support and new thunderbolt
               | stuff.
               | 
               | Which, at this point, why not just use M4 as a base?
        
               | kridsdale1 wrote:
               | Could be that M4 requires a different TSMC fab that is at
               | full production doing iPhones.
        
               | operatingthetan wrote:
               | Or they are saving the M4 Ultra name for later on ...
        
               | klausa wrote:
               | >Which, at this point, why not just use M4 as a base?
               | 
               | I imagine that making those chips is quite a bit more
               | involved than just taking the files for M3 Max, and copy-
               | pasting them twice into a new project.
               | 
               | I imagine it just takes more time to
               | design/verify/produce them; especially given they're not
               | selling very many of them, so they're probably not super-
               | high-priority projects.
        
             | hinkley wrote:
             | TB 5 seems like the sort of thing you could 'slap on' to a
             | beefy enough chip.
             | 
             | Or the sort of thing you put onto a successor when you had
             | your fingers crossed that the spec and hardware would
             | finalize in time for your product launch but the fucking
             | committee went into paralysis again at the last moment and
             | now your product has to ship 4 months before you can put TB
             | 5 hardware on shelves. So you put your TB4 circuitry on a
             | chip that has the bandwidth to handle TB5 and you wait for
             | the sequel.
        
               | kridsdale1 wrote:
               | Sounds like you've seen some things.
        
         | jagged-chisel wrote:
         | > This hardware is really being held back by the operating
         | system at this point.
         | 
         | Please elucidate.
        
           | diggan wrote:
           | https://news.ycombinator.com/item?id=43243075 ("Apple's
           | Software Quality Crisis" - 1134 comments)
           | 
           | ^ has a lot of elaborations on this subject
        
             | internetter wrote:
             | This is more about "average" end user software, not the
             | type of software that would be running on a machine like
             | this. Yes, their applications fell off, but if you're
             | paying for 512GB of RAM, Apple Notes being slow isn't the
             | bottleneck.
        
               | diggan wrote:
               | Lack of focus on quality of software affects all types of
               | workloads, not just consumer-oriented or professional-
               | oriented in isolation.
        
               | gjsman-1000 wrote:
               | Nah, if I ever wrote an article about the software crisis
               | on the Linux desktop, there'd be flames here making
               | Apple's issues look small.
        
               | diggan wrote:
               | It'd be an interesting flame war in the comments, if
               | nothing else, go for it! I'm happy to give plenty of
               | concrete evidence why Linux is more suitable for
               | professionals than macOS is in 2025 :)
        
               | hedora wrote:
               | Try copy-pasting bash snippets between any Linux text
               | editor and terminal.
               | 
               | Now try the same with Notes on a Mac. Notes mangles the
               | punctuation, and zsh is not bash.
        
               | internetter wrote:
               | Omg I despise the fact that there are n competing GUI
               | standards on Linux, and zero visual consistency.
               | 
               | I love diversity in websites, and apps for that matter,
               | but this isn't diversity, it is the uncanny valley
               | between bespoke graphic design and homogeneity.
               | 
               | Say what you want about SwiftUI, but it makes consistent,
               | good looking apps. Unless something has changed, GTK is a
               | usability disaster.
               | 
               | And that's before I get into how much both X11 _and_
               | wayland suck equally.
               | 
               | There's so much I miss about Linux, but there's so much
               | I don't.
        
               | WD-42 wrote:
               | People are paying the richest company in the world for
               | their software crisis on Linux.
        
               | knowitnone wrote:
               | If you do write something, please separate enterprise,
               | developer, end user, and embedded/RT, because they all
               | have different requirements.
        
               | internetter wrote:
               | > Lack of focus on quality of software affects all types
               | of workloads, not just consumer-oriented or professional-
               | oriented in isolation.
               | 
               | The apps are developed by different teams. MacOS apps are
               | containerized. Saying macOS's performance is hindered by
               | Notes.app is like saying that Windows is hindered by
               | Paint.exe. Notes.app is just a default[0]
               | 
               | [0]: though, I dislike saying this because I always feel
               | like I need to mention that even Notes links against a
               | hilarious amount of private APIs that could easily be
               | exposed to other developers but... aren't.
        
           | InTheArena wrote:
           | No native Docker support, no headless management options
           | (enterprise strength), limited QoS management, lack of
           | robust Python support (out of the box), and an interactive,
           | user-focused security model.
        
             | bredren wrote:
             | >lack of robust Python support (out of the box)
             | 
             | What would robust Python support OOB look like?
        
               | FergusArgyll wrote:
               | uv pre-installed! /s
        
             | flats wrote:
             | I feel you on a lot of this! But out of the box Python
             | support? Does anybody actually want that? It's pretty darn
             | quick & straightforward to get a Python environment up &
             | running on MacOS. Maybe I'm misunderstanding what you mean
             | here.
        
               | p_ing wrote:
               | No one would want OOTB Python support. You'd be stuck on
               | a version you didn't want to use.
        
               | hedora wrote:
               | I want it. That way, like code I write in any other
               | language, it'll run reliably on other people's machines a
               | few years from now.
               | 
               | I avoid writing python, so I'm usually the "other people"
               | in that sentence.
        
               | fauigerzigerk wrote:
               | _> it'll run reliably on other people's machines a few
               | years from now_
               | 
               | That's optimistic. What if the system Python gets
               | upgraded? For some reason, Python libraries tend to be
               | super picky about the Python versions they support (not
               | just Python 2 vs 3).
        
             | kstrauser wrote:
             | 1. I run Docker and Podman on my Macs.
             | 
             | 2. If you mean MDM, there are several good options. Screen
             | sharing and SSH are built in.
             | 
             | 3. In what sense?
             | 
             | 4. `uv python install whatever` is infinitely better than
             | upgrading on the OS vendor's schedule.
             | 
             | 5. What does that affect?
        
               | mschuster91 wrote:
               | > 1. I run Docker and Podman on my Macs.
               | 
               | That's using a Linux VM. The idea people are asking about
               | is native process isolation. Yes you'd have to rebuild
               | Docker containers based on some sort of (small) macOS
               | base layer and Homebrew/MacPorts, but hey. Being able to
               | even run Node.js or PHP with their thousands of files
               | _natively_ would be a game changer in performance.
        
               | hedora wrote:
               | Also, if it were possible to containerize macOS, or even
               | do an unattended VM installation, then it'd be possible
               | for Apple to automatically regression test their stuff.
        
               | devmor wrote:
               | >I run Docker and Podman on my Macs.
               | 
               | The same way Windows users run them. In a linux VM.
               | 
               | You don't get real on-hardware containerization.
        
               | naikrovek wrote:
               | surprisingly, _Windows_ containers on Windows are not run
               | in a VM. Well, not necessarily; they can be.
               | 
               | It is definitely odd that Macs have no native container
               | support, though, especially when you learn that _Windows_
               | does.
        
               | devmor wrote:
               | That is an important point, I didn't really think of it
               | since I've never had a reason to use Windows containers.
        
               | naikrovek wrote:
               | that's ok, no one thinks of windows, and fewer people
               | than that would ever use a windows container.
        
               | p_ing wrote:
               | Well, Windows (in a form) is the hypervisor for the Azure
               | infrastructure. Azure Web Sites when run as Windows/IIS
               | are Windows containers. Makes sense.
               | 
               | Honestly I don't know what XNU/Darwin is good for. It
               | doesn't do anything especially well compared to *BSD,
               | Linux, and NT.
        
               | hedora wrote:
               | Its async i/o APIs are best in class (i.e., compatible
               | with BSD, and not Linux's epoll tire fire).
               | 
               | Not disagreeing though.
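               | 
               | (For the curious: the API presumably being referred to
               | here is kqueue/kevent. A minimal sketch via Python's
               | select module, which exposes it on macOS and the BSDs;
               | the throwaway socket is just for illustration:)
               | 
               |     import select, socket
               | 
               |     srv = socket.socket()
               |     srv.bind(("127.0.0.1", 0))   # throwaway port for illustration
               |     srv.listen()
               | 
               |     kq = select.kqueue()
               |     # Register interest in "srv is readable", i.e. a client is waiting to be accepted.
               |     ev = select.kevent(srv.fileno(), filter=select.KQ_FILTER_READ, flags=select.KQ_EV_ADD)
               |     kq.control([ev], 0)          # apply the change, fetch no events yet
               | 
               |     # Block for up to 1 second waiting for readiness events.
               |     for event in kq.control(None, 8, 1.0):
               |         if event.ident == srv.fileno():
               |             conn, addr = srv.accept()
               |             print("accepted", addr)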
        
               | kstrauser wrote:
               | Ah, I see what you're saying. Basically, Darwin doesn't
               | support cgroups, so Docker runs Linux in a VM to get
               | that.
        
               | devmor wrote:
               | I don't think it supports userland namespaces either,
               | which is another important part of container isolation.
        
             | pmarreck wrote:
             | > lack of robust python support
             | 
             | There is no such thing. Tell me, which combination of the
             | 15+ virtual environments, dependency management and Python
             | version managers would you use? And how would you prevent
             | "project collision" (where one Python project bumps into
             | another one and one just stops working)? Example: SSL
             | library differences across projects is a notorious culprit.
             | 
             | Python is garbage and I don't understand why people put up
             | with this crap unless you seriously only run ONE SINGLE
             | Python project at a time and do not care what else silently
             | breaks. Having to run every Python app in its own Docker
             | image (which is the only real solution to this, if you
             | don't want to learn Nix, which you really should, because
             | it is better thanks to determinism... but entails its own
             | set of issues) is not a reasonable compromise.
             | 
             | Was so glad when the Elixir guys came out with this
             | recently, to at least be able to use Python, but in a very
             | controlled, not-insane way:
             | https://dashbit.co/blog/running-python-in-elixir-its-fine
        
               | simonw wrote:
               | uv
               | 
               | (Not saying Apple should bundle that, but it's the best
               | current answer to running many different Python projects
               | without using something like Docker)
        
               | mapcars wrote:
               | > at least be able to use Python, but in a very
               | controlled, not-insane way
               | 
               | That's funny; about 10 years ago I started my career in
               | a startup that had Python business logic running under
               | Erlang (via a custom connector), which handled
               | supervision and task distribution, and it looked insane
               | to me at the time.
               | 
               | Even today I think it can be useful but is very hard to
               | maintain, and containers are a good enough way to handle
               | python.
        
               | kstrauser wrote:
               | Virtualenv's been a thing for many years; it's built
               | into Python, and it solves all that without adding
               | additional tooling.
               | 
               | And if you're genuinely asking, everything's converging
               | toward uv. If you pick only one, use that and be done
               | with it.
        
               | hedora wrote:
               | I've been using virtualenv for a decade, and we use uv at
               | work.
               | 
               | Neither fixed anything. They just make it slightly less
               | painful to deal with python scripts' constant bitrot.
               | 
               | They also make python uniquely difficult to dockerize.
        
               | kstrauser wrote:
               | That's so completely, diametrically opposite of my
               | experience with both that I can't help but wonder how it
               | ended up there.
               | 
               | > They also make python uniquely difficult to dockerize.
               |     RUN pip install uv && uv sync
               | 
               | Tada, done. No, seriously. That's the whole invocation.
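               | 
               | (For context, a minimal sketch of the kind of Dockerfile
               | being described; the base image, file names, and the
               | --frozen flag are assumptions, not anything prescribed
               | above:)
               | 
               |     FROM python:3.12-slim
               |     WORKDIR /app
               |     # Copy lockfile/metadata first so the dependency layer caches well.
               |     COPY pyproject.toml uv.lock ./
               |     RUN pip install uv && uv sync --frozen
               |     COPY . .
               |     # "myapp" is a hypothetical module name.
               |     CMD ["uv", "run", "python", "-m", "myapp"]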
        
               | whimsicalism wrote:
               | these are solved problems now, check back in. uv is now
               | the standard
        
               | tomn wrote:
               | This is incoherent to me. Your complaints are about
               | packaging, but the elixir wrapper doesn't deal with that
               | in any way -- it just wraps UV, which you could use
               | without elixir.
               | 
               | What am I missing?
               | 
               | Also, typically when people say things like
               | 
               | > Tell me, which combination of the 15+ virtual
               | environments, dependency management and Python version
               | managers
               | 
               | It means they have been trapped in a cycle of thinking
               | "just one more tool will surely solve my problem",
               | instead of realising that the tools _are_ the problem,
               | and if you just use the official methods (virtualenv and
               | pip from a stock python install), things mostly just
               | work.
        
               | kstrauser wrote:
               | I agree. Python certainly had its speedbumps, but it's
               | utterly manageable today and has been for years and
               | years. It seems like people get hung up on there not
               | being 1 official way to do things, but I think that's
               | been great, too: the competition gave us nice things like
               | Poetry and UV. The odds are slim that a Rust tool
               | would've been accepted as the official Python.org-
               | supplied system, but now we have it.
               | 
               | There are reasons to want something more featureful than
               | plain pip. Even without them, pip+virtualenv has been
               | completely usable for, what, 15 years now?
        
             | duped wrote:
             | > No native docker support
             | 
             | Honest question: why do you want this in MacOS? Do you
             | understand what docker does? (it's fundamentally a linux
             | technology, unless you are asking for user namespaces and
             | chroot w/o SIP on MacOS, but that doesn't make sense since
             | the app sandbox exists).
             | 
             | MacOS doesn't have the fundamental ecosystem problems that
             | beget the need for docker.
             | 
             | If the answer is "I want to run docker containers because I
             | have them" then use orbstack or run linux through the
             | virtualization framework (not Docker desktop). It's
             | remarkably fast.
        
               | jeffhuys wrote:
               | Docker Desktop now offers an option to use the
               | virtualization framework, and works pretty well. But
               | you're still constantly running a VM because "docker is
               | how devs work now right?". I agree with your comment.
        
               | raydev wrote:
               | > MacOS doesn't have the fundamental ecosystem problems
               | that beget the need for docker.
               | 
               | Anyone wanting to run and manage their own suite of Macs
               | to build multiple massive iOS and Mac apps at scale, for
               | dozens or hundreds or thousands of developers deploying
               | their changes.
               | 
               | xcodebuild is by far the most obvious "needs native for
               | max perf" but there are a few other tools that require
               | macOS. But obviously if you have multiple repos and apps,
               | you might require many different versions of the same
               | tools to build everything.
               | 
               | Sounds like a perfect use case for native containers.
        
               | egorfine wrote:
               | > why do you want this in MacOS?
               | 
               | I have a small rackmounted rendering farm using Mac
               | minis, which outperform everything in the Intel world,
               | even hardware an order of magnitude more expensive.
               | 
               | I've run macOS on my personal and development computers
               | for over a decade, and I've used Linux since its
               | inception on the server side.
               | 
               | My experience: running server-side macOS is such a PITA
               | it's not even funny. It may even pretend it has ssh
               | while in fact the ssh server is only available on good
               | days, and only after Remote Desktop has logged in at
               | least once. launchd makes you wanna crave systemd. Etc,
               | etc.
               | 
               | So, about docker. I would absolutely love to run my app
               | in a containerized environment on a Mac in order to not
               | touch the main OS.
        
               | mannyv wrote:
               | Funny, I ran a bunch of Mac minis in colo for over a
               | decade with no problems. Maybe you have a config problem?
               | 
               | Of course, I had a LOM/KVM and redundant networking etc.
               | They were substantially more reliable than the Dell
               | equipment that I used in my day job for sure.
        
               | duped wrote:
               | What would a containerization environment on MacOS give
               | you that you don't already have? Like concretely - what
               | does containerization _mean_ in the context of a MacOS
               | user space?
               | 
               | In Linux, it means something very specific: a
               | user/mount/pid/network namespace, overlayfs to provide a
               | rootfs, chroot to pivot to the new root to do your work,
               | and port forwarding between the host/guest systems.
               | 
               | On MacOS I don't know what containerization means short
               | of virtualization. But you have virtualization on MacOS
               | already, so why not use that?
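               | 
               | (A minimal, Linux-only sketch of those primitives driven
               | from Python via libc. It needs root, the rootfs path is
               | hypothetical, and the overlayfs and port-forwarding
               | steps are omitted:)
               | 
               |     import ctypes, os, sys
               | 
               |     libc = ctypes.CDLL("libc.so.6", use_errno=True)
               | 
               |     CLONE_NEWNS  = 0x00020000   # mount namespace
               |     CLONE_NEWPID = 0x20000000   # PID namespace (applies to children forked afterwards)
               |     CLONE_NEWNET = 0x40000000   # network namespace
               | 
               |     ROOTFS = "/var/lib/minicontainer/rootfs"   # hypothetical, prepared elsewhere
               | 
               |     if libc.unshare(CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWNET) != 0:
               |         raise OSError(ctypes.get_errno(), "unshare() failed")
               | 
               |     pid = os.fork()              # the child lands in the new PID namespace as PID 1
               |     if pid == 0:
               |         os.chroot(ROOTFS)        # pivot into the prepared root filesystem
               |         os.chdir("/")
               |         os.execvp("/bin/sh", ["/bin/sh"])
               |     else:
               |         _, status = os.waitpid(pid, 0)
               |         sys.exit(os.waitstatus_to_exitcode(status))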
        
           | e40 wrote:
           | I torrent things from two different hosts on my gigabit
           | network. The macos stack literally cannot handle the full
           | bandwidth I have. It fails and the machine needs to be
           | rebooted to fix it. It's not pretty on the way into this
           | state, either. Other remote connections to the computer are
           | unreliable. On Linux, running the same app in a docker
           | container works perfectly. Transmission is the app.
        
             | petecooper wrote:
             | >Transmission is the app.
             | 
             | Former Transmission user here.
             | 
             | I realise you didn't ask, but you might find some
             | improvements in qBittorrent.
        
               | jihadjihad wrote:
               | I haven't had any issue running BiglyBT on my M1 MacBook,
               | granted I don't run it all day every day but everything
               | runs plenty fast for my needs (25-30 MB/s for well-seeded
               | torrents).
        
               | jeffhuys wrote:
               | I went to Transmission years and years ago because it's
               | just simple. It has all the options if you need them, but
               | no HUUUGE interface with RSS feeds, 10001 stats about
               | your download, categories, tags, etc etc etc.
               | 
               | Transmission is just a small, floating window with your
               | downloads. Click for more. It fits in the macOS vibe. But
               | I'm a person that fully adopted the original macOS "way
               | of working" - kicked the full-screen habit I had in
               | windows and never felt better.
               | 
               | Can I ask, why would you go FROM Transmission to
               | qBittorrent?
        
               | petecooper wrote:
               | >why would you go FROM Transmission to qBittorrent?
               | 
               | In my case: some torrents wouldn't find known-good seeds
               | in Transmission but worked fine in qBittorrent; there's
               | reasonable (but not perfect) support for libtorrent 2.0
               | in qBittorrent; my download speeds and overall
               | responsiveness are anecdotally better in qBittorrent;
               | and I make use of some of the nitty-gritty settings in
               | qBittorrent.
        
               | jeffhuys wrote:
               | Well there's a list of good reasons! Thanks for
               | answering. I haven't had any problems with finding seeds,
               | and no need for libtorrent but now I know how to fix that
               | when I do encounter those situations.
        
               | e40 wrote:
               | The Linux version, in a container no less, handles the
               | entire gigabit bandwidth.
               | 
               | And let's be clear, it wasn't the app that had problems,
               | the Apple Remote Desktop connection to the machine failed
               | when the speeds got above 40MB/s and the network
               | interface stopped working around 80MB/s.
               | 
               | I think Transmission works perfectly fine. I've been
               | using it for 10+ years with no issues at all on Linux.
               | 
               | I forgot to mention this is a Mac mini/Intel (2018).
        
             | kstrauser wrote:
             | I get nearly 10Gbps from my NAS to my Mac Studio. It
             | absolutely can handle that bandwidth. It may not handle
             | that specific client well for unrelated reasons.
        
               | egorfine wrote:
               | Bandwidth, yes. Connection count, no.
        
           | drumttocs8 wrote:
           | To expatiate with perspicuity:
           | 
           | The Apple ecosystem is a walled garden.
        
         | behnamoh wrote:
         | > My guess is that Apple developed this chip for their internal
         | AI efforts
         | 
         | what internal AI efforts?
         | 
         | Apple Intelligence is bonkers, and the Apple MLX framework
         | remains a hobby project for Apple.
        
           | layer8 wrote:
           | https://security.apple.com/blog/private-cloud-compute/
           | 
           | https://security.apple.com/documentation/private-cloud-
           | compu...
           | 
           | https://techcrunch.com/2024/12/11/apple-reportedly-
           | developin...
           | 
           | https://techcrunch.com/2025/02/24/apple-commits-500b-to-
           | us-m...
        
           | InTheArena wrote:
           | Apple stated that they were deploying their own hardware for
           | next generation Siri. My thesis is that this is the hardware
           | they developed.
           | 
           | If so, this is hardly a hobby project.
           | 
           | It may not be effective, but there is serious cash behind
           | this.
        
           | gatienboquet wrote:
           | They use their own M chips for AI. They are far more
           | advanced on AI than the majority of companies.
           | 
           | They are using OpenAI for now, but in a couple of months
           | they will own the full value chain.
        
             | behnamoh wrote:
             | we've heard that claim for the past three years, but every
             | effort by them points to the opposite. don't get me wrong,
             | I would love for Apple Intelligence to be smart enough on
             | my iPhone and on my Mac, but honestly, the current version
             | is a complete disappointment.
        
               | DrBenCarson wrote:
               | Apple are working on the hard problems of making AI
               | useful (call them "agents"), not AGI
               | 
               | 1. Small models running locally with well-established
               | tool interfaces ("app intents")
               | 
               | 2. Large models running in a bespoke cloud that can
               | securely and quickly load all relevant tokens from a
               | device before running inference
               | 
               | No AI lab is even close to what Apple is trying to
               | deliver in the next ~12 months
        
           | Spooky23 wrote:
           | They're taking a different and more difficult path of
           | integrating AI with existing apps and workflows.
           | 
           | It's their spin on the Google strategy of providing
           | services to their enterprise GCP customers. I think we'll see
           | more out of them long term.
        
           | DrBenCarson wrote:
           | Apple have been putting ML models running on their own
           | silicon into production for far longer than any of their
           | competitors. They publish some of the most innovative ML
           | research
           | 
           | They also own distribution to the wealthiest and most
           | influential people in the world
           | 
           | Don't get lost in recency bias
        
         | Teever wrote:
         | FTFA
         | 
         | > Apple's custom-built UltraFusion packaging technology uses an
         | embedded silicon interposer that connects two M3 Max dies
         | across more than 10,000 signals, providing over 2.5TB/s of low-
         | latency interprocessor bandwidth, and making M3 Ultra appear as
         | a single chip to software.
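         | 
         | (Rough per-signal arithmetic from the figures quoted above,
         | treating "over 2.5TB/s" and "more than 10,000 signals" as
         | exact:)
         | 
         |     bandwidth_bytes_per_s = 2.5e12          # 2.5 TB/s aggregate
         |     signals = 10_000
         |     per_signal_gbps = bandwidth_bytes_per_s * 8 / signals / 1e9
         |     print(f"~{per_signal_gbps:.0f} Gbit/s per signal")   # ~2 Gbit/s each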
        
           | InTheArena wrote:
           | I RTFA, RMFP
           | 
           | The comment was that the press had reported that the
           | interposer wasn't available. This obviously uses some form of
           | interposer, so the question is if the press missed it, or
           | Apple has something new.
        
             | nsteel wrote:
             | > uses an _embedded_ silicon interposer
             | 
             | It sounds like they're using TSMC's new LSI (Local Si
             | Interconnect) technology, which is their version of Intel's
             | EMIB. It's essentially small islands of silicon, just
             | around the inter-chip connections, embedded within the
             | organic substrate. This gives the advantages of silicon
             | interconnect, without the cost and size restrictions of a
             | silicon interposer. It would not be visible from just
             | looking at the package.
             | 
             | https://www.anandtech.com/show/16031/tsmcs-version-of-
             | emib-l...
             | 
             | https://semianalysis.com/2022/01/06/advanced-packaging-
             | part-...
        
         | darthrupert wrote:
         | Yeah, if only Apple at least semi-supported Linux, their
         | computers would have no competition.
        
           | dwedge wrote:
           | I've been buying and using MBPs for 6 or 7 years now, and
           | just _assumed_ I could run Linux on one if I wanted to. I
           | just spent a couple of days trying to get a 2018 MBP working
           | with Linux and found out [edit to clarify] that my other ARM
           | MBP basically won't work.
           | 
           | I just want a break from macOS, so I'll be buying a Thinkpad
           | and will probably never come back. This isn't me moaning; I
           | understand it's their market. But if their hardware
           | supported Linux (especially dual booting) or native Docker,
           | I'd probably be buying Apple for the next decade, and now I
           | just won't be.
        
             | creddit wrote:
             | > trying to get a 2018 MBP working with Linux and found out
             | ARM basically doesn't work.
             | 
             | Since the M series of ARM processors didn't come out until
             | 2020, that would make a lot of sense.
        
               | dwedge wrote:
               | Two separate laptops, I could have been clearer. I have
               | an old 2018 I wanted to try it on, and my daily is M2
               | that would have been next.
        
             | dghlsakjg wrote:
             | A 2018 MacBook would be an intel x86 chip. It's incredibly
             | easy to get Linux running on that machine.
        
               | dwedge wrote:
               | Getting Linux running wasn't difficult. But Mint lost
               | audio (everything else worked), the specialised Mint
               | kernel lost both audio and wifi, and Arch lost both wifi
               | and the onboard keyboard.
               | 
               | I'm sure with tinkering I could eventually get it
               | working, but I'm well past the point of wanting to tinker
               | with hardware and drivers to get Linux working.
        
               | goosedragons wrote:
               | Because of the T2 chip it's actually pretty annoying.
               | Mainline kernels I think are still missing keyboard and
               | trackpad support for those models. Plus a host of other
               | issues.
        
               | sunshowers wrote:
               | No, there's a bunch of MBP generations in the middle that
               | just never got any Linux attention.
        
             | LordIllidan wrote:
             | 2018 MBP is Intel unless you're referring to the T2 chip?
        
               | dwedge wrote:
               | I could have written it clearer. I have both, Intel was
               | the first attempt and when I was struggling to get it up
               | without losing one of wifi, audio and onboard keyboard
               | and read that ARM was worse I gave up. Even the best
               | combination I had (no audio but everything else working)
               | would kill bluetooth after a while if wifi was connected
               | to 2.6. I don't like their hardware enough to fight with
               | it.
        
             | officeplant wrote:
             | Loved my M1 mini, loved my M2 Air. I've moved on to a 2024
             | HP Elitebook with an AMD R7 8840U, 1TB of replaceable
             | NVMe, and 32GB of socketed DDR5. It's a 14in laptop with a
             | serviceable enough 1920x1200 matte screen. $800 and a
             | 3-hour drive to the nearest Microcenter. I gave Apple
             | another try (I refused Apple from 2009-2020 because of the
             | Nvidia-era issues) and I just can't stomach living off of
             | piles of external drives anymore to make up for the
             | lackluster storage space on the affordable units.
             | 
             | The HP Elitebook was on Ubuntu's list of compatible tested
             | laptops and came in hundreds of dollars less than a
             | Thinkpad. Most of the comparably priced on sale T14's I
             | could find were all crap Intel spec'd ones.
             | 
             | Months in, I don't regret it at all, and Linux support has
             | been fantastic even for a fairly new Ryzen chip and not
             | the latest kernel. (I stick to LTS releases of most
             | Distros) Shoving in 4TB of NVME storage and 96GB of DDR5
             | should I feel the need to upgrade would still put me only
             | around $1300 invested in this machine.
        
               | dwedge wrote:
               | I'm not really moaning about the cost or lack of
               | upgradability. I mean, I don't like it but at least you
               | know what you're getting into. I just always assumed
               | Linux as a backup was an option, and more and more OSX is
               | annoying me (last 2 or 3 days it keeps dropping bluetooth
               | for 30 seconds) and more and more I just find the
               | interface distracting. Plus whether it works with
               | external displays over USB C is a crapshoot.
               | 
               | I'll miss the battery life of the M1 chips, and I'm going
               | to have to re-learn how to type (CTRL instead of ALT, fn
               | rarely being on the left, I use fn+left instead of CTRL A
               | in terminals) but otherwise, I think I'm done.
        
               | brailsafe wrote:
               | Surely you're using that thing as a laptop in a minority
               | of cases though, looks like it's basically just specs you
               | bought. That's fine, but if that's all you want then it
               | seems like rather than trying to give a mac a reasonable
               | go of it as opposed to whatever else, you were trying to
               | instead explore a fundamental difference in how you value
               | technology products, which is quite a different battle.
        
             | least wrote:
             | I think the only laptops you won't find weird Linux issues
             | with are from smaller manufacturers dedicated to shipping
             | them, like the KDE laptop or System76. Every other
             | hardware manufacturer, including those that ship laptops
             | with Linux preinstalled, probably has weird hardware
             | incompatibilities because they don't fully customize their
             | SKUs with Linux support in mind.
             | 
             | Not that I'm discouraging you from switching or anything.
             | If Linux is what you want/need, there's definitely better
             | laptops to be had than a Macbook for that purpose. It's
             | just that weird incompatibilities and having to fight with
             | the operating system on random issues is, at least in my
             | experience, normal when using a linux laptop. Even my T480
             | which has overall excellent compatibility isn't trouble-
             | free.
        
               | dwedge wrote:
               | Something like the brightness buttons not working, or
               | sleep being a little erratic is ok. No released wifi
               | drivers, bluetooth issues, and audio and the keyboard not
               | working are not ok. Apple going backwards in terms of
               | supporting Linux is not something I'm ok with.
        
               | least wrote:
               | There are wifi drivers; you just have to install them
               | separately because they use broadcom chips. It's a
               | proprietary blob. The other things do work, but it
               | requires special packages and you'll need an external
               | keyboard while installing. It's a pain to install, for
               | sure, but it's not insurmountably difficult to get it
               | installed.
               | 
               | Apple Silicon chips are arguably more compatible with
               | Asahi Linux [1], but that's largely thanks to the hard
               | work of Marcan, who has stepped down as project lead [2].
               | 
               | Overall I still think the right choice is to find a
               | laptop better suited for the purpose of running linux on
               | it, just something that requires more careful
               | consideration than people think. Framework laptops,
               | which seem well suited since ideologically they mesh
               | well with Linux users, can be a pain to set up as well.
               | 
               | [1] https://asahilinux.org/
               | 
               | [2] https://marcan.st/2025/02/resigning-as-asahi-linux-
               | project-l...
        
               | dwedge wrote:
               | I know there are wifi and keyboard drivers, because the
               | live boots and installers work with them, but then when
               | it comes to installing they're gone. I know it's not
               | insurmountable, and 10 years ago I'd have done it, but I
               | spent a few hours and got sick of it. I agree with you
               | that it's probably better to get another laptop.
        
           | carlosjobim wrote:
           | No competition among the Linux userbase - which is a client
           | segment that you want to avoid at all costs.
        
         | kokada wrote:
         | > This hardware is really being held back by the operating
         | system at this point.
         | 
         | Apple could either create 2U rack hardware and support Linux
         | (and I mean Apple supporting it, not hobbyists), or have a
         | headless build of Darwin that could run on that hardware. But
         | in the latter case, we probably wouldn't have much software
         | available (though I am sure people would eventually start
         | porting software to it; there are already MacPorts and
         | Homebrew, and I am sure they could be adapted to eventually
         | run on that platform).
         | 
         | But Apple is also not interested in that market, so this will
         | probably never happen.
        
           | naikrovek wrote:
           | > But Apple is also not interested in that market, so this
           | will probably never happen.
           | 
           | they're just a tiny company with shareholders who are really
           | tired of never earning back their investments. give 'em a
           | break. I mean they're still so small that they must protect
           | themselves by requiring that macs be used for publishing
           | iPhone and iPad applications.
        
             | hnaccount_rng wrote:
             | Not to get in the way of good snark or anything. But..
             | Apple isn't _requiring_ that everyone uses MacOS on their
             | systems. But you have to bring your own engineering effort
             | to actually make another OS run. And so far Asahi is the
             | only effort that I'm aware of (there were alternatives in
             | the very beginning, but they didn't even get to M2 right?)
        
               | thesuitonym wrote:
               | > But you have to bring your own engineering effort to
               | actually make another OS run.
               | 
               | I mean, that's usually how it works though. When IBM
               | launched the PS/2, they didn't support anything other
               | than PC-DOS and OS/2. Microsoft had to make MS-DOS work
               | for it (I mean... they _did_ get support from IBM, but
               | not really), and the 386BSD and Linux communities
               | brought the engineering effort without IBM's
               | involvement.
               | 
               | When Apple was making Motorola Macs, they may have given
               | Be a little help, but didn't support any other OSes that
               | appeared. Same with PowerPC.
               | 
               | All of the support for alternative OSes has always come
               | from the community, whether that's volunteers or a
               | commercial interest with cash to burn. Why should that
               | change for Apple silicon?
        
               | jorams wrote:
               | Note that they said (emphasis mine):
               | 
               | > they're still so small that they must protect
               | themselves by requiring that _macs be used for publishing
               | iPhone and iPad applications._
               | 
               | They're not talking about Apple's silicon as a target,
               | but as a development platform.
        
           | ewzimm wrote:
           | There has to be someone at Apple with a contact at IBM that
           | could make Fedora Apple Remix happen. It may not be on-brand,
           | but this is a prime opportunity to make the competition look
           | worse. File it under Community projects at
           | https://opensource.apple.com/projects
        
             | asadm wrote:
             | https://www.globalnerdy.com/wordpress/wp-
             | content/uploads/200...
        
           | alwillis wrote:
           | I wouldn't be so sure about that.
           | 
           | https://news.ycombinator.com/item?id=43271486
        
         | AlchemistCamp wrote:
         | Keep in mind the minimum configuration that has 512GB of
         | unified RAM is $9,499.
        
           | 42lux wrote:
           | Still cheap if the only thing you look for is VRAM.
        
           | nsteel wrote:
           | And how is it only £9,699.00!! Does that dollar price
           | include sales tax or are Brits finally getting a bargain?
        
             | vr46 wrote:
             | The US prices never include state sales tax IIRC. Maybe
             | we're finally getting some parity.
        
               | seanmcdirmid wrote:
               | You could always buy one at an apple store without sales
               | tax (e.g. Portland Oregon). But they might not have that
               | one in stock...
        
             | mastax wrote:
             | Tariffs perhaps?
        
             | kgwgk wrote:
             | What's the bargain?
             | 
             | There is also "parity" in other products like a MacBook
             | Pro from £1,599 / $1,599 or an iPhone 16 from £799 / $799.
             | £9,699 / $9,499 is worse than that!
        
           | DrBenCarson wrote:
           | Cheap relative to the alternatives
        
           | stego-tech wrote:
           | I cannot express how dirt cheap that price point is for
           | what's on offer, especially when you're comparing it to
           | rackmount servers. By the time you've shoehorned in an
           | Nvidia GPU and
           | all that RAM, you're easily looking at 5x that MSRP; sure,
           | you get proper redundancy and extendable storage for that
           | added cost, but now you also need redundant UPSes and have
           | local storage to manage instead of centralized SANs or NASes.
           | 
           | For SMBs or Edge deployments where redundancy isn't as
           | critical or budgets aren't as large, this is an _incredibly_
           | compelling offering... _if_ Apple actually had a competent
           | server OS to layer on top of that hardware, which it does
           | not.
           | 
           | If they did, though...whew, I'd be quaking in my boots if I
           | were the usual Enterprise hardware vendors. That's a _damn
           | frightening_ piece of competition.
        
             | cubefox wrote:
             | I assume there is a very good reason why AMD and Intel
             | aren't releasing a similar product.
        
               | stego-tech wrote:
               | From my outsider perspective, it's pretty straightforward
               | why they don't.
               | 
               | In Intel's case, there's ample coverage of the company's
               | lack of direction and complacency on existing hardware,
               | even as their competitors ate away at their moat, year
               | after year. AMD with their EPYC chips taking datacenter
               | share, Apple moving to in-house silicon for their entire
               | product line, Qualcomm and Microsoft partnering with
               | ongoing exploration of ARM solutions. A lack of
               | competency in leadership over that time period has
               | annihilated their lead in an industry they used to
               | single-handedly _dictate_, and it's unlikely they'll
               | recover that anytime soon. So in a sense, Intel _cannot_
               | make a similar product, in a timely manner, that competes
               | in this segment.
               | 
               | As for AMD, it's a bit more complicated. They're seeing
               | pleasant success in their CPU lineup, and have all but
               | thrown in the towel on higher-end GPUs. The industry has
               | broadly rallied around CUDA instead of OpenCL or other
               | alternatives, especially in the datacenter, and AMD
               | realizes it's a fool's errand to try and compete directly
               | there when it's a monopoly in practice. Instead of
               | squandering capital to compete, they can just continue
               | succeeding and working on their own moat in the areas
               | they specialize in - mid-range GPUs for work and gaming,
               | CPUs targeting consumers and datacenters, and APUs
               | finding their way into game consoles, handhelds, and
               | other consumer devices or Edge compute systems.
               | 
               | And that's just getting into the specifics of those two
               | companies. The reality is that any vendor who hasn't
               | already unveiled their own chips or accelerators is
               | coming in at what's perceived to be the "top" of the
               | bubble or market. They'd lack the capital or moat to
               | really build themselves up as a proper competitor, and
               | are more likely to just be acquired in the current
               | regulatory environment (or lack thereof) for a quick
               | payout to shareholders. There's a reason why the
               | persistent rumor of Qualcomm purchasing part or whole of
               | Intel just won't die: the x86 market is rather stagnant,
               | churning out mediocre improvements YoY at growing
               | pricepoints, while ARM and RISC chips continue to
               | innovate on modern manufacturing processes and chip
               | designs. The growth is _not_ in x86, but a juggernaut
               | like Qualcomm would be an ideal buyer for a  "dying" or
               | "completed" business like Intel's, where the only thing
               | left to do is constantly iterate for diminishing returns.
        
               | kridsdale1 wrote:
               | Well said.
        
             | AlchemistCamp wrote:
             | It's not quite an apples to apples comparison, no pun
             | intended. I guess we'll see how it sells.
        
             | kllrnohj wrote:
             | > By the time you've shoehorned in an nVidia GPU and all
             | that RAM, you're easily looking at 5x that MSRP
             | 
             | That nvidia GPU setup will actually have the compute grunt
             | to make use of the RAM, though, which this M3 Ultra
             | probably realistically doesn't. After all, if the only
             | thing that mattered was RAM then the 2TB you can shove into
             | an Epyc or Xeon would already be dominating the AI
             | industry. But they aren't, because it isn't. It certainly
             | hits at a unique combination of things, but whether or not
             | that's maximally useful for the money is a completely
             | different story.
        
               | rbanffy wrote:
               | Had the M3 GPU been much wider, it would be constrained
               | by the memory bandwidth. It might still have an advantage
               | over Nvidia competitors in that it has 512GB accessible
               | to it and will need to push less memory across socket
               | boundaries.
               | 
               | It all depends on the workload you want to run.
        
               | stego-tech wrote:
               | You're forgetting what Apple's been baking into their
               | silicon for (nearly? over?) a decade: the Neural
               | Processing Unit (NPU), now called the "Neural Engine".
               | That's their secret sauce that makes their kit more
               | competitive for endpoint and edge inference than standard
               | x86 CPUs. It's why I can get similarly satisfying
               | performance on my old M1 Pro Macbook Pro with a scant
               | 16GB of memory as I can on my 10900k w/ 64GB RAM and an
               | RTX 3090 under the hood. Just to put these two into
               | context, I ran the latest version of LM Studio with the
               | deepseek-r1-distill-llama-8b model @ Q8_0, both with the
               | exact same prompt and maximally offloaded onto hardware
               | acceleration and memory, with a context window that was
               | entirely empty:
               | 
               |     Write me an AWS CloudFormation file that does the following:
               |     * Deploys an Amazon Kubernetes Cluster
               |     * Deploys Busybox in the namespace "Test1", including creating that Namespace
               |     * Deploys a second Busybox in the namespace "Test3", including creating that Namespace
               |     * Creates a PVC for 60GB of storage
               | 
               | The M1 Pro laptop with 16GB of Unified Memory:
               | 
               |     * 21.28 seconds for "Thinking"
               |     * 0.22s to the first token
               |     * 18.65 tokens/second over 1484 tokens in its responses
               |     * 1m:23s from sending the input to completion of the output
               | 
               | The 10900k CPU, with 64GB of RAM and a full-fat RTX 3090
               | GPU in it:
               | 
               |     * 10.88 seconds for "thinking"
               |     * 0.04s to first token
               |     * 58.02 tokens/second over 1905 tokens in its responses
               |     * 0m:34s from sending the input to completion of the output
               | 
               | Same model, same loader, different architectures and
               | resources. This is why a lot of the AI crowd are on Macs:
               | their chip designs, especially the Neural Engine and
               | GPUs, allow quite competent edge inference while sipping
               | comparative thimbles of energy. It's why if I were all-in
               | on LLMs or leveraged them for work more often (which I
               | intend to, given how I'm currently selling my generalist
               | expertise to potential employers), I'd be seriously
               | eyeballing these little Mac Studios for their local
               | inference capabilities.
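               | 
               | (For anyone wanting to reproduce rough numbers like
               | these, a sketch against a local OpenAI-compatible
               | endpoint such as the one LM Studio can serve. The URL,
               | port, model name, and the presence of a "usage" block in
               | the response are assumptions about the local setup:)
               | 
               |     # Crude tokens/sec measurement against a local OpenAI-compatible endpoint.
               |     import time, requests
               | 
               |     URL = "http://localhost:1234/v1/chat/completions"   # assumed local server address
               |     payload = {
               |         "model": "deepseek-r1-distill-llama-8b",        # assumed model identifier
               |         "messages": [{"role": "user",
               |                       "content": "Write an AWS CloudFormation template..."}],
               |     }
               | 
               |     start = time.time()
               |     resp = requests.post(URL, json=payload, timeout=600).json()
               |     elapsed = time.time() - start
               | 
               |     tokens = resp.get("usage", {}).get("completion_tokens", 0)
               |     print(f"{tokens} tokens in {elapsed:.1f}s = {tokens / max(elapsed, 1e-9):.1f} tok/s")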
        
               | kllrnohj wrote:
               | Uh.... I must be missing something here, because you're
               | hyping up Apple's NPU only to show it getting absolutely
               | obliterated by the equally old 3090? Your 10900K having
               | 64gb of RAM is also irrelevant here...
        
               | stego-tech wrote:
               | You're missing the bigger picture by getting bogged
               | down in technical details. To an end user, the difference
               | between thirty seconds and ninety seconds is often
               | irrelevant for things like AI, where they _expect_ a
               | delay while it  "thinks". When taken in that context,
               | you're now comparing a 14" laptop running off its
               | battery, to a desktop rig gulping down ~500W according to
               | my UPS, for a mere 66% reduction in runtime for a single
               | query at the expense of 5x the power draw.
               | 
               | Sure, the desktop machine performs better, as would a
               | datacenter server jam-packed full of Blackwell GPUs, but
               | _that's not what's exciting_ about Apple's
               | implementation. It's the _efficiency_ of it all, being
               | able to handle modern models on comparatively "weaker"
               | hardware most folks would dismiss outright. _That's_ the
               | point I was trying to make.
        
               | kllrnohj wrote:
               | We're talking about the m3 ultra here, which is also wall
               | powered and also expensive. Nobody is interested in
               | dropping upwards of $10,000 on a Mac Studio to have
               | "okay" performance just because an unrelated product is
               | battery powered. Similarly, saving a few bucks on
               | electricity by tripling the much, much more expensive
               | engineer time spent waiting on results is foolish.
               | 
               | Also Apple isn't unique in having an NPU in a laptop.
               | Fucking everyone does at this point.
        
           | baq wrote:
           | This is a 'shut up and take my money' price, it'll fly off
           | the shelves.
        
           | jread wrote:
           | $8549 with 1TB storage
        
             | rbanffy wrote:
             | It can connect to external storage easily.
        
         | exabrial wrote:
         | If Apple supported Linux (headless) natively, and we could
         | rack M4 Pros, I absolutely would use them in our colo.
         | 
         | The CPUs have zero competition in terms of speed and memory
         | bandwidth. I'm still blown away that no other company has been
         | able to produce ARM server chips that can compete.
        
           | notpushkin wrote:
           | Asahi is a thing. For headless usage it's pretty much ready
           | to go already.
        
             | criddell wrote:
             | The Asahi maintainer resigned recently. What that means for
             | the future only time will tell. I probably wouldn't want to
             | make a big investment in it right now.
        
               | seabrookmx wrote:
               | Your wording makes it sound like it was a one-man show.
               | Asahi has a really strong contributor base, new
               | leadership[1], and the backing of Fedora via the Asahi
               | Fedora Remix. While Hector resigning is a loss, I don't
               | think it's a death knell for the project.
               | 
               | [1]: https://asahilinux.org/2025/02/passing-the-torch/
        
               | hoppp wrote:
               | He was the lead developer and a very prominent figure.
               | I think it probably boils down to funding the new
               | developments.
        
               | whimsicalism wrote:
               | it was pretty close to a one man show
        
               | skyyler wrote:
               | By what grounds do you make this statement?
               | 
               | My understanding is there are dozens of people working on
               | it.
        
               | whimsicalism wrote:
               | e: I'm removing this comment because on reflection I
               | think it is probably some form of doxxing and being right
               | on the internet isn't that important.
        
               | ArchOversight wrote:
               | You believe that Hector Martin is also Asahi Lina?
               | 
               | https://bsky.app/profile/lina.yt
               | 
               | https://github.com/AsahiLina
        
               | whimsicalism wrote:
               | e: snip
        
               | raydev wrote:
               | I thought this was confirmed a couple years ago.
        
               | surajrmal wrote:
               | You make it sound like there was only one.
        
             | lynndotpy wrote:
             | Not at all for M3 or M4. Support is for M2 and M1
             | currently.
        
             | EgoIncarnate wrote:
             | M3 support in Asahi is still heavily WIP. I think it
             | doesn't even have display support, Ethernet, or Wifi yet;
             | only serial over USB. Without any GPU or ANE support, it's
             | not very useful for AI stuff.
             | https://asahilinux.org/docs/M3-Series-Feature-Support/
        
             | WD-42 wrote:
             | It's only a thing for the M1. Asahi is a Sisyphean effort
             | to keep up with new hardware and the outlook is pretty grim
             | at the moment.
             | 
             | Apple's whole m.o. is to take FOSS software, repackage it
             | and sell it. They don't want people using it directly.
        
           | hedora wrote:
           | The last I checked, AMD was outperforming Apple perf/dollar
           | on the high end, though they were close on perf/watt for the
           | TDPs where their parts overlapped.
           | 
           | I'd be curious to know if this changes that. It'd take a lot
           | more than doubling cores to take out the very high power AMD
           | parts, but this might squeeze them a bit.
           | 
           | Interestingly, AMD has also been investing heavily in unified
           | RAM. I wonder if they have / plan an SoC that competes 1:1
           | with this. (Most of the parts I'm referring to are set up for
           | discrete graphics.)
        
             | aurareturn wrote:
             | The M4 Pro has 56% faster single-threaded performance than
             | AMD's new Strix Halo while being 3.6x more efficient.
             | 
             | Source: https://www.notebookcheck.net/AMD-Ryzen-AI-
             | Max-395-Analysis-...
             | 
             | Cinebench 2024 results.
        
               | hedora wrote:
               | That's a laptop part, so it makes different tradeoffs.
               | 
               | Somewhere on the internet there is a tdp wattage vs
               | performance x-y plot. There's a pareto optimal region
               | where all the apple and amd parts live. Apple owns low
               | tdp, AMD owns high tdp. They duke it out in the middle.
               | Intel is nowhere close to the line.
               | 
               | I'd guess someone has made one that includes datacenter
               | ARM, but I've never seen it.
        
               | aurareturn wrote:
               | High TDP? You mean server-grade CPUs? Apple doesn't make
               | those.
        
               | derefr wrote:
               | True, but these "Ultra" chips do target the same niche as
               | (some) high-TDP chips.
               | 
               | Workstations (like the Mac Studio) have traditionally
               | been a space where "enthusiast"-grade consumer parts
               | (think Threadripper) and actual server parts competed.
               | The owner of a workstation didn't _usually_ care about
               | their machine 's TDP; they just cared that it could chew
               | through their workloads as quickly as possible. But,
               | unlike an actual server, workstations didn't need the
               | super-high core count required for _multitenant_
               | parallelism; and would go idle for long stretches -- thus
               | benefitting (though not requiring) more-efficient power
               | management that could drive down _baseline_ TDP.
        
               | aurareturn wrote:
               | Oh, you mean Threadripper. I thought you were talking
               | about Epyc.
               | 
               | Anyway, I don't think it's comparable really. This thing
               | comes with a fat GPU, NPU, and unified memory.
               | Threadripper is just a CPU.
        
               | mort96 wrote:
               | The GPU and NPU shouldn't be consuming power when not in
               | use. Why shouldn't we compare M3 Ultra to Threadripper?
        
               | diggan wrote:
               | Isn't the rack-mounted Mac Pro supposedly "server-grade"
               | (https://www.apple.com/shop/buy-mac/mac-pro/rack)?
               | 
               | At least judging by the mounts, they want them to be used
               | that way, even though the CPU might not fit with the de
               | facto industry label for "server-grade".
        
               | aurareturn wrote:
               | Server grade CPUs. I thought he was referring to Epyc
               | CPUs.
        
               | hedora wrote:
               | Indeed. The M3 Ultra is in the midrange where they duke
               | it out. Similarly, for its niche, the iPhone CPU is way
               | better than AMD's low end processors.
               | 
               | Anyway the Apple config in the article costs about 5x
               | more than a comparable low end AMD server with 512GB of
               | ram, but adds an NPU. AMD has NPUs in lower end stuff;
               | not sure about this TDP range.
        
               | refulgentis wrote:
               | > You mean server-grade CPUs? Apple doesn't make those.
               | 
               | Right.
               | 
               | It is coming up because we're in a thread about using
               | them as server CPUs. (c.f. "colo", "2U" in OP and OP's
               | child), and the person you're replying to is making the
               | same point you are
               | 
               | For years now, people will comment "these are the best
               | chips, I'd replace all chips with them."
               | 
               | Then someone points out perf/watt is not perf.
               | 
               | Then someone else points out some M-series is much faster
               | than a random CPU.
               | 
               | And someone else points out that the random CPU is not a
               | top performing CPU.
               | 
               | And someone else points out M-series are optimized for
               | perf/watt and it'd suck if it wasn't.
               | 
               | I love my MacBook, the M-series has no competitors in the
               | case it's designed for.
               | 
               | I'd just prefer, at this point, that we can skip long
               | threads rehashing it.
               | 
               | It's a great chip. It's not the fastest, and it's better
               | for that. We want perf/watt in our mobile devices.
               | There's fundamental, well-understood, engineering
               | tradeoffs that imply being great at that necessitates the
               | existence of faster processors.
        
               | aurareturn wrote:
               | > It's a great chip. It's not the fastest,
               | 
               | It has the world's fastest single thread.
        
               | refulgentis wrote:
               | I can't quite tell what's going on here, earlier, you
               | seem to be clear -- c.f. "Apple doesn't make server-grade
               | CPUs"
        
               | aurareturn wrote:
               | Correct. But their M4 line has the fastest single thread
               | performance in the world.
        
               | nameequalsmain wrote:
               | According to what source? Passmark says otherwise[1]. The
               | fastest Intel CPUs have both a higher single thread and
               | multi thread score in that test.
               | 
               | [1] https://www.cpubenchmark.net/singleThread.html
        
               | refulgentis wrote:
               | Well, no, right?
               | 
               | The _M4 Max_ had great, I would argue the best at time of
               | release, single _core_ results on _Geekbench_.
               | 
               | That is a different claim from M4 line has the top single
               | thread performance in the world.
               | 
               | I'm curious:
               | 
               | You're signalling both that you understand the
               | fundamental tradeoff ("Apple doesn't make server-grade
               | CPUs") and that you are talking about something else
               | (follow-up with M4 family has top single-thread
               | performance)
               | 
               | What drives that? What's the other thing you're hoping to
               | communicate?
               | 
               | If you are worried that if you leave it at "Apple doesn't
               | make server-grade CPUs", that people will think M4s
               | aren't as great as they are, this is a technical-enough
               | audience, I think we'll understand :) It doesn't come
               | across as denigrating the M-series, but as understanding
               | a fundamental, physically-based, tradeoff.
        
               | yxhuvud wrote:
               | It also includes gaming machines. Of course, Apple
               | doesn't make those either.
        
               | tomrod wrote:
               | > tdp wattage vs performance x-y plot
               | 
               | This?
               | 
               | https://www.videocardbenchmark.net/power_performance.html
               | #sc...
        
               | echoangle wrote:
               | That's GPUs, not CPUs
        
             | nick_ wrote:
             | Same. I'm not sure what to make of the various claims. I
             | personally defer to this table in general:
             | https://www.cpubenchmark.net/power_performance.html.
             | 
             | I'm not sure how those benchmarks translate to common real
             | world use cases.
        
           | hoppp wrote:
             | What about serviceability? These come with soldered-in
             | SSDs? That would be an issue for server use; it's too
             | expensive to throw the whole thing away for a broken SSD.
        
             | gjsman-1000 wrote:
             | Nah, in many businesses, everything is on a schedule. For
             | desktop computers, a common cycle is 4 years. For servers,
             | maybe a little longer, but not by much. After that date
             | arrives, it's liquidate everything and rebuild.
             | 
             | Having things consistently work is much cheaper than down
             | days caused by your ancient equipment. Apple's SSDs will
             | make it to 5 years no problem - and more likely, 10-15
             | years.
        
               | hedora wrote:
               | At my last N jobs, companies built high end server farms
               | and carefully specced all the hardware. Then they looked
               | at SSD specs and said "these are all fine".
               | 
               | Fast forward 2 years: The $50-$250K machines have a 100%
               | drive failure rate, and some poor bastard has to fly from
               | data center to data center to swap the $60 drive for a
               | $120 one, then re-rack and re-image each machine.
               | 
               | Anyway, soldering a decent SSD to the motherboard would
               | actually improve reliability at all those places.
        
               | olyjohn wrote:
               | What does soldering it to the board have to do with
               | reliability?
               | 
               | If they were soldered onto those systems you talk about,
               | all those would have had to be replaced instead of just
               | having the drive swapped out and re-imaged.
        
               | wtallis wrote:
               | I think the implication was that a soldered SSD doesn't
               | give the customer as much chance to pick the wrong SSD.
               | But it's still possible for the customer to have a
               | different use case in mind than the OEM did when the OEM
               | is picking what SSD to include.
        
               | choilive wrote:
               | What company was specc'ing out a 6 figure machine just to
               | put in a consumer class SSD?
        
             | galad87 wrote:
             | No, the SSD isn't soldered, it has got one or two removable
             | modules: https://everymac.com/systems/apple/mac-studio/mac-
             | studio-faq...
        
           | PaulHoule wrote:
           | If I read this right, the r8g.48xlarge at AMZN [1] has 192
           | cores and 1536GB which exceeds the M3 Ultra in some metrics.
           | 
           | It reminds me of the 1990s when my old school was using Sun
           | machines based on the 68k series and later SPARC and we were
           | blown away with the toaster-sized HP PA RISC machine that was
           | used for student work for all the CS classes.
           | 
           | Then Linux came out and it was clear the 386 trashed them all
           | in terms of value and as we got the 486 and 586 and further
           | generations, the Intel architecture trashed them in every
           | respect.
           | 
           | The story then was that Intel was making more parts than
           | anybody else so nobody else could afford to keep up the
           | investment.
           | 
           | The same is happening with parts for phones and TSMC's
           | manufacturing dominance -- and today with chiplets you can
           | build up things like the M3 Ultra out of smaller parts.
           | 
           | [1] https://aws.amazon.com/ec2/instance-types/r8g/
        
             | hedora wrote:
             | In fairness, the sun and dec boxes I used back then (up to
             | about 1999) could hold their own against intel machines.
             | 
             | Then, one day, we built a 5 machine amd athlon xp linux
             | cluster for $2000 ($400/machine) that beat all the unix and
             | windows server hardware by at least 10x on $/perf.
             | 
             | It's nice that we have more than one viable cpu vendor
             | these days, though it seems like there's only one viable
             | fab company.
        
               | PaulHoule wrote:
               | In 1998-1999 I had a DEC Alpha on my desktop that was
               | really impressive, it was a 64-bit machine a few years
               | before you could get a 64-bit Athlon.
        
               | hedora wrote:
               | Yeah.
               | 
               | For what we needed, five 32 bit address spaces was enough
               | DRAM. The individual CPU parts were way more than 20% as
               | fast, and the 100Mbit switch was good enough.
               | 
               | (The data basically fit in ram, so network transport time
               | to load a machine was bounded by 4GiB / 8MiB / sec = 500
               | seconds. Also, the hard disks weren't much faster than
               | network back then.)
        
               | winocm wrote:
               | The Alpha architecture was 64-bit from the very beginning
               | (though the amount of addressable virtual memory and
               | physical memory depends on the processor implementation).
               | 
               | I think it goes something like:
               | 
               |   - 2106x/EV4: 34-bit physical, 43-bit virtual
               |   - 21164/EV5: 40-bit physical, 43-bit virtual
               |   - 21264/EV6: 44-bit physical, 48-bit virtual
               | 
               | The EV6 is a bit quirky as it is 43-bit by default, but
               | can use 48-bits when I_CTL<VA_48> or VA_CTL<VA_48> is
               | set. (the distinction of the registers is for each access
               | type, i.e: instruction fetch versus data load/store)
               | 
               | The 21364/EV7 likely has the same characteristics as EV6,
               | but the hardware reference manual seems to have been lost
               | to time...
        
               | PaulHoule wrote:
               | My understanding is that the VAX from Digital was the
               | mother of all "32-bit" architectures to replace the dead
               | end PDP-11 (had a 64kbyte user space so wasn't really
               | that much better than an Apple ][) and PDP-10/20 (36-bit
               | words were awkward after the 8-bit byte took over the
               | industry). The 68k and 386 protected mode were imitations
               | of the VAX.
               | 
               | Digital struggled with the microprocessor transition
               | because they didn't want to kill their cash cow
               | minicomputers with microcomputer-based replacements. They
               | went with the 64-bit Alpha because they wanted to rule
               | the high end in the CMOS age. And they did, for a little
               | while. But the mass market caught up.
        
             | nsteel wrote:
             | It seems Graviton 4 CPUs have 12 channels of DDR5-5600, i.e.
             | 540GB/s of main memory bandwidth for the CPU to use. The M3
             | Ultra has 64 channels of LPDDR5-6400, i.e. ~800GB/s of memory
             | bandwidth for the CPU or the GPU to use. So the M3 Ultra
             | has way fewer (CPU) cores, but way more memory bandwidth.
             | Depends what you're doing.
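             | 
             | (A rough way to sanity-check those peak numbers, assuming
             | the usual 64-bit DDR5 and 16-bit LPDDR5 channel widths;
             | this is napkin math, not measured bandwidth:)
             | 
             |     # peak GB/s = channels * bus width in bytes * GT/s
             |     def peak_bw_gbs(channels, bus_bits, mt_per_s):
             |         return channels * (bus_bits / 8) * (mt_per_s / 1000)
             | 
             |     peak_bw_gbs(12, 64, 5600)   # Graviton 4: ~537.6 GB/s
             |     peak_bw_gbs(64, 16, 6400)   # M3 Ultra:   ~819.2 GB/s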
        
           | icecube123 wrote:
           | Yeah, I've been thinking about this for a few years. The
           | Mx-series chips would sell into data centers like crazy if
           | Apple went after that market, especially if they created a
           | server-tuned chip. It could probably be their 2nd biggest
           | product line behind the iPhone. The performance and
           | efficiency are awesome. I guess it would be neat to see some
           | web serving and database benchmarks to really know.
        
             | kridsdale1 wrote:
             | TSMC couldn't make enough at the leading node in addition
             | to all the iPhone chips Apple has to sell. There's a
             | physical throughput limit. That's why this isn't M4.
        
           | Apofis wrote:
           | Doesn't macOS support these things? I'm sure Apple runs
           | these in their datacenters somehow?
        
           | rbanffy wrote:
           | > The CPUs have zero competition in terms of speed, memory
           | bandwidth.
           | 
           | Maybe not at the same power consumption, but I'm sure mid-
           | range Xeons and EPYCs mop the floor with the M3 Ultra in CPU
           | performance. What the M3 Ultra has that nobody else comes
           | close to is a decent GPU near a pool of half a terabyte of RAM.
        
           | Thaxll wrote:
           | Apple does not make server CPUs, they make consumer low W
           | CPUs, it's very different.
           | 
           | FYI Apple runs Linux in their DC, so no Apple hardware in
           | their own servers.
        
             | alwillis wrote:
             | > Apple does not make server CPUs, they make consumer low W
             | CPUs, it's very different.
             | 
             | This is silly. Given the performance per watt, the M series
             | would be great in a data center. As you all know,
             | electricity for running the servers and cooling for the
             | servers are the two biggest ongoing costs for a data
             | center; the M series requires less power and runs more
             | efficiently than the average Intel or AMD-based server.
             | 
             | > FYI Apple runs Linux in their DC, so no Apple hardware in
             | their own servers.
             | 
             | That's certainly no longer the case. Apple announced their
             | Private Cloud Compute [1] initiative--Apple designed
             | servers running Apple Silicon to support Apple Intelligence
             | functions that can't run on-device.
             | 
             | BTW, Apple just announced a $500 billion investment [2] in
             | US-based manufacturing, including a 250,000 square foot
             | facility to make _servers_. Yes, these will obviously be
             | for their Private Cloud Compute servers... but it doesn't
             | have to be only for that purpose.
             | 
             | From the press release:
             | 
             |  _As part of its new U.S. investments, Apple will work with
             | manufacturing partners to begin production of servers in
             | Houston later this year. A 250,000-square-foot server
             | manufacturing facility, slated to open in 2026, will create
             | thousands of jobs._
             | 
             |  _Previously manufactured outside the U.S., the servers
             | that will soon be assembled in Houston play a key role in
             | powering Apple Intelligence, and are the foundation of
             | Private Cloud Compute, which combines powerful AI
             | processing with the most advanced security architecture
             | ever deployed at scale for AI cloud computing. The servers
             | bring together years of R&D by Apple engineers, and
             | deliver the industry-leading security and performance of
             | Apple silicon to the data center._
             | 
             |  _Teams at Apple designed the servers to be incredibly
             | energy efficient, reducing the energy demands of Apple data
             | centers -- which already run on 100 percent renewable
             | energy. As Apple brings Apple Intelligence to customers
             | across the U.S., it also plans to continue expanding data
             | center capacity in North Carolina, Iowa, Oregon, Arizona,
             | and Nevada._
             | 
             | [1]: https://security.apple.com/blog/private-cloud-compute/
             | 
             | [2]: https://www.apple.com/newsroom/2025/02/apple-will-
             | spend-more...
        
         | stego-tech wrote:
         | > This hardware is really being held back by the operating
         | system at this point.
         | 
         | It really is. Even if they themselves won't bring back their
         | old XServe OS variant, I'd really appreciate it if they at
         | least partnered with a Linux or BSD (good callout, ryao) dev to
         | bring a server OS to the hardware stack. The consumer OS, while
         | still better (to my subjective tastes) than Windows, is
         | increasingly hampered by bloat and cruft that make it untenable
         | for production server workloads, at least to my subjective
         | standards.
         | 
         | A server OS that just treats the underlying hardware like a
         | hypervisor would, making the various components attachable or
         | shareable to VMs and Containers on top, would make these things
         | incredibly valuable in smaller datacenters or Edge use cases.
         | Having an on-prem NPU with that much RAM would be a godsend for
         | local AI acceleration among a shared userbase on the LAN.
        
           | ryao wrote:
           | Given shared heritage, I would expect to see Apple work with
           | FreeBSD before I would expect Apple to work with Linux.
        
             | stego-tech wrote:
             | You are _technically_ correct (the best kind of correct).
             | I'm just a filthy heathen who lumps the BSDs and Linux
             | distros under "Linux" as an _incredibly incorrect_ catchall
             | for casual discourse.
        
             | hedora wrote:
             | I heard OpenBSD has been working for a while.
             | 
             | I'm continually surprised Apple doesn't just donate
             | something like 0.1% of their software development budget to
             | proton and the asahi projects. It'd give them a big chunk
             | of the gaming and server markets pretty much overnight.
             | 
             | I guess they're too busy adding dark patterns that re-
             | enable siri and apple intelligence instead.
        
           | hinkley wrote:
           | I miss the XServe almost as much as I miss the Airport
           | Extreme.
        
             | stego-tech wrote:
             | I feel like Apple and Ubiquiti have a missed collaboration
             | opportunity on the latter point, especially with the
             | latter's recent UniFi Express unit. It feels like pairing
             | Ubiquiti's kit with Apple's Homekit could benefit both, by
             | making it easier for Homekit users to create new VLANs
             | specifically for Homekit devices, thereby improving
             | security - with Apple dubbing the term, say, "Secure Device
             | Network" or some marketingspeak to make it easier for
             | average consumers to understand. An AppleTV unit could even
             | act as a limited CloudKey for UniFi devices like Access
             | Points, or UniFi Cameras to connect/integrate as Homekit
             | Cameras.
             | 
             | Don't get me wrong, _I_ wouldn 't use that feature (I
             | prefer self-hosting it all myself), but for folks like my
             | family members, it'd be a killer addition to the lineup
             | that makes my life supporting them much easier.
        
         | jmyeet wrote:
         | I've been looking at the potential for Apple to make really
         | interesting LLM hardware. Their unified memory model could be a
         | real game-changer because NVidia really forces market
         | segmentation by limiting memory.
         | 
         | It's worth adding the M3 Ultra has 819GB/s memory bandwidth
         | [1]. For comparison the RTX 5090 is 1800GB/s [2]. That's still
         | less, but the M4 Mac minis have 120-300GB/s, which limits
         | token throughput, so 819GB/s is a vast improvement.
         | 
         | For $9500 you can buy a M3 Ultra Mac Studio with 512GB of
         | unified memory. I think that has massive potential.
         | 
         | [1]: https://www.apple.com/mac-studio/specs/
         | 
         | [2]: https://www.nvidia.com/en-us/geforce/graphics-
         | cards/50-serie...
        
         | hedora wrote:
         | Other than the NPU, it's not really a game changer; here's a
         | 512GB AMD deepseek build for $2000:
         | 
         | https://digitalspaceport.com/how-to-run-deepseek-r1-671b-ful...
        
           | flakiness wrote:
           | The low energy use can be a game changer if you live in a
           | crappy apartment with limited power capacity. I gave up my
           | big GPU box dream because of that.
        
           | aurareturn wrote:
           | > between 4.25 to 3.5 TPS (tokens per second) on the Q4 671b
           | full model.
           | 
           | 3.5 - 4.25 tokens/s. You're torturing yourself. Especially
           | with a reasoning model.
           | 
           | This will run it at 40 tokens/s based on rough calculation.
           | Q4 quant. 37b active parameters.
           | 
           | 5x higher price for 10x higher performance.
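           | 
           | (Rough memory-bound math behind that estimate, assuming
           | ~4.5 bits/param for Q4-with-overhead weights, that weight
           | reads dominate, and ignoring KV-cache traffic:)
           | 
           |     37B active params * ~4.5 bits/param ~= 21 GB read per token
           |     819 GB/s / ~21 GB per token         ~= ~40 tokens/s ceiling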
        
             | hinkley wrote:
             | Also you don't have to deal with Windows. Which people who
             | do not understand Apple are very skilled at not noticing.
             | 
             | If you've ever used git, svn, or an IDE side by side on
             | corporate Windows versus Apple I don't know why you would
             | ever go back.
        
               | hatthew wrote:
               | Is there a reason one couldn't use linux?
        
               | bigyabai wrote:
               | The PC doesn't have to run Windows either. Strictly
               | speaking, professional applications see MacOS support as
               | an Apple-sanctioned detriment.
               | 
               | > If you've ever used git, svn, or an IDE side by side
               | 
               | I still reach for Windows, even though it's a dogshit OS.
               | I would rather use WSL to write and deploy a single app,
               | as opposed to doing my work in a Linux VM or (god forbid)
               | writing and debugging multiple versions just to support
               | my development runtime. If I'm going to use an ad-
               | encumbered commercial service-slop OS, I might as well
               | pick the one that doesn't actively block my work.
        
               | brailsafe wrote:
               | It's also just clearly a powerful and interesting
               | tinkering project, which there are valid arguments for,
               | but this can just chill out on your desk as an elegant
               | general productivity machine. What it wouldn't do that
               | the tinkering project could do is be upgraded, act as a
               | powerful gaming pc, or cause migraines from constant fan
               | noise.
               | 
               | The custom build would work great though, even more so in
               | a server room, and it also shows by comparison how
               | excessively Apple prices its components.
        
         | intrasight wrote:
         | It certainly is held back and that is unfortunate. But if you
         | can run your workloads on this amazing machine, then that's a
         | lot of compute for the buck.
         | 
         | I assume that there's a community of developers focusing on
         | leveraging this hardware instead of complaining about the
         | operating system.
        
         | hinkley wrote:
         | Given that the M1 Ultra and M2 Ultra also exist, I'd expect
         | either straight binning, or two designs that mostly reuse
         | the same cores but with more of them and a few extra
         | features.
         | 
         | I love Apple but they love to speak in half truths in product
         | launches. Are they saying the M3 Ultra is their first
         | Thunderbolt 5 computer? I don't recall seeing any previous
         | announcements.
        
           | kridsdale1 wrote:
           | M4 Pro MacBook and Mini have TB5.
        
         | hajile wrote:
         | One of the leakers who got this Mac Studio right claims Apple
         | is reserving the M4 ultra for the Mac Pro to differentiate the
         | products a bit more.
        
         | GeekyBear wrote:
         | I also wondered about binning, so I pulled together how heavily
         | Apple's Max chips were binned in shipping configurations.
         | 
         | M1 Max - 24 to 32 GPU cores
         | 
         | M2 Max - 30 to 38 GPU cores
         | 
         | M3 Max - 30 to 40 GPU cores
         | 
         | M4 Max - 32 to 40 GPU cores
         | 
         | I also looked up the announcement dates for the Max and the
         | Ultra variant in each generation.
         | 
         | M1 Max - October 18, 2021
         | 
         | M1 Ultra - March 8, 2022
         | 
         | M2 Max - January 17, 2023
         | 
         | M2 Ultra - June 5, 2023
         | 
         | M3 Max - October 30, 2023
         | 
         | M3 Ultra - March 12, 2025
         | 
         | M4 Max - October 30, 2024
         | 
         | > My guess is that Apple developed this chip for their internal
         | AI efforts
         | 
         | As good a guess as any, given the additional delay between the
         | M3 Max and Ultra being made available to the public.
        
           | jonplackett wrote:
           | I'm missing the point. What is it you're concluding from
           | these dates?
        
             | GeekyBear wrote:
             | I was referring to the additional year of delay between the
             | M3 Max and M3 Ultra announcements when compared to the M1
             | and M2 generations.
             | 
             | The theory that the M3 Ultra was being produced, but
             | diverted for internal use makes as much sense as any theory
             | I've seen.
             | 
             | It makes at least as much sense as the "TSMC had difficulty
             | producing enough defect free M3 Max chips" theory.
        
       | behnamoh wrote:
       | 819GB/s bandwidth...
       | 
       | what's the point of 512GB RAM for LLMs on this Mac Studio if the
       | speed is painfully slow?
       | 
       | it's as if Apple doesn't want to compete with Nvidia... this is
       | really disappointing in a Mac Studio. FYI: M2 Ultra already has
       | 800GB/s bandwidth
        
         | gatienboquet wrote:
         | NVIDIA RTX 4090: ~1,008 GB/s
         | 
         | NVIDIA RTX 4080: ~717 GB/s
         | 
         | AMD Radeon RX 7900 XTX: ~960 GB/s
         | 
         | AMD Radeon RX 7900 XT: ~800 GB/s
         | 
         | How's that slow exactly ?
         | 
         | You can have 10000000Gb/s and without enough VRAM it's useless.
        
           | ttul wrote:
           | I have a 4090 and, out of curiosity, I looked up the FLOPS in
           | comparison with Apple chips.
           | 
           | Nvidia RTX 4090 (Ada Lovelace)
           | 
           | FP32: Approximately 82.6 TFLOPS
           | 
           | FP16: When using its 4th-generation Tensor Cores in FP16 mode
           | with FP32 accumulation, it can deliver roughly 165.2 TFLOPS
           | (in non-tensor mode, the FP16 rate is similar to FP32).
           | 
           | FP8: The Ada architecture introduces support for an FP8
           | format; using this mode (again with FP32 accumulation), the
           | RTX 4090 can achieve roughly 330.3 TFLOPS (or about 660.6
           | TOPS, depending on how you count operations).
           | 
           | Apple M1 Ultra (The previous-generation top-end Apple chip)
           | 
           | FP32: Around 15.9 TFLOPS (as reported in various benchmarks)
           | 
           | FP16: By similar scaling, FP16 performance would be roughly
           | double that value--approximately 31.8 TFLOPS (again, an
           | estimate based on common patterns in Apple's GPU designs)
           | 
           | FP8: Like the M3 family, the M1 Ultra does not support a
           | dedicated FP8 precision mode.
           | 
           | So a $2000 Nvidia 4090 gives you about 5x the FLOPS, but with
           | far less high speed RAM (24GB vs. 512GB from Apple in the new
           | M3 Ultra). The RAM bandwidth on the Nvidia card is over
           | 1TBps, compared with 800GBps for Apple Silicon.
           | 
           | Apple is catching up here and I am very keen for them to
           | continue doing so! Anything that knocks Nvidia down a notch
           | is good for humanity.
        
             | bigyabai wrote:
             | > Anything that knocks Nvidia down a notch is good for
             | humanity.
             | 
             | I don't love Nvidia a whole lot but I can't understand
             | where this sentiment comes from. Apple abandoned their
             | partnership with Nvidia, tried to support their own CUDA
             | alternative with blackjack and hookers (OpenCL), abandoned
             | _that_ , and began rolling out a proprietary replacement.
             | 
             | CUDA sucks for the average Joe, but Apple abandoned any
             | chance of taking the high road when they cut ties with
             | Khronos. Apple doesn't want better AI infrastructure for
             | humanity; they envy the control Nvidia wields and want it
             | for themselves. Metal versus CUDA is the type of
             | competition where no matter who wins, humanity loses. Bring
             | back OpenCL, then we'll talk about net positives again.
        
           | whimsicalism wrote:
           | h100 sxm - 3TB/s
           | 
           | vram is not really the limiting factor for serious actors in
           | this space
        
             | gatienboquet wrote:
             | If my grandmother had wheels, she'd be a bicycle
        
         | aurareturn wrote:
         | > what's the point of 512GB RAM for LLMs on this Mac Studio if
         | the speed is painfully slow?
         | 
         | You can fit the entire Deepseek 671B q4 into this computer and
         | get 41 tokens/s because it's an MoE model.
        
           | KingOfCoders wrote:
           | Your comments went from
           | 
           | "40 tokens/s by my calculations"
           | 
           | to
           | 
           | "40 tokens/s"
           | 
           | to
           | 
           | "41 tokens/s"
           | 
           | Is there a dice involved in "your calculations?"
        
       | pier25 wrote:
       | So weird they released the Mac Studio with an M4 Max and M3
       | Ultra.
       | 
       | Why? Do they have too many M3 chips in stock?
        
         | bigfishrunning wrote:
         | The M4 Max is faster, the M3 Ultra supports more unified memory
         | -- So pick whichever meets your requirements
        
           | pier25 wrote:
           | Yes but why not release an M4 Ultra?
        
             | wpm wrote:
             | Because the M4 architecture doesn't have the interconnects
             | needed to fuse two Max SoCs together.
        
       | johntitorjr wrote:
       | Lots of AI HW is focused on RAM (512GB!). I have a cost-sensitive
       | application that needs speed (300+ TOPS), but only 1GB of RAM.
       | Are there any HW companies focused on that space?
        
         | Havoc wrote:
         | Grayskull cards might be a fit. Think they're not entirely plug
         | and play though
        
         | xyzsparetimexyz wrote:
         | Isn't that just any discrete (Nvidia,AMD) GPU?
        
         | NightlyDev wrote:
         | Most recent GPUs will do. An older RTX 4070 is over 400 TOPS,
         | the new RTX 5070 is around 1000 TOPS, and the RTX 5090 is
         | around 3600 TOPS.
        
           | johntitorjr wrote:
           | Yeah, that's basically where I'm at with options. Not ideal
           | for a cost sensitive application.
        
         | stefan_ wrote:
         | Just buy any gaming card? Even something like the Jetson AGX
         | Orin boasts 275 TOPS (but they add in all kind of different
         | subsystems to reach that number).
        
           | johntitorjr wrote:
           | The Jetson is interesting!
           | 
           | Can you elaborate on how the TOPS value is inflated? What GPU
           | would be the equivalent of the Jetson AGX Orin?
        
             | stefan_ wrote:
             | The problem with the TOPS is that they add in ~100 TOPS
             | from the "Deep Learning Accelerator" coprocessors, but they
             | have a lot of awkward limitations on what they can do (and
             | software support is terrible). The GPU is an Ampere
             | generation, but there is no strict consumer GPU equivalent.
        
       | crest wrote:
       | Too bad it lacks even the streaming mode SVE2 found in M4 cores.
       | If only Apple would provide a full SVE2 implementation to put
       | pressure on ARM to make it non-optional so AArch64 isn't
       | effectively restricted to NEON for SIMD.
        
         | vlovich123 wrote:
         | This is for AI which is going to benefit more from use of metal
         | / NPU than SIMD.
        
           | bigyabai wrote:
           | Sure, but larger models that fit in that 512gb memory are
           | going to take a long time to tokenize/detokenize without
           | hardware-accelerated BLAS.
        
             | danieldk wrote:
             | Why would you need BLAS for tokenization/detokenization?
             | Pretty much everyone still uses BBPE which amounts to
             | iteratively applying merges.
             | 
             | (Maybe I'm missing something here.)
        
             | ryao wrote:
             | Tokenization/detokenization does not use BLAS.
        
         | stouset wrote:
         | Hell I'm just sitting here hoping the future M5 adopts SVE. Not
         | even SVE2.
        
       | lauritz wrote:
       | They're updating the Studio to the M3 Ultra now, so the M4
       | Ultra can presumably go directly into the Mac Pro at WWDC?
       | Interesting
       | timing. Maybe they'll change the form factor of the Mac Pro, too?
       | 
       | Additionally, I would assume this is a very low-volume product,
       | so it being on N3B isn't a dealbreaker. At the same time, these
       | chips must be very expensive to make, so tying them with luxury-
       | priced RAM makes some kind of sense.
        
         | jsheard wrote:
         | > Maybe they'll change the form factor of the Mac Pro, too?
         | 
         | Either that or kill the Mac Pro altogether, the current
         | iteration is such a half-assed design and blatantly terrible
         | value compared to the Studio that it feels like an end-of-the-
         | road product just meant to tide PCIe users over until they can
         | migrate everything to Thunderbolt.
         | 
         | They recycled a design meant to accommodate multiple beefy GPUs
         | even though GPUs are no longer supported, so most of the
         | cooling and power delivery is vestigial. Plus the PCIe
         | expansion was quietly downgraded, Apple Silicon doesn't have a
         | ton of PCIe lanes so the slots are _heavily_ oversubscribed
         | with PCIe switches.
        
           | pier25 wrote:
           | I've always maintained that the M2 Mac Pro was really a dev
           | kit for manufacturers of PCI parts. It's such a meaningless
           | product otherwise.
        
           | lauritz wrote:
           | I agree. Nonetheless, I agree with Siracusa that the Mac Pro
           | makes sense as a "halo car" in the Mac lineup.
           | 
           | I just find it interesting that you can currently buy a M2
           | Ultra Mac Pro that is weaker than the Mac Studio (for a
           | comparable config) at a higher price. I guess it "remains a
           | product in their lineup" and we'll hear more about it later.
           | 
           | Additionally: If they wanted to scrap it down the road, why
           | would they do this now?
        
             | madeofpalk wrote:
             | The current Mac Pro is not a "halo car". It's a large USB-A
             | dongle for a Mac Studio.
        
           | crowcroft wrote:
           | Agree with this, and it doesn't seem like it's a priority for
           | Apple to bring the kind of expandability back any time soon.
           | 
           | Maybe they can bring back the trash can.
        
             | jsheard wrote:
             | Isn't the Mac Studio the new trash can? I can't think of
             | how a non-expandable Mac Pro could be meaningfully
             | different to the Studio unless they introduce an even
             | bigger chip above the Ultra.
        
               | xp84 wrote:
               | > Mac Studio the new trash can?
               | 
               | Indeed, and tbh it really commits even more to the non-
               | expandability that the Trashcan's designers seemed to be
               | going for. After all, the Trashcan at least had
               | replaceable RAM and storage. The Mac Studio has
               | proprietary storage modules for no reason aside from
               | Apple's convenience/profits (and of course the
               | 'integrated' RAM which I'll charitably assume was done
               | for altruistic reasons because of how it's "shared.")
               | 
               | The difference is that today users are accepting modern
               | Macs where they rejected the Trashcan. I think it's
               | because Apple's practices have become more widespread
               | anyway*, and certain parts of the strategy like the RAM
               | thing at least have upsides. That, and the thermals are
               | better because the Trashcan's thermal design was not fit
               | for purpose.
               | 
               | * I was trying to fix a friend's nice Lenovo laptop
               | recently -- it turned out to just have some bad RAM, but
               | when we opened it up we found it was soldered :(
        
               | crowcroft wrote:
               | Oh yea I wasn't clear I just meant bring back the design
               | - agree the studio basically is the trash can.
        
           | newsclues wrote:
           | The Mac Pro could exist as a PCIe expansion slot storage case
           | that accepts a logic board from the frequently updated
           | consumer models. Or multiple Mac Studio logic boards all in
           | one case with your expansion cards all working together.
        
         | agloe_dreams wrote:
         | My understanding was that Apple wanted to figure out how to
         | build systems with multi-SOCs to replace the Ultra chips. The
         | way it is currently done means that the Max chips need to be
         | designed around the interconnect. Theoretically speaking, a
         | multi-SOC setup could also scale beyond two chips and into a
         | wider set of products.
        
           | aurareturn wrote:
           | I'm not sure multi-SoC is feasible, because presenting two
           | separate GPUs to the OS as one big GPU is very hard when
           | the SoCs are physically separate.
        
           | rbanffy wrote:
           | Ultra is already two big M3 chips coupled through an
           | interposer. Apple is curiously not going the way of chiplets
           | like the big CPU crowd is.
        
         | lauritz wrote:
         | Interestingly, Apple apparently confirmed to a French website
         | that M4 lacks the interconnect required to make an "Ultra"
         | [0][1], so contrary to what I originally thought, they maybe
         | won't make this after all? I'll take this report with a grain
         | of salt, but apparently it's coming directly from Apple.
         | 
         | Makes it even more puzzling what they are doing with the M2 Mac
         | Pro.
         | 
         | [0] https://www.numerama.com/tech/1919213-m4-max-et-m3-ultra-
         | let...
         | 
         | [1] More context on Macrumors:
         | https://www.macrumors.com/2025/03/05/apple-confirms-m4-max-l...
        
         | raydev wrote:
         | Honestly I don't think we'll see the M4 Ultra at all this year.
         | That they introduced the Studio with an M3 Ultra tells me M4
         | Ultras are too costly or they don't have capacity to build
         | them.
         | 
         | And anyway, I think the M2 Mac Pro was Apple asking customers
         | "hey, can you do anything interesting with these PCIe slots?
         | because we can't think of anything outside of connectivity
         | expansion really"
         | 
         | RIP Mac Pro unless they redesign Apple Silicon to allow for
         | upgradeable GPUs.
        
         | layer8 wrote:
         | Apple says that not every generation will get an "Ultra"
         | variant: https://arstechnica.com/apple/2025/03/apple-
         | announces-m3-ult...
        
       | mrtksn wrote:
       | Let's say you want to have the absolute max memory(512GB) to run
       | AI models and let's say that you are O.K. with plugging a drive
       | to archive your model weights then you can get this for a little
       | bit shy of $10K. What a dream machine.
       | 
       | Compared to Nvidia's Project DIGITS which is supposed to cost $3K
       | and be available "soon", you can get a spec-matching 128GB & 4TB
       | version of this Mac for about $4700, and the difference would be
       | that you can actually get it in a week and will run macOS (no idea
       | how much performance difference to expect).
       | 
       | I can't wait to see someone testing the full DeepSeek model on
       | this, maybe this would be the first little companion AI device
       | that you can fully own and can do whatever you like with it,
       | hassle-free.
        
         | bloomingkales wrote:
         | There's an argument that replaceable pc parts is what you want
         | at that price point, but Apple usually provides multi-year
         | durability on their PCs. An Apple AI brick should last a while.
        
         | behnamoh wrote:
         | > I can't wait to see someone testing the full DeepSeek model
         | on this
         | 
         | at 819 GB per second bandwidth, the experience would be
         | terrible
        
           | mrtksn wrote:
           | How many t/s would you expect? I think I feel perfectly fine
           | when it's over 50.
           | 
           | Also, people figured out a way to run these things in
           | parallel easily. The device is pretty small; I think for
           | someone who wouldn't mind the price tag, stacking 2-3 of
           | those wouldn't be that bad.
        
             | behnamoh wrote:
             | I know you're referring to the exolabs app, but the t/s is
             | really not that good. It uses Thunderbolt instead of
             | NVLink.
        
             | yk wrote:
             | I think I've seen 800 GB/s memory bandwidth, so a q4 quant
             | of a 400 B model should be 4 t/s if memory bound.
        
           | coder543 wrote:
           | DeepSeek-R1 only has 37B active parameters.
           | 
           | A back of the napkin calculation: 819GB/s / 37GB/tok = 22
           | tokens/sec.
           | 
           | Realistically, you'll have to run quantized to fit inside of
           | the 512GB limit, so it could be more like 22GB of data
           | transfer per token, which would yield 37 tokens per second as
           | the theoretical limit.
           | 
           | It is likely going to be very usable. As other people have
           | pointed out, the Mac Studio is also not the only option at
           | this price point... but it is neat that it _is_ an option.
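           | 
           | (A minimal sketch of that napkin math; the bytes-per-param
           | values are assumptions about quantization, and this ignores
           | KV-cache reads and any compute bottleneck:)
           | 
           |     def memory_bound_tps(bw_gbs, active_params_b, bytes_per_param):
           |         # tokens/s ceiling = bandwidth / bytes read per token
           |         return bw_gbs / (active_params_b * bytes_per_param)
           | 
           |     memory_bound_tps(819, 37, 1.0)   # ~22 tok/s at 8-bit weights
           |     memory_bound_tps(819, 37, 0.6)   # ~37 tok/s at ~4.8-bit weights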
        
           | bearjaws wrote:
           | Not sure why you are being downvoted, we already know the
           | performance numbers due to memory bandwidth constraints on
           | the M4 Max chips, it would apply here as well.
           | 
           | 525GB/s to 1000GB/s will double the TPS at best, which is
           | still quite low for large LLMs.
        
             | lanceflt wrote:
             | Deepseek R1 (full, Q1) is 14t/s on an M2 Ultra, so this
             | should be around 20t/s
        
         | NightlyDev wrote:
         | The full deepseek R1 model needs more memory than 512GB. The
         | model is 720GB alone. You can run a quantized version on it,
         | but not the full model.
        
       | giancarlostoro wrote:
       | At 9 grand I would certainly hope that they support the device
       | software-wise for longer than they supported my 2017 MacBook
       | Air. I see no reason to be forced to cough up 10 grand to Apple
       | essentially every 7 years; that's ridiculous.
        
       | moondev wrote:
       | > support for more than half a terabyte of unified memory -- the
       | most ever in a personal computer
       | 
       | AMD Ryzen Threadripper PRO 3995WX released over four years ago
       | and supports 2TB (64c/128t)
       | 
       | > Take your workstation's performance to the next level with the
       | AMD Ryzen Threadripper PRO 3995WX 2.7 GHz 64-Core sWRX8
       | Processor. Built using the 7nm Zen Core architecture with the
       | sWRX8 socket, this processor is designed to deliver exceptional
       | performance for professionals such as artists, architects,
       | engineers, and data scientists. Featuring 64 cores and 128
       | threads with a 2.7 GHz base clock frequency, a 4.2 GHz boost
       | frequency, and 256MB of L3 cache, this processor significantly
       | reduces rendering times for 8K videos, high-resolution photos,
       | and 3D models. The Ryzen Threadripper PRO supports up to 128 PCI
       | Express 4.0 lanes for high-speed throughput to compatible
       | devices. It also supports up to 2TB of eight-channel ECC DDR4
       | memory at 3200 MHz to help efficiently run and multitask
       | demanding applications.
        
         | ryao wrote:
         | I suspect that they do not consider workstations to be personal
         | computers.
        
           | agloe_dreams wrote:
           | No, the comment misunderstood the difference between CPU
           | memory and unified memory. This can dedicate 500GB of high-
           | bandwidth memory to the GPU -- roughly 3.5x that of an H200.
        
         | Shank wrote:
         | > unified memory
         | 
         | So unified memory means that the memory is accessible to the
         | GPU and the CPU in a shared pool. AMD does not have that.
        
           | mythz wrote:
           | AMD Ryzen AI Max SoC chips have that [1], but it maxes out at
           | 128GB RAM.
           | 
           | [1] https://www.amd.com/en/products/processors/laptop/ryzen/a
           | i-3...
        
           | curt15 wrote:
           | What about AMD Instinct accelerators like the MI300A[1]?
           | Doesn't that use a single memory pool for both CPU and GPU
           | cores?
           | 
           | [1] https://www.amd.com/en/products/accelerators/instinct/mi3
           | 00/...
        
         | lowercased wrote:
         | I don't think that's "unified memory" though.
        
         | JamesSwift wrote:
         | > unified memory
         | 
         | It's a very specific claim that isn't comparing itself to DIMMs.
        
         | aaronmdjones wrote:
         | > It also supports up to 2TB of eight-channel ECC DDR4 memory
         | at 3200 MHz (sic) to help efficiently run and multitask
         | demanding applications.
         | 
         | 8 channels at 3200 MT/s (1600 MHz) is only 204.8 GB/sec; less
         | than a quarter of what the M3 Ultra can do. It's also not GPU-
         | addressable, meaning it's not actually unified memory at all.
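         | 
         | (For reference, the arithmetic behind that figure, assuming
         | standard 64-bit DDR4 channels:)
         | 
         |     8 channels * 8 bytes/transfer * 3200 MT/s = 204.8 GB/s peak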
        
       | gatienboquet wrote:
       | No benchmarks yet for the LLMs :(
        
       | xyst wrote:
       | I might like Apple again if the SoC could be sold separately and
       | opened up. It would be interesting to see a PC with Asahi or
       | Windows running on Apple's chips.
        
       | c0deR3D wrote:
       | When will Apple silicon natively support OSes such as
       | Linux? Apple is seemingly reluctant to release a detailed
       | technical reference manual for the M-series SoCs, which makes
       | running Linux natively on Apple silicon challenging.
        
         | bigyabai wrote:
         | Probably never. We don't have official Linux support for the
         | iPhone or iPad, I wouldn't hold out hope for Apple to change
         | their tune.
        
           | dylan604 wrote:
           | That makes sense to me though. If you don't run iOS, you
           | don't have App Store and that means a loss of revenue.
        
             | bigyabai wrote:
              | Right. Same goes for macOS and all of its convenient
              | software services. Apple might stand to sell more units
              | with a friendlier stance towards Linux, but unless it
              | sells more Apple One subscriptions or increases hardware
              | margins on the Mac, I doubt Cook would consider it.
             | 
             | If you sit around expecting selflessness from Apple you
             | will waste an enormous amount of time, trust me.
        
             | AndroTux wrote:
             | If you don't run macOS, you don't have Apple iCloud Drive,
             | Music, Fitness, Arcade, TV+ and News and that means a loss
             | of revenue.
        
               | dylan604 wrote:
                | As I replied elsewhere here, I do not run any Apple
                | Services on my Mac hardware. I do on my iDevices,
                | though, but that's a different topic. Again, I could be
                | the edge case.
        
               | bigyabai wrote:
               | > I do not run any Apple Services on my Mac hardware
               | 
               | Not even OCSP?
        
               | dylan604 wrote:
               | I have no idea what that is, so ???
               | 
               | But if you're being pedantic, I meant Apple SaaS
               | requiring monthly payments or any other form of using
               | something from Apple where I give them money outside the
               | purchase of their hardware.
               | 
               | If you're talking background services as part of macOS,
               | then you're being intentionally obtuse to the point and
               | you know it
        
             | jobs_throwaway wrote:
             | You lose out on revenue from people who require OS freedom
             | though
        
               | orangecat wrote:
               | All seven of them. I kid, I have a lot of sympathy for
               | that position, but as a practical matter running Linux
               | VMs on an M4 works great, you even get GPU acceleration.
        
         | dylan604 wrote:
          | That's what's weird to me too. It's not like they would lose
          | sales of macOS as it is given away with the hardware. So if
          | someone wants to buy Apple hardware to run Linux, it does not
          | have a negative effect on AAPL.
        
           | bigfishrunning wrote:
           | Except the linux users won't be buying Apple software, from
           | the app store or elsewhere. They won't subscribe to iCloud.
        
             | dylan604 wrote:
              | I have Mac hardware and have spent $0 through the Mac
              | App Store. I do not use iCloud on it either. I do on
              | iDevices, though. I must be an edge case.
        
               | c0deR3D wrote:
               | Same here.
        
               | xp84 wrote:
               | All of us on HN are basically edge cases. The main target
               | market of Macs is super dependent on Apple service
               | subscriptions.
               | 
               | Maybe that's why they ship with insultingly-small SSDs by
               | default, so that as people's photo libraries, Desktop and
               | Documents folders fill up, Apple can "fix your problem"
               | for you by selling you the iCloud/Apple One plan to
               | offload most of the stuff to only live in iCloud.
               | 
                | Either they spend the $400 up front to get two notches
                | up on the SSD upgrade, to match what a reasonable
                | device would come with, or they spend that $400 as $10
                | a month over the roughly 40-month lifetime of the
                | computer. Apple wins either way.
        
             | jeroenhd wrote:
             | While I don't think Apple wants to change course from its
             | services-oriented profit model, surely someone within Apple
             | has run the calculations for a server-oriented M3/M4
             | device. They're not far behind server CPUs in terms of
             | performance while running a lot cooler AND having
             | accelerated amd64 support, which Ampere lacks.
             | 
              | Whatever the profit margin on a Mac Studio is these days,
              | surely improving non-consumer options becomes profitable
              | at some point if you start selling them by the thousands
              | to data centers.
        
             | cosmic_cheese wrote:
             | Those buying the hardware to run Linux also aren't writing
             | software for macOS to help make the platform more
             | attractive.
        
               | dylan604 wrote:
               | There are a large number of macOS users that are not app
               | software devs. There's a large base of creative users
               | that couldn't code their way out of a wet paper bag, yet
               | spend lots of money on Mac hardware.
               | 
                | This forum loses track of the world outside this echo
                | chamber.
        
               | cosmic_cheese wrote:
               | I'm among them, even if creative works aren't my bread
               | and butter (I'm a dev with a bit of an artistic bent).
               | 
               | That said, attracting creative users also adds value to
               | the platform by creating demand for creative software for
               | macOS, which keeps existing packages for macOS maintained
               | and brings new ones on board every so often.
        
               | dylan604 wrote:
               | I'm a mix of both, however, my dev time does not create
               | macOS or iDevice apps. My dev is still focused on
               | creative/media workflows, while I still get work for
               | photo/video. I don't even use Xcode any further than
               | running the CLI command to install the necessary tools to
               | have CLI be useful.
        
           | re-thc wrote:
           | > So if someone wants to buy Apple hardware to run Linux, it
           | does not have a negative affect to AAPL
           | 
           | It does. Support costs. How do you prove it's a hardware
           | failure or software? What should they do? Say it
           | "unofficially" supports Linux? People would still try to get
           | support. Eventually they'd have to test it themselves etc.
        
             | dylan604 wrote:
              | Apple has already been in this spot. With the TrashCan
              | MacPro, there was an issue with DaVinci Resolve under OS
              | X at the time where the GPU was causing render issues. If
              | you then rebooted into Windows with BootCamp using the
              | exact same hardware and opened the exact same Resolve
              | project with the exact same footage, the render errors
              | disappeared. Apple blamed Resolve. DaVinci blamed the GPU
              | drivers. The GPU vendor blamed Apple.
        
               | re-thc wrote:
               | > Apple has already been in this spot.
               | 
                | Has been. That's the important part. Past tense. Maybe
                | that's the point: they gave up on it, acknowledging the
                | extra costs / issues.
        
             | k8sToGo wrote:
             | We used to have bootcamp though.
        
               | dylan604 wrote:
               | There you go using logical arguments in an emotional
               | illogical debate.
        
           | amelius wrote:
           | But then they'd have to open up their internal documentation
           | of their silicon, which could possibly be a legal disaster
           | (patents).
        
         | WillAdams wrote:
         | Is it not an option to run Darwin? What would Linux offer that
         | that would not?
        
           | internetter wrote:
           | Darwin is a terrible server operating system. Even getting a
           | process to run at server boot reliably is a nightmare.
        
           | kbolino wrote:
           | I don't think Darwin has been directly distributed in
           | bootable binary format for _many_ years now. And, as far as I
           | know, it has never been made available in that format for
           | Apple silicon.
        
         | cpfleming wrote:
         | https://asahilinux.org/
        
       | NorwegianDude wrote:
        | The memory amount is fantastic, the memory bandwidth is half
        | decent (~800 GB/s), and the compute capabilities are terrible
        | (36 TOPS).
        | 
        | For comparison, a single consumer card like the RTX 5090 has
        | only 32 GB of memory, but 1792 GB/s of memory bandwidth and
        | 3593 TOPS of compute.
        | 
        | The use cases will be limited. While you can't run a 600B
        | model directly like Apple says (because you need more memory
        | for that), you can run a quantized version, but it will be
        | very slow unless it's a MoE architecture.
        
         | Havoc wrote:
          | > 36 TOPS
          | 
          | That's going to be the NPU specifically. Pretty much nothing
          | on the LLM front seems to use NPUs at this stage (Copilot
          | Snapdragon laptops aside), so I'm not sure the low number is
          | a problem.
        
         | BonoboIO wrote:
         | A factor of 100 faster in compute ... wow.
         | 
          | It will be interesting when somebody upgrades the RAM of the
          | 5090 like they did with the 4090.
        
           | bilbo0s wrote:
           | They're a bit confused and not comparing the same compute.
           | 
           | Pretty sure they're comparing Nvidia's gpu to Apple's npu.
        
             | NorwegianDude wrote:
             | I'm not confused at all. It's the real numbers. Feel free
             | to provide anything that suggests that the TOPS of the GPU
             | in M chips are faster than the dedicated hardware for it.
             | But you can't, cause it's not true. If you think Apple
             | added the neural engine just for fun then I don't know what
             | to tell you.
             | 
              | You have a fundamental flaw in your understanding of how
              | both chips work. Not using the tensor cores would be
              | slower, and the same goes for Apple's neural engine. Both
              | numbers are for the hardware each vendor has implemented
              | for maximum performance on this task.
        
         | llm_nerd wrote:
         | I do think people are going a little overboard with all the
         | commentary about AI in this discussion, and you rightly cite
         | some of the empirical reasons. People are trying to rationalize
         | convincing themselves to buy one of these, but they're deluding
         | themselves.
         | 
         | It's nice that these devices have loads of memory, but they
         | don't have remotely the necessary level of compute to be
         | competitive in the AI space. As a fun thing to run a local LLM
         | as a hobbyist, sure, but this presents zero threat to nvidia.
         | 
         | Apple hardware is irrelevant in the AI space, outside of making
         | YouTube "I ran a quantized LLM on my 128GB Mac Mini" type
         | content for clicks, and this release doesn't change that.
         | 
         | Looks like a great desktop chip though.
         | 
         | It would be nice if nvidia could start giving their less
         | expensive offerings more memory, though they're currently in
          | the realm Intel was 15 years ago, thinking that their biggest
         | competition is themselves.
        
         | dagmx wrote:
         | You're comparing two different things.
         | 
         | The compute level you're talking about on the M3 Ultra is the
         | neural engine. Not including the GPU.
         | 
         | I expect the GPU here will be behind a 5090 for compute but not
         | by the unrelated numbers you're quoting. After all, the 5090
         | alone is multiple times the wattage of this SoC.
        
           | bigyabai wrote:
           | > After all, the 5090 alone is multiple times the wattage of
           | this SoC.
           | 
           | FWIW, normalizing the wattages (or even underclocking the
           | GPU) will still give you an Nvidia advantage most days.
           | Apple's GPU designs are closer to AMD's designs than
           | Nvidia's, which means they omit a lot of AI accelerators to
            | focus on a less-LLM-relevant raster performance figure.
           | 
           | Yes, the GPU is faster than the NPU. But Apple's GPU designs
           | haven't traditionally put their competitors out of a job.
        
             | dagmx wrote:
             | M2 Ultra is ~250W (averaging various reports since Apple
             | don't publish) for the entire SoC.
             | 
             | 5090 is 575W without the CPU.
             | 
             | You'd have to cut the Nvidia to a quarter and then find a
             | comparable CPU to normalize the wattage for an actual
             | comparison.
             | 
             | I agree that Apple GPUs aren't putting the dedicated GPU
             | companies in danger on the benchmarks, but they're also not
             | really targeting it? They're in completely different zones
             | on too many fronts to really compare.
        
               | bigyabai wrote:
               | Well, select your hardware of choice and see for yourself
               | then: https://browser.geekbench.com/opencl-benchmarks
               | 
               | > but they're also not really targeting it?
               | 
               | That's fine, but it's not an excuse to ignore the
               | power/performance ratio.
        
               | dagmx wrote:
               | But I'm not ignoring the power/performance ratio? If
               | anything, you are doing that by handwaving away the
               | difference.
               | 
               | Give me a comparable system build where the NVIDIA GPU +
               | any CPU of your choice is running at the same wattage as
               | an M2 Ultra, and outperforms it on average. You'd get
               | 150W for the GPU and 150W for the CPU.
               | 
               | Again, you can't really compare the two. They're
               | inherently different systems unless you only care about
               | singular metrics.
        
           | llm_nerd wrote:
           | Using the NPU numbers grossly _overstates_ the AI performance
           | of the Apple Silicon hardware, so they 're actually giving
           | Apple the benefit of the doubt.
           | 
           | Most AI training and inference (including generative AI) is
           | bound by large scale matrix MACs. That's why nvidia fills
           | their devices with enormous numbers of tensor cores and Apple
           | / Qualcomm et al are adding NPUs, filling largely the same
            | gap. Only nvidia's are not just a magnitude+ more
            | performant, they're massively more flexible (in types and
            | applications), usable for training and inference, while
            | Apple's is only useful for a limited set of inference tasks
            | (due to architecture and type limits).
           | 
            | Apple can put the effort in and make something actually
            | competitive with nvidia, but this isn't it.
        
             | dagmx wrote:
             | Care to share the TOPs numbers for the Apple GPUs and show
             | how this would "grossly overstate" the numbers?
             | 
             | Apple won't compete with NVIDIA, I'm not arguing that. But
             | your opening line will only make sense if you can back up
             | the numbers and the GPU performance is lower than the ANE
             | TOPS.
        
               | llm_nerd wrote:
               | Tensor / neural cores are very easy to benchmark and give
               | a precise number because they do a single well-defined
               | thing at a large scale. So GPU numbers are less common
               | and much more use-specific.
               | 
               | However the M2 Ultra GPU is estimated, with every bit of
               | compute power working together, at about 26 TOPS.
        
               | dagmx wrote:
               | Could you provide a link for that TOPS count? (And
               | specifically TOPs with comparable unit sizes since NVIDIA
               | and Apple did not use the same units till recently)
               | 
               | The only similar number I can find is for TFLOPS vs TOPS
               | 
               | Again I'm not saying the GPU will be comparable to an
               | NVIDIA one, but that the comparison point isn't sensible
               | in the comments I originally replied to.
        
           | NorwegianDude wrote:
           | No, I'm not. I'm comparing the TOPS of the M3 Ultra and the
           | tensor cores of the RTX 5090.
           | 
            | If not, what is the TOPS of the GPU, and why isn't Apple
            | talking about it if there is more performance hidden
            | somewhere? Apple states 18 TOPS for the M3 Max. And why do
            | you think Apple added the neural engine, if not to
            | accelerate compute?
           | 
           | The power draw is quite a bit higher, but it's still much
           | more efficient as the performance is much higher.
        
             | dagmx wrote:
             | The ANE and tensor cores are not comparable though. One is
             | literally meant for low cost inference while the others are
             | meant for acceleration of training.
             | 
             | If you squint, yeah they look the same, but so does the
             | microcontroller on the GPU and a full blown CPU. They're
             | fundamentally different purposes, architectures and scale
             | of use.
             | 
             | The ANE can't even really be used directly. Apple heavily
             | restricts the use via CoreML APIs for inference. It's only
             | usable for smaller, lightweight models.
             | 
              | If you're comparing to the tensor cores, you really need
              | to compare against the GPU, which is what gets used by
              | Apple's ML frameworks such as MLX for training etc.
              | 
              | It will still be behind the NVIDIA GPU, but not by
              | anywhere near the same numbers.
        
               | llm_nerd wrote:
               | >The ANE and tensor cores are not comparable though
               | 
               | They're both built to do the most common computation in
               | AI (both training and inference), which is multiply and
               | accumulate of matrices - A * B + C. The ANE is far more
               | limited because they decided to spend a lot less silicon
               | space on it, focusing on low-power inference of quantized
               | models. It is fantastically useful for a lot of on-device
               | things like a lot of the photo features (e.g. subject
               | detection, text extraction, etc).
               | 
               | And yes, you need to use CoreML to access it _because_ it
               | 's so limited. In the future Apple will absolutely, with
               | 100% certainty, make an ANE that is as flexible and
               | powerful as tensor cores, and they force you through
               | CoreML because it will automatically switch to using it
               | (where now you submit a job to CoreML and for many it
               | will opt to use the CPU/GPU instead, or a combination
               | thereof. It's an elegant, forward thinking
               | implementation). Their AI performance and credibility
               | will greatly improve when they do.
               | 
               | >you really need to compare against the GPU
               | 
               | From a raw performance perspective, the ANE is capable of
               | more matrix multiply/accumulates than the GPU is on Apple
               | Silicon, it's just limited to types and contexts that
               | make it unsuitable for training, or even for many
               | inference tasks.
        
               | NorwegianDude wrote:
               | So now the TOPS are not comparable because M3 is much
               | slower than an Nvidia GPU? That's not how comparisons
               | work.
               | 
               | My numbers are correct, the M3 Ultra has around 1 % of
               | the TOPS performance of a RTX 5090.
               | 
               | Comparing against the GPU would look even worse for
               | apple. Do you think Apple added the neural engine just
               | for fun? This is exactly what the neural engine is there
               | for.
        
               | dagmx wrote:
                | You're completely missing the point. The ANE is not
                | equivalent as a component to the tensor cores. It has
                | nothing to do with a comparison of TOPS, but with what
                | they're intended for.
               | 
               | Try and use the ANE in the same way you would use the
               | tensor cores. Hint: you can't, because the hardware and
               | software will actively block you.
               | 
               | They're meant for fundamentally different use cases and
               | power loads. Even apples own ML frameworks do not use the
               | ANE for anything except inference.
        
       | tempodox wrote:
       | I could salivate over the hardware no end, if only Apple software
       | (including the OS) weren't that shoddy.
        
       | bredren wrote:
        | Apart from enabling a 120Hz update to the XDR Pro, does TB5
        | offer a viable pathway for eGPUs on Apple Silicon MacBooks?
       | 
       | This is a cool computer, but not something I'd want to lug
       | around.
        
         | mohsen1 wrote:
          | For AI stuff, 120Gb/s is not really that useful...
        
       | submeta wrote:
        | I am confused. I got an M4 with 64 GB RAM. Did I buy something
        | from the future? :) Why M3 now, and not an M4 Ultra?
        
         | seanmcdirmid wrote:
          | It took them a while to develop their Ultra chip, and this is
          | what they had ready? I'm sure they are working on the M4
          | Ultra, but they are just slow at it.
          | 
          | I bought a refurbished M3 Max to run LLMs (can only go up to
          | 70b with 4-bit quant), and it is only slightly slower than
          | the more expensive M4 Max.
        
         | opan wrote:
         | Haven't the Max/Ultra type chips always come much later, close
         | to when the next number of standard chips came out? M2 Max was
         | not available when M2 launched, for example.
        
           | SirMaster wrote:
           | An Ultra has never come out after the next gen base model,
           | let alone the next gen Pro/Max model before.
           | 
           | M1: November 10, 2020
           | 
           | M1 Pro: October 18, 2021
           | 
           | M1 Max: October 18, 2021
           | 
           | M1 Ultra: March 8, 2022
           | 
           | -------------------------
           | 
           | M2: June 6, 2022
           | 
           | M2 Pro: January 17, 2023
           | 
           | M2 Max: January 17, 2023
           | 
           | M2 Ultra: June 5, 2023
           | 
           | -------------------------
           | 
           | M3: October 30, 2023
           | 
           | M3 Pro: October 30, 2023
           | 
           | M3 Max: October 30, 2023
           | 
           | -------------------------
           | 
           | M4: May 7, 2024
           | 
           | M4 Pro: October 30, 2024
           | 
           | M4 Max: October 30, 2024
           | 
           | -------------------------
           | 
           | M3 Ultra: March 5, 2025
        
             | kridsdale1 wrote:
             | So about a year and a half delay for Ultra, but the M2 was
             | an anomaly.
        
             | ellisv wrote:
             | I'd also point out that there was a rather awkward
             | situation with M1/M2 chips where lower end devices were
             | getting newer chips before the higher end devices. For
              | example, the 14 and 16-inch MacBooks Pro didn't get an M2
             | series chip until about 6 months after the 13 and 15-inch
             | MacBooks Air. This left some professionals and power users
             | frustrated.
             | 
             | The M3 Ultra might perform as well as the M4 Max - I
             | haven't seen benchmarks yet - but the newer series is in
             | the higher end devices which is what most people expect.
        
       | ferguess_k wrote:
        | Ah, if only we could have the hardware and the freedom of
        | installing a good Linux distro on top of it. How is Asahi? Is
        | it good enough? I assume that since Asahi is focused on Apple
        | hardware, it should have an easier time figuring out drivers
        | and so on?
        
         | bigyabai wrote:
         | > How is Asahi?
         | 
          | For M3 and M4 machines, hardware support is pretty derelict:
         | https://asahilinux.org/docs/M3-Series-Feature-Support/
        
           | ferguess_k wrote:
           | Thanks, looks like even M1 support has some gaps:
           | 
           | https://asahilinux.org/docs/M1-Series-Feature-
           | Support/#table...
           | 
           | I assume anything that doesn't have "linux-asahi" is not
           | supported -- or any WIP is not supported.
           | 
            | Wish I had the skills to help them. Targeting just one
            | architecture, I think Asahi has a better chance of success.
        
             | bigyabai wrote:
             | It's just not an easy task. I can't help but compare it to
             | the Nouveau project spending years of effort to reverse-
             | engineer just a few GPU designs. Then Nvidia changed their
             | software and hardware architecture, and things went from
             | "relatively hopeful" to "there is no chance" overnight.
        
               | ferguess_k wrote:
                | I agree, it's a lot of work, plus Apple definitely is
                | not going to help with the project. Maybe an
                | alternative is something like Framework -- find some
                | good-enough hardware and support it.
        
       | _alex_ wrote:
        | Apple keeps talking about the Neural Engine. Does anything
        | actually use it? Seems like all the current LLM and Stable
        | Diffusion packages (including MLX) use the GPU.
        
         | gield wrote:
         | Face ID, taking pictures, Siri, ARKit, voice-to-text
         | transcription, face recognition and OCR in photos, noise
         | filtering, ...
        
           | cubefox wrote:
           | These have been possible in much smaller smartphone chips for
           | years.
        
             | stouset wrote:
             | Possible != energy efficient, which is important for mobile
             | devices.
        
               | cubefox wrote:
                | If the energy efficiency of things like Face ID was
                | indeed so bad that you need a more efficient M3 Ultra,
                | how come Face ID was integrated into smartphones years
                | ago, apparently without significant negative impact on
                | battery life?
        
         | anentropic wrote:
         | Yeah I agree.
         | 
         | The Neural Engine is useful for a bunch of Apple features, but
         | seems weirdly useless for any LLM stuff... been wondering if
         | they'd address it on any of these upcoming products. AI is so
          | hyped right now that it seems odd they have a specialised
          | processor that doesn't get used for the kind of AI people
          | are doing. I can see in the latest release:
         | 
         | > Mac Studio is a powerhouse for AI, capable of running large
         | language models (LLMs) with over 600 billion parameters
         | entirely in memory, thanks to its advanced GPU
         | 
         | https://www.apple.com/newsroom/2025/03/apple-unveils-new-mac...
         | 
         | i.e. LLMs still run on the GPU not the NPU
        
           | aurareturn wrote:
           | On the iPhone, it runs on the NPU.
        
       | 827a wrote:
       | Very curiously: They upgraded the Mac Studio but not the Mac Pro
       | today.
        
       | FloatArtifact wrote:
        | So the question is whether the M1/M2 Ultra was limited by
        | GPU/NPU compute or by memory bandwidth at this point?
        | 
        | I'm curious what instruction sets may have been included with
        | the M3 chip that the other two lack for AI.
        | 
        | So far the candidates seem to be NVIDIA DIGITS, Framework
        | Desktop, and the M1 64GB / M2/M3 128GB Studio/Ultra.
       | 
        | The GPU market isn't competitive enough for the amount of VRAM
        | needed. I was hoping for a Battlemage GPU model with 24GB that
        | would be reasonably priced and available.
        | 
        | As for the Framework Desktop and similar devices, I think a
        | second generation will be significantly better than what's
        | currently on offer today. Rationale below...
       | 
       | For a max spec processor with ram at $2,000, this seems like a
       | decent deal given today's market. However, this might age very
       | fast for three reasons.
       | 
        | Reason 1: LPDDR6 may debut in the next year or two; this could
        | bring massive improvements to memory bandwidth and capacity for
        | soldered-on memory.
        | 
        | LPDDR6 vs LPDDR5:
        | 
        | - Data bus width: 24 bits vs 16 bits
        | 
        | - Burst length: 24 vs 15
        | 
        | - Memory bandwidth: up to 38.4 GB/s vs up to 6.7 GB/s
        | 
        | - CAMM RAM may or may not maintain signal integrity as memory
        | bandwidth increases. Until I see it implemented for an AI use
        | case in a cost-effective manner, I am skeptical.
       | 
        | Reason 2: It's a laptop chip with limited PCIe lanes and a
        | reduced power envelope. Theoretically, a desktop chip could
        | have better performance, more lanes, and be socketable
        | (although I don't think I've seen a socketed CPU with soldered
        | RAM).
       | 
       | Reason 3: In addition, what does hardware look like being
       | repurposed in the future compared to alternatives?
       | 
        | - Unlike desktop or server counterparts, which can have higher
        | CPU core counts and PCIe/IO expansion, this processor with its
        | motherboard is limited for repurposing later down the line as a
        | server to self-host other software besides AI. I suppose it
        | could be turned into an overkill NAS with ZFS and an HBA
        | controller card in a new case.
       | 
        | - Buying into the Framework Desktop is pretty limited by the
        | form factor. The next generation might be able to include a
        | fully populated 16x slot and a 10G NIC. That seems about it if
        | they're going to maintain the backward-compatibility philosophy
        | given the case form factor.
        
       | gpapilion wrote:
        | I think this will eventually morph into Apple's server fleet.
        | This, in conjunction with the AI server factory they are
        | opening, makes a lot of sense.
        
       | api wrote:
       | Half a terabyte could run 8 bit quantized versions of some of
       | those full size llama and deepseek models. Looking forward to
       | seeing some benchmarks on that.
        
         | zamadatix wrote:
         | Deepseek would need Q5ish level quantization to fit.
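         | 
         | A quick back-of-the-envelope sketch of why (it ignores KV
         | cache and runtime overhead, and treats quant levels as exact
         | bit widths, which GGUF quants are not):
         | 
         |     params = 671e9              # DeepSeek R1 total parameters
         |     for bits in (8, 5, 4):
         |         size_gb = params * bits / 8 / 1e9
         |         print(bits, round(size_gb), size_gb < 512)
         |     # 8-bit: 671 GB (no), 5-bit: 419 GB (yes), 4-bit: 336 GB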
        
       | ntqvm wrote:
       | Disappointing announcement. M4 brings a significant uplift over
       | M3, and the ST performance of the M3 Ultra will be significantly
       | worse than the M4 Max.
       | 
       | Even for its intended AI audience, the ISA additions in M4
       | brought significant uplift.
       | 
       | Are they waiting to put M4 Ultra into the Mac Pro?
        
       | tuananh wrote:
        | But is it actually usable for anything if it's too slow?
        | 
        | Does anyone have a ballpark number for how many tokens per
        | second we can get with this?
        
       | cxie wrote:
       | 512GB of unified memory is truly breaking new ground. I was
       | wondering when Apple would overcome memory constraints, and now
       | we're seeing a half-terabyte level of unified memory. This is
       | incredibly practical for running large AI models locally ("600
       | billion parameters"), and Apple's approach of integrating this
       | much efficient memory on a single chip is fascinating compared to
       | NVIDIA's solutions. I'm curious about how this design of "fusing"
       | two M3 Max chips performs in terms of heat dissipation and power
       | consumption though
        
         | bigyabai wrote:
         | For enterprise markets, this is table stakes. A lot of
         | datacenter customers will probably ignore this release
         | altogether since there isn't a high-bandwidth option for
         | systems interconnect.
        
           | pavlov wrote:
           | The Mac Studio isn't meant for data centers anyway? It's a
           | small and silent desktop form factor -- in every respect the
           | opposite of a design you'd want to put in a rack.
           | 
           | A long time ago Apple had a rackmount server called Xserve,
           | but there's no sign that they're interested in updating that
           | for the AI age.
        
             | bigyabai wrote:
             | It's the Ultra chip, the same one that goes into the
             | rackmount Mac Pro. I don't think there's much confusion as
             | to who this is for.
             | 
             | > there's no sign that they're interested in updating that
             | for the AI age.
             | 
             | https://security.apple.com/blog/private-cloud-compute/
        
               | pavlov wrote:
               | I genuinely forgot the Mac Pro still exists. It's been so
               | long since I even saw one.
               | 
               | And I've had every previous Mac tower design since 1999:
               | G4, G5, the excellent dual Xeon, the horrible black trash
               | can... But Apple Silicon delivers so much punch in the
               | Studio form factor, the old school Pro has become very
               | niche.
               | 
               | Edit - looks like the new M3 Ultra is only available in
               | Mac Studio anyway? So the existence of the Pro is moot
               | here.
        
               | choilive wrote:
                | Never understood the hate on the trash can. Isn't the
                | Mac Studio basically the same idea as the trash can but
                | even less upgradeable?
        
               | pavlov wrote:
               | The Mac Studio hit a sweet spot in 2023 that the trash
               | can Mac Pro couldn't ten years earlier. It's mostly
               | thanks to the high integration of Apple Silicon and
               | improved device availability and speed of Thunderbolt.
               | 
               | The 2013 Mac Pro was stuck forever with its original
               | choice of Intel CPU and AMD GPU. And it was unfortunately
               | prone to overheating due to these same components.
        
               | wtallis wrote:
               | The trash can also suffered from hitting the market right
               | around when the industry gave up on making dual-GPU work.
        
               | Alupis wrote:
               | Outside of extremely niche use cases, who is racking
               | apple products in 2025?
        
               | nordsieck wrote:
               | There's MacMiniVault (nee MacMiniColo)
               | https://www.macminivault.com/
               | 
               | Not sure if they count as niche or not.
        
               | waveringana wrote:
                | GitHub, for their macOS runners (pretty sure they're M1
                | Minis)
        
               | wpm wrote:
               | AWS
        
               | kube-system wrote:
               | Every provider who offers MacOS in the cloud.
        
               | Alupis wrote:
               | So MacOS is still not allowed to be virtualized per the
               | EULA? Wow if that's true...
        
               | kube-system wrote:
               | MacOS is permitted to be virtualized... as long as the
               | host is a Mac. :)
        
               | wtallis wrote:
               | The rackmount Mac Pro is for A/V studios, not
               | datacenters.
        
               | phillco wrote:
                | Don't forget CI/CD farms for iOS builds, although I
                | think it's much more cost effective to just make Minis
                | or Studios work, despite their nonstandard form factor.
        
               | kridsdale1 wrote:
               | Google and Facebook have vast fleets of Minis in custom
               | chassis for this purpose.
        
             | alwillis wrote:
             | Apple recently announced they're building a new plant in
             | Texas to produce servers. Yes, they need servers for their
             | Private Compute Cloud used by Apple Intelligence, but it
             | doesn't _only_ need to be for that.
             | 
             | From https://www.apple.com/newsroom/2025/02/apple-will-
             | spend-more...
             | 
             |  _As part of its new U.S. investments, Apple will work with
             | manufacturing partners to begin production of servers in
             | Houston later this year. A 250,000-square-foot server
             | manufacturing facility, slated to open in 2026, will create
             | thousands of jobs._
        
           | PaulHoule wrote:
            | That article says you can connect them through Thunderbolt
            | 5 somehow to form clusters.
        
             | burnerthrow008 wrote:
             | I wonder if that's something new, or just the same virtual
             | network interface that's been around since the TB1 days (a
             | new network interface appears when you connect two Macs
             | with a TB cable)
        
               | PaulHoule wrote:
               | Well already it is faster than GigE...
               | 
               | https://arstechnica.com/gadgets/2013/10/os-x-10-9-brings-
               | fas...
               | 
                | Thunderbolt is PCIe-based and I could imagine it being
                | extended to do what
                | https://en.wikipedia.org/wiki/Compute_Express_Link and
                | https://en.wikipedia.org/wiki/InfiniBand do.
        
           | spiderfarmer wrote:
           | You can use Thunderbolt 5 interconnect (80Gbps) to run LLMs
           | distributed across 4 or 5 Mac Studios.
        
             | whimsicalism wrote:
                | Why you would ever want to do that remains an open
                | question.
        
               | aurareturn wrote:
               | Probably some kind of local LLM server. 1TB of 1.6 TB/s
               | memory if you link 2 together. $20k total. Half the price
               | of a single Blackwell chip.
        
               | whimsicalism wrote:
               | with a vanishingly small fraction of flops and a small
               | fraction of memory bandwidth
        
               | aurareturn wrote:
               | It's good enough to run whatever local model you want. 2x
               | 80core GPU is no joke. Linking them together gives it
               | effectively 1.6 TB/s of bandwidth. 1TB of total memory.
               | 
               | You can run the full Deepseek 671b q8 model at 40
               | tokens/s. Q4 model at 80 tokens/s. 37B active params at a
               | time because R1 is MoE.
               | 
                | Linking 2 of these together lets you run a model more
                | capable (R1) than GPT4o at a comfortable speed at home.
                | That was simply fantasy a year ago.
        
             | atwrk wrote:
             | But 80Gbit/s is way slower than even regular dual channel
             | RAM, or am I missing something here? That would mean the
             | LLM would be excruciatingly slow. You could get an old EPYC
             | for a fraction of that price _and_ have more performance.
        
               | wmf wrote:
               | The weights don't go over the network so performance is
               | OK.
        
               | atwrk wrote:
               | If I'm not mistaken, each token produced roughly equals
               | the whole model in memory transfers (the exception being
               | MoE models). That's why memory bandwidth is so important
               | in the first place, or not?
        
               | wmf wrote:
               | My understanding is that if you can store 1/Nth of the
               | weights in RAM on each of the N nodes then there's no
               | need to send the weights over the network.
        
           | phonon wrote:
           | Thunderbolt 5 can do bi-directional 80 Gbps....and Mac Studio
           | Ultra has 6 ports...
        
             | cibyr wrote:
             | That's still not even competitive with 100G Ethernet on a
             | per-port basis. An overall bandwidth of 480 Gbps pales in
             | comparison with, for example, the 3200 Gbps you get with a
             | P5 instance on EC2.
        
               | nyrikki wrote:
                | To add to this, GPU servers like Supermicro's have a
                | 400GbE port per GPU, plus more for the CPU.
        
               | kridsdale1 wrote:
               | Cost competitive though?
        
               | phonon wrote:
               | A 3 year reservation of a P5 is over a million dollars
               | though? Not sure how that's comparable....
        
         | FloatArtifact wrote:
          | They didn't increase the memory bandwidth. You can get the
          | same memory bandwidth that is available on the M2 Studio.
          | Yes, yes, of course you can get 512 gigabytes of uRAM for 10
          | grand.
          | 
          | The question is whether an LLM will run with usable
          | performance at that scale. The point is there are diminishing
          | returns, despite having enough uRAM, with the same amount of
          | memory bandwidth, even with the increased processing speed of
          | the new chip for AI.
         | 
         | So there must be a min-max performance ratio between memory
         | bandwidth and the size of the memory pool in relation to the
         | processing power.
        
           | cxie wrote:
           | Guess what? I'm on a mission to completely max out all 512GB
           | of mem...maybe by running DeepSeek on it. Pure greed!
        
             | swivelmaster wrote:
             | You could always just open a few Chrome tabs...
        
           | valine wrote:
            | Probably helps that models like DeepSeek are mixture-of-
            | experts. Having all weights in VRAM means you don't have to
            | unload/reload. Memory bandwidth usage should be limited to
            | the 37B active parameters.
        
             | FloatArtifact wrote:
              | > Probably helps that models like DeepSeek are mixture-
              | of-experts. Having all weights in VRAM means you don't
              | have to unload/reload. Memory bandwidth usage should be
              | limited to the 37B active parameters.
              | 
              | "Memory bandwidth usage should be limited to the 37B
              | active parameters."
              | 
              | Can someone do a deep dive on the above quote? I
              | understand having the entire model loaded into RAM helps
              | with response times. However, I don't quite understand
              | the relationship between memory bandwidth and active
              | parameters.
              | 
              | Context window?
              | 
              | How much of the model can actively be processed, despite
              | being fully loaded into memory, given the memory
              | bandwidth?
        
               | valine wrote:
               | With a mixture of experts model you only need to read a
               | subset of the weights from memory to compute the output
               | of each layer. The hidden dimensions are usually smaller
               | as well so that reduces the size of the tensors you write
               | to memory.
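                | 
                | A minimal sketch of why the active parameters are what
                | set the ceiling (assumes decode is purely bandwidth-
                | bound and ignores attention/KV-cache reads):
                | 
                |     active = 37e9          # active params per token
                |     bytes_per_param = 0.5  # ~4-bit quant
                |     mbw = 819e9            # M3 Ultra, bytes/s
                |     print(mbw / (active * bytes_per_param))
                |     # ~44 tok/s upper bound; real-world is lower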
        
               | bick_nyers wrote:
               | Just to add onto this point, you expect different experts
               | to be activated for every token, so not having all of the
               | weights in fast memory can still be quite slow as you
               | need to load/unload memory every token.
        
               | valine wrote:
               | Probably better to be moving things from fast memory to
               | faster memory than from slow disk to fast memory.
        
               | ein0p wrote:
               | What people who did not actually work with this stuff in
               | practice don't realize is the above statement only holds
               | for batch size 1, sequence size 1. For processing the
               | prompt you will need to read all the weights (which isn't
               | a problem, because prefill is compute-bound, which, in
               | turn is a problem on a weak machine like this Mac or an
               | "EPYC build" someone else mentioned). Even for inference,
               | batch size greater than 1 (more than one inference at a
               | time) or sequence size of greater than 1 (speculative
               | decoding), could require you to read the entire model,
               | repeatedly. MoE is beneficial, but there's a lot of
               | nuance here, which people usually miss.
        
               | doctorpangloss wrote:
               | Sure, nuance.
               | 
               | This is why Apple makes so much fucking money: people
               | will craft the wildest narratives about how they're going
               | to use this thing. It's part of the aesthetics of
               | spending $10,000. For every person who wants a solution
               | to the problem of running a 400b+ parameter neural
               | network, there are 19 who actually want an exciting
               | experience of buying something, which is what Apple
               | really makes. It has more in common with a Birkin bag
               | than a server.
        
               | rfoo wrote:
               | For decode, MoE is nice for either bs=1 (decoding for a
               | single user), or bs=<very large> (do EP to efficiently
               | serve a large amount of users).
               | 
               | Anything in between suffers.
        
               | valine wrote:
               | No one should be buying this for batch inference
               | obviously.
               | 
                | I remember right after OpenAI announced GPT-3 I had a
                | conversation with someone where we tried to predict how
                | long it would be before GPT-3 could run on a home
                | desktop. This Mac Studio has enough VRAM to run the
                | full 175B-parameter GPT-3 at 16-bit precision, and I
                | think that's pretty cool.
        
               | Der_Einzige wrote:
               | No one who is using this for home use cares about
               | anything except batch size 1 sequence size 1.
        
               | ein0p wrote:
               | What if you're doing bulk inference? The efficiency and
               | throughput of bs=1 s=1 is truly abysmal.
        
           | diggan wrote:
            | > The question is whether an LLM will run with usable
            | performance at that scale.
            | 
            | This is the big question to have answered. Many people
            | claim Apple can now reliably be used as an ML workstation,
            | but from the numbers I've seen in benchmarks, the models
            | may fit in memory, yet the tok/sec performance is so slow
            | that it doesn't feel worth it compared to running on NVIDIA
            | hardware.
            | 
            | Although it'd be expensive as hell to get 512GB of VRAM
            | with NVIDIA today, maybe moves like this from Apple could
            | push down the prices at least a little bit.
        
             | johnmaguire wrote:
             | It is much slower than nVidia, but for a lot of personal-
             | use LLM scenarios, it's very workable. And it doesn't need
             | to be anywhere near as fast considering it's really the
             | only viable (affordable) option for private, local
             | inference, besides building a server like this, which is no
             | faster: https://news.ycombinator.com/item?id=42897205
        
               | bastardoperator wrote:
               | It's fast enough for me to cancel monthly AI services on
               | a mac mini m4 max.
        
               | diggan wrote:
               | Could you maybe share a lightweight benchmark where you
               | share the exact model (+ quantization if you're using
               | that) + runtime + used settings and how much
               | tokens/second you're getting? Or just like a log of the
               | entire run with the stats, if you're using something like
               | llama.cpp, LMDesktop or ollama?
               | 
               | Also, would be neat if you could say what AI services you
               | were subscribed to, there is a huge difference between
               | paid Claude subscription and the OpenAI Pro subscription
               | for example, both in terms of cost and the quality of
               | responses.
        
               | fetus8 wrote:
               | How much RAM are you running on?
        
               | staticman2 wrote:
               | Smaller, dumber models are faster than bigger, slower
               | ones.
               | 
               | What model do you find fast enough and smart enough?
        
               | Matl wrote:
               | Not OP but I am finding the Qwen 2.5 32b distilled with
               | DeepSeek R1 model to be a good speed/smartness ratio on
               | the M4 Pro Mac Mini.
        
               | lostmsu wrote:
                | Hm, the AI services over 5 years cost half of an M4
                | Max minimal configuration, which can barely run a
                | severely lobotomized LLaMA 70B. And they provide
                | significantly better models.
        
               | nomel wrote:
               | It's probably much worse than that, with the falling
               | prices of compute.
        
               | Matl wrote:
               | Sure, with something like Kagi you even get many models
               | to choose from for a relatively low price, but not
               | everybody likes to send over their codebase and documents
               | to OpenAI.
        
             | hangonhn wrote:
              | Do we know if it's slower because the hardware is not as
              | well suited to the task, or is it mostly a software issue
              | -- the code hasn't been optimized to run on Apple
              | Silicon?
        
               | titzer wrote:
               | AFAICT the neural engine has accelerators for CNNs and
               | integer math, but not the exact tensor operations in
               | popular LLM transformer architectures that are well-
               | supported in GPUs.
        
               | kridsdale1 wrote:
               | I have to assume they're doing something like that in the
               | lab for 4 years from now.
        
               | woadwarrior01 wrote:
               | The neural engine is perfectly capable of accelerating
               | matmults. It's just that autoregressive decoding in
               | single batch LLM inference is memory bandwidth
               | constrained, so there are no performance benefits to
               | using the ANE for LLM inference (although, there's a huge
               | power efficiency benefit). And the only way to use the
               | neural engine is via CoreML. Using the GPU with MLX or
               | MPS is often easier.
        
               | azinman2 wrote:
               | Memory bandwidth is the issue
        
           | TheRealPomax wrote:
            | Yeah they did? The M4 Max has a max memory bandwidth of
            | 546GB/s; the M3 Ultra bumps that up to a max of 819GB/s.
            | 
            | (and the 512GB version is $4,000 more rather than $10,000 -
            | that's still worth mocking, but it's nowhere _near_ as
            | much)
        
             | okanesen wrote:
             | Not that dramatic of an increase actually - the M2 Max
             | already had 400GB/s and M2 Ultra 800GB/s memory bandwidth,
             | so the M3 Ultra's 819GB/s is just a modest bump. Though the
             | M4's additional 146GB/s is indeed a more noticeable
             | improvement.
        
               | choilive wrote:
               | Also should note that 800/819GB/s of memory bandwidth is
               | actually VERY usable for LLMs. Consider that a 4090 is
               | just a hair above 1000GB/s
        
               | hereonout2 wrote:
               | Does it work like that though at this larger scale? 512GB
               | of VRAM would be across multiple NVIDIA cards, so the
               | bandwidth and access is parallelized.
               | 
               | But here it looks more of a bottleneck from my
               | (admittedly naive) understanding.
        
               | choilive wrote:
                | For inference the bandwidth is generally not
                | parallelized because a token has to pass through the
                | model layer by layer. The most common model-splitting
                | method assigns each GPU a subset of the LLM's layers,
                | and it doesn't take much bandwidth to send the
                | activations over PCIe to the next GPU.
        
               | manmal wrote:
               | My understanding is that the GPU must still load its
               | assigned layer from VRAM into registers and L2 cache for
               | every token, because those aren't large enough to hold a
               | significant portion. So naively, for a 24GB layer, you'd
               | need to move up to 24GB for every token.
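                | 
                | A small sketch of the traffic asymmetry (illustrative
                | numbers: a dense model with bf16 activations, using the
                | 24GB per-GPU shard from above):
                | 
                |     hidden = 8192               # model width
                |     handoff = hidden * 2        # ~16 KB of activations
                |                                 # crosses PCIe per token
                |     shard = 24e9                # weights re-read from
                |                                 # local VRAM per token
                |     print(shard / handoff)      # ~1.5e6x more local
                |                                 # traffic than PCIe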
        
           | lhl wrote:
           | Since no one specifically answered your question yet, yes,
           | you should be able to get usable performance. A Q4_K_M GGUF
           | of DeepSeek-R1 is 404GB. This is a 671B MoE that "only" has
           | 37B activations per pass. You'd probably expect in the
           | ballpark of 20-30 tok/s (depends on how much actually MBW can
           | be utilized) for text generation.
           | 
           | From my napkin math, the M3 Ultra TFLOPs is still relatively
           | low (around 43 FP16 TFLOPs?), but it should be more than
           | enough to handle bs=1 token generation (should be way <10
            | FLOPs/byte for inference). Now as far as its prefill/prompt
           | processing speed... well, that's another matter.
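            | 
            | Roughly, that napkin math looks like this (a sketch:
            | assumes ~2 FLOPs per active parameter per token and that
            | decode is bandwidth-bound):
            | 
            |     total, active = 671e9, 37e9
            |     model_bytes = 404e9                   # Q4_K_M GGUF
            |     bytes_per_tok = model_bytes * active / total
            |     mbw, flops = 819e9, 43e12
            |     bw_bound = mbw / bytes_per_tok        # ~37 tok/s
            |     compute_bound = flops / (2 * active)  # ~580 tok/s
            |     print(min(bw_bound, compute_bound) * 0.7)
            |     # ~26 tok/s at ~70% achievable bandwidth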
        
             | drited wrote:
              | I would be curious about the context window size that
              | would be expected when generating a ballpark 20 to 30
              | tokens per second using DeepSeek-R1 Q4 on this hardware.
        
           | bob1029 wrote:
           | > The question is if a llm will run with usable performance
           | at that scale?
           | 
           | For the self-attention mechanism, memory bandwidth
           | requirements scale ~quadratically with the sequence length.
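            | 
            | A rough illustration (my assumptions: a generic dense
            | transformer with made-up dimensions, fp16 KV cache):
            | 
            |     layers, kv_width, fp16 = 32, 8192, 2
            | 
            |     def kv_read_per_token(ctx):
            |         # bytes of KV cache re-read per generated token
            |         return layers * ctx * kv_width * fp16
            | 
            |     for ctx in (4_096, 32_768, 131_072):
            |         gb = kv_read_per_token(ctx) / 1e9
            |         print(f"ctx {ctx:>7}: ~{gb:5.1f} GB per token")
            |     # per-token reads grow linearly with context, so a
            |     # full generation of n tokens moves O(n^2) bytes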
        
             | kridsdale1 wrote:
             | Someone has got to be working on a better method than that.
             | Hundreds of billions are at stake.
        
           | deepGem wrote:
            | Any idea what the sRAM to uRAM ratio is on these new GPUs?
           | If they have meaningfully higher sRAM than the Hopper GPUs,
           | it could lead to meaningful speedups in large model training.
           | 
           | If they didn't increase the memory bandwidth, then 512GB will
           | enable longer context lengths and that's about it right? No
           | speedups
           | 
            | For any speedups, you may need some new variant of
            | FlashAttention-3, or something along similar lines, to be
            | purpose-built for Apple GPUs.
        
         | dheera wrote:
         | It will cost 4X what it costs to get 512GB on an x86 server
         | motherboard.
        
           | smith7018 wrote:
           | You can build an x86 machine that can fully run DeepSeek R1
           | with 512GB VRAM for ~$2,500?
        
             | ta988 wrote:
             | You will have to explain to me how.
        
               | johnmaguire wrote:
               | https://news.ycombinator.com/item?id=42897205
        
               | bmelton wrote:
               | https://digitalspaceport.com/how-to-run-
               | deepseek-r1-671b-ful...
        
               | muricula wrote:
               | Is that a CPU based inference build? Shouldn't you be
               | able to get more performance out of the M3's GPU?
        
             | hbbio wrote:
             | How would you compare the tok/sec between this setup and
             | the M3 Max?
        
               | aurareturn wrote:
               | 3.5 - 4.5 tokens/s on the $2,000 AMD Epyc setup. Deepseek
               | 671b q4.
               | 
               | The AMD Epyc build is severely bandwidth and compute
               | constrained.
               | 
               | ~40 tokens/s on M3 Ultra 512GB by my calculation.
        
               | sgt wrote:
               | What kind of Nvidia-based rig would one need to achieve
               | 40 tokens/sec on Deepseek 671b? And how much would it
               | cost?
        
               | aurareturn wrote:
               | Around 5x Nvidia A100 80GB can fit 671b Q4. $50k just for
               | the GPUs and likely much more when including cooling,
               | power, motherboard, CPU, system RAM, etc.
        
               | sgt wrote:
               | So the M3 Ultra is amazing value then. And from what I
               | could tell, an equivalent AMD Epyc would still be so
               | constrained that we're talking 4-5 tokens/s. Is this a
               | fair assumption?
        
               | Aeolun wrote:
               | The Epyc would only set you back $2000 though, so it's
               | only a slightly worse price/return.
        
               | SkiFire13 wrote:
               | How many tokens/s would that be though?
        
               | hbbio wrote:
               | Thanks!
               | 
               | If the M3 can run 24/7 without overheating it's a great
               | deal to run agents. Especially considering that it should
               | run only using 350W... so roughly $50/mo in electricity
               | costs.
        
           | valine wrote:
           | What would it cost to get 512GB of VRAM on an Nvidia card?
           | That's the real comparison.
        
             | dheera wrote:
             | Apples to oranges. NVIDIA cards have an order of magnitude
             | more horsepower for compute than this thing. A B100 has 8
             | TB/s of memory bandwidth, 10 times more than this. If
             | NVIDIA made a card with 512GB of HBM I'd expect it to cost
             | $150K.
             | 
             | The compute and memory bandwidth of the M3 Ultra is more
             | in-line with what you'd get from a Xeon or
             | Epyc/Threadripper CPU on a server motherboard; it's just
             | that the x86 "way" of doing things is usually to attach a
             | GPU for way more horsepower rather than squeezing it out of
             | the CPU.
             | 
             | This will be good for local LLM inference, but not so much
             | for training.
        
               | LeifCarrotson wrote:
               | Yep, it's apples to oranges. But sometimes you want
               | apples, and sometimes you want oranges, so it's all good!
               | 
               | There's a wide spectrum of potential requirements between
               | memory capacity, memory bandwidth, compute speed, compute
                | complexity, and compute parallelism. In the past, a few
                | GB was adequate for the tasks we assigned to the GPU:
                | you had enough storage bandwidth to load the relevant
                | scene into memory and generate framebuffers, but now
                | we're running different workloads. Conversely, a big
                | database server might want its entire contents resident
                | in many sticks of ECC DIMMs for the CPU, but only need
                | a couple dozen x86-64 threads.
               | workload has many terabytes or petabytes of content to
               | work with, there are network file systems with entirely
               | different bandwidth targets for entire racks of
               | individual machines to access that data at far slower
               | rates.
               | 
               | There's a lot of latency between the needs of programmers
               | and the development and shipping of hardware to satisfy
               | those needs, I'm just happy we have a new option on that
               | spectrum somewhere in the middle of traditional CPUs and
               | traditional GPUs.
               | 
               | As you say, if Nvidia made a 512 GB card it would cost
               | $150k, but this costs an order of magnitude less than
               | that. Even high-end consumer cards like a 5090 have 16x
               | less memory than this does (average enthusiasts on
               | desktops have maybe 8 GB) and just over double the
               | bandwidth (1.7 TB/s).
               | 
               | Also, nit pick FTA:
               | 
               | > _Starting at 96GB, it can be configured up to 512GB, or
               | over half a terabyte._
               | 
               | 512 GB is exactly half of a terabyte, which is 1024 GB.
               | It's too late for hard drives - the marketing departments
               | have redefined storage to use multipliers of 1000 and
               | invented "tebibytes" - but in memory we still work with
               | powers of two. Please.
        
               | dheera wrote:
               | Sure, if you want to do training get an NVIDIA card. My
               | point is that it's not worth comparing either Mac or CPU
               | x86 setup to anything with NVIDIA in it.
               | 
               | For inference setups, my point is that instead of paying
               | $10000-$15000 for this Mac you could build an x86 system
               | for <$5K (Epyc processor, 512GB-768GB RAM in 8-12
               | channels, server mobo) that does the same thing.
               | 
               | The "+$4000" for 512GB on the Apple configurator would be
               | "+$1000" outside the Apple world.
        
               | KingOfCoders wrote:
               | But this is how it wonderfully works. +$4000 does two
               | things: 1. Make Apple very very rich 2. Make people think
                | this is better than a $10k EPYC. Win-win for Apple.
                | Once you have convinced people that you are the best, a
                | higher price just means they think you are even better.
        
               | egorfine wrote:
               | > we still work with powers of two. Please.
               | 
               | We do. Common people don't. It's easier to write "over
               | half a terabyte" than explain (again) to millions of
               | people what the power of two is.
        
               | johnklos wrote:
               | Anyone who calls 512 gigs "over half a terabyte" is
               | bullshitting. No, thank you.
        
               | egorfine wrote:
               | Wasn't me.
        
               | egorfine wrote:
               | ...aaand I'm being downvoted for pointing out apple's
               | language and possible reason for its obvious factual
               | incorrectness...
        
               | pklausler wrote:
               | This prompts an "old guy anecdote"; forgive me.
               | 
               | When I was much younger, I got to work on compilers at
               | Cray Computer Corp., which was trying to bring the Cray-3
               | to market. (This was basically a 16-CPU Cray-2
               | implemented with GaAs parts; it never worked reliably.)
               | 
               | Back then, HPC performance was measured in mere
               | megaflops. And although the Cray-2 had peak performance
               | of nearly 500MF/s/CPU, it was really hard to attain,
               | since its memory bandwidth was just 250M words/s/CPU
               | (2GB/s/CPU); so you had to have lots of operand re-use to
               | not be memory-bound. The Cray-3 would have had more
               | bandwidth, but it was split between loads and stores, so
               | it was still quite a ways away from the competing Cray
               | X-MP/Y-MP/C-90 architecture, which could load two words
               | per clock, store one, and complete an add and a multiply.
               | 
               | So I asked why the Cray-3 didn't have more read bandwidth
               | to/from memory, and got a lesson from the answer that has
               | stuck. You could actually _see_ how much physical
                | hardware in that machine was devoted to the CPU/memory
               | interconnect, since the case was transparent -- there was
               | a thick nest of tiny blue & white twisted wire pairs
               | between the modules, and the stacks of chips on each CPU
               | devoted to the memory system were a large proportion of
               | the total. So the memory and the interconnect constituted
               | a surprising (to me) majority of the machine. Having more
               | floating-point performance in the CPUs than the memory
               | could sustain meant that the memory system was
               | oversubscribed, and that meant that more of the machine
               | was kept fully utilized. (Or would have been, had it
               | worked...)
               | 
               | In short, don't measure HPC systems with just flops.
               | Measure the effective bandwidth over large data, and make
               | sure that the flops are high enough to keep it utilized.
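                | 
                | The same balance argument in roofline terms, using the
                | rough figures quoted upthread for the M3 Ultra
                | (43 FP16 TFLOPs, 819GB/s; both approximate):
                | 
                |     peak_flops = 43e12   # FP16, approximate
                |     peak_bw    = 819e9   # bytes/s
                |     ridge = peak_flops / peak_bw
                |     print(f"compute-bound above ~{ridge:.0f} "
                |           "FLOPs per byte")   # ~53
                |     # bs=1 LLM decoding does ~2 FLOPs per weight,
                |     # only a few FLOPs per byte at 4-bit, so it sits
                |     # far below the ridge: bandwidth-bound, exactly
                |     # the Cray lesson above.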
        
             | zitterbewegung wrote:
              | Since the GH200 has over a terabyte of VRAM at $343,000,
              | and a bit over 512GB of VRAM worth of 80GB H100s comes to
              | $195,993, you could beat the price of the Apple M3 Ultra
              | with an AMD EPYC build.
        
             | bick_nyers wrote:
             | About $12k when Project Digits comes out.
        
               | valine wrote:
               | That will only have 128GB of unified memory
        
               | dragonwriter wrote:
                | 128GB for $3k; per the announcement, their ConnectX
                | networking allows two Project Digits devices to be
                | plugged into each other and work together as one
                | device, giving you 256GB for $6k. And, AFAIK, existing
                | frameworks can split models across devices as well,
                | hence, presumably, the upthread suggestion that Project
                | Digits would provide 512GB for $12k, though arguably
                | that last step is cheating.
        
               | justincormack wrote:
                | The reason Nvidia only talks about two machines over
                | the network is, I think, that each box has only one
                | network port, so beyond two you need to add the cost of
                | a switch.
        
               | bick_nyers wrote:
               | If you want to split tensorwise yes. Layerwise splits
               | could go over Ethernet.
               | 
               | I would be interested to see how feasible hybrid
               | approaches would be, e.g. connect each pair up directly
               | via ConnectX and then connect the sets together via
               | Ethernet.
        
           | matt-p wrote:
           | Not really like for like.
           | 
            | The pricing isn't as insane as you'd think: 96 to 256GB is
            | $1,500, which isn't 'cheap', but it could be worse.
           | 
            | All in, $5,500 gets you an Ultra with 256GB memory, 28 CPU
            | cores, 60 GPU cores and 10Gb networking - I think you'd be
            | hard pushed to build a server for less.
        
             | kllrnohj wrote:
             | 5,500 easily gets me either vastly more CPU cores if I care
             | more about that or a vastly faster GPU if I care more about
             | that. Or for both a 9950x + 5090 (assuming you can actually
             | find one in stock) is ~$3000 for the pair + motherboard,
             | leaving a solid $2500 for whatever amount of RAM, storage,
             | and networking you desire.
             | 
             | The M3 strikes a very particular middle ground for AI of
             | lots of RAM but a significantly slower GPU which nothing
             | else matches, but that also isn't inherently the _right_
              | balance either. And for any other workloads, it's quite
             | expensive.
        
               | seanmcdirmid wrote:
               | You'll need a couple of 32GB 5090s to run a quantized 70B
               | model, maybe 4 to run a 70b model without quantization,
               | forget about anything larger than that. A huge model
               | might run slow on a M3 Ultra, but at least you can run it
               | all.
               | 
                | I have an M3 Max (the non-binned one), and I feel like
                | 64GB or 96GB is within the realm of enabling LLMs that
                | run reasonably fast on it (it is also a laptop, so I
                | can do things on planes or trips). I thought about the
                | Ultra: with 128GB on a top-line M3 Ultra, the models
                | that you could fit into memory would run fairly fast. For
               | 512GB, you could run the bigger models, but not very
               | quickly, so maybe not much point (at least for my use
               | cases).
        
               | matt-p wrote:
               | That config would also use about 10x the power, and you
               | still wouldn't be able to run a model over 32GB whereas
               | the studio can easily cope with 70B llama and plenty of
               | space to grow.
               | 
                | I think it actually is perfect for local inference in a
                | way that that build, or any other PC build in this
                | price range, wouldn't be.
        
               | kllrnohj wrote:
               | The M3 Ultra studio also wouldn't be able to run path
               | traced Cyberpunk at all no matter how much RAM it has.
               | Workloads other than local inference LLMs exist, you know
               | :) After all, if the only thing this was built to do was
               | run LLMs then they wouldn't have bothered adding so many
               | CPU cores or video engines. CPU cores (along with
               | networking) being 2 of the specs highlighted by the
               | person I was responding to, so they were obviously
               | valuing more than _just_ LLM use cases.
        
               | kridsdale1 wrote:
               | The core customer market for this thing remains Video
               | Editors. That's why they talk about simultaneous 8K
               | encoding streams.
               | 
               | Apple's Pro segment has been video editors since the 90s.
        
           | AnthonBerg wrote:
           | That's not going to yield the same bandwidth or memory
           | latency though, right?
        
             | rbanffy wrote:
             | You'd need a chip with 8 memory channels. 16 DIMM slots,
             | IIRC.
        
         | amelius wrote:
         | Why does it matter if you can run the LLM locally, if you're
         | still running it on someone else's locked down computing
         | platform?
        
           | PeterStuer wrote:
           | Running locally, your data is not sent outside of your
           | security perimeter off to a remote data center.
           | 
           | If you are going to argue that the OS or even below that the
           | hardware could be compromised to still enable exfiltration,
           | that is true, but it is a whole different ballgame from using
           | an external SaaS no matter what the service guarantees.
        
         | tempest_ wrote:
         | Nvidia has had the Grace Hoppers for a while now. Is this not
         | like that?
        
           | ykl wrote:
           | This is cheap compared to GB200, which has a street price of
           | >$70k for just the chip alone if you can even get one. Also
           | GB200 technically has only 192GB per GPU and access to more
           | than that happens over NVLink/RDMA, whereas here it's just
           | one big flat pool of unified memory without any tiered access
           | topology.
        
             | rbanffy wrote:
             | We finally encountered the situation where an Apple
             | computer is cheaper than its competition ;-)
             | 
             | All joking aside, I don't think Apples are that expensive
             | compared to similar high-end gear. I don't think there is
             | any other compact desktop computer with half a terabyte of
             | RAM accessible to the GPU.
        
               | kridsdale1 wrote:
               | And yet all that cash still just goes to TSMC
        
         | TheRealPomax wrote:
         | I think the other big thing is that the base model finally
         | starts at a normal amount of memory for a production machine.
         | You can't get less than 96GB. Although an extra $4000 for the
         | 512GB model seems Tim Apple levels of ridiculous. There is
         | absolutely no way that the difference costs anywhere near that
         | much at the fab.
         | 
         | And the storage solution still makes no sense of course, a
         | machine like this should start at 4TB for $0 extra, 8TB for
         | $500 more, and 16TB for $1000 more. Not start at a useless 1TB,
         | with the 8TB version costing an extra $2400 and 16TB a truly
         | idiotic $4600. If Sabrent can make and sell 8TB m.2 NVMe drives
         | for $1000, SoC storage should set you back half that, not over
         | double that.
        
           | jjtheblunt wrote:
            | > There is absolutely no way that the difference costs
            | anywhere near that much at the fab.
           | 
           | price premium probably, but chip lithography errors (thus,
           | yields) at the huge memory density might be partially driving
           | up the cost for huge memory.
        
             | TheRealPomax wrote:
             | It's Apple, price premium is a given.
        
         | PeterStuer wrote:
         | Is this on chip memory? From the 800GB/s I would guess more
         | likely a 512bit bus (8 channel) to DDR5 modules. Doing it on a
         | quad channel would _just_ about be possible, but really be
         | pushing the envelope. Still a nice thing.
         | 
         | As for practicality, which mainstream applications would
         | benefit from this much memory paired with nice but relatively
         | mid compute? At this price-point ($14K for a fully specced
         | system), would you prefer it over e.g. a couple of NVIDIA
         | Project DIGITS (assuming that arrives on time and for around
         | the announced $3K price-point)?
        
           | zitterbewegung wrote:
           | NVIDIA project DIGITS has 128 GB LPDDR5x coherent unified
           | system memory at a 273 Gb/s memory bus speed.
        
             | bangaladore wrote:
             | It would be 273 GB/s (gigabytes, not gigabits). But in
             | reality we don't know the bandwidth. Some ex employee said
             | 500 GB/s.
             | 
              | Your source is a Reddit post in which they try to match
              | the size to existing chips, without realizing that it's
              | very likely that NVIDIA is using custom memory here
              | produced by Micron, like Apple uses custom memory chips.
        
         | samstave wrote:
         | "unified memory"
         | 
         | funny that people think this is so new, when CRAY had Global
         | Heap eons ago...
        
           | ddtaylor wrote:
           | Why did it take so long for us to get here?
        
             | baby_souffle wrote:
             | Just a guess, but fabricating this can't be easy. Yield is
             | probably higher if you have less memory per chip.
        
             | RachelF wrote:
             | Some possible groups of reasons: 1. Until recently RAM
             | amount was something the end user liked to configure, so
             | little market demand. 2. Technically, building such a large
             | system on a chip or collection of chiplets was not
             | possible. 3. RAM speed wasn't a bottleneck for most tasks,
             | it was IO or CPU. LLMs changed this.
        
               | hot_gril wrote:
               | M1 came out before the LLM rush, though
        
           | webworker wrote:
           | The real hardware needed for artificial intelligence wasn't
           | NVIDIA, it was a CRAY XMP from 1982 all along
        
           | hot_gril wrote:
           | It's new for mainstream PCs to have it.
        
       | daft_pink wrote:
       | Really? M4 Max or M3 Ultra instead of M4 Ultra?
        
         | aurareturn wrote:
         | With an M3 Ultra going into the Mac Studio, Apple could
         | differentiate from the Mac Pro, which could then get the M4
         | Ultra. Right now, the Mac Studio and Mac Pro oddly both have
         | the M2 Ultra and same overall performance.
         | 
         | https://x.com/markgurman/status/1896972586069942738
        
       | cynicalpeace wrote:
       | Can someone explain what it would take for Apple to overtake
       | NVIDIA as the preferred solution for AI shops?
       | 
       | This is my understanding (probably incorrect in some places)
       | 
       | 1. NVIDIA's big advantage is that they design the hardware
       | (chips) _and_ software (CUDA). But Apple also designs the
       | hardware (chips) _and_ software (Metal and MacOS).
       | 
       | 2. CUDA has native support by AI libraries like PyTorch and
       | Tensorflow, so works extra well during training and inference. It
       | seems Metal is well supported by PyTorch (a quick check is
       | sketched at the end of this comment), but not well supported by
       | TensorFlow.
       | 
       | 3. NVIDIA uses Linux rather than MacOS, making it easier in
       | general to rack servers.
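       | 
       | On point 2, a quick sanity check of Metal-backed PyTorch (my
       | sketch; assumes a recent torch build with the MPS backend):
       | 
       |     import torch
       | 
       |     has_mps = torch.backends.mps.is_available()
       |     dev = "mps" if has_mps else "cpu"
       |     x = torch.randn(4096, 4096, device=dev)
       |     y = x @ x          # runs on the Apple GPU via Metal/MPS
       |     print(dev, y.shape)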
        
         | bigyabai wrote:
         | It's still boiling down to hardware and software differences.
         | 
         | In terms of hardware - Apple designs their GPUs for graphics
         | workloads, whereas Nvidia has a decades-old lead on optimizing
         | for general-purpose compute. They've gotten really good at
         | pipelining and keeping their raster performance competitive
         | while also accelerating AI and ML. Meanwhile, Apple is
         | directing most of their performance to just the raster stuff.
         | They _could_ pivot to an Nvidia-style design, but that would be
         | pretty unprecedented (even if a seemingly correct decision).
         | 
         | And then there's CUDA. It's not really appropriate to compare
         | it to Metal, both in feature scope and ease of use. CUDA has
         | expansive support for AI/ML primitives and deeply integrated
         | tensor/SM compute. Metal _does_ boast some compute features,
         | but you're expected to write most of the support yourself in
         | the form of compute shaders. This is a pretty radical departure
         | from the pre-rolled, almost "cargo cult" CUDA mentality.
         | 
         | The Linux shtick matters a tiny bit, but it's mostly a matter
         | of convenience. If Apple hardware started getting competitive,
         | there would be people considering the hardware regardless of
         | the OS it runs.
        
           | cynicalpeace wrote:
           | > keeping their raster performance competitive while also
           | accelerating AI and ML. Meanwhile, Apple is directing most of
           | their performance to just the raster stuff. They could pivot
           | to an Nvidia-style design, but that would be pretty
           | unprecedented (even if a seemingly correct decision).
           | 
           | Isn't Apple also focusing on the AI stuff? How has it not
           | already made that decision? What would prevent Apple from
           | making that decision?
           | 
           | > Metal does boast some compute features, but you're expected
           | to write most of the support yourself in the form of compute
           | shaders. This is a pretty radical departure from the pre-
           | rolled, almost "cargo cult" CUDA mentality.
           | 
           | Can you give an example of where Metal wants you to write
           | something yourself whereas CUDA is pre-rolled?
        
             | bigyabai wrote:
             | > Isn't Apple also focusing on the AI stuff?
             | 
             | Yes, but not with their GPU architecture. Apple's big bet
             | was on low-power NPU hardware, assuming the compute cost of
             | inference would go down as the field progressed. This was
             | the wrong bet - LLMs and other AIs have scaled _up_ better
             | than they scaled down.
             | 
             | > How has it not already made that decision? What would
             | prevent Apple from making that decision?
             | 
             | I mean, for one, Apple is famously stubborn. They're the
             | last ones to admit they're wrong whenever they make a
              | mistake, and presumably admitting that the NPU is wasted
              | silicon would be a mea culpa for their AI stance. It's also
             | easier to wait for a new generation of Apple Silicon to
             | overhaul the architecture, rather than driving a
             | generational split as soon as the problem is identified.
             | 
              | As for what's _preventing_ them, I don't think there's
             | anything insurmountable. But logically it might not make
             | sense to adopt Nvidia's strategy even if it's better. Apple
              | can't necessarily block Nvidia from buying the same nodes
             | they get from TSMC, so they'd have to out-design Nvidia if
             | they wanted to compete on their merits. Even then, since
             | Apple doesn't support OpenCL it's not guaranteed that they
             | would replace CUDA. It would just be another proprietary
             | runtime for vendors to choose from.
             | 
             | > Can you give an example of where Metal wants you to write
             | something yourself whereas CUDA is pre-rolled?
             | 
             | Not exhaustively, no. Some of them are performance-
              | optimized kernels like cuSPARSE, some others are primitive
             | sets like cuDNN, others yet are graph and signal processing
             | libraries with built-out support for industrial
             | applications.
             | 
             | To Apple's credit, they've definitely started hardware-
             | accelerating the important stuff like FFT and ray tracing.
             | But Nvidia still has a decade of lead time that Apple spent
             | shopping around with AMD for other solutions. The head-
             | start CUDA has is so great that I don't think Apple can
             | seriously respond unless the executives light a fire under
             | their ass to make some changes. It will be an "immovable
             | rock versus an unstoppable force" decision for Apple's
             | board of directors.
        
       | fintechie wrote:
       | IMO this is a bigger blow to the AI big boys than Deepseek's
       | release. This is massive for local inference. Exciting times
       | ahead for open source AI.
        
         | whimsicalism wrote:
         | it is absolutely not
        
         | kcb wrote:
         | The market for local inference and $10k+ Macs is not nearly
         | significant enough to affect the big boys.
        
         | bigyabai wrote:
         | I don't think you understand what the "AI big boys" are in the
         | market for.
        
       | rjeli wrote:
       | Wow, incredible. I told myself I'd stop waffling and just buy the
       | next 800gb/s mini or studio to come out, so I guess I'm getting
       | this.
       | 
       | Not sure how much storage to get. I was floating the idea of
       | getting less storage, and hooking it up to a TB5 NAS array of
       | 2.5" SSDs, 10-20tb for models + datasets + my media library would
       | be nice. Any recommendations for the best enclosure for that?
        
         | kridsdale1 wrote:
         | It depends on your bandwidth needs.
         | 
         | I also want to build the thing you want. There are no multi SSD
         | M2 TB5 bays. I made one that holds 4 drives (16TB) at TB3 and
         | even there the underlying drives are far faster than the cable.
         | 
         | My stuff is in OWC Express 4M2.
        
           | perfmode wrote:
           | Are you running RAID?
        
       | Sharlin wrote:
       | > it can be configured up to 512GB, or over half a terabyte.
       | 
       | Hah, I see what they did there.
        
         | kridsdale1 wrote:
         | If they added 1 byte, it counts.
        
       | aurareturn wrote:
       | You can run the full Deepseek 671b q4 model at 40 tokens/s. 37B
       | active params at a time because R1 is MoE.
        
         | KingOfCoders wrote:
         | In another of your comments it was "by my calculation". Now
         | it's just fact?
        
       | screye wrote:
       | How does the 500gb vram compare with 8xA100s ? ($15/hr rentals)
       | 
       | If it is equivalent, then the machine pays for itself in 300
       | hours. That's incredible value.
        
         | awestroke wrote:
         | A100 has 10x or so higher mem bandwidth
        
           | egorfine wrote:
            | Per Nvidia [1], the A100 has memory bandwidth up to 2,039
            | GB/s. So not 10x, more like 2x.
           | 
           | [1] https://www.nvidia.com/content/dam/en-zz/Solutions/Data-
           | Cent...
        
       | ummonk wrote:
       | Is the Mac Pro dead, or are they waiting for an M4 Ultra to
       | refresh it?
        
       | ozten wrote:
       | We've come a long way since beowulf clusters of smart toasters.
        
       | perfmode wrote:
       | 32 core, 512GB RAM, 8TB SSD
       | 
       | please take my money now
        
       | raydev wrote:
       | I know it's basically nitpicking competing luxury sports cars at
       | this point, but I am very bothered that existing benchmarks for
       | the M3 show single core perf that is approximately 70% of M4
       | single core perf.
       | 
       | I feel like I should be able to spend all my money to both get
       | the fastest single core performance AND all the cores and
       | available memory, but Apple has decided that we need to downgrade
       | to "go wide". Annoying.
        
         | xp84 wrote:
         | > both get the fastest single core performance AND all the
         | cores
         | 
         | I'm a major Apple skeptic myself, but hasn't there always been
         | a tradeoff between "fastest single core" vs "lots of cores"
         | (and thus best multicore)?
         | 
         | For instance, I remember when you could buy an iMac with an i9
         | or whatever, with a higher clock speed and faster single core,
         | or you could buy an iMac Pro with a Xeon with more cores, but
         | the iMac (non-Pro) would beat it in a single core benchmark.
         | Note: Though I used Macs as the example due to the simple
         | product lines, I thought this was pretty much universal among
         | all modern computers.
        
           | raydev wrote:
           | > hasn't there always been a tradeoff between "fastest single
           | core" vs "lots of cores" (and thus best multicore)?
           | 
           | Not in the Apple Silicon line. The M2 Ultra has the same
           | single core performance as the M2 Max and Pro. No benchmarks
           | for the M3 Ultra yet but I'm guessing the same vs M3 Max and
           | Pro.
        
             | xp84 wrote:
             | Okay, good to know. Interesting change then.
        
               | LPisGood wrote:
               | I think the traditional reason for this is that other
               | chips like to use complex scheduling logic to have more
               | logical cores than physical cores. This costs single
               | threaded speed but allows you to run more threads faster.
        
       | 1attice wrote:
       | Now with Ultra-class backdoors?
       | https://news.ycombinator.com/item?id=43003230
        
       | ein0p wrote:
       | That's all nice, but if they are to be considered a serious AI
       | hardware player, they will need to invest in better support of
       | their hardware in deep learning frameworks such as PyTorch and
       | Jax. Currently the support is rather poor, and is not suitable
       | for any serious work.
        
       | divan wrote:
       | Model with 512GB VRAM costs $9500, if anyone wonders.
        
       | m3kw9 wrote:
       | Instantly HIPAA-compliant high-end models running locally.
        
       | mlboss wrote:
       | $14K with 512GB memory and 16TB storage
        
         | maverwa wrote:
         | I cannot believe I'm saying this, but: for Apple that's rather
         | cheap. Threadripper boxes with that amount of memory do not
         | come a lot cheaper. Considering Apple's pricing when it
         | comes to memory in other devices, $4K for the 96GB to 512GB
         | upgrade is a bargain.
        
           | jltsiren wrote:
            | It's not that much cheaper than with earlier comparable
           | models. Apple memory prices have been $25/GB for the base and
           | Pro chips and $12.5/GB for the Max and Ultra chips. With the
           | new Studios, we get $12.5/GB until 128 GB and $9.375/GB
           | beyond that.
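            | 
            | That tier structure reproduces the configurator price
            | exactly (my arithmetic):
            | 
            |     upgrade = (128 - 96) * 12.5 + (512 - 128) * 9.375
            |     print(upgrade)  # 400 + 3600 = 4000: the $4,000 option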
           | 
           | If you configure a Threadripper workstation at Puget Systems,
           | memory price seems to be ~$6/GB. Except if you use 128 GB
           | modules, which are almost $10/GB. You can get 768 GB for a
           | Threadripper Pro cheaper than 512 GB for a Threadripper, but
           | the base cost of a Pro system is much higher.
        
       | alok-g wrote:
       | Two questions for the fellow HNers:
       | 
        | 1. What are the various average-joe (as opposed to researcher,
        | etc.) use cases for running powerful AI models locally vs. just
        | using cloud AI? Privacy is of course a benefit, but it by itself
        | may not justify upgrades for an average user. Or are we expecting
       | that new innovation will lead to much more proliferation of AI
       | and use cases that will make running locally more feasible?
       | 
        | 2. With the amount of memory used jumping up, would there be
        | significant growth for companies making memory? If so, which
        | ones would be best positioned?
       | 
       | Thanks.
        
         | christiangenco wrote:
         | IMO it's all about privacy. Perhaps also availability if the
         | main LLM providers start pulling shenanigans but it seems like
         | that's not going to be a huge problem with how many big players
         | are in the space.
         | 
         | I think a great use case for this would be in a company that
         | doesn't want all of their employees sending LLM queries about
         | what they're working on outside the company. Buy one or two of
         | these and give everybody a client to connect to it and hey
         | presto you've got a secure private LLM everybody in the company
         | can use while keeping data private.
        
           | chamomeal wrote:
            | I'll add to this that while I couldn't care less about
            | OpenAI seeing my general coding questions, I wouldn't run actual
           | important data through ChatGPT.
           | 
           | With a local model, I could toss anything in there. Database
           | query outputs, private keys, stuff like that. This'll
           | probably become more relevant as we give LLM's broader use
           | over certain systems.
           | 
           | Like right now I still mostly just type or paste stuff into
           | ChatGPT. But what about when I have a little database copilot
           | that needs to read query results, and maybe even run its own
           | subset of queries like schema checks? Or some open source
           | computer-use type thingy needs to click around in all sorts
           | of places I don't want openAI going, like my .env or my bash
           | profile? That's the kinda thing I'd only use a local model
           | for
        
           | user3939382 wrote:
           | Hopefully homomorphic encryption can solve this rather than
           | building a new hardware layer everywhere.
        
         | piotrpdev wrote:
         | 1. Lower latency for real time tasks e.g. transcription +
         | translation?
        
         | archagon wrote:
         | I don't currently use AIs, but if I did, they would be local.
         | Simply put: I can't build my professional career around tools
         | that I do not own.
        
           | alok-g wrote:
           | >> ... around tools that I do not own.
           | 
            | That just may depend on how much trust you have in the
            | providers you use. Or do you generate your own
            | electricity?
        
             | archagon wrote:
             | That's quite a reductio ad absurdum. No, I don't generate
             | my own electricity (though I could). But I don't use tools
             | for work that can change out from under me at any moment,
             | or that can increase 10x in price on a corporate whim.
        
               | hobofan wrote:
               | And why would that require running AI models locally? You
               | can be in essentially full control by using open source
                | (/open weight) models (DeepSeek etc.) running on
                | interchangeable cloud providers that are as replaceable
                | as your electricity provider.
        
               | archagon wrote:
               | Sure, I guess you can do that as long as you use an open
               | weight model. (Offline support is a nice perk, however.)
        
               | alok-g wrote:
               | We align.
               | 
               | I tend to do the same thing. I do not consider myself as
               | a good representative of an average user though.
        
         | zamalek wrote:
         | I don't think there's a huge use-case locally, if you're happy
         | with the subscription cost and privacy. That is, yet. Give it
         | maybe 2 years and someone will probably invent something that
         | local inference would seriously benefit from. I'm anticipating
         | inference for home appliances (something Mac mini form
         | factor that plugs into your router) _but_ that's based on what
         | would make logical sense for consumers, not what consumers
         | would fall for.
         | 
         | Apple seems to be using LPDDR, but HBM will also likely be a
         | key tech. SK Hynix and Samsung are the most reputable for both.
        
           | alok-g wrote:
           | Thanks.
           | 
           | >> Apple seems to be using LPDDR, but HBM will also likely be
           | a key tech. SK Hynix and Samsung are the most reputable for
           | both.
           | 
           | So not much Micron? Any US based stocks to invest in? :-)
        
             | zamalek wrote:
             | I forgot about Micron, absolutely. TSMC is the supplier for
             | all of these, so you're covering both memory and compute if
             | that's your strategy (the risk is that US TSMC is over
             | provisioning manufacturing based on the pandemic hardware
             | boom).
        
               | alok-g wrote:
               | Thanks!
        
         | theshrike79 wrote:
         | For 1: censorship
         | 
         | A local model will do anything you ask it to, as far as it
         | "knows" about it. It doesn't need to please investors or be
         | afraid of bad press.
         | 
         | LM Studio + a group of select models from huggingface and you
         | can do whatever you want.
         | 
         | For generic coding assistance and knowledge, online services
         | are still better quality.
        
         | JadedBlueEyes wrote:
         | One important one that I haven't seen mentioned is simply
         | working without an internet connection. It was quite important
         | for me when I was using AI whilst travelling through the
         | countryside, where there is very limited network access.
        
       | epolanski wrote:
       | Can anybody ELI5 why there aren't multi-GPU builds to run LLMs
       | locally?
       | 
       | It feels like one should be able to build a good machine for $3-4k,
       | if not less, with six 16GB mid-level gaming GPUs.
        
         | snitty wrote:
         | Reddit's LocalLLama has a lot of these. 3090s are pretty
         | popular for these purposes. But they're not trivial to build
         | and run at home. Among other issues are that you're drawing
         | >1kW for just the GPUs if you have four of them at 100% usage.
        
         | risho wrote:
         | 6 * 16 is still nowhere near 512GB of VRAM. On top of that, the
         | monster that you create requires hyper-specific server-grade
         | hardware, will be huge and loud, and will pull down enough power
         | to trip a circuit breaker. I'm sure most people would rather pay
         | a 30 percent premium to get twice the RAM and have a
         | power-sipping device that you can hold in the palm of your hand.
        
       | crowcroft wrote:
       | Kinda curious to see how many tok/sec it can crush. Could be a fun
       | way to host AI apps.
        
       | joshhart wrote:
       | This is pretty exciting. Now an organization could produce an
       | open weights mixture of experts model that has 8-15b active
       | parameters but could still be 500b+ parameters and it could be
       | run locally with INT4 quantization with very fast performance.
       | DeepSeek R1 is a similar model but over 30b active parameters
       | which makes it a little slow.
       | 
       | I do not have a good sense of how well quality scales with narrow
       | MoEs but even if we get something like Llama 3.3 70b in quality
       | at only 8b active parameters people could do a ton locally.
        
       | wewewedxfgdf wrote:
       | Computers these days - the more appealing, exciting, cool and
       | desirable, the higher the price climbs, into the stratosphere.
       | 
       | $9499
       | 
       | Whatever happened to competition in computing?
       | 
       | Computing hardware competition used to be cut throat, drop dead,
       | knife fight, last man standing brutally competitive. Now it's
       | just a massive gold rush cash grab.
        
         | hu3 wrote:
         | It doesn't even run Linux properly.
         | 
         | Could cost half of that and it would still be uninteresting for
         | my use cases.
         | 
         | For AI, on-demand cloud processing is orders of magnitude
         | better in speed and software compatibility anyway.
        
         | niek_pas wrote:
         | The Macintosh Plus, released in 1986, cost $2,600 at the time,
         | or $7460 adjusted for inflation.
        
           | bigyabai wrote:
           | It even came with an official display! Nowadays that's a
           | $1,600-$6,000 accessory, depending on whether you own a VESA
           | mount.
        
       | martin_a wrote:
       | > Apple today announced M3 Ultra, the highest-performing chip it
       | has ever created
       | 
       | Well, duh, it would be a shame if you took a step backwards,
       | wouldn't it? I hate that stupid phrase...
        
       | wewewedxfgdf wrote:
       | The good news is that AMD and Intel are both in good positions to
       | develop similar products.
        
       | dangoodmanUT wrote:
       | 800GB/s and 512GB of unified RAM is going to go stupid for LLMs
        
       | minton wrote:
       | + $4,000 to bump to 512GB from 96GB.
        
       | ldng wrote:
       | Well, a shame for Apple: a lot of the rest of the world is going
       | to boycott American products after such a level of treachery.
        
       | tap-snap-or-nap wrote:
       | All this hardware, but I don't know how best to utilize it because
       | 1) I am not a pro, and 2) the apps are not as helpful at making
       | complex jobs easier, which is what old Apple used to do
       | really well.
        
       | narrator wrote:
       | Not to rain on the Apple parade, but cloud video editing with the
       | models running on H100s that can edit videos based on prompts is
       | going to be vastly more productive than anything running locally.
       | This will be useful for local development with the big Deepseek
       | models though. Not sure if it's worth the investment unless
       | Deepseek is close to the capability of cloud models, or privacy
       | concerns overwhelm everything.
        
       | gigatexal wrote:
       | 8TB, 512GB RAM, M3 Ultra: $15k+ USD. Wow.
        
       ___________________________________________________________________
       (page generated 2025-03-05 23:00 UTC)