codevoid.de

        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
 (HTM) Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
 (HTM)   macOS 26.2 enables fast AI clusters with RDMA over Thunderbolt
       
       
        TheRealPomax wrote 1 day ago:
        Is this... good? Why is this something that the underlying OS itself
        should be involved in at all?
       
          wmf wrote 23 hours 49 min ago:
          Networking is part of the OS's job.
       
        jamesfmilne wrote 1 day ago:
        Anyone found any APIs related to this?
        
        I'd have some other uses for RDMA between Macs.
       
          jamesfmilne wrote 21 hours 37 min ago:
          I found some useful clues here. Looks like it uses the regular
          InfiniBand RDMA APIs.
          
 (HTM)    [1]: https://github.com/Anemll/mlx-rdma/commit/a901dbd3f9eeefc628...
       
        DesiLurker wrote 1 day ago:
        Does this mean an eGPU might finally work with a MacBook Pro or Studio?
       
          wmf wrote 23 hours 45 min ago:
          No.
       
        irusensei wrote 1 day ago:
        I am waiting for the M5 Studio, but given current hardware prices I'm
        not sure it will be at a level that I would call affordable. For now
        I'm watching the news, and if there is any announcement that prices
        will go up I'll probably settle for an M4 Max.
       
        zeristor wrote 1 day ago:
        Will Apple be able to ramp up M3 Ultra MacStudios if this becomes a big
        thing?
        
        Is this part of Apple's plan of building out server side AI support
        using their own hardware?
        
        If so they would need more physical data centres.
        
        I'm guessing they too would be constrained by RAM.
       
        nottorp wrote 1 day ago:
        It's good to sell shovels :)
       
        pjmlp wrote 1 day ago:
        Maybe Apple should rethink bringing back Mac Pro desktops with
        pluggable GPUs, like that one in the corner still playing with its
        Intel and AMD toys, instead of a big box full of air and pro audio
        cards only.
       
        nickysielicki wrote 1 day ago:
        This is such a weird project. Like where is this running at scale?
        Where's the realistic plan to ever run this at scale? What's the
        end goal here?
        
        Don't get me wrong... It's super cool, but I fail to understand why
        money is being spent on this.
       
          aurareturn wrote 1 day ago:
          The end goal is that Macs become good local LLM inference machines
          and for AI devs to keep using Macs.
       
            nickysielicki wrote 1 day ago:
            The former will never happen and the latter is a certainty.
       
              aurareturn wrote 1 day ago:
              The former is already true and will become even more true when M5
              Pro/Max/Ultra release.
       
        FridgeSeal wrote 1 day ago:
        That's great for AI people, but can we use this for other distributed
        workloads that aren't ML?
       
          dagmx wrote 1 day ago:
          Sure, there's nothing about it that's tied to ML. It's a faster
          interconnect; use it for many kinds of shared compute scenarios.
       
          geerlingguy wrote 1 day ago:
          I've been testing HPL and mpirun a little, not yet with this new RDMA
          capability (it seems like Ring is currently the supported method)...
          but it was a little rough around the edges.
          
          See:
          
 (HTM)    [1]: https://ml-explore.github.io/mlx/build/html/usage/distribute...
       
        kjkjadksj wrote 1 day ago:
        Remember when they enabled egpu over thunderbolt and no one cared
        because the thunderbolt housing cost almost as much as your macbook
        outright? Yeah. Thunderbolt is a racket. It's a god damned cord. Why
        is it $50?
       
          wmf wrote 1 day ago:
          In this case Thunderbolt is much much cheaper than 100G Ethernet.
          
          (The cord is $50 because it contains two active chips BTW.)
       
            geerlingguy wrote 1 day ago:
            Yeah, even decent 40 Gbps QSFP+ DAC cables are usually $30+, and
            those don't have active electronics in them like Thunderbolt does.
            
            The ability to also deliver 240W (IIRC?) over the same cable is
            also a bit different here, it's more like FireWire than a standard
            networking cable.
       
        0manrho wrote 1 day ago:
        Just for reference:
        
        Thunderbolt5's stated "80Gbps" bandwidth comes with some caveats.
        That's the figure for either Display Port bandwidth itself or in
        practice more often realized by combining the data channel (PCIe4x4
        ~=64Gbps) with the display channels (<=80Gbps if used in concert with
        data channels), and potentially it can also do unidirectional 120Gbps
        of data for some display output scenarios.
        
        If Apple's silicon follows spec, then that means you're most likely
        limited to PCIe4x4 ~=64Gbps bandwidth per TB port, with a slight
        latency hit due to the controller. That Latency hit is ItDepends(TM),
        but if not using any other IO on that controller/cable (such as display
        port), it's likely to be less than 15% overhead vs Native on average,
        but depending on drivers, firmware, configuration, usecase, cable
        length, and how Apple implemented TB5, etc, exact figures vary. And
        just like how 60FPS Average doesn't mean every frame is exactly 1/60th
        of a second long, it's entirely possible that individual packets or
        niche scenarios could see significantly more latency/overhead.
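
        For reference, the ~=64Gbps figure falls out of the PCIe 4.0 signalling
        rate. A minimal sketch (the table and function names are made up for
        illustration, and it ignores packet/protocol overhead, so real
        throughput is lower):

```python
# Raw PCIe link rate from per-lane transfer rate, lane count and line encoding.
# PCIe 3.0 and later use 128b/130b encoding.
PCIE_GENERATIONS = {
    # generation: (giga-transfers per second per lane, encoding efficiency)
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def pcie_gbps(generation: int, lanes: int) -> float:
    """Line rate in Gbit/s after encoding, before TLP/DLLP protocol overhead."""
    rate_gt, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt * lanes * efficiency


if __name__ == "__main__":
    rate = pcie_gbps(4, 4)
    print(f"PCIe 4.0 x4: {rate:.1f} Gbit/s ({rate / 8:.2f} GB/s)")
```

        The nominal 16 GT/s x 4 lanes is where "64" comes from; encoding
        alone already takes it slightly below that.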
        
        As a point of reference Nvidia RTX Pro (formerly known as quadro)
        workstation cards of Ada generation and older along with most modern
        consumer graphics cards are PCIe4 (or less, depending on how old we're
        talking), and the new RTX Pro Blackwell cards are PCIe5. Though
        comparing a Mac Studio M4 Max for example to an Nvidia GPU is akin to
        comparing Apples to Green Oranges.
        
        However, I mention the GPU's not just to recognize the 800lb AI compute
        gorilla in the room, but also that while it's possible to pool a pair
        of 24GB VRAM GPU's to achieve a 48GB VRAM pool between them (be it
        through a shared PCIe bus or over NVlink), the performance does not
        scale linearly due to PCIe/NVLinks limitations, to say nothing of the
        software, and configuration and optimization side of things also being
        a challenge to realizing max throughput in practice.
        
        The same is true of a pair of TB5-equipped Macs with 128GB of memory
        each: using TB5 to achieve a 256GB pool will take a substantial
        performance hit compared to an otherwise equivalent Mac with 256GB.
        (capacities chosen are arbitrary to illustrate the point). The exact
        penalty really depends on usecase and how sensitive it is to the
        latency overhead of using TB5 as well as the bandwidth limitation.
        
        It's also worth noting that it's entirely possible with RDMA
        solutions (no matter the specifics) to see worse performance than using
        a single machine if you haven't properly optimized and configured
        things. This is not hating on the technology, but a warning from
        experience for people who may have never dabbled: don't expect things
        to just "2x", or even do better than 1x, simply by stringing a cable
        between two devices.
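
        A toy model of why two boxes can end up slower than one. Every number
        below is an invented placeholder, not a measurement of TB5 or of any
        Mac, and the model assumes compute splits perfectly while communication
        is not overlapped with compute:

```python
def tokens_per_second(compute_s, nodes=1, syncs_per_token=0,
                      latency_s=0.0, bytes_per_sync=0, link_bytes_per_s=1.0):
    """Tokens/s when compute splits evenly and communication is not overlapped."""
    if nodes == 1:
        return 1.0 / compute_s
    per_sync_s = latency_s + bytes_per_sync / link_bytes_per_s
    return 1.0 / (compute_s / nodes + syncs_per_token * per_sync_s)


# Hypothetical workload: 50 ms of compute per token on one machine,
# 120 synchronisation points per token, 16 KiB exchanged at each one,
# over a link that moves 8 GB/s.
COMMON = dict(compute_s=0.050, syncs_per_token=120,
              bytes_per_sync=16 * 1024, link_bytes_per_s=8e9)

single = tokens_per_second(0.050)
tuned = tokens_per_second(nodes=2, latency_s=10e-6, **COMMON)     # 10 us per sync
untuned = tokens_per_second(nodes=2, latency_s=300e-6, **COMMON)  # 300 us per sync

if __name__ == "__main__":
    for label, tps in [("single", single), ("tuned", tuned), ("untuned", untuned)]:
        print(f"{label:8s} {tps:5.1f} tok/s  ({tps / single:.2f}x)")
```

        With these made-up inputs the low-latency case lands a bit under 2x,
        while the high-latency case drops below the single machine. The link
        bandwidth is identical in both; only the per-sync latency changed.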
        
        All that said, glad to see this from Apple. Long overdue in my opinion
        as I doubt we'll see them implement an optical network port with
        anywhere near that bandwidth or RoCEv2 support, much less expose a
        native (not via TB) PCIe port on anything that's a non-pro model.
        
        EDIT: Note, many mac skus have multiple TB5 ports, but it's unclear to
        me what the underlying architecture/topology is there and thus can't
        speculate on what kind of overhead or total capacity any given device
        supports by attempting to use multiple TB links for more
        bandwidth/parallelism. If anyone's got an SoC diagram or similar
        reference data that actually tells us how the TB controller(s) are
        uplinked to the rest of the SoC, I could go in more depth there. I'm
        not an Apple silicon/MacOS expert. I do however have lots of experience
        with RDMA/RoCE/IB clusters, NVMeoF deployments, SXM/NVlink'd devices
        and generally engineering low latency/high performance network fabrics
        for distributed compute and storage (more on the
        infrastructure/hardware/ops side than on the software side) so this is
        my general wheelhouse, but Apple has been a relative blind spot for me
        due to their ecosystem generally lacking features/support for things
        like this.
       
        yalogin wrote 1 day ago:
        As someone who is not familiar with RDMA, does it mean I can connect
        multiple Macs and run inference? If so it's great!
       
          wmf wrote 1 day ago:
          You've been able to run inference on multiple Macs for around a year
          but now it's much faster.
       
        schmuckonwheels wrote 1 day ago:
        That's nice but
        
        Liquid (gl)ass still sucks.
       
        thatwasunusual wrote 1 day ago:
        Can someone do an ELI5, and why this is important?
       
          wmf wrote 1 day ago:
          It's faster and lower latency than standard Thunderbolt networking.
          Low latency makes AI clusters faster.
       
        sebnukem2 wrote 1 day ago:
        I didn't know they skipped 10 version numbers.
       
          badc0ffee wrote 1 day ago:
          They switched to using the year.
       
        int32_64 wrote 1 day ago:
        Apple should set up their own giant cloud of M chips with tons of VRAM,
        make Metal as good as possible for AI purposes, then market the cloud
        as allowing self-hosted models for companies and individuals that care
        about privacy. They would clean up in all kinds of sectors whose data
        can't touch the big LLM companies.
       
          make3 wrote 1 day ago:
          The advantages of having a single big memory per gpu are not as big
          in a data center where you can just shard things between machines and
          use the very fast interconnect, saturating the much faster compute
          cores of a non Apple GPU from Nvidia or AMD
       
          wmf wrote 1 day ago:
          That exists but it's only for iUsers running Apple models.
          
 (HTM)    [1]: https://security.apple.com/blog/private-cloud-compute/
       
        cluckindan wrote 1 day ago:
        This sounds like a plug-n-play physical attack vector.
       
          guiand wrote 1 day ago:
          For security, the feature requires setting a special option with the
          recovery mode command line:
          
          rdma_ctl enable
       
        londons_explore wrote 1 day ago:
        Nobody's gonna take them seriously till they make something rack
        mounted and that isn't made of titanium with pentalobe screws...
       
          moralestapia wrote 1 day ago:
          You might ignore this but, for a while, Mac Mini clusters were a
          thing and they were capex and opex effective. That same setup is kind
          of making a comeback.
       
            fennecbutt wrote 1 day ago:
            They were only a thing for CI/compilation related to Apple's OSes,
            because their walled garden rules out using other platforms.
            You're building an iPhone or mac app? Well your ci needs to be on a
            cluster of apple machines.
       
            londons_explore wrote 1 day ago:
            It's in a similar vein to the PS2 linux cluster or someone trying
            to use vape CPU's as web servers...
            
            It might be cost effective, but the supplier is still saying "you
            get no support, and in fact we might even put roadblocks in your
            way because you aren't the target customer".
       
              moralestapia wrote 1 day ago:
              True.
              
              I'm sure Apple could make a killing on the server side,
              unfortunately their income from their other products is so big
              that even if that's a 10B/year opportunity they'll be like "yawn,
              yeah, whatever".
       
                fennecbutt wrote 1 day ago:
                Doubt. A 10B idea is still a promotion. And if capitalism is
                shrinkflationing hard, which it is atm, then capitalists would
                not leave something like that on the table.
       
        piskov wrote 1 day ago:
        George Hotz got NVIDIA running on Macs with his tinygrad via USB4
        
 (HTM)  [1]: https://x.com/__tinygrad__/status/1980082660920918045
       
          throawayonthe wrote 1 day ago:
           [1] nvidia on a 2023 Mac Pro running linux :p
          
 (HTM)    [1]: https://social.treehouse.systems/@janne/115509948515319437
       
            piskov wrote 1 day ago:
            Geohotz stuff anyone can run today
       
        650REDHAIR wrote 1 day ago:
        Do we think TB4 is on the table or is there a technical limitation?
       
        stego-tech wrote 1 day ago:
        This doesn't remotely surprise me, and I can guess Apple's AI
        endgame:
        
        * They already cleared the first hurdle to adoption by shoving
        inference accelerators into their chip designs by default. It's why
        Apple is so far ahead of their peers in local device AI compute, and
        will be for some time.
        
        * I suspect this introduction isn't just for large clusters, but also
        a testing ground of sorts to see where the bottlenecks lie for
        distributed inference in practice.
        
        * Depending on the telemetry they get back from OSes using this
        feature, my suspicion is they'll deploy some form of distributed
        local AI inference system that leverages their devices tied to a given
        iCloud account or on the LAN to perform inference against larger
        models, but without bogging down any individual device (or at least the
        primary device in use)
        
        For the endgame, I'm picturing a dynamically sharded model across
        local devices that shifts how much of the model is loaded on any given
        device depending on utilization, essentially creating local-only
        inferencing for privacy and security of their end users. Throw the same
        engines into, say, HomePods or AppleTVs, or even a local AI box, and
        voila, you're golden.
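
        To make "shifts how much of the model is loaded on any given device"
        concrete, here is a minimal sketch that splits a model's layers in
        proportion to each device's free capacity. Everything in it is
        hypothetical (device names, capacity units, the idea of sharding by
        whole layers); it is not a description of anything Apple ships:

```python
def shard_layers(num_layers: int, free_capacity: dict[str, float]) -> dict[str, int]:
    """Split layers across devices in proportion to their free capacity.

    Uses the largest-remainder method so the counts always sum to num_layers.
    """
    total = sum(free_capacity.values())
    if total <= 0:
        raise ValueError("no device has free capacity")
    exact = {dev: num_layers * cap / total for dev, cap in free_capacity.items()}
    counts = {dev: int(share) for dev, share in exact.items()}
    leftover = num_layers - sum(counts.values())
    # Hand the remaining layers to the devices with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda dev: exact[dev] - counts[dev], reverse=True)
    for dev in by_remainder[:leftover]:
        counts[dev] += 1
    return counts


if __name__ == "__main__":
    # Free memory in GB per device (made-up values).
    print(shard_layers(32, {"mac": 48, "ipad": 8, "appletv": 4, "homepod": 4}))
```

        Re-running the split whenever utilization changes gives the
        "dynamic" part; moving the weights around is the expensive bit this
        sketch leaves out.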
        
        EDIT: If you're thinking, "but big models need the lower latency of
        Thunderbolt" or "you can't do that over Wi-Fi for such huge models",
        you're thinking too narrowly. Think about the devices Apple consumers
        own, their interconnectedness, and the underutilized but standardized
        hardware within them with predictable OSes.  Suddenly you're not
        jamming existing models onto substandard hardware or networks, but
        rethinking how to run models effectively over consumer distributed
        compute.  Different set of problems.
       
          wmf wrote 1 day ago:
          > inference accelerators ... It's why Apple is so far ahead of their
          peers in local device AI compute, and will be for some time.
          
          Not really. llama.cpp was just using the GPU when it took off.
          Apple's advantage is more VRAM capacity.
          
          > this introduction isn't just for large clusters
          
          It doesn't work for large clusters at all; it's limited to 6-7 Macs
          and most people will probably use just 2 Macs.
       
          fwip wrote 1 day ago:
          RDMA over Thunderbolt is so much faster (and lower latency) than
          Apple's system of mostly-wireless devices that I can't see how any
          learnings here would transfer.
       
            stego-tech wrote 1 day ago:
            You're thinking, "You can't put modern models on that sort of
            distributed compute network", which is technically correct.
            
            I was thinking, "How could we package or run these kinds of large
            models or workloads across a consumer's distributed compute?"  The
            Engineer in me got as far as "Enumerate devices on network via mDNS
            or Bonjour, compare keys against iCloud device keys or otherwise
            perform authentication, share utilization telemetry and permit
            workload scheduling/balance" before I realized that's probably what
            they're testing here to a degree, even if they're using RDMA.
       
          threecheese wrote 1 day ago:
          I think you are spot on, and this fits perfectly within my mental
          model of HomeKit; tasks are distributed to various devices within the
          network based on capabilities and authentication, and given a very
          fast bus Apple can scale the heck out of this.
       
            stego-tech wrote 1 day ago:
            Consumers generally have far more compute than they think; it's
            just all distributed across devices and hard to utilize effectively
            over unreliable interfaces (e.g. Wi-Fi).  If Apple (or anyone,
            really) could figure out a way to utilize that at modern scales, I
            wager privacy-conscious consumers would gladly trade some latency
            in responses in favor of superior overall model performance - heck,
            branding it as "deep thinking" might even pull more customers in
            via marketing alone ("thinks longer, for better results" or some
            vaguely-not-suable marketing slogan). It could even be made into an
            API for things like batch image or video rendering, but without the
            hassle of setting up an app-specific render farm.
            
            There's definitely something there, but Apple's really the only
            player set up to capitalize on it via their halo effect with devices
            and operating systems. Everyone else is too fragmented to make it
            happen.
       
        ComputerGuru wrote 1 day ago:
        Imagine if the Xserve was never killed off. Discontinued 14 years ago,
        now!
       
          icedchai wrote 1 day ago:
          If it was still around, it would probably still be stuck on M2, just
          like the Mac Pro.
       
        reilly3000 wrote 1 day ago:
        dang I wish I could share md tables.
        
        Here's a text edition:
        For $50k the inference hardware market forces a trade-off between
        capacity and throughput:
        
        * Apple M3 Ultra Cluster ($50k): Maximizes capacity (3TB). It is the
        only option in this price class capable of running 3T+ parameter models
        (e.g., Kimi k2), albeit at low speeds (~15 t/s).
        
        * NVIDIA RTX 6000 Workstation ($50k): Maximizes throughput (>80 t/s).
        It is superior for training and inference but is hard-capped at 384GB
        VRAM, restricting model size to <400B parameters.
        
        To achieve both high capacity (3TB) and high throughput (>100 t/s)
        requires a ~$270,000 NVIDIA GH200 cluster and data center
        infrastructure. The Apple cluster provides 87% of that capacity for 18%
        of the cost.
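
          The capacity side of that trade-off can be put in dollars per GB. A
          quick sketch using only the prices and capacities quoted above,
          treating 3TB as 3,000GB (an assumption). The 87% capacity figure
          implies a GH200 pool somewhat larger than 3TB, which isn't stated
          here, so the sketch doesn't try to reproduce it:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    cost_usd: float
    memory_gb: float

    @property
    def usd_per_gb(self) -> float:
        return self.cost_usd / self.memory_gb


OPTIONS = [
    Option("Apple M3 Ultra cluster", 50_000, 3_000),
    Option("NVIDIA RTX 6000 workstation", 50_000, 384),
    Option("NVIDIA GH200 cluster", 270_000, 3_000),
]

# Apple cluster cost as a fraction of the GH200 cluster cost.
COST_RATIO = OPTIONS[0].cost_usd / OPTIONS[2].cost_usd

if __name__ == "__main__":
    for option in OPTIONS:
        print(f"{option.name:28s} ${option.usd_per_gb:7.2f} per GB")
    print(f"Apple / GH200 cost: {COST_RATIO:.1%}")
```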
       
          dsrtslnd23 wrote 1 day ago:
          what about a GB300 workstation with 784GB unified mem?
       
            rbanffy wrote 23 hours 49 min ago:
            That thing will be extremely expensive I guess. And neither CPU nor
            GPU have that much memory. It's also not a great workstation either
            - macOS is a lot more comfortable to use.
       
            wmf wrote 23 hours 50 min ago:
            $95K
       
              rbanffy wrote 23 hours 46 min ago:
              I miss the time you could go to Apple's website and build the
              most obscene computer possible. With the M series, all options
              got a lot more limited. IIRC, an x86 Mac Pro with 1.5 TB of RAM,
              a big GPU and the two accelerators would yield an eye watering
              hardware bill.
              
              Now you need to add 8 $5K monitors to get something similarly
              ludicrous.
       
          yieldcrv wrote 1 day ago:
          15 t/s is way too slow for anything but chatting, call and response,
          and you don't need a 3T parameter model for that
          
          Wake me up when the situation improves
       
            rbanffy wrote 23 hours 44 min ago:
            Just wait for the M5-Ultra with a terabyte of RAM.
       
          3abiton wrote 1 day ago:
          What's the math on the $50k NVIDIA cluster? My understanding is that
          these things cost ~$8k and you can get at least 5 for $40k; that's
          around half a TB.
          
          That being said, for inference Macs still remain the best, and the M5
          Ultra will be an even better value with its better PP.
       
            reilly3000 wrote 1 day ago:
            * GPUs: 4x NVIDIA RTX 6000 Blackwell (96GB VRAM each)
              Cost: 4 x $9,000 = $36,000
            
            * CPU: AMD Ryzen Threadripper PRO 7995WX (96-Core)
              Cost: $10,000
            
            * Motherboard: WRX90 Chipset (supports 7x PCIe Gen5 slots)
              Cost: $1,200
            
            * RAM: 512GB DDR5 ECC Registered
              Cost: $2,000
            
            * Chassis & Power: Supermicro or specialized Workstation case +
              2x 1600W PSUs.
              Cost: $1,500
            
            * Total Cost: ~$50,700
            
            It's a bit maximalist, but if you had to spend $50k it's going
            to be about as fast as you can make it.
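
            The arithmetic, for anyone who wants to swap parts in and out.
            The prices are the ones listed above; the dictionary layout is
            just for illustration:

```python
# Component prices in USD, as quoted in the build list.
PARTS = {
    "GPUs (4x RTX 6000 Blackwell)": 4 * 9_000,
    "CPU (Threadripper PRO 7995WX)": 10_000,
    "Motherboard (WRX90)": 1_200,
    "RAM (512GB DDR5 ECC)": 2_000,
    "Chassis and 2x 1600W PSUs": 1_500,
}

TOTAL_USD = sum(PARTS.values())
TOTAL_VRAM_GB = 4 * 96  # four cards at 96GB each

if __name__ == "__main__":
    for name, price in PARTS.items():
        print(f"{name:32s} ${price:>7,}")
    print(f"{'Total':32s} ${TOTAL_USD:>7,}  ({TOTAL_VRAM_GB}GB VRAM)")
```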
       
              broretore wrote 22 hours 27 min ago:
              This is basically a tinybox pro?
       
          conradev wrote 1 day ago:
          Apple deploys LPDDR5X for the energy efficiency and cost (lower is
          better), whereas NVIDIA will always prefer GDDR and HBM for
          performance and cost (higher is better).
       
            _zoltan_ wrote 1 day ago:
            the GH/GB compute has LPDDR5X - a single or dual GPU shares 480GB,
            depending if it's GH or GB, in addition to the HBM memory, with
            NVLink C2C - it's not bad!
       
              wtallis wrote 1 day ago:
              Essentially, the Grace CPU is a memory and IO expander that
              happens to have a bunch of ARM CPU cores filling in the interior
              of the die, while the perimeter is all PHYs for LPDDR5 and NVLink
              and PCIe.
       
                rbanffy wrote 23 hours 55 min ago:
                > have a bunch of ARM CPU cores filling in the interior of the
                die
                
                The main OS needs to run somewhere. At least for now.
       
                  wtallis wrote 17 hours 3 min ago:
                  Sure, but 72x Neoverse V2 (approximately Cortex X3) is a
                  choice that seems more driven by convenience than by any real
                  need for an AI server to have tons of somewhat slow CPU
                  cores.
       
                    _zoltan_ wrote 15 hours 51 min ago:
                    there are use cases where those cores are used for aux
                    processing. there is more to these boxes than AI :-)
       
                _zoltan_ wrote 1 day ago:
                fully agree!
                
                with MGX and CX8 we see PCIe root moving to the NIC, which is
                very exciting.
       
          FuckButtons wrote 1 day ago:
          Are you factoring in the above comment about as yet un-implemented
          parallel speed up in there? For on prem inference without any kind of
          asic this seems quite a bargain relatively speaking.
       
          icedchai wrote 1 day ago:
          For $50K, you could buy 25 Framework desktop motherboards (128G VRAM
          each w/Strix Halo, so over 3TB total). Not sure how you'll cluster all
          of them but it might be fun to try. ;)
       
            sspiff wrote 1 day ago:
            There is no way to achieve a high throughput low latency connection
            between 25 Strix Halo systems. After accounting for storage and
            network, there are barely any PCIe lanes left to link two of them
            together.
            
            You might be able to use USB4, but I'm unsure what the latency is
            for that.
       
              0manrho wrote 1 day ago:
              In general I agree with you, the IO options exposed by Strix Halo
              are pretty limited, but if we're getting technical you can tunnel
              PCIe over USB4v2 by the spec in a way that's functionally similar
              to Thunderbolt 5. That gives you essentially 3 sets of native
              PCIe4x4 from the chipset and an additional 2 sets tunnelled over
              USB4v2. TB5 and USB4 controllers are not made equal, so in
              practice YMMV. Regardless of USB4v2 or TB5, you'll take a minor
              latency hit.
              
              Strix Halo IO topology: [1] Framework's mainboard implements 2 of
              those PCIe4x4 GPP interfaces as M.2 PHY's which you can use a
              passive adapter to connect a standard PCIe AIC (like a NIC or
              DPU) to, and also interestingly exposes that 3rd x4 GPP as a
              standard x4 length PCIe CEM slot, though the system/case isn't
              compatible with actually installing a standard PCIe add in card
              in there without getting hacky with it, especially as it's not an
              open-ended slot.
              
              You absolutely could slap 1x SSD in there for local storage, and
              then attach up to 4x RDMA supporting NIC's to a RoCE enabled
              switch (or Infiniband if you're feeling special) to build out a
              Strix Halo cluster (and you could do similar with Mac Studios to
              be fair). You could get really extra by using a DPU/SmartNIC that
              allows you to boot from a NVMeoF SAN to leverage all 5 sets of
              PCIe4x4 for connectivity without any local storage but we're
              hitting a complexity/cost threshold with that that I doubt most
              people want to cross. Or if they are willing to cross that
              threshold, they'd also be looking at other solutions better
              suited to that that don't require as many workarounds.
              
              Apple's solution is better for a small cluster, both in pure
              connectivity terms and also with respect to its memory
              advantages, but Strix Halo is doable. However, in both cases,
              scaling up beyond 3 or especially 4 nodes you rapidly enter
              complexity and cost territory that is better served by nodes that
              are less restrictive unless you have some very niche reason to
              use either Macs (especially non-pro) or Strix Halo specifically.
              
 (HTM)        [1]: https://www.techpowerup.com/cpu-specs/ryzen-ai-max-395.c...
       
              bee_rider wrote 1 day ago:
              Do they need fast storage, in this application? Their OS could be
              on some old SATA drive or whatever. The whole goal is to get them
              on a fast network together; the models could be stored on some
              network filesystem as well, right?
       
                pests wrote 1 day ago:
                It's more than just the model weights. During inference there
                would be a lot of cross-talk as each node broadcasts its
                results and gathers up what it needs from the others for the
                next step.
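
                One common way to do that "broadcast and gather" step is a
                ring all-gather, where each node only ever talks to its
                neighbour. A minimal in-memory simulation (no real networking;
                the function name is made up, and this isn't claimed to be
                what any particular framework does):

```python
def ring_all_gather(shards):
    """Simulate a ring all-gather: after n-1 steps every node holds every shard."""
    n = len(shards)
    # buffers[i][j] is node i's copy of shard j (None until it arrives).
    buffers = [[shards[j] if j == i else None for j in range(n)] for i in range(n)]
    for step in range(n - 1):
        # Node i forwards the shard it received in the previous step (its own
        # shard in step 0) to its right-hand neighbour. All sends in a step
        # happen at once, so collect them before applying any.
        sends = []
        for i in range(n):
            idx = (i - step) % n
            sends.append(((i + 1) % n, idx, buffers[i][idx]))
        for dest, idx, payload in sends:
            buffers[dest][idx] = payload
    return buffers


if __name__ == "__main__":
    for node, held in enumerate(ring_all_gather(["a", "b", "c", "d"])):
        print(node, held)
```

                Every node sends and receives one shard per step, so the
                slowest link and its latency set the pace for the whole
                ring.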
       
              icedchai wrote 1 day ago:
              I figured, but it's good to have confirmation.
       
            3abiton wrote 1 day ago:
            You could use llama.cpp rpc mode over "network" via
            usb4/thunderbolt connection
       
          mechagodzilla wrote 1 day ago:
          You can keep scaling down! I spent $2k on an old dual-socket xeon
          workstation with 768GB of RAM - I can run Deepseek-R1 at ~1-2
          tokens/sec.
       
            rpastuszak wrote 1 day ago:
            Nice! What do you use it for?
       
              mechagodzilla wrote 1 day ago:
              1-2 tokens/sec is perfectly fine for 'asynchronous' queries, and
              the open-weight models are pretty close to frontier-quality
              (maybe a few months behind?). I frequently use it for a variety
              of research topics, doing feasibility studies for wacky ideas,
              some prototypy coding tasks. I usually give it a prompt and come
              back half an hour later to see the results (although the thinking
              traces are sufficiently entertaining that sometimes it's fun to
              just read as it comes out). Being able to see the full thinking
              traces (and pause and alter/correct them if needed) is one of my
              favorite aspects of being able to run these models locally. The
              thinking traces are frequently just as or more useful than the
              final outputs.
       
            jacquesm wrote 1 day ago:
            I did the same, then put in 14 3090's. It's a little bit power
            hungry but fairly impressive performance wise. The hardest parts
            are power distribution and riser cards but I found good solutions
            for both.
       
              tucnak wrote 1 day ago:
              You get occasional accounts of 3090 home-superscalers where they
              put up eight, ten, fourteen cards. I normally attribute this to
              obsessive-compulsive behaviour. What kind of motherboard did you
              end up using, and what's the bi-directional bandwidth you're
              seeing? Something tells me you're not using EPYC
              9005's with up to 256x PCIe 5.0 lanes per socket or something...
              Also: I find it hard to believe the "performance" claims, when
              your rig is pulling 3 kW from the wall (assuming undervolting at
              200W per card?) The electricity costs alone would surely make
              this intractable, i.e. the same as running six washing machines
              all at once.
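
              The 3 kW guess is just card count times per-card power. A small
              sketch of that arithmetic; 200W is the undervolted assumption
              above, 350W is the 3090's stock board power, and the helper
              names are made up. GPUs only, with CPUs, fans and PSU losses on
              top:

```python
def gpu_power_kw(cards: int, watts_per_card: float) -> float:
    """Combined GPU draw in kilowatts."""
    return cards * watts_per_card / 1000


def energy_kwh(power_kw: float, hours: float) -> float:
    """Energy used at a constant draw over the given number of hours."""
    return power_kw * hours


if __name__ == "__main__":
    for label, watts in [("undervolted", 200), ("stock", 350)]:
        kw = gpu_power_kw(14, watts)
        print(f"14 cards, {label}: {kw:.1f} kW, {energy_kwh(kw, 24):.0f} kWh per day")
```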
       
                jacquesm wrote 22 hours 58 min ago:
                I love your skepsis of what I consider to be a fairly normal
                project, this is not to brag, simply to document.
                
                And I'm way above 3 kW, more likely 5000 to 5500 with the GPUs
                running as high as I'll let them, or thereabouts, but I only
                have one power meter and it maxes out at 2500 watts or so. This
                is using two Xeons in a very high end but slightly older
                motherboard. When it runs the space that it is in becomes hot
                enough that even in the winter I have to use forced air from
                outside otherwise it will die.
                
                As for electricity costs, I have 50 solar panels and on a good
                day they more than offset the electricity use, at 2 pm (solar
                noon here) I'd still be pushing 8 kW extra back into the grid.
                This obviously does not work out so favorably in the winter.
                
                Building a system like this isn't very hard, it is just a lot
                of money for a private individual but I can afford it, I think
                this build is a bit under $10K, so a fraction of what you'd pay
                for a commercial solution but obviously far less polished and
                still less performant. But it is a lot of bang for the buck and
                I'd much rather have this rig at $10K than the first commercial
                solution available at a multiple of this.
                
                I wrote a bit about power efficiency in the run-up to this
                build when I only had two GPUs to play with: [1]
                
                My main issue with the system is that it is physically
                fragile, I can't
                transport it at all, you basically have to take it apart and
                then move the parts and re-assemble it on the other side. It's
                just too heavy and the power distribution is messy so you end
                up with a lot of loose wires and power supplies. I could make a
                complete enclosure for everything but this machine is not
                running permanently and when I need the space for other things
                I just take it apart, store the GPUs in their original boxes
                until the next home-run AI project. Putting it all together is
                about 2 hours of work. We call it Frankie, on account of how it
                looks.
                
                edit: one more note, the noise it makes is absolutely
                incredible and I would not recommend running something like
                this in your house unless you are (1) crazy or (2) have a
                separate garage where you can install it.
                
 (HTM)          [1]: https://jacquesmattheij.com/llama-energy-efficiency/
       
                  tucnak wrote 3 hours 12 min ago:
                  Thanks for replying, and your power story does make more
                  sense all things considered. I'm no stranger to homelabbing,
                  in fact just now I'm running both an IBM POWER9 system (really
                  power-hungry) as well as AMD 8004, both watercooled now while
                  trying to bring the noise down. The whole rack, along with
                  100G switches and NIC/FPGA's, is certainly keeping us warm in
                  the winter! And it's only dissipating up to 1.6 kW (mostly,
                  thanks to ridiculous efficiency of 8434PN CPU which is like
                  48 cores at 150W or sommat)
                  
                  I cannot imagine dissipating 5 kW at home!
       
              r0b05 wrote 1 day ago:
              I think 14 3090's are more than a little power hungry!
       
                jacquesm wrote 1 day ago:
                to the point that I had to pull an extra circuit... but tri
                phase so good to go even if I would like to go bigger.
                
                I've limited power consumption to what I consider the optimum,
                each card will draw ~275 Watts (you can very nicely configure
                this on a per-card basis). The server itself also uses some for
                the motherboard, the whole rig is powered from 4 1600W
                supplies, the GPUs are divided 5/5/4 and the motherboard is
                connected to its own supply. It's a bit close to the edge for
                the supplies that have five 3090's on them but so far it held
                up quite well, even with higher ambient temps.
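                
                A quick sanity check of that 5/5/4 split, as a minimal sketch.
                The 275 W per card and 1600 W per supply are the figures given
                above; treating the configured power limit as the sustained
                draw is an assumption, and the function name is made up for
                illustration.

```python
# Rough per-supply load check for the 5/5/4 GPU split described above.
# 275 W per card and 1600 W per supply come from the comment; treating the
# power limit as the actual sustained draw is an assumption.
CARD_WATTS = 275
PSU_WATTS = 1600
GPU_SPLIT = (5, 5, 4)  # cards per supply; the fourth supply feeds the motherboard


def psu_loads(split=GPU_SPLIT, card_watts=CARD_WATTS, psu_watts=PSU_WATTS):
    """Return (watts, fraction of rating) for each GPU supply."""
    return [(n * card_watts, n * card_watts / psu_watts) for n in split]


if __name__ == "__main__":
    for cards, (watts, frac) in zip(GPU_SPLIT, psu_loads()):
        print(f"{cards} cards: {watts} W ({frac:.0%} of {PSU_WATTS} W)")
    print("GPU total:", sum(w for w, _ in psu_loads()), "W")
```

                Under those assumptions the five-card supplies sit at 1375 W
                each, which is why they are "a bit close to the edge".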
                
                Interesting tidbit: at 4 lanes/card throughput is barely
                impacted, 1 or 2 is definitely too low. 8 would be great but
                the CPUs don't have that many lanes.
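                
                For a sense of scale, a small sketch of the theoretical PCIe
                3.0 bandwidth per link width. It uses the standard 8 GT/s per
                lane and 128b/130b encoding and ignores protocol overhead, so
                real transfers will come in somewhat lower; the function name
                is hypothetical.

```python
# Theoretical one-direction PCIe 3.0 bandwidth by link width.
# 8 GT/s per lane with 128b/130b encoding; protocol overhead is ignored,
# so real transfers will be somewhat lower.
GT_PER_S = 8e9
ENCODING = 128 / 130


def pcie3_gb_per_s(lanes: int) -> float:
    """Usable GB/s (decimal) in one direction for a PCIe 3.0 link."""
    return lanes * GT_PER_S * ENCODING / 8 / 1e9


if __name__ == "__main__":
    for lanes in (1, 2, 4, 8, 16):
        print(f"x{lanes}: {pcie3_gb_per_s(lanes):.2f} GB/s")
```

                That puts x4 at a bit under 4 GB/s per direction and x8 at a
                bit under 8 GB/s.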
                
                I also have a threadripper which should be able to handle that
                much RAM but at current RAM prices that's not interesting (that
                server I could populate with RAM that I still had that fit that
                board, and some more I bought from a refurbisher).
       
                  nonplus wrote 1 day ago:
                  What pcie version are you running? Normally I would not
                  mention one of these, but you have already invested in all
                  the cards, and it could free up some space if any of your
                  lanes being used now are 3.0.
                  
                  If you can afford the 16 (pcie 3) lanes, you could get a PLX
                  ("PCIe Gen3 PLX Packet switch X16 -  x8x8x8x8" on ebay for
                  like $300) and get 4 of your cards up to x8.
       
                    jacquesm wrote 23 hours 13 min ago:
                    All are PCIe 3.0, I wasn't aware of those switches at all,
                    in spite of buying my risers and cables from that source!
                    Unfortunately all of the slots on the board are x8, there
                    are no x16 slots at all.
                    
                    So that switch would probably work but I wonder how big the
                    benefit would be: you will probably see effectively an x4
                    -> (x4 / x8) -> (x8 / x8) -> (x8 / x8) -> (x8 / x4) -> x4
                    pipeline, and then on to the next set of four boards.
                    
                    It might run faster on account of the three passes that are
                    double the speed they are right now as long as the CPU
                    does not need to talk to those cards and all transfers are
                    between layers on adjacent cards (very likely), and with
                    even more luck (due to timing and lack of overlap) it might
                    run the two x4 passes at approaching x8 speeds as well. And
                    then of course you need to do this a couple of times
                    because four cards isn't enough, so you'd need four of
                    those switches.
                    
                    I have not tried having a single card with fewer lanes in
                    the pipeline but that should be an easy test to see what
                    the effect on throughput of such a constriction would be.
                    
                    But now you have me wondering to what extent I could bundle
                    2 x8 into an x16 slot and then to use four of these cards
                    inserted into a fifth! That would be an absolutely unholy
                    assembly but it has the advantage that you will need far
                    fewer risers, just one x16 to x8/x8 run in reverse (which I
                    have no idea if that's even possible but I see no reason
                    right away why it would not work unless there are more
                    driver chips in between the slots and the CPUs, which may
                    be the case for some of the farthest slots).
                    
                    PCIe is quite amazing in terms of the topology tricks that
                    you can pull off with it, and c-payne's stuff is extremely
                    high quality.
       
                      nonplus wrote 17 hours 0 min ago:
                      If you end up trying it please share your findings!
                      
                      I've basically been putting this kind of gear in my cart,
                      and then deciding I don't want to manage more than the 2
                      3090s, 4090 and a5000 I have now, then I take the PLX out
                      of my cart.
                      
                      Seeing you have the cards already it could be a good fit!
       
                        jacquesm wrote 16 hours 54 min ago:
                        Yes, it could be. Unfortunately I'm a bit distracted by
                        both paid work and some more urgent stuff but
                        eventually I will get back to it. By then this whole
                        rig might be hopelessly outdated but we've done some
                        fun experiments with it and have kept our confidential
                        data in-house which was the thing that mattered to me.
       
                          r0b05 wrote 14 hours 4 min ago:
                          Yes, the privacy is amazing, and there's no rate
                          limiting so you can be as productive as you want.
                          There's also tons of learnings in this exercise. I 
                          have just 2x 3090's and I've learnt so much about
                          pcie and hardware that just makes the creative
                          process that much more fun.
                          
                          The next iteration of these tools will likely be more
                          efficient so we should be able to run larger models
                          at a lower cost. For now though, we'll run nvidia-smi
                          and keep an eye on those power figures :)
       
                            jacquesm wrote 12 hours 37 min ago:
                            You can tune that power down to what gives you the
                            best tokencount per joule, which I think is a very
                            important metric by which to optimize these systems
                            and by which you can compare them as well.
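                            
                            As a minimal sketch of that metric: since 1 W is
                            1 J/s, tokens per joule is just tokens per second
                            divided by watts. The sample numbers below are
                            made-up placeholders, not measurements from this
                            rig, and the helper names are hypothetical.

```python
# Tokens per joule = (tokens/s) / (watts), since 1 W = 1 J/s.
# The sample measurements below are made-up placeholders, not benchmarks.
def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tokens_per_second / watts


def best_power_limit(measurements):
    """measurements maps power draw in watts to tokens/s; return the most efficient draw."""
    return max(measurements, key=lambda w: tokens_per_joule(measurements[w], w))


if __name__ == "__main__":
    hypothetical = {200: 18.0, 275: 23.0, 350: 25.0}
    for watts, tps in hypothetical.items():
        print(f"{watts} W: {tokens_per_joule(tps, watts):.4f} tokens/J")
    print("most efficient:", best_power_limit(hypothetical), "W")
```

                            Ideally the watts would be measured draw at the
                            wall rather than the configured limit.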
                            
                            I have a hard time understanding all of these
                            companies that toss their NDA's and client
                            confidentiality into the wind and feed newfangled
                            AI companies their corporate secrets with abandon.
                            You'd think there would be a more prudent approach
                            to this.
       
            a012 wrote 1 day ago:
            And heat the whole house in parallel
       
            Weryj wrote 1 day ago:
            Just keep going! 2TB of swap disk for 0.0000001 t/sec
       
              kergonath wrote 1 day ago:
              Hang on, starting benchmarks on my Raspberry Pi.
       
                pickle-wizard wrote 1 day ago:
                On a lark a friend set up Ollama on an 8GB Raspberry Pi with one
                of the smaller models. It worked but it was very slow. IIRC it
                did 1 token/second.
       
                euroderf wrote 1 day ago:
                By the year 2035, toasters will run LLMs.
       
            ternus wrote 1 day ago:
            And if you get bored of that, you can flip the RAM for more than
            you spent on the whole system!
       
        reaperducer wrote 1 day ago:
        As someone not involved in this space at all, is this similar to the
        old MacOS Xgrid?
        
 (HTM)  [1]: https://en.wikipedia.org/wiki/Xgrid
       
          wmf wrote 1 day ago:
          No.
       
        daft_pink wrote 1 day ago:
        Hoping Apple has secured plentiful DDR5 to use in their machines so we
        can buy M5 chips with massive amounts of RAM soon.
       
          colechristensen wrote 1 day ago:
          Apple tends to book its fab time / supplier capacity years in advance
       
            lossolo wrote 1 day ago:
            I hope so, I want to replace my M1 Pro with a MacBook Pro with M5 Pro
            when they release it next year.
       
              colechristensen wrote 1 day ago:
              I mostly want the M5 Pro because my choice of an M4 Air this year
              with 24 GB of RAM is turning out to be less than I want with the
              things I'm doing these days.
       
        storus wrote 1 day ago:
        Is there any way to connect DGX Sparks to this via USB4? Right now only
        10GbE can be used despite both Spark and MacStudio having vastly faster
        options.
       
          zackangelo wrote 1 day ago:
          Sparks are built for this and actually have ConnectX-7 NICs built
          in! You just need to get the SFPs for them. This means you can
          natively cluster them at 200Gbps.
       
            wtallis wrote 1 day ago:
            That doesn't answer the question, which was how to get a high-speed
            interconnect between a Mac and a DGX Spark. The most likely
            solution would be a Thunderbolt PCIe enclosure and a 100Gb+ NIC,
            and passive DAC cables. The tricky part would be macOS drivers for
            said NIC.
       
              zackangelo wrote 1 day ago:
              You're right, I misunderstood.
              
              I'm not sure if it would be of much utility because this would
              presumably be for tensor parallel workloads. In that case you
              want the ranks in your cluster to be uniform or else everything
              will be forced to run at the speed of the slowest rank.
              
              You could run pipeline parallel but not sure it'd be that much
              better than what we already have.
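              
              A toy model of the "slowest rank" point, as a sketch only: it
              assumes every rank works on its slice and then synchronises, so
              a step lasts as long as the slowest rank, and it ignores
              communication time. The per-rank timings are hypothetical.

```python
# Toy model: in tensor parallelism each rank computes its slice of a layer
# and then synchronises, so a step takes as long as the slowest rank.
# Communication time is ignored; the timings used below are hypothetical.
def tensor_parallel_step_time(rank_times_ms):
    """Step time is bounded by the slowest rank."""
    return max(rank_times_ms)


def idle_fraction(rank_times_ms):
    """Share of total rank-time spent waiting for the slowest rank."""
    step = tensor_parallel_step_time(rank_times_ms)
    idle = sum(step - t for t in rank_times_ms)
    return idle / (step * len(rank_times_ms))


if __name__ == "__main__":
    mixed = [10.0, 10.0, 25.0]  # two fast ranks, one slow rank
    print("step time:", tensor_parallel_step_time(mixed), "ms")
    print(f"idle: {idle_fraction(mixed):.0%}")
```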
       
                storus wrote 1 day ago:
                It was about this use case:
                
 (HTM)          [1]: https://blog.exolabs.net/nvidia-dgx-spark/
       
        givemeethekeys wrote 1 day ago:
        Would this also work for gaming?
       
          AndroTux wrote 1 day ago:
          No
       
        geerlingguy wrote 1 day ago:
        This implies you'd run more than one Mac Studio in a cluster, and I
        have a few concerns regarding Mac clustering (as someone who's managed
        a number of tiny clusters, with various hardware):
        
        1. The power button is in an awkward location, meaning rackmounting
        them (either 10" or 19" rack) is a bit cumbersome (at best)
        
        2. Thunderbolt is great for peripherals, but as a semi-permanent
        interconnect, I have worries over the port's physical stability... wish
        they made a Mac with QSFP :)
        
        3. Cabling will be important, as I've had tons of issues with TB4 and
        TB5 devices with anything but the most expensive Cable Matters and
        Apple cables I've tested (and even then...)
        
        4. macOS remote management is not nearly as efficient as Linux, at
        least if you're using open source / built-in tooling
        
        To that last point, I've been trying to figure out a way to, for
        example, upgrade to macOS 26.2 from 26.1 remotely, without a GUI, but
        it looks like you _have_ to use something like Screen Sharing or an IP
        KVM to log into the UI, to click the right buttons to initiate the
        upgrade.
        
        Trying "sudo softwareupdate -i -a" will install minor updates, but not
        full OS upgrades, at least AFAICT.
       
          cromniomancer wrote 1 day ago:
          VNC over SSH tunneling always worked well for me before I had Apple
          Remote Desktop available, though I don't recall if I ever initiated a
          connection attempt from anything other than macOS...
          
          erase-install can be run non-interactively when the correct
          arguments are used. I've only ever used it with an MDM in play so
          YMMV:
          
 (HTM)    [1]: https://github.com/grahampugh/erase-install
       
          ThomasBb wrote 1 day ago:
          With MDM solutions you can not only get software update management,
          but even full LOM for models that support this.
          There are free and open source MDM out there.
       
          827a wrote 1 day ago:
          They do still sell the Mac Pro in a rack mount configuration. But, it
          was never updated for M3 Ultra, and feels not long for this world.
       
          badc0ffee wrote 1 day ago:
          > To that last point, I've been trying to figure out a way to, for
          example, upgrade to macOS 26.2 from 26.1 remotely,
          
          I think you can do this if you install an MDM profile on the Macs and
          use some kind of management software like Jamf.
       
          rsync wrote 1 day ago:
          "... Thunderbolt is great for peripherals, but as a semi-permanent
          interconnect, I have worries over the port's physical stability ..."
          
          Thunderbolt as a server interconnect displeases me aesthetically but
          my conclusion is the opposite of yours:
          
          If the systems are locked into place as servers in a rack the
          movements and stresses on the cable are much lower than when it is
          used as a peripheral interconnect for a desktop or laptop, yes ?
       
            827a wrote 1 day ago:
            This is a semi-solved problem e.g. [1]
            
            Apple's chassis do not support it. But conceptually that's not a
            Thunderbolt problem, it's an Apple problem. You could probably
            drill into the Mac
            Studio chassis to create mount points.
            
 (HTM)      [1]: https://www.sonnetstore.com/products/thunderlok-a
       
              broretore wrote 22 hours 25 min ago:
              You could also epoxy it.
       
          colechristensen wrote 1 day ago:
          There are open source MDM projects, I'm not familiar but [1] might do
          the job for OS upgrades.
          
 (HTM)    [1]: https://github.com/micromdm/nanohub
       
          timc3 wrote 1 day ago:
          It's been terrible for years/forever. Even Xserves didn't really
          meet the needs of a professional data centre. And it's got worse as
          a server OS because it's not a core focus. Don't understand why
          anyone tries to bother - apart from this MLX use case or as a ProRes
          render farm.
       
            crote wrote 1 day ago:
            iOS build runner. Good luck developing cross-platform apps without
            a Mac!
       
              jeroenhd wrote 1 day ago:
              Practically, just run the macos-inside-kvm-inside-docker command.
              Not very fast, but you can compile the entire thing outside of
              the VM, all you need is the final incantations to get Apple's
              signatures on there.
              
              Legally, you probably need a Mac. Or rent access to one, that's
              probably cheaper.
       
          wlesieutre wrote 1 day ago:
          For #2, OWC puts a screw hole above their dock's thunderbolt ports so
          that you can attach a stabilizer around the cord [1] It's a poor
          imitation of old ports that had screws on the cables, but should help
          reduce inadvertent port stress.
          
          The screw only works with limited devices (ie not the Mac Studio end
          of the cord) but it can also be adhesive mounted.
          
 (HTM)    [1]: https://www.owc.com/solutions/thunderbolt-dock
 (HTM)    [2]: https://eshop.macsales.com/item/OWC/CLINGON1PK/
       
            crote wrote 1 day ago:
            That screw hole is just the regular locking USB-C variant, is it
            not?
            
            See for example:
            
 (HTM)      [1]: https://www.startech.com/en-jp/cables/usb31cctlkv50cm
       
              TheJoeMan wrote 1 day ago:
              Now that's one way to enforce not inserting a USB upside-down.
       
              wlesieutre wrote 1 day ago:
              Looks like it! Thanks for pointing this out, I had no idea it was
              a standard.
              
              Apparently since 2016 [1]
              
              So for any permanent Thunderbolt GPU setups, they should really
              be using this type of cable.
              
 (HTM)        [1]: https://www.usb.org/sites/default/files/documents/usb_ty...
       
                wtallis wrote 1 day ago:
                Note that the locking connector OWC uses is a standard, not the
                standard. This is USB we're dealing with, so they made it
                messy: the spec defines two different mutually-incompatible
                locking mechanisms.
       
                  jamiek88 wrote 1 day ago:
                  Of course they do.
       
          eurleif wrote 1 day ago:
          I have no experience with this, but for what it's worth, looks like
          there's a rack mounting enclosure available which mechanically
          extends the power switch:
          
 (HTM)    [1]: https://www.sonnetstore.com/products/rackmac-studio
       
            geerlingguy wrote 1 day ago:
            I have something similar from MyElectronics, and it works, but it's
            a bit expensive, and still imprecise. At least the power button
            isn't in the back corner underneath!
       
        timsneath wrote 1 day ago:
        Also see
        
 (HTM)  [1]: https://www.engadget.com/ai/you-can-turn-a-cluster-of-macs-int...
       
        btown wrote 1 day ago:
        It would be incredibly ironic if, with Apple's relatively stable supply
        chain relative to the chaos of the RAM market these days (projected to
        last for years), Apple compute became known as a cost-effective way to
        build medium-sized clusters for inference.
       
          teaearlgraycold wrote 1 day ago:
          It already is depending on your needs.
       
          andy99 wrote 1 day ago:
          It's gonna suck if all the good Macs get gobbled up by commercial
          users.
       
            icedchai wrote 1 day ago:
            Outside of YouTube influencers, I doubt many home users are buying
            a 512G RAM Mac Studio.
       
              kridsdale1 wrote 1 day ago:
              I did. Admittedly it was for video processing at 8k which uses
              more than 128gb of ram, but I am NOT a YouTuber.
       
              7e wrote 1 day ago:
              That product can still steal fab slots from cheaper, more
              prosumer products.
       
              mirekrusin wrote 1 day ago:
              Of course they're not. Everybody is waiting for next generation
              that will run LLMs faster to start buying.
       
                rbanffy wrote 1 day ago:
                Every generation runs LLMs faster than the previous one.
       
              DrStartup wrote 1 day ago:
              I'm neither and have 2. 24/7 async inference against github
              issues. Free. (once you buy the macs that is)
       
                servercobra wrote 22 hours 26 min ago:
                Interesting. Answering them? Solving them? Looking for ones to
                solve?
       
                madeofpalk wrote 1 day ago:
                I'm not sure who 'home users' are, but I doubt they're buying
                two $9,499 computers.
       
                  trvz wrote 1 day ago:
                  Peanuts for people who make their living with computers.
       
                    jon-wood wrote 1 day ago:
                    So, not a home user then. If you make your living with
                    computers in that manner you are by definition a
                    professional, and just happen to have your work hardware at
                    home.
       
                icedchai wrote 1 day ago:
                Heh. I'm jealous. I'm still running a first gen Mac Studio (M1
                Max, 64 gigs RAM.) It seemed like a beast only 3 years ago.
       
                Waterluvian wrote 1 day ago:
                I wonder what the actual lifetime amortized cost will be.
       
                  oidar wrote 1 day ago:
                  Every time I'm tempted to get one of these beefy mac studios,
                  I just calculate how much inference I can buy for that amount
                  and it's never a good deal.
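                  
                  That calculation can be sketched roughly as below. The
                  hardware price would be something like the $9,499 mentioned
                  elsewhere in this thread; the API price and the electricity
                  cost per million tokens are hypothetical inputs, and the
                  function name is made up.

```python
# Break-even point between buying hardware and paying per token.
# All prices passed in are assumptions supplied by the caller.
def break_even_tokens(hardware_usd: float,
                      api_usd_per_million_tokens: float,
                      power_usd_per_million_tokens: float = 0.0) -> float:
    """Tokens needed before local inference is cheaper than the API."""
    saving = api_usd_per_million_tokens - power_usd_per_million_tokens
    if saving <= 0:
        return float("inf")  # local never catches up at these prices
    return hardware_usd / saving * 1_000_000


if __name__ == "__main__":
    # Hypothetical: $9,499 machine, $3 per million API tokens, $0.50 of power.
    tokens = break_even_tokens(9499, 3.0, 0.5)
    print(f"break-even after about {tokens / 1e9:.1f} billion tokens")
```

                  It leaves out resale value, depreciation and any quality gap
                  between local and hosted models.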
       
                    stingraycharles wrote 1 day ago:
                    Never mind the fact that there are a lot of high quality
                    (the highest quality?) models that are not released as open
                    source.
       
                    dontlaugh wrote 1 day ago:
                    For now, while everything you can rent is sold at a loss.
       
                    asimovDev wrote 1 day ago:
                    anyone buying these is usually more concerned with just
                    being able to run stuff on their own terms without handing
                    their data off. otherwise it's probably always cheaper to
                    rent compute for intense stuff like this
       
                    embedding-shape wrote 1 day ago:
                    Every time someone brings up that, it brings me back
                    memories of trying to frantically finish stuff as quickly
                    as possible as either my quota slowly goes down with each API
                    request, or the pay-as-you-go bill is increasing 0.1% for
                    each request.
                    
                    Nowadays I fire off async jobs that involve 1000s of
                    requests, billions of tokens, yet it costs basically the
                    same as if I didn't.
                    
                    Maybe it takes a different type of person, than the one I
                    am, but all these "pay-as-you-go"/tokens/credits platforms
                    make me nervous to use, and I end up not using it or
                    spending time trying to "optimize", while investing in
                    hardware and infrastructure I can run at home and use that
                    seems to be no problem for my head to just roll with.
       
                      noname120 wrote 1 day ago:
                      But the downside is that you are stuck with inferior
                      LLMs. None of the best models have open weights: Gemini
                      3.5, Claude Sonnet/Opus 4.5, ChatGPT 5.2. The best model
                      with open weights performs an order of magnitude worse
                      than those.
       
                        embedding-shape wrote 1 day ago:
                        The best weights are the weights you can train yourself
                        for specific use cases. As long as you have the data
                        and the infrastructure to train/fine-tune your own
                        small models, you'll get drastically better results.
                        
                        And just because you're mostly using local models
                        doesn't mean you can't use API hosted models in
                        specific contexts. Of course, then the same dread sets
                        in, but if you can do 90% of the tokens with local
                        models and 10% with pay-per-usage API hosted models,
                        you get the best of both worlds.
       
                    bee_rider wrote 1 day ago:
                    Are the inference providers profitable yet? Might be nice
                    to be ready for the day when we see the real price of their
                    services.
       
                      Nextgrid wrote 1 day ago:
                      Isn't it then even better to enjoy cheap inference thanks
                      to techbro philanthropy while it lasts? You can always
                      buy the hardware once the free money runs out.
       
                        bee_rider wrote 1 day ago:
                        Probably depends on what you are interested in. IMO,
                        setting up local programs is more fun anyway. Plus, any
                        project I'd do with LLMs would just be for fun and
                        learning at this point, so I figure it is better to
                        learn skills that will be useful in the long run.
       
              FireBeyond wrote 1 day ago:
              I doubt many of them are, either.
              
              When the 2019 Mac Pro came out, it was "amazing" how many still
              photography YouTubers all got launch day deliveries of the same
              BTO Mac Pro, with exactly the same spec:
              
              18 core CPU, 384GB memory, Vega II Duo GPU and an 8TB SSD.
              
              Or, more likely, Apple worked with them and made sure each of
              them had this Mac on launch day, while they waited for the model
              they actually ordered. Because they sure as hell didn't need an
              $18,000 computer for Lightroom.
       
                lukeh wrote 1 day ago:
                Still rocking a 2019 Mac Pro with 192GB RAM for audio work,
                because I need the slots and I can't justify the expense of a
                new one. But I'm sure a M4 Mini is faster.
       
                  NSUserDefaults wrote 1 day ago:
                  How crazy do you have to get with # of tracks or plugins
                  before it starts to struggle? I was under the impression that
                  most studios would be fine with an Intel Mac Mini + external
                  storage.
       
            mschuster91 wrote 1 day ago:
            it's not like regular people can afford this kind of Apple machine
            anyway.
       
              teeray wrote 1 day ago:
              It's just depressing that the "PC in every home" era is
              being rapidly pulled out from under our feet by all these supply
              shocks.
       
                Aurornis wrote 1 day ago:
                You can get a Mac Mini for $600 with 16GB of RAM and it will be
                more powerful than the "PC in every home" people would need for
                any common software.
                
                The personal computing situation is great right now. RAM is
                temporarily more expensive, but it's definitely not ending any
                eras.
       
                  m-s-y wrote 1 day ago:
                  Not Apple's ram.
       
                    jeroenhd wrote 1 day ago:
                    RAM prices have exploded enough that Apple's RAM is now no
                    longer a bad deal. At least until their next price hikes.
                    
                    We're going back to the "consumer PCs have 8GB of RAM" era
                    thanks to the AI bubble.
       
                      RestartKernel wrote 1 day ago:
                      Funny, considering Macbooks finally started shipping at
                      16 GB due to Apple Intelligence.
       
                dghlsakjg wrote 1 day ago:
                Huh?
                
                Home PCs are as cheap as they've ever been. Adjusted for
                inflation the same can be said about "home use" Macs. The
                list price of an entry level MacBook Air has been pretty much
                the same for more than a decade. Adjust for inflation, and you
                get a MacBook Air for less than half the real cost of the
                launch model that is massively better in every way.
                
                A blip in high end RAM prices has no bearing on affordable home
                computing. Look at the last year or two and the proliferation
                of cheap, moderately to highly specced mini desktops.
                
                I can get a Ryzen 7 system with 32gb of ddr5, and a 1tb drive
                delivered to my house before dinner tomorrow for $500 + tax.
                
                That's not depressing, that's amazing!
       
                  jeroenhd wrote 1 day ago:
                  > I can get a Ryzen 7 system with 32gb of ddr5, and a 1tb
                  drive delivered to my house before dinner tomorrow for $500 +
                  tax
                  
                  That's an amazing price, but I'd like to see where you're
                  getting it. 32GB of RAM alone costs €450 here (€250 if
                  you're willing to trust Amazon's February 2026 delivery
                  dates).
                  
                  Getting a PC isn't that expensive, but after the blockchain
                  hype and then the AI hype, prices have yet to come down. All
                  estimates I've seen have RAM prices increasing further
                  until the summer of next year, with the first dents in
                  pricing coming the year after at the very earliest.
       
                    dghlsakjg wrote 1 day ago:
                    Amazon[0] link below. Equivalent systems also available at
                    Newegg for the same price since someone nitpicked that you
                    need a $15 prime membership to get that Amazon deal.
                    
                    Shipping might screw you but here's in stock 32gb kits of
                    name brand RAM from a well known retailer in the US for
                    $280[1].
                    
                    Edit: same crucial RAM kit is 220GBP in stock at amazon[2]
                    
                    (0) [1] (1) [2] (2)
                    
 (HTM)              [1]: https://www.amazon.com/BOSGAME-P3-Gigabit-Ethernet...
 (HTM)              [2]: https://www.bhphotovideo.com/c/product/1809983-REG...
 (HTM)              [3]: https://www.amazon.co.uk/dp/B0CTHXMYL8?tag=pcp0f-2...
       
                  behnamoh wrote 1 day ago:
                  > Home PCs are as cheap as they've ever been.
                  
                  just the 5090 GPU costs +$3k, what are you even talking about
       
                    platevoltage wrote 1 day ago:
                    Man you positively demolished that straw man.
                    
                    How much has a base model MacBook Air changed in price over
                    the last 15 years? With inflation, it's gotten cheaper.
       
                      morshu9001 wrote 1 day ago:
                      It's also gotten cheaper nominally. I just got a new base
                      MBA for $750. Kinda surprised, like there has to be some
                      catch.
       
                        morshu9001 wrote 10 hours 23 min ago:
                        Also, the MBA vs MBP lineup is different now. MBP was
                        the default choice before even for students, so
                        MacBooks sorta started at $1300. Now the MBA is decent,
                        and the MBP is really only for pros who need extra
                        power and features.
       
                        teaearlgraycold wrote 1 day ago:
                        I feel bad for their competitors. We need good
                        competition in the long run but over the last few years
                        it's made less and less sense to get something other
                        than an Apple laptop for most use cases.
       
                          platevoltage wrote 15 hours 40 min ago:
                          I don't. They're being weighed down by Windows and to
                          a lesser extent, x86. If they want to excel in the
                          market, make a change. Use what Valve is doing as an
                          example.
       
                      dghlsakjg wrote 1 day ago:
                      Some numbers to drive your point home:
                      
                      The original base MacBook Air sold for $1799 in 2008. The
                      inflation adjusted price is $2715.
                      
                      The current base model is $999, and literally better in
                      every way except thickness on one edge.
                      
                      If we constrain ourselves to just 15 years: the $999 MBA
                      was released 15 years ago ($1488 in real dollars). The
                      list price has remained the same for the base model, with
                      the exception of when they sold the discontinued 11"
                      MBAs for $899.
                      
                      It's actually kind of wild how much better and cheaper
                      computers have gotten.
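                      
                      A minimal sketch of the arithmetic behind these figures,
                      using only the prices quoted above. The function names
                      are made up for illustration, and the inflation-adjusted
                      values are taken from this comment as given rather than
                      recomputed from CPI data.

```python
# Prices quoted in the comment above (USD).
LAUNCH_2008_NOMINAL = 1799    # original base MacBook Air, 2008
LAUNCH_2008_ADJUSTED = 2715   # same price in today's dollars, as quoted
MBA_15Y_NOMINAL = 999         # base model list price 15 years ago
MBA_15Y_ADJUSTED = 1488       # same price in today's dollars, as quoted
CURRENT_BASE = 999            # current base model list price


def implied_inflation_factor(nominal: float, adjusted: float) -> float:
    """Cumulative inflation multiplier implied by a nominal/adjusted pair."""
    return adjusted / nominal


def real_price_ratio(current: float, adjusted_old: float) -> float:
    """Current price as a fraction of an older inflation-adjusted price."""
    return current / adjusted_old


factor_2008 = implied_inflation_factor(LAUNCH_2008_NOMINAL, LAUNCH_2008_ADJUSTED)
factor_15y = implied_inflation_factor(MBA_15Y_NOMINAL, MBA_15Y_ADJUSTED)
ratio_vs_launch = real_price_ratio(CURRENT_BASE, LAUNCH_2008_ADJUSTED)
ratio_vs_15y = real_price_ratio(CURRENT_BASE, MBA_15Y_ADJUSTED)

print(f"implied inflation since 2008: x{factor_2008:.2f}")
print(f"implied inflation over 15 years: x{factor_15y:.2f}")
print(f"current base vs 2008 launch (real): {ratio_vs_launch:.0%}")
print(f"current base vs 15 years ago (real): {ratio_vs_15y:.0%}")
```

                      On these numbers the current base model costs a bit over
                      a third of the 2008 launch price in real terms, and about
                      two thirds of the real price from 15 years ago.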
       
                    dghlsakjg wrote 1 day ago:
                    "A computer in every home" (from the original post I
                    was replying to) does not mean "A computer with the
                    highest priced version of the highest priced optional
                    accessory for computers in every home"
                    
                    I'm talking about the hundreds of affordable models that
                    are perfectly suitable for everything up to and including
                    AAA gaming.
                    
                    The existence of expensive, and very much optional, high
                    end computer parts does not mean that affordable computers
                    are not more incredible than ever.
                    
                    Just because cutting edge high end parts are out of reach
                    to you, does not mean that perfectly usable computers are
                    too, as I demonstrated with actual specs and prices in my
                    post.
                    
                    That's what I'm talking about.
       
                    pests wrote 1 day ago:
                    A home PC has to have a SOTA gpu?
       
                      morshu9001 wrote 1 day ago:
                      Probably upset that the high-end video game "hobby" costs
                      more than it used to. Used to be $1-2K for the very best
                      gaming GPU of the time.
       
                  inferiorhuman wrote 1 day ago:
                  > A blip in high end RAM prices
                  
                  It's not a blip and it's not limited to high end machines and
                  configurations.  Altman gobbled up the lion's share of wafer
                  production.  Look at that Raspberry Pi article that made it
                  to the front page, that's pretty far from a high end Mac and
                  according to the article's author likely to be exported from
                  China due to the RAM supply crisis.
                  
                  > I can get a Ryzen 7 system with 32gb of ddr5, and a 1tb
                  drive delivered to my house before dinner tomorrow for $500 +
                  tax.
                  
                  B&H is showing a 7700X at $250 with their cheapest 32GB DDR5
                  5200 sticks at $384.  So you've already gone over budget for
                  just the memory and CPU.  No motherboard, no SSD.
                  
                  Amazon is showing some no-name stuff at $298 as their
                  cheapest memory and a Ryzen 7700X at $246.
                  
                  Add another $100 for an NVMe drive and another $70-100 for
                  the cheapest AM5 motherboards I could find on either of those
                  sites.
       
                    sspiff wrote 1 day ago:
                    Add to that a case, PSU and monitor and you're realistically
                    over $1000
       
                    dghlsakjg wrote 1 day ago:
                    People that can reliably predict the future, especially
                    when it comes to rising markets, are almost always
                    billionaires. It is a skill so rare that it can literally
                    make you the richest man on earth. Why should I trust your
                    prediction of future markets that this pricing is the new
                    standard, and will never go down? Line doesn't always go
                    up, even if it feels like it is right now, and all the tech
                    media darlings are saying so.
                    
                    If everything remains the same, RAM pricing will also. I
                    have never once found a period in known history where
                    everything stays the same, and I would be willing to bet 5
                    figures that at some point in the future I will be able to
                    buy DDR5 or better ram for cheaper than today. I can point
                    out that in the long run, prices for computing equipment
                    have always fallen. I would trust that trend a lot more
                    than a shortage a few months old changing the very nature
                    of commodity markets. Mind you, I'm not the richest man
                    on earth either, so my pattern matched opinion should be
                    judged the same.
                    
                    > B&H is showing a 7700X at $250 with their cheapest 32GB
                    DDR5 5200 sticks at $384. So you've already gone over
                    budget for just the memory and CPU. No motherboard, no SSD.
                    
                    I didn't say I could build one from parts. Instead I said
                    buy a mini pc, and then went and looked up the specs and
                    price point to be sure.
                    
                    The PC that I was talking about is here[ [1] ]. I live in
                    Canada so translated the prices to USD. Remember that US
                    stores are sometimes forced to hide a massive import tax in
                    those parts prices. The rest of the world isn't subject
                    to that and pays less.
                    
                    Edit: here's an equivalent specced pc available in the US
                    for $439 with a prime membership. So even with the cost of
                    prime membership you can get a Ryzen 7 32gb 1tb for $455.
                    
 (HTM)              [1]: https://a.co/d/6c8Udbp
 (HTM)              [2]: https://www.amazon.com/BOSGAME-P3-Gigabit-Ethernet...
       
                      SunlitCat wrote 1 day ago:
                      Don't forget that many of these manufacturers operate
                      with long-term supply contracts for components like RAM,
                      maintain existing inventory, or are selling systems that
                      were produced some time ago. That helps explain why we
                      are still seeing comparatively low prices at the moment.
                      
                      If the current RAM supply crisis continues, it is very
                      likely that these kinds of offers will disappear and that
                      systems like this will become more expensive as well, not
                      to mention all the other products that rely on DRAM
                      components.
                      
                      I also don't believe RAM prices will drop again anytime
                      soon, especially now that manufacturers have seen how
                      high prices can go while demand still holds. Unlike
                      something like graphics cards, RAM is not optional, it is
                      a fundamental requirement for building any computer (or
                      any device that contains one). People don't buy it
                      because they want to, but because they have to.
                      
                      In the end, I suspect that some form of market-regulating
                      mechanism may be required, potentially through government
                      intervention. Otherwise, it's hard for me to see what
                      would bring prices down again, unless Chinese
                      manufacturers manage to produce DRAM at scale, at
                      significantly lower cost, and effectively flood the
                      market.
       
                      inferiorhuman wrote 1 day ago:
                      > People that can reliably predict the future
                      
                      You don't need to be a genius or a billionaire to realize
                      that when most of the global supply of a product becomes
                      unavailable the remaining supply gets more expensive.
                      
                      > here's an equivalent specced pc available in the US
                      for $439 with a prime membership.
                      
                      So with prime that's $439+139 for $578 which is only
                      slightly higher than the cost without prime of $549.99.
       
                        dghlsakjg wrote 1 day ago:
                        > You don't need to be a genius or a billionaire to
                        realize that when most of the global supply of a
                        product becomes unavailable the remaining supply gets
                        more expensive.
                        
                        Yes. Absolutely correct if you are talking about the
                        short term. I was talking about the long term, and said
                        that. If you are so certain would you take this bet:
                        any odds, any amount that within 1 month I can buy 32gb
                        of new retail DDR5 in the US for at least 10% less than
                        the $384 you cited. (think very hard on why I might
                        offer you infinite upside so confidently. It's not
                        because I know where the price of RAM is going in the
                        short term)
                        
                        > So with prime that's $439+139 for $578 which is only
                        slightly higher than the cost without prime of $549.99.
                        
                        At this point I can't tell if you are arguing in bad
                        faith, or just unfamiliar with how prime works. Just in
                        case: You have cited the cost of prime for a full year.
                        You can buy just a month of prime for a maximum price
                        of $14.99 (that's how I got $455) if you have already
                        used your free trial, and don't qualify for any
                        discounts. Prime also allows cancellation within 14
                        days of signing up for a paid option, which is more
                        than enough time to order a computer, and have it
                        delivered, and cancel for a full refund.
                        
                        So really, if you use a trial or ask for a refund for
                        your prime fees the price is $439. So we have actually
                        gotten the price a full 10% lower than I originally
                        cited.
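                        
                        The Prime arithmetic in this exchange, as a minimal
                        sketch. The prices are the ones quoted in the thread
                        ($439 deal price, $14.99 for a month of Prime, $139 for
                        a year, $500 as the original figure); the names are
                        made up for illustration.

```python
DEAL_PRICE = 439.00       # mini PC price with Prime, as quoted
PRIME_MONTHLY = 14.99     # one month of Prime, as quoted
PRIME_ANNUAL = 139.00     # full year of Prime, as used in the reply
ORIGINAL_CLAIM = 500.00   # the "$500 + tax" figure from earlier in the thread


def total_cost(deal_price: float, membership_fee: float) -> float:
    """Deal price plus whatever membership fee is actually paid."""
    return round(deal_price + membership_fee, 2)


def discount_vs(reference: float, price: float) -> float:
    """Fractional saving of `price` relative to `reference`."""
    return (reference - price) / reference


with_monthly = total_cost(DEAL_PRICE, PRIME_MONTHLY)   # comment rounds this to $455
with_annual = total_cost(DEAL_PRICE, PRIME_ANNUAL)
with_trial = total_cost(DEAL_PRICE, 0.0)               # free trial or refunded fee
saving = discount_vs(ORIGINAL_CLAIM, with_trial)

print(f"one month of Prime: ${with_monthly:.2f}")
print(f"full year of Prime: ${with_annual:.2f}")
print(f"trial or refund:    ${with_trial:.2f} ({saving:.1%} below $500)")
```

                        The gap between the two totals in the argument comes
                        down entirely to which membership fee is added.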
                        
                        Edit: to eliminate any arguments about Prime in the
                        price of the PC, here's an identically specced mini PC
                        for the same price from Newegg
                        
 (HTM)                  [1]: https://www.newegg.com/p/2SW-00BM-00002
       
                          inferiorhuman wrote 16 hours 41 min ago:
                          > At this point I can't tell if you are arguing in bad
                          faith, or just unfamiliar with how prime works. Just
                          in case: You have cited the cost of prime for a full
                          year.
                          
                          Oh for the love of fuck.  I don't subscribe to Prime
                          or pay any attention to how it's priced.  I've gotten
                          offers for free trials of Prime before, should I just
                          ignore that for most people Prime is something they
                          have to pay for?
       
                          r0b05 wrote 1 day ago:
                          What is your estimate for when memory prices will
                          decrease?
                          
                          I agree that we've seen similar fluctuations in the
                          past and the price of compute trends down in the
                          long-term. This could be a bubble, which it likely
                          is, in which case prices should return to baseline
                          eventually. The political climate is extremely
                          challenging at this time though so things could take
                          longer to stabilize. Do you think we're in this ride
                          for months or years?
       
                            dghlsakjg wrote 1 day ago:
                            I can't be more clear: specificity around
                            predicting the future is close to impossible. There
                            are 9 figure bets on both sides of the RAM issue,
                            and strategic national concerns. I say that prices
                            will go down at some point in the future for
                            reasons highlighted already, but I have no clue
                            when. Keep in mind what I myself have said about
                            human ability to predict the future. You would be a
                            fool to believe anyone's specific estimates.
                            
                            Maybe the AI money train stops after Christmas. The
                            entire economy is fucked, but RAM is cheap.
                            
                            Maybe we unlock AGI and the price sky rockets
                            further before factories can get built.
                            
                            There are just too many variables.
                            
                            The real test is if someone had seen this coming,
                            they would have made massive absurd investment
                            returns just by buying up stock and storing it for
                            a few months. Anyone who didn't take advantage of
                            that opportunity has proved that they had no real
                            confidence in their ability to predict the future
                            price of RAM. RAM inventory might have been one of
                            the highest return investments possible this year.
                            Where are all the RAM whales in Lambos who saw this
                            coming?
                            
                            As a corollary: we can say that unless you have
                            some skin in the game and have invested a
                            significant amount of your wealth in RAM chips,
                            then you don't know which way the price is going
                            or when.
                            
                            Extending that even further: people complaining
                            about RAM prices being so high, and moaning that
                            they bought less RAM because of it are actually
                            signaling through action that they think that
                            prices will go down or have leveled off. Anyone who
                            believes that sticks of DDR5 RAM will continue the
                            trend should be cleaning out Amazon, Best Buy and
                            Newegg since the price will never be lower than
                            today.
                            
                            The distinct lack of serious people saying "I
                            told ya so" with receipts, combined with the lack
                            of people hoarding RAM to sell later is a good
                            indirect signal that no one knows what is happening
                            in the near term.
       
                              inferiorhuman wrote 16 hours 38 min ago:
                              > I can't be more clear: specificity around
                              predicting the future is close to impossible.
                              
                              And I can't be more clear: a single entity bought
                              more than 70% of the wafer production for the
                              next year.  That's across all types of memory
                              modules.  That will increase prices.
                              
                              > people complaining about RAM prices being so
                              high, and moaning that they bought less RAM
                              because of it are actually signaling through
                              action that they think that prices will go down
                              or have leveled off
                              
                              No, no they're not.  They're saying nothing about
                              what they think future prices will be.
       
                  heavyset_go wrote 1 day ago:
                  Home calculators are as cheap as they've ever been, but this era
                  of computing is out of reach for the majority of people.
                  
                  The analogous PC for this era requires a large amount of high
                  speed memory and specialized inference hardware.
       
                    platevoltage wrote 1 day ago:
                    No it doesn't. The majority of people aren't trying to run
                    Ollama on their personal computers.
       
                    dghlsakjg wrote 1 day ago:
                    What regular home workload are you thinking of that the
                    computer I described is incapable of?
                    
                    You can call a computer a calculator, but that doesn't
                    make it a calculator.
                    
                    Can they run SOTA LLMs? No. Can they run smaller, yet still
                    capable LLMs? Yes.
                    
                    However, I don't think that the ability to run SOTA LLMs
                    is a reasonable expectation for "a computer in every
                    home" just a few years into that software category even
                    existing.
       
                      buu700 wrote 1 day ago:
                      It's kind of funny to see "a computer in every home"
                      invoked when we're talking about the equivalent of ~$100
                      buying a non-trivial percentage of all computational
                      power in existence at the time of the quote. By the
                      standards of that time, we don't just have a computer in
                      every home, we have a supercomputer in every pocket.
       
                    atonse wrote 1 day ago:
                    You can have access to a supercomputer for pennies,
                    internet access for very little money, and even an m4 Mac
                    mini for $500. You can have a raspberry pi computer for
                    even less. And buy a monitor for a couple hundred dollars.
                    
                    I feel like you're twisting the goalposts to make your
                    point that it has to be local compute to have access to AI.
                    Why does it need to be local?
                    
                    Update: I take it back. You can get access to AI for free.
       
        novok wrote 1 day ago:
        Now we need some hardware that is rackmount friendly, an OS that is not
        fiddly as hell to manage in a data center or headless server and we are
        off to the races! And no, custom racks are not 'rackmount friendly'.
       
          joeframbach wrote 1 day ago:
          So, the Powerbook Duo Dock?
       
        jeffbee wrote 1 day ago:
        Very cool. It requires a fully-connected mesh so the scaling limit here
        would seem to be 6 Mac Studio M3 Ultra, up to 3TB of unified memory to
        work with.
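        
        A minimal sketch of that scaling limit. In a fully connected mesh every
        machine needs a direct cable to every other machine, so a box with p
        usable ports can be one of at most p + 1 nodes. The port count of 5 is
        an assumption, chosen because it reproduces the figure of 6 above; the
        512 GB per machine is the top M3 Ultra configuration mentioned
        elsewhere in the thread. Function names are made up for illustration.

```python
def max_full_mesh_nodes(ports_per_node: int) -> int:
    """Largest cluster where each node links directly to every other node."""
    return ports_per_node + 1


def full_mesh_links(nodes: int) -> int:
    """Number of point-to-point cables in a full mesh of `nodes` machines."""
    return nodes * (nodes - 1) // 2


def total_memory_gb(nodes: int, gb_per_node: int) -> int:
    """Aggregate unified memory across the cluster."""
    return nodes * gb_per_node


PORTS_PER_NODE = 5   # assumption: usable Thunderbolt ports per machine
GB_PER_NODE = 512    # top M3 Ultra configuration, per the thread

nodes = max_full_mesh_nodes(PORTS_PER_NODE)
cables = full_mesh_links(nodes)
memory = total_memory_gb(nodes, GB_PER_NODE)

print(f"{nodes} nodes, {cables} cables, {memory} GB ({memory / 1024:.0f} TB)")
```

        Cable count grows quadratically with node count, which is why a
        switch would matter for anything larger.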
       
          PunchyHamster wrote 1 day ago:
          I'm sure someone will figure out how to make a Thunderbolt
          switch/router
       
            huslage wrote 1 day ago:
            I don't believe the standard supports such a thing. But I wonder if
            TB6 will.
       
              kmeisthax wrote 1 day ago:
              RDMA is a networking standard, it's supposed to be switched. The
              reason why it's being done over Thunderbolt is that it's the only
              cheap/prosumer I/O standard with enough bandwidth to make this
              work. Like, 100Gbit Ethernet cards are several hundred dollars
              minimum, for two ports, and you have to deal with SFP+ cabling.
              Thunderbolt is just way nicer[0].
              
              The way this capability is exposed in the OS is that the
              computers negotiate an Ethernet bridge on top of the TB link. I
              suspect they're actually exposing PCIe Ethernet NICs to each
              other, but I'm not sure. But either way, a "Thunderbolt router"
              would just be a computer with a shitton of USB-C ports (in the
              same way that an "Ethernet router" is just a computer with a
              shitton of Ethernet ports). I suspect the biggest hurdle would
              actually just be sourcing an SoC with a lot of switching fabric
              but not a lot of compute. Like, you'd need Threadripper levels of
              connectivity but with like, one or two actual CPU cores.
              
              [0] Like, last time I had to swap work laptops, I just plugged a
              TB cable between them and did an `rsync`.
       
                bleepblap wrote 1 day ago:
                I think you might be swapping RDMA with RoCE - RDMA can happen
                entirely within a single node. For example between an NVME and
                a GPU.
       
                  wmf wrote 1 day ago:
                  Within a single node it's just called DMA. RDMA is DMA over a
                  network and RoCE is RDMA over Ethernet.
       
                    bleepblap wrote 1 day ago:
                    Sorry, but it certainly isn't-- [1] The "R" in RDMA means
                    there are multiple DMA controllers who can "transparently"
                    share address spaces. You can certainly share address
                    spaces across nodes with RoCE or Infiniband, but that's a
                    layer on top
                    
 (HTM)              [1]: https://docs.nvidia.com/cuda/gpudirect-rdma/index....
       
                      wtallis wrote 1 day ago:
                      I don't know why that NVIDIA document is wrong, but the
                      established term for doing DMA from e.g. an NVMe SSD to a
                      GPU within a single system without the CPU initiating the
                      transfer is peer to peer DMA. RDMA is when your data
                      leaves the local machine's PCIe fabric.
       
                      wmf wrote 1 day ago:
                      I'm going to agree to disagree with Nvidia here.
       
        pstuart wrote 1 day ago:
        I imagine that M5 Ultra with Thunderbolt 5 could be a decent contender
        for building plug and play AI clusters. Not cheap, but neither is
        Nvidia.
       
          baq wrote 1 day ago:
          at current memory prices today's cheap is yesterday's obscenely
          expensive - Apple's current RAM upgrade prices are cheap
       
          whimsicalism wrote 1 day ago:
          nvidia is absolutely cheaper per flop
       
            adastra22 wrote 1 day ago:
            FLOPS are not what matters here.
       
              whimsicalism wrote 1 day ago:
              also cheaper memory bandwidth. where are you claiming that M5
              wins?
       
                Infernal wrote 1 day ago:
                I'm not sure where else you can get a half TB of 800GB/s memory
                for < $10k. (Though that's the M3 Ultra, don't know about the
                M5). Is there something competitive in the nvidia ecosystem?
       
                  whimsicalism wrote 1 day ago:
                  I wasn't aware that M3 Ultra offered a half terabyte of
                  unified memory, but an RTX5090 has double that bandwidth and
                  that's before we even get into B200 (~8TB/s).
       
                    650REDHAIR wrote 1 day ago:
                    You could get x1 M3 Ultra w/ 512gb of unified ram for the
                    price of x2 RTX 5090 totaling 64gb of vram not including
                    the cost of a rig capable of utilizing x2 RTX 5090.
       
                      bigyabai wrote 1 day ago:
                      Which would almost be great, if the M3 Ultra's GPU wasn't
                      ~3x weaker than a single 5090: [1]
                      
                      I don't think I can recommend the Mac Studio for AI
                      inference until the M5 comes out. And even then, it
                      remains to be seen how fast those GPUs are or if we even
                      get an Ultra chip at all.
                      
 (HTM)                [1]: https://browser.geekbench.com/opencl-benchmarks
       
                        adastra22 wrote 1 day ago:
                        Again, memory bandwidth is pretty much all that matters
                        here. During inference or training the CUDA cores of
                        retail GPUs are like 15% utilized.
       
                          my123 wrote 23 hours 11 min ago:
                          Not for prompt processing. Current Macs are really
                          not great at long contexts
       
            FlacksonFive wrote 1 day ago:
            To acquire, maybe, but to power?
       
              whimsicalism wrote 1 day ago:
              machine capex currently dominates power
       
                amazingman wrote 1 day ago:
                Sounds like an ecosystem ripe for horizontally scaling cheaper
                hardware.
       
                  crote wrote 1 day ago:
                  If I understand correctly, a big problem is that the
                  calculation isn't embarrassingly parallel: the various chunks
                  are not independent, so you need to do a lot of IO to get the
                  results from step N from your neighbours to calculate step
                  N+1.
                  
                  Using more smaller nodes means your cross-node IO is going to
                  explode. You might save money on your compute hardware, but I
                  wouldn't be surprised if you'd end up with an even greater
                  cost increase on the network hardware side.
       
        simonw wrote 1 day ago:
        I follow the MLX team on Twitter and they sometimes post about using
        MLX on two or more joined together Macs to run models that need more
        than 512GB of RAM.
        
        A couple of examples:
        
        Kimi K2 Thinking (1 trillion parameters): [1]
        DeepSeek R1 (671B): [2] - that one came with setup instructions in a
        Gist: [3]
        
 (HTM)  [1]: https://x.com/awnihannun/status/1986601104130646266
 (HTM)  [2]: https://x.com/awnihannun/status/1881915166922863045
 (HTM)  [3]: https://gist.github.com/awni/ec071fd27940698edd14a4191855bba6
       
          anemll wrote 1 day ago:
          Tensor Parallel test with RDMA last week [1] Note fast sync
          workaround
          
 (HTM)    [1]: https://x.com/anemll/status/1996349871260107102
       
          CamperBob2 wrote 1 day ago:
          Almost the most impressive thing about that is the power consumption.
           ~50 watts for both of them?  Am I reading it wrong?
       
            wmf wrote 1 day ago:
            Yeah, two Mac Studios is going to be ~400 W.
       
              m-s-y wrote 1 day ago:
              Can confirm. My M3 Ultra tops out at 210W when ComfyUI or ollama
              is running flat out. Confirmed via smart plug.
       
              CamperBob2 wrote 1 day ago:
              What am I missing? [1] (Edit: interesting, thanks.  So the
              underlying OS APIs that supply the power-consumption figures
              reported by asitop are just outright broken.  The discrepancy is
              far too large to chalk up to static power losses or die-specific
              calibration factors that the video talks about.)
              
 (HTM)        [1]: https://i.imgur.com/YpcnlCH.png
       
                wmf wrote 1 day ago:
                
                
 (HTM)          [1]: https://www.youtube.com/watch?v=zCkbVLqUedg
       
          andy99 wrote 1 day ago:
          I'm hoping this isn't as attractive as it sounds for
          non-hobbyists because the performance won't scale well to parallel
          workloads or even context processing, where parallelism can be better
          used.
          
          Hopefully this makes it really nice for people that want to
          experiment with LLMs and have a local model but means well funded
          companies won't have any reason to grab them all vs GPUs.
       
            willtemperley wrote 1 day ago:
            I think it's going to be great for smaller shops that want on
            premise private cloud. I'm hoping this will be a win for
            in-memory analytics on macOS.
       
            api wrote 1 day ago:
            No way buying a bunch of minis could be as efficient as much denser
            GPU racks. You have to consider all the logistics and power draw,
            and high end nVidia stuff and probably even AMD stuff is faster
            than M series GPUs.
            
            What this does offer is a good alternative to GPUs for smaller
            scale use and research. At small scale it's probably competitive.
            
            Apple wants to dominate the pro and serious amateur niches. Feels
            like they're realizing that local LLMs and AI research are part of
            that: the kind of thing end users would want big machines to do.
       
              FuckButtons wrote 1 day ago:
              Power draw? An entire Mac Pro running flat out uses less power
              than one 5090.
              If you have a workload that needs a huge memory footprint then
              the TCO of the Macs, even with their markup, may be lower.
       
              gumboshoes wrote 1 day ago:
              Exactly: The AI appliance market. A new kind of home or
              small-business server.
       
                jabbywocker wrote 1 day ago:
                I'm expecting Apple to release a new Mac Pro in the next
                couple years whose main marketing angle is exactly this
       
                  alwillis wrote 23 hours 27 min ago:
                  > I'm expecting Apple to release a new Mac Pro in the next
                  couple years
                  
                  I think Apple is done with expansion slots, etc.
                  
                  You'll likely see M5 Mac Studios fairly soon.
       
                    jabbywocker wrote 15 hours 57 min ago:
                    I'm not saying a Mac Pro with expansion slots, I'm
                    saying a Mac Pro whose marketing angle is locally running
                    AI models. A hungry market that would accept moderate
                    performance and is already used to bloated price tags has
                    to have them salivating.
                    
                    I think the hold up here is whether TSMC can actually
                    deliver the M5 Pro/Ultra and whether the MLX team can give
                    them a usable platform.
       
                  pjmlp wrote 1 day ago:
                  I fear they no longer care about the workstation market; even
                  the folks at the ATP podcast are on the verge of accepting it.
       
                  api wrote 1 day ago:
                  It's really the only common reason to buy a machine that
                  big these days. I could see a Mac Pro with a huge GPU and up
                  to a terabyte of RAM.
                  
                  I guess there are other kinds of scientific simulation, very
                  large dev work, etc., but those things are quite a bit
                  more niche.
       
                  firecall wrote 1 day ago:
                  Seems like it could be a thing.
                  
                  Also, I'm curious, in case anyone who knows reads this
                  comment:
                  
                  Apple say they can't get the performance they want out of
                  discrete GPUs.
                  
                  Fair enough. But yet nVidia becomes the most valuable company
                  in the world selling GPUs.
                  
                  So...
                  
                  Now I get that Apple's use case is essentially sealed consumer
                  devices built with power consumption and performance
                  tradeoffs in mind.
                  
                  But could Apple use its Apple Silicon tech to build a Mac Pro
                  with its own expandable GPU options?
                  
                  Or even other brand GPUs, knowing they would be used for AI
                  research etc...
                  If Apple ever make friends with nVidia again of course :-/
                  
                  What we know of Tim Cook's Apple is that it doesn't like to
                  leave money on the table, and clearly they are right now!
       
                    jabbywocker wrote 1 day ago:
                    There have been rumors of Apple working on M-chips that have
                    the GPU and CPU as discrete chiplets. The original rumor
                    said this would happen with the M5 Pro, so it's
                    potentially on the roadmap.
                    
                    Theoretically they could farm out the GPU to another
                    company but it seems like they're set on owning all of
                    the hardware designs.
       
                      storus wrote 1 day ago:
                      TSMC has a new tech that allows seamless integration of
                      mini chiplets, i.e. you can add as many CPU/GPU cores in
                      mini chiplets as you wish and glue them seamlessly
                      together, at least in theory. The rumor is that TSMC had
                      some issues with it which is why M5P and M5M are delayed.
       
                      nntwozz wrote 1 day ago:
                      Apple always strives for complete vertical integration.
                      
                      SJ loved to quote Alan Kay:
                      
                      "People who are really serious about software should make
                      their own hardware."
                      
                      Qualcomm are the latest on the chopping block, history
                      repeating itself.
                      
                      If I were a betting man I'd say Apple's never going back.
       
                        jabbywocker wrote 16 hours 5 min ago:
                        Yeah outside of TSMC, I don't see them ever going
                        back to having a hardware partner.
       
            bigyabai wrote 1 day ago:
            The lack of official Linux/BSD support is enough to make it DOA for
            any serious large-scale deployment. Until Apple figures out what
            they're doing on that front, you've got nothing to worry about.
       
              mjlee wrote 1 day ago:
              Why? AWS manages to do it ( [1] ). Smaller companies too - [2]
              
              Having used both professionally, once you understand how to drive
              Apple's MDM, Mac OS is as easy to sysadmin as Linux. I'll grant
              you it's a steep learning curve, but so is Linux/BSD if you're
              coming at it fresh.
              
              In certain ways it's easier - if you buy a device through Apple
              Business you can have it so that you (or someone working in a
              remote location) can take it out of the shrink wrap, connect it
              to the internet, and get a configured and managed device
              automatically. No PXE boot, no disk imaging, no having it shipped
              to you to configure and ship out again. If you've done it
              properly the user can't interrupt/corrupt the process.
              
              The only thing they're really missing is an iLO. I can imagine
              how AWS solved that, but I'd love to know.
              
 (HTM)        [1]: https://aws.amazon.com/ec2/instance-types/mac/
 (HTM)        [2]: https://macstadium.com
       
                bigyabai wrote 15 hours 5 min ago:
                Where in the world are you working where MDM is the
                limiting factor on Linux deployments? North Korea?
                
                Macs are a minority in the datacenter even compared to Windows
                server. The concept of a datacenter Mac would disappear
                completely if Apple let free OSes sign macOS/iOS apps.
       
                  mjlee wrote 8 hours 44 min ago:
                  I'm talking about using MDM with Mac OS (to take advantage
                  of Apple Silicon, not licensing) in contrast to the tools we
                  already have with other OSes. Probably you could do it to
                  achieve a large scale on-prem Linux deployment; fortunately
                  I've never tried.
       
              Eggpants wrote 1 day ago:
              Not sure I understand, Mac OS is BSD based.
              
 (HTM)        [1]: https://en.wikipedia.org/wiki/Darwin_(operating_system)
       
                bigyabai wrote 1 day ago:
                macOS is XNU-based. There is BSD code that runs in the
                microkernel level and BSD tools in the userland, but the kernel
                does not resemble BSD's architecture or adopt BSD's license.
                
                This is an issue for some industry-standard software like CUDA,
                which does provide BSD drivers with ARM support that just never
                get adopted by Apple:
                
 (HTM)          [1]: https://www.nvidia.com/en-us/drivers/unix/
       
                  7e wrote 1 day ago:
                  If there were TCO advantages with this setup, CUDA would not
                  be a blocker.
       
                    bigyabai wrote 1 day ago:
                    CUDA's just one example; there's a lot of hardware support
                    on the BSDs that Apple doesn't want to inherit.
       
                      ngcc_hk wrote 1 day ago:
                      Why maintain other code and carry the baggage?
       
                        bigyabai wrote 23 hours 51 min ago:
                        Because Apple already does...? There's still PowerPC
                        and MIPS code that runs in macOS. Asking for CUDA
                        compatibility is not somehow too hard for the
                        trillion-dollar megacorp to handle.
       
            codazoda wrote 1 day ago:
            I haven't looked yet but I might be a candidate for something
            like this, maybe. I'm RAM constrained and, to a lesser extent,
            CPU constrained. It would be nice to offload some of that. That
            said, I don't think I would buy a cluster of Macs for that. I'd
            probably buy a machine that can take a GPU.
       
              ChrisMarshallNY wrote 1 day ago:
              I'm not particularly interested in training models, but it
              would be nice to have eGPUs again. When Apple Silicon came out,
              support for them dried up. I sold my old BlackMagic eGPU.
              
              That said, the need for them also faded. The new chips have
              performance every bit as good as the eGPU-enhanced Intel chips.
       
                andy_ppp wrote 1 day ago:
                eGPU with an Apple accelerator with a bunch of RAM and GPU
                cores could be really interesting honestly. I'm pretty sure
                they are capable of designing something very competitive,
                especially in terms of performance per watt.
       
                  sroussey wrote 18 hours 23 min ago:
                  Really, that's a place for the Mac Pro: slide in SoC with
                  RAM modules / blades. Put 4, 8, 16 Ultra chips in one
                  machine.
       
                    andy_ppp wrote 6 hours 20 min ago:
                    You honestly don't need extra CPUs in this system at some
                    point do you?
       
                      sroussey wrote 1 hour 49 min ago:
                      They are inseparable for Apple: CPUs/GPUs/memory. They
                      can use chipsets to tweak ratios, but I doubt they will
                      change the underlying module format, where everything is
                      together.
                      
                      My suggestion is to accept that format and just provide a
                      way to network them at a low level via PCI or better.
       
          awnihannun wrote 1 day ago:
          For a bit more context, those posts are using pipeline parallelism.
          For N machines put the first L/N layers on machine 1, next L/N layers
          on machine 2, etc. With pipeline parallelism you don't get a speedup
          over one machine - it just buys you the ability to use larger models
          than you can fit on a single machine.
          
          The release in Tahoe 26.2 will enable us to do fast tensor
          parallelism in MLX. Each layer of the model is sharded across all
          machines. With this type of parallelism you can get close to N-times
          faster for N machines. The main challenge is latency since you have
          to do much more frequent communication.
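          
          A minimal sketch of the layer split described above, in plain
          Python. The function name pipeline_layer_ranges is hypothetical and
          is not part of MLX; it only shows how L layers could be cut into
          contiguous blocks for N machines. Giving any remainder to the first
          machines is an assumption, since the comment does not say how
          uneven splits are handled.

```python
def pipeline_layer_ranges(num_layers, num_machines):
    """Split layers 0..num_layers-1 into contiguous per-machine ranges.

    Hypothetical helper for illustration only; not an MLX API.
    """
    if num_machines < 1 or num_layers < num_machines:
        raise ValueError("need at least one layer per machine")
    base, extra = divmod(num_layers, num_machines)
    ranges = []
    start = 0
    for rank in range(num_machines):
        # Assumption: the first `extra` machines take one additional layer.
        count = base + (1 if rank < extra else 0)
        ranges.append(range(start, start + count))
        start += count
    return ranges
```

          With 8 layers on 4 machines, each machine holds 2 consecutive
          layers. A token still passes through the machines one after
          another, which is why this layout adds capacity rather than speed.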
       
            aimanbenbaha wrote 1 day ago:
            Exo-Labs is an open source project that allows this too (pipeline
            parallelism, I mean, not the latter). It's device agnostic, meaning
            you can daisy-chain anything you have that has memory and the
            implementation will intelligently shard model layers across them.
            It's slow, but it scales linearly with concurrent requests.
            
            Exo-Labs:
            
 (HTM)      [1]: https://github.com/exo-explore/exo
       
            dpe82 wrote 1 day ago:
            > The main challenge is latency since you have to do much more
            frequent communication.
            
            Earlier this year I experimented with building a cluster to do
            tensor parallelism across large cache CPUs (AMD EPYC 7773X have
            768 MB of L3). My thought was to keep an entire model in SRAM and
            take advantage of the crazy memory bandwidth between CPU cores and
            their cache, and use Infiniband between nodes for the
            scatter/gather operations.
            
            Turns out the sum of intra-core latency and PCIe latency absolutely
            dominates. The Infiniband fabric is damn fast once you get data to
            it, but getting it there quickly is a struggle. CXL would help but
            I didn't have the budget for newer hardware. Perhaps modern Apple
            hardware is better for this than x86 stuff.
       
              wmf wrote 1 day ago:
              That's how Groq works. A cluster of LPUv2s would probably be
              faster and cheaper than an Infiniband cluster of Epycs.
       
                dpe82 wrote 1 day ago:
                Yeah I'm familiar; I was hoping I could do something related on
                previous generation commodity(ish) hardware. It didn't work but
                I learned a ton.
       
                fooblaster wrote 1 day ago:
                what is an lpuv2
       
                  wmf wrote 1 day ago:
                  The chip that Groq makes.
       
            liuliu wrote 1 day ago:
            But that's only for prefilling right? Or is it beneficial for
            decoding too (I guess you can do KV lookup on shards, not sure how
            much speed-up that will be though).
       
              monster_truck wrote 1 day ago:
              Even if it wasn't outright beneficial for decoding by itself, it
              would still allow you to connect a second machine running a
              smaller, more heavily quantized version of the model for
              speculative decoding which can net you >4x without quality loss
       
              zackangelo wrote 1 day ago:
              No, you use tensor parallelism in both cases.
              
              The way it typically works in an attention block is: smaller
              portions of the Q, K and V linear layers are assigned to each
              node and are processed independently. Attention, RoPE, norm, etc.
              are run on the node-specific output of that. Then, when the
              output linear layer is applied, an "all reduce" is computed which
              combines the output of all the nodes.
              
              EDIT: just realized it wasn't clear -- this means that each node
              ends up holding a portion of the KV cache specific to its KV
              tensor shards. This can change based on the specific style of
              attention (e.g., in GQA where there are fewer KV heads than ranks
              you end up having to do some replication etc)
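              
              A toy illustration of that flow, assuming small integer matrices
              in pure Python. The attention step itself is left out, and the
              "all reduce" is simulated with a local sum instead of real
              network traffic. All function names are made up for this
              example. The input projection is split by columns, the output
              projection by the matching rows, and the per-node partial
              results are summed.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def all_reduce_sum(partials):
    """Element-wise sum of same-shaped matrices, one per node (simulated)."""
    rows, cols = len(partials[0]), len(partials[0][0])
    return [[sum(p[i][j] for p in partials) for j in range(cols)]
            for i in range(rows)]


def tensor_parallel_forward(x, w_in, w_out, num_nodes):
    """Run x @ w_in @ w_out with the hidden width sharded over num_nodes."""
    width = len(w_in[0])
    if width % num_nodes != 0:
        raise ValueError("hidden width must divide evenly across nodes")
    shard = width // num_nodes
    partials = []
    for rank in range(num_nodes):
        lo, hi = rank * shard, (rank + 1) * shard
        # Each node owns a column slice of the input projection...
        hidden = matmul(x, [row[lo:hi] for row in w_in])
        # ...and the matching row slice of the output projection.
        partials.append(matmul(hidden, w_out[lo:hi]))
    # Summing the partial outputs plays the role of the all-reduce.
    return all_reduce_sum(partials)
```

              For any x, w_in and w_out with a hidden width divisible by the
              node count, the sharded result equals matmul(matmul(x, w_in),
              w_out), so only the final sum needs cross-node communication.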
       
                liuliu wrote 1 day ago:
                I usually call it "head parallelism" (which is a type of tensor
                parallelism, but parallelized for small clusters, and specific
                to attention). That is what you described: sharding the input
                tensor by number of heads and sending it to the respective Q,
                K, V shard. Each shard can do the Q / K / V projection, RoPE,
                QK norm, whatever, and attention all inside that particular
                shard. The out projection is done in that shard too, but then
                an all-reduce sum among the shards is needed to get the final
                out projection broadcast to every participating shard, which
                then carries on to do whatever else by itself.
                
                What I am asking, however, is whether that will speed up
                decoding as linearly as it would for prefilling.
       
                  awnihannun wrote 1 day ago:
                  Right, my comment was mostly about decoding speed. For
                  prefill you can get a speed up but there you are less latency
                  bound.
                  
                  In our benchmarks with MLX / mlx-lm it's as much as 3.5x for
                  token generation (decoding) at batch size 1 over 4 machines.
                  In that case you are memory bandwidth bound so sharding the
                  model and KV cache 4-ways means each machine only needs to
                  access 1/4th as much memory.
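                  
                  A back-of-the-envelope version of that argument, as a
                  sketch. It assumes decoding speed is set only by how many
                  bytes each machine must read per token, which ignores
                  communication latency. The function names and the example
                  figures (a 400 GB model, 800 GB/s of bandwidth) are
                  hypothetical, not measurements.

```python
def ideal_decode_tokens_per_sec(model_bytes, bandwidth_bytes_per_sec,
                                num_machines=1):
    """Upper bound on tokens/s if every token reads each machine's shard once."""
    bytes_per_machine = model_bytes / num_machines
    return bandwidth_bytes_per_sec / bytes_per_machine


def scaling_efficiency(observed_speedup, num_machines):
    """Fraction of the ideal N-times speedup that was actually reached."""
    return observed_speedup / num_machines


# Hypothetical figures, for illustration only.
single = ideal_decode_tokens_per_sec(400e9, 800e9, num_machines=1)
sharded = ideal_decode_tokens_per_sec(400e9, 800e9, num_machines=4)
ideal_speedup = sharded / single  # 4 machines each read 1/4 of the bytes
```

                  Under this model the ideal speedup on 4 machines is 4x, so
                  the 3.5x quoted above corresponds to
                  scaling_efficiency(3.5, 4) = 0.875, i.e. 87.5% of ideal.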
       
                    liuliu wrote 1 day ago:
                    Oh! That's great to hear. Congrats! Now, I want to get the
                    all-to-all primitives ready in s4nnc...
       
        nodesocket wrote 1 day ago:
        Can we get proper HDR support first in macOS? If I enable HDR on my LG
        OLED monitor it looks completely washed out and blacks are grey.
        Windows 11 HDR works fine.
       
          m-ack-toddler wrote 1 day ago:
          AI is arguably more important than whatever gaming gimmick you're
          talking about.
       
          Razengan wrote 1 day ago:
          Really? I thought it's always been that HDR was notorious on Windows,
          hopeless on Linux, and only really worked in a plug-and-play manner
          on Mac, unless your display has an incorrect profile or something.
          
 (HTM)    [1]: https://www.youtube.com/shorts/sx9TUNv80RE
       
            masspro wrote 1 day ago:
            macOS does wash out SDR content in HDR mode specifically on
            non-Apple monitors. An HDR video playing in windowed mode will look
            fine but all the UI around it has black and white levels very close
            to grey.
            
            Edit: to be clear, macOS itself (Cocoa elements) is all SDR content
            and thus washed out.
       
              robflynn wrote 1 day ago:
              Oh, that explains why it looked so odd when I enabled HDR on my
              Studio.
       
              crazygringo wrote 1 day ago:
              Define "washed out"?
              
              The white and black levels of the UX are supposed to stay in SDR.
              That's a feature not a bug.
              
              If you mean the interface isn't bright enough, that's intended
              behavior.
              
              If the black point is somehow raised, then that's bizarre and
              definitely unintended behavior. And I honestly can't even imagine
              what could be causing that to happen. It does seem like it
              would have to be a serious macOS bug.
              
              You should post a photo of your monitor, comparing a black #000
              image in Preview with a pitch-black frame from a video. People
              edit HDR video on Macs, and I've never heard of this happening
              before.
       
              adastra22 wrote 1 day ago:
              Huh, so that's why HDR looks like shit on my Mac Studio.
       
              Starmina wrote 1 day ago:
              That's intended behavior for monitors limited in peak brightness
       
                kmeisthax wrote 1 day ago:
                Actually, intended behavior in general. Even on their own
                displays the UI looks grey when HDR is playing.
                
                Which, personally, I find to be extremely ugly and gross and I
                do not understand why they thought this was a good idea.
       
                masspro wrote 1 day ago:
                That's the statement I found last time I went down this
                rabbit hole, that they don't have physical brightness info
                for third-party displays so it just can't be done any better.
                But I don't understand how this can lead to making the black
                point terrible. Black should be the one color every emissive
                colorspace agrees on.
       
                nodesocket wrote 1 day ago:
                I don't think so. Windows 11 has an HDR calibration utility that
                allows you to adjust brightness and HDR and it maintains blacks
                being perfectly black (especially with my OLED). When I enable
                HDR on macOS whatever settings I try, including adjusting
                brightness and contrast on the monitor the blacks look
                completely washed out and grey. HDR DOES seem to work correctly
                on macOS but only if you use Mac displays.
       
            heavyset_go wrote 1 day ago:
            Works well on Linux, just toggle a checkmark in the settings.
       
       
 (DIR) <- back to front page