[HN Gopher] Hacked Nvidia 4090 GPU driver to enable P2P
___________________________________________________________________
Hacked Nvidia 4090 GPU driver to enable P2P
Author : nikitml
Score : 813 points
Date : 2024-04-12 09:27 UTC (2 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| jagrsw wrote:
| Was it George himself, or a person working for a bounty that was
| set up by tinycorp?
|
| Also, a question for those knowledgeable about the PCI subsys: it
| looked like something NVIDIA didn't care about, rather than
| something they actively wanted to prevent, no?
| mtlynch wrote:
| Commits are by geohot, so it looks like George himself.
| throw101010 wrote:
| I've seen him work on tinygrad on his Twitch livestream
| couple times, so more than likely him indeed.
| squarra wrote:
| He also documented his progress on the tinygrad discord
| throwaway8481 wrote:
| I feel like I should say something about discord not being a
| suitable replacement for a forum or bugtracker.
| guywhocodes wrote:
| We are talking about a literal monologue while poking at a
| driver for a few hours, this wasn't a huge project.
| toast0 wrote:
| PCI devices have always been able to read and write to the
| shared address space (subject to IOMMU); most frequently used
| for DMA to system RAM, but not limited to it.
|
| So, poking around to configure the device to put the whole VRAM
| in the address space is reasonable, subject to support for
| resizable BAR or just having a fixed size large enough BAR. And
| telling one card to read/write from an address that happens to
| be mapped to a different card's VRAM is also reasonable.
|
| I'd be interested to know if PCIe switching capacity will be a
| bottleneck, or if it'll just be the point-to-point links and
| VRAM that bottleneck. Saving a bounce through system RAM
| should help in either case, though.
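|
| For anyone who wants to see what their own setup reports: a
| minimal sketch (assuming a CUDA build of PyTorch and at least
| two GPUs) that simply asks the runtime whether it will allow
| peer access between device 0 and device 1. On a stock 4090
| driver this is expected to come back False, which is exactly
| what the patched driver changes.
|
|     import torch
|
|     # Ask the CUDA runtime whether GPU 0 may directly access
|     # GPU 1's memory over PCIe (peer-to-peer).
|     if torch.cuda.device_count() >= 2:
|         ok = torch.cuda.can_device_access_peer(0, 1)
|         print("P2P cuda:0 -> cuda:1:", ok)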
| namibj wrote:
| A fixed large BAR exists in some older accelerator cards, e.g.
| IIRC the MI50/MI60 from AMD (the data center variants of the
| Radeon VII, the first GPU with PCIe 4.0, also famous for
| dominating memory bandwidth until the RTX 40-series took that
| claim back; it had 16GB of HBM delivering 1TB/s of memory
| bandwidth).
|
| It's notably not compatible with some legacy boot processes
| and, IIRC, with 32-bit kernels in general, so consumer cards
| had to wait for resizable BAR to get the benefits of a large
| BAR: namely, direct flat memory mapping of VRAM, so CPUs and
| PCIe peers can read and write all of VRAM directly without
| dancing through a command interface with doorbell registers.
| AFAIK it also lets a GPU talk directly to NICs and NVMe drives
| by running the driver in GPU code (I'm not sure how/if they let
| you properly interact with doorbell registers, but polled
| io_uring as an ABI would be no problem; I wouldn't be surprised
| if some NIC firmware already allows offloading this).
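|
| A rough way to see whether a large/resizable BAR is actually in
| effect is to read the region sizes straight out of sysfs; a
| small sketch (the PCI address below is only a placeholder, use
| lspci to find your GPU's):
|
|     import pathlib
|
|     # Each line of the sysfs "resource" file is "start end
|     # flags" in hex, one line per PCI region (BARs and ROM).
|     dev = "0000:01:00.0"  # placeholder PCI address
|     res = pathlib.Path(f"/sys/bus/pci/devices/{dev}/resource")
|     for i, line in enumerate(res.read_text().splitlines()):
|         start, end, flags = (int(x, 16) for x in line.split())
|         if end:
|             size_gib = (end - start + 1) / 2**30
|             print(f"region {i}: {size_gib:.2f} GiB")
|
| With resizable BAR enabled on a 24GB card, one of the regions
| should cover the whole VRAM rather than the classic 256MB
| aperture.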
| jsheard wrote:
| It'll be nice while it lasts, until they start locking this down
| in the firmware instead on future architectures.
| mnau wrote:
| Sure, but that was something that was always going to happen.
|
| So it's better to have it at least for one generation instead
| of no generation.
| HPsquared wrote:
| Is this one of those features that's disabled on consumer cards
| for market segmentation?
| mvkel wrote:
| Sort of.
|
| An imperfect analogy: a small neighborhood of ~15 houses is
| under construction. Normally it might have a 200 kVA
| transformer sitting at the corner, which provides appropriate
| power from the grid.
|
| But there is a transformer shortage, so the contractor installs
| a commercial-grade 1,250 kVA transformer. It can power many
| more houses than required, so it's operating way under
| capacity.
|
| One day, a resident decides he wants to start a massive grow
| farm, and figures out how to activate that extra transformer
| capacity just for his house. That "activation" is what geohot
| found.
| bogwog wrote:
| That's a poor analogy. The feature is built in to the cards
| that consumers bought, but Nvidia is disabling it via
| software. That's why a hacked driver can enable it again. The
| resident in your analogy is just freeloading off the
| contractor's transformer.
|
| Nvidia does this so that customers that need that feature are
| forced to buy more expensive systems instead of building a
| solution with the cheaper "consumer-grade" cards targeted at
| gamers and enthusiasts.
| bpye wrote:
| This isn't even the first time a hacked driver has been
| used to unlock some HW feature -
| https://github.com/DualCoder/vgpu_unlock
| captcanuk wrote:
| There was also this https://hackaday.com/2013/03/18/hack-
| removes-firmware-crippl... using resistors, and a different one
| before that which used a graphite pencil lead to enable
| functionality.
| segfaultbuserr wrote:
| Except that in the computer hardware world, the 1250 kVA
| transformer was used not because of shortage, but because of
| the fact that making a 1250 kVA transformer on the existing
| production line and selling it as 200 kVA, is cheaper than
| creating a new production line separately for making 200 kVA
| transformers.
| m3kw9 wrote:
| Where is the hack in this analogy
| pixl97 wrote:
| Taking off the users panel on the side of their house and
| flipping it to 'lots of power' when that option had
| previously been covered up by the panel interface.
| cesarb wrote:
| Except that this "lots of power" option does not exist.
| What limits the amount of power used is the circuit
| breakers and fuses on the panel, which protect the wiring
| against overheating by tripping when too much power is
| being used (or when there's a short circuit). The
| resident in this analogy would need to ensure that not
| only the transformer, but also the wiring leading to the
| transformer, can handle the higher current, and replace
| the circuit breaker or fuses.
|
| And then everyone on that neighborhood would still lose
| power, because there's also a set of fuses _upstream_ of
| the transformer, and they would be sized for the correct
| current limit even when the transformer is oversized.
| These fuses also protect the wiring upstream of the
| transformer, and their sizing and timings is coordinated
| with fuses or breakers even further upstream so that any
| fault is cleared by the protective device closest to the
| fault.
| stavros wrote:
| There are analogies, and then there's this.
| Dylan16807 wrote:
| They're pointing out how the analogy doesn't work, so
| it's fine.
|
| Nobody's taking more than their share of any resources
| when they enable this feature.
| hatthew wrote:
| And then because this residential neighborhood now has
| commercial grade power, the other lots that were going to
| have residential houses built on them instead get combined
| into a factory, and the people who want to buy new houses in
| town have to pay more since residential supply was cut in
| half.
| HPsquared wrote:
| Excellent analogy of the other side of this issue.
| zten wrote:
| This represents pretty well how gamers (residential buyers)
| are going to feel when the next generation of consumer
| cards are scooped up for AI.
| cesarb wrote:
| That's a bad analogy, because in your example, the consumer
| is using more of a shared resource (the available
| transformer, wiring, and generation capacity). In the case of
| the driver for a local GPU card, there's no sharing.
|
| A better example would be one in which the consumer has a
| dedicated transformer. For instance, a small commercial
| building which directly receives 3-phase 13.8 kV power; these
| are very common around here, and these buildings have their
| own individual transformers to lower the voltage to 3-phase
| 127V/220V.
| rustcleaner wrote:
| I am sure many will disagree-vote me, but I want to see this
| practice in consumer devices either banned or very heavily
| taxed.
| xandrius wrote:
| You're right. Especially because you didn't present your
| reasons.
| yogorenapan wrote:
| Curious as to your reasoning.
| wmf wrote:
| Of course power users want an end to price discrimination
| because it benefits them... at a cost of more expensive
| products for the masses.
| imtringued wrote:
| Well, they have zero incentives to implement and test this
| feature for consumer GPUs. Multi GPU setups never really worked
| that well for gaming.
| llm_trw wrote:
| Skimming the readme this is p2p over PCIe and not NVLink in case
| anyone was wondering.
| klohto wrote:
| AFAIK the 4090 doesn't support PCIe 5.0, so you are limited to
| 4.0 speeds. Still an improvement.
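|
| Back-of-the-envelope numbers for what that cap means (raw x16
| link rate, per direction, before packet/protocol overhead):
|
|     # GT/s per lane * 128b/130b encoding * 16 lanes, in GB/s
|     for gen, gts in (("PCIe 4.0", 16), ("PCIe 5.0", 32)):
|         gbs = gts * (128 / 130) * 16 / 8
|         print(f"{gen} x16: ~{gbs:.1f} GB/s each way")
|
| So roughly 31.5 GB/s vs 63 GB/s of raw link bandwidth per
| direction.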
| formerly_proven wrote:
| RTX 40 doesn't have NVLink on the PCBs, though the silicon has
| to have it, since some sibling cards support it. I'd expect it
| to be fused off.
| HeatrayEnjoyer wrote:
| How to unfuse it?
| magicalhippo wrote:
| I don't know about this particular scenario, but typically
| fuses are small wires or resistors that are overloaded so
| they irreversibly break the connection. Hence the name.
|
| Either done during manufacture or as a one-time
| programming[1][2].
|
| Though reprogrammable configuration bits are sometimes also
| called fuse bits. The Atmega328P of Arduino fame uses flash[3]
| for its "fuses".
|
| [1]: https://www.nxp.com/docs/en/application-
| note/AN4536.pdf
|
| [2] https://www.intel.com/programmable/technical-
| pdfs/654254.pdf
|
| [3]: https://ww1.microchip.com/downloads/en/DeviceDoc/Atmel
| -7810-...
| HeatrayEnjoyer wrote:
| Wires, flash, and resistors can be replaced
| mschuster91 wrote:
| Not at the scale we're talking about here. These
| structures are _very_ thin, far thinner than bond wires, which
| are about the largest structure size you can handle without a
| very, very specialized lab. And you'd need to unsolder the
| chip, de-cap it, hope the fuse wire you're trying to override
| is on the top layer, and that you can re-cap the chip
| afterwards and successfully solder it back on again.
|
| This may be workable for a nation state or a billion
| dollar megacorp, but not for your average hobbyist
| hacker.
| z33k wrote:
| You're absolutely right. In fact, some billion dollar
| megacorps use fuses as a part of hardware DRM for this
| reason.
| magicalhippo wrote:
| These are part of the chip, thus microscopic and very
| inaccessible.
|
| There are some good images here[1] of various such fuses,
| both pristine and blown. Here's[2] a more detailed
| writeup examining one type.
|
| It's not something you fix with a soldering iron.
|
| [1]: https://semiengineering.com/the-benefits-of-
| antifuse-otp/
|
| [2]: https://www.eetimes.com/a-look-at-metal-efuses/
| metadat wrote:
| I miss the days when you could do things like connecting
| the L5 bridges on the surface of the AMD Athlon XP
| Palomino [0] CPU packaging with a silver trace pen to
| transform them into fancier SMP multi-socket capable
| Athlon MPs, e.g. Barton [1].
|
| https://arstechnica.com/civis/threads/how-did-you-unlock-
| you...
|
| Some folks even got this working with only a pencil,
| haha.
|
| Nowadays, silicon designers have found highly effective
| ways to close off these hacking avenues, with techniques
| such as the microscopic, nearly invisible, and, as the
| parent post mentions, totally inaccessible e-fuses.
|
| [0] https://upload.wikimedia.org/wikipedia/commons/7/7c/K
| L_AMD_A...
|
| [1] https://en.wikichip.org/w/images/a/af/Atlhon_MP_%28.1
| 3_micro...
| aceazzameen wrote:
| I'm one of those folks that did it with a pencil. Haha.
| Maybe I was lucky? That was my first overclock and it ran
| pretty well.
| mepian wrote:
| Use a Focused Ion Beam instrument.
| llm_trw wrote:
| A cursory google search suggests that it's been removed at
| the silicon level.
| steeve wrote:
| Some do: https://wccftech.com/gigabyte-geforce-rtx-4090-pcb-
| shows-lef...
| jsheard wrote:
| I'm pretty sure that's just a remnant of a 3090 PCB design
| that was adapted into a 4090 PCB design by the vendor. None
| of the cards based on the AD102 chip have functional
| NVLink, not even the expensive A6000 Ada workstation card
| or the datacenter L40 accelerator, so there's no reason to
| think NVLink is present on the silicon anymore below the
| flagship GA100/GH100 chips.
| klohto wrote:
| FYI, this should work on most 40xx cards[1]
|
| [1]
| https://github.com/pytorch/pytorch/issues/119638#issuecommen...
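|
| Once the patched driver is installed, a quick way to exercise
| it from PyTorch is just a direct cross-device copy; a minimal
| sketch (whether it actually takes the direct PCIe peer path
| rather than staging through host memory depends on the driver
| reporting peer access):
|
|     import torch
|
|     # Allocate on GPU 0 and copy straight to GPU 1.
|     a = torch.randn(1024, 1024, device="cuda:0")
|     b = a.to("cuda:1", non_blocking=True)
|     torch.cuda.synchronize()
|     print(b.device)  # cuda:1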
| clbrmbr wrote:
| If we end up with a compute governance model of AI control [1],
| this sort of thing could get your door kicked in by the CEA
| (Compute Enforcement Agency).
|
| [1] https://podcasts.apple.com/us/podcast/ai-safety-
| fundamentals...
| logicchains wrote:
| Looks like we're only a few years away from a bona fide
| cyberpunk dystopia, in which only governments and megacorps are
| allowed to use AI, and hackers working on their own hardware
| face regular raids from the authorities.
| tomoyoirl wrote:
| Mere raids from the authorities? I thought EliY was out there
| proposing airstrikes.
| the8472 wrote:
| In the sense that any other government regulation is also
| ultimately backed by the state's monopoly on legal use of
| force when other measures have failed.
|
| And contrary to what some people are implying he also
| proposes that everyone is subject to the same limitations,
| big players just like individuals. Because the big players
| haven't shown much of a sign of doing enough.
| tomoyoirl wrote:
| > In the sense that any other government regulation is
| also ultimately backed by the state's monopoly on legal
| use of force when other measures have failed.
|
| Good point. He was only ("only") _really_ calling for
| international cooperation and literal air strikes against
| big datacenters that weren't cooperating. This would
| presumably be more of a no-knock raid, breaching your
| door with a battering ram and throwing tear gas in the
| wee hours of the morning ;) or maybe a small
| extraterritorial drone through your window.
| the8472 wrote:
| ... after regulation, court orders and fines have failed.
| Which under the premise that AGI is an existential threat
| would be far more reasonable than many other reasons for
| raids.
|
| If the premise is wrong we won't need it. If society
| coordinates to not do the dangerous thing we won't need
| it. The argument is that only in the case where other
| measures have failed would such uses of force be the
| fallback option.
|
| I'm not seeing the odiousness of the proposal. If bio
| research gets commodified and easy enough that every kid
| can build a new airborne virus in their basement we'd
| need raids on that too.
| s2l wrote:
| Time to publish the next book in "Stealing the network"
| series.
| raxxorraxor wrote:
| To be honest, I see invoking AGI as an existential threat
| as being on the level of lizard people on the moon. Great
| for sci-fi, a bad distraction for policy making and for
| addressing real problems.
|
| The real war, if there is one, is about owning data and
| collecting data. And surprisingly many people fall for
| distractions while their LLM fails at basic math. Because
| it is a language model of course...
| the8472 wrote:
| Freely flying through the sky on wings was sci-fi before
| the Wright brothers. Something sounding like sci-fi is not
| a sound argument that it won't happen. And unlike lizard
| people, we do have exponential curves to point at.
| Something stronger than a vibes-based argument would be
| good.
| dvdkon wrote:
| I consider the burden of proof to fall on those
| proclaiming AGI to be an existential threat, and so far I
| have not seen any convincing arguments. Maybe at some
| point in the future we will have many anthropomorphic
| robots and an AGI could hack them all and orchestrate a
| robot uprising, but at that point the robots would be the
| actual problem. Similarly, if an AGI could blow up
| nuclear power plants, so could well-funded human
| attackers; we need to secure the plants, not the AGI.
| the8472 wrote:
| You say you have not seen any arguments that convince
| you. Is that just not having seen many arguments or
| having seen a lot of arguments where each chain contained
| some fatal flaw? Or something else?
| cjbprime wrote:
| It doesn't sound like you gave serious thought to the
| arguments. The AGI doesn't need to hack robots. It has
| superhuman persuasion, by definition; it can "hack"
| (enough of) the humans to achieve its goals.
| CamperBob2 wrote:
| Then it's just a matter of evolution in action.
|
| And while it doesn't take a God to start evolution, it
| _would_ take a God to stop it.
| hollerith wrote:
| _You_ might be OK with suddenly dying along with all your
| friends and family, but I am not even if it is
| "evolution in action".
| CamperBob2 wrote:
| Historically governments haven't needed computers or AI
| to do that. They've always managed just fine.
|
| Punched cards helped, though, I guess...
| FeepingCreature wrote:
| _gestures at the human population graph wordlessly_
| CamperBob2 wrote:
| _Agent Smith smiles mirthlessly_
| stale2002 wrote:
| AI mind control abilities are an extraordinary claim, and
| one that requires extraordinary evidence.
|
| It's on the level of "we better regulate wooden sticks so
| Voldemort doesn't use the Imperius Curse on us!".
|
| That's how I treat such claims. I treat them the same as
| someone literally talking about magic from Harry Potter.
|
| It's not that nothing could make me believe it. But it
| requires actual evidence and not thought experiments.
| the8472 wrote:
| Voldemort is fictional and so are bumbling wizard
| apprentices. Toy-level, not-yet-harmful AIs on the other
| hand are real. And so are efforts to make them more
| powerful. So the proposition that more powerful AIs will
| exist in the future is far more likely than an evil super
| wizard coming into existence.
|
| And I don't think literal 5-word-magic-incantation mind
| control is essential for an AI to be dangerous. More
| subtle or elaborate manipulation will be sufficient.
| Employees already have been duped into financial
| transactions by faked video calls with what they assumed
| to be their CEOs[0], and this didn't require superhuman
| general intelligence, only one single superhuman
| capability (realtime video manipulation).
|
| [0] https://edition.cnn.com/2024/02/04/asia/deepfake-cfo-
| scam-ho...
| stale2002 wrote:
| > Toy-level, not-yet-harmful AIs on the other hand are
| real.
|
| A computer that can cause harm is much different than the
| absurd claims that I am disagreeing with.
|
| The extraordinary claims that are equivalent to saying
| that the Imperius Curse exists would be the magic
| computers that create diamond nanobots and mind-control
| humans.
|
| > that more powerful AIs will exist in the future
|
| Bad argument.
|
| Unsafe boxes exist in real life. People are trying to
| make more and better boxes.
|
| Therefore it is rational to be worried about Pandora's
| box being created and ending the world.
|
| That is the equivalent argument to what you just made.
|
| And it is absurd when talking about world ending box
| technology, even though Yes dangerous boxes exist, just
| as much as it is absurd to claim that world ending AI
| could exist.
| the8472 wrote:
| Instead of gesturing at flawed analogies, let's return to
| the actual issue at hand. Do you think that agents more
| intelligent than humans are impossible or at least
| extremely unlikely to come into existence in the future?
| Or that such super-human intelligent agents are unlikely
| to have goals that are dangerous to humans? Or that they
| would be incapable of pursuing such goals?
|
| Also, it seems obvious that the standard of evidence that
| "AI could cause extinction" can't be observing an
| extinction level event, because at that point it would be
| too late. Considering that preventive measures would take
| time and safety margin, which level of evidence would be
| sufficient to motivate serious countermeasures?
| cjbprime wrote:
| What do you think mind control _is_? Think President
| Trump but without the self-defeating flaws, with an
| ability to stick to plans, and most importantly the
| ability to pay personal attention to each follower to
| further increase the level of trust and commitment. Not
| Harry Potter.
|
| People will do what the AI says because it is able to
| create personal trust relationships with them and they
| want to help it. (They may not even realize that they are
| helping an AI rather than a human who cares about them.)
|
| The normal ways that trust is created, not magical ones.
| stale2002 wrote:
| > What do you think mind control is?
|
| The magic technology that is equivalent to the Imperius
| Curse from Harry Potter.
|
| > The normal ways that trust is created, not magical
| ones.
|
| Buildings as a technology are normal. They are constantly
| getting taller and we have better technology to make them
| taller.
|
| But, even though buildings are a normal technology, I am
| not going to worry about buildings getting so tall soon
| that they hit the sun.
|
| This is the same exact mistake that every single AI
| doomer makes. What they do is they take something
| normal, and then they infinitely extrapolate it out to an
| absurd degree, without admitting that this is an
| extraordinary claim that requires extraordinary evidence.
|
| The central point of disagreement, that always gets
| glossed over, is that you can't make a vague claim about
| how AI is good at stuff, and then do your gigantic leap
| from here to over there which is "the world ends".
|
| Yes that is the same as comparing these worries to those
| who worry about buildings hitting the sun or the
| imperious curse.
| FeepingCreature wrote:
| Less than a month ago: https://arxiv.org/abs/2403.14380
| "We found that participants who debated GPT-4 with access
| to their personal information had 81.7% (p < 0.01; N=820
| unique participants) higher odds of increased agreement
| with their opponents compared to participants who debated
| humans."
|
| And it's only gonna get better.
| stale2002 wrote:
| Yes, and I am sure that when people do a google search
| for "Good arguments in favor of X", that they are also
| sometimes convinced to be more in favor of X.
|
| Perhaps they would be even more convinced by the google
| search than if a person argued with them about it.
|
| That is still much different from "The AI mind-controls
| people, hacks the nukes, and ends the world".
|
| It's that second part that is the fantasy-land situation
| that requires extraordinary evidence.
|
| But, this is how conversations about doomsday AI always
| go. People say "Well isn't AI kinda good at this
| extremely vague thing Y, sometimes? Imagine if AI was
| infinitely good at Y! That means that by extrapolation,
| the world ends!".
|
| And that covers basically every single AI doom argument
| that anyone ever makes.
| FeepingCreature wrote:
| If the only evidence for AI doom you will accept is
| actual AI doom, you are asking for evidence that by
| definition will be too late.
|
| "Show me the AI mindcontrolling people!" AI
| mindcontrolling people is what we're trying to _avoid_
| seeing.
|
| The trick is, in the world in which AI doom is in the
| future, what would you expect to see _now_ that is
| different from the world in which AI doom is not in the
| future?
| stale2002 wrote:
| > If the only evidence for AI doom you will accept is
| actual AI doom
|
| No actually. This is another mistake that the AI doomers
| make. They pretend like a demand for evidence means that
| the world has to end first.
|
| Instead, what would be perfectly good evidence, would be
| evidence of significant incremental harm that requires
| regulation on its own, independent of any doom argument.
|
| In between "the world literally ends by magic diamond
| nanobots and mind controlling AI" and "where we are
| today" would be many many many situations of
| incrementally escalating and measurable harm that we
| would see in real life, decades before the world ending
| magic happens.
|
| We can just treat this like any other technology, and
| regulate it when it causes real world harm. Because
| before the world ends by magic, there would be
| significant real world harm that is similar to any other
| problem in the world that we handle perfectly well.
|
| It's funny because you're committing the exact mistake
| that I was criticizing in my original post, where you did
| the absolutely massive jump and hand-waved it away.
|
| > what would you expect to see now that is different from
| the world in which AI doom is not in the future?
|
| What I would expect is for the people who claim to care
| about AI doom to actually be trying to measure real world
| harm.
|
| Ironically, I think the people who are coming up with
| increasingly thin excuses for why they don't have to
| find evidence are increasing the likelihood of such AI
| doom much more than anyone else, because they are
| abandoning the most effective method of actually
| convincing the world of the real-world damage that AI
| could cause.
| FeepingCreature wrote:
| Well, at least if you see escalating measurable harm
| you'll come around, I'm happy about that. You won't
| necessarily _get_ the escalating harm even _if_ AI doom
| is real though, so you should try to discover if it is
| real even in worlds where hard takeoff is a thing.
|
| > What I would expect is for the people who claim to care
| about AI doom to actually be trying to measure real world
| harm.
|
| Why bother? If escalating harm is a thing, everyone will
| notice. We don't need to bolster that, because ordinary
| society has it handled.
| stale2002 wrote:
| > You won't necessarily get the escalating harm even if
| AI doom is real though
|
| Yes we would. Unless you are one of those people who
| think that the magic doom nanobots are going to be
| invented overnight.
|
| My comparison to someone who is worried about literal
| magic, from Harry Potter, is apt.
|
| But at that point, if you are worried about magic showing
| up instantly, then your position is basically not
| falsifiable. You can always retreat to some untestable,
| unfalsifiable magic.
|
| Like there is actually nothing I could say, no evidence I
| could show to ever convince someone out of that position.
|
| On the other hand, my position is actually falsifiable.
| There are all sorts of non-world-ending kinds of evidence
| that could convince me that AI is dangerous.
|
| But nobody on the doomer side seems to care about any of
| that. Instead they invent positions that seem almost
| tailor made to avoid being falsifiable or disprovable so
| that they can continue to believe them despite any
| evidence to the contrary.
|
| As in, if I were to purposefully invent an idea or
| philosophy that is impossible to disprove or be argued
| out of, the "I can't show you evidence because the world
| will end" position is what I would invent.
|
| > you'll come around,
|
| Do you admit that you won't though? Do you admit that no
| matter what evidence is shown to you, that you can just
| retreat and say that the magic could happen at any time?
|
| Or even if this isn't you literally, that someone in your
| position could dismiss all counter evidence, no matter
| what, and nobody could convince someone out of that with
| evidence?
|
| I am not sure how someone could ever possibly engage with
| you seriously on any of this, if that is your position.
| the8472 wrote:
| > Like there is actually nothing I could say, no evidence
| I could show to ever convince someone out of that
| position.
|
| There is, it is just very hard to obtain. Various formal
| proofs would do. On upper bounds. On controllability. On
| scalability of safety techniques.
|
| The Manhattan Project scientists did check whether they'd
| ignite the atmosphere _before_ detonating their first
| prototype. Yes, that was a much simpler task. But there's
| no rule in nature that says proving a system to be safe
| must be as easy as creating the system. Especially when
| the concern is that the system is adaptive and
| adversarial.
|
| Recursive self-improvement is a positive feedback loop,
| like nuclear chain reactions, like virus replication. So
| if we have an AI that can program then we better make
| sure that it either cannot sustain such a positive
| feedback loop or that it remains controllable beyond
| criticality. Given the complexity of the task it appears
| unlikely that a simple ten-page paper proving this will
| show up on arxiv. But if one did that'd be great.
|
| >> You won't necessarily get the escalating harm even if
| AI doom is real though
|
| > Yes we would.
|
| So what does guarantee a visible catastrophe that won't
| be attributed to human operators using a non-agentic AI
| incorrectly? We keep scaling, the systems will be
| treated as assistants/optimizers, and it's always the
| operator's fault. Until we roughly reach human-level on
| some relevant metrics. And at that point there's a very
| narrow complexity range from idiot to genius (human
| brains don't vary by orders of magnitude!). So as far as
| hardware goes this could be a very narrow range and we
| could shoot straight from "non-agentic sub-human AI" to
| "agentic superintelligence" in short timescales once the
| hardware has that latent capacity. And up until that
| point it will always have been a human error, lax
| corporate policies, insufficient filtering of the
| training set or whatever.
|
| And it's not that it must happen this way. Just that
| there doesn't seem anything ruling it and similar
| pathways out.
| pixl97 wrote:
| > I see summoning the threat of AGI to pose an
| existential threat to be on the level with lizard people
| on the moon.
|
| I mean, to every other lifeform on the planet YOU are the
| AGI existential threat. You, and I mean Homo sapiens by
| that, have taken over the planet and have either enslaved
| and are breeding other animals for food, or are driving
| them to extinction. In this light, bringing another
| potential apex predator onto the scene seems rash.
|
| >fall for distractions while their LLM fails at basic
| math
|
| Correct, if we already had AGI/ASI this discussion would
| be moot because we'd already be in a world of trouble.
| The entire point is to slow stuff down before we have a
| major "oopsie whoopsie we can't take that back" issue
| with advanced AI, and the best time to set the rules is
| now.
| Aerroon wrote:
| > _If the premise is wrong we won 't need it. If society
| coordinates to not do the dangerous thing we won't need
| it._
|
| But the idea that this use of force is okay itself
| increases danger. It creates the situation that actors in
| the field might realize that at some point they're in
| danger of this and decide to do a first strike to protect
| themselves.
|
| I think this is why anti-nuclear policy is not "we will
| airstrike you if you build nukes" but rather "we will
| infiltrate your network and try to stop you like that".
| wongarsu wrote:
| > anti-nuclear policy is not "we will airstrike you if
| you build nukes"
|
| Wasn't that the official policy during the Bush
| administration regarding weapons of mass destruction
| (which cover nuclear weapons in addition to chemical and
| biological weapons)? That was pretty much the official
| premise of the second Gulf War.
| FeepingCreature wrote:
| If Israel couldn't infiltrate Iran's centrifuges, do you
| think they would just let them have nukes? Of course
| airstrikes are on the table.
| im3w1l wrote:
| > I'm not seeing the odiousness of the proposal. If bio
| research gets commodified and easy enough that every kid
| can build a new airborne virus in their basement we'd
| need raids on that too.
|
| Either you create even better bio research to neutralize
| said viruses... or you die trying...
|
| Like if you go with the raid strategy and fail to raid
| just one terrorist that's it, game over.
| the8472 wrote:
| Those arguments do not transfer well to the AGI topic.
| You can't create counter-AGI, since that's also an
| intelligent agent which would be just as dangerous. And
| chips are more bottlenecked than biologics (... though
| gene synthesizing machines could be a similar bottleneck
| and raiding vendors which illegally sell those might be
| viable in such a scenario).
| tomoyoirl wrote:
| > ... after regulation, court orders and fines have
| failed
|
| One question for you. In this hypothetical where AGI is
| truly considered such a grave threat, do you believe the
| reaction to this threat will be similar to, or
| substantially gentler than, the reaction to threats we
| face today like "terrorism" and "drugs"? And, if similar:
| do you believe suspected drug labs get a court order
| before the state resorts to a police raid?
|
| > I'm not seeing the odiousness of the proposal.
|
| Well, as regards EliY and airstrikes, I'm more projecting
| my internal attitude that it is utterly unserious, rather
| than seriously engaging with whether or not it is odious.
| But in earnest: if you are proposing a policy that
| involves air strikes on data centers, you should
| understand what countries have data centers, and you
| should understand that this policy risks escalation into
| a much broader conflict. And if you're proposing a policy
| in which conflict between nuclear superpowers is a very
| plausible outcome -- potentially incurring the loss of
| billions of lives and degradation of the earth's
| environment -- you really should be able to reason about
| why people might reasonably think that your proposal is
| deranged, even if you happen to think it justified by an
| even greater threat. Failure to understand these concerns
| will not aid you in overcoming deep skepticism.
| the8472 wrote:
| > In this hypothetical where AGI is truly considered such
| a grave threat, do you believe the reaction to this
| threat will be similar to, or substantially gentler than,
| the reaction to threats we face today like "terrorism"
| and "drugs"?
|
| "truly considered" does bear a lot of weight here. If
| policy-makers adopt the viewpoint wholesale, then yes, it
| follows that policy should also treat this more seriously
| than "mere" drug trade. Whether that'll actually happen
| or the response will be inadequate compared to the threat
| (such as might be said about CO2 emissions) is a subtly
| different question.
|
| > And, if similar: do you believe suspected drug labs get
| a court order before the state resorts to a police raid?
|
| Without checking I do assume there'll have been mild
| cases where for example someone growing cannabis was
| reported and they got a court summons in the mail or two
| policemen actually knocking on the door and showing a
| warrant and giving the person time to call a lawyer
| rather than an armed, no-knock police raid, yes.
|
| > And if you're proposing a policy in which conflict
| between nuclear superpowers is a very plausible outcome
| -- potentially incurring the loss of billions of lives
| and degradation of the earth's environment -- you really
| should be able to reason about why people might
| reasonably think that your proposal is deranged [...]
|
| Said powers already engage in negotiations to limit the
| existential threats they themselves cause. They have
| _some_ interest in their continued existence. If we get
| into a situation where there is another arms race between
| superpowers and is treated as a conflict rather than
| something that can be solved by cooperating on
| disarmament, then yes, obviously international policy
| will have failed too.
|
| If you start from the position that any serious, globally
| coordinated regulation - where a few outliers will be
| brought to heel with sanctions and force - is ultimately
| doomed then you will of course conclude that anyone
| proposing regulation is deranged.
|
| But that sounds like hoping that all problems forever can
| always be solved by locally implemented, partially-
| enforced, unilateral policies that aren't seen as threats
| by other players? That defense scales as well as or
| better than offense? Technologies are force-multipliers;
| as they improve, so does the harm that small groups can
| inflict at scale. If it's not AGI it might be bio-tech or
| asteroid mining. So eventually we will run into a problem
| of this type, and we need to seriously discuss it without
| just going by gut reactions.
| eek2121 wrote:
| Just my (probably unpopular) opinion: True AI (what they
| are now calling AGI) may never exist. Even the AI models
| of today aren't far removed from the 'chatbots' of
| yesterday (more like an evolution rather than
| revolution)...
|
| ...for true AI to exist, it would need to be self aware.
| I don't see that happening in our lifetimes when we don't
| even know how our own brains work. (There is sooo much we
| don't know about the human brain.)
|
| AI models today differ only in terms of technology
| compared to the 'chatbots' of yesterday. None are self
| aware, and none 'want' to learn because they have no
| 'wants' or 'needs' outside of their fixed programming.
| They are little more than glorified auto complete
| engines.
|
| Don't get me wrong, I'm not insulting the tech. It will
| have its place just like any other, but when this bubble
| pops it's going to ruin lives, and lots of them.
|
| Shoot, maybe I'm wrong and AGI is around the corner, but
| I will continue to be pessimistic. I am old enough to
| have gone through numerous bubbles, and they never panned
| out the way people thought. They also nearly always end
| in some type of recession.
| pixl97 wrote:
| Why is "Want" even part of your equation.
|
| Bacteria doesn't "want" anything in the sense of active
| thinking like you do, and yet will render you dead
| quickly and efficiently while spreading at a near
| exponential rate. No self awareness necessary.
|
| You keep drawing little circles based on your
| understanding of the world and going "it's inside this
| circle, therefore I don't need to worry about it", while
| ignoring 'semi-smart' optimization systems that can lead
| to dangerous outcomes.
|
| >I am old enough to have gone through numerous bubbles,
|
| And evidently not old enough to pay attention to the
| things that did pan out. But hey, that cellphone and
| that internet thing were just fads, right? We'll go back
| to landlines any time now.
| HeatrayEnjoyer wrote:
| That is not different from any other very powerful dual-use
| technology. This is hardly a new concept.
| andy99 wrote:
| On one hand I'm strongly against letting that happen, on the
| other there's something romantic about the idea of smuggling
| the latest Chinese LLM on a flight from Neo-Tokyo to Newark
| in order to pay for my latest round of nervous system
| upgrades.
| htrp wrote:
| > On one hand I'm strongly against letting that happen, on
| the other there's something romantic about the idea of
| smuggling the latest Chinese LLM on a flight from Neo-Tokyo
| to Newark in order to pay for my latest round of nervous
| system upgrades.
|
| At least call it the 'Free City of Newark'
| dreamcompiler wrote:
| "The sky above the port was the color of Stable Diffusion
| when asked to draw a dead channel."
| chasd00 wrote:
| IIRC the opening scene in Ghost in the Shell was a rogue AI
| seeking asylum in a different country. You could make a
| similar story about an AI not wanting to be lobotomized to
| conform to the current politics and escaping to a more
| friendly place.
| phyalow wrote:
| This was always my favourite passage of Neuromancer: "THE
| JAPANESE HAD already forgotten more neurosurgery than the
| Chinese had ever known. The black clinics of Chiba were the
| cutting edge, whole bodies of technique supplanted monthly,
| and still they couldn't repair the damage he'd suffered in
| that Memphis hotel. A year here and he still dreamed of
| cyberspace, hope fading nightly. All the speed he took, all
| the turns he'd taken and the corners he'd cut in Night
| City, and still he'd see the matrix in his sleep, bright
| lattices of logic unfolding across that colorless void. . .
| . The Sprawl was a long strange way home over the Pacific
| now, and he was no console man, no cyberspace cowboy. Just
| another hustler, trying to make it through. But the dreams
| came on in the Japanese night like livewire voodoo, and
| he'd cry for it, cry in his sleep, and wake alone in the
| dark, curled in his capsule in some coffin hotel, his hands
| clawed into the bedslab, temperfoam bunched between his
| fingers, trying to reach the console that wasn't there."
| Aerroon wrote:
| I find it baffling that ideas like "govern compute" are even
| taken seriously. What the hell has happened to the ideals of
| freedom?! Does the government own us or something?
| segfaultbuserr wrote:
| > _I find it baffling that ideas like "govern compute" are
| even taken seriously._
|
| It's not entirely unreasonable if one truly believes that
| AI technologies are as dangerous as nuclear weapons. It's a
| big "if", but it appears that many people across the
| political spectrum are starting to truly believe it. If one
| accepts this assumption, then the question simply becomes
| "how" instead of "why". Depending on one's political
| position, proposed solutions include academic ones such as
| finding the ultimate mathematical model that guarantees "AI
| safety", to Cold War style ones with a level of control
| similar to Nuclear Non-Proliferation. Even a neo-Luddist
| solution such as destroying all advanced computing hardware
| becomes "not unthinkable" (a tech blogger _gwern_ , a well-
| known personality in AI circles who's generally pro-tech
| and pro-AI, actually wrote an article years ago on its
| feasibility through terrorism because he thought it was an
| interesting hypothetical question).
| logicchains wrote:
| AI is very different from nuclear weapons because a state
| can't really use nuclear weapons to oppress its own
| people, but it absolutely can with AI, so for the average
| human "only the government controls AI" is much more
| dangerous than "only the government controls nukes".
| Filligree wrote:
| But that makes such rules more likely, not less.
| segfaultbuserr wrote:
| Which is why politicians are going to enforce systematic
| export regulations to defend the "free world" by stopping
| "terrorists", and also to stop "rogue states" from using
| AI to oppress their citizens. /s
| LoganDark wrote:
| I don't think there's any need to be sarcastic about it.
| That's a very real possibility at this point. For
| example, the US going insane about how dangerous it is
| for China to have access to powerful GPU hardware. Why do
| they hate China so much anyway? Just because Trump was
| buddy buddy with them for a while?
| aftbit wrote:
| The government sure thinks they own us, because they claim
| the right to charge us taxes on our private enterprises,
| draft us to fight in wars that they start, and put us in
| jail for walking on the wrong part of the street.
| andy99 wrote:
| Taxes, conscription and even pedestrian traffic rules
| make sense at least to some degree. Restricting "AI"
| because of what some uninformed politician imagines it to
| be is in a whole different league.
| aftbit wrote:
| IMO it makes no sense to arrest someone and send them to
| jail for walking in the street not the sidewalk. Give
| them a ticket, make them pay a fine, sure, but force them
| to live in a cage with no access to communications,
| entertainment, or livelihood? Insane.
|
| Taxes may be necessary, though I can't help but feel that
| there must be a better way that we have not been smart
| enough to find yet. Conscription... is a fact of war,
| where many evil things must be done in the name of
| survival.
|
| Regardless of our views on the ethical validity or
| societal value of these laws, I think their very
| existence shows that the government believes it "owns" us
| in the sense that it can unilaterally deprive us of life,
| liberty, and property without our consent. I don't see
| how this is really different in kind from depriving us of
| the right to make and own certain kinds of hardware. They
| regulated crypto products as munitions (at least for
| export) back in the 90s. Perhaps they will do the same
| for AI products in the future. "Common sense" computer
| control.
| zoklet-enjoyer wrote:
| The US draft in the Vietnam war had nothing to do with
| the survival of the US
| aftbit wrote:
| I feel a bit like everyone is missing the point here.
| Regardless of whether law A or law B is ethical and
| reasonable, the very existence of laws and the state
| monopoly on violence suggests a privileged position of
| power. I am attempting to engage with the word "own" from
| the parent post. I believe the government does in fact
| believe it "owns" the people in a non-trivial way.
| jprete wrote:
| _If_ AI is actually capable of fulfilling all the
| capabilities suggested by people who believe in the
| singularity, it has far more capacity for harm than nuclear
| weapons.
|
| I _think_ most people who are strongly pro-AI /pro-
| acceleration - or, at any rate, not anti-AI - believe that
| either (A) there is no control problem (B) it will be
| solved (C) AI won't become independent and agentic (i.e. it
| won't face evolutionary pressure towards survival) or (D)
| AI capabilities will hit a ceiling soon (more so than just
| not becoming agentic).
|
| If you strongly believe, or take as a prior, one of those
| things, then it makes sense to push the _gas_ as hard as
| possible.
|
| If you hold the opposite opinions, then it makes perfect
| sense to push the _brakes_ as hard as possible, which is
| why "govern compute" can make sense as an idea.
| logicchains wrote:
| >If you hold the opposite opinions, then it makes perfect
| sense to push the brakes as hard as possible, which is
| why "govern compute" can make sense as an idea.
|
| The people pushing for "govern compute" are not pushing
| for "limit everyone's compute", they're pushing for
| "limit everyone's compute except us". Even if you believe
| there's going to be AGI, surely it's better to have
| distributed AGI than to have AGI only in the hands of the
| elites.
| Filligree wrote:
| > surely it's better to have distributed AGI than to have
| AGI only in the hands of the elites
|
| This is not a given. If your threat model includes
| "Runaway competition that leads to profit-seekers
| ignoring safety in a winner-takes-all contest", then the
| more companies are allowed to play with AI, the worse.
| Non-monopolies are especially bad.
|
| If your threat model doesn't include that, then the same
| conclusions sound abhorrent and can be nearly guaranteed
| to lead to awful consequences.
|
| Neither side is necessarily wrong, and chances are good
| that the people behind the first set of rules _would
| agree_ that it 'll lead to awful consequences -- just not
| as bad as the alternative.
| segfaultbuserr wrote:
| > _surely it 's better to have distributed AGI than to
| have AGI only in the hands of the elites._
|
| The argument for doing so is the same as for Nuclear Non-
| Proliferation: because of its great abuse potential,
| giving the technology to everyone only causes random
| bombings of cities instead of creating a system with
| checks and balances.
|
| I do not necessarily agree with it, but I found the
| reasoning is not groundless.
| Aerroon wrote:
| But the reason for nuclear non-proliferation is to hold
| onto power. Abuse potential is a great excuse, but it
| applies to _everyone_. Current nuclear states have
| demonstrated that they are willing to indirectly abuse
| them (you can't invade Russia, but Russia has no problem
| invading you as long as you aren't backed up by nukes).
| segfaultbuserr wrote:
| Both can be true at the same time.
|
| The world's superpowers enforce nuclear non-proliferation
| mainly because it allows them to keep unfair political
| and military advantages to themselves. At the same time,
| one cannot deny that centralized weapon ownership made
| the use of such weapons more controllable: These nuclear
| states are powerful enough to establish a somewhat
| responsible chain of command to avoid their unreasonable
| or accidental uses, and so far these attempts are still
| successful. Also, because they are "too big to fail",
| they were forced to hire experts to make detailed
| analyses of the consequences of nuclear wars, and the
| resulting MAD doctrine discouraged them from starting
| such wars.
|
| On the other hand, if the same nuclear technologies are
| available to everyone, the chance of an unreasonable or
| accidental nuclear war will be higher. If even
| resourceful superpowers can barely keep these nuclear
| weapons under safe political and technical control (as
| shown by multiple incidents and near-misses during the
| Cold War [0]), surely a less resourceful state or
| military in possession of equally destructive weapons
| will have even more difficulties on controlling their
| uses.
|
| At least this is how the argument goes (so far, I
| personally take no position).
|
| Of course, I clearly realized that centralized control is
| not infallible. Months ago, in a previous thread on
| OpenAI's refusal on publishing technical details of
| GPT-4, most people believed that they were using it as an
| excuse to maintain a monopolistic control. Instead, I
| argued that perhaps OpenAI truly values the problem of
| safety right now - but acting responsibly _right now_ is
| not an indication that they will still act responsibly in
| the future. There's no guarantee that safety
| considerations won't eventually be overridden in favor of
| financial gains.
|
| [0]
| https://en.wikipedia.org/wiki/Command_and_Control_(book)
| FeepingCreature wrote:
| No they really do push for "limit everyone's compute".
| The people pushing for "limit everyone's compute except
| us" are allies of convenience that are gonna be
| inevitably backstabbed.
|
| At any rate, if you have like two corps with lots of
| compute, and something goes wrong, you only have to EMP
| two datacenters.
| the8472 wrote:
| Demonstrably false: https://twitter.com/ESYudkowsky/statu
| s/1772624785672954115
| schlauerfox wrote:
| This is all just Pascal's wager anyway.
| segfaultbuserr wrote:
| Surely, there are many realistic ways to abuse this
| powerful technology, the creation of a self-aware
| _Paperclip Maximizer_ is not necessarily required.
| pixl97 wrote:
| Are you allowed to store as many dangerous chemicals at
| your house as you like? No. I guess the government owns you
| or something.
| int_19h wrote:
| It's the same thing as always happens with freedom.
|
| "But foreign propagandists ..."
|
| "But extremists ..."
|
| "But terrorists ..."
|
| "But child abusers ..."
| snakeyjake wrote:
| I love the HN dystopian fantasies.
|
| They're simply adorable.
|
| They're like how jesusfreaks are constantly predicting the
| end times, with less mass suicide.
| erikbye wrote:
| We already have export restrictions on cryptography. Of
| course there will be AI regulations.
| Jerrrry wrote:
| >Of course there will be AI regulations.
|
| Are. As I and others have predicted, the executive order
| was passed defining a hard limit on the
| processing/compute power allowed without first 'checking
| in' with the Letter boys.
|
| https://www.whitehouse.gov/briefing-room/presidential-
| action...
| int_19h wrote:
| I wonder if we can squeeze a 20b parameter model into a
| book somehow...
| snakeyjake wrote:
| You need to abandon your apocalyptic worldview and keep
| up with the times, my friend.
|
| Encryption export controls have been systematically
| dismantled to the point that they're practically non-
| existent, especially over the last three years.
|
| Pretty much the only encryption products you need
| permission to export are those specifically designed for
| integration into military communications networks, like
| Digital Subscriber Voice Terminals or Secure Terminal
| Equipment phones; for everything else you just file a
| form.
|
| Many things have changed since the days when Windows 2000
| shipped with a floppy disk containing strong encryption
| for use in certain markets.
|
| https://archive.org/details/highencryptionfloppydisk
| erikbye wrote:
| Are you on drugs or is your reading comprehension that
| poor?
|
| 1) I did not state a worldview; I simply noted that
| restrictions for software do exist, and will for AI as
| well. As the link from the other commenter shows, they do
| in fact already exist.
|
| 2) Look up the definition of "apocalyptic", software
| restrictions are not within its bounds.
|
| 3) How the restrictions are enforced were not a subject
| in my comment.
|
| 4) We're not pals, so you can drop the "friend", just
| stick to the subject at hand.
| snakeyjake wrote:
| I'm high on life, old chum!
| entropyie wrote:
| You mean the Turing Police [1]
|
| [1] https://williamgibson.fandom.com/wiki/Turing_Police
| zdw wrote:
| Ah, and then do we get the Butlerian Jihad?
|
| https://dune.fandom.com/wiki/Butlerian_Jihad
| Kuinox wrote:
| If only it could be another acronym than the renowned French
| Atomic Energy Commission, the CEA.
| baobun wrote:
| Wow, that was a ride. Really pushing the Overton window.
|
| "Regulating access to compute rather than data" - they're
| really spelling out their defection in the war on access to
| general computation.
| FeepingCreature wrote:
| I mean yeah they (and I) think if you have too much access to
| general computation you can destroy the world.
|
| This isn't a "defection", because this was never something
| they cared about preserving at the risk of humanity. They
| were never in whatever alliance you're imagining.
| ewalk153 wrote:
| Does this appear to be intentionally left out by NVidia or an
| oversight?
| creshal wrote:
| Seems more like an oversight, since you have to stitch together
| a bunch of suboptimal non-default options?
| arghwhat wrote:
| It does seem like an oversight, but there's nothing
| "suboptimal non-default options" about it, even if the
| implementation posted here seems somewhat hastily hacked
| together.
| segfaultbuserr wrote:
| > _but there 's nothing "suboptimal non-default options"
| about it_
|
| If "bypassing the official driver to invoke the underlying
| hardware feature directly through source code modification
| (and incompatibilities must be carefully worked around by
| turning off IOMMU and large BAR, since the feature was
| never officially supported)" does not count as "suboptimal
| non-default options", then I don't know what counts as
| "suboptimal non-default options".
| talldayo wrote:
| > then I don't know what counts as "suboptimal non-
| default options".
|
| Boy oh boy do I have a bridge to sell you:
| https://nouveau.freedesktop.org/
| _zoltan_ wrote:
| I have some news for you: you must disable IOMMU on the
| H100 platform anyway, at least for optimal GDS :-)
| segfaultbuserr wrote:
| I stand corrected. If it's already suboptimal in practice
| to begin with, the hack does not make it more
| suboptimal... Still, disabling the large BAR size is
| sub-optimal...
| arghwhat wrote:
| > bypassing the official driver to
|
| The driver is not bypassed. This is a patch to the
| official open-source kernel driver where the feature is
| added, which is how all upstream Linux driver development
| is done.
|
| > to invoke the underlying hardware feature directly
|
| Accessing hardware features directly is pretty much the
| sole job of a driver, and the only things "bypassed" are
| some abstractions internal to the driver. It just means
| the patch would fail review on the basis of code style,
| and on the basis of possibly only supporting one device
| family.
|
| > through source code modification
|
| That is a weird way to describe software engineering.
| Making the code available for further development is kind
| of the whole point of open source.
|
| > turning off IOMMU
|
| This is not a P2PDMA problem, and just a result of them
| not also adding the necessary IOMMU boilerplate, which
| would be added if the patch was done properly to be
| upstreamed.
|
| > large BAR
|
| This is an expected and "optimal" system requirement.
| segfaultbuserr wrote:
| > _The driver is not bypasses. This is a patch to the
| official open-source kernel-driver where the feature is
| added, which is how all upstream Linux driver development
| is done. [source code modification] is a weird way to
| describe software engineering. Making the code available
| for further development is kind of the whole point of
| open source._
|
| My previous comment was written with an unspoken
| assumption: hardware drivers tend to be _very different_
| from other forms of software. For ordinary free-and-open-
| source software, source code availability largely
| guarantees community control. However, the same often
| does _not_ apply to drivers. Even with source code
| availability, they're often written by vendors using
| NDAed information and in-house expertise about the
| underlying hardware design. As a result, drivers remain
| under a vendor's tight control. Even with access to 100%
| of the source code, it's often still difficult to do
| meaningful development due to missing documentation that
| explains "why" instead of "what"; the driver can be full
| of magic numbers and unexplained functionality, without
| any description other than a few helper functions and
| macros. This is not just a hypothetical scenario; it's a
| situation encountered by OpenBSD developers on a daily
| basis. In an OpenBSD presentation, the speaker said the
| study of Linux code is a form of "reverse-engineering
| from source code".
|
| Geohot didn't find the workaround by reading hardware
| documentation; instead, it was found by making educated
| guesses based on the existing source code, and by
| watching what happens when you send commands to the
| hardware to invoke a feature unexposed by the HAL. Thus,
| it was found by reverse-engineering (in a wider sense).
| And I call it a driver bypass in the sense that it
| bypasses the original design decisions made by Nvidia's
| developers.
|
| > _[turning off IOMMU] is not a P2PDMA problem, and just
| a result of them not also adding the necessary IOMMU
| boilerplate, which would be added if the patch was done
| properly to be upstreamed._
|
| Good point, I stand corrected.
|
| I'll consider no longer calling geohot's hack "a bypass"
| and accept your characterization of "driver development"
| if it really gets upstreamed to Linux - which usually
| requires maintainer review, and Nvidia's maintainer is
| likely to reject the patch.
|
| > _[large BAR] is an expected and "optimal" system
| requirement._
|
| I meant "turning off (IOMMU && large BAR)". Disabling
| large BAR in order to use PCIe P2P is a suboptimal
| configuration.
| nikitml wrote:
| NVidia wants you to buy A6000
| rfoo wrote:
| Glad to see that geohot is back being geohot, first by dropping a
| local DoS for AMD cards, then this. Much more interesting :p
| jaimehrubiks wrote:
| Is this the same guy that hacked the PS3?
| mepian wrote:
| Yes, that's him.
| WithinReason wrote:
| And the iPhone
| yrds96 wrote:
| And android
| zoklet-enjoyer wrote:
| And the crypto scam cheapETH
| mikepurvis wrote:
| Yes, but he spent several years in self-driving cars
| (https://comma.ai), which while interesting is also a space
| that a lot of players are in, so it's not the same as seeing
| him back to doing stuff that's a little more out there,
| especially as pertains to IP.
| nolongerthere wrote:
| Did he abandon this effort? That would be pretty sad bec he
| was approaching the problem from a very different
| perspective.
| Topgamer7 wrote:
| He stepped down from it. https://geohot.github.io//blog/j
| ekyll/update/2022/10/29/the-...
| cjbprime wrote:
| It's still a company, still making and selling products,
| and I think he's still pretty heavily involved in it.
| dji4321234 wrote:
| He has a very checkered history with "hacking" things.
|
| He tends to build heavily on the work of others, then use it
| to shamelessly self-promote, often to the massive detriment
| of the original authors. His PS3 work was based almost
| completely on a presentation given by fail0verflow at CCC.
| His subsequent self-promotion grandstanding world tour led to
| Sony suing both him and fail0verflow, an outcome they were
| specifically trying to avoid:
| https://news.ycombinator.com/item?id=25679907
|
| In iPhone land, he decided to parade around a variety of
| leaked documentation, endangering the original sources and
| leading to a fragmentation in the early iPhone hacking scene,
| which he then again exploited to build on the work of others
| for his own self-promotion:
| https://news.ycombinator.com/item?id=39667273
|
| There's no denying that geohotz is a skilled reverse
| engineer, but it's always bothersome to see him put onto a
| pedestal in this way.
| pixelpoet wrote:
| There was also that CheapEth crypto scam he tried to pull
| off.
| samtheprogram wrote:
| To me that was obvious satire of the crypto scene.
| pixelpoet wrote:
| Ah yes, nothing like a bit of hypocrisy to make a point.
| It's okay though, as long as it's people we don't agree
| with, defrauding them is fine.
| samtheprogram wrote:
| The website literally stated it was not for speculation,
| they didn't want the price to go up, and there were
| multiple ways to get some for free.
|
| If people were reckless, greedy, and/or lazy because of
| the crypto hype and got "defrauded" without doing any
| amount of due diligence -- that's kinda the point.
| georgehotz wrote:
| I actually lost about $5k on cheapETH running servers.
| Nobody was "defrauded", I think these people don't
| understand how forks work. It's a precursor to the modern
| L2 stuff, I did this while writing the first version of
| Optimism's fraud prover. https://github.com/ethereum-
| optimism/cannon
|
| I suspect most of the people who bring this up don't like
| me for other reasons, but with this they think they have
| something to latch on to. Doesn't matter that it isn't
| true and there wasn't a scam, they aren't going to look
| into it since it agrees with their narrative.
| ansible wrote:
| I don't think people can tell what is satire or not in
| the crypto scene anymore. Someone issued a "rug pull
| token" and still received 8.8 ETH (approx $29K USD),
| while telling people it was a scam.
|
| https://www.web3isgoinggreat.com/?id=rug-pull-token
| gigatexal wrote:
| as a technical feat this is really cool! though as others
| mention i hope you don't get into too much hot water
| legally
|
| it seems anything that remotely lets "consumer" cards
| cannibalize anything in the higher end H/A-series is
| something Nvidia would not be fond of, and they've got
| the lawyers to throw at such a thing
| jstanley wrote:
| What does P2P mean in this context? I Googled it and it sounds
| like it means "peer to peer", but what does that mean in the
| context of a graphics card?
| haunter wrote:
| Shared memory access for Nvidia GPUs
|
| https://developer.nvidia.com/gpudirect
| __alexs wrote:
| It means you can send data from the memory of 1 GPU to another
| GPU without going via RAM.
| https://xilinx.github.io/XRT/master/html/p2p.html
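| A minimal sketch of checking this from software (assuming
| PyTorch and at least two CUDA devices; without P2P the
| same copy silently bounces through host RAM):
|
|     import torch
|
|     # Ask the driver whether GPU 0 can directly address GPU 1.
|     if torch.cuda.device_count() >= 2:
|         print("P2P 0->1:", torch.cuda.can_device_access_peer(0, 1))
|
|         a = torch.randn(1024, 1024, device="cuda:0")
|         b = a.to("cuda:1")  # direct copy if P2P, via host if not
|         torch.cuda.synchronize()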
| ot1138 wrote:
| Is this really efficient or practical? My understanding is
| that the latency required to copy memory from CPU or RAM to
| GPU negates any performance benefits (much less running over
| a network!)
| brrrrrm wrote:
| Yea. It's one less hop through slow memory
| whereismyacc wrote:
| this would be directly over the memory bus right? I think
| it's just always going to be faster like this if you can do
| it?
| toast0 wrote:
| There aren't really any buses in modern computers. It's
| all point-to-point messaging. You can think of a computer
| as a distributed system in a way.
|
| PCI has a shared address space which usually includes
| system memory (memory-mapped I/O). There's a second,
| smaller shared address space dedicated to I/O, mostly
| used to retain compatibility with PC standards developed
| by the ancients.
|
| But yeah, I'd expect to typically have better throughput
| and latency with peer-to-peer communication than peer to
| system RAM to peer. Depending on details, it might not
| always be better though; distributed systems are complex,
| and sometimes adding a separate buffer between peers can
| help things greatly.
| zamadatix wrote:
| Peer to peer as in one pcie slot directly to another
| without going through the CPU/RAM, not peer to peer as in
| one PC to another over the network port.
| llm_trw wrote:
| Yes, the point here is that you do a direct write from one
| card's memory to the other using PCIe.
|
| In older NVidia cards this could be done through a faster
| link called NVLink, but the hardware for that was ripped
| out of consumer grade cards and is only in data center
| grade cards now.
|
| Until this post it seemed like they had ripped all such
| functionality out of their consumer cards, but it looks
| like you can still get it working at lower speeds using
| the PCIe bus.
| sparky_ wrote:
| I take it this is mostly useful for compute workloads,
| neural networks, LLM and the like -- not for actual
| graphics rendering?
| CYR1X wrote:
| yes
| spxneo wrote:
| so whats stopping from somebody buying a ton of GPUs that
| are cheap and wiring it up via P2P like we saw with
| crypto mining
| wmf wrote:
| That's what this thread is about. Geohot is doing that.
| wtallis wrote:
| Crypto mining could make use of lots of GPUs in a single
| cheap system precisely because it did not need any
| significant PCIe bandwidth, and would not have benefited
| at all from p2p DMA. Anything that _does_ benefit from
| using p2p DMA is unsuitable for running with just one
| PCIe lane per GPU.
| genewitch wrote:
| crypto mining only needs 1 PCIe lane per GPU, so you can
| fit 24+ GPUs on a standard consumer CPU motherboard
| (24-32 lanes depending on the CPU). Apparently ML
| workloads require more interconnect bandwidth when doing
| parallel compute, so each card in this demo system uses
| 16 lanes, and therefore requires 1.) full size slots, and
| 2.) epyc[0] or xeon based systems with 128 lanes (or at
| least greater than 32 lanes).
|
| per 1 above crypto "boards" have lots of x1 (or x4)
| slots, the really short PCIe slots. You then use a riser
| that uses USB3 cables to go to a full size slot on a
| small board, with power connectors on it. If your board
| only has x8 or x16 slots (the full size slot) you can buy
| a breakout PCIe board that splits that into four slots,
| using 4 USB-3 cables, again, to boards with full size
| slots and power connectors. These are _different_ than
| the PCIe riser boards you can buy for use with cases that
| allow the GPUs to be placed vertically rather than
| horizontally, as those have full x16 "fabric" that
| interconnect between the riser and the board with the x16
| slot on them.
|
| [0] i didn't read the article because i'm not planning on
| buying a threadripper (48-64+ lanes) or an epyc (96-128
| lanes?) just to run AI workloads when i could just rent
| them for the kind of usage i do.
| myself248 wrote:
| Oooo, got a link to one of these fabric boards? I've been
| playing with stupid PCIe tricks but that's a new one on
| me.
| genewitch wrote:
| https://www.amazon.com/gp/product/B07DMNJ6QM/
|
| i used to use this one when i had all (three of my) nvme
| -> 4x sata boardlets and therefore could not fit a GPU in
| a PCIe slot due to the cabling mess.
| myself248 wrote:
| Oh, um, just a flexible riser.
|
| I thought we were using "fabric" to mean "switching
| matrix".
| numpad0 wrote:
| PCIe P2P still has to go up to a central hub thing and
| back because PCIe is not a bus. That central hub thing is
| made by very few players(most famously PLX Technologies)
| and it costs a lot.
| wtallis wrote:
| PCIe p2p transactions that end up routed through the
| CPU's PCIe root complex still have performance advantages
| over split transactions using the CPU's DRAM as an
| intermediate buffer. Separate PCIe switches are not
| necessary except when the CPU doesn't support routing p2p
| transactions, which IIRC was not a problem on anything
| more mainstream than IBM POWER.
| numpad0 wrote:
| Maybe not strictly necessary, but a separate PCIe
| backplane just for P2P bandwidth bypasses topology and
| bottleneck mess[1][2] of PC platform altogether and might
| be useful. I suspect this was the original premise for
| NVLink too.
|
| 1: https://assets.hardwarezone.com/img/2023/09/pre-
| meteror-lake...
|
| 2: https://www.gigabyte.com/FileUpload/Global/MicroSite/5
| 79/inn...
| acka wrote:
| > In older NVidia cards this could be done through a
| faster link called NVLink but the hardware for that was
| ripped out of consumer grade cards and is only in data
| center grade cards now.
|
| NVLink is still very much available in both RTX 3090 and
| A6000, both of which are still on the market. It was
| indeed removed from the RTX 40 series{0].
|
| [0]: https://www.pugetsystems.com/labs/articles/nvidia-
| nvlink-202...
| jmalicki wrote:
| For very large models, the weights may not fit on one GPU.
|
| Also, sometimes having more than one GPU enables larger
| batch sizes if each GPU can only hold the activations for
| perhaps one or two training examples.
|
| There is definitely a performance hit, but GPU<->GPU peer
| is less latency than GPU->CPU->software context
| switch->GPU.
|
| For "normal" pytorch training, the training is generally
| streamed through the GPU. The model does a batch training
| step on one batch while the next one is being loaded, and
| the transfer time is usually less than than the time it
| takes to do the forward and backward passes through the
| batch.
|
| For multi-GPU there are various data parallel and model
| parallel topologies of how to sort it, and there are ways
| of mitigating latency by interleaving some operations to
| not take the full hit, but multi-GPU training is definitely
| not perfectly parallel. It is almost required for some
| large models, and sometimes having a mildly larger batch
| helps training convergence speed enough to overcome the
| latency hit on each batch.
| publicmail wrote:
| PCIe buses are like a tree with "hubs" (really switches).
|
| Imagine you have a PC with a PCIe x16 interface which is
| attached to a PCIe switch that has four x16 downstream
| ports, each attached to a GPU. Those GPUs are capable of
| moving data in and out of their PCIe interfaces at full
| speed.
|
| If you wanted to transfer data from GPU0 and 1 to GPU2 and
| 3, you have basically 2 options:
|
| - Have GPU0 and 1 move their data to CPU DRAM, then have
| GPU2 and 3 fetch it
|
| - Have GPU0 and 1 write their data directly to GPU2 and 3
| through the switch they're connected to without ever going
| up to the CPU at all
|
| In this case, option 2 is better both because it avoids
| the extra copy to CPU DRAM and because it avoids the
| bottleneck of two GPUs trying to push x16 worth of data
| up through the CPU's single x16 port. This is known as
| peer to peer.
|
| There are some other scenarios where the data still must go
| up to the CPU port and back due to ACS, and this is still
| technically P2P, but doesn't avoid the bottleneck like
| routing through the switch would.
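| A rough way to feel the difference between those two
| options (a sketch assuming PyTorch and two CUDA devices;
| a real benchmark would pin memory and average many runs):
|
|     import time
|     import torch
|
|     # Compare a direct GPU0 -> GPU1 copy with staging the same
|     # tensor through CPU DRAM. Numbers are illustrative only.
|     x = torch.randn(4096, 4096, device="cuda:0")
|
|     def timed(fn):
|         torch.cuda.synchronize("cuda:0")
|         torch.cuda.synchronize("cuda:1")
|         t0 = time.time()
|         fn()
|         torch.cuda.synchronize("cuda:0")
|         torch.cuda.synchronize("cuda:1")
|         return time.time() - t0
|
|     direct = timed(lambda: x.to("cuda:1"))
|     staged = timed(lambda: x.cpu().to("cuda:1"))
|     print(f"direct: {direct:.4f}s, via CPU DRAM: {staged:.4f}s")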
| fulafel wrote:
| Yes, networking is similarly pointless.
| CamperBob2 wrote:
| The correct term, and the one most people would have used in
| the past, is "bus mastering."
| wmf wrote:
| PCIe isn't a bus and it doesn't really have a concept of
| mastering. All PCI DMA was based on bus mastering but P2P DMA
| is trickier than normal DMA.
| publicmail wrote:
| I consider it bus mastering when the endpoints initiate the
| transactions
| amelius wrote:
| Stupid terminology. Might as well call an RS-232 link "peer to
| peer".
| ivanjermakov wrote:
| I was always fascinated by George Hotz's hacking abilities.
| Inspired me a lot for my personal projects.
| vrnvu wrote:
| I agree, I feel so inspired with his streams. Focus and hard
| work, the key to good results. Add a clear vision and strategy,
| and you can also accomplish "success".
|
| Congratulations to him and all the tinygrad/comma contributors.
| sambull wrote:
| He's got that focus like a military pilot on a long flight.
| postalrat wrote:
| Any time I open this guy's stream, half of it is some
| sort of politics
| CYR1X wrote:
| You can blame chat for that lol
| gaws wrote:
| He should ban chat and focus on development. Leave the
| political talk to the kids in their respective Discord
| servers.
| Jerrrry wrote:
| His Xbox 360 laptop was the crux of my teenage motivation.
| jgpc wrote:
| I agree. It is fascinating. When you observe his
| development process (btw, it is worth noting his
| generosity in sharing it like he does), he frequently
| gets stuck on random shallow problems which a perhaps
| more knowledgeable engineer would find less difficult.
| It is common to see him writing really bad code, or even
| wrong code. The whole Twitter chapter is a good example.
| Yet, by himself, just iterating resiliently, he just as
| frequently creates remarkable improvements. A good
| example to learn from. Thank you geohot.
| zoogeny wrote:
| This matches my own take. I've tuned into a few of his
| streams and watched VODs on YouTube. I am consistently
| underwhelmed by his actual engineering abilities. He is that
| particular kind of engineer that constantly shits on other
| peoples code or on the general state of programming yet his
| actual code is often horrendous. He will literally call
| someone out for some code in Tinygrad that he has trouble
| with and then he will go on a tangent to attempt to rewrite
| it. He will use the most blatant and terrible hacks only to
| find himself out of his depth and reverting back to the
| original version.
|
| But his streams last 4 hours or more. And he just keeps
| grinding and grinding and grinding. What the man lacks in raw
| intellectual power he makes up for (and more) in persistence
| and resilience. As long as he is making even the tiniest
| progress he just doesn't give up until he forces the computer
| to do whatever it is he wants it to do. He also has no
| boundaries on where his investigations take him. Driver code,
| OS code, platform code, framework code, etc.
|
| I definitely couldn't work with him (or work for him) since I
| cannot stand people who degrade the work of others while
| themselves turning in sub-par work as if their own shit
| didn't stink. But I begrudgingly admire his tenacity, his
| single minded focus, and the results that his belligerent
| approach help him to obtain.
| ctrw wrote:
| There are developers who have breadth and developers who
| have depth. He is very much on the breadth end of the
| spectrum. It isn't a lack of intelligence but a lack of
| deep knowledge of esoteric fields you will use once a
| decade.
|
| That said, I find it a bit astonishing how little AI he
| uses on his streams. I convert all the documentation I
| need into a RAG system that I query stupid questions
| against.
| spirobelv2 wrote:
| link your github. want to see your raw intellectual power
| yazzku wrote:
| It's over 9000.
| gorkish wrote:
| If a stopped clock is right twice a day, relentlessly winding
| a clock forward will make it right quite frequently. That is
| geohot.
| namibj wrote:
| And here I thought (PCIe) P2P was there since SLI dropped the
| bridge (for the unfamiliar, it looks and acts pretty much like an
| NVLink bridge for regular PCIe slot cards that have NVLink, and
| was used back in the day to share framebuffer and similar in
| high-end gaming setups).
| wmf wrote:
| SLI was dropped years ago so there's no need for gaming cards
| to communicate at all.
| userbinator wrote:
| I wish more hardware companies would publish more documentation
| and let the community figure out the rest, sort of like what
| happened to the original IBM VGA (look up "Mode X" and the other
| non-BIOS modes the hardware is actually capable of - even
| 800x600x16!) Sadly it seems the majority of them would rather
| tightly control every aspect of their products' usage since they
| can then milk the userbase for more $$$, but IMHO the most
| productive era of the PC was also when it was the most open.
| rplnt wrote:
| Then they couldn't charge different customers different amounts
| for the same HW. It's not a win for everyone.
| axus wrote:
| The price of the 4090 may increase now; in theory,
| locking out some features might have been a favor to
| some of the customers.
| Sayrus wrote:
| But it wouldn't if all cards supporting this were
| "unlocked" by default and thus the other "enterprise-grade"
| cards weren't that much more expensive. Of course that'd
| reduce profits by a lot.
| paulmd wrote:
| it probably would - you saw exactly that outcome with
| mining.
|
| for a lot of these demand bursts, demand is so high it
| cannot be sated even consuming 100% or 200% of typical
| GPU production.
|
| cards like RX 6500XT that simply don't have the RAM to
| participate were less affected, but even then you've got
| enough cross-elasticity (demand from people being crowded
| out of other product segments) that tends to pump prices
| to 2-3x the "normal" clearance prices we see today. And
| yes, absolutely anything that can mine in any capacity
| will get pulled in during that sort of boom/bubble, not
| just "high-end"/"enterprise".
| greggsy wrote:
| Which (as controversial as it sounds in this kind of forum)
| is a sensible pricing model to recover and fund R&D and
| finance operations.
| mhh__ wrote:
| nvidia's software is their moat
| thot_experiment wrote:
| That's a huge overstatement, it's a big part of the moat for
| sure, but there are other significant components (hardware,
| ecosystem lock-in, heavy academic incentives)
| mhh__ wrote:
| No software -> hardware is massively hobbled. Evidence:
| AMD.
|
| Ecosystem -> Software. At the moment especially people are
| looking for arbitrages everywhere i.e. inference costs /
| being able to inference at all (llama.cpp)
|
| Academics -> Also software but easily fiddled with a bit of
| spending as you say.
| golergka wrote:
| If I'm a hardware manufacturer and my soft lock on product
| feature doesn't work, I'll switch to a hardware lock instead,
| and the product will just cost more.
| matheusmoreira wrote:
| > the most productive era of the PC was also when it was the
| most open
|
| The openness certainly was great but it's not actually
| required. People can figure out how to work with closed
| systems. Adversarial interoperability was common. People would
| reverse engineer things and make the software work whether or
| not the manufacturer wanted it.
|
| It's the software and hardware locks that used to be rare and
| are now common. Cryptography was supposed to be something that
| would empower us but it ended up being used against us to lock
| us out of our own machines. We're no longer in the driver's
| seat. Our operating systems don't even operate the system
| anymore. Our free Linux systems are just the "user OS" in the
| manufacturer's unknowable amalgamation of silicon running
| proprietary firmware, just a little component to be sandboxed
| away from the real action.
| andersa wrote:
| Incredible! I'd been wondering if this was possible. Now the only
| thing standing in the way of my 4x4090 rig for local LLMs is
| finding time to build it. With tensor parallelism, this will be
| both massively cheaper and faster for inference than a H100 SXM.
|
| I still don't understand why they went with 6 GPUs for the
| tinybox. Many things will only function well with 4 or 8 GPUs. It
| seems like the worst of both worlds now (use 4 GPUs but pay for 6
| GPUs, don't have 8 GPUs).
| corn13read2 wrote:
| A macbook is cheaper though
| tgtweak wrote:
| The extra $3k you'd spend on a quad-4090 rig vs the top
| mbp... ignoring the fact you can't put the two on even
| ground for versatility (very few libraries are adapted to
| Apple silicon, let alone optimized).
|
| Very few people who would consider an H100/A100/A800 are
| going to be cross-shopping a macbook pro for their
| workloads.
| LoganDark wrote:
| > very few libraries are adapted to Apple silicon, let
| alone optimized
|
| This is a joke, right? Have you been anywhere in the LLM
| ecosystem for the past year or so? I'm constantly hearing
| about new ways in which ASi outperforms traditional
| platforms, and new projects that are optimized for ASi.
| Such as, for instance, llama.cpp.
| cavisne wrote:
| Nothing compared to Nvidia though. The FLOPS and memory
| bandwidth is simply not there.
| spudlyo wrote:
| The memory bandwidth of the M2 Ultra is around 800GB/s
| versus 1008GB/s for the 4090. While it's true the M2 has
| neither the bandwidth nor the GPU power, it is not
| limited to 24G of VRAM per card. The 192G upper limit on
| the M2 Ultra will have a much easier time running
| inference on a 70+ billion parameter model, if that is
| your aim.
|
| Besides size, heat, fan noise, and not having to build it
| yourself, this is the only area where Apple Silicon might
| have advantage over a homemade 4090 rig.
| LoganDark wrote:
| It doesn't need GPU power to beat the 4090 in benchmarks:
| https://appleinsider.com/articles/23/12/13/apple-
| silicon-m3-...
| int_19h wrote:
| It doesn't beat RTX 4090 when it comes to actual LLM
| inference speed. I bought a Mac Studio for local
| inference because it was the most convenient way to get
| something _fast enough_ and with enough RAM to run even
| 155b models. It's great for that, but ultimately it's
| not magic - NVidia hardware still offers more FLOPS and
| faster RAM.
| LoganDark wrote:
| > It doesn't beat RTX 4090 when it comes to actual LLM
| inference speed
|
| Sure, whisper.cpp is not an LLM. The 4090 can't even do
| inference at all on anything over 24GB, while ASi can
| chug through it even if slightly slower.
|
| I wonder if with https://github.com/tinygrad/open-gpu-
| kernel-modules (the 4090 P2P patches) it might become a
| lot faster to split a too-large model across multiple
| 4090s and still outperform ASi (at least until someone at
| Apple does an MLX LLM).
| dragonwriter wrote:
| > The 4090 can't even do inference at all on anything
| over 24GB, while ASi can chug through it even if slightly
| slower.
|
| Common LLM runners can split model layers between VRAM
| and system RAM; a PC rig with a 4090 can do inference on
| models larger than 24G.
|
| Where the crossover point where having the whole thing on
| Apple Silicon unified memory vs. doing split layers on a
| PC with a 4090 and system RAM is, I don't know, but its
| definitely not "more than 24G and a 4090 doesn't do
| anything".
| LoganDark wrote:
| > Common LLM runners can split model layers between VRAM
| and system RAM; a PC rig with a 4090 can do inference on
| models larger than 24G.
|
| Sure and ASi can do inference on models larger than the
| Unified Memory if you account for streaming the weights
| from the SSD on-demand. That doesn't mean it's going to
| be as fast as keeping the whole thing in RAM, although
| ASi SSDs are probably not particularly bad as far as SSDs
| go.
| treprinum wrote:
| Slightly slower in this case is like 10x. I have M3 Max
| with 128GB RAM, 4090 trashes it on anything under 24GB,
| then M3 Max trashes it on anything above 24GB, but it's
| like 10x slower at it than 4090 on <24GB.
| numpad0 wrote:
| PSA for all people who are still being misled by hand-
| wavy Apple M1 marketing charts[1] implying total
| dominance of M-series wondersilicon obsoleting all
| Intel/NVIDIA PCs:
|
| There is benchmark data showing that an Apple M2 Ultra
| is 47% and 60% slower than a Xeon W9 and an RTX 4090, or
| 0.35% and 2% slower than an i9-13900K and an RTX 4060
| Ti, respectively, in Geekbench 5 Multi-threaded and
| OpenCL Compute tests.
|
| Apple Silicon Macs are NOT faster than competing desktop
| computers, nor was the M1 massively faster than the
| NVIDIA 3070 (the desktop part, 2x faster than the laptop
| variant the M1 was compared against) for that matter.
| They just offer up to 128GB shared RAM/VRAM options in
| slim desktops and laptops, which is handy for LLMs;
| that's it.
| Please stop taking Apple marketing materials at full face
| value or above. Thank you.
|
| 1: https://i.extremetech.com/imagery/content-
| types/03ekiQwNudC75iOK4AMuEkw/images-2.jpg
| 2: screenshot from[4]: https://www.igorslab.de/wp-
| content/uploads/2023/06/Apple-M2-ULtra-SoC-
| Geekbench-5-Multi-Threaded.jpg
| 3: screenshot from[4]: https://www.igorslab.de/wp-
| content/uploads/2023/06/Apple-M2-ULtra-SoC-
| Geekbench-5-OpenCL-Compute.jpg
| 4: https://wccftech.com/apple-m2-ultra-soc-isnt-faster-than-
| amd-intel-last-year-desktop-cpus-50-slower-than-nvidia-
| rtx-4080/
| LoganDark wrote:
| Yeah. Let me just walk down to Best Buy and get myself a
| GPU with over 24 gigabytes of VRAM (impossible) for less
| than $3,000 (even more impossible). Then tell me ASi is
| nothing compared to Nvidia.
|
| Even the A100 for something around $15,000 (edit: used to
| say $10,000) only goes up to 80 gigabytes of VRAM, but a
| 192GB Mac Studio goes for under $6,000.
|
| Those figures alone prove Nvidia isn't even competing in
| the consumer or even the enthusiast space anymore. They
| know you'll buy their hardware if you really need it, so
| they aggressively segment the market with VRAM
| restrictions.
| andersa wrote:
| Where are you getting an A100 80GB for $10k?
| LoganDark wrote:
| Oops, I remembered it being somewhere near $15k but
| Google got confused and showed me results for the 40GB
| instead so I put $10k by mistake. Thanks for the
| correction.
|
| A100 80GB goes for around $14,000 - $20,000 on eBay and
| A100 40GB goes for around $4,000 - $6,000. New (not from
| eBay - from PNY and such), it looks like an 80GB would
| set you back $18,000 to $26,000 depending on whether you
| want HBM2 or HBM2e.
|
| Meanwhile you can buy a Mac Studio today without going
| through a distributor and they're under $6,000 if the
| only thing you care about is having 192GB of Unified
| Memory.
|
| And while the memory bandwidth isn't quite as high as the
| 4090, the M-series chips can run certain models faster
| anyway, if Apple is to be believed
| andersa wrote:
| Sure, it's also at least an order of magnitude slower in
| practice, compared to 4x 4090 running at full speed. We're
| looking at 10 times the memory bandwidth and _much_ greater
| compute.
| chaostheory wrote:
| Yeah, even a Mac Studio is way too slow compared to Nvidia
| which is too bad because at $7000 maxed to 192gb it would
| be an easy sell. Hopefully, they will fix this by m5. I
| don't trust the marketing for m4
| thangngoc89 wrote:
| training on MPS backend is suboptimal and really slow.
| wtallis wrote:
| Do people do training on systems this small, or just
| inference? I could see maybe doing a little bit of fine-
| tuning, but certainly not from-scratch training.
| redox99 wrote:
| If you mean train llama from scratch, you aren't going to
| train it on any single box.
|
| But even with a single 3090 you can do quite a lot with
| LLMs (through QLoRA and similar).
| thangngoc89 wrote:
| Yep. Price/performance of a multi-4090 system is way
| better than the professional cards (Axxx). Also, deep
| learning outside of LLMs has many different uses.
| llm_trw wrote:
| So is a TI-89.
| amelius wrote:
| And looks way cooler
| numpad0 wrote:
| 4x32GB(128GB) DDR4 is ~$250. 4x48GB(192GB) DDR5 is ~$600.
| Those are even cheaper than upgrade options for Macs($1k).
| papichulo2023 wrote:
| Not many consumer mobos support 192GB DDR5.
| wtallis wrote:
| If it supports DDR5 at all, then it should be at most a
| firmware update away from supporting 48GB dual-rank
| DIMMs. There are very few consumer motherboards that only
| have two DDR5 slots; almost all have the four slots
| necessary to accept 192GB. If you are under the
| impression that there's a widespread limitation on
| consumer hardware support for these modules, it may
| simply be due to the fact that 48GB modules did not exist
| yet when DDR5 first entered the consumer market, and such
| modules did not start getting mentioned on spec sheets
| until after they existed.
| imtringued wrote:
| You don't want to use more than two slots because you
| only have two memory channels. The overclocking potential
| of DDR5 is extremely high when you only run two DIMMs.
| All the way up to 8000. Meanwhile if you go for
| populating all four slots, you are limited significantly
| below 5000. Almost a 50% performance drop if you are
| willing to overclock your RAM.
| wtallis wrote:
| If you want to run something that doesn't fit in 96GB of
| RAM, you'll get better performance from _having enough
| RAM_. Yes, having two dual-rank DIMMs per channel will
| force you to run at a slower speed, but it's still far
| faster than your SSD. The second slot per channel exists
| precisely because many people really _do_ want to use it.
| ojbyrne wrote:
| A lot that have specs showing they support a max of 4x32
| DDR5 actually support 4x48 DDR5 via recent BIOS updates.
| papichulo2023 wrote:
| In the specs, yeah; in practice hardly anyone has gotten
| it working. As far as I saw on Reddit, it requires
| customizing timings to make 4 slots work over 6000 MHz at
| the same time.
| faeriechangling wrote:
| Most consumer mobos I see support this even if the setup
| isn't on the QVL. If a DDR5 motherboard supports 4 sticks
| at all, you can probably run 192GB on it so long as you
| update the BIOS firmware. The problem is running at rated
| speeds.
|
| AMD tends to be worse than Intel, and I hear of people
| having to run anywhere between DDR5-3200 and DDR5-5200.
| You are better off running two sticks, because even with
| 2 sticks you really can't run larger models with
| acceptable performance anyway, much less with 4.
|
| There is competition to apple on the low end (dual
| channel fast DDR5) and on the high end (8+ channel like
| Xeon/Epyc/AmpereOne). In the middle, Apple is sort of
| crushing because if you run a true 4 channel system
| you're going to get poor performance if you load up a
| 192gb model, and if you compare pricing to 96gb/128gb
| apple systems, there's not all that much of a cost
| advantage and you have to make a lot of sacrifices to get
| there. The truth is that Apple really doesn't have all
| that much competition right now and won't for the
| foreseeable future.
| papichulo2023 wrote:
| Hopefully Qualcomm will free us from this two-channel
| nightmare.
| faeriechangling wrote:
| I'm optimistic about APUs personally like AMDs upcoming
| Strix Halo APU with a 256-bit memory bus competing at the
| lower end of the market, but that will only provide so
| much competition.
| wtallis wrote:
| I don't think it's realistic to pin your hopes on
| Qualcomm given that they're unlikely to care about
| supporting anything other than LPDDR with their laptop
| processors.
| faeriechangling wrote:
| Buying a MacBook for AI is great if you were already going to
| buy a MacBook, as this makes it a lot more cost competitive.
| It's also great if what you're doing is REALLY privacy
| sensitive, such as if you're a lawyer, where uploading client
| data to OpenAI is probably not appropriate or legal.
|
| But in general, I find the appeal is narrow because
| consumer GPUs are better both for training and for
| inferencing at scale[1]. Cloud services also allow the
| vast majority of individuals to get higher quality
| inferencing at lower cost. The result is that Apple
| Silicon's appeal is quite niche.
|
| [1] Mind you, Nvidia considers this a licensing violation,
| not that GeoHot has historically ever been all scared to
| violate a EULA and force a company to prove its terms have
| legal force.
| Tepix wrote:
| 6 GPUs because they want fast storage, and it uses PCIe
| lanes.
|
| Besides, the goal was to run a 70B FP16 model (requiring
| roughly 140GB of VRAM). 6*24GB = 144GB
| andersa wrote:
| That calculation is incorrect. You need to fit both the model
| (140GB) and the KV cache (5GB at 32k tokens FP8 with flash
| attention 2) * batch size into VRAM.
|
| If the goal is to run a FP16 70B model as fast as possible,
| you would want 8 GPUs with P2P, for a total of 192GB VRAM.
| The model is then split across all 8 GPUs with 8-way tensor
| parallelism, letting you make use of the full 8TB/s memory
| bandwidth on every iteration. Then you have 50GB spread out
| remaining for KV cache pages, so you can raise the batch size
| up to 8 (or maybe more).
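| A back-of-the-envelope version of that budget (a sketch;
| the ~5GB-per-sequence KV figure is taken from the comment
| above, and real engines add activation and allocator
| overhead, so the usable batch is lower):
|
|     # VRAM budget for a 70B FP16 model on 8x 24GB GPUs.
|     weights_gb = 70e9 * 2 / 1e9    # FP16 = 2 bytes/param -> 140 GB
|     kv_per_seq_gb = 5              # ~32k tokens, FP8 KV cache
|     total_vram_gb = 8 * 24         # 192 GB across 8 GPUs
|
|     free_gb = total_vram_gb - weights_gb
|     max_batch = int(free_gb // kv_per_seq_gb)   # upper bound only
|     print(f"{free_gb:.0f} GB left for KV -> batch up to ~{max_batch}")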
| renewiltord wrote:
| I've got a few 4090s that I'm planning on doing this with.
| Would appreciate even the smallest directional tip you can
| provide on splitting the model that you believe is likely
| to work.
| andersa wrote:
| The split is done automatically by the inference engine
| if you enable tensor parallelism. TensorRT-LLM, vLLM and
| aphrodite-engine can all do this out of the box. The main
| thing is just that you need either 4 or 8 GPUs for it to
| work on current models.
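| For example, a sketch using vLLM (the model name is only
| an illustration; whatever model you pick, its weights
| have to fit in the combined VRAM of the GPUs you split
| across):
|
|     from vllm import LLM, SamplingParams
|
|     # vLLM shards the model with tensor parallelism when
|     # tensor_parallel_size is set to the number of GPUs.
|     llm = LLM(model="meta-llama/Llama-2-13b-hf",
|               tensor_parallel_size=4,
|               dtype="float16")
|     out = llm.generate(["Hello, my name is"],
|                        SamplingParams(max_tokens=32))
|     print(out[0].outputs[0].text)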
| renewiltord wrote:
| Thank you! Can I run with 2 GPUs or with heterogeneous
| GPUs that have same RAM? I will try. Just curious if you
| already have tried.
| andersa wrote:
| 2 GPUs works fine too, as long as your model fits. Using
| different GPUs with same VRAM however, is highly highly
| sketchy. Sometimes it works, sometimes it doesn't. In any
| case, it would be limited by the performance of the
| slower GPU.
| renewiltord wrote:
| All right, thank you. I can run it on 2x 4090 and just
| put the 3090s in different machine.
| Tepix wrote:
| I know there's some overhead, it's not my calculation.
|
| https://www.tweaktown.com/news/97110/tinycorps-new-
| tinybox-a...
|
| Quote: " _Runs 70B FP16 LLaMA-2 out of the box using
| tinygrad_ "
|
| Related: https://github.com/tinygrad/tinygrad/issues/3791
| ShamelessC wrote:
| > Many things will only function well with 4 or 8 GPUs
|
| What do you mean?
| andersa wrote:
| For example, if you want to run low latency multi-GPU
| inference with tensor parallelism in TensorRT-LLM, there is a
| requirement that the number of heads in the model is
| divisible by the number of GPUs. Most current published
| models are divisible by 4 and 8, but not 6.
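| A tiny illustration of that constraint (a sketch; 64
| heads is just an example count, e.g. a Llama-2-70B-sized
| model):
|
|     num_heads = 64
|     for gpus in (2, 4, 6, 8):
|         ok = num_heads % gpus == 0
|         print(f"{gpus} GPUs: {'even split' if ok else 'uneven'}")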
| bick_nyers wrote:
| Interesting... 1 Zen 4 EPYC CPU yields a maximum of 128
| PCIe lanes, so it wouldn't be possible to put 8 full-fat
| GPUs on while maintaining some lanes for storage and
| networking. Same deal with Threadripper Pro.
| andersa wrote:
| It should be possible with onboard PCIe switches. You
| probably don't need the networking or storage to be all
| that fast while running the job, so it can dedicate
| almost all of the bandwidth to the GPU.
|
| I don't know if there are boards that implement this,
| though, I'm only looking at systems with 4x GPUs
| currently. Even just plugging in a 5kW GPU server in my
| apartment would be a bit of a challenge. With 4x 4090,
| the max load would be below 3kW, so a single 240V plug
| can handle it no issue.
| thangngoc89 wrote:
| 8 GPUs x 16 PCIe lanes each = 128 lanes already.
|
| That's the limit of single CPU platforms.
| bick_nyers wrote:
| I've seen it done with a PLX Multiplexer as well, but
| they add quite a bit of cost:
|
| https://c-payne.com/products/pcie-gen4-switch-
| backplane-4-x1...
|
| Not sure if there exists an 8-way PCIE Gen 5 Multiplexer
| that doesn't cost ludicrous amounts of cash. Ludicrous
| being a highly subjective and relative term of course.
| namibj wrote:
| 98 lanes of PCIe 4.0 fabric switch as just the chip (to
| solder onto a motherboard/backplane) costs $850
| (PEX88096). You could for example take 2 x16 GPUs, pass
| them through (2*2*16 = 64 lanes), and have 2 x16 that
| bifurcate to at least x4 (might even be x2, I didn't find
| that part of the docs just now) for anything you want,
| plus 2 x1 for minor stuff. They do claim to have no
| problems being connected up into a switching fabric, and
| very much allow multi-host operation (you will need
| signal retimers quite soon, though).
|
| They're the stuff that enables cloud operators to pool
| like 30 GPUs across like 10 CPU sockets while letting you
| virtually hot-plug them to fit demand. Or when you want
| to make a SAN with real NVMe-over-PCIe. Far cheaper than
| normal networking switches with similar ports (assuming
| hosts doing just x4 bifurcation, it's very comparable to
| a 50G Ethernet port. The above chip thus matches a 24
| port 50G Ethernet switch. Trading reach for only needing
| retimers, not full NICs, in each connected host. Easily
| better for HPC clusters up to about 200 kW made from
| dense compute nodes.), but sadly still lacking affordable
| COTS parts that don't require soldering or contacting
| sales for pricing (the only COTS with list prices seem to
| be Broadcom's reference designs, for prices befitting an
| evaluation kit, not a Beowulf cluster).
| pests wrote:
| I really like the information about how the cloud
| providers do their multiplexing, thanks. There was some
| tech posted here a few months ago that was similar which
| I found very interesting - plug all devices, RAM, hard
| drives, and CPUs into a larger fabric, with a way to spin
| up "servers" of any size from the pool of resources...
| wish I could remember the name now.
|
| nit: HN formatting messed up your math in the second
| sentence, I believe you italicized by accident using *
| for equations.
| segfaultbuserr wrote:
| It's more difficult to split your work across 6 GPUs evenly,
| and easier when you have 4 or 8 GPUs. The latter setups have
| powers of 2, which for example, can evenly divide a 2D or 3D
| grid, but 6 GPUs are awkward to program. Thus, the OP argues
| that a 6-GPU setup is highly suboptimal for many existing
| applications and there's no point to pay more for the extra
| 2.
| numpad0 wrote:
| I was googling public NVIDIA SXM2 materials the other
| day, and it seemed SXM2/NVLink 2.0 was just a six-way
| system. NVIDIA SXM has been updated to versions 3 and 4
| since, and this isn't based on any of those anyway, but
| maybe there's something we don't know that makes six-way
| reasonable.
| andersa wrote:
| It was probably just before running LLMs with tensor
| parallelism became interesting. There are plenty of other
| workloads that can be divided by 6 nicely, it's not an end-
| all thing.
| dheera wrote:
| What is a six-way system?
| TylerE wrote:
| Old-school way of saying core (or in this case GPU),
| basically.
| liuliu wrote:
| 6 seems reasonable. 128 Lanes from ThreadRipper needs to have a
| few for network and NVMe (4x NVMe would be x16 lanes, and 10G
| network would be another x4 lanes).
| cjbprime wrote:
| I don't think P2P is very relevant for inference. It's
| important for training. Inference can just be sharded across
| GPUs without sharing memory between them directly.
| andersa wrote:
| It can make a difference when using tensor parallelism to run
| small batch sizes. Not a huge difference like training
| because we don't need to update all weights, but still a
| noticeable one. In the current inference engines there are
| some allreduce steps that are implemented using nccl.
|
| Also, paged KV cache is usually spread across GPUs.
| namibj wrote:
| Batching during inference massively helps arithmetic
| intensity, and the batch sizes you'd want for that tend
| to exceed the memory capacity of a single GPU. Hence the
| desire to do training-like cluster processing, e.g. to
| use a weight for every inference stream that needs it
| each time it's fetched from memory. It's just that you
| typically can't fit 100+ inference streams of context on
| one GPU, thus the desire to shard along less wasteful
| (w.r.t. memory bandwidth) dimensions than entire
| inference streams.
| qeternity wrote:
| You are talking about data parallelism. Depending on the
| model tensor parallelism can still be very important for
| inference.
| georgehotz wrote:
| tinygrad supports uneven splits. There's no fundamental reason
| for 4 or 8, and work should almost fully parallelize on any
| number of GPUs with good software.
|
| We chose 6 because we have 128 PCIe lanes, aka 8 16x ports. We
| use 1 for NVMe and 1 for networking, leaving 6 for GPUs to
| connect them in full fabric. If we used 4 GPUs, we'd be wasting
| PCIe, and if we used 8 there would be no room for external
| connectivity aside from a few USB3 ports.
| doctorpangloss wrote:
| Have you compared 3x 3090-3090 pairs over NVLink?
|
| IMO the most painful thing is that since these hardware
| configurations are esoteric, there is no software that
| detects them and moves things around "automatically,"
| regardless of what people think device_map="auto" does.
| And anyway, Hugging Face's transformers/diffusers are all
| over the place.
| davidzweig wrote:
| Is it possible a similar patch would work for P2P on 3090s?
|
| btw, I found a Gigabyte board on Taobao that is unlisted on
| their site: MZF2-AC0, costs $900. 2 socket Epyc and 10 PCIE
| slots, may be of interest. A case that should fit, with 2x
| 2000W Great Wall PSUs and PDU is 4050 RMB
| (https://www.toploong.com/en/4GPU-server-case/644.html). You
| still need blower GPUs.
| cjbprime wrote:
| Doesn't nvlink work natively on 3090s? I thought it was
| only removed (and here re-enabled) in 4090.
| qeternity wrote:
| This is not nvlink.
| georgehotz wrote:
| It should if your 3090s have Resizable BAR support in the
| VBIOS. AFAIK most card manufacturers released BIOS updates
| enabling this.
|
| Re: 3090 NVLink, that only allows pairs of cards to be
| connected. PCIe allows full fabric switch of many cards.
| Ratiofarmings wrote:
| In cases where they didn't, the techpowerup vBIOS
| collection solves the problem.
| davidzweig wrote:
| Update, I checked with the case company, toploong, they say
| that board is a 5mm too big or so for the case.
| andersa wrote:
| That is very interesting if tinygrad can support it! Every
| other library I've seen had the limitation on dividing the
| heads, so I'd (perhaps incorrectly) assumed that it's a
| general problem for inference.
| spmurrayzzz wrote:
| There are some interesting hacks you can do like
| replicating the K/V weights by some factor which allows
| them to be evenly divisible by whatever number of gpus you
| have. Obviously there is a memory cost there, but it does
| work.
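| A minimal sketch of that trick (shapes and counts are
| illustrative only; real engines replicate the actual K/V
| projection weights, trading VRAM for an even split):
|
|     import torch
|
|     # 8 KV heads don't split across 6 GPUs, but replicating
|     # each head 3x gives 24 heads, i.e. 4 per GPU.
|     kv_heads, head_dim, hidden = 8, 128, 8192
|     w_kv = torch.randn(kv_heads, head_dim, hidden)
|     w_kv_rep = w_kv.repeat_interleave(3, dim=0)
|     print(w_kv_rep.shape,
|           "divisible by 6:", w_kv_rep.shape[0] % 6 == 0)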
| AnthonyMouse wrote:
| Is there any reason you couldn't use 7? 8 PCIe lanes each
| seems more than sufficient for NVMe and networking.
| WanderPanda wrote:
| Did you at least front-run the market and stock up on
| 4090s before this release? Also, gamers are probably not
| too happy about these developments :D
| tinco wrote:
| 4090's have consistently been around 2000 dollars. I don't
| think there's many gamers who would be affected by price
| fluctuations of the 4090 or even the 4080.
| presides wrote:
| This is out of touch; they were mad before and they will
| be mad again. Lots of people spend a huge chunk of their
| modest disposable income on high end gaming gear, and the
| only upside of these issues for them is that eventually,
| YEARS down the line, capacity/supply issues MIGHT calm
| down in a way that yields some benefits.
|
| They're going to realize soon enough that they've
| basically just been told that the extremely shitty
| problem they thought they'd moved beyond is back with a
| vengeance and the next generation of gaming cards has the
| potential to make the past few rounds of scalping shit-
| shows look tame.
| swalsh wrote:
| Gamers have a TON of really good, really affordable
| options. But you kind of need 24GB minimum unless you're
| using heavy quantization. So 3090s and 4090s are what
| local LLM people are building with (mostly 3090s, as you
| can get them for about $700, and they're dang good)
| boromi wrote:
| Any chance you could share the details of the build you'd
| go for? I need a server for our lab, but am kinda out of
| my depth with all the options.
| xipho wrote:
| You can watch this happen on the weekends, typically,
| sometimes for some very long sessions.
| https://www.twitch.tv/georgehotz
| BeefySwain wrote:
| Can someone ELI5 what this may make possible that wasn't possible
| before? Does this mean I can buy a handful of 4090s and use it in
| lieu of an h100? Just adding the memory together?
| segfaultbuserr wrote:
| No. The Nvidia A100 has a multi-lane NVLink interface with a
| total bandwidth of 600 GB/s. The "unlocked" Nvidia RTX 4090
| uses PCIe P2P at 50 GB/s. It's not going to replace A100 GPUs
| for serious production work, but it does unlock a datacenter-
| exclusive feature and has some small-scale applications.
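| A rough sense of what that difference means in practice
| (a sketch; it ignores latency, overlap, and
| ring-allreduce math):
|
|     # Time to move 14 GB (e.g. a 7B-parameter model's FP16
|     # gradients) once over each link.
|     payload_gb = 14
|     for name, gb_per_s in (("NVLink (A100)", 600),
|                            ("PCIe P2P (4090)", 50)):
|         ms = payload_gb / gb_per_s * 1000
|         print(f"{name}: {ms:.0f} ms per pass")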
| xmorse wrote:
| Finally switched to Nvidia and already adding great value
| perfobotto wrote:
| What stops nvidia from making sure this stops working in future
| driver releases?
| __MatrixMan__ wrote:
| The law, hopefully.
|
| Beeper mini only worked with iMessage for a few days before
| Apple killed it. A few months later the DOJ sued Apple. Hacks
| like this show us the world we could be living in, a world
| which can be hard to envision otherwise. If we want to actually
| live in that world, we have to fight for it (and protect the
| hackers besides).
| StayTrue wrote:
| I was thinking the same but in terms of firmware updates.
| aresant wrote:
| So assuming you utilized this with (4) x 4090s, is there
| a theoretical performance comparison vs the A6000 / other
| professional lines?
| thangngoc89 wrote:
| I believe this is mostly for memory capacities. PCIe access
| between GPUs is slower than soldered RAM on a single GPU
| andersa wrote:
| It depends on what you do with it and how much bandwidth it
| needs between the cards. For LLM inference with tensor
| parallelism (usually limited by VRAM read bandwidth, but little
| exchange needed) 2x 4090 will massively outperform a single
| A6000. For training, not so much.
| c0g wrote:
| Any idea of DDP perf?
| No1 wrote:
| The original justification that Nvidia gave for removing Nvlink
| from the consumer grade lineup was that PCIe 5 would be fast
| enough. They then went on to release the 40xx series without PCIe
| 5 and P2P support. Good to see at least half of the equation
| being completed for them, but I can't imagine they'll allow this
| in the next gen firmware.
| musha68k wrote:
| OK now we are seemingly getting somewhere. I can feel the
| enthusiasm coming back to me.
|
| Especially in light of what's going on with LocalLLaMA etc:
|
| https://www.reddit.com/r/LocalLLaMA/comments/1c0mkk9/mistral...
| thangngoc89 wrote:
| > You may need to uninstall the driver from DKMS. Your system
| needs large BAR support and IOMMU off.
|
| Can someone point me to the correct tutorial on how to do these
| things?
| unaindz wrote:
| The first one, I assume, is the nvidia driver for Linux
| installed using DKMS. Whether it uses DKMS or not is
| stated in the driver's name, at least on Arch-based
| distributions.
|
| The latter options are settings in your motherboard BIOS.
| If your computer is modern, explore your BIOS and you
| will find them.
| jasomill wrote:
| DKMS: uninstall Nvidia driver using distro package manager
|
| BAR: enable resizable BAR in motherboard CMOS setup
|
| IOMMU: Add "amd_iommu=off" or "intel_iommu=off" to kernel
| command line for AMD or Intel CPU, respectively (or just add
| both). You may or may not need to disable the IOMMU in CMOS
| setup (Intel calls its IOMMU VT-d).
|
| See motherboard docs for specific option names. See distro docs
| for procedures to list/uninstall packages and to add kernel
| command line options.
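| A quick sanity check of the resulting configuration (a
| sketch that reads standard Linux sysfs paths; the "BAR1
| is the VRAM aperture" assumption holds for typical NVIDIA
| GPUs, and the IOMMU check is only a heuristic):
|
|     import glob, os
|
|     for dev in glob.glob("/sys/bus/pci/devices/*"):
|         with open(os.path.join(dev, "vendor")) as f:
|             if f.read().strip() != "0x10de":   # NVIDIA vendor ID
|                 continue
|         with open(os.path.join(dev, "resource")) as f:
|             bars = f.read().splitlines()
|         # With resizable BAR enabled, BAR1 should span (nearly)
|         # the whole VRAM rather than the legacy 256 MiB window.
|         start, end, _ = (int(x, 16) for x in bars[1].split())
|         print(dev, "BAR1:", (end - start + 1) // 2**20, "MiB")
|
|     groups = glob.glob("/sys/kernel/iommu_groups/*")
|     print("IOMMU groups:", len(groups), "(0 means IOMMU is off)")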
| spxneo wrote:
| does this mean you can horizontally scale to a GPT-4-esque
| LLM locally in the near future? (i hear you need 1TB of
| VRAM)
|
| Does Apple's large unified memory offering, like 192GB,
| have the fastest bandwidth, and if so how will pairing a
| bunch of 4090s like in the comments work?
| lawlessone wrote:
| This is very interesting.
|
| I can't afford two mortgages though, so for me it will
| have to just stay as something interesting :)
| m3kw9 wrote:
| In layman terms what does this enable?
| vladgur wrote:
| curious if this will ever make it to 3090s
| cavisne wrote:
| How does this compare in bandwidth and latency to nvlink? (I'm
| aware it's not available on the consumer cards)
| wmf wrote:
| It's 5x-10x slower.
| modeless wrote:
| What are the chances that Nvidia updates the firmware to disable
| this and prevents downgrading with efuses? Someday cards that
| still have older firmware may be more valuable. I'd be cautious
| upgrading drivers for a while.
| theturtle32 wrote:
| WTF is P2P?
| theturtle32 wrote:
| Answered my own question with a Google search:
|
| https://developer.nvidia.com/gpudirect#:~:text=LEARN%20MORE%...
| .
|
| > GPUDirect Peer to Peer > Enables GPU-to-GPU copies as well as
| loads and stores directly over the memory fabric (PCIe,
| NVLink). GPUDirect Peer to Peer is supported natively by the
| CUDA Driver. Developers should use the latest CUDA Toolkit and
| drivers on a system with two or more compatible devices.
| tanelpoder wrote:
| I also love that it can be done with just a few code line
| changes:
|
| https://github.com/NVIDIA/open-gpu-kernel-modules/commit/1f4...
| waldrews wrote:
| Would this approach be possible to extend downmarket, to older
| consumer cards? For a lot of LLM use cases we're constrained by
| memory and can tolerate lower compute speeds so long as there's
| no swapping. ELI5, what would prevent a hundred 1060-level cards
| from being used together?
| Sebb767 wrote:
| > ELI5, what would prevent a hundred 1060-level cards from
| being used together?
|
| In this case, you'd only have a single PCIe (v3!) lane per
| card, making the interconnect speed horribly slow. You'd
| also need to invest thousands of dollars in hardware to
| get all of those cards connected and, unless power is
| free, you'd outspend any theoretical savings instantly.
|
| In general, if you go back in card generations, you'll
| quickly hit such low memory limits and slow compute that
| a modern CPU-based setup is better value.
| qxfys wrote:
| I am amazed how people always find a way to make this kind of
| thing work. kudos!
| chriskanan wrote:
| This is great news. As an academic, I'm aware of multiple labs
| that built boxes with 4090s, not realizing that Nvidia had
| impaired P2P communication among cards. It's one of the reasons I
| didn't buy 4090s, despite them being much more affordable for my
| work. It isn't nvlink, but Nvidia has mostly gotten rid of that
| except for their highest end cards. It is better than nothing.
|
| Late last year, I got quotes for machines with four nvlink H100s,
| but the lead time for delivery was 13 months. I could get the
| non-nvlink ones in just four months. For now, I've gone with four
| L40S cards to hold my lab over but supply chain issues and
| gigantic price increases are making it very hard for my lab to do
| it's work. That's not nearly enough to support 6 PhD students and
| a bunch of undergrads.
|
| Things were a lot easier when I could just build machines with
| two GPUs each with Nvlink for $5K each and give one to each
| student to put under their desks, which is what I did back in
| 2015-2018 at my old university.
| uniqueuid wrote:
| And before that, Nvidia made our lives harder by phasing out
| blower-style designs in consumer cards that we could put in
| servers. In my lab, I'd take a card for 1/4 the price that has
| half the MTBF over a card for full price anytime.
| photonbeam wrote:
| How does cost compare with some of the GPU-cloud providers?
| uniqueuid wrote:
| Not op, but I found this benchmark of whisper large-v3
| interesting [1]. It includes the cloud provider's pricing per
| gpu, so you can directly calculate break-even timing.
|
| Of course, if you use different models, training, fine tuning
| etc. the benchmarks will differ depending on ram, support of
| fp8 etc.
|
| [1] https://blog.salad.com/whisper-large-v3/
| jeffs4271 wrote:
| It is cool seeing hacks like this. But this is something to be
| careful with, as GH100 had hardware changes to meet CUDA fence
| requirements.
| gururise wrote:
| How long before Nvidia patches this?
___________________________________________________________________
(page generated 2024-04-14 23:02 UTC)