[HN Gopher] Karpathy: Let's reproduce GPT-2 (1.6B): one 8XH100 node 24h $672 in llm.c
       ___________________________________________________________________
        
       Karpathy: Let's reproduce GPT-2 (1.6B): one 8XH100 node 24h $672 in
       llm.c
        
       Author : alecco
       Score  : 154 points
       Date   : 2024-07-11 19:21 UTC (3 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | alecco wrote:
       | Also https://x.com/karpathy/status/1811467135279104217#m
        
         | iforiq wrote:
          | How much did GPT-2 training cost when it came out in 2019?
        
           | ozr wrote:
           | About $50,000.
        
           | withinboredom wrote:
           | They probably spent more on the training data, to be honest.
           | They had to get it the hard way.
        
       | jamestimmins wrote:
        | Anyone have an idea if this is feasible to do on a MacBook with a
       | built-in GPU?
        
         | arthurcolle wrote:
          | It will take many more hours.
        
         | michaelmior wrote:
         | Probably not with the same amount of training time, but I'd
         | imagine a recent MBP GPU could handle GPT-2 training. The
         | biggest challenge is that the training would need to be
         | reimplemented for Metal instead of CUDA.
        
           | jamestimmins wrote:
           | Ah so I couldn't just run this on my laptop for ~48 hours?
           | That's too bad.
        
             | danielmarkbruce wrote:
              | He does it on 8 H100s in 24 hours, i.e. 192 H100-hours.
              | It's going to be thousands of laptop hours.
        
             | mmoskal wrote:
             | H100 SXM is 2000 TFLOPS at FP16. Multiply by 8.
             | 
             | M3 Max is 28 TFLOPS at FP16.
             | 
             | Based on FLOPS alone, it would be more like a year or two.
        
               | Davidzheng wrote:
                | Can you estimate how long it would take to replicate
                | AlphaGo Zero today on one 8xH100 node?
        
               | karpathy wrote:
               | (H100 SXM is 1000 TFLOPS, *2 is from "with sparsity",
               | which is not used here.)
        
               | mmoskal wrote:
               | Right... and there are probably some communication
                | overheads over NVLink that would not be present on a
                | single laptop. So a few months maybe :)
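
        A back-of-envelope version of the estimate above, in Python,
        using only the corrected dense-FP16 figures from this thread;
        real throughput would be well below peak on both sides:

            # FLOPS-only scaling: 8x H100 node vs. one M3 Max laptop.
            h100_node_tflops = 8 * 1000  # H100 SXM dense FP16
            m3_max_tflops = 28           # M3 Max GPU at FP16
            node_hours = 24              # length of the 8xH100 run
            ratio = h100_node_tflops / m3_max_tflops  # ~286x
            laptop_hours = node_hours * ratio         # ~6,900 h
            print(f"~{ratio:.0f}x, ~{laptop_hours / 24 / 30:.0f} months")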
        
           | rty32 wrote:
            | Slightly off topic -- I just saw people saying that the
            | Mac's unified memory makes it well suited to training
            | models:
            | https://www.macrumors.com/2024/07/10/apple-leads-global-
            | pc-g..., and how energy efficient Macs are, etc. But what
            | I'm seeing is that people often don't even touch Macs at
            | all -- they write code with CUDA and that's it. I find
            | this kind of conversation fascinating.
        
       | tomalaci wrote:
        | With how fast Nvidia is developing AI-workload-accelerating
        | hardware, I expect this will cost maybe a few dozen dollars
        | and train in a few hours within the next few years.
       | 
       | What I think will be interesting is when commodity hardware can
       | run cheap inference from very capable, specialized models. Pretty
       | sure it will spawn a new golden age of AI-powered desktop
       | applications.
       | 
        | For example, the video game space has already been trying to
       | AI-powered NPCs, world generation and story-telling (e.g. Inworld
       | AI).
        
         | talldayo wrote:
         | > I expect this will cost maybe few dozen dollars and train in
         | few hours within next few years.
         | 
         | I wouldn't count on it. Nvidia's been cleaning up shop, but
         | their best option for expanding right now is through
         | parallelization (bigger clusters, basically). Now that
         | Blackwell is on TSMC, Nvidia is alongside Apple waiting for new
         | and denser nodes to upgrade to. A real "generational leap" in
         | training cost is going to require some form of efficiency gain
         | that we're not seeing right now. It's _possible_ that Nvidia
          | has something up their sleeve, but I'm not holding my breath.
         | 
         | > What I think will be interesting is when commodity hardware
         | can run cheap inference from very capable, specialized models.
         | 
          | What's funny is, you basically already can. The problem is
          | increasingly one of integration and, in the case of video
          | games, giving the AI a meaningful role to fill. With today's
         | technology, you can enjoy an AI-generated roguelike that is
         | nigh-incomprehensible:
         | https://store.steampowered.com/app/1889620/AI_Roguelite/
         | 
          | As time goes on, I really think developers are just going to
          | not use AI for video games. Maybe I'm missing the "minecraft
          | moment" for procedurally-generated stories here, but the
          | sort of constraints needed to tell a story or create an
          | interactive experience don't exist within LLMs. It's a
          | stochastic nightmare of potential softlocks, contradictions,
          | or outright offensive requests. The majority of places I've
          | seen AI applied today aren't for content creation, but for
          | automated moderation.
        
         | up2isomorphism wrote:
         | > For example, video game space has already been trying to
         | create AI-powered NPCs, world generation and story-telling
         | (e.g. Inworld AI).
         | 
        | To me this is a downside compared to NPCs written by humans,
        | since human authorship is the only reason I would want to read
        | them.
        
           | MisterBastahrd wrote:
           | You don't even necessarily need to have them coming up with
           | valid speech. Simply giving out random quests and rewards
           | would keep people running on a loot treadmill for most open
           | world multiplayer games.
        
             | LeanderK wrote:
             | I think at first even background NPCs that don't give
             | quests and rewards would be nice. Sort of give everyone a
             | character and let them just babble. It breaks the immersion
             | to hear the same phrases repeatedly. You can still
             | handcraft every important quest and interaction to achieve
             | high fidelity, but I would like random NPCs you bump into
             | to not just repeat things all the time.
        
               | swatcoder wrote:
                | You're more likely to see studios (who want lots more
               | varied content) use generative AI in the studio, where
               | they might generate and review it before release.
               | 
               | Letting generators run free on the client sets up
                | different kinds of immersion-breaking, where NPCs
               | hallucinate misleading details about the story/world, can
               | be tricked into reciting off-topic absurdities or age-
               | rating violations, etc. AAA studios can't afford the
               | embarrassment of that and smaller designers with pride of
               | craft won't see their signature come through the art in
               | it. Surely, some designers will figure out ways to make
               | it work great for some specific idea, but it's not the
               | best way to use the technology in most cases.
        
             | swatcoder wrote:
             | To make this work, you need your LLM-based AI to outperform
             | any other form of generating quests and rewards -- and that
             | performance is measured on things like player enjoyment,
             | game/story progression, exploitability, client system
              | requirements or server operating costs, etc., and most of
             | those things are very hard to constrain or optimize for
             | with an LLM right now.
             | 
             | While the costs are hidden from end users and are going
              | down quickly, good LLMs remain very expensive to run and
             | very hard to keep on track compared to other options.
        
             | techjamie wrote:
             | You could maybe pull this off in a game like Borderlands
             | where the loot is basically just the same dozen guns but
              | with different numbers and effects. But as is, the LLM
              | text isn't going to be much different from a
              | sufficiently large ad-lib system.
             | 
             | I think there is value to be gained in having LLMs as part
             | of the development process, maybe even the game itself, but
             | I think conventional methods are about as sufficient for
             | quests.
        
           | bick_nyers wrote:
           | What if the LLM that a specific NPC utilizes was
           | handcrafted/fine-tuned by a human?
        
           | chongli wrote:
           | Yeah. I have played a bunch of the roguelike Caves of Qud [1]
           | and it has both hand-written text and procedurally generated
           | text. The former is quite interesting and relevant to both
           | gameplay and plot. The latter is mostly uninteresting and
           | irrelevant, though it does work as "filler." This is similar
           | to how procedurally-generated grass can give a more natural
           | look to a hill than you'd get with tiles (which are
           | incredibly easy to spot unless a ton of work is put into
           | hiding the seams and repeating patterns).
           | 
           | I still long for the day when we can have procedurally-
           | generated stories and quests that are actually interesting to
           | play through. I have no idea how that is going to work
           | though!
           | 
           | [1] https://www.cavesofqud.com
        
         | forrestthewoods wrote:
         | > For example, video game space has already been trying to
         | create AI-powered NPCs, world generation and story-telling
         | (e.g. Inworld AI).
         | 
         | Current AI isn't even close to good enough for video game NPCs
         | and related. We're several breakthroughs away from that being
         | possible at any cost. Those breakthroughs might happen in 3
         | years, or they might not happen in 10. Hard to predict.
        
           | Tiberium wrote:
            | Are you sure? Models like Claude 3.5 Sonnet are good at
            | both writing and following instructions; as long as you
            | set some guardrails for the model, they can be great NPCs.
        
             | forrestthewoods wrote:
             | > Are you sure?
             | 
             | Absolutely.
             | 
             | Current LLMs have insufficient world state. Imagine a game
             | like Stardew Valley. It's got a town with 30 NPCs or some
             | such. They all have personalities and the player builds a
             | relationship with them over time. Current LLMs can't do
             | that. They hallucinate waaaaaaay too much. You can't
             | reliably define and evolve relationships. Amongst many
              | other shortcomings.
             | 
             | I'm super pro AI and use ChatGPT all the time for
             | programming. So I'm not being an AI hater. But I am a
             | gamedev and I can say that what exists today simply isn't
             | good enough.
        
               | Tiberium wrote:
               | But you're saying that you want a single model to handle
               | all NPCs and the whole world. Of course this isn't
               | possible currently. But using a separate model with
               | separate context for each character is. Also, if you use
               | ChatGPT for programming, try Claude 3.5 Sonnet - it's
               | really better than GPT-4o for programming.
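
        A minimal sketch of the "separate context per character" idea;
        the NPC class and the injected chat function are hypothetical
        stand-ins for whatever LLM backend a game would actually use:

            # Each NPC keeps its own system prompt and its own history,
            # so characters stay in role and don't share state.
            class NPC:
                def __init__(self, name, persona):
                    self.history = [{
                        "role": "system",
                        "content": f"You are {name}. {persona} "
                                   "Stay in character; invent no facts.",
                    }]

                def talk(self, player_line, chat):
                    self.history.append(
                        {"role": "user", "content": player_line})
                    reply = chat(self.history)  # any chat-style backend
                    self.history.append(
                        {"role": "assistant", "content": reply})
                    return reply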
        
               | forrestthewoods wrote:
               | > But you're saying that you want a single model to
               | handle all NPCs and the whole world.
               | 
                | No, I did not say that at all. I didn't specify how
                | the LLMs may or may not be structured. I'm saying that
                | current LLMs - and yes, I've used Claude 3.5 Sonnet -
                | are insufficient. There is no existence proof that
                | they're sufficient.
               | 
                | LLMs are great. They aren't great enough for video
                | game NPCs. Not yet. Further innovation is needed.
                | You're free to disagree. I can't prove a negative. But
                | there is no working example.
        
         | swatcoder wrote:
         | > For example, video game space has already been trying to
         | create AI-powered NPCs, world generation and story-telling
         | (e.g. Inworld AI).
         | 
         | This'll be a niche for a long, long time.
         | 
         | Games are _generally_ carefully crafted to deliver a specific
          | mechanical and/or narrative experience. A world populated by
         | LLM/etc bots or content is one choice of what that experience
         | might be, but it's not going to be a very satisfying one for
         | many game designers -- especially given the current/near state
         | of the technology. There will be games and experiments that
         | explore it, for sure, but the vast majority of games just don't
         | have any need for it.
        
           | ImHereToVote wrote:
           | I don't have needs. I have wants. You see, I don't need. I
           | want.
        
           | headcanon wrote:
           | I do think there is a big opportunity for widely supported
           | hardware-accelerated matrix algebra in games. Currently most
           | of that is geared towards graphics (naturally) but being able
           | to easily encode arbitrary models and have them run on-device
           | would open up a lot of opportunities for games (like deep
            | simulation) that weren't possible before. It's currently
            | possible, of course, but requires custom tooling and
           | (relatively) niche hardware like a high-end graphics card.
           | 
           | I see the development energy around LLMs as a way to open up
           | support for that.
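
        For a sense of what small "arbitrary models" running on-device
        could look like, a toy sketch in plain NumPy (the network shape
        and random placeholder weights are illustrative only):

            import numpy as np

            # A tiny policy net: 16 game-state inputs -> 4 action scores.
            rng = np.random.default_rng(0)
            W1, b1 = rng.standard_normal((16, 32)), np.zeros(32)
            W2, b2 = rng.standard_normal((32, 4)), np.zeros(4)

            def act(state):
                h = np.maximum(state @ W1 + b1, 0.0)  # ReLU hidden layer
                return int(np.argmax(h @ W2 + b2))    # best-scoring action

            print(act(rng.standard_normal(16)))  # picks an action 0-3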
        
           | jetrink wrote:
           | Those don't have to be mutually exclusive though. To take AI
           | out of it, think of those murder mystery parties where actors
           | interact with attendees. Actors have roles to play and things
           | they must do to move the story forward, but they improvise
           | their dialog when talking with the players and sometimes each
           | other. Or if you've ever played D&D, you have experienced
           | talking with NPCs that are controlled by your DM. I think
           | video game AI could be a lot like that, where NPCs use
           | natural language instead of rigid dialog trees, but
           | otherwise, they behave a lot like they do today.
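
        A sketch of that split, with game logic staying authoritative
        and a hypothetical paraphrase() LLM call only rephrasing
        author-written intents:

            # The quest state machine decides WHAT the NPC says;
            # the model only varies HOW it is said.
            INTENTS = {
                "quest_offer": "Ask the player to clear the cellar rats.",
                "quest_done": "Thank the player and hand over the reward.",
            }

            def npc_line(state, paraphrase):
                intent = INTENTS[state]  # fixed, author-written meaning
                return paraphrase(
                    f"Say this in character, in one sentence: {intent}")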
        
             | thatguymike wrote:
             | Ah, like this: https://www.youtube.com/watch?v=Kw51fkRiKZU
        
             | swatcoder wrote:
             | Yes, that's the intuition for where the technology might
             | go. Someday.
             | 
              | But actors and DMs are _much_ more disciplined than
              | LLMs, partly because they have careers and friendships
              | on the line for misbehavior. For all the amazing things
              | they can do in good weather, LLMs are not really
              | reliable when you want them to consistently deliver
              | something very specific, very secure, or very artfully
              | crafted. They may get there, but their design makes it a
              | very hard problem that we're still a long way from
              | seeing commercialized.
        
           | 123yawaworht456 wrote:
           | for narrative/dialogue, yes, text generation is currently
           | useless. censorship, slop, extreme positivity bias. even
           | jailbroken Opus is shit.
           | 
            | but the audio generation we _already_ have is pretty much
            | good enough, and this is big. it's not AAA tier yet, sure,
            | but still light-years better than half-assing it with
            | mediocre voice actors. it is now an option to only use
            | real voice actors for a few key characters, and even that
            | won't be necessary within the next decade. even indie
            | video games will be fully voiced soon.
        
             | glial wrote:
             | Similar for textures - it would be really neat if textures
             | were auto-generated to add detail when you get close to
             | something. As it is, the sprites just look bad.
        
         | robbomacrae wrote:
          | While I agree with the reservations in the other replies, I
          | think you were implying in the future, and I'm sure LLMs
          | will be more trustworthy and up to the task at some point.
          | 
          | What I would really like to see now is all the new TTS
          | models being used more widely. There are still so many games
          | that have text-only output. My kids love Alba: A Wildlife
          | Adventure, but the eldest still isn't quite ready to read
          | all the text, so I have to sit with them reading out all the
          | lines.
          | 
          | If anyone has a way of applying universal mods /
          | accessibility features to existing games, I'd love to see
          | someone solve this and am happy to help with the TTS!
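
        A minimal sketch of reading a game's text lines aloud, assuming
        the offline pyttsx3 library as a stand-in for the newer neural
        TTS models mentioned above:

            import pyttsx3

            engine = pyttsx3.init()
            engine.setProperty("rate", 150)  # words per minute

            def read_aloud(dialog_line):
                engine.say(dialog_line)
                engine.runAndWait()  # blocks until speech finishes

            read_aloud("Welcome to the island!")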
        
         | charlescurt123 wrote:
          | I imagine we could do this now, just not the way you think.
          | 
          | Have a human-created story and text as the guideline.
          | 
          | With that, have genAI generate the text for each stage: you
          | would get different statements every time while staying on
          | track.
          | 
          | It would be interesting to play a game where everyone says
          | the same information in slightly different ways every single
          | playthrough.
        
         | HPsquared wrote:
          | Similar scaling to genome sequencing: the first genome was a
          | huge undertaking; now it's routine after a few Moore-esque
          | cycles.
        
       | alecco wrote:
       | It will be interesting to see this with today's FlashAttention 3
       | for H100.
        
       | rurban wrote:
        | Would be free for us because we have those H100s, but it's way
        | too hot right now. They will reach 70°C even when watercooled.
        
       ___________________________________________________________________
       (page generated 2024-07-11 23:02 UTC)