[HN Gopher] Karpathy: Let's reproduce GPT-2 (1.6B): one 8XH100 n...
___________________________________________________________________
Karpathy: Let's reproduce GPT-2 (1.6B): one 8XH100 node 24h $672 in
llm.c
Author : alecco
Score : 154 points
Date : 2024-07-11 19:21 UTC (3 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| alecco wrote:
| Also https://x.com/karpathy/status/1811467135279104217#m
| iforiq wrote:
| How much did GPT-2 training cost when it came out in 2019?
| ozr wrote:
| About $50,000.
| withinboredom wrote:
| They probably spent more on the training data, to be honest.
| They had to get it the hard way.
| jamestimmins wrote:
| Anyone have an idea if this is feasible to do on a Macbook with a
| built-in GPU?
| arthurcolle wrote:
| Will take more hours
| michaelmior wrote:
| Probably not with the same amount of training time, but I'd
| imagine a recent MBP GPU could handle GPT-2 training. The
| biggest challenge is that the training would need to be
| reimplemented for Metal instead of CUDA.
| jamestimmins wrote:
| Ah so I couldn't just run this on my laptop for ~48 hours?
| That's too bad.
| danielmarkbruce wrote:
| He does it on 8 H100s in 24 hours, i.e. 192 H100-hours. It's
| going to be thousands of laptop hours.
| mmoskal wrote:
| H100 SXM is 2000 TFLOPS at FP16. Multiply by 8.
|
| M3 Max is 28 TFLOPS at FP16.
|
| Based on FLOPS alone, it would be more like a year or two.
| Davidzheng wrote:
| Can you estimate how long it would take to replicate
| AlphaGo Zero today on one set of 8xH100?
| karpathy wrote:
| (H100 SXM is 1000 TFLOPS, *2 is from "with sparsity",
| which is not used here.)
| mmoskal wrote:
| Right... and there are probably some communication
| overheads over NVLink that would not be present on single
| laptop. So a few months maybe :)
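The back-of-envelope scaling above can be written out; a minimal sketch using only the figures quoted in this thread (with karpathy's dense-FP16 correction applied), ignoring memory bandwidth, utilization, and communication overhead:

```python
# Scale the 24h run on 8xH100 to an M3 Max laptop by raw dense FP16
# throughput alone. All TFLOPS figures are the ones quoted in the thread.
H100_TFLOPS = 1000    # H100 SXM dense FP16 (2000 is "with sparsity")
M3_MAX_TFLOPS = 28    # Apple M3 Max FP16
NUM_GPUS = 8
RUN_HOURS = 24

total_compute = H100_TFLOPS * NUM_GPUS * RUN_HOURS  # in TFLOP-hours
laptop_hours = total_compute / M3_MAX_TFLOPS
laptop_years = laptop_hours / (24 * 365)
print(f"~{laptop_hours:,.0f} laptop-hours (~{laptop_years:.1f} years)")
```

That comes out to roughly 6,900 laptop-hours, i.e. under a year by raw FLOPS alone; real-world utilization gaps on the laptop side would stretch it further.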
| rty32 wrote:
| Slightly off topic -- I just saw people saying that the Mac's
| unified memory makes it a strong platform for training models:
| https://www.macrumors.com/2024/07/10/apple-leads-global-
| pc-g..., and how energy efficient they are, etc. But what I am
| seeing is that people often don't touch Macs at all --
| they write code with CUDA and that's it. I find this kind of
| conversation fascinating.
| tomalaci wrote:
| With how much NVidia is developing AI-workload-accelerating
| hardware, I expect this will cost maybe a few dozen dollars and
| train in a few hours within the next few years.
|
| What I think will be interesting is when commodity hardware can
| run cheap inference from very capable, specialized models. Pretty
| sure it will spawn a new golden age of AI-powered desktop
| applications.
|
| For example, video game space has already been trying to create
| AI-powered NPCs, world generation and story-telling (e.g. Inworld
| AI).
| talldayo wrote:
| > I expect this will cost maybe a few dozen dollars and train
| in a few hours within the next few years.
|
| I wouldn't count on it. Nvidia's been cleaning up shop, but
| their best option for expanding right now is through
| parallelization (bigger clusters, basically). Now that
| Blackwell is on TSMC, Nvidia is alongside Apple waiting for new
| and denser nodes to upgrade to. A real "generational leap" in
| training cost is going to require some form of efficiency gain
| that we're not seeing right now. It's _possible_ that Nvidia
| has something up their sleeve, but I'm not holding my breath.
|
| > What I think will be interesting is when commodity hardware
| can run cheap inference from very capable, specialized models.
|
| What's funny is, you basically already can. The problem is
| becoming integration, and in the case of video games, giving
| the AI a meaningful role to fill. With today's finest
| technology, you can enjoy an AI-generated roguelike that is
| nigh-incomprehensible:
| https://store.steampowered.com/app/1889620/AI_Roguelite/
|
| As time goes on, I really think developers just aren't going
| to use AI for video games. Maybe I'm missing the "minecraft
| moment" for procedurally-generated stories here, but the sort
| of constraints needed to tell a story or create an interactive
| experience don't exist within LLMs. It's a stochastic nightmare
| of potential softlocks, contradictions or outright offensive
| requests. The majority of places I've seen AI applied today
| isn't for content creation, but instead automated moderation.
| up2isomorphism wrote:
| > For example, video game space has already been trying to
| create AI-powered NPCs, world generation and story-telling
| (e.g. Inworld AI).
|
| To me this is a downside compared to the NPC generated by
| humans, since that's the only reason I would like to read them.
| MisterBastahrd wrote:
| You don't even necessarily need to have them coming up with
| valid speech. Simply giving out random quests and rewards
| would keep people running on a loot treadmill for most open
| world multiplayer games.
| LeanderK wrote:
| I think at first even background NPCs that don't give
| quests and rewards would be nice. Sort of give everyone a
| character and let them just babble. It breaks the immersion
| to hear the same phrases repeatedly. You can still
| handcraft every important quest and interaction to achieve
| high fidelity, but I would like random NPCs you bump into
| to not just repeat things all the time.
| swatcoder wrote:
| You're more likely to see studios (who want lots of
| varied content) use generative AI in the studio, where
| they might generate and review it before release.
|
| Letting generators run free on the client sets up
| different kinds of immersion-breaking, where NPC's
| hallucinate misleading details about the story/world, can
| be tricked into reciting off-topic absurdities or age-
| rating violations, etc. AAA studios can't afford the
| embarrassment of that and smaller designers with pride of
| craft won't see their signature come through the art in
| it. Surely, some designers will figure out ways to make
| it work great for some specific idea, but it's not the
| best way to use the technology in most cases.
| swatcoder wrote:
| To make this work, you need your LLM-based AI to outperform
| any other form of generating quests and rewards -- and that
| performance is measured on things like player enjoyment,
| game/story progression, exploitability, client system
| requirements or server operating costs, etc and most of
| those things are very hard to constrain or optimize for
| with an LLM right now.
|
| While the costs are hidden from end users and are going
| down quickly, good LLMs remain very expensive to run and
| very hard to keep on track compared to other options.
| techjamie wrote:
| You could maybe pull this off in a game like Borderlands
| where the loot is basically just the same dozen guns but
| with different numbers and effects. But as is, the LLM text
| isn't going to be much different than a sufficiently large
| AdLib system.
|
| I think there is value to be gained in having LLMs as part
| of the development process, maybe even the game itself, but
| I think conventional methods are about as sufficient for
| quests.
| bick_nyers wrote:
| What if the LLM that a specific NPC utilizes was
| handcrafted/fine-tuned by a human?
| chongli wrote:
| Yeah. I have played a bunch of the roguelike Caves of Qud [1]
| and it has both hand-written text and procedurally generated
| text. The former is quite interesting and relevant to both
| gameplay and plot. The latter is mostly uninteresting and
| irrelevant, though it does work as "filler." This is similar
| to how procedurally-generated grass can give a more natural
| look to a hill than you'd get with tiles (which are
| incredibly easy to spot unless a ton of work is put into
| hiding the seams and repeating patterns).
|
| I still long for the day when we can have procedurally-
| generated stories and quests that are actually interesting to
| play through. I have no idea how that is going to work
| though!
|
| [1] https://www.cavesofqud.com
| forrestthewoods wrote:
| > For example, video game space has already been trying to
| create AI-powered NPCs, world generation and story-telling
| (e.g. Inworld AI).
|
| Current AI isn't even close to good enough for video game NPCs
| and related. We're several breakthroughs away from that being
| possible at any cost. Those breakthroughs might happen in 3
| years, or they might not happen in 10. Hard to predict.
| Tiberium wrote:
| Are you sure? Models like Claude 3.5 Sonnet are both good at
| writing and instructions, as long as you set some guardrails
| for the model, they can be great NPCs.
| forrestthewoods wrote:
| > Are you sure?
|
| Absolutely.
|
| Current LLMs have insufficient world state. Imagine a game
| like Stardew Valley. It's got a town with 30 NPCs or some
| such. They all have personalities and the player builds a
| relationship with them over time. Current LLMs can't do
| that. They hallucinate waaaaaaay too much. You can't
| reliably define and evolve relationships. Amongst many
| other shortcomings.
|
| I'm super pro AI and use ChatGPT all the time for
| programming. So I'm not being an AI hater. But I am a
| gamedev and I can say that what exists today simply isn't
| good enough.
| Tiberium wrote:
| But you're saying that you want a single model to handle
| all NPCs and the whole world. Of course this isn't
| possible currently. But using a separate model with
| separate context for each character is. Also, if you use
| ChatGPT for programming, try Claude 3.5 Sonnet - it's
| really better than GPT-4o for programming.
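The separate-context-per-character idea can be sketched in a few lines. Everything here is illustrative: `generate` is a hypothetical stand-in for whatever completion API a game would actually call, and the NPC names and personas are made up.

```python
# Sketch of "separate context per NPC": each character keeps its own
# persona prompt and chat history, so state never bleeds between NPCs.
from dataclasses import dataclass, field

def generate(prompt: str) -> str:
    # Placeholder: a real game would call a local or hosted model here.
    return "(model reply)"

@dataclass
class NPC:
    name: str
    persona: str                            # handcrafted guardrails/backstory
    history: list = field(default_factory=list)

    def talk(self, player_line: str) -> str:
        self.history.append(f"Player: {player_line}")
        prompt = f"{self.persona}\n" + "\n".join(self.history) + f"\n{self.name}:"
        reply = generate(prompt)
        self.history.append(f"{self.name}: {reply}")
        return reply

blacksmith = NPC("Brom", "You are Brom, a gruff blacksmith. Never discuss the main quest.")
innkeeper = NPC("Mira", "You are Mira, a cheerful innkeeper who trades in rumors.")
blacksmith.talk("Seen anything strange lately?")
# Mira's history stays empty: the two contexts are fully isolated.
```

The per-NPC history is also where the hallucination problem shows up: nothing in this structure stops the model from inventing world state, which is the gap the parent comment is pointing at.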
| forrestthewoods wrote:
| > But you're saying that you want a single model to
| handle all NPCs and the whole world.
|
| No, I did not say that at all. I didn't specify how the
| LLMs may or may not be structured. I'm saying that current
| LLMs - and yes, I've used Claude 3.5 Sonnet - are
| insufficient. There is no existence proof that they're
| sufficient.
|
| LLMs are great. They aren't great enough for video game
| NPCs. Not yet. Further innovation is needed. You're free
| to disagree. I can't prove a negative. But there is no
| working example.
| swatcoder wrote:
| > For example, video game space has already been trying to
| create AI-powered NPCs, world generation and story-telling
| (e.g. Inworld AI).
|
| This'll be a niche for a long, long time.
|
| Games are _generally_ carefully crafted to deliver a specific
| mechanical and/or narrative experience. A world populated by
| LLM/etc bots or content is one choice of what that experience
| might be, but it's not going to be a very satisfying one for
| many game designers -- especially given the current/near state
| of the technology. There will be games and experiments that
| explore it, for sure, but the vast majority of games just don't
| have any need for it.
| ImHereToVote wrote:
| I don't have needs. I have wants. You see, I don't need. I
| want.
| headcanon wrote:
| I do think there is a big opportunity for widely supported
| hardware-accelerated matrix algebra in games. Currently most
| of that is geared towards graphics (naturally) but being able
| to easily encode arbitrary models and have them run on-device
| would open up a lot of opportunities for games (like deep
| simulation) that weren't possible before. Its currently
| possible of course but requires custom tooling and
| (relatively) niche hardware like a high-end graphics card.
|
| I see the development energy around LLMs as a way to open up
| support for that.
| jetrink wrote:
| Those don't have to be mutually exclusive though. To take AI
| out of it, think of those murder mystery parties where actors
| interact with attendees. Actors have roles to play and things
| they must do to move the story forward, but they improvise
| their dialog when talking with the players and sometimes each
| other. Or if you've ever played D&D, you have experienced
| talking with NPCs that are controlled by your DM. I think
| video game AI could be a lot like that, where NPCs use
| natural language instead of rigid dialog trees, but
| otherwise, they behave a lot like they do today.
| thatguymike wrote:
| Ah, like this: https://www.youtube.com/watch?v=Kw51fkRiKZU
| swatcoder wrote:
| Yes, that's the intuition for where the technology might
| go. Someday.
|
| But actors and DMs are _much_ more disciplined than LLMs,
| partly because they have careers and friendships on the
| line for misbehavior. For all the amazing things they can do
| in good weather, LLMs are not really reliable when you
| want them to consistently deliver something very specific,
| very secure, or very artfully crafted. They may get there,
| but their design makes it a very hard problem that we're
| still a long way from seeing commercialized.
| 123yawaworht456 wrote:
| for narrative/dialogue, yes, text generation is currently
| useless. censorship, slop, extreme positivity bias. even
| jailbroken Opus is shit.
|
| but the audio generation we _already_ have is pretty much good
| enough, and this is big. it's not AAA tier yet, sure, but
| still light-years better than half-assing it with mediocre voice
| actors. it is now an option to only use real voice actors for
| a few key characters, and even that won't be necessary within
| the next decade. even indie video games will be fully-voiced
| soon.
| glial wrote:
| Similar for textures - it would be really neat if textures
| were auto-generated to add detail when you get close to
| something. As it is, the sprites just look bad.
| robbomacrae wrote:
| Whilst I agree with the reservations in the other replies, I
| think you were implying the future, and I'm sure LLMs will be
| more trustworthy and up to the task at some point.
|
| What I would really like to see now is all the new TTS models
| being used more widely. There are still so many games that
| have text-only output. My kids love Alba: A Wildlife Adventure,
| but the eldest still isn't quite ready to read all the text, so
| I have to sit with them reading out all the lines.
|
| If anyone has a way of applying universal mods / accessibility
| features to existing games I'd love to see someone solve this
| and happy to help with the TTS!
| charlescurt123 wrote:
| I imagine we could do this now but not the way you think.
|
| have a human created story and text as a guideline.
|
| With that have genAI make the text per stage, you would get
| different statements every time and would stay on track.
|
| Would be interesting to play a game where all players say the
| same information in slightly different ways every single
| playthrough.
| HPsquared wrote:
| Similar scaling to genome sequencing. First genome was a huge
| undertaking, now routine after a few Moore-esque cycles.
| alecco wrote:
| It will be interesting to see this with today's FlashAttention 3
| for H100.
| rurban wrote:
| Would be free for us because we have those H100s, but currently
| it's way too hot. They reach 70 °C, even watercooled.
___________________________________________________________________
(page generated 2024-07-11 23:02 UTC)