[HN Gopher] Train Your Own O1 Preview Model Within $450
___________________________________________________________________
Train Your Own O1 Preview Model Within $450
Author : 9woc
Score : 364 points
Date : 2025-02-21 08:42 UTC (14 hours ago)
(HTM) web link (sky.cs.berkeley.edu)
(TXT) w3m dump (sky.cs.berkeley.edu)
| Tiberium wrote:
| Better URL: https://novasky-ai.github.io/posts/sky-t1/
| 9woc wrote:
| True. The previous discussion on this is here:
| https://news.ycombinator.com/item?id=42681417
| danielhanchen wrote:
| If anyone's interested, I made Colab notebooks with free GPUs for
| both GRPO (the algo DeepSeek used) to train a reasoning model
| from scratch, and also general finetuning, which the Berkeley
| team employed!
|
| GRPO notebook for Llama 3.1 8B:
| https://colab.research.google.com/github/unslothai/notebooks...
|
| General finetuning notebook:
| https://colab.research.google.com/github/unslothai/notebooks...
|
| The Berkeley team's 17K dataset:
| https://huggingface.co/datasets/NovaSky-AI/Sky-T1_data_17k
| Hugging Face also released a 220K dataset:
| https://huggingface.co/datasets/open-r1/OpenR1-Math-220k
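| In case it helps build intuition, here's a minimal sketch of GRPO's
| core trick (a simplification on my part; real implementations also do
| a clipped policy-gradient update with a KL penalty against a
| reference model): sample a group of completions per prompt, score
| them, and use the group-relative normalized reward as each
| completion's advantage.

```python
# Minimal sketch of GRPO's group-relative advantage computation.
from statistics import mean, pstdev

def grpo_advantages(rewards):
    """Advantage of each completion relative to its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # uniform groups get zero advantage
    return [(r - mu) / sigma for r in rewards]

# Example: 4 completions for one prompt, scored 1.0 if a verifier
# accepts the final answer, 0.0 otherwise.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

| No model needed to see the idea: correct answers in a mostly-wrong
| group get a large positive advantage, and vice versa.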
| fl4tul4 wrote:
| I do love competition.
|
| In the last few weeks we are seeing a torrent of advances, just
| because someone opened up their architectures.
|
| Imagine where we could go if the training datasets were also
| publicly available and unbounded by any copyright laws. (I'm not
| talking about doing anything illegal).
|
| I can only dream, I guess.
| paper2d wrote:
| Those training datasets can never be free, as almost all of them
| are copyrighted.
| lionkor wrote:
| almost all free things are copyrighted
| chii wrote:
| perhaps copyright needs to be updated. And in any case, my
| personal belief is that training on data that is publicly
| released, as well as on purchased media, is fair use.
| tonyedgecombe wrote:
| The UK government is doing that at the behest of the AI
| companies, which tends to indicate they have been misbehaving
| up to now.
| philipwhiuk wrote:
| If anything it needs to be updated to actually prevent the
| rampant profit extraction from human creation in order to
| protect actual creators.
| FergusArgyll wrote:
| Not OP, but that should be _part_ of the update, I think.
|
| I think we can all agree there does need to be an update.
| You don't want to forever outlaw deep learning (even if
| you do want to, that's not going to happen so it's worth
| helping to shape the future)
|
| It's very complicated with a bunch of moving parts but I
| really want society to start arguing about it so we can
| get to a semi-fair place
| CamperBob2 wrote:
| Yeah, that's a good idea. Stop the most important advance
| in storing, retrieving, and disseminating knowledge since
| the printing press because _muh copyright!!1!!_
|
| Never mind that you've just handed control of an
| incredibly-powerful tool over to nations that DGAF about
| copyright law.
|
| If copyright interests want to fight AI, then copyright
| has to go. It's that simple. It's an unnecessary fight,
| but somebody needs to convince them of that.
| woah wrote:
| Each time someone clicks "send" on chatGPT, Warner Bros
| gets 1c
|
| $25 to Elsevier per GPU purchase
| spookie wrote:
| I'll be honest, even if this comment won't fly: It is
| impossible to change the views here, on this point.
| Specifically, here.
|
| I do share your opinion. Others may argue "What about x
| country? They don't care!", even though that position is
| about as good as making anything excusable because
| someone else did it.
|
| I might add, I'm really not trying to be toxic. Just
| saying this based on what I see when this comes up.
| eikenberry wrote:
| I don't think you will ever see any law that benefits the
| creators. Better to eliminate it and at least give artists the
| freedom to work with any media they want. Artists will generally
| still be poor, but they'll be more creative.
| azinman2 wrote:
| Why should it be? I'd personally be pissed if my book, which
| came from my own hard work and is sold per person, all of a
| sudden got subsumed by a general AI. Even worse if it is
| commercialized and I get nothing for it.
| taosx wrote:
| Share the non-copyrighted ones and it's still a win if you
| make it possible for people to contribute through PRs,
| testing, and discussion.
| landryraccoon wrote:
| Japan has said AI can train on copyrighted materials.
|
| https://www.privacyworld.blog/2024/03/japans-new-draft-guide...
|
| I imagine if copyright is a big issue for AI, Japanese
| startups will have an advantage.
| 0xdeadbeefbabe wrote:
| Does China need to say anything or can you guess their
| policy?
| noduerme wrote:
| Isn't the general attitude these days to just break laws and
| bribe officials once you own the hottest startup? /s
|
| edit: re. the /s I was living offshore and running the most
| popular bitcoin casino at the time, spending a vast amount of
| money and energy to block any player who might be American. As
| a result I didn't make that much money. And I tried to
| calculate how much I would need to make if I wanted to break
| the law and hide out forever. I figured I could make $10-15M a
| year but that wouldn't be enough to hide. I fucked up, I guess.
| Because the richest man in the world made most of his first
| round of money facilitating gambling transactions, and he's now
| got his snout in every federal agency. I should have had the
| balls, I guess, to ask forgiveness rather than permission.
| coliveira wrote:
| This was always like this. Youtube started publishing mostly
| copyrighted content, then Google settled with copyright
| owners. Google by the way has perfected the "art" of training
| their algos with content without approval from copyright
| owners.
| Lucasoato wrote:
| A torrent of advances is the right way to word it, especially
| after it has been discovered what Meta trained their models on
| :)
| Kye wrote:
| It seems like the torrent was already happening and DeepSeek's
| part is just one example of that. They did help bring
| _attention_ to those advancements, and that's led to lots more
| people contributing and finding more niche applications.
| brador wrote:
| I just want to make music with AI and it is very difficult. The
| Meta model on Hugging Face gives an error when used through the
| website, and no one will ever fix it.
| polishdude20 wrote:
| Suno?
| fragmede wrote:
| Yeah. If you want to play ai researcher, by all means go play
| around with hugging face and build a local AI GPU rig. if you
| want to make some music, just use Suno.
| ionwake wrote:
| I find I can only give them one sentence to describe the
| music I want which is not good enough - has this changed at
| all?
| xyproto wrote:
| You can describe or upload the first N seconds, then extend
| from that by using another description, then extend from N
| further seconds etc. But Suno music within a genre has a
| pretty limited range.
| petercooper wrote:
| It's still only 240 characters or whatever, but it pays to
| be dense. So rather than "Write a song that sounds like
| polka etc etc" just keyword pack it.
| Kye wrote:
| It depends on how much you want it to do for you. I've used
| ChatGPT to come up with song briefs which I then turn into
| music myself.
| magicalhippo wrote:
| So this is a fine-tune and not from scratch, which makes the
| proposition much more reasonable.
|
| That said, for someone who's not in the game but has been
| curious about the details of fine-tuning, it's great to get
| both the dataset and the code.
| rlforllms wrote:
| Wait so Qwen trained QWQ 32B from Qwen 32B and then they distill
| QWQ back into Qwen 32B? What's the point?
|
| This is a massive marketing scam. Borderline academic
| dishonesty.
| barrenko wrote:
| Not sure if it's a scam; honestly, it depends on the data.
| Sometimes it might work.
| rlforllms wrote:
| The goal of distillation is to distill into smaller models
| like 7B or 1.5B.
|
| They didn't even change the model size, let alone try a
| different class of models.
|
| Getting expert model's trajectories is trivial if you have
| vLLM to do batched inference.
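|
| For what it's worth, a hedged sketch of that trajectory collection
| (the model name and prompt template are placeholders, not what the
| Berkeley team actually used):

```python
def format_prompt(question: str) -> str:
    # Placeholder template; the real one depends on the teacher's
    # chat format.
    return f"Question: {question}\nThink step by step, then answer.\n"

def collect_traces(questions, model="Qwen/QwQ-32B-Preview"):
    """Batched generation of reasoning traces from a teacher model."""
    from vllm import LLM, SamplingParams  # lazy import; needs a GPU box

    llm = LLM(model=model)
    params = SamplingParams(temperature=0.7, max_tokens=4096)
    outputs = llm.generate([format_prompt(q) for q in questions], params)
    return [out.outputs[0].text for out in outputs]
```

| vLLM batches and schedules the prompts for you, which is what makes
| dumping tens of thousands of traces cheap.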
| jojaja wrote:
| So you are better off just using QwQ
| andy_xor_andrew wrote:
| I wouldn't go that far, but I agree, my reaction to reading the
| details was to go "huh?"
|
| From the title, my best guess was they applied some kind of
| RL/GRPO to an existing model.
|
| But... they took an existing model that had already undergone
| SFT for reasoning... and then used it to generate data to SFT
| the exact same model... nothing wrong with that, but it doesn't
| seem to warrant the title they chose.
| _joel wrote:
| It's not from scratch, though, right? Am I missing something here
| as to why it's at the top of the posts?
| twobitshifter wrote:
| There's no real reason to start from true scratch anymore. You
| don't harvest wheat, mill flour, milk a cow, and churn butter
| for your cake.
| mkagenius wrote:
| Weird that they had to resort to click bait using "O1 preview" in
| their name.
|
| I expected some sort of way to actually get o1 preview retrained
| (and downloadable).
|
| Also, calling it O1 preview based on just 7 benchmarks is not
| correct. What if someone comes up with some use cases where O1
| preview does better than this?
|
| Apart from that, it's good that things are becoming cheaper.
| codelion wrote:
| Yeah, I agree. The "O1 preview" naming feels a bit misleading.
| It sets an expectation of broader coverage than just those
| specific benchmarks. It's cool to see cost reductions, but the
| marketing could be more transparent about the scope.
| jug wrote:
| It's dishonest because they not only point towards a specific
| language model, but to the beta version of a specific model. WTH?
| echelon wrote:
| It's not dishonest, it's simple human behavior.
|
| The vocabulary used to describe the culturally prevailing
| leader will be used to explain similar concepts and create
| analogies. That's an easier tool to communicate to the masses
| than crafting super tailored messages for only domain
| experts.
|
| It's why we keep doing this, and it's also why trademarks
| become generics.
|
| "Google it", "Uber for X", "band aid", "the band sounds like
| Y", "the actor looks like Z", etc. etc.
|
| This is a core part of how human language works and how we as
| a species communicate with one another.
| michaelt wrote:
| "Build your own Lamborghini Huracan at home for $450"
|
| "Wow! Quite a feat to deliver an iconic design, a 631
| horsepower engine, and performance of 0-150 mph in 15.4
| seconds on such a small budget!"
|
| "Actually what we mean is, like the Lamborghini Huracan,
| our vehicle has two seats."
| vineyardmike wrote:
| $450 for a Lamborghini clone is a lot more impressive
| when it _compares favorably on (some) benchmarks_.
|
| Also, at $450 no one expects it to truly be a from-scratch
| complete recreation of a model that cost hundreds of
| millions to produce.
|
| Instead, they built a model (via fine-tuning) using similar
| techniques and got similar results within the area of
| experimentation they created their training data for.
|
| I personally was not misled by the title at all.
| echelon wrote:
| Nothing OpenAI has produced is a Lamborghini Huracan
| level above other generic AI models, though.
|
| There are open source models better than OpenAI's image
| and video models, and OpenAI is not winning the LLM space
| by any measure.
|
| The hobbyist absolutely won't feel as though they're
| trying to fake a Huracan with a Camry here. They're going
| to build useful products with whatever they choose,
| regardless of what vendor or open source project produced
| the model.
|
| Your analogy is silly. OpenAI is more like Band-Aid(r)
| than Lamborghini Huracan.
| yieldcrv wrote:
| ChatGPT is the market leader; nobody except enthusiasts is
| distinguishing between their models, or any models. And the
| enthusiasts know the difference.
|
| Verdict: dishonest
| scosman wrote:
| Inference time compute is still very under utilized in actual AI
| deployments. Lots of folks are working on foundation models,
| which require reasoning about broad problem domains. Not enough
| people are using the same techniques for task-specific
| performance improvements. You can easily distill the reasoning
| from larger models like R1 for your task. Often better, you can
| mix in custom thinking instructions for specific sub-problems so
| a fine tuned model learns a mix of task specific reasoning and
| custom logic. It's not hard and easily beats prompt iteration.
| When you find bugs, you can fix them.
|
| I made a GitHub project for distilling thinking models (and
| custom CoT inference-time fine-tuning):
| https://docs.getkiln.ai/docs/guide-train-a-reasoning-model
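|
| The distilled traces end up as ordinary SFT chat examples, roughly
| like this (the <think> tag convention is an assumption; different
| trainers use different markers):

```python
def make_sft_record(question: str, reasoning: str, answer: str) -> dict:
    # One chat-style training example that teaches the student to
    # reason before answering, mimicking the teacher's trace.
    return {
        "messages": [
            {"role": "user", "content": question},
            {
                "role": "assistant",
                "content": f"<think>\n{reasoning}\n</think>\n{answer}",
            },
        ]
    }

record = make_sft_record("What is 6 * 7?", "6 * 7 = 42.", "42")
```

| Mixing your own task-specific instructions into the `reasoning`
| field is where the custom-logic part comes in.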
| anon373839 wrote:
| Thanks for linking to this. That's a good resource!
|
| Do you have any pointers on assembling fine-tuning data not for
| isolated tasks, but for a flexible range of queries in a
| particular problem domain? Similar to general purpose
| instruction-tuning, but much more focused.
|
| For example, suppose you're building an app that helps doctors
| search through research literature to aid in diagnosis, check
| hypotheses, etc. Of course you would want to have some domain
| experts and real users available to see what kind of queries
| they would create. But getting from that point to a well-
| balanced dataset that adequately represents the distribution of
| possible queries, instructions, writing/cognitive styles,
| formatting, dialog flows, etc. your app will encounter --- it
| just seems kind of hard to know how to approach a task like
| that. It seems there are infinitely many dimensions you could
| accidentally overfit on.
| pizza wrote:
| General advice? Collect data, train a model, note the
| mistakes in the model, mistakes in the data, and think
| critically about what it is that you're ending up teaching.
| Repeat many, many, many times.. For some tasks, don't be
| surprised if it ends up taking months or a year or several.
| It took me 6 months of building a dataset, by hand, by
| myself, to produce ~1600 'gold standard' text examples
| (bolstered by ~100K synthetic examples) - texts plus 20
| dimensions rated 1-4. But I managed to beat SOTA models in
| this task from all the frontier labs by doing so. It also
| makes sense to consider all of the various "lacks" of the
| competing models.
|
| It's quite difficult to see all the future decisions you will
| make due to future insights about future versions of the
| whole loop. But you _will_ be needing to make some.
|
| I will say one more concrete thing though: the more metadata
| you collect, generally, the better, but this can make it more
| expensive.
|
| Also, if you ever need to update your schema.. well this is
| actually one reason why text data for LLMs is nice: your
| schema is essentially fluid in the first place, so you could
| eg stick metadata in the text itself if at some future point
| you start collecting it.
|
| I guess, also, it's a good thing to constantly add new
| benchmarks, if possible. Treat your model's capabilities as
| _knowable_, but never treat your model's capabilities as
| actually _known_.
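|
| To make the schema point concrete, one way to keep it fluid is to
| carry metadata as a header line inside the text itself (a sketch;
| the header format here is made up):

```python
import json

def embed_metadata(text: str, meta: dict) -> str:
    # Prepend metadata as a one-line JSON header so the schema lives
    # in the text itself and new fields can be added later without a
    # dataset migration.
    return json.dumps(meta, sort_keys=True) + "\n" + text

def split_metadata(record: str):
    # Recover (metadata, body) from a record written as above.
    header, _, body = record.partition("\n")
    return json.loads(header), body
```

| The round-trip is lossless, so you can add a new rating dimension
| next month without touching old records.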
| genpfault wrote:
| > The model training finishes in 19 hours on 8 H100 with
| DeepSpeed Zero-3 offload (~ $450 according to Lambda Cloud
| pricing).
| moconnor wrote:
| They trained on QwQ traces and in their evaluation they are...
| mostly slightly worse than QwQ.
|
| Hardly a huge win.
| JoshTko wrote:
| Has anyone tested whether the consensus of the top 4-5 mini
| models together would outperform the best frontier model?
| rdli wrote:
| The blog post was a little unclear, so my summary was:
|
| - They used QwQ to generate training data (with some cleanup
| using GPT-4o-mini)
|
| - The training data was then used to FT Qwen2.5-32B-Instruct
| (non-reasoning model)
|
| - Result was that Sky-T1 performs slightly worse than QwQ but
| much better than Qwen2.5 on reasoning tasks
|
| There are a few dismissive comments here but I actually think
| this is pretty interesting as it shows how you can FT a
| foundation model to do better at reasoning.
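|
| The cleanup step is, as far as I can tell, essentially rejection
| sampling: keep only teacher traces whose final answer matches the
| reference (the dict schema below is made up for illustration):

```python
def rejection_filter(traces, is_correct):
    """Keep teacher traces whose final answer the checker accepts."""
    return [t for t in traces if is_correct(t["answer"], t["reference"])]

kept = rejection_filter(
    [
        {"answer": "42", "reference": "42"},
        {"answer": "41", "reference": "42"},
    ],
    lambda a, b: a.strip() == b.strip(),
)
```

| For math tasks the checker can be exact-match; for open-ended tasks
| you'd swap in a grader model.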
| azinman2 wrote:
| I wish they would have compared to the r1 distills of qwen2.5
| m3kw9 wrote:
| Looks like they need to put quotes around the "$450".
| tw1984 wrote:
| Just several weeks ago, OpenAI was still using reasoning as
| part of its tech moat to partially justify its hugely inflated
| valuation. In just weeks after the release of DeepSeek and Kimi
| and their papers on how to do it, average joes can now train it
| at home for less than the purchase cost of a single mid-range
| gaming GPU.
___________________________________________________________________
(page generated 2025-02-21 23:00 UTC)