[HN Gopher] Train Your Own O1 Preview Model Within $450
       ___________________________________________________________________
        
       Train Your Own O1 Preview Model Within $450
        
       Author : 9woc
       Score  : 364 points
       Date   : 2025-02-21 08:42 UTC (14 hours ago)
        
 (HTM) web link (sky.cs.berkeley.edu)
 (TXT) w3m dump (sky.cs.berkeley.edu)
        
       | Tiberium wrote:
       | Better URL: https://novasky-ai.github.io/posts/sky-t1/
        
         | 9woc wrote:
         | True. The previous discussion on this is here:
         | https://news.ycombinator.com/item?id=42681417
        
       | danielhanchen wrote:
       | If anyone's interested, I made Colab notebooks with free GPUs for
       | both GRPO (the algo DeepSeek used) to train a reasoning model
       | from scratch, and also general finetuning, which the Berkeley
       | team employed!
       | 
       | GRPO notebook for Llama 3.1 8B:
       | https://colab.research.google.com/github/unslothai/notebooks...
       | 
       | General finetuning notebook:
       | https://colab.research.google.com/github/unslothai/notebooks...
       | 
       | The Berkeley team's 17K dataset:
       | https://huggingface.co/datasets/NovaSky-AI/Sky-T1_data_17k
       | Hugging Face also released a 220K dataset:
       | https://huggingface.co/datasets/open-r1/OpenR1-Math-220k
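The core idea behind GRPO (the algorithm mentioned above) can be sketched in a few lines. This is a minimal illustration of the group-relative advantage computation, not the notebooks' actual code: sample several completions per prompt, score each with a reward function, and normalize rewards within the group so no learned value model is needed.

```python
# Sketch of GRPO's group-relative advantage (illustrative only).
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group's mean and std."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one prompt, rewarded 1.0 if correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct answers get a positive advantage, incorrect ones negative,
# which is what pushes the policy toward correct reasoning traces.
```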
        
       | fl4tul4 wrote:
       | I do love competition.
       | 
        | In the last few weeks we are seeing a torrent of advances, just
        | because someone opened their architectures.
       | 
       | Imagine where we could go if the training datasets were also
       | publicly available and unbounded by any copyright laws. (I'm not
       | talking about doing anything illegal).
       | 
       | I can only dream, I guess.
        
         | paper2d wrote:
          | Those training datasets can never be free, as almost all of
          | them are copyrighted.
        
           | lionkor wrote:
           | almost all free things are copyrighted
        
           | chii wrote:
            | perhaps copyright needs to be updated. And in any case, my
            | personal belief is that training on publicly released data,
            | as well as on purchased media, is fair use.
        
             | tonyedgecombe wrote:
              | The UK government is doing that at the behest of the AI
              | companies, which tends to indicate they have been
              | misbehaving up to now.
        
             | philipwhiuk wrote:
             | If anything it needs to be updated to actually prevent the
             | rampant profit extraction from human creation in order to
             | protect actual creators.
        
               | FergusArgyll wrote:
               | Not OP, but that should be _part_ of the update, I think.
               | 
               | I think we can all agree there does need to be an update.
               | You don't want to forever outlaw deep learning (even if
               | you do want to, that's not going to happen so it's worth
               | helping to shape the future)
               | 
               | It's very complicated with a bunch of moving parts but I
               | really want society to start arguing about it so we can
               | get to a semi-fair place
        
               | CamperBob2 wrote:
               | Yeah, that's a good idea. Stop the most important advance
               | in storing, retrieving, and disseminating knowledge since
               | the printing press because _muh copyright!!1!!_
               | 
               | Never mind that you've just handed control of an
               | incredibly-powerful tool over to nations that DGAF about
               | copyright law.
               | 
               | If copyright interests want to fight AI, then copyright
               | has to go. It's that simple. It's an unnecessary fight,
               | but somebody needs to convince them of that.
        
               | woah wrote:
               | Each time someone clicks "send" on chatGPT, Warner Bros
               | gets 1c
               | 
               | $25 to Elsevier per GPU purchase
        
               | spookie wrote:
               | I'll be honest, even if this comment won't fly: It is
               | impossible to change the views here, on this point.
               | Specifically, here.
               | 
               | I do share your opinion. Others may argue "What about x
               | country? They don't care!", even though that position is
               | about as good as making anything excusable because
               | someone else did it.
               | 
               | I might add, I'm really not trying to be toxic. Just
               | saying this based on what I see when this comes up.
        
               | eikenberry wrote:
                | I don't think you will ever see any law that benefits
                | the creators. Better to eliminate it and at least give
                | artists the freedom to work with any media they want.
               | Artists will generally still be poor, but they'll be more
               | creative.
        
             | azinman2 wrote:
              | Why should it be? I'd personally be pissed if my book,
              | which came from my own hard work and is sold per person,
              | all of a sudden got subsumed by a general AI. Even worse
              | if it is commercialized and I get nothing for it.
        
           | taosx wrote:
            | Share the non-copyrighted ones; it's still a win if you
            | make it possible for people to contribute through PRs,
            | testing, and discussion.
        
           | landryraccoon wrote:
           | Japan has said AI can train on copyrighted materials.
           | 
           | https://www.privacyworld.blog/2024/03/japans-new-draft-
           | guide...
           | 
           | I imagine if copyright is a big issue for AI, Japanese
           | startups will have an advantage.
        
             | 0xdeadbeefbabe wrote:
             | Does China need to say anything or can you guess their
             | policy?
        
         | noduerme wrote:
         | Isn't the general attitude these days to just break laws and
         | bribe officials once you own the hottest startup? /s
         | 
         | edit: re. the /s I was living offshore and running the most
         | popular bitcoin casino at the time, spending a vast amount of
         | money and energy to block any player who might be American. As
         | a result I didn't make that much money. And I tried to
         | calculate how much I would need to make if I wanted to break
         | the law and hide out forever. I figured I could make $10-15M a
         | year but that wouldn't be enough to hide. I fucked up, I guess.
         | Because the richest man in the world made most of his first
         | round of money facilitating gambling transactions, and he's now
         | got his snout in every federal agency. I should have had the
         | balls, I guess, to ask forgiveness rather than permission.
        
           | coliveira wrote:
            | It has always been like this. YouTube started by publishing
            | mostly copyrighted content, then Google settled with the
            | copyright owners. Google, by the way, has perfected the
            | "art" of training their algos on content without approval
            | from copyright owners.
        
         | Lucasoato wrote:
         | A torrent of advances is the right way to word it, especially
         | after it has been discovered what Meta trained their models on
         | :)
        
         | Kye wrote:
         | It seems like the torrent was already happening and DeepSeek's
         | part is just one example of that. They did help bring
          | _attention_ to those advancements, and that's led to lots more
         | people contributing and finding more niche applications.
        
       | brador wrote:
       | I just want to make music with AI and it is very difficult. The
        | Meta model on Hugging Face gives an error when used through the
        | website, and no one will ever fix it.
        
         | polishdude20 wrote:
         | Suno?
        
           | fragmede wrote:
           | Yeah. If you want to play ai researcher, by all means go play
           | around with hugging face and build a local AI GPU rig. if you
           | want to make some music, just use Suno.
        
           | ionwake wrote:
           | I find I can only give them one sentence to describe the
           | music I want which is not good enough - has this changed at
           | all?
        
             | xyproto wrote:
             | You can describe or upload the first N seconds, then extend
             | from that by using another description, then extend from N
             | further seconds etc. But Suno music within a genre has a
             | pretty limited range.
        
             | petercooper wrote:
             | It's still only 240 characters or whatever, but it pays to
             | be dense. So rather than "Write a song that sounds like
             | polka etc etc" just keyword pack it.
        
         | Kye wrote:
         | It depends on how much you want it to do for you. I've used
         | ChatGPT to come up with song briefs which I then turn into
         | music myself.
        
       | magicalhippo wrote:
       | So this is a fine-tune and not from scratch, which makes the
       | proposition much more reasonable.
       | 
        | That said, for someone who's not in the game but has been
        | curious as to the details of fine-tuning, it's great to get
        | both the dataset and the code.
        
       | rlforllms wrote:
        | Wait, so Qwen trained QwQ 32B from Qwen 32B, and then they
        | distilled QwQ back into Qwen 32B? What's the point?
        | 
        | This is a massive marketing scam. Borderline academic
        | dishonesty.
        
         | barrenko wrote:
          | Not sure if it's a scam; it honestly depends on the data,
          | sometimes it might work.
        
           | rlforllms wrote:
            | The goal of distillation is to distill into smaller models
            | like 7B or 1.5B.
           | 
           | They didn't even change the model size, let alone try a
           | different class of models.
           | 
            | Getting an expert model's trajectories is trivial if you
            | have vLLM to do batched inference.
        
         | jojaja wrote:
         | So you are better off just using QwQ
        
         | andy_xor_andrew wrote:
         | I wouldn't go that far, but I agree, my reaction to reading the
         | details was to go "huh?"
         | 
         | From the title, my best guess was they applied some kind of
         | RL/GRPO to an existing model.
         | 
         | But... they took an existing model that had already undergone
         | SFT for reasoning... and then used it to generate data to SFT
         | the exact same model... nothing wrong with that, but it doesn't
         | seem to warrant the title they chose.
        
       | _joel wrote:
       | It's not from scratch, though, right? Am I missing something here
       | as to why it's at the top of the posts?
        
         | twobitshifter wrote:
         | There's no real reason to start from true scratch anymore. You
         | don't harvest wheat, mill flour, milk a cow, and churn butter
         | for your cake.
        
       | mkagenius wrote:
        | Weird that they had to resort to clickbait by using "O1
        | preview" in their name.
       | 
       | I expected some sort of way to actually get o1 preview retrained
       | (and downloadable).
       | 
        | Also, calling it "O1 preview" based on just 7 benchmarks is not
        | correct. What if someone comes up with use cases where O1
        | preview does better than this?
        | 
        | Apart from that, it's good that things are becoming cheaper.
        
         | codelion wrote:
         | Yeah, I agree. The "O1 preview" naming feels a bit misleading.
         | It sets an expectation of broader coverage than just those
         | specific benchmarks. It's cool to see cost reductions, but the
         | marketing could be more transparent about the scope.
        
         | jug wrote:
          | It's dishonest because they not only point towards a specific
          | language model, but to the beta version of a specific model.
          | WTH?
           | echelon wrote:
           | It's not dishonest, it's simple human behavior.
           | 
           | The vocabulary used to describe the culturally prevailing
           | leader will be used to explain similar concepts and create
           | analogies. That's an easier tool to communicate to the masses
           | than crafting super tailored messages for only domain
           | experts.
           | 
           | It's why we keep doing this, and it's also why trademarks
           | become generics.
           | 
           | "Google it", "Uber for X", "band aid", "the band sounds like
           | Y", "the actor looks like Z", etc. etc.
           | 
           | This is a core part of how human language works and how we as
           | a species communicate with one another.
        
             | michaelt wrote:
             | "Build your own Lamborghini Huracan at home for $450"
             | 
             | "Wow! Quite a feat to deliver an iconic design, a 631
             | horsepower engine, and performance of 0-150 mph in 15.4
             | seconds on such a small budget!"
             | 
             | "Actually what we mean is, like the Lamborghini Huracan,
             | our vehicle has two seats."
        
               | vineyardmike wrote:
               | $450 for a Lamborghini clone is a lot more impressive
               | when it _compares favorably on (some) benchmarks_.
               | 
                | Also, at $450 no one expects it to truly be a from-
                | scratch complete recreation of a model that cost
                | hundreds of millions to produce.
                | 
                | Instead, they built a model (via fine-tuning) using
                | similar techniques and got similar results within the
                | area of experimentation they created their training
                | data for.
                | 
                | I personally was not misled by the title at all.
        
               | echelon wrote:
               | Nothing OpenAI has produced is a Lamborghini Huracan
               | level above other generic AI models, though.
               | 
               | There are open source models better than OpenAI's image
               | and video models, and OpenAI is not winning the LLM space
               | by any measure.
               | 
               | The hobbyist absolutely won't feel as though they're
               | trying to fake a Huracan with a Camry here. They're going
               | to build useful products with whatever they choose,
               | regardless of what vendor or open source project produced
               | the model.
               | 
                | Your analogy is silly. OpenAI is more like Band-Aid®
                | than Lamborghini Huracan.
        
             | yieldcrv wrote:
              | ChatGPT is the market leader; nobody except enthusiasts
              | is distinguishing between their models, or any models.
              | And the enthusiasts know the difference.
             | 
             | Verdict: dishonest
        
       | scosman wrote:
        | Inference-time compute is still very underutilized in actual AI
        | deployments. Lots of folks are working on foundation models,
        | which require reasoning about broad problem domains. Not enough
        | people are using the same techniques for task-specific
        | performance improvements. You can easily distill the reasoning
        | from larger models like R1 for your task. Often better, you can
        | mix in custom thinking instructions for specific sub-problems so
        | a fine-tuned model learns a mix of task-specific reasoning and
        | custom logic. It's not hard and easily beats prompt iteration.
        | When you find bugs, you can fix them.
        | 
        | I made a GitHub project for distilling thinking models (and
        | custom CoT inference-time fine-tuning):
        | https://docs.getkiln.ai/docs/guide-train-a-reasoning-model
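The distillation step described above can be sketched as follows. This is a hedged illustration, not the linked project's code; the field names and the `<think>` tag convention are assumptions: sample traces from a reasoning model, keep only those whose final answer checks out, and format the survivors as chat-style SFT records.

```python
# Sketch: rejection-sample reasoning traces into SFT records
# (illustrative field names; not the Kiln project's actual API).
def to_sft_records(traces, check_answer):
    """traces: list of dicts with 'prompt', 'reasoning', 'answer'."""
    records = []
    for t in traces:
        if not check_answer(t["prompt"], t["answer"]):
            continue  # rejection sampling: drop incorrect trajectories
        records.append({
            "messages": [
                {"role": "user", "content": t["prompt"]},
                {"role": "assistant",
                 "content": f"<think>{t['reasoning']}</think>\n{t['answer']}"},
            ]
        })
    return records

traces = [
    {"prompt": "2+2?", "reasoning": "Add 2 and 2.", "answer": "4"},
    {"prompt": "2+2?", "reasoning": "Guess.", "answer": "5"},
]
sft = to_sft_records(traces, lambda p, a: a == "4")
# Only the correct trace survives as a training record.
```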
        
         | anon373839 wrote:
         | Thanks for linking to this. That's a good resource!
         | 
         | Do you have any pointers on assembling fine-tuning data not for
         | isolated tasks, but for a flexible range of queries in a
         | particular problem domain? Similar to general purpose
         | instruction-tuning, but much more focused.
         | 
         | For example, suppose you're building an app that helps doctors
         | search through research literature to aid in diagnosis, check
         | hypotheses, etc. Of course you would want to have some domain
         | experts and real users available to see what kind of queries
         | they would create. But getting from that point to a well-
         | balanced dataset that adequately represents the distribution of
         | possible queries, instructions, writing/cognitive styles,
         | formatting, dialog flows, etc. your app will encounter --- it
         | just seems kind of hard to know how to approach a task like
         | that. It seems there are infinitely many dimensions you could
         | accidentally overfit on.
        
           | pizza wrote:
           | General advice? Collect data, train a model, note the
           | mistakes in the model, mistakes in the data, and think
            | critically about what it is that you're ending up teaching.
            | Repeat many, many, many times. For some tasks, don't be
            | surprised if it ends up taking months or a year or several.
           | It took me 6 months of building a dataset, by hand, by
           | myself, to produce ~1600 'gold standard' text examples
           | (bolstered by ~100K synthetic examples) - texts plus 20
           | dimensions rated 1-4. But I managed to beat SOTA models in
           | this task from all the frontier labs by doing so. It also
           | makes sense to consider all of the various "lacks" of the
           | competing models.
           | 
           | It's quite difficult to see all the future decisions you will
           | make due to future insights about future versions of the
           | whole loop. But you _will_ be needing to make some.
           | 
           | I will say one more concrete thing though: the more metadata
           | you collect, generally, the better, but this can make it more
           | expensive.
           | 
           | Also, if you ever need to update your schema.. well this is
           | actually one reason why text data for LLMs is nice: your
           | schema is essentially fluid in the first place, so you could
           | eg stick metadata in the text itself if at some future point
           | you start collecting it.
           | 
           | I guess, also, it's a good thing to constantly add new
            | benchmarks, if possible. Treat your model's capabilities as
            | _knowable_, but never treat your model's capabilities as
            | actually _known_.
        
       | genpfault wrote:
       | > The model training finishes in 19 hours on 8 H100 with
       | DeepSpeed Zero-3 offload (~ $450 according to Lambda Cloud
       | pricing).
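The quoted figure is consistent with a back-of-the-envelope check, assuming roughly $3.00 per H100 GPU-hour (an assumed on-demand cloud rate, not taken from the post; prices vary):

```python
# Back-of-the-envelope check of the ~$450 training cost.
hours = 19
gpus = 8
usd_per_gpu_hour = 3.00  # assumption; on-demand rates vary by provider
total = hours * gpus * usd_per_gpu_hour
# 19 h * 8 GPUs * $3.00/GPU-h = $456, in line with the ~$450 figure.
```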
        
       | moconnor wrote:
       | They trained on QwQ traces and in their evaluation they are...
       | mostly slightly worse than QwQ.
       | 
       | Hardly a huge win.
        
       | JoshTko wrote:
        | Has anyone tested whether a consensus of the top 4-5 mini
        | models together would outperform the best frontier model?
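One simple notion of "consensus" among several small models is majority voting over their final answers, self-consistency style. A hypothetical sketch (the answers below are made up, not benchmark results):

```python
# Majority voting over final answers from several models (hypothetical).
from collections import Counter

def majority_vote(answers):
    """Return the most common answer and how many models agreed."""
    (winner, count), = Counter(answers).most_common(1)
    return winner, count

answers = ["42", "42", "41", "42", "40"]  # one answer per mini model
winner, votes = majority_vote(answers)
# Three of five models agree on "42", so it wins the vote.
```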
        
       | rdli wrote:
       | The blog post was a little unclear, so my summary was:
       | 
       | - They used QwQ to generate training data (with some cleanup
       | using GPT-4o-mini)
       | 
       | - The training data was then used to FT Qwen2.5-32B-Instruct
       | (non-reasoning model)
       | 
       | - Result was that Sky-T1 performs slightly worse than QwQ but
       | much better than Qwen2.5 on reasoning tasks
       | 
       | There are a few dismissive comments here but I actually think
       | this is pretty interesting as it shows how you can FT a
       | foundation model to do better at reasoning.
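The three-step recipe in that summary can be sketched with stub functions. Everything here is illustrative: the real QwQ sampling, GPT-4o-mini cleanup pass, and fine-tuning run are external systems, and these names stand in for them.

```python
# Stub sketch of the summarized pipeline (illustrative names only).
def generate_traces(prompts):
    # Stand-in for sampling reasoning traces from QwQ.
    return [{"prompt": p, "trace": f"reasoning about {p}"} for p in prompts]

def cleanup(trace):
    # Stand-in for the GPT-4o-mini pass that rewrites traces into a
    # consistent, well-structured format.
    return trace.strip().capitalize()

def build_sft_dataset(prompts):
    # Traces in, cleaned prompt/completion records out; these would
    # then be used to fine-tune the base instruct model.
    return [
        {"prompt": t["prompt"], "completion": cleanup(t["trace"])}
        for t in generate_traces(prompts)
    ]

dataset = build_sft_dataset(["prove x", "solve y"])
```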
        
         | azinman2 wrote:
         | I wish they would have compared to the r1 distills of qwen2.5
        
       | m3kw9 wrote:
        | Looks like they need to put quotes around the "$450".
        
       | tw1984 wrote:
        | Just several weeks ago, OpenAI was still using reasoning as
        | part of its tech moat to partially justify its hugely inflated
        | valuation. In just weeks after the release of DeepSeek and Kimi
        | and their papers on how to do it, average Joes can now train it
        | at home for less than the purchase price of a single mid-range
        | gaming GPU.
        
       ___________________________________________________________________
       (page generated 2025-02-21 23:00 UTC)