[HN Gopher] The Illustrated DeepSeek-R1
___________________________________________________________________
The Illustrated DeepSeek-R1
Author : amrrs
Score : 142 points
Date : 2025-01-27 20:51 UTC (2 hours ago)
(HTM) web link (newsletter.languagemodels.co)
(TXT) w3m dump (newsletter.languagemodels.co)
| caithrin wrote:
| This is fantastic work, thank you!
| jasonjmcghee wrote:
| For the uninitiated, this is the same author behind the many other
| "The Illustrated..." blog posts.
|
| A particularly popular one:
| https://jalammar.github.io/illustrated-transformer/
|
| Always very high quality.
| punkspider wrote:
| Thanks so much for mentioning this. His name carries a lot of
| weight for me as well.
| blackeyeblitzar wrote:
| The thing I still don't understand is how DeepSeek built the base
| model cheaply, and why their models seem to think they are GPT-4
| when asked. This article says the base model is from their
| previous paper, but that paper also doesn't make clear what they
| trained on. The earlier paper is mostly a description of
| optimization techniques they applied. It does mention pretraining
| on 14.8T tokens with 2.7M H800 GPU hours to produce the base
| DeepSeek-V3. But what were those tokens? The paper describes the
| corpus only in vague ways.
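|
| Rough numbers, if you take the paper's figures at face value and
| assume a rental rate of roughly $2 per H800 GPU-hour (the rate the
| V3 report itself assumes; actual prices vary):
|
|     # Back-of-envelope compute cost for the V3 pretraining run.
|     # The $2/GPU-hour rate is an assumption, not a measured cost.
|     gpu_hours = 2.7e6      # H800 GPU-hours cited for pretraining
|     rate = 2.0             # USD per GPU-hour (assumed)
|     tokens = 14.8e12       # pretraining tokens
|
|     cost = gpu_hours * rate
|     print(f"compute cost:  ${cost / 1e6:.1f}M")              # ~$5.4M
|     print(f"per 1B tokens: ${cost / (tokens / 1e9):.0f}")    # ~$365
|
| That only covers GPU rental for the final pretraining run, not data
| work, ablations, or salaries, but it's where the "cheap" headline
| comes from. It still says nothing about what those 14.8T tokens were.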
| moralestapia wrote:
| A friend just sent me a screenshot where he asks DeepSeek if it
| has an app for Mac and it replies that they have a ChatGPT app
| from OpenAI, lol.
|
| I 100% believe they distilled GPT-4, hence the low "training"
| cost.
| Philpax wrote:
| Er, how would that reduce the cost? You still need to train
| the model, which is the expensive bit.
|
| Also, the base model for V3 and the only-RL-tuned R1-Zero are
| available, and they behave like base models, which seems
| unlikely if they used data from OpenAI as their primary data
| source.
|
| It's much more likely that they've consumed the background
| radiation of the web, where OpenAI contamination is dominant.
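|
| To be concrete about what "distilling GPT-4" would even mean here
| (a hypothetical sketch, not anything DeepSeek has documented):
| sequence-level distillation is just sampling a teacher's outputs and
| fine-tuning a student on them, so at best it substitutes for the
| post-training/SFT data. It does nothing for the 14.8T-token
| pretraining run, which is where the cost is.
|
|     # Hypothetical sketch of sequence-level distillation in Python.
|     # The teacher model name and client usage are illustrative only;
|     # nothing here reflects DeepSeek's actual pipeline.
|     from openai import OpenAI
|
|     client = OpenAI()  # teacher API
|
|     def collect_teacher_answers(prompts):
|         """Sample the teacher once per prompt to build an SFT dataset."""
|         pairs = []
|         for p in prompts:
|             resp = client.chat.completions.create(
|                 model="gpt-4o",
|                 messages=[{"role": "user", "content": p}],
|             )
|             pairs.append({"prompt": p,
|                           "response": resp.choices[0].message.content})
|         return pairs
|
|     # These pairs would feed an ordinary supervised fine-tune of an
|     # already-pretrained student; the pretraining FLOPs are unchanged.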
| moritonal wrote:
| I imagine it's one of two things: either they used ChatGPT as an
| oracle to get training data, or it's the radiocarbon issue where the
| Internet has so much info on ChatGPT that other models now get
| confused.
| whoistraitor wrote:
| It's remarkable we've hit a threshold where so much can be done
| with synthetic data. The reasoning race seems an utterly solvable
| problem now (thanks mostly to the verifiability of results). I
| guess the challenge then becomes non-reasoning domains, where
| qualitative and truly creative results are desired.
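|
| "Verifiability" is doing a lot of work there: for math and code you
| can score a completion with a plain rule, no learned reward model
| needed, which is what makes reasoning RL so tractable. A minimal
| sketch of that kind of check (my own illustration, not DeepSeek's
| actual reward code):
|
|     import re
|
|     def accuracy_reward(completion: str, ground_truth: str) -> float:
|         """Return 1.0 if the final boxed answer matches, else 0.0."""
|         match = re.search(r"\\boxed\{([^}]*)\}", completion)
|         if match is None:
|             return 0.0
|         return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0
|
|     print(accuracy_reward(r"... so the answer is \boxed{42}", "42"))  # 1.0
|
| There's no equivalent rule for "is this painting good", which is why
| the creative domains feel harder.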
| kenjackson wrote:
| It seems like we need an evaluation model for creativity. I'm
| curious, is there research on this -- for example, can one
| score a random painting and output how creative/good a given
| population is likely to find it?
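|
| One plausible shape for such a model: collect human ratings for a
| set of images, then fit a small regression head on a frozen image
| encoder to predict the mean rating. Everything below is a
| hypothetical sketch of that idea, not a reference to any existing
| benchmark or result:
|
|     # Hypothetical sketch: predict mean human "creativity" ratings for
|     # images with a regression head on a frozen CLIP vision encoder.
|     import torch.nn as nn
|     from transformers import CLIPVisionModel, CLIPImageProcessor
|
|     encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
|     processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
|     for p in encoder.parameters():
|         p.requires_grad = False  # freeze the backbone
|
|     head = nn.Linear(encoder.config.hidden_size, 1)  # rating predictor
|
|     def predict_rating(image):
|         inputs = processor(images=image, return_tensors="pt")
|         feats = encoder(**inputs).pooler_output   # (1, hidden_size)
|         return head(feats).squeeze()              # scalar score
|
|     # Training would regress predictions onto mean human ratings (MSE).
|     # Whether that measures "creativity" rather than mere preference is
|     # exactly the open question.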
___________________________________________________________________
(page generated 2025-01-27 23:00 UTC)