[HN Gopher] DeepSeekMath-V2: Towards Self-Verifiable Mathematica...
___________________________________________________________________
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Author : victorbuilds
Score : 251 points
Date : 2025-12-01 08:54 UTC (14 hours ago)
(HTM) web link (huggingface.co)
(TXT) w3m dump (huggingface.co)
| victorbuilds wrote:
| Notable: they open-sourced the weights under Apache 2.0, unlike
| OpenAI and DeepMind whose IMO gold models are still proprietary.
| SilverElfin wrote:
| If they open source just weights and not the training code and
| data, then it's still proprietary.
| mips_avatar wrote:
| Yeah but you can distill
| amelius wrote:
| Is that the equivalent of decompile?
| c0balt wrote:
| No, that is the equivalent of lossy compression.
| littlestymaar wrote:
| You can distill closed weights models as well. (Just not
| logit-distillation)
| mips_avatar wrote:
| Though it violates their terms of service
| falcor84 wrote:
| Isn't that a bit like saying that if I open source a tool,
| but not a full compendium of all the code that I had read,
| which led me to develop it, then it's not really open source?
| fragmede wrote:
            | No. In that case, you're providing two things: a binary
            | version of your tool, and the tool's source. That source
            | is available for others to inspect and to build their own
            | copy.
| However, given just the weights, we don't have the source,
| and can't inspect what alignment went into it. In the case
| of DeepSeek, we know they had to purposefully cause their
| model to consider Tiananmen Square something it shouldn't
| discuss. But without the source used to create the model,
| we don't know what else is lurking around inside the model.
| NitpickLawyer wrote:
| > However, given just the weights, we don't have the
| source
|
| This is incorrect, given the definitions in the license.
|
            | > (Apache 2.0) "Source" form shall mean _the preferred
            | form for making modifications_, including but not
| limited to software source code, documentation source,
| and configuration files.
|
| (emphasis mine)
|
| In LLMs, the weights _are_ the preferred form of making
| modifications. Weights are not _compiled_ from something
| else. You start with the weights (randomly initialised)
| and at every step of training you adjust the weights.
| That is not akin to compilation, for many reasons (both
| theoretical and practical).
|
| In general licenses do not give you rights over the
| "know-how" or "processes" in which the licensed parts
| were created. What you get is the ability to inspect,
| modify, redistribute the work as you see fit. And most
| importantly, you modify the work just like the creators
| modify the work (hence the preferred form). Just not with
| the same data (i.e. you can modify the source of chrome
| all you want, just not with the "know-how and knowledge"
| of a google engineer - the license can not offer that).
|
| This is also covered in the EU AI act btw.
|
| > General-purpose AI models released under free and open-
| source licences should be considered to ensure high
| levels of transparency and openness if their parameters,
| including the weights, the information on the model
| architecture, and the information on model usage are made
| publicly available. The licence should be considered to
| be free and open-source also when it allows users to run,
| copy, distribute, study, change and improve software and
| data, including models under the condition that the
| original provider of the model is credited, the identical
| or comparable terms of distribution are respected.
| fragmede wrote:
| > In LLMs, the weights are the preferred form of making
| modifications.
|
| No they aren't. We happen to be able to do things to
| modify the weights, sure, but why would any lab ever
| train something from scratch if editing weights was
| _preferred_?
| NitpickLawyer wrote:
            | Training _is_ modifying the weights. How you modify them
            | is not the object of a license; it never was.
| noodletheworld wrote:
| > And most importantly, you modify the work _just like
| the creators modify the work_
|
| Emphasis mine.
|
| Weights are not open source.
|
            | You can define terms to mean whatever you want, but
            | _fundamentally_ if you cannot modify the "output" the way
            | the original creators could, it's not in the spirit of
            | open source.
|
            | Isn't that _literally_ what you said?
|
            | How can you possibly claim both a) that you can modify it
            | like the creators did, and b) that that's all you need to
            | be open source, but...
|
| Also c) the _categorically incorrect_ assertion that the
| weights allow you to do this?
|
| Whatever, I guess, but your argument is logically wrong,
| and philosophically flawed.
| NitpickLawyer wrote:
| > Weights are not open source.
|
| If they are released under an open source license, they
| are.
|
| I think you are confusing two concepts. One is the
| technical ability to modify weights. And that's what the
| license grants you. The right to modify. The second is
| the "know-how" on _how_ to modify the weights. That is
| not something that a license has ever granted you.
|
| Let me put it this way:
|
            | ```python
            | # input() returns a string; cast to float before comparing
            | THRESHOLD = 0.73214
            |
            | if float(input()) < THRESHOLD:
            |     print("low")
            | else:
            |     print("high")
            | ```
|
| If I release that piece of code under Apache 2.0, you
| have the right to study it, modify it and release it as
            | you see fit. But you do _not_ have the right (at least
            | the license doesn't deal with that) to know _how_ I
            | reached that threshold value. And me not telling you does
            | not in any way invalidate the license being Apache 2.0.
            | That's simply not something that licenses do.
|
| In LLMs the source is a collection of architecture (when
| and how to apply the "ifs"), inference code (how to
| optimise the computation of the "ifs") and hardcoded
| values (weights). You are being granted a license to run,
| study, modify and release those hardcoded values. You do
| not, never had, never will in the scope of a license, get
| the right to know how those hardcoded values were
| reached. The process by which those values were found can
| be anything from "dreamt up" to "found via ML". The fact
| that you don't know _how_ those values were derived does
| not in any way preclude you from exercising the rights
| under the license.
| roblabla wrote:
            | You are fundamentally conflating releasing a binary under
            | an open source license with the software being open
            | source. Nobody is saying that they're violating the
            | Apache 2.0 license by not releasing the training data.
            | What people are objecting to is calling this release
            | "open source" when the only thing covered by the open
            | source license is the weights; they consider that an
            | abuse of the meaning of "Open Source".
|
| To give you an example: I can release a binary (without
| sources) under the MIT - an open source license. That
| will give you the rights to use, copy, modify, merge,
| publish, distribute, sublicense, and/or sell copies of
| said binary. In doing so, I would have released the
| binary under an open source license. However, most people
| would agree that the software would not be open source
| under the conventional definition, as the sources would
| not be published. While people could modify it by
| disassembling it and modifying it, there is a general
| understanding that Open Source requires distributing the
| _sources_.
|
| This is very similar to what is being done here. They're
| releasing the weights under an open source license - but
| the overall software is not open source.
| v9v wrote:
| Would you accept the argument that compiling is modifying
| the bytes in the memory space reserved for an executable?
|
| I can edit the executable at the byte level if I so
| desire, and this is also what compilers do, but the
| developer would instead be modifying the source code to
| make changes to the program and then feed that through a
| compiler.
|
| Similarly, I can edit the weights of a neural network
| myself (using any tool I want) but the developers of the
| network would be altering the training dataset and the
| training code to make changes instead.
| NitpickLawyer wrote:
| I think the confusion for a lot of people comes from what
| they imagine compilation to be. In LLMs, the process is
| this (simplified):
|
| define_architecture (what the operations are, and the
| order in which they're performed)
|
| initialise_model(defined_arch) -> weights. Weights are
| "just" hardcoded values. Nothing more, nothing less.
|
| The weights are the result of the arch, at "compile"
| time.
|
| optimise_weights(weights, data) -> better_weights.
|
| ----
|
| You can, should you wish, totally release a model after
            | initialisation. It would be a useless model, but, again,
| the license does not deal with that. You would have the
| rights to run, modify and release the model, even if it
| were a random model.
|
            | tl;dr: Licenses deal with _what_ you can do with a model.
| You can run it, modify it, redistribute it. They do not
| deal with _how_ you modify them (i.e. what data you use
| to arrive at the "optimal" hardcoded values). See also
| my other reply with a simplified code example.
| falcor84 wrote:
| The big difference that an Open Source license gives me
| is that regardless of the tool I use to make the edits,
| if I rewrite the bytes of the Linux kernel, I can freely
| release my version with the same license, but if I
| rewrite the bytes of Super Mario Odyssey and try to
| release the modified version, I'll soon be having a very
| fun time at the bankruptcy court.
| nextaccountic wrote:
| No, it's like saying that if you release under Apache
| license, it's not open source even though it's under an
| open source license
|
| For something to be open source it needs to have sources
| released. Sources are the things in the preferred format to
| be edited. So the code used for training is obviously
| source (people can edit the training code to change
| something about the released weights). Also the training
| data, under the same rationale: people can select which
| data is used for training to change the weights
| falcor84 wrote:
| Well, this is just semantics. I can have a repo that
| includes a collection of json files that I had generated
| via a semi-manual build process that depends on
| everything from the state of my microbiome to my cat's
| scratching pattern during Mercury's last retrograde. If I
| attach an open source license to it, then that's the
| source - do with it what you will. Otherwise, I don't see
| how this discussion doesn't lead to "you must first
| invent the universe".
| typ wrote:
            | The difference is whether you can customize/debug it.
| You might say that a .EXE can be modified too. But I
| don't think that's the conventional definition of open
| source.
|
| I understand that these days, businesses and hobbyists
| just want to use free LLMs without paying subscriptions
| for economic motives, that is, either saving money or
| making money. They don't really care whether the _source_
| is truly available or not. They are just end users of a
| product, not open-source developers by any means.
| nextaccountic wrote:
            | Not just semantics: the concept of open source
            | fundamentally depends on what the preferred form of
            | modification is.
|
| https://opensource.org/ai/open-source-ai-definition
| nurettin wrote:
| Is this a troll? They don't want to reproduce your open
| source code, they want to reproduce the weights.
| falcor84 wrote:
| What does open sourcing have to do with "reproducing"?
| Last I checked, open sourcing is about allowing others to
| modify and to distribute the modified version, which you
| can do with these. Yes, having the full training data and
| tooling would make it significantly easier, and it is a
| requirement for GPL, but not for Open Source licenses in
| general. You may add this as another argument in favor of
| going back in time and doing more to support Richard
| Stallman's vision, but this is the world in which we live
| now.
| nurettin wrote:
| For obvious reasons, there is no world in which you can
| "build" this kind of so-called open source project
| without the data sets. Play around with words all you
| want.
| KaiserPro wrote:
            | No, it's like releasing a binary. I can hook into it and
            | its API and make it do other things, but I can't rebuild
            | it from scratch.
| falcor84 wrote:
| > rebuild it from scratch
|
| That's beyond the definition of Open Source. Doing a bit
| of license research now, only the GPL has such a
| requirement - GPLv3:
|
| > The "Corresponding Source" for a work in object code
| form means all the source code needed to generate,
| install, and (for an executable work) run the object code
| and to modify the work, including scripts to control
| those activities.
|
| But all other Open Source compliant licenses I checked
| don't, and just refer to making whatever is in the repo
| available to others.
| PunchyHamster wrote:
| ok but just the model isn't even close to anything open,
| it's literally a compiled binary, without even the source
| data
| KaiserPro wrote:
            | If you distribute a binary to someone under GPLv2, you
            | should also, if asked, provide the source code used to
| _build_ that binary. Other licenses will differ. MIT for
| example lets you do pretty much anything, so long as you
| keep the MIT license and attribution public.
|
| But when people are talking about open source, they
            | generally mean "oh, I can see the source code and build it
            | myself", rather than freeware, which is "I can run the
            | binary and not have to pay".
| exe34 wrote:
| "open source" as a verb is doing too much work here. are
| you proposing to release the human readable code or the
| object/machine code?
|
| if it's the latter, it's not the source. it's free as in
| beer. not freedom.
| falcor84 wrote:
| Yes, I 100% agree. Open Source is a lot more about not
| paying than about liberty.
|
| This is exactly the tradeoff that we had made in the
| industry a couple of decades ago. We could have pushed
| all-in on Stallman's vision and the FSF's definition of
| Free Software, but we (collectively) decided that it's
| more important to get the practical benefits of having
| all these repos up there on GitHub and us not suing each
| other over copyright infringement. It's absolutely
| legitimate to say that we made the wrong choice, and I
| might agree, but a choice was made, and Open Source !=
| Free Software.
|
| https://www.gnu.org/philosophy/open-source-misses-the-
| point....
| amelius wrote:
| True. But the headline says open weights.
| ekianjo wrote:
| It's just open weights, the source has no place in this
| expression
| jimmydoe wrote:
| you are absolutely right. I'd rather use true closed models,
| not fake open source ones from China.
| PunchyHamster wrote:
| I think we should treat copyright for the weights the same way
| the AI companies treat source material ;)
| littlestymaar wrote:
| We don't even have to do that: weights being entirely machine
| generated without human intervention, they are likely not
| copyrightable in the first place.
|
            | In fact, we should collectively refuse to abide by these
            | fantasy licenses before weight copyrightability gets
            | created out of thin air just because it's been
            | commonplace for long enough.
| mitthrowaway2 wrote:
| There's an argument by which machine-learned neural network
| weights are a lossy compression of (as well as a smooth
| interpolator over) the training set.
|
| An mp3 file is also a machine-generated lossy compression
| of a cd-quality .wav file, but it's clearly copyrightable.
|
| To that extent, the main difference between a neural
| network and an .mp3 is that the mp3 compression cannot be
| used to interpolate between two copyrighted works to output
| something in the middle. This is, on the other hand,
| perhaps the most common use case for genAI, and it's
| actually tricky to get it to not output something "in the
| middle" (but also not impossible).
|
| I think the copyright argument could really go either way
| here.
| littlestymaar wrote:
| > An mp3 file is also a machine-generated lossy
| compression of a cd-quality .wav file, but it's clearly
| copyrightable.
|
            | Not the .mp3 itself, the creative piece of art that it
            | encodes.
|
            | You can't record Taylor Swift at a concert and claim
            | copyright on that. Nor can you claim copyright on an mp3
            | re-encoding of old audio footage that belongs to the
            | public domain.
|
| Whether LLMs are in the first category (copyright
| infringement of copyright holders of the training data)
| or in the second (public domain or fair use) is an open
| question that jurisprudence is slowly resolving depending
| on the jurisdiction, but that doesn't address the
            | question of the weights themselves.
| mitthrowaway2 wrote:
            | Right, the .mp3 is machine-generated, but from a
            | creatively-generated input. The analogy I'm making is
            | that an LLM's
| weights (or let's say, a diffusion image model) are also
| machine-generated (by the training process) from the
| works in its training set, many of which are creative
| works, and the neural network encodes those creative
| works much like mp3 file does.
|
| In this analogy, distributing the weights would be akin
| to distributing an mp3, and offering a genAI service,
            | like ChatGPT inference or a Stable Diffusion API, would
| be akin to broadcasting.
| littlestymaar wrote:
| I'd be fine with this interpretation, but that would
| definitely rule out fair use for training, and be even
            | worse for LLM makers than having LLMs be non-copyrightable.
| mitthrowaway2 wrote:
| Oh yes, absolutely.
| larodi wrote:
| Of course we should! And everyone who says otherwise must be
| delusional or sort of a gaslighter, as this whole
            | "innovation" (or remix (or compression)) is enabled by the
| creative value of the source product. Given AI companies
| never ever respected this copyright, we should give them
| similar treatment.
| ilmj8426 wrote:
| It's impressive to see how fast open-weights models are catching
| up in specialized domains like math and reasoning. I'm curious if
| anyone has tested this model for complex logic tasks in coding?
| Sometimes strong math performance correlates well with debugging
| or algorithm generation.
| stingraycharles wrote:
| kimi-k2 is pretty decent at coding but it's nowhere near the
| SOTA models of Anthropic/OpenAI/Google.
| tripplyons wrote:
| Are you referring to the new reasoning version of Kimi K2?
| alansaber wrote:
| It makes complete sense to me: highly-specific models don't
| have much commercial value, and at-scale llm training favours
| generalism.
| yorwba wrote:
| Previous discussion:
| https://news.ycombinator.com/item?id=46072786 218 points 3 days
| ago, 48 comments
| victorbuilds wrote:
| Ah, missed that one. Thanks for the link.
| terespuwash wrote:
| Why isn't OpenAI's gold medal-winning model available to the
| public yet?
| esafak wrote:
| 'coz it was for advertisement. They'll roll their lessons into
| the next general purpose model.
| H8crilA wrote:
| How do you run this kind of a model at home? On a CPU on a
| machine that has about 1TB of RAM?
| pixelpoet wrote:
| Wow, it's 690GB of downloaded data, so yeah, 1TB sounds about
| right. Not even my two Strix Halo machines paired can do this,
| damn.
| Gracana wrote:
| You can do it slowly with ik_llama.cpp, lots of RAM, and one
| good GPU. Also regular llama.cpp, but the ik fork has some
| enhancements that make this sort of thing more tolerable.
| bertili wrote:
| Two 512GB Mac Studios connected with thunderbolt 5.
| sschueller wrote:
| How is OpenAI going to be able to serve ads in chatgpt without
| everyone immediately jumping ship to another model?
| miroljub wrote:
| I don't care about OpenAI even if they don't serve ads.
|
| I can't trust any of their output until they become honest
| enough to change their name to CloseAI.
| Coffeewine wrote:
| I suppose the hope is that they don't, and we wind up with
| commodity frontier models from multiple providers at market
| rates.
| KeplerBoy wrote:
| Google served ads for decades and no one ever jumped ship to
| another search engine.
| sschueller wrote:
| Because Google gave the best results for a long time.
| PunchyHamster wrote:
| and now, when they are not, everyone else's results are
| also pretty terrible...
| bootsmann wrote:
            | They pay $30bn (more than OpenAI's lifetime revenue) each
            | year to make sure no one does.
| KeplerBoy wrote:
| What are you referring to?
| rzerowan wrote:
            | Search deals with mobile OEMs and Apple (preferred engine
            | in all mobile browsers), plus paying off Mozilla, for a
            | start. Also Goog has had a first-mover moat for a while
            | before Duck came along.
| dist-epoch wrote:
| The same way people stayed on Google despite DuckDuckGo
| existing.
| PunchyHamster wrote:
| by having datacenters with GPUs and API everyone uses.
|
| So they are either earning money directly or on the API calls.
|
| Now, competition can come and compete on that, but they will
| probably still be the first choice for foreseeable future
| astrange wrote:
| ChatGPT is a website. There's nothing unusual about ads on a
| website.
|
| People use Instagram too.
| simianwords wrote:
| A bit important that this model is not general purpose whereas
| the ones Google and OpenAI used were general purpose.
| mangolie wrote:
| https://x.com/deepseek_ai/status/1995452646459858977
|
| Boom
| simianwords wrote:
| Oh you may be correct. Are these models general purpose or
| fine tuned for mathematics?
| yorwba wrote:
| That's a different model: https://huggingface.co/deepseek-
| ai/DeepSeek-V3.2-Speciale
| andy12_ wrote:
| Do note that that is a different model. The one we are
| talking about here, DeepSeekMath-V2, is indeed overcooked
            | with math RL. It's so eager to solve math problems that it
| even comes up with random ones if you prompt it with "Hello".
|
| https://x.com/AlpinDale/status/1994324943559852326?s=20
| yorwba wrote:
| Both OpenAI and Google used models made specifically for the
| task, not their general-purpose products.
|
| OpenAI:
| https://xcancel.com/alexwei_/status/1946477756738629827#m "we
| are releasing GPT-5 soon, and we're excited for you to try it.
| But just to be clear: the IMO gold LLM is an experimental
| research model. We don't plan to release anything with this
| level of math capability for several months."
|
| DeepMind: https://deepmind.google/blog/advanced-version-of-
| gemini-with... "we additionally trained this version of Gemini
| on novel reinforcement learning techniques that can leverage
| more multi-step reasoning, problem-solving and theorem-proving
| data. We also provided Gemini with access to a curated corpus
| of high-quality solutions to mathematics problems, and added
| some general hints and tips on how to approach IMO problems to
| its instructions."
| simianwords wrote:
| Not true
| simianwords wrote:
| https://x.com/sama/status/1946569252296929727
|
| >we achieved gold medal level performance on the 2025 IMO
| competition with a _general-purpose reasoning system!_ to
| emphasize, this is an LLM doing math and not a specific
| formal math system; it is part of our main push towards
| general intelligence.
|
| asterisks mine
| yorwba wrote:
| DeepSeekMath-V2 is also an LLM doing math and not a
| specific formal math system. What interpretation of
| "general purpose" were you using where one of them is
| "general purpose" and the other isn't?
| simianwords wrote:
            | This model can't be used for, say, questions on biology or
| history.
| yorwba wrote:
| How do you know how well OpenAI's unreleased experimental
| model does on biology or history questions?
| simianwords wrote:
| Sam specifically says it is general purpose and also this
|
| > Typically for these AI results, like in
| Go/Dota/Poker/Diplomacy, researchers spend years making
| an AI that masters one narrow domain and does little
| else. But this isn't an IMO-specific model. It's a
| reasoning LLM that incorporates new experimental general-
| purpose techniques.
|
| https://x.com/polynoamial/status/1946478250974200272
| lossolo wrote:
| You are overinterpreting what they said again.
| "Go/Dota/Poker/Diplomacy" do not use LLMs, which means
| they are not considered "general purpose" by them. And to
| prove it to you, look at the OpenAI IMO solutions on
| GitHub, which clearly show that it's not a general
| purpose trained LLM because of how the words and
| sentences are generated there. These are models
| specifically fine tuned for math.
| simianwords wrote:
| they could not have been more clear - sorry but are you
| even reading?
| lossolo wrote:
| Clear about what? Do you know the difference between an
| LLM based on transformer attention and a monte carlo tree
| search system like the one used in Go? You do not
| understand what they are saying. It was a fine tuned
| model, just as DeepSeekMath is a fine tuned LLM for math,
| which means it was a special purpose model. Read the
| OpenAI GitHub IMO submissions to see the proof.
| letmetweakit wrote:
| Does anyone know if this will become available on OpenRouter?
| WhitneyLand wrote:
| Shouldn't there be a lot of skepticism here?
|
            | All the problems they claim to have solved are on the
            | Internet, and they explicitly say they crawled them. They do not
| mention doing any benchmark decontamination or excluding
| 2024/2025 competition problems from training.
|
            | IIRC, OpenAI/Google did not have access to the 2025
| problems before testing their experimental math models.
| LZ_Khan wrote:
| Don't they distill directly off OpenAI/Google outputs?
___________________________________________________________________
(page generated 2025-12-01 23:02 UTC)