[HN Gopher] DeepSeek-v3.2-Exp
___________________________________________________________________
DeepSeek-v3.2-Exp
Author : meetpateltech
Score : 284 points
Date : 2025-09-29 10:26 UTC (12 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| terespuwash wrote:
| Looks like Deep Sparse Attention can help with code (structured
| and long-file reasoning)
| matrix2596 wrote:
| Awesome to see sparse attention used in a real-world setting.
| mythz wrote:
| Happy to see Chinese OSS models keep getting better and cheaper.
| It also comes with a 50% API price drop for an already cheap
| model, now at:
|
|          $0.28/M input ($0.028/M cache hit), $0.42/M output
| manishsharan wrote:
| This price drop is nice but I wonder how long it will last.
| Their prices used to be very low, then they almost doubled, and
| now they've dropped again.
| nacs wrote:
| I don't know if it will stay this low but the whole point of
| v3.2 is to be cheaper to run than <= v3.1.
|
| (The inference costs are cheaper for them now as context
| grows because of the Sparse attention mechanism)
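|
| (Back-of-the-envelope, using the top-k = 2048 figure mentioned
| downthread: with dense attention every new token attends to all
| L previous tokens, so per-token attention cost grows with
| context length; with the sparse mechanism each token attends to
| a fixed ~2048 selected tokens, so at a 128K context that is
| roughly 128K / 2048 = 64x less attention work per token, before
| counting the cheaper indexing pass over the full context.)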
| guluarte wrote:
| I was using it daily, but after the price jump, using codex
| and claude was much cheaper than using deepseek.
| dizhn wrote:
| What was the price before? I thought they had just increased
| their prices.
| espadrine wrote:
| Input: $0.07 (cached), $0.56 (cache miss)
|
| Output: $1.68 per million tokens.
|
| https://api-docs.deepseek.com/news/news250929
| Havoc wrote:
| wow...gigantic reduction in cost while holding the benchmarks
| mostly steady. Impressive.
| awongh wrote:
| The second-order effect that not a lot of people talk about is
| price: the fact that model scaling at this pace also comes with
| falling prices is amazing.
|
| I think this is just as important to the distribution of AI as
| model intelligence is.
|
| AFAIK there are no fundamental "laws" that prevent price from
| continuing to fall, at least in step with Moore's law (or
| whatever the current AI/Nvidia chip development cycle is called
| right now), since each new generation of hardware is
| significantly faster and cheaper than the last. So will we see a
| ChatGPT-5 model at half the price in a year? (Yes, I know that
| thinking models cost more, but just on a per-token basis.)
| samuelknight wrote:
| You are vastly underestimating the price decline. To cherry-pick
| one article: in the two years since GPT-3.5, inference price for
| the same amount of intelligence has decreased 10x per year
| according to a study by Andreessen Horowitz:
| https://a16z.com/llmflation-llm-inference-cost/. So even in a
| stark slowdown scenario, we could still see a 1000x decrease in
| the next 5 years.
|
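| (For scale: 1000x over five years works out to only about 4x per
| year, since 4^5 is roughly 1000, whereas a sustained 10x per year
| would compound to 10^5 = 100,000x over the same period.)
|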
| Price deflation is not tied to Moore's law right now, because
| much of the performance gain comes from model optimization,
| high-bandwidth memory supply chains, and electrical capacity
| build-out, not FLOP density.
| awongh wrote:
| True! I just know that model optimization gains are much less
| guaranteed than say, FLOP density, even though model
| optimization has so far provided way more gains than hardware
| advancements.
|
| Part of me is optimistic that when the AI bubble bursts, the
| excess data center capacity will be another force driving the
| cost of inference down.
| NemoNobody wrote:
| Haha, I love how delusional everyone is about AI.
|
| Yeppers, when that bubble bursts, that's hilarious. This is
| the kinda stuff grandkids won't believe someday.
| naasking wrote:
| > I just know that model optimization gains are much less
| guaranteed than say, FLOP density, even though model
| optimization has so far provided way more gains than
| hardware advancements.
|
| Performance gained from model improvements has outpaced
| performance gained from hardware improvements for decades.
| throwaway314155 wrote:
| > has decreased 10x per year according to a study by
| Andreessen Horowitz
|
| I believe you but that's not exactly an unbiased source of
| information.
| wwizo wrote:
| You guys rock! I'm very curious how this will perform against
| real-world data, where small nuances matter. Also, have you
| tested it beyond the 128K context window?
| esafak wrote:
| https://openrouter.ai/deepseek/deepseek-v3.2-exp
| nacs wrote:
| Strange - the model is marked as "Trains on data" ("To our
| knowledge, this provider may use your prompts and completions
| to train new models. This provider is disabled, but it can be
| re-enabled by changing your data policy.").
|
| This is usually not the case for paid models -- is OpenRouter
| just marking this model incorrectly, or does DeepSeek actually
| train on submitted data?
| esafak wrote:
| https://cdn.deepseek.com/policies/en-US/deepseek-privacy-pol...
|
| https://openrouter.ai/docs/features/privacy-and-logging#data...
|
| It seems so.
| seunosewa wrote:
| It's no longer the case that paid providers on OpenRouter don't
| train on your data. You can exclude such providers in the
| settings.
| nacs wrote:
| Yep, I have that setting disabled, so the number of providers
| for that model on OpenRouter is currently 0 for me.
|
| I guess I'll wait for a third-party provider on OpenRouter that
| doesn't log DS 3.2.
| echelon wrote:
| Is OpenRouter really open? I see their "main" repo as archived,
| and various smaller projects.
|
| Is it just the API client bindings that are open, while the core
| routing service is closed?
| esafak wrote:
| I don't know why they need to claim to be open. Their job is
| to connect you to providers on the basis of price and various
| metrics they track. Open or closed would make no difference
| to me.
| echelon wrote:
| It's in the name. Why not name themselves ModelRouter or
| something similar?
|
| If they lead the market, they'll extract value in lots of
| ways that an open company could at least be compelled not
| to. Plus there won't be competition.
|
| They're probably selling your data to LLM companies and you
| don't even see what they're doing.
|
| Without competition, they'll raise their rates.
|
| If they were open, you could potentially run the offering
| on-prem. You could bolt on new providers or use it
| internally for your own routing.
|
| Lots of reasons.
| esafak wrote:
| They can't raise their prices much because providers have the
| upper hand, so users will always be able to go directly to the
| source. I use OpenRouter _and_ OpenAI, Anthropic, Google, etc.
| burkaman wrote:
| Here's an open source alternative you can self-host:
| https://llmgateway.io/
|
| I think it's just called OpenRouter because the founder
| previously started OpenSea (an NFT marketplace), and also
| probably to sound a bit similar to OpenAI. It's like
| companies calling their products "natural" or "organic"
| or "artisan" when they can get away with it, just a
| marketing strategy of using words that conjure up vaguely
| positive connotations in your mind.
| smakosh wrote:
| Fun fact: we own closedrouter.ai and it redirects to
| llmgateway.io.
| wongarsu wrote:
| I always interpreted it as "open" as in "open market".
|
| It's a frictionless marketplace connecting inference
| providers and customers, creating a more competitive
| market. Or a more open market, if you play a bit fast and
| loose with the terminology.
| mmastrac wrote:
| Interesting that models still evolve fast enough that dedicated
| model-specific hardware isn't a big contender right now. We're
| still seeing major scaling gains on mostly generic platforms.
| gunalx wrote:
| Google TPUs, Groq, and Cerebras need to be mentioned, even if
| they are optimized for general architectures rather than for a
| specific model.
| ramshanker wrote:
| What happened to Meta's open-weights models? Lately I keep
| hearing more about DeepSeek than Llama.
| Alifatisk wrote:
| Weren't Llama 4 Maverick and Scout a flop?
| grim_io wrote:
| One huge problem with these "cheap" models is that they happen to
| be more expensive in the typical agent workflow if the provider
| does not support caching.
|
| Input and output costs are peanuts compared to the volume of
| tokens, an order of magnitude larger or more, that would hit
| the cache.
|
| At that point you might as well use GPT-5. It will be the same
| price or cheaper, and more capable.
| NotMichaelBay wrote:
| I was under the impression that this model does support
| caching. The pricing page says the cost of input tokens (cache
| hit) is $0.028.
| segmondy wrote:
| you declared a huge problem and followed up with an IF.
|
| The DeepSeek API supports caching; stop manufacturing problems
| where there are none.
|
| https://api-docs.deepseek.com/guides/kv_cache
| grim_io wrote:
| Sure. But there is no way I'm going to use the deepseek
| endpoint.
|
| Openrouter says they might use your data for training.
| cheema33 wrote:
| First you complained about the lack of caching. When you were
| informed that the model supports caching, instead of admitting
| your error you switched to an unrelated complaint. I hope that
| you do not use similar strategies for discussion in your
| personal and work life.
| grim_io wrote:
| Your broad attack on me as a person is unnecessary.
|
| If you read my post carefully, you will realize that I
| did not make any contradictory statements.
| JimDabell wrote:
| > One huge problem with these "cheap" models is that they
| happen to be more expensive in the typical agent workflow if
| the provider does not support caching.
|
| DeepSeek supports caching and cache hits are a tenth of the
| cost.
|
| $0.028/M for cache hit
|
| $0.28/M for cache miss
|
| $0.42/M for output
|
| -- https://api-docs.deepseek.com/news/news250929
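|
| As a rough sketch of what that means for an agent-style
| workflow (hypothetical token counts, using the prices above):
|
|     # hypothetical single agent turn: most of the input is a
|     # repeated prefix (system prompt + history) that hits the cache
|     cached_input = 90_000 * 0.028 / 1e6   # ~$0.0025
|     fresh_input  = 10_000 * 0.28  / 1e6   # ~$0.0028
|     output       =  2_000 * 0.42  / 1e6   # ~$0.0008
|     total = cached_input + fresh_input + output   # ~$0.006 per turn
|
| Without caching, the same 100K input tokens would all be billed
| at $0.28/M (about $0.028), roughly five times the per-turn input
| cost.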
| grim_io wrote:
| I auto-disqualify the Chinese first-party endpoints.
|
| If they are okay for you, then sure go ahead. Enjoy the
| caching.
|
| What other provider is going to support it?
| JimDabell wrote:
| > I auto-disqualify the Chinese first-party endpoints.
|
| Why?
| curseofcasandra wrote:
| I'm guessing it's something along the lines of this:
| https://youtu.be/kYiUY07TzS4
| guluarte wrote:
| By your logic, you'd then have to disqualify the OpenAI and
| Anthropic first-party endpoints for testing GPT and Claude...
| grim_io wrote:
| There is no bug in my logic. Anthropic and OpenAI are not
| Chinese first-party providers.
| eric15342335 wrote:
| Not sure if I get it correctly:
|
| They trained a lightweight "indexer" to mimic the full attention
| distribution and keep only the top-k (k=2048) most important
| tokens. So as the context window grows, the expensive query-key
| attention step no longer scales with context length; it stays
| roughly constant per query over those 2048 selected tokens. The
| cost curve in the graph still grows linearly, because the
| indexer still has to scan the entire context window for every
| query, but that scan is deliberately rough and cheap, so the
| overall cost is O(L) per query for the indexer plus O(k) for
| attention, instead of full attention over every token.
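|
| A minimal sketch of that reading in Python (my own guess at the
| shapes and names, not DeepSeek's actual implementation; the
| indexer scoring here is just a stand-in):
|
|     import numpy as np
|
|     def topk_sparse_attention(q, K, V, Wq_idx, Wk_idx, k_top=2048):
|         # q: (d,) current query; K, V: (L, d) cached keys/values
|         # Wq_idx, Wk_idx: (d, r) small projections for a cheap
|         # "indexer", with r much smaller than d
|         L, d = K.shape
|         k_top = min(k_top, L)
|         # 1) lightweight indexer scores every cached token:
|         #    still O(L), but in a tiny r-dim space, so far cheaper
|         #    than full attention
|         scores = (K @ Wk_idx) @ (Wq_idx.T @ q)
|         # 2) keep only the k_top highest-scoring tokens
|         keep = np.argpartition(scores, L - k_top)[L - k_top:]
|         # 3) ordinary softmax attention, but only over the kept
|         #    tokens: O(k_top) per query instead of O(L)
|         logits = K[keep] @ q / np.sqrt(d)
|         w = np.exp(logits - logits.max())
|         w /= w.sum()
|         return w @ V[keep]
|
| So every new token still triggers one cheap pass over the whole
| context (the O(L) indexer, which is why the cost curve keeps
| growing with context length), but the expensive softmax
| attention only ever touches ~2048 tokens, which is where the
| long-context savings come from.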
___________________________________________________________________
(page generated 2025-09-29 23:01 UTC)