[HN Gopher] DeepSeek-v3.2-Exp
       ___________________________________________________________________
        
       DeepSeek-v3.2-Exp
        
       Author : meetpateltech
       Score  : 284 points
       Date   : 2025-09-29 10:26 UTC (12 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | terespuwash wrote:
        | Looks like DeepSeek Sparse Attention can help with code
        | (structured and long-file reasoning).
        
       | matrix2596 wrote:
        | Awesome that sparse attention is being used in a real-world
        | setting.
        
       | mythz wrote:
       | Happy to see Chinese OSS models keep getting better and cheaper.
       | It also comes with a 50% API price drop for an already cheap
       | model, now at:
       | 
        | $0.28/M input ($0.028/M cache hit), $0.42/M output
        
         | manishsharan wrote:
         | This price drop is nice but I wonder how long it will last.
          | Their prices used to be very low, then they almost doubled,
          | and now they have dropped again.
        
           | nacs wrote:
            | I don't know if it will stay this low, but the whole point
            | of v3.2 is to be cheaper to run than v3.1 and earlier.
            |
            | (Their inference costs now grow more slowly as the context
            | grows, thanks to the sparse attention mechanism.)
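            |
            | A rough sketch of the scaling argument in Python (k=2048
            | is from the tech report; the context lengths are toy
            | numbers):
            |
            |   k = 2048
            |   for L in (8_000, 32_000, 128_000):
            |       # dense attention does O(L^2) work per layer, while
            |       # attending to a fixed top-k does O(L*k), so the
            |       # work ratio is L^2 / (L*k) = L/k
            |       print(L, round(L / k, 1))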
        
           | guluarte wrote:
            | I was using it daily, but after the price jump, using
            | Codex and Claude was much cheaper than using DeepSeek.
        
         | dizhn wrote:
         | What was the price before? I thought they had just increased
         | their prices.
        
           | espadrine wrote:
           | Input: $0.07 (cached), $0.56 (cache miss)
           | 
           | Output: $1.68 per million tokens.
           | 
           | https://api-docs.deepseek.com/news/news250929
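            |
            | For comparison with the new rates ($0.028 hit / $0.28
            | miss / $0.42 out), a back-of-the-envelope sketch in
            | Python; the workload (1M input tokens at an 80% cache-hit
            | rate, 200K output) is invented for illustration:
            |
            |   old = {"hit": 0.07, "miss": 0.56, "out": 1.68}
            |   new = {"hit": 0.028, "miss": 0.28, "out": 0.42}
            |
            |   def cost(p, inp=1.0, out=0.2, hits=0.8):  # M tokens
            |       return (inp * hits * p["hit"]
            |               + inp * (1 - hits) * p["miss"]
            |               + out * p["out"])
            |
            |   print(cost(old), cost(new))  # ~0.50 vs ~0.16, ~3x less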
        
       | Havoc wrote:
       | wow...gigantic reduction in cost while holding the benchmarks
       | mostly steady. Impressive.
        
       | awongh wrote:
        | The second-order effect that not a lot of people talk about
        | is price: the fact that models improving at this pace also
        | keep getting cheaper is amazing.
       | 
       | I think this is just as important to distribution of AI as model
       | intelligence is.
       | 
        | AFAIK there are no fundamental "laws" that prevent price from
        | continuing to fall, at least in line with Moore's law (or
        | whatever the current AI/Nvidia chip development cycle is
        | called right now): each new generation of hardware is
        | significantly faster and cheaper than the last. So will we
        | see a GPT-5-level model at half the price in a year? (Yes, I
        | know that thinking models cost more, but just on a per-token
        | basis.)
        
         | samuelknight wrote:
         | You are vastly underestimating the price decline. To cherrypick
         | one article; in the first two years since GPT 3.5, inference
         | price for the same amount of intelligence has decreased 10x per
         | year according to a study by Andreessen Horowitz
         | https://a16z.com/llmflation-llm-inference-cost/. So in a stark
         | slowdown scenario, we could still see a 1000x decrease in the
         | next 5 years.
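          |
          | The compounding behind those figures, spelled out (the
          | rates are the article's, not mine):
          |
          |   # 10x/year sustained for 5 years would be 100,000x; even
          |   # the "stark slowdown" 1000x-in-5-years case still means
          |   # roughly a 4x price drop every year.
          |   print(10 ** 5, round(1000 ** (1 / 5), 2))  # 100000 3.98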
         | 
          | Price deflation is not tied to Moore's law right now,
          | because much of the performance gain comes from model
          | optimization, high-bandwidth memory supply chains, and
          | electrical capacity build-out, not FLOP density.
        
           | awongh wrote:
            | True! I just know that model optimization gains are much
            | less guaranteed than, say, FLOP density, even though model
            | optimization has so far provided way more gains than
            | hardware advancements.
           | 
           | Part of me is optimistic that when the AI bubble bursts the
           | excess data center capacity is going to be another force
           | driving the cost of inference down.
        
             | NemoNobody wrote:
             | Haha, I love how delusional everyone is about AI.
             | 
             | Yeppers, when that bubble burst - that's hilarious. This is
             | the kinda stuff grandkids won't believe someday.
        
             | naasking wrote:
              | > I just know that model optimization gains are much
              | > less guaranteed than, say, FLOP density, even though
              | > model optimization has so far provided way more gains
              | > than hardware advancements.
             | 
             | Performance gained from model improvements has outpaced
             | performance gained from hardware improvements for decades.
        
           | throwaway314155 wrote:
           | > has decreased 10x per year according to a study by
           | Andreessen Horowitz
           | 
           | I believe you but that's not exactly an unbiased source of
           | information.
        
       | wwizo wrote:
        | You guys rock! I'm very curious how this will perform on
        | real-world data, where small nuances matter. Also, have you
        | tested it beyond the 128K context window?
        
       | esafak wrote:
       | https://openrouter.ai/deepseek/deepseek-v3.2-exp
        
         | nacs wrote:
         | Strange - the model is marked as "Trains on data" ("To our
         | knowledge, this provider may use your prompts and completions
         | to train new models. This provider is disabled, but it can be
         | re-enabled by changing your data policy.").
         | 
          | This is usually not the case for paid models -- is
          | OpenRouter just marking this model incorrectly, or does
          | DeepSeek actually train on submitted data?
        
           | esafak wrote:
           | https://cdn.deepseek.com/policies/en-US/deepseek-privacy-
           | pol...
           | 
           | https://openrouter.ai/docs/features/privacy-and-
           | logging#data...
           | 
           | It seems so.
        
           | seunosewa wrote:
            | It is no longer safe to assume that paid providers on
            | OpenRouter don't train on your data. You can exclude such
            | providers in the settings.
        
             | nacs wrote:
              | Yep, I have that setting disabled, so the number of
              | providers for that model on OpenRouter is currently 0
              | for me.
              |
              | I guess I'll wait for a 3rd-party provider on OpenRouter
              | that doesn't log DS 3.2.
        
         | echelon wrote:
          | Is OpenRouter really open? I see that their "main" repo is
          | archived, alongside various smaller projects.
          |
          | Is it just the API client bindings that are open, while the
          | core routing service is closed?
        
           | esafak wrote:
            | I don't know why they need to claim to be open. Their job
            | is to connect you to providers on the basis of price and
            | various metrics they track. Open or closed would make no
            | difference to me.
        
             | echelon wrote:
             | It's in the name. Why not name themselves ModelRouter or
             | something similar?
             | 
             | If they lead the market, they'll extract value in lots of
             | ways that an open company could at least be compelled not
             | to. Plus there won't be competition.
             | 
             | They're probably selling your data to LLM companies and you
             | don't even see what they're doing.
             | 
             | Without competition, they'll raise their rates.
             | 
             | If they were open, you could potentially run the offering
             | on-prem. You could bolt on new providers or use it
             | internally for your own routing.
             | 
             | Lots of reasons.
        
               | esafak wrote:
                | They can't raise their prices much because providers
                | have the upper hand, so users will always be able to
                | go directly to the source. I use openrouter _and_
                | openai, anthropic, google, etc.
        
               | burkaman wrote:
               | Here's an open source alternative you can self-host:
               | https://llmgateway.io/
               | 
               | I think it's just called OpenRouter because the founder
               | previously started OpenSea (an NFT marketplace), and also
               | probably to sound a bit similar to OpenAI. It's like
               | companies calling their products "natural" or "organic"
               | or "artisan" when they can get away with it, just a
               | marketing strategy of using words that conjure up vaguely
               | positive connotations in your mind.
        
               | smakosh wrote:
                | Fun fact: we own closedrouter.ai and it redirects to
                | llmgateway.io
        
             | wongarsu wrote:
             | I always interpreted it as "open" as in "open market".
             | 
             | It's a frictionless marketplace connecting inference
             | providers and customers, creating a more competitive
             | market. Or a more open market if you play a bit fast and
             | loose with terminology
        
       | mmastrac wrote:
       | Interesting that models still evolve fast enough that dedicated
       | model-specific hardware isn't a big contender right now. We're
       | still seeing major scaling gains on mostly generic platforms.
        
         | gunalx wrote:
          | Google TPUs, Groq, and Cerebras need to be mentioned, even
          | if they are optimized for more general architectures.
        
       | ramshanker wrote:
        | What happened to Meta's open-weights models? Lately I keep
        | hearing more about DeepSeek than Llama.
        
         | Alifatisk wrote:
          | Weren't Llama 4 Maverick and Scout a flop?
        
       | grim_io wrote:
       | One huge problem with these "cheap" models is that they happen to
       | be more expensive in the typical agent workflow if the provider
       | does not support caching.
       | 
        | Input and output costs are peanuts compared to the
        | order-of-magnitude (or more) larger volume of tokens that
        | hit the cache.
       | 
       | At that point you might as well use GPT-5. It will be the same
       | price or cheaper, and more capable.
        
         | NotMichaelBay wrote:
         | I was under the impression that this model does support
         | caching. The pricing page says the cost of input tokens (cache
         | hit) is $0.028.
        
         | segmondy wrote:
          | You declared a huge problem and followed it up with an "if".
          |
          | The DeepSeek API supports caching; stop manufacturing
          | problems where there are none.
         | 
         | https://api-docs.deepseek.com/guides/kv_cache
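          |
          | A minimal sketch of using it, assuming the OpenAI-compatible
          | endpoint and the cache-usage fields described in that guide
          | (caching is automatic on repeated prefixes, so no code
          | changes are needed):
          |
          |   from openai import OpenAI
          |
          |   client = OpenAI(api_key="...",
          |                   base_url="https://api.deepseek.com")
          |   # imagine a long shared context (repo dump, tool specs)
          |   system = {"role": "system", "content": "<long prompt>"}
          |
          |   for q in ["Summarize this diff.", "List risky changes."]:
          |       resp = client.chat.completions.create(
          |           model="deepseek-chat",
          |           messages=[system, {"role": "user", "content": q}],
          |       )
          |       # on the second call the shared prefix should count
          |       # as a cache hit, billed at the lower rate
          |       print(resp.usage.prompt_cache_hit_tokens,
          |             resp.usage.prompt_cache_miss_tokens)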
        
           | grim_io wrote:
           | Sure. But there is no way I'm going to use the deepseek
           | endpoint.
           | 
           | Openrouter says they might use your data for training.
        
             | cheema33 wrote:
              | First you complained about a lack of caching. When you
              | were informed that the model supports caching, instead
              | of admitting your error you switched to an unrelated
              | complaint. I hope that you do not use similar strategies
              | for discussion in your personal and work life.
        
               | grim_io wrote:
               | Your broad attack on me as a person is unnecessary.
               | 
               | If you read my post carefully, you will realize that I
               | did not make any contradictory statements.
        
         | JimDabell wrote:
         | > One huge problem with these "cheap" models is that they
         | happen to be more expensive in the typical agent workflow if
         | the provider does not support caching.
         | 
         | DeepSeek supports caching and cache hits are a tenth of the
         | cost.
         | 
         | $0.028/M for cache hit
         | 
         | $0.28/M for cache miss
         | 
         | $0.42/M for output
         | 
         | -- https://api-docs.deepseek.com/news/news250929
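          |
          | At those rates, a sketch of what a typical agent loop
          | costs; the workload (20 turns, a 50K-token starting
          | context, ~2K new tokens per turn) is invented for
          | illustration:
          |
          |   HIT, MISS, OUT = 0.028, 0.28, 0.42  # $ per M tokens
          |   turns, prefix, new_in, new_out = 20, 50_000, 1_000, 1_000
          |   hit = miss = out = 0
          |   for _ in range(turns):
          |       hit += prefix        # already-seen context is cached
          |       miss += new_in       # only fresh tokens miss
          |       out += new_out
          |       prefix += new_in + new_out  # transcript grows
          |   print((hit * HIT + miss * MISS + out * OUT) / 1e6)
          |   # ~$0.05, vs ~$0.40 if every input token were a miss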
        
           | grim_io wrote:
            | I auto-disqualify the Chinese first-party endpoints.
           | 
           | If they are okay for you, then sure go ahead. Enjoy the
           | caching.
           | 
           | What other provider is going to support it?
        
             | JimDabell wrote:
              | > I auto-disqualify the Chinese first-party endpoints.
             | 
             | Why?
        
               | curseofcasandra wrote:
               | I'm guessing it's something along the lines of this:
               | https://youtu.be/kYiUY07TzS4
        
             | guluarte wrote:
             | by your logic then you have to disqualify openai and
             | anthropic first party endpoints for testing gpt and
             | claude...
        
               | grim_io wrote:
                | There is no bug in my logic. Anthropic and OpenAI are
                | not Chinese first-party providers.
        
       | eric15342335 wrote:
       | Not sure if I get it correctly:
       | 
        | They trained a component to mimic the full attention
        | distribution while keeping only the top-k (k=2048) most
        | important attention tokens, so that as the context window
        | grows, the attention computation itself stays roughly
        | constant per query instead of growing with it. A lightweight
        | "indexer" still has to scan the entire context -- but only
        | very roughly, to pick those tokens -- which is why the cost
        | in the graph still grows linearly, O(L), rather than staying
        | flat.
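        |
        | A toy numpy sketch of that reading (an illustration of the
        | idea, not DeepSeek's implementation; all shapes, including
        | the small indexer dimension, are invented):
        |
        |   import numpy as np
        |
        |   def sparse_attention(q, K, V, iq, iK, k=2048):
        |       # 1. a cheap "indexer" scores all L positions: O(L)
        |       top = np.argsort(iK @ iq)[-k:]
        |       # 2. full attention over only the k selected tokens
        |       logits = K[top] @ q / np.sqrt(q.shape[0])
        |       w = np.exp(logits - logits.max())
        |       return (w / w.sum()) @ V[top]
        |
        |   L, d, di = 8192, 128, 32  # context, head dim, indexer dim
        |   rng = np.random.default_rng(0)
        |   out = sparse_attention(rng.normal(size=d),
        |                          rng.normal(size=(L, d)),
        |                          rng.normal(size=(L, d)),
        |                          rng.normal(size=di),
        |                          rng.normal(size=(L, di)))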
        
       ___________________________________________________________________
       (page generated 2025-09-29 23:01 UTC)