[HN Gopher] The path to open-sourcing the DeepSeek inference engine
       ___________________________________________________________________
        
       The path to open-sourcing the DeepSeek inference engine
        
       Author : Palmik
       Score  : 332 points
       Date   : 2025-04-14 15:03 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | rfoo wrote:
        | tl;dr "we had our vLLM fork and it's unmaintainable now; guess
        | we're going to rebuild it, in public this time"
        
         | Havoc wrote:
          | Unmaintainable seems unduly harsh. There is a big gap
          | between "maintainable internally" and "ready for public
          | consumption".
        
           | rfoo wrote:
           | > Codebase Divergence: Our engine is based on an early fork
           | of vLLM from over a year ago
           | 
            | If you are in the same boat, you'll see how much vLLM has
            | changed compared to one year ago. Also, this means they
            | haven't rebased for over a year; I don't believe that's
            | because they don't want to, it's because they effectively
            | can't.
            | 
            | Yeah, surely they can maintain it as-is. But it will be
            | increasingly hard to port over anything the community
            | builds.
        
         | lukeschlather wrote:
          | I get the impression their setup is very hard to maintain,
          | but it's worth every penny. They've done optimizations that
          | wring incredible performance out of the hardware they have,
          | but they also have specific machine configurations, and I
          | wouldn't be surprised if they have complicated hacks that
          | get 100% speedups for some workloads but disappear with a
          | slightly different motherboard configuration. There are also
          | suggestions they've made firmware hacks, which are worth it
          | at their scale but might be very dangerous and difficult to
          | apply at a small scale. (And some of their hacks might
          | involve both firmware and cluster-level optimizations, which
          | would be useless or counterproductive independently.)
          | 
          | And even if you have somewhat similar hardware, the code
          | might not be that helpful; you might be better off with a
          | sketch of the solution, implementing it yourself. If you've
          | got a large enough cluster, it's going to pay for itself
          | anyway.
        
         | maknee wrote:
          | They're going to put time and effort into making their
          | optimizations public. Would you rather have them keep their
          | changes internal?
        
       | oldgun wrote:
       | Nice. We've seen some good engineering work from DeepSeek. Keep
       | it coming.
        
         | jimmydoe wrote:
          | Yes, before the USA figures out a way to tariff open source.
        
           | fragmede wrote:
           | https://www.instagram.com/reel/DIVBmgUvFsN/
        
       | vintagedave wrote:
       | I really empathised with this part:
       | 
       | > Codebase Divergence: Our engine is based on an early fork of
       | vLLM from over a year ago. Although structurally similar, we've
       | heavily customized it for DeepSeek models, making it difficult to
       | extend for broader use cases.
       | 
       | I've been there. Probably a few of us have.
       | 
        | Their approach -- splitting out maintainable sublibraries and
        | sharing information directly even when the code can't be
        | integrated -- seems a really nice way of working with the
        | community. They have obstacles, but they're not letting those
        | obstacles push them to the easy route of not contributing at
        | all. And while someone wanting to use their techniques might
        | prefer working code over a description of the techniques, at
        | least it's still knowledge sharing. Again, it would be easier
        | for them not to do it. So kudos to them.
        
         | rvnx wrote:
          | They customized and optimized vLLM for their use case, so
          | much so that it became a different product (e.g. Debian vs
          | Ubuntu).
         | 
         | The fact they share back some of their improvements is great.
        
         | bonoboTP wrote:
          | Non-runnable code can be really useful. I often wish it were
          | available for papers: even if I never run it, I can check
          | what the authors actually did, because text and equations
          | are often not specific enough.
        
       | ozgune wrote:
        | In March, vLLM picked up some of the improvements from the
        | DeepSeek paper. Through these, vLLM v0.7.3's DeepSeek
        | performance jumped to more than 3x what it was before [1].
        | 
        | What's exciting is that there's still so much room for
        | improvement. We benchmark around 5K total tokens/s with the
        | ShareGPT dataset and 12K total tokens/s with random 2000/100
        | (input/output lengths), using vLLM under high concurrency.
        | 
        | The DeepSeek-V3/R1 Inference System Overview [2] quotes: "Each
        | H800 node delivers an average throughput of 73.7k tokens/s
        | input (including cache hits) during prefilling _or_ 14.8k
        | tokens/s output during decoding."
        | 
        | Yes, DeepSeek deploys a different inference architecture. But
        | this goes to show just how much room there is for improvement.
        | Looking forward to more open source!
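        | 
        | For anyone who wants a rough feel for these numbers, below is
        | a minimal throughput probe against vLLM's offline LLM API.
        | It's a sketch, not our benchmark harness: the model name,
        | prompt shape, and batch size are placeholders standing in for
        | the "random 2000/100" setup.
        | 
        |     import time
        |     from vllm import LLM, SamplingParams
        | 
        |     # Placeholder model; swap in whatever checkpoint you run.
        |     llm = LLM(model="deepseek-ai/DeepSeek-V2-Lite")
        | 
        |     # max_tokens=100 mirrors the "/100" output side.
        |     params = SamplingParams(temperature=0.8, max_tokens=100)
        | 
        |     # 256 synthetic long prompts stand in for the ~2000-token
        |     # random inputs.
        |     prompts = ["Tell me about inference engines. " * 250] * 256
        | 
        |     start = time.time()
        |     outputs = llm.generate(prompts, params)
        |     elapsed = time.time() - start
        | 
        |     prompt_toks = sum(len(o.prompt_token_ids) for o in outputs)
        |     gen_toks = sum(len(o.outputs[0].token_ids) for o in outputs)
        |     print(f"total:  {(prompt_toks + gen_toks) / elapsed:.0f} tok/s")
        |     print(f"decode: {gen_toks / elapsed:.0f} tok/s")
        | 
        | vLLM continuously batches these requests internally, so the
        | total figure is roughly comparable to the "total tokens/s"
        | numbers above.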
       | 
       | [1] https://developers.redhat.com/articles/2025/03/19/how-we-
       | opt...
       | 
       | [2] https://github.com/deepseek-ai/open-infra-
       | index/blob/main/20...
        
       | nashashmi wrote:
       | I feel like this is one way to implement censorship.
        
         | sampton wrote:
          | There's an ongoing debate over whether an LLM should be
          | considered intelligent when it's just generating tokens from
          | latent space. Meanwhile, there are humans who are only
          | capable of spitting out the same 5 tokens yet are still
          | considered "intelligent".
        
       | holoduke wrote:
        | I wonder if the large-scale release of open-source AI tools,
        | models, etc. is a deliberate strategy by China to counter US
        | dominance. A good thing for the market, imho.
        
       | avodonosov wrote:
        | What motivates commercial AI companies to share their research
        | results and know-how?
        | 
        | Why did Google publish the Transformer architecture instead of
        | keeping it to themselves?
        | 
        | I understand that people may want to do good things for
        | humanity, facilitate progress, etc. But if an action goes
        | against commercial interest, how can company management take
        | it without objections from shareholders?
        | 
        | Or is there a commercial logic that motivates sharing
        | information and intellectual property? What logic is that?
        
         | lofaszvanitt wrote:
          | The more people copy your outdated thing, the better for
          | you, because they're always gonna lag behind you.
        
         | bcoughlan wrote:
          | I would guess it comes down to the fact that the best
          | researchers in the world want their work out in the open.
        
         | nodja wrote:
          | My understanding is that frontier researchers will work for
          | companies that let them publish papers and discuss them with
          | their peers.
          | 
          | When you're an engineer at the tier of these AI researchers,
          | winning an extra 100k/year on top of your current 500k
          | (numbers out of my ass) is not worth it vs. getting name
          | recognition. Being known as one of the authors of the
          | transformer, for example, will enable you to work with other
          | bright minds and create even better things.
          | 
          | So essentially these commercial companies offer "we'll let
          | you publish papers while you work for us" as a perk.
        
         | Der_Einzige wrote:
          | The ACL, NeurIPS, ICLR, and the rest of the AI professional
          | organizations are why this happens: forced open-sourcing of
          | everything, no pay-to-access. It's the ideal open academic
          | environment for rapid innovation. We must jealously defend
          | our current system, as it will soon come under attack by
          | those who get angry about the democratization of the means
          | of computation.
          | 
          | Also, there are lots of copyright abolitionists in AI. Many
          | people who work in the space delight in the idea of making
          | information, especially their own, free.
          | 
          | The ghost of Aaron Swartz runs through every researcher in
          | this space.
        
       ___________________________________________________________________
       (page generated 2025-04-14 23:00 UTC)