[HN Gopher] The path to open-sourcing the DeepSeek inference engine
___________________________________________________________________
The path to open-sourcing the DeepSeek inference engine
Author : Palmik
Score : 332 points
Date : 2025-04-14 15:03 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| rfoo wrote:
| tl;dr "we had our vLLM fork and it's unmaintainable now; guess we
| are going to rebuild it, in the public this time"
| Havoc wrote:
| "Unmaintainable" seems unduly harsh. There is a big gap between
| maintainable internally and ready for public consumption.
| rfoo wrote:
| > Codebase Divergence: Our engine is based on an early fork
| of vLLM from over a year ago
|
| If you are in the same boat, you'll see how much vLLM has
| changed compared to one year ago. It also means they haven't
| rebased for over a year; I don't believe that's because they
| don't want to, it's because they effectively can't.
|
| Yeah, surely they can maintain it as-is. But it will be
| increasingly hard to port over anything the community builds.
| lukeschlather wrote:
| I get the impression their setup is very hard to maintain but
| it's worth every penny. They've done optimizations that wring
| incredible performance out of the hardware they have, but they
| also have specific machine configurations and I wouldn't be
| surprised if they have complicated hacks that get 100% speedups
| for some stuff but those speedups disappear if you have a
| slightly different motherboard configuration. Also, there's a
| suggestion they've made firmware hacks that are worth it at
| their scale but might be very dangerous and difficult to apply,
| especially at a small scale. (And some of their hacks might
| involve both firmware and cluster-level optimizations, which
| would be useless or counterproductive independently.)
|
| And even if you have somewhat similar hardware, the code might
| not be that helpful; you might be better off taking a sketch of
| the solution and implementing it yourself. If you've got a
| large enough cluster, it's going to pay for itself anyway.
| maknee wrote:
| They're going to spend time and effort making their
| optimizations public. Would you rather have them keep their
| changes internal?
| oldgun wrote:
| Nice. We've seen some good engineering work from DeepSeek. Keep
| it coming.
| jimmydoe wrote:
| yes, before the USA figures out a way to tariff open source.
| fragmede wrote:
| https://www.instagram.com/reel/DIVBmgUvFsN/
| vintagedave wrote:
| I really empathised with this part:
|
| > Codebase Divergence: Our engine is based on an early fork of
| vLLM from over a year ago. Although structurally similar, we've
| heavily customized it for DeepSeek models, making it difficult to
| extend for broader use cases.
|
| I've been there. Probably a few of us have.
|
| Their approach of splitting out maintainable sublibraries and
| sharing info directly, even when it isn't integrated, seems a
| really nice way of working with the community -- ie, they have
| obstacles, but they're not letting those obstacles push them
| toward the easy route of not contributing at all. And while
| someone wanting to apply their techniques might prefer working
| code over a description of the techniques, at least it's still
| knowledge sharing. And again, I think it'd be easier for them
| not to do it. So kudos to them.
| rvnx wrote:
| They customized and optimized vLLM for their use case, so much
| so that it became a different product (e.g. Debian vs Ubuntu).
|
| The fact they share back some of their improvements is great.
| bonoboTP wrote:
| Non-runnable code can be really useful. I often wish it were
| available for some papers, even if I never ran it, just to
| check what they actually did, because text and equations are
| often not specific enough.
| ozgune wrote:
| In March, vLLM picked up some of the improvements from the
| DeepSeek paper. Through these, vLLM v0.7.3's DeepSeek
| performance jumped to 3x+ what it was before [1].
|
| What's exciting is that there's still so much room for
| improvement. We benchmark around 5K total tokens/s with the
| sharegpt dataset and 12K total tokens/s with random 2000/100
| (2000 input / 100 output tokens), using vLLM under high
| concurrency.
|
| DeepSeek-V3/R1 Inference System Overview [2] quotes "Each H800
| node delivers an average throughput of 73.7k tokens/s input
| (including cache hits) during prefilling _or_ 14.8k tokens/s
| output during decoding."
|
| Yes, DeepSeek deploys a different inference architecture. But
| this goes to show just how much room there is for improvement.
| Looking forward to more open source!
|
| [1] https://developers.redhat.com/articles/2025/03/19/how-we-opt...
|
| [2] https://github.com/deepseek-ai/open-infra-index/blob/main/20...
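|
| As a rough sanity check on that gap, a back-of-the-envelope
| sketch in Python (illustrative only: hardware, batching, and
| workloads all differ, and it assumes "random 2000/100" means
| 2000 input tokens and 100 output tokens per request):
|
|     # Crude throughput-gap arithmetic; all caveats above apply.
|     vllm_total_tps = 12_000  # tokens/s, vLLM, random 2000/100
|     ds_decode_tps = 14_800   # tokens/s output per H800 node
|
|     # With 2000-in/100-out requests, only ~100/2100 of the
|     # "total" tokens are decoded output tokens.
|     decode_share = 100 / (2000 + 100)
|     vllm_decode_tps = vllm_total_tps * decode_share  # ~571
|
|     print(f"vLLM est. decode:     {vllm_decode_tps:,.0f} tok/s")
|     print(f"DeepSeek decode/node: {ds_decode_tps:,} tok/s")
|     print(f"rough headroom:       "
|           f"~{ds_decode_tps / vllm_decode_tps:.0f}x")
|
| Even with generous error bars, that crude estimate points to
| well over an order of magnitude of headroom on decode alone.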
| nashashmi wrote:
| I feel like this is one way to implement censorship.
| sampton wrote:
| There's an ongoing debate about whether an LLM should be
| considered intelligent when it's just generating tokens from
| latent space. Meanwhile, there are humans who are only capable
| of spitting out the same 5 tokens yet are still considered
| "intelligent".
| holoduke wrote:
| I wonder if the large-scale release of open-source AI tools,
| models, etc. is a deliberate strategy by China to counter US
| dominance. A good thing for the market, imho.
| avodonosov wrote:
| What motivates the commercial AI companies to share their
| research results and know-how?
|
| Why did Google publish the Transformer architecture instead of
| keeping it to themselves?
|
| I understand that people may want to do good things for
| humanity, facilitate progress, etc. But if an action goes
| against commercial interest, how can company management take
| it without objections from shareholders?
|
| Or is there a commercial logic that motivates sharing
| information and intellectual property? What logic is that?
| lofaszvanitt wrote:
| The more people copy your outdated thing, the better for you,
| because they're always gonna lag behind you.
| bcoughlan wrote:
| I would guess it comes down to the fact that the best
| researchers in the world want their work out in the open.
| nodja wrote:
| My understanding is that frontier researchers will work for
| companies that will let them publish papers and discuss them
| with their peers.
|
| When you're an engineer at the tier of these AI researchers,
| winning an extra 100k/year on top of your current 500k (numbers
| out of my ass) is not worth it vs getting name recognition.
| Being known as one of the authors who made the transformer, for
| example, will enable you to work with other bright-minded
| individuals and create even better things.
|
| So essentially these commercial companies have "we'll let you
| publish papers when you work for us" as a perk.
| Der_Einzige wrote:
| The ACL, NeurIPS, ICLR, and the rest of the AI professional
| organizations are why this happens. Forced open-sourcing of
| everything. No pay-to-access. It's the ideal open academic
| environment for rapid innovation. We must jealously defend our
| current system, as it will soon come under attack by those who
| get angry about the democratization of the means of
| computation.
|
| Also, lots of copyright abolitionists in AI. Many people who
| work in the space delight in the idea of making information,
| especially their own, free.
|
| The ghost of Aaron Swartz runs through every researcher in this
| space.
___________________________________________________________________
(page generated 2025-04-14 23:00 UTC)