[HN Gopher] DeepSeek open source DeepEP - library for MoE traini...
       ___________________________________________________________________
        
       DeepSeek open source DeepEP - library for MoE training and
       Inference
        
       Author : helloericsf
       Score  : 488 points
       Date   : 2025-02-25 02:27 UTC (20 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | helloericsf wrote:
        | - Efficient and optimized all-to-all communication
        | - Both intranode and internode support with NVLink and RDMA
        | - High-throughput kernels for training and inference prefilling
        | - Low-latency kernels for inference decoding
        | - Native FP8 dispatch support
        | - Flexible GPU resource control for computation-communication
        |   overlapping
        | 
        | X: https://x.com/deepseek_ai/status/1894211757604049133
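        | 
        | As a rough illustration of what the all-to-all "dispatch" step
        | involves (a conceptual sketch only, not DeepEP's actual API):
        | before exchanging tokens, each rank has to count how many of its
        | tokens are routed to experts owned by every other rank.
        | 
        |     // Conceptual sketch, not DeepEP code: build per-rank send
        |     // counts for an MoE all-to-all dispatch. Experts are assumed
        |     // to be sharded contiguously across ranks.
        |     __global__ void count_tokens_per_rank(
        |         const int* topk_expert_ids,  // [num_tokens * k]
        |         int num_tokens, int k, int experts_per_rank,
        |         int* send_counts)            // [num_ranks], zero-initialized
        |     {
        |         int idx = blockIdx.x * blockDim.x + threadIdx.x;
        |         if (idx >= num_tokens * k) return;
        |         int expert = topk_expert_ids[idx];
        |         int dst_rank = expert / experts_per_rank;
        |         atomicAdd(&send_counts[dst_rank], 1);
        |     }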
        
       | Bimos wrote:
       | The PTX instructions they talked about in the tech report should
       | be pointing to the code here?
        
         | helloericsf wrote:
         | this might help:
         | https://x.com/main_horse/status/1894215779521794058/photo/1
        
         | zardinality wrote:
         | "For extreme performance, we discover and use a behavior-out-
         | of-doc PTX instruction: ld.global.nc.L1::no_allocate.L2::256B.
         | This instruction will lead to an undefined behavior: accessing
         | volatile GPU memory with non-coherent read-only PTX modifiers
         | .nc. But the correctness is tested to be guaranteed with
         | .L1::no_allocate on Hopper architectures, and performance will
         | be much better. If you find kernels not working on some other
         | platforms, you may add DISABLE_AGGRESSIVE_PTX_INSTRS=1 to
         | setup.py and disable this, or file an issue."
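          | 
          | For illustration, a load like that is typically issued from
          | CUDA via inline PTX; a minimal sketch (assuming an int pointer,
          | and not necessarily the repo's exact helper) looks roughly
          | like:
          | 
          |     // Sketch: non-coherent read-only load that skips L1
          |     // allocation and hints a 256-byte L2 prefetch.
          |     __device__ __forceinline__ int ld_nc_no_allocate(const int* ptr) {
          |         int v;
          |         asm volatile(
          |             "ld.global.nc.L1::no_allocate.L2::256B.s32 %0, [%1];"
          |             : "=r"(v) : "l"(ptr));
          |         return v;
          |     }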
        
           | magicalhippo wrote:
           | So non-coherent refers to bypassing cache coherency, ie don't
           | care about what other units might have written to that
           | address? And the L1/L2 modifiers are to avoid L1 thrashing,
           | keeping the value in L2 only?
           | 
           | Or did I get that wrong?
        
             | ta988 wrote:
              | My understanding of the L2 part is that it asks for a
              | 256-byte prefetch (only available on some platforms, it
              | seems). But they use vectors of at most 4 signed 32-bit
              | ints, so I'm not sure why only the 256B form would work,
              | or whether the fact that it also fetches the next 128
              | bytes is what helps.
        
             | saagarjha wrote:
             | Yeah that's about right
        
       | ofou wrote:
       | You gotta love these guys, they're really pushing the open source
       | frontier for all of us, thanks for sharing
        
         | grg0 wrote:
         | Open AI(tm) (with a space)
        
           | hackit2 wrote:
           | Kind of ironic that DeepSeek is more Open than ChatGPT
        
             | gostsamo wrote:
             | They do it for their own reasons, but OpenAI are straight
             | up liars and they are neither open nor give a fuck about
             | humanity.
        
               | chefandy wrote:
                | OpenAyyyyI swear babe I'm gonna open it up any day. Yeah
                | for that greater good or whatever it is you keep yappin
                | about.
        
               | WiSaGaN wrote:
               | It would be hilarious if this scenario played out.
               | 
               | OpenAI starts as a nonprofit, aiming to benefit all
               | humanity. Eventually, they discover a path to AGI and
               | engage in intense internal debates: Should they abandon
               | their original mission and chase profit, knowing it could
               | bring generational wealth? They ultimately decide, "To
               | hell with humanity--let's go for the money."
               | 
               | As they pivot to prioritizing profit, DeepSeek emerges.
               | Staying true to OpenAI's original vision, DeepSeek open-
               | sources everything, benefiting humanity and earning
               | global admiration. Unintentionally, this move tanks
               | OpenAI's valuation. In the end, OpenAI fails to become
               | the hero or secure the massive profits they chased.
               | Instead, they leave behind a legacy rebranded as
               | "ClosedAI"
        
               | ghfhghg wrote:
                | Admittedly I'm a sideline observer, but it feels like the
                | first half of your scenario is already happening (sans
                | the AGI).
        
               | yieldcrv wrote:
               | "I don't want to live in a world where someone else is
               | making the world a better place better than we are"
               | 
               | - Silicon Valley Season 2
        
               | amelius wrote:
               | Well, they do give us a great free tool to use, but
               | that's where it ends and probably has some agenda behind
               | it.
        
             | azinman2 wrote:
              | Now. It's amazing to me that everyone is like "fuck OpenAI,
              | DeepSeek is the savior," when OpenAI's papers and code
              | jump-started an AI revolution just a few years ago. Let's
              | wait the same number of years and see what DeepSeek does.
        
               | gertop wrote:
               | I thought the papers that jump started the revolution
               | came from Google?
        
               | jeffreygoesto wrote:
                | Hinton. And if you'd ask the man himself, probably
                | Schmidhuber.
        
               | larodi wrote:
                | Indeed. And the papers were about doing better
                | translation of character sequences; essentially the tech
                | emerged as a linguistics improvement for language. Then
                | someone realised the parrot learns enough ZIP and JPEG
                | alongside and can spit back hazy memories of it all.
                | 
                | The one still super useful thing OpenAI ever released
                | must've been Whisper. But they could've been much more
                | open for sure.
        
             | ur-whale wrote:
             | > Kind of ironic that DeepSeek is more Open than ChatGPT
             | 
             | Not ironic at all.
             | 
              | You've simply been lied to by OpenAI.
             | 
             | Nothing ironic about being naive.
        
           | echelon wrote:
           | I hope you're reading this Sam Altman:
           | 
           | Make Open AI _open_.
           | 
           | Or else you'll lose to the ecosystem.
        
             | sciencesama wrote:
             | Sam is busy with his new kiddo
        
             | ta988 wrote:
              | Too late, there is no more innovation from OpenAI; all the
              | people that were the drivers left for Anthropic and the
              | others. They had some of the biggest funding, had the
              | lead... And yet they lost it.
        
             | alpb wrote:
             | That's an impossible ask. Sam is the pinnacle of capitalist
             | ruling class, he's a pure businessman. He has no interest
             | in giving anything for free unless there's a business plan.
             | He doesn't care about humanity. He'll pretend to change the
             | world and tell you that they're inventing AGI, Q*,
             | strawberry or whatever they're branding it, but the reality
             | is he knows it's all over and unless there's a major
             | breakthrough this company will be in major financial
              | trouble. Sorry for the rant, but he doesn't deserve much
              | respect for turning all this science into grift. He's
              | actually the person the old OpenAI board warned everyone
              | about.
        
               | anticensor wrote:
               | Their state-of-the-art speech to text model, Whisper, is
               | available as open weights for free.
        
               | echelon wrote:
                | Strategically, they know speech-to-text needs to run at
                | the edge, so that users can send requests to their API
                | without incurring latency or a bad user experience.
               | 
               | That is still a fair point, though, and it should be
               | commended. And that hasn't been their only contribution,
               | either.
        
               | anticensor wrote:
               | They could've made it a trusted-computing-only model
               | distributed with a proprietary encryption, unlocked with
               | an expensive licence key if they wanted.
        
             | ur-whale wrote:
             | > I hope you're reading this Sam Altman
             | 
             | I hope he's not.
             | 
             | All he deserves at this point is to go down as hard as
             | possible.
        
           | InkCanon wrote:
           | There's hilariously nothing open about OpenAI, and that was
            | the plan from the start. From the email by Ilya Sutskever,
            | OpenAI was always going to keep all its research and code as
           | proprietary information. Open supposedly meant the benefits
           | would be shared. So they basically just became a SaaS with a
           | free tier, like most of them. Musk was right when he called
           | them out for fishing for money as if they were a non profit,
           | but always had plans to become a company
        
             | danans wrote:
             | > Musk was right when he called them out for fishing for
             | money as if they were a non profit, but always had plans to
             | become a company
             | 
             | I believe that he was right, because he of all people
             | should recognize when someone is working from his own
             | playbook of lies and misrepresentation.
             | 
             | Musk is pretty obviously upset because he got outfoxed and
             | cut out of OpenAI, not because of some supposed ideal he
             | holds about safe use of gen AI models.
        
         | blackeyeblitzar wrote:
         | Not really open source. For a truly open source model, check
         | out OLMo 2 from AI2:
         | 
         | https://allenai.org/blog/olmo2
         | 
         | They literally share everything you need to recreate their
         | model, including the data itself. This is what they say on that
         | link above:
         | 
         | > Because fully open science requires more than just open
          | weights, we are excited to share a new round of OLMo updates --
         | including weights, data, code, recipes, intermediate
         | checkpoints, and instruction-tuned models--with the broader
         | language modeling community!
        
       | pama wrote:
       | I feel like a kid in a candy shop. Some of these tricks would
       | take way too long to reverse engineer correctly based on the
       | papers. I hope that the releases this week start a renaissance of
       | the use of MoE as baseline academic models.
        
         | antirez wrote:
          | From this point of view I don't understand the gap between
          | actual SOTA model practice and the academic models. The former
          | at this point are all MoEs, starting with GPT-4. But the open
          | models, with the exception of DeepSeek V3 and Mixtral, are
          | almost always dense models.
        
           | woctordho wrote:
            | MoEs require less computation and more memory, so they're
            | harder to set up in small labs
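            | 
            | For example, DeepSeek-V3 has roughly 671B total parameters
            | but only about 37B activated per token, so per-token FLOPs
            | scale with the ~37B active weights while the cluster still
            | has to hold all ~671B in memory.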
        
           | kristianp wrote:
            | I assumed GPT-4o wasn't MoE, being a smaller version of
            | GPT-4, but I've never heard either way.
        
       | deyiao wrote:
       | Is the PTX that everyone was looking forward to included this
       | time?
        
         | find0x90 wrote:
         | Yes, there's some in the csrc/kernels directory. Search for
         | 'asm' to find uses of it.
        
         | swyx wrote:
         | > the PTX that everyone was looking forward to
         | 
         | explanation for the rest of us why this is so important?
        
           | find0x90 wrote:
           | Much of the hype around DeepSeek is due to their
           | extraordinarily low training and inference costs. They
           | achieved this by optimizing their training code, apparently
           | using PTX in addition to CUDA. PTX is kind of an intermediate
           | assembly language for NVIDIA GPUs and people are eager to see
           | how it was used.
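            | 
            | For anyone curious, nvcc can show you the PTX it emits for
            | any kernel; a trivial example:
            | 
            |     // saxpy.cu -- tiny kernel, just to look at the emitted PTX
            |     __global__ void saxpy(int n, float a, const float* x, float* y) {
            |         int i = blockIdx.x * blockDim.x + threadIdx.x;
            |         if (i < n) y[i] = a * x[i] + y[i];
            |     }
            |     // nvcc -arch=sm_90 -ptx saxpy.cu -o saxpy.ptx
            |     // then read saxpy.ptx to see ld.global / st.global / fma etc.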
        
           | ta988 wrote:
            | Parallel Thread Execution. Think of them as opcodes for the
            | Nvidia GPUs. They are a bit more complex than your
            | traditional opcodes (the lowest level of abstraction
            | accessible to users) in CPUs, as you can specify cache
            | parameters, memory barriers etc.
            | 
            | There are documented combinations of parameters for those
            | instructions, but if you fuzz (search new combinations in a
            | random or organized way because you hope some will work the
            | way you want) you can find new ones with unexpected effects
            | or with advantages (in various ways, like not polluting
            | caches, speed...)
            | 
            | Which is the case, for example, for
            | ld.global.nc.L1::no_allocate.L2::256B, which they use in
            | DeepSeek and which provides significant acceleration while
            | being reliable (although it does not work on all
            | architectures, so they have ways to disable it).
        
             | rfoo wrote:
              | Gonna check what SASS it gets translated to and whether it
              | makes any sense.
              | 
              | I wonder if they had a SASS assembler for Hopper (either by
              | reverse engineering nvdisasm or by fuzzing instructions +
              | nvdisasm + staring hard) and don't want to say it out loud
              | :p
        
               | saagarjha wrote:
               | You'd be looking at ptxas here. FWIW, it looks like it
               | generates LDG.E.NA.LTC256B.U8.CONSTANT on my machine.
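                | 
                | One way to reproduce that check (a sketch; the tools and
                | flags are standard CUDA toolkit ones):
                | 
                |     // probe.cu -- minimal kernel using the load, just to
                |     // see what SASS ptxas lowers it to
                |     __global__ void probe(const int* p, int* out) {
                |         int v;
                |         asm volatile(
                |           "ld.global.nc.L1::no_allocate.L2::256B.s32 %0, [%1];"
                |           : "=r"(v) : "l"(p));
                |         *out = v;
                |     }
                |     // nvcc -arch=sm_90 -cubin probe.cu -o probe.cubin
                |     // cuobjdump -sass probe.cubin  # look for LDG.E.NA.LTC256B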
        
             | saagarjha wrote:
             | CPUs have instructions with similar semantics.
        
       | rvz wrote:
       | Round 2 of open source releases from an actual "Open AI(tm)"
       | company and licensed under MIT.
       | 
       | Once again, DeepSeek is more open than the $157B+ one that is
       | claiming to be "Open".
       | 
        | Almost no one is talking about Meta's Llama, and everyone should
        | expect them to release Llama 4 with reasoning.
       | 
       | The objective is to not be squeezed in the middle of the race to
       | zero.
        
         | swyx wrote:
         | https://www.llama.com/events/llamacon/signup/
        
       | deyiao wrote:
       | Now it includes the highly anticipated PTX! Of course, I don't
        | understand it, but I've already clicked the star and even the fork
       | button, which basically means I've mastered it, right? I feel
       | incredibly powerful right now...
        
       | mohsen1 wrote:
       | > For extreme performance, we discover and use an out-of-doc PTX
       | instruction: ld.global.nc.L1::no_allocate.L2::256B. This
       | instruction will lead to an undefined behavior: accessing
       | volatile GPU memory with non-coherent read-only PTX modifiers
       | .nc. But the correctness is tested to be guaranteed with
       | .L1::no_allocate on Hopper architectures, and performance will be
       | much better.
        
         | k_sze wrote:
         | Practically speaking, is it possible for NVIDIA to "pull the
         | rug" later, intentionally or otherwise, by subtly changing the
         | behaviour of this out-of-doc instruction on new architectures?
        
           | ammo1662 wrote:
           | They could. That's why there is a switch to disable it.
           | 
           | > If you find kernels not working on some other platforms,
           | you may add DISABLE_AGGRESSIVE_PTX_INSTRS=1 to setup.py and
           | disable this, or file an issue.
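            | 
            | Conceptually, the opt-out just has to gate the aggressive
            | load behind a compile-time macro; a sketch (the macro name
            | mirrors the setup.py flag, but the fallback instruction is an
            | assumption on my part, not necessarily what the repo does):
            | 
            |     __device__ __forceinline__ int ld_nc_or_fallback(const int* ptr) {
            |         int v;
            |     #ifndef DISABLE_AGGRESSIVE_PTX_INSTRS
            |         // aggressive path: undocumented modifier combination
            |         asm volatile(
            |             "ld.global.nc.L1::no_allocate.L2::256B.s32 %0, [%1];"
            |             : "=r"(v) : "l"(ptr));
            |     #else
            |         // conservative path: plain volatile global load
            |         asm volatile("ld.volatile.global.s32 %0, [%1];"
            |                      : "=r"(v) : "l"(ptr));
            |     #endif
            |         return v;
            |     }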
        
       | kennyloginz wrote:
        | Spring showers bring May flowers!
        
       | yieldcrv wrote:
       | so while the US is chasing GPU receipts in Singapore just to
       | ensure DeepSeek was using H800s only, the rest of the world can
       | run these optimizations on the full H100s?
       | 
       | while we also pretend that H100s were difficult to get or access
       | because of the US sanctions and their hubris to believe their
       | edicts blanket the globe?
       | 
       | am I understanding this correctly?
        
       | ur-whale wrote:
       | The incentive behind the work of DeepSeek might very well be
       | wrong (something along the lines of a state-sponsored attempt at
       | shrinking the US first mover advantage in AI to nil) but the net
       | result for everyone on the planet is simply fantastic.
       | 
       | So even in the worst case (doing this for the wrong reasons):
       | thank you DeepSeek, you are actually doing what OpenAI lied
       | through their teeth to the whole world about doing for years.
       | 
       | You rock.
        
         | danans wrote:
         | > The incentive behind the work of DeepSeek might very well be
         | wrong (something along the lines of a state-sponsored attempt
         | at shrinking the US first mover advantage in AI to nil)
         | 
         | In the space of international relations, right and wrong don't
         | apply nearly as much. Is open sourcing this any more "wrong"
         | than the export ban on high end Nvidia GPUs?
         | 
         | The open sourcing by DeepSeek (presumably with CCP consent)
         | just happens to be good for both the CCP and the broader open
         | source AI community at the same time, but don't take it as some
         | kind of principled stance by them.
         | 
         | Finding ways to take away other countries' competitive
          | advantages is a major activity of all governments, large and
         | small.
        
           | jimmydoe wrote:
            | It seems the CCP is less hate-worthy than it was two months
            | ago. Comparing a fake democracy with real authoritarianism
            | is kinda funny.
        
       | breadwinner wrote:
       | Zuckerberg should stop claiming Meta is open sourcing AI (they
       | are even running TV ads) when they are only releasing the
       | weights, and not the code. Only DeepSeek is real OSS AI.
        
         | lithiumii wrote:
         | Well technically even DeepSeek is not as OSS as OLMo or Open
         | Euro, because they didn't open the data.
        
           | tway223 wrote:
           | For understandable reasons
        
           | echelon wrote:
           | We're 2/3rds of the way there.
           | 
           | We need:
           | 
           | 1. Open datasets for pretrains, including the tooling used to
           | label and maintain
           | 
           | 2. Open model, training, and inference code. Ideally with the
           | research paper that guides the understanding of the approach
           | and results. (Typically we have the latter, but I've seen
           | some cases where that's omitted.)
           | 
           | 3. Open pretrained foundation model weights, fine tunes, etc.
           | 
           | Open AI = Data + Code + Paper + Weights
        
           | buyucu wrote:
           | Opening data is an invitation to lawsuits. That is why even
           | the most die-hard open source enthusiasts are reluctant. It
           | is also why people train a model and generate data with it,
           | rather than sharing the original datasets.
           | 
           | These datasets are huge, and it's practically impossible to
           | make sure they are clean of illegal or embarrassing stuff.
        
             | sdesol wrote:
             | I understand the reasoning and I hope there is legislation
             | in the future that basically goes "If you can't produce the
             | data, you can't charge more than this for it". Basically,
             | LLM producers will have to treat their product as a
             | commodity product that can only be priced based on the
             | compute resources plus some overhead.
        
             | johnla wrote:
             | Sounds like a job for AI.
        
           | chvid wrote:
            | It is pirated material / material that breaks various terms
            | of service. As I understand it, it is the stuff you can see
            | in Anna's Archive, plus a bunch of "artificial" training data
            | from queries to OpenAI's ChatGPT and other LLMs.
        
         | echelon wrote:
         | Open Weights = Binary Blob
         | 
         | It's a return to the FREEWARE / SHAREWARE model.
         | 
         | This is the language we need to use for "open" weights.
        
         | prjkt wrote:
         | does pytorch count
        
         | duchenne wrote:
          | Come on... Meta has been refining PyTorch for more than a
         | decade. It basically contains all that you need to train LLMs,
         | including the latest technologies. What more do you need? The
         | part of the code that is specific to Meta infrastructure?
        
         | blackeyeblitzar wrote:
         | DeepSeek is definitely not real OSS. To be open source, you
         | need to use a real open source license (like the ones OSI
         | lists), and you need to share all pre and post training code,
         | any code related to tuning, any evaluation code, everything
         | related to safety/censorship/etc, and probably the full
         | training data as well. Otherwise you can't reproduce their
         | weights. Sharing weights is like sharing a compiled program.
         | 
         | As far as I know the only true open source model that is
         | competitive is the OLMo 2 model from AI2:
         | 
         | https://allenai.org/blog/olmo2
         | 
         | They even released an app recently, which is also open source,
         | that does on-device inference:
         | 
         | https://allenai.org/blog/olmoe-app
         | 
         | They also have this other model called Tulu 3, which
         | outperforms DeepSeek V3:
         | 
         | https://allenai.org/blog/tulu-3-405B
        
           | startupsfail wrote:
            | Yes, releasing training source code is like releasing the
            | source code of a compiler used to compile and link the
            | binary.
           | 
            | Let's say you took GCC, modified its sources, compiled your
           | code with it and released your binaries along with modified
           | GCC source code. And you are claiming that your software is
           | open source. Well, it wouldn't be.
           | 
           | Releasing training data is extremely hard, as licensing and
           | redistribution rights for that data are difficult to tackle.
            | And it is not clear what exactly the benefits of releasing
            | it would be.
        
       ___________________________________________________________________
       (page generated 2025-02-25 23:01 UTC)