[HN Gopher] DeepSeek open source DeepEP - library for MoE traini...
___________________________________________________________________
DeepSeek open source DeepEP - library for MoE training and
Inference
Author : helloericsf
Score : 488 points
Date : 2025-02-25 02:27 UTC (20 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| helloericsf wrote:
| - Efficient and optimized all-to-all communication
| - Both intranode and internode support with NVLink and RDMA
| - High-throughput kernels for training and inference prefilling
| - Low-latency kernels for inference decoding
| - Native FP8 dispatch support
| - Flexible GPU resource control for computation-communication
|   overlapping
| X: https://x.com/deepseek_ai/status/1894211757604049133
| Bimos wrote:
| The PTX instructions they talked about in the tech report
| should correspond to the code here, right?
| helloericsf wrote:
| this might help:
| https://x.com/main_horse/status/1894215779521794058/photo/1
| zardinality wrote:
| "For extreme performance, we discover and use a behavior-out-
| of-doc PTX instruction: ld.global.nc.L1::no_allocate.L2::256B.
| This instruction will lead to an undefined behavior: accessing
| volatile GPU memory with non-coherent read-only PTX modifiers
| .nc. But the correctness is tested to be guaranteed with
| .L1::no_allocate on Hopper architectures, and performance will
| be much better. If you find kernels not working on some other
| platforms, you may add DISABLE_AGGRESSIVE_PTX_INSTRS=1 to
| setup.py and disable this, or file an issue."
| magicalhippo wrote:
| So non-coherent refers to bypassing cache coherency, ie don't
| care about what other units might have written to that
| address? And the L1/L2 modifiers are to avoid L1 thrashing,
| keeping the value in L2 only?
|
| Or did I get that wrong?
| ta988 wrote:
| My understanding of the L2 part is that it asks for a 256B
| prefetch (only available on some platforms, it seems). But
| they use vectors of at most 4 signed 32-bit ints, so I'm not
| sure why only 256B would work, or whether the fact that it
| fetches the next 128 bytes is what helps.
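A quick back-of-envelope check of the sizes being discussed (a sketch in Python; the 128-byte L2 line size is my assumption, not something stated in the thread):

```python
# Back-of-envelope arithmetic for the cache discussion above.
# Assumption (not from the thread): L2 cache lines are 128 bytes;
# the .L2::256B modifier names a 256-byte prefetch granularity.

LINE_BYTES = 128          # assumed L2 cache line size
PREFETCH_BYTES = 256      # granularity named by the .L2::256B modifier
INT32_BYTES = 4

# A vector of 4 signed 32-bit ints, as mentioned in the comment:
vector_bytes = 4 * INT32_BYTES            # 16 bytes per load
lines_covered = PREFETCH_BYTES // LINE_BYTES

print(vector_bytes)     # 16: each load touches only a slice of a line
print(lines_covered)    # 2: a 256B prefetch spans this line plus the next
```

So under that assumption, the 256B hint pulls in the current cache line and the one after it, even though each individual vector load is only 16 bytes.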
| saagarjha wrote:
| Yeah that's about right
| ofou wrote:
| You gotta love these guys, they're really pushing the open source
| frontier for all of us, thanks for sharing
| grg0 wrote:
| Open AI(tm) (with a space)
| hackit2 wrote:
| Kind of ironic that DeepSeek is more Open than ChatGPT
| gostsamo wrote:
| They do it for their own reasons, but OpenAI are straight-up
| liars: they are neither open nor do they give a fuck about
| humanity.
| chefandy wrote:
| OpenAyyyyI swear babe I'm gonna open it up any day. Yeah,
| for that greater good or whatever it is you keep yappin
| about.
| WiSaGaN wrote:
| It would be hilarious if this scenario played out.
|
| OpenAI starts as a nonprofit, aiming to benefit all
| humanity. Eventually, they discover a path to AGI and
| engage in intense internal debates: Should they abandon
| their original mission and chase profit, knowing it could
| bring generational wealth? They ultimately decide, "To
| hell with humanity--let's go for the money."
|
| As they pivot to prioritizing profit, DeepSeek emerges.
| Staying true to OpenAI's original vision, DeepSeek open-
| sources everything, benefiting humanity and earning
| global admiration. Unintentionally, this move tanks
| OpenAI's valuation. In the end, OpenAI fails to become
| the hero or secure the massive profits they chased.
| Instead, they leave behind a legacy rebranded as
| "ClosedAI"
| ghfhghg wrote:
| Admittedly I'm a sideline observer but it feels like the
| first half of your scenario is already happening (sans
| the agi).
| yieldcrv wrote:
| "I don't want to live in a world where someone else is
| making the world a better place better than we are"
|
| - Silicon Valley Season 2
| amelius wrote:
| Well, they do give us a great free tool to use, but
| that's where it ends and probably has some agenda behind
| it.
| azinman2 wrote:
| Now. It's amazing to me that everyone is like "fuck OpenAI,
| DeepSeek is the savior," when OpenAI's papers and code jump-
| started an AI revolution just a few years ago. Let's wait
| the same number of years and see what DeepSeek does.
| gertop wrote:
| I thought the papers that jump started the revolution
| came from Google?
| jeffreygoesto wrote:
| Hinton. And if you asked him, probably Schmidhuber.
| larodi wrote:
| Indeed. And the papers were about doing better
| translation of char sequences; essentially the tech
| emerged as a linguistics improvement for language. Then
| someone realised the parrot learns enough ZIP and JPEG
| alongside and can spit back hazy memories of it all.
|
| The one still super useful thing OpenAI ever released
| must've been Whisper. But they could've been much more
| open, for sure.
| ur-whale wrote:
| > Kind of ironic that DeepSeek is more Open than ChatGPT
|
| Not ironic at all.
|
| You've simply been lied to by OpenAI.
|
| Nothing ironic about being naive.
| echelon wrote:
| I hope you're reading this Sam Altman:
|
| Make Open AI _open_.
|
| Or else you'll lose to the ecosystem.
| sciencesama wrote:
| Sam is busy with his new kiddo
| ta988 wrote:
| Too late. There is no more innovation from OpenAI; all the
| people who were the drivers left for Anthropic and the
| others. They had some of the biggest funding, had the head
| start... and yet they lost it.
| alpb wrote:
| That's an impossible ask. Sam is the pinnacle of the
| capitalist ruling class; he's a pure businessman. He has no
| interest in giving anything away for free unless there's a
| business plan. He doesn't care about humanity. He'll pretend
| to change the world and tell you they're inventing AGI, Q*,
| strawberry, or whatever they're branding it, but the reality
| is he knows it's all over, and unless there's a major
| breakthrough this company will be in major financial
| trouble. Sorry for the rant, but he doesn't deserve much
| respect for turning all this science into grift. He's
| actually the person the old OpenAI board warned everyone
| about.
| anticensor wrote:
| Their state-of-the-art speech to text model, Whisper, is
| available as open weights for free.
| echelon wrote:
| Strategically, they know that needs to run at the edge,
| and they want users to send requests to their API
| without incurring latency or a bad user experience.
|
| That is still a fair point, though, and it should be
| commended. And that hasn't been their only contribution,
| either.
| anticensor wrote:
| They could've made it a trusted-computing-only model
| distributed with a proprietary encryption, unlocked with
| an expensive licence key if they wanted.
| ur-whale wrote:
| > I hope you're reading this Sam Altman
|
| I hope he's not.
|
| All he deserves at this point is to go down as hard as
| possible.
| InkCanon wrote:
| There's hilariously nothing open about OpenAI, and that was
| the plan from the start. Per the email from Ilya Sutskever,
| OpenAI was always going to keep all its research and code as
| proprietary information. "Open" supposedly meant the benefits
| would be shared. So they basically just became a SaaS with a
| free tier, like most of them. Musk was right when he called
| them out for fishing for money as if they were a nonprofit
| while always planning to become a company.
| danans wrote:
| > Musk was right when he called them out for fishing for
| money as if they were a non profit, but always had plans to
| become a company
|
| I believe that he was right, because he of all people
| should recognize when someone is working from his own
| playbook of lies and misrepresentation.
|
| Musk is pretty obviously upset because he got outfoxed and
| cut out of OpenAI, not because of some supposed ideal he
| holds about safe use of gen AI models.
| blackeyeblitzar wrote:
| Not really open source. For a truly open source model, check
| out OLMo 2 from AI2:
|
| https://allenai.org/blog/olmo2
|
| They literally share everything you need to recreate their
| model, including the data itself. This is what they say on that
| link above:
|
| > Because fully open science requires more than just open
| weights, we are excited to share a new round of OLMo updates
| - including weights, data, code, recipes, intermediate
| checkpoints, and instruction-tuned models - with the broader
| language modeling community!
| pama wrote:
| I feel like a kid in a candy shop. Some of these tricks would
| take way too long to reverse engineer correctly based on the
| papers. I hope that the releases this week start a renaissance of
| the use of MoE as baseline academic models.
| antirez wrote:
| From this point of view I don't understand what's happening
| between the actual SOTA models practice and the academic
| models. The former at this point are all MoEs, starting with
| GPT4. But then the open models, if not for DeepSeek V3 and
| Mixtral, are always dense models.
| woctordho wrote:
| MoEs require less computation and more memory, so they're
| harder to set up in small labs.
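The compute/memory tradeoff can be made concrete with a toy calculation (illustrative numbers only, not any particular model's configuration):

```python
# Toy illustration of the MoE tradeoff described above: parameter
# count (memory) grows with the number of experts, while per-token
# compute only grows with the number of experts routed to (top-k).

def moe_params_and_active(num_experts: int, expert_params: int, top_k: int):
    total = num_experts * expert_params   # must all be held in memory
    active = top_k * expert_params        # actually multiplied per token
    return total, active

# e.g. 64 experts of 100M params each, each token routed to 2 experts:
total, active = moe_params_and_active(
    num_experts=64, expert_params=100_000_000, top_k=2
)
print(total // active)   # 32: 32x more parameters to store than to compute with
```

This is why a small lab can often afford the FLOPs for a dense model of a given quality but not the memory footprint of an MoE with comparable active compute.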
| kristianp wrote:
| I assumed GPT-4o wasn't MoE, being a smaller version of
| GPT-4, but I've never heard either way.
| deyiao wrote:
| Is the PTX that everyone was looking forward to included this
| time?
| find0x90 wrote:
| Yes, there's some in the csrc/kernels directory. Search for
| 'asm' to find uses of it.
| swyx wrote:
| > the PTX that everyone was looking forward to
|
| explanation for the rest of us why this is so important?
| find0x90 wrote:
| Much of the hype around DeepSeek is due to their
| extraordinarily low training and inference costs. They
| achieved this by optimizing their training code, apparently
| using PTX in addition to CUDA. PTX is kind of an intermediate
| assembly language for NVIDIA GPUs and people are eager to see
| how it was used.
| ta988 wrote:
| Parallel Thread Execution. Think of them as opcodes for
| Nvidia GPUs. They are a bit more complex than your
| traditional CPU opcodes (the lowest level of abstraction
| accessible to users), as you can specify cache
| parameters, memory barriers, etc.
|
| There are documented combinations of parameters for those
| instructions, but if you fuzz (search for new combinations,
| randomly or systematically, hoping some will work the way
| you want) you can find new ones with unexpected effects
| or with advantages (in various ways, like not polluting
| caches, or speed...)
|
| That is the case, for example, for
| ld.global.nc.L1::no_allocate.L2::256B, which they use in
| DeepSeek: it provides significant acceleration while being
| reliable (although it doesn't work on all architectures, so
| they have a way to disable it).
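The fuzzing idea described above can be illustrated in a few lines: enumerate candidate modifier combinations, then (in a real setup) compile each into a micro-kernel and benchmark it. The modifier lists below are illustrative guesses; only the quoted DeepEP combination is from the thread:

```python
# Sketch of fuzzing the PTX load-modifier space: build candidate
# instruction strings from small modifier sets. Only
# ld.global.nc.L1::no_allocate.L2::256B is taken from the thread;
# the other entries are illustrative, not an official list.
from itertools import product

cache_ops = ["", ".nc"]                            # non-coherent read-only
l1_hints  = ["", ".L1::no_allocate"]               # L1 allocation policy
l2_hints  = ["", ".L2::64B", ".L2::128B", ".L2::256B"]  # L2 prefetch sizes

candidates = [
    f"ld.global{c}{l1}{l2}"
    for c, l1, l2 in product(cache_ops, l1_hints, l2_hints)
]

# In practice each candidate would be compiled and checked for
# correctness and speed; here we just enumerate them.
print(len(candidates))   # 16 combinations to try
```

The real search space is of course larger (data types, vector widths, scopes), but the shape of the exploration is the same.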
| rfoo wrote:
| Gonna check what SASS it gets translated to and whether it
| makes any sense.
|
| I wonder if they had a SASS assembler for Hopper (either by
| reverse engineering nvdisasm, or by fuzzing instructions +
| nvdisasm + staring hard) and don't want to say it out loud :p
| saagarjha wrote:
| You'd be looking at ptxas here. FWIW, it looks like it
| generates LDG.E.NA.LTC256B.U8.CONSTANT on my machine.
| saagarjha wrote:
| CPUs have instructions with similar semantics.
| rvz wrote:
| Round 2 of open source releases from an actual "Open AI(tm)"
| company and licensed under MIT.
|
| Once again, DeepSeek is more open than the $157B+ one that is
| claiming to be "Open".
|
| Almost no-one is talking about Meta's Llama and everyone should
| expect them to release Llama 4 with reasoning.
|
| The objective is to not be squeezed in the middle of the race to
| zero.
| swyx wrote:
| https://www.llama.com/events/llamacon/signup/
| deyiao wrote:
| Now it includes the highly anticipated PTX! Of course, I don't
| understand it, but I've already clicked the star and even the
| fork button, which basically means I've mastered it, right? I
| feel incredibly powerful right now...
| mohsen1 wrote:
| > For extreme performance, we discover and use an out-of-doc PTX
| instruction: ld.global.nc.L1::no_allocate.L2::256B. This
| instruction will lead to an undefined behavior: accessing
| volatile GPU memory with non-coherent read-only PTX modifiers
| .nc. But the correctness is tested to be guaranteed with
| .L1::no_allocate on Hopper architectures, and performance will be
| much better.
| k_sze wrote:
| Practically speaking, is it possible for NVIDIA to "pull the
| rug" later, intentionally or otherwise, by subtly changing the
| behaviour of this out-of-doc instruction on new architectures?
| ammo1662 wrote:
| They could. That's why there is a switch to disable it.
|
| > If you find kernels not working on some other platforms,
| you may add DISABLE_AGGRESSIVE_PTX_INSTRS=1 to setup.py and
| disable this, or file an issue.
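As a sketch of how such a switch typically works (the env-var name is from the quote above; everything else is an illustrative guess, not DeepEP's actual setup.py):

```python
# Sketch: a setup.py-style build script reads an environment variable
# and turns it into a preprocessor define for the CUDA compiler, so
# kernels can #ifdef out the undocumented instruction and fall back
# to a plain, documented load. Flag wiring here is hypothetical.
import os

def nvcc_flags(env: dict) -> list:
    flags = ["-O3"]
    if env.get("DISABLE_AGGRESSIVE_PTX_INSTRS") == "1":
        flags.append("-DDISABLE_AGGRESSIVE_PTX_INSTRS")
    return flags

print(nvcc_flags({}))                                      # default build
print(nvcc_flags({"DISABLE_AGGRESSIVE_PTX_INSTRS": "1"}))  # opt-out build
```

The point is that the rug-pull risk is contained at build time: one define swaps the aggressive load for a safe one everywhere.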
| kennyloginz wrote:
| Spring showers bring May flowers!
| yieldcrv wrote:
| So while the US is chasing GPU receipts in Singapore just to
| ensure DeepSeek was using H800s only, the rest of the world can
| run these optimizations on full H100s?
|
| While we also pretend that H100s were difficult to get or access
| because of the US sanctions, and the hubris of believing their
| edicts blanket the globe?
|
| Am I understanding this correctly?
| ur-whale wrote:
| The incentive behind the work of DeepSeek might very well be
| wrong (something along the lines of a state-sponsored attempt at
| shrinking the US first mover advantage in AI to nil) but the net
| result for everyone on the planet is simply fantastic.
|
| So even in the worst case (doing this for the wrong reasons):
| thank you DeepSeek, you are actually doing what OpenAI lied
| through their teeth to the whole world about doing for years.
|
| You rock.
| danans wrote:
| > The incentive behind the work of DeepSeek might very well be
| wrong (something along the lines of a state-sponsored attempt
| at shrinking the US first mover advantage in AI to nil)
|
| In the space of international relations, right and wrong don't
| apply nearly as much. Is open sourcing this any more "wrong"
| than the export ban on high end Nvidia GPUs?
|
| The open sourcing by DeepSeek (presumably with CCP consent)
| just happens to be good for both the CCP and the broader open
| source AI community at the same time, but don't take it as some
| kind of principled stance by them.
|
| Finding ways to take away other countries' competitive
| advantages is a major activity of all governments, large and
| small.
| jimmydoe wrote:
| It seems the CCP is less hate-worthy than they were two
| months ago. Comparing fake democracy with real
| authoritarianism is kinda funny.
| breadwinner wrote:
| Zuckerberg should stop claiming Meta is open sourcing AI (they
| are even running TV ads) when they are only releasing the
| weights, and not the code. Only DeepSeek is real OSS AI.
| lithiumii wrote:
| Well technically even DeepSeek is not as OSS as OLMo or Open
| Euro, because they didn't open the data.
| tway223 wrote:
| For understandable reasons
| echelon wrote:
| We're 2/3rds of the way there.
|
| We need:
|
| 1. Open datasets for pretrains, including the tooling used to
| label and maintain
|
| 2. Open model, training, and inference code. Ideally with the
| research paper that guides the understanding of the approach
| and results. (Typically we have the latter, but I've seen
| some cases where that's omitted.)
|
| 3. Open pretrained foundation model weights, fine tunes, etc.
|
| Open AI = Data + Code + Paper + Weights
| buyucu wrote:
| Opening data is an invitation to lawsuits. That is why even
| the most die-hard open source enthusiasts are reluctant. It
| is also why people train a model and generate data with it,
| rather than sharing the original datasets.
|
| These datasets are huge, and it's practically impossible to
| make sure they are clean of illegal or embarrassing stuff.
| sdesol wrote:
| I understand the reasoning and I hope there is legislation
| in the future that basically goes "If you can't produce the
| data, you can't charge more than this for it". Basically,
| LLM producers will have to treat their product as a
| commodity product that can only be priced based on the
| compute resources plus some overhead.
| johnla wrote:
| Sounds like a job for AI.
| chvid wrote:
| It is pirated material / material that breaks various terms
| of service. As I understand it, it is the stuff you can see
| in Anna's Archive, plus a bunch of "artificial" training data
| from queries to OpenAI ChatGPT and other LLMs.
| echelon wrote:
| Open Weights = Binary Blob
|
| It's a return to the FREEWARE / SHAREWARE model.
|
| This is the language we need to use for "open" weights.
| prjkt wrote:
| does pytorch count
| duchenne wrote:
| Come on... Meta has been refining pytorch for more than a
| decade. It basically contains all that you need to train LLMs,
| including the latest technologies. What more do you need? The
| part of the code that is specific to Meta infrastructure?
| blackeyeblitzar wrote:
| DeepSeek is definitely not real OSS. To be open source, you
| need to use a real open source license (like the ones OSI
| lists), and you need to share all pre and post training code,
| any code related to tuning, any evaluation code, everything
| related to safety/censorship/etc, and probably the full
| training data as well. Otherwise you can't reproduce their
| weights. Sharing weights is like sharing a compiled program.
|
| As far as I know the only true open source model that is
| competitive is the OLMo 2 model from AI2:
|
| https://allenai.org/blog/olmo2
|
| They even released an app recently, which is also open source,
| that does on-device inference:
|
| https://allenai.org/blog/olmoe-app
|
| They also have this other model called Tulu 3, which
| outperforms DeepSeek V3:
|
| https://allenai.org/blog/tulu-3-405B
| startupsfail wrote:
| Yes, releasing training source code is like releasing the
| source code of the compiler used to compile and link the
| binary.
|
| Let's say you took GCC, modified its sources, compiled your
| code with it, and released your binaries along with the
| modified GCC source code. And you claimed your software was
| open source. Well, it wouldn't be.
|
| Releasing training data is extremely hard, as licensing and
| redistribution rights for that data are difficult to tackle.
| And it is not clear what exactly the benefits of releasing
| it would be.
___________________________________________________________________
(page generated 2025-02-25 23:01 UTC)