[HN Gopher] What lengths will Chinese companies go to get an Nvi...
___________________________________________________________________
What lengths will Chinese companies go to get an Nvidia A100 chip?
Author : yorwba
Score : 49 points
Date : 2024-02-26 15:24 UTC (7 hours ago)
(HTM) web link (chinai.substack.com)
(TXT) w3m dump (chinai.substack.com)
| wtcactus wrote:
| Interesting, I didn't know this was so sought after.
|
| I actually have one for sale (the 40GB PCIe one), but I haven't
| gotten around to listing it on eBay yet due to lack of time
| (and because I didn't think there was so much interest in it).
|
| To be honest, maybe for DL this really is much better than the
| alternatives, but for some simulations and for parallelizing
| some radiative transfer code, it was not that much better than
| an RTX 4090, with the extra hassle that it's more difficult to
| cool.
| moondev wrote:
| How did you end up cooling it? I have an A40 and it's been
| interesting testing all kinds of methods, from two 40mm fans to
| a 3A 9030 centrifugal blower with a 3D-printed duct.
| siver_john wrote:
| As someone who used various types of GPUs in graduate school:
| for most simulations, and even machine learning (unless you
| need the VRAM), you are generally better off going with a
| consumer card. The CUDA core counts are roughly the same, and
| the higher clock speeds will generally net you better
| performance overall.
|
| The exception is simulations that need double-precision
| floating point (which you previously got in the Titan series
| of consumer-ish cards). Where the datacenter cards matter for
| DL is the VRAM, which lets you use much larger models, plus
| the ability to string them together and share memory, a
| feature that has been left off consumer cards (honestly in a
| way that makes sense, because SLI has been dumb for some
| time).
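|
| For reference, a minimal sketch (assuming PyTorch with CUDA is
| installed) that queries the specs mentioned above - SM count
| and VRAM per device; clock speeds can be read alongside it
| with `nvidia-smi -q -d CLOCK`:
|
|   import torch
|
|   # Per-device specs relevant to the comparison above.
|   for i in range(torch.cuda.device_count()):
|       props = torch.cuda.get_device_properties(i)
|       print(f"GPU {i}: {props.name}")
|       print(f"  SMs: {props.multi_processor_count}")
|       print(f"  VRAM: {props.total_memory / 2**30:.1f} GiB")
|       print(f"  compute capability: {props.major}.{props.minor}")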
| rhdunn wrote:
| The A100 is comparable to the 3090 but with more memory. The
| H100 is the one comparable to the 4090.
|
| The advantage of these is access to the larger memory. They
| can also be linked together such that they all share the same
| memory via NVLink. This makes them scalable for processing
| large datasets and holding the models for larger-scale LLMs
| and other NN/ML-based models.
| tmaly wrote:
| Have you seen an actual A100?
|
| They are massive; I can't imagine them being comparable to a
| 3090 at all.
| Feorn wrote:
| A reference 3090 is longer by 69mm, wider by 29mm, and
| thicker by a slot than a PCIe A100.
|
| Though I think the comment you're replying to was talking
| about them both using the same Nvidia GPU architecture,
| Ampere.
| adfbkandfionio wrote:
| >And they are able to be linked together such that they all
| share the same memory via NVLink. This makes them scalable
| for processing the large data and holding the models for the
| larger scale LLMs and other NN/ML based models.
|
| GPUs connected with NVLink do not exactly share memory. They
| don't look like a single logical GPU. One GPU can issue loads
| or stores to a different GPU's memory using "GPUDirect Peer-
| To-Peer", but you cannot have a single buffer or a single
| kernel that spans multiple GPUs. This is easier to use and
| more powerful than the previous system of explicit copies
| from device to device, perhaps, but a far cry from the way
| multiple CPU sockets "just work". Even if you could treat the
| system as one big GPU you wouldn't want to. The performance
| takes a serious hit if you constantly access off-device
| memory.
|
| NVLink doesn't open up any functionality that isn't available
| over PCIe, as far as I know. It's "merely" a performance
| improvement. The peer-to-peer technology still works without
| NVLink.
|
| NVidia's docs are, as always, confusing at best. There are
| several similarly-named technologies. The main documentation
| page just says "email us for more info". The best online
| documentation I've found is in some random slides.
|
| https://developer.nvidia.com/gpudirect
|
| https://developer.download.nvidia.com/CUDA/training/cuda_web.
| ..
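|
| A minimal sketch (assuming PyTorch and a machine with two CUDA
| GPUs) of the distinction above - each tensor lives on exactly
| one device, and data moves between cards via explicit copies;
| NVLink/P2P only makes those copies faster:
|
|   import torch
|
|   a = torch.randn(1024, 1024, device="cuda:0")  # lives on GPU 0
|   b = torch.randn(1024, 1024, device="cuda:1")  # lives on GPU 1
|
|   # This copy goes over NVLink or PCIe P2P when available,
|   # otherwise it is staged through host memory -- but it is
|   # still a copy, not one buffer spanning both cards.
|   a_on_1 = a.to("cuda:1", non_blocking=True)
|   c = a_on_1 @ b  # this kernel runs entirely on GPU 1
|
|   # Check whether direct peer access is possible at all.
|   print(torch.cuda.can_device_access_peer(0, 1))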
| rhdunn wrote:
| Interesting. So that would mean you would still need a 40 or
| 80 GB card to run the larger models (30B LLM, 70B LLM, 8x7B
| LLM) and to train them.
|
| Or would it be possible to split the model layers between
| the cards like you can between RAM and VRAM? I suppose in
| that case each card would be able to evaluate the results
| of the layers in its own memory and then pass those results
| to the other card(s) as necessary.
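|
| A minimal sketch (assuming PyTorch and two GPUs) of that
| layer-split idea - a toy module where each card holds some of
| the layers in its own memory and only the intermediate
| activations cross the interconnect:
|
|   import torch
|   import torch.nn as nn
|
|   class TwoGPUMLP(nn.Module):
|       def __init__(self, dim=4096):
|           super().__init__()
|           # first half of the layers on GPU 0 ...
|           self.part0 = nn.Sequential(
|               nn.Linear(dim, dim), nn.ReLU(),
|               nn.Linear(dim, dim), nn.ReLU(),
|           ).to("cuda:0")
|           # ... second half on GPU 1
|           self.part1 = nn.Sequential(
|               nn.Linear(dim, dim), nn.ReLU(),
|               nn.Linear(dim, dim),
|           ).to("cuda:1")
|
|       def forward(self, x):
|           h = self.part0(x.to("cuda:0"))
|           # only activations move between cards, not weights
|           return self.part1(h.to("cuda:1"))
|
|   model = TwoGPUMLP()
|   out = model(torch.randn(8, 4096))
|   print(out.device)  # cuda:1
|
| This is what frameworks call model or pipeline parallelism;
| libraries such as Hugging Face Accelerate automate the split
| (e.g. device_map="auto") across however many cards you have.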
| dachworker wrote:
| Here's an interesting question: why are there lots of startups
| claiming they can create better bespoke DL hardware accelerators
| than Nvidia's offerings?
|
| I understand how a bespoke architecture could in theory
| accommodate larger models and offer better throughput despite
| being based on older-generation MOSFET nodes. But if that were
| the case, wouldn't China simply create their own bespoke
| hardware accelerators? So what's stopping them?
| russli1993 wrote:
| hint: they are
| alephnerd wrote:
| > wouldn't China simply create their own bespoke hardware
| accelerators? So what's stopping them?
|
| Fabrication.
|
| SMEE is supposedly able to mass produce 28nm lithography
| machines, but most modern (2016/Pascal onwards) GPUs are
| fabricated using 16nm lithography or lower (eg. Ampere uses a
| 7nm process, and there are multiple newer architectures in the
| pipeline that leverage 3nm fabrication processes at Samsung,
| TSMC, and Intel).
|
| Chinese companies like SMEE are trying, but it will take 3-5
| years to reach 16nm lithography at scale.
|
| Also, GPUs are being limited for simulation (aka Nuclear
| weapons testing) reasons - not "AI" - as just about every
| country except North Korea honors the Comprehensive Nuclear
| Test Ban Treaty, forcing countries to test using HPC (edit:
| also used for Jet Turbine simulation a la Autodesk Federal).
|
| A 28nm process is more than enough to make EWS, avionics, and
| precision weapons, which is what Russia uses to manufacture
| the Elbrus-8S chipset domestically.
|
| This is partially related to why companies like Nvidia have
| begun moving fabrication to Samsung fabs over TSMC in the
| short term, as South Korea has a formal defense agreement with
| the US, unlike Taiwan.
| mistrial9 wrote:
| > GPUs are being limited for simulation (aka Nuclear weapons
| testing) reasons - not "AI"
|
| that information seems dated in 2024
| alephnerd wrote:
| Not really.
|
| You don't need bespoke cutting-edge hardware or models for
| most defense applications (aka to kill people) today.
|
| For example, C-RAMs are using Maxwell-level hardware at
| most.
|
| The biggest driver for GPU, FPGA, and CPU development has
| been nuclear research, which is a major reason why the top
| supercomputers and HPC programs globally are usually linked
| with nuclear weapons labs (eg. LLNL, LBL, Argonne, Oak
| Ridge, NSC Guangzhou).
|
| It just so happens that you use the same math for nuclear
| simulations as you would for "ML", bioinformatics, and
| computer graphics.
|
| It's all Numerical Analysis and Optimization Theory at the
| end of the day.
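|
| A minimal sketch (numpy only) of that point - a neural-net
| layer and one explicit step of a discretized heat equation
| both reduce to the same dense matrix-vector products the
| hardware is built to accelerate:
|
|   import numpy as np
|
|   n = 1024
|   x = np.random.rand(n)
|
|   # 1) A dense NN-style layer: y = relu(W @ x)
|   W = np.random.rand(n, n)
|   y = np.maximum(W @ x, 0.0)
|
|   # 2) An explicit finite-difference step of u_t = u_xx:
|   #    u <- A @ u with A = I + r * tridiag(1, -2, 1), r <= 0.5
|   r = 0.25
|   lap = np.eye(n, k=-1) - 2 * np.eye(n) + np.eye(n, k=1)
|   A = np.eye(n) + r * lap
|   u = A @ x
|
|   print(y.shape, u.shape)  # both are just BLAS matvecs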
| rfoo wrote:
| I don't know. I thought Gina specifically said the recent
| ban is for AI. [0]
|
| > "What we cannot allow them to ship is the most
| sophisticated, highest-processing power AI chips, which
| would enable China to train their frontier models," she
| added.
|
| [0] https://www.reuters.com/technology/us-talks-with-
| nvidia-abou...
| alephnerd wrote:
| These bans originated in 2016-17, with Intel Xeon processors
| being restricted after the Chinese NSC program was found to
| be using some of their HPC infrastructure for nuclear weapons
| simulations.
|
| No one wants to say the "Ne" word as it causes some
| pretty severe domestic political blowback.
|
| Already China has been looking at countering American
| second strike capabilities with their nuclear weapons
| buildup over the past decade.
|
| It also doesn't help that unlike most previous Premiers
| post-Mao, Xi Jinping started his administration career in
| the PLA.
| rfoo wrote:
| The Xeon Phi bans were _targeted_ and successful. That shows
| you don't need to issue a country-wide ban to prevent China
| from using a chip to build their supercomputers.
|
| And then Sunway happened; it's built for military-use
| supercomputers. Do we even know whether it's 40nm, 28nm or
| 14nm, or who on earth fabricated them? And nothing changed in
| how export controls work after that, so that's certainly not
| the trigger.
|
| GPU bans _before ChatGPT happened_ were also _targeted_,
| similar to how the BIS Entity List works.
|
| Let's face it: the recent ban-entire-China movement was about
| "AI", not HPC/simulations; its only purpose is to deny China
| NVIDIA GPU access and ensure they can't compete on SOTA
| language models.
| kkarakk wrote:
| >as just about every country except North Korea honors the
| Comprehensive Nuclear Test Ban Treaty
|
| I don't think this is correct -
| https://www.nti.org/education-center/treaties-and-
| regimes/co... there are 8 other countries that have not
| signed, so why would they honor the treaty?
|
| India in particular.
| alephnerd wrote:
| > india in particular
|
| The last time India (and Pakistan) tested a live nuclear
| weapon was 1998, and both faced SEVERE sanctions at the time
| - which was a major reason both countries stopped buying
| American weapons systems and switched to Israeli and Chinese
| vendors respectively.
|
| The only country left that tests nuclear weapons live is
| North Korea.
| kkarakk wrote:
| Yes, but couldn't the countries that want to do
| simulations just buy through India/Saudi Arabia etc.?
| alephnerd wrote:
| It doesn't scale out logistically.
|
| While some lossage is expected, you can't build a
| Tianhe-2 type supercomputer by smuggling Nvidia (or AMD)
| GPUs - and a nuclear program needs dozens.
|
| China, Russia, NK, and Iran have very severe hardware
| import restrictions circa 2024.
|
| This is why Russia, China, and Iran have been building
| out domestic fabrication capabilities (Russia in the
| early 2010s, China in the early 2020s, and Iran
| presently).
| kkarakk wrote:
| Interesting, thank you for answering!
| cherioo wrote:
| Huawei has created the "Ascend" series AI chip. Reportedly it
| has 80% of the A100's theoretical performance.
|
| Rumor is that they had trouble selling it against the A100 due
| to worse price-to-performance and worse software integration.
| But the US sanctions have now created a huge market for it.
| alephnerd wrote:
| It's still fabricated by TSMC though.
|
| Domestic fabrication is still at the 28nm phase at most.
|
| That said, the Ascend series is a fairly massive leap, as it
| means domestic EDA capabilities have grown massively in
| China.
|
| Then again, I remember they had massive design labs in both
| SV and Delhi+Bangalore that poached from the Samsung, Nvidia,
| and Intel labs in town before they were kicked out of the US
| and India, so idk how much design was done domestically in
| China.
| DinaCoder99 wrote:
| > Domestic fabrication is still at the 28nm phase at most.
|
| Allegedly SMIC has both 14nm and 7nm fabs as of last year.
| alephnerd wrote:
| Allegedly but with low yield rates and leveraging
| ASML+Nikon DUV products.
|
| Not to say that Chinese vendors won't eventually crack
| that nut, but it'll take several (3-7) years. Also,
| domestic design capacity seems to be somewhat lacking
| (though rapidly changing)
|
| The most cutting-edge domestic lithography tool China has
| is SMEE's 28nm one - that's still a MASSIVE
| accomplishment, but still significantly far from
| 20/16/14/7nm processes.
|
| Even Russia had domestic capabilities for 28nm
| fabrication at scale in the 2010s.
|
| That said, for most military use cases, this is more than
| enough.
| imtringued wrote:
| Because most of them only implement inference, duh.
| elzbardico wrote:
| If money is not a problem for them, I don't see how we can
| deny Chinese companies access to those cards.
|
| The Soviets were able to smuggle whole mainframes and mini-
| computers during the 70s. Giant fucking machines that you
| could only buy through very specific sales channels. It was
| not like you could eBay yourself an S/360, and yet they did
| it.
| The_Colonel wrote:
| It's not a question of "if", but "how many". And that's
| largely what matters, same as in the case of the USSR.
| __loam wrote:
| Asianometry on YouTube has a ton of videos on Soviet computing,
| including ones discussing things like the failed semiconductor
| industry in East Germany. The Soviets certainly smuggled and
| tried to clone a lot of machines, but they were always behind
| the United States.
| tibbydudeza wrote:
| I have seen some reports of modded GPUs in Brazil and China
| where they convert earlier-gen consumer cards (3000 series)
| into 12GB or 16GB variants using GDDR6 from Samsung.
|
| Seems the GPU boot code is not hard-limited but reads capacity
| from what is installed.
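|
| A minimal sketch (assuming nvidia-smi is on the PATH) for
| checking what capacity the card actually reports after such a
| memory swap - the driver repeats whatever the boot code
| detected:
|
|   import subprocess
|
|   # Ask the driver what name and total memory it detected.
|   out = subprocess.run(
|       ["nvidia-smi", "--query-gpu=name,memory.total",
|        "--format=csv"],
|       capture_output=True, text=True, check=True,
|   )
|   print(out.stdout)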
___________________________________________________________________
(page generated 2024-02-26 23:02 UTC)