[HN Gopher] Zluda: CUDA on Intel GPUs
___________________________________________________________________
Zluda: CUDA on Intel GPUs
Author : fho
Score : 242 points
Date : 2021-02-25 12:12 UTC (2 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| exhilaration wrote:
| Upvoting so someone from Intel sees it and hires these devs to
| make this actually happen. CUDA absolutely dominates, it would be
| a game changer to have it work with another GPU platform.
| k_sze wrote:
| I think the correct way to solve it is to make something
| competitive for Vulkan. Intel, AMD, and maybe ARM should really
| cooperate on that front.
| einpoklum wrote:
| Very nice! I look forward to trying this out.
|
| There are a few issues which come to mind though:
|
| 1. How do these benchmark results compare to writing x86_64
| implementations directly? That is, is it enough to reach OpenCL-
| level performance?
|
| 2. What about if I want to use both a CPU and a GPU? It seems
| that would not be supported, as this library produces a
| `libcuda.so`.
|
| 3. What about AMD chips?
|
| 4. The README says:
|
| > Authors of CUDA benchmarks used CUDA functions atomicInc
| > and atomicDec which have direct hardware support on NVIDIA
| > cards, but no hardware support on Intel cards. They have
| > to be emulated in software, which limits performance
|
| is there really no support, not even in the near future, for
| atomic increment on Intel chips?
|
| 5. This is all written in Rust. It's a bit fishy to me for a C
| library to be implemented in Rust, though maybe I'm just a little
| prejudiced.
|
| 6. Where is the documentation of the semantic differences between
| proper CUDA and ZLUDA? e.g. - what do streams do? How do events
| work? etc.
| johnnycerberus wrote:
| To be fair I have the same feeling about Rust. C++ is the
| language positioned for high performance computing, almost
| everything that screams HPC is in C++. We already have problems
| in our company modifying some old Fortran libraries that are
| called from C++ and in the future God knows what Rust or Zig
| training we will require for basically no upside for HPC. But
| if safety would be my concern, I would totally look towards
| Rust. Or hopefully, Julia will save the day.
| makapuf wrote:
| Does the same exist for AMD? Sorry for asking what may be
| obvious, but could it be made?
| tomash wrote:
| kinda:
| https://www.reddit.com/r/Amd/comments/hp88jb/is_there_any_ap...
| my123 wrote:
| My opinion about the subject: AMD actually does _not_ care
| about GPU compute on customer hardware.
|
| If they did, they would not have:
|
| - Supported Linux only, with no ROCm on Windows
|
| - Made ROCm target specific GPUs: there's no IR like PTX to
| make your current *binary* run on future GPU generations
|
| - Dropped support for GCN2/3
| (https://github.com/RadeonOpenCompute/ROCm/issues/1353#issuec...),
| making Vega the _only_ supported customer GPU generation, with
| no support for RDNA/RDNA2.
|
| They obviously don't care about the market as they should,
| despite anything they might or they might not say. Nothing to
| see here... It's only and solely their own fault that NVIDIA
| is the only option.
|
| Intel so far has a much more competent strategy around GPU
| computing, and might prove to be an actual competitor. I've
| written off AMD as a possible competitor to NVIDIA for GPU
| computing a long time ago.
| dogma1138 wrote:
| They dropped GCN2/3 support before ROCm was even anywhere
| near production ready, but the biggest issue is the lack of
| cross-platform support, as you mentioned: you can't even use
| it in Windows containers. They also never even attempted to
| support APUs.
|
| Intel not only has better OpenCL support but will come out
| of the gate with OneAPI, which will support all Intel GPUs.
| This means productivity applications could use it whether
| it's for a laptop or for a future productivity workstation
| with an Intel discrete GPU.
|
| It's pretty much impossible to buy a laptop with an AMD GPU
| and run ROCm on it, even the discrete cards are not
| officially supported and since the ROCm binaries are
| hardware specific without official support things tend to
| be even more broken than what they are now.
|
| I really can't understand how AMD could cock it up so
| badly.
| jonplackett wrote:
| So how come this still can't be done with AMD GPUs?
| vetinari wrote:
| See this comment below:
| https://news.ycombinator.com/item?id=26285380
| [deleted]
| martini333 wrote:
| Eth?
| krick wrote:
| You wouldn't care about GPU API when mining crypto. You just
| need more GPUs. There's nothing complicated about the software
| you need for that.
| teleforce wrote:
| Fun fact: even Intel's latest lower-end CPUs (Atom, Pentium,
| Celeron) will feature the Gen 11 iGPU, which is much improved
| over the last generation of Intel's built-in GPUs [1].
|
| The latest iGPU supports more than 1 TFLOP in GPU performance.
| This is apparently more than the performance of two years old
| Nvidia GeForce GT 1030.
|
| [1] https://www.notebookcheck.net/Intel-s-Elkhart-Lake-SoC-will-...
| TomVDB wrote:
| >This is apparently more than the performance of two years old
| Nvidia GeForce GT 1030.
|
| Almost 4 years old...
|
| https://www.techpowerup.com/gpu-specs/geforce-gt-1030.c2954
|
| "The GeForce GT 1030 is an entry-level graphics card by NVIDIA,
| launched in May 2017."
| macksd wrote:
| Love this concept and hope this does well or something like it
| becomes a de facto standard. There's an explosion of software
| using GPUs. In the DL space alone, PyTorch and TensorFlow and
| many frameworks that compete with them, plug-in to them, etc.
| There's also an explosion of DL hardware coming. We have NVIDIA's
| whole product line, competing GPUs, TPUs, Inferentia, a bunch of
| start ups... That compatibility matrix is going to be insane. You
| need a good integration layer between them. If everyone can
| support CUDA, great. But there's also OpenCL and other competing
| standards. They're all going to have some performance trade-offs,
| but some of that must be worth it to keep some semblance of a
| common framework, otherwise small and new players don't stand
| much of a chance.
| tachyonbeam wrote:
| I wish PyTorch would utilize CPUs more effectively out of the
| box. The last time I tried to run some DNN training on CPU
| (last summer), I was disappointed to find that only one single
| core was used on my Ryzen machine. Yes, CPUs don't have as much
| memory bandwidth or compute as GPUs, but this is still leaving
| a lot of performance on the table.
| rrss wrote:
| pytorch can use more than one core.
| keyle wrote:
| On top of HN... I get excited to use it with Blender then scroll
| through a plethora of "awesomeness" text to find this
|
| > Warning: this is a very incomplete proof of concept. It's
| probably not going to work with your application. ZLUDA currently
| works only with applications which use CUDA Driver API or
| statically-linked CUDA Runtime API - dynamically-linked CUDA
| Runtime API is not supported at all.
|
| Common man. You can't call it a drop in replacement if it's an
| incomplete proof of concept.
| JosephRedfern wrote:
| I don't think it's unfair to call it a drop in replacement,
| though the current limitations should perhaps be caveated a
| little more clearly.
|
| I think calling it a drop in replacement highlights that the
| aim of Zluda is to not require recompilation of the software
| being run. Maybe "Proof of concept drop in replacement" would
| be a better description?
| rvz wrote:
| > Common man. You can't call it a drop in replacement if it's
| an incomplete proof of concept.
|
| I was nearly put off by that but maybe it should be given a
| chance with some funding or whatnot. All great and interesting
| things come from proofs of concept that actually work.
|
| At least it isn't vapourware. Unlike some 'projects' I see on
| GitHub.
| rvz wrote:
| Downvoters: So this 'proof of concept' on GitHub is in fact
| somehow _'vapourware'_, and it shows an empty repository
| with zero functioning working code? I'm looking at the same
| repository as everyone else in this HN post.
|
| Literally the top comment is even suggesting that Intel
| _could_ be interested in funding this, so surely it deserves
| a chance with some backing, even if it is _'incomplete'_.
|
| Care to explain the downvotes?
| [deleted]
| syspec wrote:
| That's the goal, even if not there yet
| bottled_poe wrote:
| What does this mean? Is the statement in the article true
| or not?
| en4bz wrote:
| > Tying to the previous point, currently ZLUDA does not support
| asynchronous execution. This gives us an unfair advantage in a
| benchmark like GeekBench. GeekBench exclusively uses CUDA
| synchronous APIs
|
| Any "professional" application solely uses async APIs, so while
| these numbers may look impressive, something like TensorFlow or
| PyTorch would either not run or be incredibly slow.
| anovikov wrote:
| Can I use ffmpeg with h264_nvenc to encode videos with it?
| wmf wrote:
| Video encoding uses different hardware and different APIs than
| GPGPU. FFMPEG can use Intel, AMD, and Nvidia hardware encoding;
| see https://trac.ffmpeg.org/wiki/HWAccelIntro
| anovikov wrote:
| What's wrong with my question? Yes, I know it makes little
| sense, because every motherboard with an Intel GPU probably has a
| CPU with Quick Sync Video support, and that will most certainly
| work better for encoding (h264_qsv), but in theory?
| dogma1138 wrote:
| How does this work with CUDA libraries? Does it have an
| implementation of the standard libraries at least?
| trevortheblack wrote:
| I was working at Intel on integrated graphics less than a year
| ago.
|
| Geekbench could, at best, be described as a _fine_ benchmark. But
| it's pretty terrible in a bunch of respects.
|
| I would love to see more benchmarks run through with this.
| maleadt wrote:
| Impressive! The PTX to SPIR-V compiler must have been quite a bit
| of work; what's the coverage of the ISA like?
|
| With oneAPI I had hoped to get the inverse, a oneAPI
| implementation for NVIDIA hardware, but I don't think the CUDA
| driver API is low-level enough to do so (e.g. explicit vs global
| contexts). And yes, I know of Codeplay's implementation of DPC++
| for NVIDIA GPUs, but that doesn't implement oneAPI Level0 APIs so
| is not usable for other languages.
| villgax wrote:
| Intel should back the crap outta this project/dev
| sciprojguy wrote:
| I'm stoked for this to be usable on things like PyTorch.
| rrss wrote:
| The author has a comment here describing what that would take:
| https://github.com/vosen/ZLUDA/issues/17#issuecomment-735403...
|
| tl;dr: someone would need to re-implement cuDNN
| varispeed wrote:
| This looks like something Intel should be paying good money for,
| but I feel like they are just going to "embrace" open source and
| snatch it without giving a penny to the author.
| jcranmer wrote:
| Disclosure: I work at Intel.
|
| At one of the internal Q&As, there was a question as to why we
| didn't just implement CUDA. One of the reasons given was that
| the lawyers looked at the license of CUDA and decided that
| Intel could not legally implement CUDA for Intel's GPU devices.
| I don't know the details, but quite frankly, it wouldn't
| surprise me if Nvidia had somehow put a poison pill in there
| to prevent Intel or AMD from implementing it for their own GPUs
| (note that AMD also doesn't provide an implementation of CUDA
| for its own GPUs).
|
| Instead, the strategy Intel pursued was to develop a migration
| tool from CUDA to SYCL:
| https://software.intel.com/content/www/us/en/develop/tools/o...
| The_rationalist wrote:
| Why not collaborate with AMD for once and improve HIP? If AMD
| and Intel stay divided, all hope is lost and NVIDIA will stay
| the main target
| my123 wrote:
| Because HIP/ROCm is awful.
|
| Instead of targeting an IR, it directly targets a given
| GPU's ISA, so that your existing binary will not run on
| future hardware. That's a total no-go for basically every
| non-HPC use case.
|
| Intel is much better off building a sound technical
| foundation from scratch.
| The_rationalist wrote:
| You better make Zluda work well on AMD gpus then
| otherwise it will be irrelevant
| randomNumber7 wrote:
| And it doesn't even work on most consumer AMD graphic
| cards^^
| eslaught wrote:
| AMD implemented HIP, which is nearly CUDA (if not identical).
| There is an implementation for Intel too though it is third-
| party:
|
| https://github.com/cpc/hipcl
| lumost wrote:
| The problem here seems to be that everyone kinda wants
| Nvidia's market capture. Intel didn't contribute to AMD's
| project - they started their own.
| stefan_ wrote:
| And they are all approaching it the wrong way, the
| typical hubris of a hardware company that thinks they can
| just have some interns make a software solution for their
| problem.
|
| Here is how you beat the CUDA lock-in: _consistently make
| better performing GPUs so not using them is a liability_.
| Instead, buying AMD you not only get a worse GPU but also
| the intern software solution, and that is just not
| compelling.
| tal8d wrote:
| Or AMD could simply knock off the binary blobs. The DRM
| excuse has always been weak, because it is a tiny
| fraction of the total firmware blob - and it'd be easy to
| make it so that the legally hobbled hardware decoder
| simply errors out in cases where the end user chooses not
| to load the DRM blob. Boom, no more dependence on AMD
| interns. I've been following their commit logs pretty
| closely for a year now, and I frequently see some amazing
| accidental admissions about the left hand (software team)
| not knowing what the right hand (hardware team) is doing.
| During one very frustrating series of patches it was
| difficult resisting the impulse to say "Go get your dad."
| numlock86 wrote:
| You make it sound like that's an issue with Intel while the
| license is to blame.
| krick wrote:
| So, basically this project is illegal and waiting to be
| removed from Github then?
| qiqitori wrote:
| (IANAL.)
|
| Without looking into the CUDA licenses (who knows, they
| might even expressly allow this kind of thing, but seems
| pretty unlikely to me), I'd expect this to be a case of
| whether "APIs are copyrightable" or not, same as the famous
| and sort of still ongoing
| https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_....
|
| The US courts said "yes" in this case (note: 100% stupid
| IMO), but I'm not confident that nVidia'd have an easy win
| if they decided to sue the developer, and I'm also
| relatively sure they wouldn't send a DMCA request at this
| stage (and that if they did, their request would probably
| be reviewed harder than normal).
| jcranmer wrote:
| That's not the most accurate summary of Google v Oracle.
| The case has been tried twice, Google has been found in
| the clear twice at the district court by the jury, and
| the Court of Appeals for the Federal Circuit has twice
| overturned the jury result, and the case is now at the
| Supreme Court awaiting a decision as to whether or not
| CAFC's decision is off its rocker.
|
| It is not usual for CAFC to hear copyright disputes; that
| it was appealed to CAFC instead of the 9th Circuit is
| because there was a patent claim at one point, and CAFC
| should have followed 9th Circuit precedent on the matter.
| Google contends that 9th Circuit precedent holds that the
| API is not copyrightable, which means that CAFC erred in
| ignoring precedent. Most software companies ultimately
| agree with Google here, not Oracle: it's telling that
| most of the amici who side with Oracle are _not_ software
| companies but media publishers (e.g., MPAA, RIAA).
| iforgotpassword wrote:
| I think you could argue from the interop side which would
| make such a project legal in Europe. So Intel or AMD
| could funnel money into this via their European branches
| or some subsidiary?
| sodality2 wrote:
| That link is missing a trailing period by the way
|
| https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_....
|
| Edit: huh? I pasted it in with the period and it gets
| removed. Do trailing periods get removed because it
| thinks it's the end of a sentence?
|
| Anyway, this redirects to the right URL:
| https://en.wikipedia.org/wiki/Google_v_Oracle
| monocasa wrote:
| Yeah, HN strips trailing periods for the reason you
| stated. You can normally get around that by adding a
| hash/pound sign.
|
| https://en.wikipedia.org/wiki/Google_LLC_v._Oracle_America,_...
| barkingcat wrote:
| yes. I am not a lawyer but nvidia's lawyers would be
| interested I bet.
| [deleted]
| krick wrote:
| This makes me think, why doesn't Intel actually care? There's a
| lot of money poured into marketing, and today it's always about
| "AI-optimized multi-core nano-whateverer GPU", so isn't it
| obvious making your shitty GPUs compatible with software
| written for competition's less shitty GPUs is probably even
| more profitable in the 5-year run than trying to make your GPUs
| a bit less trash? Yes, Nvidia wouldn't want that, so it won't
| be easy to make your product CUDA-compatible, but not
| impossible. Is it illegal for them to do, or what? We all are
| kind of accustomed to the idea that Intel doesn't care, but why
| don't they care? It is money lying around waiting to be taken,
| and they just ignore it.
| pjmlp wrote:
| > Is ZLUDA a drop-in replacement for CUDA?
| >
| > Yes, but certain applications use CUDA in ways which make it
| > incompatible with ZLUDA
|
| I fail to see how that is a drop-in replacement.
|
| Plus apparently it doesn't support the polyglot CUDA ecosystem.
| Abishek_Muthian wrote:
| Excellent effort. Nvidia has become the de facto GPGPU hardware
| vendor due to CUDA, but I wish it was OpenCL or another general API
| instead. Even Raspberry Pi's VideoCore has OpenCL support[1].
|
| But a look at the HW acceleration support table at FFmpeg[2] shows
| why GPGPU platform APIs are such a mess. The performance benefits
| are incredible, though: using VAAPI for FFmpeg to encode a 1080p
| 2560x1080 screen capture at 60fps reduces CPU usage from 90% to
| 10% on an old Core i5 with Intel HD 3000. An old laptop could
| perfectly well be used as an encoding machine for streaming just
| by using HW acceleration.
|
| What's funny is that the laptop also has Radeon HD 6490M with 1GB
| GDDR5 dedicated memory and it's not supported by VAAPI for
| encoding! GPGPU API/Platform Support are astonishingly messy.
|
| [1]https://github.com/doe300/VC4CL
|
| [2]https://trac.ffmpeg.org/wiki/HWAccelIntro
| zamadatix wrote:
| GPU accelerated video encoding/decoding is done largely via
| special fixed function hardware not GPGPU so there really isn't
| a relation between the links and the statements. The reason the
| Radeon HD 6000 is not supported for hardware accelerated
| encoding is simply because it did not have a video encoding
| ASIC to use. The HD 7000 series introduced it to the family.
| https://en.wikipedia.org/wiki/Radeon_HD_6000_series#Radeon_F...
| marmaduke wrote:
| > The reason the Radeon HD 6000 is not supported for hardware
| accelerated encoding is simply because it did not have a
| video encoding ASIC to use.
|
| Doesn't this prove the broader point that GPGPU cross
| platform would benefit everyone? A new codec is written.. and
| everyone gets to use it not just those with fixed function
| support.
| zamadatix wrote:
| No, it proves fixed function hardware outperforms general
| purpose hardware for the task it's made for. Without the
| fixed function hardware the GPU is worse suited for the
| task than a general purpose CPU.
| Const-me wrote:
| > Authors of CUDA benchmarks used CUDA functions atomicInc and
| atomicDec which have direct hardware support on NVIDIA cards, but
| no hardware support on Intel cards.
|
| Then how does the InterlockedAdd HLSL intrinsic work on Intel?
| https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/...
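A minimal sketch of the distinction, assuming the documented semantics of CUDA's `atomicInc(addr, val)`: it stores `0` if the old value is `>= val`, otherwise `old + 1`, and returns the old value. That wrap-around makes it different from a plain atomic add such as `InterlockedAdd`, so on hardware that only offers add and compare-and-swap it is typically emulated with a compare-and-swap retry loop. The Python below only models that logic. `AtomicCell`, `compare_exchange`, `atomic_inc_emulated`, `atomic_dec_emulated` and `run_demo` are hypothetical names, the lock stands in for a hardware CAS instruction, and none of this is taken from ZLUDA's code.

```python
import threading


class AtomicCell:
    """Hypothetical stand-in for a 32-bit memory word with hardware CAS."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()  # models the atomicity of a single CAS

    def load(self):
        with self._lock:
            return self._value

    def compare_exchange(self, expected, desired):
        """Store `desired` only if the cell still holds `expected`.

        Returns the value observed before the attempt, like CUDA's atomicCAS.
        """
        with self._lock:
            observed = self._value
            if observed == expected:
                self._value = desired
            return observed


def atomic_inc_emulated(cell, limit):
    """Emulate atomicInc: wrap to 0 once the old value reaches `limit`."""
    old = cell.load()
    while True:
        new = 0 if old >= limit else old + 1
        observed = cell.compare_exchange(old, new)
        if observed == old:
            return old       # CAS succeeded; return the previous value
        old = observed       # another thread won the race; retry


def atomic_dec_emulated(cell, limit):
    """Emulate atomicDec: wrap to `limit` at 0 or when above `limit`."""
    old = cell.load()
    while True:
        new = limit if (old == 0 or old > limit) else old - 1
        observed = cell.compare_exchange(old, new)
        if observed == old:
            return old
        old = observed


def run_demo(threads=4, increments=1000, limit=9):
    """Increment one cell from several threads and return its final value."""
    cell = AtomicCell(0)

    def worker():
        for _ in range(increments):
            atomic_inc_emulated(cell, limit)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cell.load()
```

With the defaults, `run_demo()` performs 4000 successful increments on a counter that cycles through 0..9, so it ends at 0. Every failed `compare_exchange` under contention costs another trip around the loop, which is one plausible reading of why the README says software emulation limits performance.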
| [deleted]
| xena wrote:
| Yes please take business away from Nvidia and AMD. I just want to
| buy a GPU.
| wmu wrote:
| Trivia: the Polish word "cuda" means "miracles", "złuda"
| ("zluda") means "a delusion". Nice pun.
| antjanus wrote:
| I'm Czech and under the dialect where I grew up, "Zluda" could
| be translated to "evil person/monster" or "mischievous
| person/monster".
|
| Growing up, people sometimes called their kids "zludy" (plural
| of "zluda" in my language) -- or trouble-makers.
| hackyhacky wrote:
| Interesting. Possibly a variant of the standard word zrůda?
| antjanus wrote:
| Yeah, could be! I grew up close to the Polish/Slovak/Czech
| border. I bet some of those words got mixed together.
| neolog wrote:
| Off topic...I heard that a common Polish swear-word is
| "cholera", because the disease is so bad. Is that true?
| ols wrote:
| It is.
___________________________________________________________________
(page generated 2021-02-27 23:01 UTC)