[HN Gopher] Zluda: Run CUDA code on Intel GPUs, unmodified
       ___________________________________________________________________
        
       Zluda: Run CUDA code on Intel GPUs, unmodified
        
       Author : goranmoomin
       Score  : 217 points
       Date   : 2023-06-15 14:42 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | raphaelj wrote:
        | Related question: what is the best way to handle kernel
        | compatibility across CUDA, OpenCL, etc.?
       | 
        | I had to write a cross-platform kernel a few weeks ago, and I
        | ended up using pre-processor guards to make it work with both
        | the OpenCL and CUDA compilers [1]; a simplified sketch is
        | below.
       | 
       | [1]
       | https://github.com/RaphaelJ/libhum/blob/main/libhum/match.ke...
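        | 
        | Roughly, the guards look something like this (a simplified
        | sketch, not the actual libhum kernel; KERNEL and GLOBAL are
        | illustrative macro names):
        | 
        |     #if defined(__CUDACC__)
        |       /* CUDA: map OpenCL-style qualifiers onto CUDA ones. */
        |       #define KERNEL extern "C" __global__
        |       #define GLOBAL
        |       __device__ inline size_t get_global_id(int dim) {
        |           /* only dimension 0 is used in this sketch */
        |           return blockIdx.x * blockDim.x + threadIdx.x;
        |       }
        |     #else
        |       /* OpenCL: these qualifiers are built in. */
        |       #define KERNEL __kernel
        |       #define GLOBAL __global
        |     #endif
        | 
        |     /* One source compiles under both nvcc and OpenCL. */
        |     KERNEL void scale(GLOBAL float* data, float factor, int n)
        |     {
        |         int i = (int)get_global_id(0);
        |         if (i < n) data[i] *= factor;
        |     }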
        
         | vkaku wrote:
          | The answer is unconventional: run CI/CD. It's way easier
          | to see whether things will break when your code is
          | run-tested live on each of these stacks.
        
       | dang wrote:
       | Related:
       | 
       |  _Zluda: CUDA on Intel GPUs_ -
       | https://news.ycombinator.com/item?id=26262038 - Feb 2021 (77
       | comments)
        
       | 01100011 wrote:
       | > Is ZLUDA a drop-in replacement for CUDA? Yes, but certain
       | applications use CUDA in ways which make it incompatible with
       | ZLUDA
       | 
       | So no then.
        
         | glimshe wrote:
         | It's a drop-in replacement in the sense that you don't need to
         | modify your code. But it has limitations/incompatibilities.
         | Contrast to something that isn't a drop-in replacement... That
         | would require changes to the application.
        
           | 01100011 wrote:
           | This statement makes no sense.
           | 
           | "It's compatible with CUDA as long as you don't use all the
           | features of CUDA."
           | 
            | So it's a drop-in replacement for some subset of modern CUDA.
           | I feel like most folks who are upvoting this don't program
           | CUDA professionally or aren't very advanced in their usage of
           | it.
        
             | dragonwriter wrote:
             | "Drop-in" = to the extent it works, it requires no app
             | changes.
             | 
             | "Complete" = it covers everything.
             | 
             | It is drop-in, but not complete.
        
               | dahart wrote:
                | The issue is that in the original question "drop-in"
                | implies complete; that's what being a drop-in
                | replacement actually means in other contexts. If it's
                | not complete, then it's not _really_ drop-in, even
                | though I don't necessarily disagree with your
                | definition. You can be right, and the parent can be
                | right too, IMHO. The FAQ question is stated
                | ambiguously and in misleadingly black-and-white
                | terms, and the answer really does look kinda funny
                | starting with the word "Yes" and then following that
                | with "but... not exactly". Wouldn't it be better to
                | say drop-in is the goal, and because it's not
                | complete, we're not there yet?
        
             | glimshe wrote:
              | Does the statement _make no sense_, or do you simply
              | not like it? I can understand what it says; it sounds
              | like plain English to me.
              | 
              | I think what you're trying to say is that before
              | claiming to be a "drop-in replacement", you should make
              | sure your supported feature set is representative
              | enough of mainline CUDA development.
        
         | jjallen wrote:
          | Yeah, it would probably be good to include how often it
          | doesn't work. 1% of the time? 10%?
        
           | mattkrause wrote:
           | It's near the bottom:
           | 
           | "What is the status of the project?
           | 
           | This project is a Proof of Concept. About the only thing that
           | works currently is Geekbench. It's amazingly buggy and
           | incomplete. You should not rely on it for anything serious."
        
         | meepmorp wrote:
         | 50% of the time it works every time.
        
           | catchnear4321 wrote:
           | that's not quite accurate.
           | 
           | it works 100% of the time. until it does not.
        
       | jlebar wrote:
        | Oh wow, I think this translates PTX (Nvidia's high-level
        | assembly code) to SPIR-V? Am I reading this right?
        | That's... a lot.
        | 
        | A note to any systems hackers who might try something like
        | this: you can also retarget clang's CUDA support to SPIR-V
        | via an LLVM-to-SPIR-V translator. I can say with confidence
        | that this works.
       | :)
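        | 
        | For the curious, the shape of the pipeline is roughly this
        | (from memory; the exact flags depend on your clang and
        | translator versions, and you still have to retarget the
        | device triple yourself):
        | 
        |     // kernel.cu -- trivial device code to push through.
        |     __global__ void add1(float* x, int n) {
        |         int i = blockIdx.x * blockDim.x + threadIdx.x;
        |         if (i < n) x[i] += 1.0f;
        |     }
        |     // 1. Device-side LLVM bitcode:
        |     //    clang++ -x cuda --cuda-device-only \
        |     //      --cuda-gpu-arch=sm_50 -emit-llvm -c kernel.cu \
        |     //      -o kernel.bc
        |     // 2. LLVM-to-SPIR-V translator:
        |     //    llvm-spirv kernel.bc -o kernel.spv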
        
       | rapatel0 wrote:
       | Intel should pour money into this project until the code is
        | hosted in Scrooge's money bin
        
       | YetAnotherNick wrote:
       | If a single dev could do it, why can't AMD do the same for their
        | GPUs?
        
         | ftxbro wrote:
          | it's because AMD drivers generate a dead loop
         | https://youtu.be/Mr0rWJhv9jU?t=320
        
         | simcop2387 wrote:
         | Patents and software licenses probably.
        
         | mrstone wrote:
         | I imagine they could, but it is probably more of a legal thing.
        
           | viewtransform wrote:
            | Yes, it is a legal issue: AMD cannot implement CUDA
            | itself.
            | 
            | However, they have gotten around that by creating HIP, a
            | CUDA-adjacent language that runs on AMD hardware and also
            | translates to CUDA for Nvidia GPUs. There is also the
            | HIPify tool to automatically convert existing sources
            | from CUDA to HIP (sketch below).
           | https://docs.amd.com/bundle/HIP-Programming-
           | Guide-v5.3/page/...
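            | 
            | The flavor of the change is mostly a mechanical rename
            | (an illustrative sketch, not from the HIP docs):
            | 
            |     // Before (CUDA):
            |     cudaMalloc(&d_buf, bytes);
            |     cudaMemcpy(d_buf, h_buf, bytes,
            |                cudaMemcpyHostToDevice);
            |     kernel<<<grid, block>>>(d_buf, n);
            | 
            |     // After HIPify (HIP):
            |     hipMalloc(&d_buf, bytes);
            |     hipMemcpy(d_buf, h_buf, bytes,
            |               hipMemcpyHostToDevice);
            |     kernel<<<grid, block>>>(d_buf, n); // launch unchanged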
        
             | CoastalCoder wrote:
             | Legal question:
             | 
             | Let's suppose that an open-source CUDA API is in a legal
             | gray zone that could only be clarified by a judge.
             | 
              | Could a company like AMD create a wholly owned
              | subsidiary to make an attempt, without exposing the
              | parent company to legal liability?
        
             | remram wrote:
             | Is that true? AMD implemented the x86 instruction set,
             | Google implemented the Java APIs, what is different about
             | CUDA?
        
               | hardware2win wrote:
                | Don't AMD and Intel have an agreement on x86?
        
               | remram wrote:
               | I see
        
               | constantcrying wrote:
                | AMD has an x86 license from Intel, and Intel has an
                | x86-64 license from AMD.
               | 
               | You can guess how much money lawyers have been paid over
               | that circumstance.
        
               | circuit10 wrote:
               | x86 is protected by patents and only a few companies can
               | use them, that's why you don't see random companies
               | making x86 CPUs like they can with ARM (which also needs
               | licensing but it's much easier to get)
        
               | cesarb wrote:
                | The first x86-64 processor was released more than 20
                | years ago, so any patents on that base architecture
                | (which includes SSE2) have already expired.
        
               | riemannzeta wrote:
                | Good luck enforcing those software patents though in the
               | current environment. My sense is that hardware companies
               | put more faith in the enforceability of patent claims to
               | something like CUDA than software companies with
               | experience with litigating patent claims to APIs would.
               | Post Oracle v. Google, something like CUDA is vulnerable
               | to being knocked off.
               | 
               | And FWIW, that seems to be a reasonable result given the
               | overall market structure at the moment. Having all eggs
               | in the Nvidia basket is great for Nvidia shareholders,
               | but not for customers and probably not even for the
               | health of the surrounding industry.
        
               | rany_ wrote:
                | Genuine question: why don't these protections apply to
               | emulators? How could emulators get away with emulating
               | x86 but chip manufacturers cannot use x86 for their chips
               | without a license?
        
               | remram wrote:
                | Probably fair use, which is a subjective call, but
                | one Intel must be confident enough that they would
                | lose. Or just a lack of incentive: there is no money
                | for Intel to gain even if they won.
        
             | YetAnotherNick wrote:
              | HIPify is such a half-baked effort in everything from
              | installation to benchmarks to marketing. It doesn't
              | look like they are trying to support this method at
              | all.
        
             | Onavo wrote:
                | Didn't SCOTUS affirm that APIs are not copyrightable?
        
               | pavon wrote:
               | No, they ruled that APIs _are_ copyrightable, but that
                | Google's re-implementation was fair use. Based on the
               | reasoning of the decision one would expect that in most
               | cases independently reimplementing an API would generally
               | be fair use. However, from a practical point of view, if
               | you are defending yourself in a copyright lawsuit, fair
               | use decisions happen much later in the process and are
               | more subjective.
               | 
                | Furthermore, CUDA is a language (a dialect of C/C++),
                | not an API, so that precedent may not have much
                | weight.
        
               | hedora wrote:
                | In the end, Google reimplemented Java, and the
                | Supreme Court ruled on a very narrow piece of the
                | reimplementation. I think it came down to a former
                | Sun/Oracle employee at Google actually copy-pasting
                | code from the original Java code base.
                | 
                | I'm reasonably sure they could reimplement CUDA from
                | a copyright / trademark perspective. It's possible
                | that they could be blocked with patents though.
        
               | monocasa wrote:
                | > I think it came down to a former Sun/Oracle
                | employee at Google actually copy-pasting code from
                | the original Java code base.
               | 
               | IIRC, the verbatim copying of rangeCheck didn't make it
               | to SCOTUS. They really did instead rule on the
               | copyrightability of the "structure, sequence, and
               | organization" of the Java API as a whole.
        
               | COGlory wrote:
               | Then why isn't Microsoft suing Valve for Proton/DXVK?
        
               | constantcrying wrote:
                | Because these lawsuits are costly, a PR nightmare,
                | losing them is a serious possibility, and going
                | around fighting your competition with lawsuits can
                | put you in a bad place with government agencies.
                | 
                | Playing games on Linux is not a threat to Microsoft.
                | The money they lose on that is minuscule.
        
               | hobofan wrote:
               | Probably because they don't care?
        
               | dotnet00 wrote:
               | MS has been building lots of goodwill with gamers by
               | bringing games to PC and subverting expectations by not
               | being opposed to using game pass on Steam Deck. Suing
               | Valve or trying to shut down Proton/DXVK would instantly
               | burn all that.
        
               | easyThrowaway wrote:
                | Because they ran their numbers and realized they
                | have more to lose by going against Valve than by
                | amicably finding a compromise.
                | 
                | A more aggressive approach was tried during the Xbox
                | 360 era, with the Games For Windows Live framework
                | and by removing their games from the Steam store. It
                | ended up catastrophically badly and they had to
                | backtrack on both decisions.
                | 
                | The irony of Proton de facto killing any chance of
                | native Linux ports of Windows games isn't lost on
                | them, either.
        
               | CoastalCoder wrote:
               | Not sure if this is relevant, but IIUC Proton and Wine
               | implement Windows' ABI, rather than something involving
               | copyrighted header files.
        
         | hedora wrote:
         | Probably for the same reason JWZ reimplemented OpenGL 1.3 on
         | top of OpenGL ES 1.1 in three days, but the vendors can't do
         | it:
         | 
         | https://www.jwz.org/blog/2012/06/i-have-ported-xscreensaver-...
         | 
         | https://news.ycombinator.com/item?id=4134426
        
           | voxadam wrote:
            | It's probably a good idea to hide the referrer on links
            | to jwz's site; he holds some fairly strong opinions
            | about HN.
           | 
           | https://dereferer.me/?https%3A//www.jwz.org/blog/2012/06/i-h.
           | ..
        
             | tempaccount420 wrote:
             | Although true, I don't think we should be trying to
             | circumvent his block.
        
           | Ygg2 wrote:
            | A better suggestion is to just not visit JWZ's site :P
            | 
            | Went there a few days ago and got a colonoscopy picture,
            | even without a dereferrer/referrer.
        
         | bee_rider wrote:
         | From the README:
         | 
         | > Is ZLUDA a drop-in replacement for CUDA?
         | 
         | > Yes, but certain applications use CUDA in ways which make it
         | incompatible with ZLUDA
         | 
         | > What is the status of the project?
         | 
         | > This project is a Proof of Concept. About the only thing that
         | works currently is Geekbench. It's amazingly buggy and
         | incomplete. You should not rely on it for anything serious
         | 
         | It is a cool proof of concept but we don't know how far away it
         | is from becoming something that a company would willingly
         | endorse. And I suspect AMD or Intel wouldn't want to put a ton
         | of effort into... helping people continue to write code in
         | their competitor's ecosystem.
        
           | jabradoodle wrote:
            | CUDA has won, though. It's not about helping people
            | write code for your competitor's ecosystem; it's about
            | allowing the most-used ML packages to run on your
            | hardware.
        
       | NelsonMinar wrote:
       | This looks really interesting but also early days. "this is a
       | very incomplete proof of concept. It's probably not going to work
       | with your application." I hope it develops into something broadly
       | usable!
        
       | pjmlp wrote:
        | CUDA is a polyglot development environment; these projects
        | usually fall short by focusing only on C++.
        | 
        | I failed to find any information regarding Fortran, Haskell,
        | Julia, .NET, or Java support for CUDA workloads.
        
       | misterbishop wrote:
       | Why is this trending when there hasn't been a commit since Jan
       | 2021? There's a comment here like "it's early days"... the repo
       | has been dormant for longer than it was active.
        
       | trostaft wrote:
       | This project has been unmaintained for a while.
       | 
       | Both Intel and AMD have the opportunity to create some actual
       | competition for NVIDIA in the GPGPU space. Intel, at least, I can
       | forgive since they only just entered the market. Why AMD has
       | struggled so hard to get anything going for so long, I don't
       | know...
        
         | garbagecoder wrote:
         | * and Apple.
        
         | moooo99 wrote:
          | AMD has made several attempts; their most recent effort is
          | the ROCm [0] software platform. There is an official
          | PyTorch distro for Linux that supports ROCm [1] for
          | acceleration. There are also frameworks like tinygrad [2]
          | that claim support for all sorts of accelerators. That's as
          | far as the claims go; I don't know how it handles the real
          | world. If the occasional livestream from George Hotz (the
          | creator of tinygrad) is anything to go by, AMD has to sort
          | out a lot of driver issues to be any actual competition for
          | team green.
          | 
          | I really hope AMD manages a comeback like the one they
          | showed a few years ago with their CPUs. Intel joining the
          | market is certainly helping, but having three big players
          | competing would certainly be desirable for all sorts of
          | applications that require GPUs. AMD cards like the 7900 XTX
          | are already fairly promising on paper, with fairly large
          | VRAM; they'd probably be much more cost effective than
          | NVIDIA cards if software support were anywhere near
          | comparable.
         | 
         | [0]: https://www.amd.com/en/graphics/servers-solutions-rocm
         | 
         | [1]: https://pytorch.org/
         | 
         | [2]: https://github.com/geohot/tinygrad
        
           | mmis1000 wrote:
            | I think the weirdest thing is that ROCm works fine on
            | Linux. At least some dedicated workstations can use it
            | with specific cards, and it has existed for many years.
            | But somehow they can't make a single card work on
            | Windows after all these years (or they don't want to,
            | for some reason?). It's really weird, given that they
            | already have a working implementation (just not for
            | Windows), so they are not lacking the ability to make it
            | work.
        
           | jacooper wrote:
            | The issue with ROCm is that it's completely inaccessible
            | for most users: it only supports high-end GPUs.
            | 
            | Meanwhile, CUDA works on a 1050 Ti.
        
             | kkielhofner wrote:
             | Supported on CUDA 12 no less!
             | 
              | To give an idea, the 1050 Ti is a card that had an
              | MSRP of $140 when it was released almost seven years
              | ago. Between the driver and CUDA support matrix it
              | will likely end up with a 10-year support life.
             | 
              | While it's not going to impress with an LLM or such,
              | it's the minimum supported card for speech-to-text
              | with Willow Inference Server (I'm the creator), and it
              | still puts up impressive numbers.
             | 
             | Same for the GTX 1060/1070, which you can get with up to
             | 8GB VRAM today for ~$100 used. Again, not impressive for
              | the hotter LLMs, etc., but it will do ASR, TTS, video
             | encoding/decoding, Frigate, Plex transcoding, and any
             | number of other things with remarkable performance
             | (considering cost). People also run LLMs on them and from a
             | price/performance/power standpoint it's not even close
             | compared to CPU.
             | 
                | The 15-year investment in and commitment to universal support
             | for any Nvidia GPU across platforms (with very long support
             | lifecycles) is extremely hard to compete with (as we see
             | again and again with AMD attempts in the space).
        
               | jacooper wrote:
                | To be fair, AMD does offer good long-term support for
               | cards, just not with ROCm.
        
             | vkaku wrote:
              | Agreed. What they've also been doing is
              | stalling/removing support for gfx803, which could have
              | allowed people to keep using those GPUs for many
              | decent small nets.
        
           | joe_the_user wrote:
           | " _AMD has made several attempts..._ "
           | 
            | And failed to make any of them work, which to my mind
            | means they've burned their prospects _more_ than if they
            | had flat-out done nothing.
        
           | kadoban wrote:
           | Does ROCm count as an attempt? They burned so many people by
           | not supporting any of the cards anyone cares about.
        
             | Tostino wrote:
              | All it would take to remedy that is actually providing
              | good support going forward and a bit of advertising.
              | Not a huge barrier IMO.
        
         | MuffinFlavored wrote:
         | > Both Intel and AMD have the opportunity to create some actual
         | competition for NVIDIA in the GPGPU space.
         | 
         | Apple Silicon being on Metal Performance Shaders (I think they
         | deprecated OpenCL support?) kind of makes this all more
         | confusing.
         | 
         | It definitely feels like CUDA is the leader and anything else
          | is backseat/a non-starter, which is fine. The community support
         | isn't there.
         | 
         | I haven't heard anybody talk about AMD Radeon GPUs in a looong
         | time.
        
           | pjmlp wrote:
            | All the competition falls short on tooling and on being
            | polyglot like CUDA.
            | 
            | So it is already a non-starter if they can't meet those
            | baselines.
        
         | easythrees wrote:
         | AMD has HIP.
        
           | trostaft wrote:
            | Yes, but I don't think it's debatable to say that the
            | entire ecosystem is firmly behind NVIDIA's. Usually it
            | comes as a surprise when something does support their
            | framework, whether ROCm directly or even HIP, which
            | should be easier....
            | 
            | I shouldn't be surprised that AMD's ecosystem is lagging
            | behind, since their GPU division spent a good decade
            | struggling to even be relevant. Not to mention that
            | NVIDIA has spent a lot of effort on their HPC tools.
            | 
            | I don't want this to be too negative towards AMD; they
            | have been steadily growing in this space. Some things do
            | work well, e.g. Stable Diffusion is totally fine on AMD
            | GPUs. So they seem to be catching up. I just feel a
            | little impatient, especially since their cards are more
            | than powerful enough to be useful. I suppose my point is
            | that the gap in HPC software between NVIDIA and AMD is
            | much larger than the actual capability gap in their
            | hardware, and that's a shame.
        
             | Eisenstein wrote:
             | Apple was a few months from bankruptcy during most of the
             | 90s competing with IBM and Microsoft, then turned around to
             | become the most profitable company on the planet. It takes
             | a leader and a plan and a lot of talent and the exact right
             | conditions, but industry behemoths get pulled down from the
             | top spot all the time.
        
               | tracker1 wrote:
                | Apple's success is mostly UX and marketing, with a
                | walled garden for an application tax. AMD has to
                | actually achieve on the hardware side, not just
                | marketing. Beyond this, AMD has demonstrated that
                | they are indeed working on closing the gap and
                | pulling ahead. AMD is well ahead of Intel on the
                | server CPU front, and they're neck and neck on
                | desktop, having pulled spans ahead in the past few
                | years. And on the GPU side, they've closed a lot of
                | gaps.
                | 
                | While I am a little bit of a fan of AMD, there's
                | still work to do. I think AMD really needs to take
                | advantage of their production margins to gain more
                | market share. They also need something a bit closer
                | to the 4090 as a performance GPU + entry workstation
                | API/GPGPU workload card. The 7900 XTX is really
                | close, but if they had something with, say, 32-48GB
                | of VRAM in the sub-$2000 space, it would really get
                | a lot of the hobbyist and SOHO types to consider
                | them.
        
               | elzbardico wrote:
               | Yeah, sure, changing their platform 3 times in the space
               | of some twenty years is just marketing and UX from Apple.
               | They are just a bunch of MBAs. Sometimes I feel like I am
               | reading slashdot.
        
               | AnthonyMouse wrote:
               | The platform changes had very little to do with their
               | success. They switched from PowerPC to Intel because
               | PowerPC was uncompetitive, but that doesn't explain why
               | they did any better than Dell or anyone else using the
               | exact same chips. Then they developed their own chips
               | because Intel was stagnant, but they barely came out
               | before AMD had something competitive and it's not obvious
               | they'd have been in a meaningfully different position had
               | they just used that.
               | 
               | Their hardware is good but if all they were selling was
               | Macbooks and iPhones with Windows and Android on them,
               | they wouldn't have anything near their current margins.
        
               | robbiep wrote:
               | You're not really making sense.
               | 
               | If they hadn't made platform changes they would have
               | never been able to turn into what they are today. I
                | hardly think that is 'little to do'.
               | 
                | They would likely barely exist. They have 'achieved
                | product market fit', as the saying goes, which
                | requires more than just a sharp UI, as their history
                | shows.
        
           | freedomben wrote:
            | Yeah, but try running ML projects on your AMD card and
            | you'll quickly see that they're an afterthought nearly
            | everywhere, even in projects that use PyTorch (which has
            | backend support for AMD). If consumers can't use it,
            | they're going to learn Nvidia, and experience has shown
            | that people opt for the enterprise tech they're familiar
            | with, and most people get familiar by hacking on it
            | locally.
        
       | zackmorris wrote:
        | Are there any projects that go in the opposite direction, to run CPU
       | code on GPU? I understand that there might be limitations, like
       | not being able to access system calls or the filesystem. What I'm
       | mainly looking for is a way to write C-style code and have it run
       | auto-parallelized, without having to drop into a different
       | language or annotate my code or manually manage buffers.
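        | 
        | For example, I'd love for plain C++17 parallel algorithms to
        | be the norm; supposedly nvc++ with -stdpar can offload code
        | like this to the GPU unmodified (a sketch; I haven't verified
        | the performance claims):
        | 
        |     #include <algorithm>
        |     #include <execution>
        |     #include <vector>
        | 
        |     int main() {
        |         std::vector<float> v(1 << 20, 1.0f);
        |         // Ordinary-looking code; the parallel policy is the
        |         // only hint that this may run on an accelerator.
        |         std::transform(std::execution::par_unseq,
        |                        v.begin(), v.end(), v.begin(),
        |                        [](float x) { return 2.0f * x + 1.0f; });
        |     }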
        
         | andy_ppp wrote:
          | I think C code is full of branches, and CPUs are designed
          | to guess and parallelise those (possible) decisions.
          | Graphics cards are designed for running tiny programs
          | against millions of pixels per second. I'm not sure it's
          | possible to make these two different concepts the same.
        
       | karim79 wrote:
       | > Is ZLUDA a drop-in replacement for CUDA? Yes, but certain
       | applications use CUDA in ways which make it incompatible with
       | ZLUDA
       | 
        | I think this might get better if and when people redesign
        | their dev workflows, CI/CD pipelines, builds, etc., to deploy
        | code to both hardware platforms to ensure matching
        | functionality and stability. I'm not going to hold my breath
        | just yet. But it would be _really_ great to have two _viable_
        | platforms/players in this space where code can run and behave
        | equally.
        
       | Tade0 wrote:
       | I appreciate how the name translates to "delusion", considering
       | how "cuda" translates to "miracles" in the same language
       | (Polish).
        
       | tgtweak wrote:
        | How good is the tool at reporting missing CUDA functionality?
        
       | SomeRndName11 wrote:
       | [flagged]
        
       | raphlinus wrote:
       | This is something that fundamentally can't work, unfortunately.
       | One showstopper (and there may be others) is subgroup size.
       | Nvidia hardware has a subgroup (warp) size of 32, while Intel's
       | subgroup size story is far more complicated, and depends on a
       | compiler heuristic to tune. The short version of the story is
       | that it's usually 16 but can be 8 if there's a lot of register
       | pressure, or 32 for a big workgroup and not much register
       | pressure (and for those who might reasonably question whether
       | forcing subgroup size to 32 can solve the compatibility issue,
       | the answer is that it will frequently cause registers to spill
       | and performance to tank). CUDA code is not written to be agile in
       | subgroup size, so there is no automated translation that works
       | efficiently on Intel GPU hardware.
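        | 
        | To make that concrete: a typical warp-level reduction bakes
        | the 32 into both the loop bounds and the shuffle mask
        | (an illustrative sketch):
        | 
        |     // Classic CUDA warp sum; assumes exactly 32 lanes. Run
        |     // it on a 16-wide subgroup and lane 0 silently ends up
        |     // with a partial sum.
        |     __device__ float warp_sum(float v) {
        |         for (int offset = 16; offset > 0; offset >>= 1)
        |             v += __shfl_down_sync(0xffffffffu, v, offset);
        |         return v;  // lane 0 now holds the sum of 32 lanes
        |     }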
       | 
       | Longer term, I think we _can_ write GPU code that is portable,
       | but it will require building out the infrastructure for it.
       | Vulkan compute shaders are one good starting point, and as of
       | Vulkan 1.3 the  "subgroup size control" feature is mandatory.
       | WebGPU is another possible path to get there, but it's currently
       | lacking a lot of important features, including subgroups at all.
       | There's more discussion of subgroups as a potential WebGPU
       | feature in [1], including how to handle subgroup size.
       | 
       | [1]: https://github.com/gpuweb/gpuweb/issues/3950
        
         | varelse wrote:
         | [dead]
        
         | AnthonyMouse wrote:
         | Things like this are often useful even if they're not optimal.
         | Before you had a piece of code that simply would not run on
         | your GPU. Now it runs. Even if it's slower than it should be,
         | that's better than not running at all. Which makes more people
         | willing to buy the GPU.
         | 
         | Then they go to the developers and ask why the implementation
          | isn't optimized for this hardware lots of people have, and
          | the solution is to do an implementation in Vulkan, etc.
        
         | pjmlp wrote:
          | Only if SPIR-V tooling ever gets half as good as the PTX
          | ecosystem.
        
           | [deleted]
        
         | fancyfredbot wrote:
          | The CUDA block size is likely to be a good proxy for
          | register pressure, so if the block size is small you can
          | try running with a small subgroup, etc.
         | 
         | NVIDIA used to discourage code which relies on the subgroup or
         | warp size. I'm not sure how much this is true of real world
         | code though.
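          | 
          | The sanctioned way to stay agnostic is to query the width
          | instead of hard-coding 32, e.g. (sketch):
          | 
          |     // Host side: read the warp width from the device
          |     // properties rather than assuming it.
          |     cudaDeviceProp prop;
          |     cudaGetDeviceProperties(&prop, /*device=*/0);
          |     int warp = prop.warpSize;  // 32 on NVIDIA GPUs to date
          |     // Device code can use the built-in `warpSize` variable.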
        
       | vkaku wrote:
       | Beautiful! Waiting for someone to use this and get benchmarks
       | with PyTorch now.
        
       | replete wrote:
       | Llama.cpp just added CUDA GPU acceleration yesterday, so this
       | would be very interesting for the emerging space of running local
       | LLMs on commodity hardware.
       | 
        | Running CUDA on an AMD RDNA3 APU is what I'd like to see, as
        | it's probably the cheapest 16GB shared-VRAM solution (via the
        | UMA Frame Buffer BIOS setting) and creates the possibility of
        | running a 13B LLM locally on an underutilized iGPU.
        | 
        | Aaand it's been dead for years, shame.
        
         | brucethemoose2 wrote:
         | - llama.cpp already has OpenCL acceleration. It has had it for
         | some time.
         | 
          | - AMD already has a CUDA translator: HIP, part of ROCm. It
          | _should_ work with llama.cpp's CUDA backend, but in
          | practice... _shrug_
          | 
          | - The copies the CUDA/OpenCL code makes (which are
          | unavoidable for discrete GPUs) are problematic for IGPs.
          | Right now acceleration regresses performance on IGPs.
          | 
          | Llama.cpp would need tailor-made IGP acceleration. And I'm
          | not even sure which API has the most appropriate zero-copy
          | mechanism. Vulkan? oneAPI? Something inside ROCm?
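          | 
          | For reference, the CUDA flavor of zero copy is mapped
          | pinned memory, something like this sketch:
          | 
          |     // Pin host memory and map it into the device address
          |     // space, so an IGP-style device reads it in place.
          |     // (Some setups need cudaSetDeviceFlags(cudaDeviceMapHost)
          |     // before any allocation.)
          |     float* h_buf;
          |     cudaHostAlloc((void**)&h_buf, bytes, cudaHostAllocMapped);
          |     float* d_view;
          |     cudaHostGetDevicePointer((void**)&d_view, h_buf, 0);
          |     kernel<<<grid, block>>>(d_view, n);  // no cudaMemcpy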
        
       | varelse wrote:
       | [dead]
        
       ___________________________________________________________________
       (page generated 2023-06-15 23:01 UTC)