[HN Gopher] Scuda - Virtual GPU over IP
       ___________________________________________________________________
        
       Scuda - Virtual GPU over IP
        
       Author : kevmo314
       Score  : 188 points
       Date   : 2024-10-09 13:07 UTC (2 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | ranger_danger wrote:
        | This appears to only support CUDA on Nvidia. I'm curious why
        | they didn't just expose /dev/nvidia-uvm as a socket and
        | forward that over the network instead of hooking hundreds of
        | functions (maybe it's not that simple and I just don't know).
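        | 
        | (For illustration, a minimal sketch of the kind of hooking
        | involved, assuming an LD_PRELOAD-style shim in C; cuMemAlloc
        | is a real driver-API entry point, but forward_call() and its
        | wire format are hypothetical stand-ins:)
        | 
        |   // shim.c - build: gcc -shared -fPIC shim.c -o shim.so
        |   #include <stddef.h>
        |   typedef int CUresult;             // stand-in for the real enum
        |   typedef unsigned long long CUdeviceptr;
        | 
        |   // hypothetical helper that serializes the call over a socket
        |   extern CUresult forward_call(const char *fn, void *args,
        |                                size_t len);
        | 
        |   // shadows the application's cuMemAlloc and ships it to the
        |   // remote GPU server instead of the local driver
        |   CUresult cuMemAlloc(CUdeviceptr *dptr, size_t bytesize) {
        |       struct { CUdeviceptr *p; size_t n; } a = { dptr, bytesize };
        |       return forward_call("cuMemAlloc", &a, sizeof a);
        |   }
        |   // ...repeated for each of the hundreds of entry points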
        
         | monocasa wrote:
         | You can't mmap a socket, and mmap is core to how /dev/nvidia-
         | uvm works.
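          | 
          | (A quick way to see this, assuming Linux: mmap() needs a
          | backing object that can hand out pages, and a socket has
          | none, so the kernel refuses the mapping:)
          | 
          |   #include <stdio.h>
          |   #include <sys/mman.h>
          |   #include <sys/socket.h>
          | 
          |   int main(void) {
          |       int s = socket(AF_INET, SOCK_STREAM, 0);
          |       // a TCP socket has no pages to map
          |       void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
          |                      MAP_SHARED, s, 0);
          |       if (p == MAP_FAILED)
          |           perror("mmap on a socket");  // typically ENODEV
          |       return 0;
          |   }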
        
           | XorNot wrote:
            | Which seems weird to me: if we're going to have device
            | files, it's super annoying that they don't really act like
            | files.
            | 
            | Like, we really should just have enough RDMA in the kernel
            | to let that work.
        
             | monocasa wrote:
              | At its core, this device file is responsible for
              | managing a GPU-local address space and for sharing
              | memory securely with that address space, so there is a
              | place to write command buffers and data that the GPU
              | can see. It doesn't really make sense without a heavy
              | memory-mapping component.
              | 
              | A Plan 9-like model that treats it as just a standard
              | file would massively cut into GPU performance.
        
             | gorkish wrote:
             | I agree with you that making RDMA a more accessible
             | commodity technology is very important for "the future of
             | compute". Properly configuring something like RoCEv2 or
              | InfiniBand is expensive and difficult. These technologies
             | need to be made more robust in order to be able to run on
             | commodity networks.
        
           | majke wrote:
            | This is the first time I've heard of /dev/nvidia-uvm. Is
            | there any documentation on how the NVIDIA API works? In
            | particular, how strong is the multi-tenancy story? Can two
            | users share one GPU and expect reasonable security?
            | 
            | Last time I checked, the GPU did offer some kind of memory
            | isolation, but only for their datacenter cards, not
            | consumer ones.
        
             | monocasa wrote:
              | There's not a lot of documentation on how it works. It
              | used to live entirely in the closed-source driver; now
              | it's mainly a thin bridge to the closed-source firmware
              | blob.
              | 
              | But yes, for more than a decade now, even on consumer
              | cards, separate user processes have had separate
              | hardware-enforced contexts. This is as true for consumer
              | cards as it is for datacenter cards. It's core to how
              | something like WebGL works without exposing everything
              | else being rendered on your desktop to the public
              | Internet. There have been bugs, but per-process hardware
              | isolation with a GPU-local MMU has been table stakes for
              | a modern GPU for nearly twenty years.
              | 
              | What datacenter GPUs expose in addition to that is
              | multiple virtual GPUs, sort of like SR-IOV, where a
              | single GPU can be exposed to multiple OS kernels running
              | in virtual machines.
        
           | afr0ck wrote:
            | Well, it's not impossible. It's just software, after all.
            | You can mmap a remote device file, but you need OS support
            | to do the magical paging for you, probably some sort of
            | page-ownership tracking protocol like in HMM [1], but
            | outside a coherence domain.
            | 
            | I was once working on CXL [2] and memory ownership
            | tracking in the Linux kernel and wanted to play with
            | Nvidia GPUs. I hit a wall when I realised that a lot of
            | the functionality runs on the GSP or in the firmware blob,
            | with very little to no documentation, so I ended up
            | disliking Nvidia's system software stack and gave up on
            | the project. The UVM subsystem in the open kernel driver
            | is a bit of an exception, but a lot of the control path is
            | still handled and controlled from closed-source CUDA
            | libraries in userspace.
            | 
            | tl;dr: it's very hard to do systems hacking with Nvidia
            | GPUs.
            | 
            | [1] https://www.kernel.org/doc/html/v5.0/vm/hmm.html
            | [2] https://en.wikipedia.org/wiki/Compute_Express_Link
        
             | monocasa wrote:
             | Yeah, the Nvidia stuff isn't really made to be hacked on.
             | 
              | I'd check out the AMD side, since you can at least have
              | a full open-source GPU stack to play with, and they make
              | a modicum of effort to document their GPUs.
        
           | gorkish wrote:
            | Granted, it requires additional support from your
            | NICs/switches, but it is probably straightforward to
            | remote nvidia-uvm with an RDMA server.
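            | 
            | (A sketch of what the server-side registration might look
            | like, assuming libibverbs; ibv_reg_mr with remote-access
            | flags is the real call that makes a buffer directly
            | addressable over RDMA, while all the nvidia-uvm plumbing
            | around it is hand-waved here:)
            | 
            |   #include <infiniband/verbs.h>
            |   #include <stdlib.h>
            | 
            |   int main(void) {
            |       struct ibv_device **devs = ibv_get_device_list(NULL);
            |       struct ibv_context *ctx = ibv_open_device(devs[0]);
            |       struct ibv_pd *pd = ibv_alloc_pd(ctx);
            | 
            |       // stand-in for memory shared with the GPU driver
            |       void *buf = malloc(1 << 20);
            | 
            |       // register it so a remote peer can read/write it
            |       // directly, without the server CPU in the data path
            |       struct ibv_mr *mr = ibv_reg_mr(pd, buf, 1 << 20,
            |                                      IBV_ACCESS_LOCAL_WRITE |
            |                                      IBV_ACCESS_REMOTE_READ |
            |                                      IBV_ACCESS_REMOTE_WRITE);
            |       // mr->rkey plus the buffer address are what a client
            |       // needs to target this memory across the network
            |       return mr ? 0 : 1;
            |   }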
        
       | ghxst wrote:
        | This looks more like CUDA over IP, or am I missing something?
        
       | gpuhacker wrote:
        | As this mentions some prior art but not rCUDA
        | (https://en.m.wikipedia.org/wiki/RCUDA), I'm a bit confused
        | about what makes scuda different.
        
         | kevmo314 wrote:
         | I've updated the README! rCUDA is indeed inspiration, in fact
         | it inspired scuda's name too :)
        
       | dschuetz wrote:
        | More like a "virtual CUDA-only GPU" over IP.
        
         | Ey7NFZ3P0nzAe wrote:
          | Well, scuda has "cuda" in the name.
        
       | saurik wrote:
       | Reminds me of this, from a couple months ago.
       | 
       | https://news.ycombinator.com/item?id=41203475
        
         | friedtofu wrote:
          | Was going to post a reference to the same thing! Not sure
          | about you, but when I tested it the network performance was
          | incredibly poor; it may just have been hugged to death at
          | the time.
          | 
          | As a user I find having something you can self-host really
          | neat, but what I really want is something more like
          | 
          | https://github.com/city96/ComfyUI_NetDist + OP's project
          | mashed together.
          | 
          | Say I'm almost able to execute a workflow that would
          | normally require ~16 GB of VRAM, and I have an Nvidia 3060
          | 12 GB running headless with PRIME, executing the workflow
          | via the CLI.
          | 
          | Right now, I'd probably just have to run the workflow in a
          | Paperspace (or any other cloud compute) container, or borrow
          | the power of a local Apple M1 when using the second
          | repository I mentioned.
          | 
          | I wish I had something that could lend me extra resources
          | and temporarily act as either the host GPU or a secondary
          | one, depending on the memory needed, and only when I need it
          | (if that makes sense).
        
       | kbumsik wrote:
        | I have heard that NVSwitch is used for GPU-to-GPU
        | interconnection over a network.
        | 
        | How is this different?
        
         | thelastparadise wrote:
         | Orders of magnitude slower.
        
         | nsteel wrote:
         | Isn't this GPU-to-CPU? And really slow. And only CUDA. And over
         | IP. And implemented in software. I think it's really very
         | different.
        
       | meowzor wrote:
       | nice
        
       | AkashKaStudio wrote:
        | Would this let an Nvidia card be accessible from Apple Silicon
        | over TB4 for training on an e-GPU caddy? I would happily
        | relegate my desktop to HTPC/gaming duties.
        
       | some1else wrote:
        | You might have a problem using CUDA as part of the name, since
        | Nvidia has it trademarked. Maybe you can switch to Scuba if
        | they give you trouble; it sounds like a good name for the
        | tool.
        
         | teeray wrote:
          | We need to do for CUDA what was done for Jell-O and Kleenex.
        
         | n3storm wrote:
         | Buda may Be a Better name
        
       | gchamonlive wrote:
        | I have a laptop with a serviceable GPU but only 16 GB of RAM,
        | and another with a low-tier GPU but 32 GB of RAM. Wondering:
        | would it be too slow to use the latter as the control plane
        | and delegate inference to the former using something like
        | ComfyUI to run text-to-image models?
        
       | Technetium wrote:
       | It would be nice to have a description added.
        
       | rtghrhtr wrote:
       | Everyone hates nvidia but treats ATI as an afterthought. Another
       | completely useless tool to throw on the pile.
        
         | dahart wrote:
         | > Everyone hates nvidia but treats ATI as an afterthought.
         | 
         | Hehe, do you mean AMD?
        
           | chpatrick wrote:
           | What year is it?
        
         | gorkish wrote:
         | ATI? afterthought, indeed
        
       | kkielhofner wrote:
        | This is very interesting, but many of the motivations listed
        | are far better served by alternative approaches.
        | 
        | For "remote" model training there is NCCL +
        | Deepspeed/FSDP/etc. For remote inferencing there are solutions
        | like Triton Inference Server [0] that can do very
        | high-performance hosting of any model. For LLMs specifically
        | there are nearly countless implementations.
        | 
        | That said, the ability to use this for testing is interesting,
        | but I wonder about GPU contention, and as others have noted,
        | the performance of such a solution will be terrible even with
        | a relatively high-speed interconnect (100/400 Gb Ethernet,
        | etc.).
        | 
        | NCCL has been optimized to support DMA directly between
        | network interfaces and GPUs, which is of course considerably
        | faster than solutions like this. Triton can also make use of
        | shared memory, mmap, NCCL, MPI, etc., which is one of the many
        | tricks it uses for very performant inference - even across
        | multiple chassis over another network layer.
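        | 
        | (For a flavor of that NCCL path, a minimal sketch of an
        | all-reduce across two local GPUs in one process;
        | ncclCommInitAll and ncclAllReduce are the real API, and with
        | ncclCommInitRank the same collective runs between hosts over
        | GPUDirect RDMA when the fabric supports it:)
        | 
        |   // allreduce.c - sketch; link with -lnccl -lcudart
        |   #include <cuda_runtime.h>
        |   #include <nccl.h>
        | 
        |   int main(void) {
        |       int devs[2] = {0, 1};
        |       ncclComm_t comms[2];
        |       cudaStream_t streams[2];
        |       float *buf[2];
        | 
        |       ncclCommInitAll(comms, 2, devs);  // one comm per GPU
        |       for (int i = 0; i < 2; i++) {
        |           cudaSetDevice(i);
        |           cudaMalloc((void **)&buf[i], 1024 * sizeof(float));
        |           cudaStreamCreate(&streams[i]);
        |       }
        |       // sum the buffers across GPUs; the data moves over
        |       // NVLink/PCIe (or RDMA) without staging through the host
        |       ncclGroupStart();
        |       for (int i = 0; i < 2; i++)
        |           ncclAllReduce(buf[i], buf[i], 1024, ncclFloat,
        |                         ncclSum, comms[i], streams[i]);
        |       ncclGroupEnd();
        |       for (int i = 0; i < 2; i++) {
        |           cudaSetDevice(i);
        |           cudaStreamSynchronize(streams[i]);
        |       }
        |       return 0;
        |   }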
       | 
       | [0] - https://github.com/triton-inference-server/server
        
         | theossuary wrote:
          | I don't think NCCL + Deepspeed/FSDP are really an
          | alternative to Scuda, as they all require the models in
          | question to be designed for distributed training. They also
          | require a lot of support in the libraries being used.
          | 
          | This has been a struggle for data scientists for a while
          | now. I haven't seen a good solution that lets a data
          | scientist work locally but utilize GPUs remotely, without
          | basically just developing remotely (through a VM or Jupyter)
          | or submitting remote jobs (through SLURM or a
          | library-specific Kubernetes integration). Scuda is an
          | interesting step towards a better solution for utilizing
          | remote GPUs easily across a wide range of libraries, not
          | just PyTorch and TensorFlow.
        
           | seattleeng wrote:
           | Why is working locally important?
        
             | theossuary wrote:
              | Working locally still matters, and this is coming from
              | someone who normally works in tmux/nvim. When doing
              | vision and 3D ML work, being able to quickly open a
              | visualizer window is imperative to understanding what's
              | going on. For Gaussian Splatting, point cloud work,
              | SLAM, etc., you have to have access to a desktop
              | environment to see visualizations; they very rarely work
              | well remotely (even if they have some Jupyter support).
              | 
              | Working remotely when you have to use a desktop
              | environment is painful, no matter the technology. The
              | best I've come up with is tmux/vim and
              | Sunshine/Moonlight, but even then I'd rather just have
              | access to everything locally.
        
       | elintknower wrote:
        | Curious if this could be simplified to provide NVENC over IP?
        
       ___________________________________________________________________
       (page generated 2024-10-11 23:02 UTC)