[HN Gopher] Show HN: Attaching to a virtual GPU over TCP
___________________________________________________________________
Show HN: Attaching to a virtual GPU over TCP
We developed a tool to trick your computer into thinking it's
attached to a GPU which actually sits across a network. This allows
you to switch the number or type of GPUs you're using with a single
command.
Author : bmodel
Score : 130 points
Date : 2024-08-09 16:50 UTC (6 hours ago)
(HTM) web link (www.thundercompute.com)
(TXT) w3m dump (www.thundercompute.com)
| talldayo wrote:
| > Access serverless GPUs through a simple CLI to run your
| existing code on the cloud while being billed precisely for usage
|
| Hmm... well I just watched you run nvidia-smi in a Mac terminal,
| which is a platform it's explicitly not supported on. My instant
| assumption is that your tool copies my code into a private server
| instance and communicates back and forth to run the commands.
|
| Does this platform expose eGPU capabilities if my host machine
| supports it? Can I run raster workloads or network it with my own
| CUDA hardware? The actual way your tool and service connects
| isn't very clear to me and I assume other developers will be
| confused too.
| bmodel wrote:
| Great questions! To clarify the demo, we were ssh'd into a
| Linux machine with no GPU.
|
| Going into more detail on how this works, we intercept
| communication between the CPU and the GPU so only GPU code and
| commands are sent across the network to a GPU that we are
| hosting. This way we are able to virtualize a remote GPU and
| make your computer think it's directly attached to that GPU.
|
| We are not copying your CPU code and running it on our
| machines. The CPU code runs entirely on your instance (meaning
| no files need to be copied over or packages installed on the
| GPU machine). One of the benefits of this approach is that you
| can easily scale to a more or less powerful GPU without needing
| to set up a new server.
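|
| To make that concrete, here is a rough sketch of the general
| pattern (illustrative only; the opcode and wire framing are
| made up and this is not our actual code). A shim library
| replaces a CUDA call locally and ships just its arguments to
| the machine that has the GPU:
|
|     /* shim.c -- illustrative client-side stub */
|     #include <stddef.h>
|     #include <stdint.h>
|     #include <unistd.h>
|
|     static int remote_fd;   /* TCP socket to the GPU host;
|                                assume it was connected at
|                                library load (setup omitted) */
|
|     enum { OP_CUDA_MALLOC = 1 };   /* made-up opcode */
|
|     /* Intercepted cudaMalloc: nothing runs on a local GPU.
|        Only the request (opcode + size) crosses the network;
|        the reply carries the status and the device pointer
|        allocated on the remote GPU. */
|     int cudaMalloc(void **devPtr, size_t size) {
|         uint64_t req[2] = { OP_CUDA_MALLOC, size };
|         if (write(remote_fd, req, sizeof req) != sizeof req)
|             return 1;              /* any nonzero == error */
|         uint64_t rep[2];           /* { status, device ptr } */
|         if (read(remote_fd, rep, sizeof rep) != sizeof rep)
|             return 1;
|         *devPtr = (void *)rep[1];
|         return (int)rep[0];        /* 0 == cudaSuccess */
|     }
|
| Heavier calls (memcpys, kernel launches) follow the same
| request/response shape; the payload of a host-to-device copy
| does cross the wire, but your source files and CPU-side work
| never do.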
| billconan wrote:
| does this mean you have a customized/dummy kernel gpu driver?
|
| will that cause system instability, say, if the network
| suddenly dropped?
| bmodel wrote:
| We are not writing any kernel drivers, this runs entirely
| in userspace (this won't result in a crowdstrike level
| crash haha).
|
| Given that, if the network suddenly dropped then only the
| process using the GPU would fail.
| ZeroCool2u wrote:
| How do you do that exactly? Are you using eBPF or
| something else?
|
| Also, for my ML workloads the most common bottleneck is
| GPU VRAM <-> RAM copies. Doesn't this dramatically
| increase latency? Or is it more like it increases latency
| on first data transfer, but as long as you dump
| everything into VRAM all at once at the beginning you're
| fine? I'd expect this wouldn't play super well with stuff
| like PyTorch data loaders, but would be curious to hear
| how you've fared when testing.
| bmodel wrote:
| We intercept API calls and use our own implementation to
| forward them to a remote machine. No eBPF (which I believe
| needs to run in the kernel).
|
| As for latency, we've done a lot of work to minimize that
| as much as possible. You can see the performance we get
| running inference on BERT from huggingface here:
| https://youtu.be/qsOBFQZtsFM?t=64. It's still slower than
| local (mainly for training workloads) but not by as much
| as you'd expect. We're aiming to reach near parity in the
| next few months!
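|
| The receiving side is conceptually just a dispatch loop on
| the machine that actually owns the GPU; a bare-bones sketch
| (again only to show the shape, not our implementation) that
| pairs with the client stub sketched above:
|
|     /* gpu_host.c -- illustrative dispatcher; build with nvcc */
|     #include <cuda_runtime.h>
|     #include <stdint.h>
|     #include <unistd.h>
|
|     enum { OP_CUDA_MALLOC = 1 };   /* must match the client */
|
|     void serve(int client_fd) {
|         uint64_t req[2];
|         while (read(client_fd, req, sizeof req) == sizeof req) {
|             if (req[0] == OP_CUDA_MALLOC) {
|                 void *devPtr = NULL;
|                 /* the only real CUDA call happens here,
|                    right next to the physical GPU */
|                 uint64_t status =
|                     cudaMalloc(&devPtr, (size_t)req[1]);
|                 uint64_t rep[2] = { status, (uint64_t)devPtr };
|                 (void)write(client_fd, rep, sizeof rep);
|             }
|             /* ...other opcodes: memcpy, launch, free, etc. */
|         }
|     }
|
| If the connection drops, this loop exits and the client
| process loses its GPU, which is the failure mode mentioned
| earlier in the thread.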
| samstave wrote:
| When you release a self-host version, what would be
| really neat would be to see it across HFT focused NICs
| that have huge TCP buffers...
|
| https://www.arista.com/assets/data/pdf/HFT/HFTTradingNetw
| ork...
|
| Basically taking into account the large buffers and
| super-time-sensitive nature of HFT networking
| optimizations, I wonder if your TCP<-->GPU might benefit
| from both the HW and the learnings of HFT stylings?
| billconan wrote:
| is this a remote nvapi?
|
| this is awesome. can it do 3d rendering (vulkan/opengl)?
| czbond wrote:
| I am not in this "space", but I second the "this is cool to
| see"; more stuff like this is needed on HN.
| cpeterson42 wrote:
| Appreciate the praise!
| bmodel wrote:
| Thank you!
|
| > is this a remote nvapi
|
| Essentially yes! Just to be clear, this covers the entire GPU,
| not just the NVAPI (i.e. all of CUDA). It functions as if you
| had the physical card directly plugged into the machine.
|
| Right now we don't support Vulkan or OpenGL since we're mostly
| focusing on AI workloads; however, we plan to support these in
| the future (especially if there is interest!)
| billconan wrote:
| sorry, I didn't mean nvapi, I meant rmapi.
|
| I bet you saw this https://github.com/mikex86/LibreCuda
|
| they implemented the cuda driver by calling into rmapi.
|
| My understanding is that if there is a remote rmapi, other
| user-mode drivers should work out of the box?
| doctorpangloss wrote:
| I don't get it. Why would I start an instance in ECS, to use your
| GPUs in ECS, when I could start an instance for the GPUs I want
| in ECS? Separately, why would I want half of Nitro, instead of
| real Nitro?
| billconan wrote:
| It's more transparent to your system. For example, if you have
| a GUI application that needs GPU acceleration on a thin client
| (Matlab, SolidWorks, Blender), you can run it without setting
| up ECS. You can develop without any GPU, but suddenly have one
| when you need to run a simulation. This will be way cheaper
| than AWS.
|
| I think essentially this is solving the same problem Ray
| (https://www.ray.io/) is solving, but in a more generic way.
|
| it potentially can have finer grained gpu sharing, like a half-
| gpu.
|
| I'm very excited about this.
| bmodel wrote:
| Exactly! Finer-grained sharing is one of the key things on
| our radar right now.
| goku-goku wrote:
| www.juicelabs.co does all this today, including the GPU
| sharing and fractionalization.
| bmodel wrote:
| Great point. There are a few benefits:
|
| 1. If you're actively developing and need a GPU then you
| typically would be paying the entire time the instance is
| running. Using Thunder means you only pay for the GPU while
| actively using it. Essentially, if you are running CPU-only
| code, you would not be paying for any GPU time. The
| alternative is to manually turn the instance on and off,
| which can be annoying.
|
| 2. This allows you to easily scale the type and number of GPUs
| you're using. For example, say you want to do development on a
| cheap T4 instance and run a full DL training job on a set of
| 8 A100s. Instead of needing to swap instances and set up
| everything again, you can just run a command and start running
| on the more powerful GPUs.
| doctorpangloss wrote:
| Okay, but your GPUs are in ECS. Don't I just want this
| feature from Amazon, not you, and natively via Nitro? Or even
| Google has TPU attachments.
|
| > 1. If you're actively developing and need a GPU [for
| fractional amounts of time]...
|
| Why would I need a GPU for a short amount of time during
| development? For testing?
|
| I don't get it - what would testing an H100 over a TCP
| connection tell me? It's like, yeah, I can do that, but it
| doesn't represent an environment I am going to use for real.
| Nobody runs applications to GPUs on buses virtualized over
| TCP connections, so what exactly would I be validating?
| bmodel wrote:
| I don't believe Nitro would allow you to access a GPU
| that's not directly connected to the CPU that the VM is
| running on. So swapping between GPU types or scaling to
| multiple GPUs is still a problem.
|
| From the developer perspective, you wouldn't know that the
| H100 is across a network. The experience will be as if your
| computer is directly attached to an H100. The benefit here
| is that if you're not actively using the H100 (such as when
| you're setting up the instance or after the training job
| completes) you are not paying for the H100.
| doctorpangloss wrote:
| Okay, a mock H100 object would also save me money. I
| could pretend a 3090 is an A100. "The experience would be
| that a 3090 is an A100." Apples to oranges comparison?
| It's using a GPU attached to the machine versus a GPU
| that crosses a VPC boundary. Do you see what I am saying?
|
| I would never run a training job on a GPU virtualized
| over TCP connection. I would never run a training job
| that requires 80GB of VRAM on a 24GB VRAM device.
|
| Whom is this for? Who needs to save kopecks on a single
| GPU who needs H100s?
| steelbrain wrote:
| Ah this is quite interesting! I had a use case where I needed a
| GPU-over-IP but only for transcoding videos. I had a not-so-
| powerful AMD GPU in my homelab server that somehow kept crashing
| the kernel any time I tried to encode videos with it and also an
| NVIDIA RTX 3080 in a gaming machine.
|
| So I wrote https://github.com/steelbrain/ffmpeg-over-ip and had
| the server running on the Windows machine and the client on the
| media server (could be Plex, Emby, Jellyfin, etc.) and it worked
| flawlessly.
| crishoj wrote:
| Interesting. Do you know if your tool supports conversions
| resulting in multiple files, such as HLS and its myriad of
| timeslice files?
| bhaney wrote:
| This is more or less what I was hoping for when I saw the
| submission title. Was disappointed to see that the submission
| wasn't actually a useful generic tool but instead a paid cloud
| service. Of course the real content is in the comments.
|
| As an aside, are there any uses for GPU-over-network other than
| video encoding? The increased latency seems like it would
| prohibit anything machine learning related or graphics
| intensive.
| trws wrote:
| Some computation tasks can tolerate the latency if they're
| written with enough overlap and can keep enough of the data
| resident, but they usually need more performant networking
| than this. See older efforts like rcuda for remote cuda over
| infiniband as an example. It's not ideal, but sometimes worth
| it. Usually the win is in taking a multi-GPU app and giving
| it 16 or 32 of them rather than a single remote GPU though.
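|
| The overlap itself is just the standard CUDA-streams
| pipeline, roughly like this (a generic sketch; the `process`
| kernel is a stand-in, and the host buffer is assumed to be
| pinned via cudaMallocHost so the async copies really overlap):
|
|     /* pipeline.cu -- copy/compute overlap with two streams:
|        while chunk i is being processed, chunk i+1 is already
|        in flight, which is what hides transfer latency. */
|     #include <cuda_runtime.h>
|
|     __global__ void process(float *d, size_t n) {
|         size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
|         if (i < n) d[i] *= 2.0f;   /* stand-in for real work */
|     }
|
|     void pipeline(float *host, float *dev,
|                   size_t n, size_t chunk) {
|         cudaStream_t s[2];
|         cudaStreamCreate(&s[0]);
|         cudaStreamCreate(&s[1]);
|         for (size_t off = 0; off < n; off += chunk) {
|             cudaStream_t st = s[(off / chunk) % 2];
|             size_t len = (off + chunk <= n) ? chunk : n - off;
|             cudaMemcpyAsync(dev + off, host + off,
|                             len * sizeof(float),
|                             cudaMemcpyHostToDevice, st);
|             process<<<(len + 255) / 256, 256, 0, st>>>(
|                 dev + off, len);
|         }
|         cudaDeviceSynchronize();
|         cudaStreamDestroy(s[0]);
|         cudaStreamDestroy(s[1]);
|     }
|
| Over a network link instead of PCIe the same trick applies;
| there is just a lot more latency to hide.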
| toomuchtodo wrote:
| Have you done a Show HN yet? If not, please consider doing so!
|
| https://gist.github.com/tzmartin/88abb7ef63e41e27c2ec9a5ce5d...
|
| https://news.ycombinator.com/showhn.html
|
| https://news.ycombinator.com/item?id=22336638
| cpeterson42 wrote:
| Given the interest here we decided to open up T4 instances for
| free. Would love for y'all to try it and let us know your
| thoughts!
| tptacek wrote:
| This is neat. Were you able to get MIG or vGPUs working with it?
| bmodel wrote:
| We haven't tested with MIG or vGPU, but I think it would work
| since MIG essentially partitions the GPU at the hardware level.
|
| One of our main goals for the near future is to allow GPU
| sharing. This would be better than MIG or vGPU since we'd allow
| users to use the entire GPU memory instead of restricting them
| to a fraction.
| tptacek wrote:
| We had a hell of a time dealing with the licensing issues and
| ultimately just gave up and give people whole GPUs.
|
| What are you doing to reset the GPU to clean state after a
| run? It's surprisingly complicated to do this securely (we're
| writing up a back-to-back sequence of audits we did with
| Atredis and Tetrel; should be publishing in a month or two).
| bmodel wrote:
| We kill the process to reset the GPU. Since we only store GPU
| state, that's the only cleanup we need to do.
| tptacek wrote:
| Hm. Ok. Well, this is all very cool! Congrats on
| shipping.
| kawsper wrote:
| Cool idea, nice product page!
|
| Does anyone know if this is possible with USB?
|
| I have a DaVinci Resolve license USB dongle I'd like to avoid
| plugging into my laptop.
| kevmo314 wrote:
| You can do that with USB/IP: https://usbip.sourceforge.net/
| orsorna wrote:
| So what exactly is the pricing model? Do I need a quote? Because
| otherwise I don't see how to determine it without creating an
| account, which is needlessly gatekeeping.
| bmodel wrote:
| We're still in our beta so it's entirely free for now (we can't
| promise a bug-free experience)! You have to make an account but
| it won't require payment details.
|
| Down the line we want to move to a pay-as-you-go model.
| Cieric wrote:
| This is interesting, but I'm more interested in self-hosting. I
| already have a lot of GPUs (some running, some not). Does this
| have a self-hosting option so I can use the GPUs I already have?
| cpeterson42 wrote:
| We don't support self-hosting yet, but the same technology
| should work well here. Many of the same benefits apply in a
| self-hosted setting, namely efficient workload scheduling, GPU-
| sharing, and ease-of-use. Definitely open to this possibility
| in the future!
| goku-goku wrote:
| Juice does have this ability today! :)
|
| www.juicelabs.co
| Cieric wrote:
| Thanks, but I've already evaluated JuiceLabs and it does not
| handle what I need it to. Plus with the whole project going
| commercial and the community edition being neglected, I no
| longer have any interest in trying to support the project
| either.
| covi wrote:
| If you want to use your own GPUs or cloud accounts but with a
| great dev experience, see SkyPilot.
| cpeterson42 wrote:
| We created a discord for the latest updates, bug reports, feature
| suggestions, and memes. We will try to respond to any issues and
| suggestions as quickly as we can! Feel free to join here:
| https://discord.gg/nwuETS9jJK
| throwaway888abc wrote:
| Does it work for gaming on Windows? Or even Linux?
| cpeterson42 wrote:
| In theory, yes. In practice, however, latency between the CPU
| and the remote GPU makes this impractical.
| rubatuga wrote:
| What ML packages do you support? In the comments below it says
| you do not support Vulkan or OpenGL. Does this support AMD GPUs
| as well?
| bmodel wrote:
| We have tested this with PyTorch and Hugging Face and it is
| mostly stable (we know there are issues with PyCUDA and JAX).
| In theory this should work with any library; however, we're
| still actively developing this, so bugs will show up.
| the_reader wrote:
| Would be possible to mix it with Blender?
| bmodel wrote:
| At the moment our tech is Linux-only, so it would not work with
| Blender.
|
| Down the line, we could see this being used for batched render
| jobs (i.e. to replace a render farm).
| comex wrote:
| Blender can run on Linux...
| bmodel wrote:
| Oh nice, I didn't know that! In that case it might work;
| you could try running `tnr run ./blender` (replace the
| ./blender with how you'd launch Blender from the CLI) to
| see what happens. We haven't tested it so I can't make
| promises about performance or stability :)
| teaearlgraycold wrote:
| This could be perfect for us. We need very limited bandwidth but
| have high compute needs.
| bmodel wrote:
| Awesome, we'd love to chat! You can reach us at
| founders@thundercompute.com or join the discord
| https://discord.gg/nwuETS9jJK!
| goku-goku wrote:
| Feel free to reach out www.juicelabs.co
| bkitano19 wrote:
| this is nuts
| cpeterson42 wrote:
| We think so too, big things coming :)
| goku-goku wrote:
| www.juicelabs.co
| dishsoap wrote:
| For anyone curious about how this actually works, it looks like a
| library is injected into your process to hook these functions [1]
| in order to forward them to the service.
|
| [1] https://pastebin.com/raw/kCYmXr5A
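|
| For the curious, the usual mechanism behind that kind of
| injection is LD_PRELOAD interposition: the shim's symbol
| shadows the real one, and the shim can still reach the
| original via dlsym(RTLD_NEXT, ...). A generic sketch (not
| their actual code; cudaMalloc is just an example target):
|
|     /* hook.c -- generic LD_PRELOAD interposition pattern
|        build: gcc -shared -fPIC hook.c -o hook.so -ldl
|        run:   LD_PRELOAD=./hook.so ./your_program          */
|     #define _GNU_SOURCE
|     #include <dlfcn.h>
|     #include <stdio.h>
|
|     typedef int (*cuda_malloc_fn)(void **, size_t);
|
|     int cudaMalloc(void **devPtr, size_t size) {
|         /* locate the symbol this shim is shadowing; a
|            remoting shim would serialize and forward the call
|            at this point instead of calling through locally */
|         static cuda_malloc_fn real;
|         if (!real)
|             real = (cuda_malloc_fn)dlsym(RTLD_NEXT, "cudaMalloc");
|         fprintf(stderr, "hooked cudaMalloc(%zu)\n", size);
|         return real(devPtr, size);
|     }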
| Zambyte wrote:
| Reminds me of Plan9 :)
| K0IN wrote:
| can you elaborate a bit on why? (noob here)
| radarsat1 wrote:
| I'm confused: if this operates at the CPU/GPU boundary, doesn't
| it create a massive I/O bottleneck for any dataset that doesn't
| fit into VRAM? I'm probably misunderstanding how it works, but
| if it intercepts GPU I/O then it must stream your entire dataset
| to a remote machine on every epoch, which sounds wasteful, so
| probably I'm not getting this right.
| bmodel wrote:
| That understanding of the system is correct. To make it
| practical we've implemented a bunch of optimizations to
| minimize I/O cost. You can see how it performs on inference
| with BERT here: https://youtu.be/qsOBFQZtsFM?t=69.
|
| The overheads are larger for training compared to inference,
| and we are implementing more optimizations to approach native
| performance.
| radarsat1 wrote:
| Aah ok, thanks, that was my basic misunderstanding; my mind
| just jumped straight to my current training needs, but for
| inference it makes a lot of sense. Thanks for the
| clarification.
| winecamera wrote:
| I saw that in the tnr CLI, there are hints of an option to self-
| host a GPU. Is this going to be a released feature?
| cpeterson42 wrote:
| We don't support self-hosting yet but are considering adding it
| in the future. We're a small team working as hard as we can :)
|
| Curious where you see this in the CLI; it may be an oversight on
| our part. If you can join the Discord and point us to this bug
| we would really appreciate it!
| test20240809 wrote:
| pocl (Portable Computing Language) [1] provides a remote backend
| [2] that allows for serialization and forwarding of OpenCL
| commands over a network.
|
| Another solution is qCUDA [3] which is more specialized towards
| CUDA.
|
| In addition to these solutions, various virtualization solutions
| today provide some sort of serialization mechanism for GPU
| commands, so they can be transferred to another host (or
| process). [4]
|
| One example is the QEMU-based Android Emulator. It uses
| special translator libraries and a "QEMU Pipe" to efficiently
| communicate GPU commands from the virtualized Android OS to the
| host OS [5].
|
| The new Cuttlefish Android emulator [6] uses Gallium3D for
| transport and the virglrenderer library [7].
|
| I'd expect that the current virtio-gpu implementation in QEMU [8]
| might make this job even easier, because it includes Android's
| gfxstream [9] (formerly called "Vulkan Cereal"), which should
| already support communication over network sockets out of
| the box.
|
| [1] https://github.com/pocl/pocl
|
| [2] https://portablecl.org/docs/html/remote.html
|
| [3] https://github.com/coldfunction/qCUDA
|
| [4] https://www.linaro.org/blog/a-closer-look-at-virtio-and-
| gpu-...
|
| [5]
| https://android.googlesource.com/platform/external/qemu/+/em...
|
| [6] https://source.android.com/docs/devices/cuttlefish/gpu
|
| [7]
| https://cs.android.com/android/platform/superproject/main/+/...
|
| [8] https://www.qemu.org/docs/master/system/devices/virtio-
| gpu.h...
|
| [9]
| https://android.googlesource.com/platform/hardware/google/gf...
| fpoling wrote:
| Zscaler uses a similar approach in their remote browser. WebGL
| in the local browser is exposed as a GPU to a Chromium instance
| in the cloud.
| mmsc wrote:
| What's it like to actually use this for any meaningful
| throughput? Can this be used for hash cracking? Every time I
| think about virtual GPUs over a network, I think about botnets.
| Specifically from
| https://www.hpcwire.com/2012/12/06/gpu_monster_shreds_passwo...
| "Gosney first had to convince Mosix co-creator Professor Amnon
| Barak that he was not going to "turn the world into a giant
| botnet.""
| cpeterson42 wrote:
| This is definitely an interesting thought experiment; however,
| in practice our system is closer to AWS than a botnet, as the
| GPUs are not distributed. This technology does lend itself to
| some interesting applications we are exploring, such as
| creating very flexible clusters within data centers.
___________________________________________________________________
(page generated 2024-08-09 23:00 UTC)