[HN Gopher] Show HN: Attaching to a virtual GPU over TCP
       ___________________________________________________________________
        
       Show HN: Attaching to a virtual GPU over TCP
        
       We developed a tool to trick your computer into thinking it's
       attached to a GPU which actually sits across a network. This allows
       you to switch the number or type of GPUs you're using with a single
       command.
        
       Author : bmodel
       Score  : 310 points
       Date   : 2024-08-09 16:50 UTC (1 day ago)
        
 (HTM) web link (www.thundercompute.com)
 (TXT) w3m dump (www.thundercompute.com)
        
       | talldayo wrote:
       | > Access serverless GPUs through a simple CLI to run your
       | existing code on the cloud while being billed precisely for usage
       | 
       | Hmm... well I just watched you run nvidia-smi in a Mac terminal,
       | which is a platform it's explicitly not supported on. My instant
       | assumption is that your tool copies my code into a private server
       | instance and communicates back and forth to run the commands.
       | 
       | Does this platform expose eGPU capabilities if my host machine
       | supports it? Can I run raster workloads or network it with my own
       | CUDA hardware? The actual way your tool and service connects
       | isn't very clear to me and I assume other developers will be
       | confused too.
        
         | bmodel wrote:
          | Great questions! To clarify the demo, we were SSH'd into a
          | Linux machine with no GPU.
         | 
          | Going into more detail on how this works: we intercept
          | communication between the CPU and the GPU so that only GPU
          | code and commands are sent across the network to a GPU that
          | we are hosting. This way we are able to virtualize a remote
          | GPU and make your computer think it's directly attached to
          | that GPU.
          | 
          | We are not copying your CPU code and running it on our
          | machines. The CPU code runs entirely on your instance (meaning
          | no files need to be copied over or packages installed on the
          | GPU machine). One of the benefits of this approach is that you
          | can easily scale to a more or less powerful GPU without
          | needing to set up a new server.
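          | 
          | To give a rough picture of what "sending GPU commands across
          | the network" means, here is a sketch of the general idea (not
          | our actual wire format): each intercepted API call gets packed
          | into a small message and shipped over a TCP socket.
          | 
          |   #include <stdint.h>
          |   #include <sys/socket.h>
          | 
          |   /* hypothetical frame describing one GPU API call */
          |   struct call_hdr {
          |       uint32_t call_id;   /* e.g. 1 = device malloc   */
          |       uint64_t arg_bytes; /* size of argument payload */
          |   };
          | 
          |   /* only the call and its arguments cross the wire;
          |      the caller's CPU code never leaves the machine  */
          |   static int send_call(int sock, uint32_t id,
          |                        const void *args, uint64_t n)
          |   {
          |       struct call_hdr h = { id, n };
          |       if (send(sock, &h, sizeof h, 0) < 0) return -1;
          |       if (n && send(sock, args, n, 0) < 0) return -1;
          |       return 0;
          |   }
          | 
          | The GPU server does the reverse: it unpacks each message,
          | issues the real call on a GPU it hosts, and streams the
          | result back.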
        
           | billconan wrote:
           | does this mean you have a customized/dummy kernel gpu driver?
           | 
           | will that cause system instability, say, if the network
           | suddenly dropped?
        
             | bmodel wrote:
              | We are not writing any kernel drivers; this runs entirely
              | in userspace (so it won't result in a CrowdStrike-level
              | crash, haha).
             | 
             | Given that, if the network suddenly dropped then only the
             | process using the GPU would fail.
        
               | ZeroCool2u wrote:
               | How do you do that exactly? Are you using eBPF or
               | something else?
               | 
               | Also, for my ML workloads the most common bottleneck is
               | GPU VRAM <-> RAM copies. Doesn't this dramatically
               | increase latency? Or is it more like it increases latency
               | on first data transfer, but as long as you dump
               | everything into VRAM all at once at the beginning you're
               | fine? I'd expect this wouldn't play super well with stuff
               | like PyTorch data loaders, but would be curious to hear
                | how you've fared when testing.
        
               | bmodel wrote:
                | We intercept API calls and use our own implementation to
                | forward them to a remote machine. No eBPF (which I
                | believe needs to run in the kernel).
               | 
               | As for latency, we've done a lot of work to minimize that
               | as much as possible. You can see the performance we get
               | running inference on BERT from huggingface here:
               | https://youtu.be/qsOBFQZtsFM?t=64. It's still slower than
               | local (mainly for training workloads) but not by as much
               | as you'd expect. We're aiming to reach near parity in the
               | next few months!
        
               | samstave wrote:
               | When you release a self-host version, what would be
               | really neat would be to see it across HFT focused NICs
               | that have huge TCP buffers...
               | 
               | https://www.arista.com/assets/data/pdf/HFT/HFTTradingNetw
               | ork...
               | 
                | Basically, taking into account the large buffers and
                | super-time-sensitive nature of HFT networking
                | optimizations, I wonder if your TCP<-->GPU path might
                | benefit from both the HW and the learnings of HFT-style
                | networking?
        
               | ZeroCool2u wrote:
                | Got it. eBPF modules run as part of the kernel, but
                | they're still user-space programs.
                | 
                | I would consider using a larger model for demonstrating
                | inference performance, as I have 7B models deployed on
                | CPU at work, but a GPU is still important for training
                | BERT-size models.
        
       | billconan wrote:
       | is this a remote nvapi?
       | 
       | this is awesome. can it do 3d rendering (vulkan/opengl)
        
         | czbond wrote:
         | I am not in this "space", but I second the "this is cool to
         | see", more stuff like this needed on HN.
        
           | cpeterson42 wrote:
           | Appreciate the praise!
        
         | bmodel wrote:
         | Thank you!
         | 
         | > is this a remote nvapi
         | 
          | Essentially yes! Just to be clear, this covers the entire GPU,
          | not just the NVAPI (i.e. all of CUDA). It functions as if you
          | had the physical card directly plugged into the machine.
          | 
          | Right now we don't support Vulkan or OpenGL since we're mostly
          | focusing on AI workloads; however, we plan to support these in
          | the future (especially if there is interest!)
        
           | billconan wrote:
           | sorry, I didn't mean nvapi, I meant rmapi.
           | 
           | I bet you saw this https://github.com/mikex86/LibreCuda
           | 
           | they implemented the cuda driver by calling into rmapi.
           | 
           | My understanding is if there is a remote rmapi, other user
           | mode drivers should work out of the box?
        
       | doctorpangloss wrote:
       | I don't get it. Why would I start an instance in ECS, to use your
       | GPUs in ECS, when I could start an instance for the GPUs I want
       | in ECS? Separately, why would I want half of Nitro, instead of
       | real Nitro?
        
         | billconan wrote:
          | It's more transparent to your system. For example, if you have
          | a GUI application that needs GPU acceleration on a thin client
          | (MATLAB, SolidWorks, Blender), you can do so without setting up
          | ECS. You can develop without any GPU, but suddenly have one
          | when you need to run a simulation. This will be way cheaper
          | than AWS.
          | 
          | I think essentially this is solving the same problem Ray
          | (https://www.ray.io/) is solving, but in a more generic way.
          | 
          | It potentially can have finer-grained GPU sharing, like half a
          | GPU.
         | 
         | I'm very excited about this.
        
           | bmodel wrote:
           | Exactly! The finer grain sharing is one of the key things on
           | our radar right now
        
             | goku-goku wrote:
             | www.juicelabs.co does all this today, including the GPU
             | sharing and fractionalization.
        
               | ranger_danger wrote:
               | the free community version has been discontinued, and
               | also doesn't support a linux client with non-CUDA
               | graphics, regardless of the server OS, which is a non-
               | starter for me
        
         | bmodel wrote:
         | Great point, there are a few benefits:
         | 
         | 1. If you're actively developing and need a GPU then you
         | typically would be paying the entire time the instance is
         | running. Using Thunder means you only pay for the GPU while
         | actively using it. Essentially, if you are running CPU only
          | code you would not be paying for any GPU time. The alternative
          | is to manually turn the instance on and off, which can be
          | annoying.
         | 
         | 2. This allows you to easily scale the type and number of GPUs
         | you're using. For example, say you want to do development on a
          | cheap T4 instance and run a full DL training job on a set of
          | 8 A100s. Instead of needing to swap instances and set up
          | everything again, you can just run a command and then start
          | running on the more powerful GPUs.
        
           | doctorpangloss wrote:
           | Okay, but your GPUs are in ECS. Don't I just want this
           | feature from Amazon, not you, and natively via Nitro? Or even
           | Google has TPU attachments.
           | 
           | > 1. If you're actively developing and need a GPU [for
           | fractional amounts of time]...
           | 
           | Why would I need a GPU for a short amount of time during
           | development? For testing?
           | 
           | I don't get it - what would testing an H100 over a TCP
           | connection tell me? It's like, yeah, I can do that, but it
           | doesn't represent an environment I am going to use for real.
            | Nobody runs applications against GPUs on buses virtualized
            | over TCP connections, so what exactly would I be validating?
        
             | bmodel wrote:
             | I don't believe Nitro would allow you to access a GPU
             | that's not directly connected to the CPU that the VM is
             | running on. So swapping between GPU type or scaling to
             | multiple GPUs is still a problem.
             | 
             | From the developer perspective, you wouldn't know that the
             | H100 is across a network. The experience will be as if your
             | computer is directly attached to an H100. The benefit here
             | is that if you're not actively using the H100 (such as when
             | you're setting up the instance or after the training job
             | completes) you are not paying for the H100.
        
               | doctorpangloss wrote:
               | Okay, a mock H100 object would also save me money. I
               | could pretend a 3090 is an A100. "The experience would be
               | that a 3090 is an A100." Apples to oranges comparison?
               | It's using a GPU attached to the machine versus a GPU
               | that crosses a VPC boundary. Do you see what I am saying?
               | 
               | I would never run a training job on a GPU virtualized
               | over TCP connection. I would never run a training job
               | that requires 80GB of VRAM on a 24GB VRAM device.
               | 
               | Whom is this for? Who needs to save kopecks on a single
               | GPU who needs H100s?
        
             | teaearlgraycold wrote:
             | I develop GPU accelerated web apps in an EC2 instance with
             | a remote VSCode session. A lot of the time I'm just doing
             | web dev and don't need a GPU. I can save thousands per
             | month by switching to this.
        
               | amelius wrote:
               | Sounds like you can save thousands by just buying a
               | simple GPU card.
        
               | teaearlgraycold wrote:
               | Well, for the time being I'm really just burning AWS
               | credits. But you're right! I do however like that my dev
               | machine is the exact same instance type in the same AWS
               | region as my production instances. If I built an
               | equivalent machine it would have different performance
                | characteristics. Oftentimes the AWS VMs have weird
                | behavior that would otherwise catch me off guard when
                | deploying to the cloud for the first time.
        
       | steelbrain wrote:
        | Ah, this is quite interesting! I had a use case where I needed
        | GPU-over-IP, but only for transcoding videos. I had a not-so-
        | powerful AMD GPU in my homelab server that somehow kept crashing
        | the kernel any time I tried to encode videos with it, and also
        | an NVIDIA RTX 3080 in a gaming machine.
        | 
        | So I wrote https://github.com/steelbrain/ffmpeg-over-ip and had
        | the server running on the Windows machine and the client on the
        | media server (could be Plex, Emby, Jellyfin, etc.), and it
        | worked flawlessly.
        
         | crishoj wrote:
         | Interesting. Do you know if your tool supports conversions
         | resulting in multiple files, such as HLS and its myriad of
         | timeslice files?
        
           | steelbrain wrote:
           | Since it's sharing the underlying file system and just
           | running ffmpeg remotely, it should support any variation of
           | outputs
        
         | bhaney wrote:
         | This is more or less what I was hoping for when I saw the
         | submission title. Was disappointed to see that the submission
         | wasn't actually a useful generic tool but instead a paid cloud
         | service. Of course the real content is in the comments.
         | 
         | As an aside, are there any uses for GPU-over-network other than
         | video encoding? The increased latency seems like it would
         | prohibit anything machine learning related or graphics
         | intensive.
        
           | trws wrote:
           | Some computation tasks can tolerate the latency if they're
           | written with enough overlap and can keep enough of the data
           | resident, but they usually need more performant networking
           | than this. See older efforts like rcuda for remote cuda over
           | infiniband as an example. It's not ideal, but sometimes worth
           | it. Usually the win is in taking a multi-GPU app and giving
           | it 16 or 32 of them rather than a single remote GPU though.
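            | 
            | On the overlap point: the usual trick (a generic CUDA-
            | streams sketch, not specific to any remoting layer) is to
            | stage the next chunk of data while the current one is
            | computing, e.g.:
            | 
            |   #include <cuda_runtime.h>
            | 
            |   __global__ void scale(float *d, int n) {
            |       int i = blockIdx.x * blockDim.x + threadIdx.x;
            |       if (i < n) d[i] *= 2.0f;
            |   }
            | 
            |   /* double-buffer: copy chunk i+1 while chunk i
            |      runs; h should be pinned (cudaMallocHost)
            |      so the async copies really overlap */
            |   void pipeline(const float *h, float *d[2],
            |                 int nchunks, int chunk) {
            |       cudaStream_t s[2];
            |       cudaStreamCreate(&s[0]);
            |       cudaStreamCreate(&s[1]);
            |       for (int i = 0; i < nchunks; i++) {
            |           cudaStream_t st = s[i & 1];
            |           cudaMemcpyAsync(d[i & 1],
            |               h + (size_t)i * chunk,
            |               chunk * sizeof(float),
            |               cudaMemcpyHostToDevice, st);
            |           scale<<<(chunk + 255) / 256, 256, 0, st>>>(
            |               d[i & 1], chunk);
            |       }
            |       cudaDeviceSynchronize();
            |       cudaStreamDestroy(s[0]);
            |       cudaStreamDestroy(s[1]);
            |   }
            | 
            | Over a network link the copies just take much longer, so
            | the compute per chunk has to be correspondingly larger for
            | the overlap to hide them.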
        
           | tommsy64 wrote:
           | There is a GPU-over-network software called Juice [1]. I've
           | used it on AWS for running CPU-intensive workloads that also
           | happen to need some GPU without needing to use a huge GPU
           | instance. I was able to use a small GPU instance, which had
           | just 4 CPU cores, and stream its GPU to one with 128 CPU
           | cores.
           | 
           | I found Juice to work decently for graphical applications too
           | (e.g., games, CAD software). Latency was about what you'd
           | expect for video encode + decode + network: 5-20ms on a LAN
           | if I recall correctly.
           | 
           | [1] - https://github.com/Juice-Labs/Juice-Labs
        
           | Fnoord wrote:
           | I mean, anything you use a GPU/TPU for could benefit.
           | 
           | IPMI and such could use it. Like, for example, Proxmox could
           | use it. Machine learning tasks (like Frigate) and hashcat
            | could also use it. All in theory, of course. Many tasks use
            | VNC right now, or SPICE. The ability to export your GPU in
            | the Unix way over TCP/IP is powerful. Though Node.js would
            | not be the way I'd want it to go.
        
           | lostmsu wrote:
           | How do you use it for video encoding/decoding? Won't the
           | uncompressed video (input for encoding or output of decoding)
           | be too large to transmit over network practically?
        
             | bhaney wrote:
             | Well, the ffmpeg-over-ip tool in the GP does it by just not
             | sending uncompressed video. It's more of an ffmpeg server
             | where the server is implicitly expected to have access to a
             | GPU that the client doesn't have, and only compressed video
             | is being sent back and forth in the form of video streams
             | that would normally be the input and output of ffmpeg. It's
             | not a generic GPU server that tries to push a whole PCI bus
             | over the network, which I personally think is a bit of a
             | fool's errand and doomed to never be particularly useful to
             | existing generic workloads. It would work if you very
             | carefully redesign the workload to not take advantage of a
             | GPU's typical high bandwidth and low latency, but if you
             | have to do that then what's the point of trying to abstract
             | over the device layer? Better to work at a higher level of
             | abstraction where you can optimize for your particular
             | application, rather than a lower level that you can't
             | possibly implement well and then have to completely redo
             | the higher levels anyway to work with it.
        
               | lostmsu wrote:
               | Ah, you mean transcoding scenarios. Like it can't encode
               | my screen capture.
        
           | johnisgood wrote:
           | I am increasingly growing tired of these "cloud" services,
           | paid or not. :/
        
             | adwn wrote:
             | Well, feel free to spend your own time on writing such a
             | tool and releasing it as Open Source. That would be a
             | really cool project! Until then, don't complain that others
             | aren't willing to donate a significant amount of their work
             | to the public.
        
               | rowanG077 wrote:
               | There is a vast gap between walled garden cloud service
               | rent seeking and giving away software as open source. In
               | the olden days you could buy software licenses to run it
               | wherever you wanted.
        
             | mhuffman wrote:
             | I agree. Paying by the month for the rest of your life or
             | they cut you off is not something I am a fan of. I feel
             | sorry for people too young to remember that you could
             | actually buy an app, get free bug updates and get a
             | discount if they made some big changes on a new version
             | that you might (or might not) want. But it was up to you
             | when and where you ran it and it was yours forever. I have
             | heard the arguments for why people enjoy this monthly
             | subscription model, but my counter argument is that people
             | did just fine before without them, so what is so different
             | now? And I mean, in general, not that you need to use a GPU
             | for 1 hour but don't want to buy one. I mean, for example,
             | how Adobe products run on your computer but you rent them
             | forever.
        
         | toomuchtodo wrote:
         | Have you done a Show HN yet? If not, please consider doing so!
         | 
         | https://gist.github.com/tzmartin/88abb7ef63e41e27c2ec9a5ce5d...
         | 
         | https://news.ycombinator.com/showhn.html
         | 
         | https://news.ycombinator.com/item?id=22336638
        
       | cpeterson42 wrote:
       | Given the interest here we decided to open up T4 instances for
       | free. Would love for y'all to try it and let us know your
       | thoughts!
        
         | dheera wrote:
         | What is your A100 and H100 pricing?
        
           | cpeterson42 wrote:
           | We are super early stage and don't have A100s or H100s live
           | yet. Exact pricing TBD but expect it to be low. If you want
           | to use them today, reach out directly and we can set them up
           | :)
        
       | tptacek wrote:
       | This is neat. Were you able to get MIG or vGPUs working with it?
        
         | bmodel wrote:
         | We haven't tested with MIG or vGPU, but I think it would work
         | since it's essentially physically partitioning the GPU.
         | 
         | One of our main goals for the near future is to allow GPU
         | sharing. This would be better than MIG or vGPU since we'd allow
         | users to use the entire GPU memory instead of restricting them
         | to a fraction.
        
           | tptacek wrote:
           | We had a hell of a time dealing with the licensing issues and
           | ultimately just gave up and give people whole GPUs.
           | 
           | What are you doing to reset the GPU to clean state after a
           | run? It's surprisingly complicated to do this securely (we're
           | writing up a back-to-back sequence of audits we did with
           | Atredis and Tetrel; should be publishing in a month or two).
        
             | bmodel wrote:
                | We kill the process to reset the GPU. Since we only store
                | GPU state, that's the only cleanup we need to do.
        
               | tptacek wrote:
               | Hm. Ok. Well, this is all very cool! Congrats on
               | shipping.
        
               | azinman2 wrote:
               | Won't the VRAM still contain old bits?
        
       | kawsper wrote:
       | Cool idea, nice product page!
       | 
       | Does anyone know if this is possible with USB?
       | 
        | I have a DaVinci Resolve license USB dongle I'd like to avoid
        | plugging into my laptop.
        
         | kevmo314 wrote:
         | You can do that with USB/IP: https://usbip.sourceforge.net/
        
       | orsorna wrote:
       | So what exactly is the pricing model? Do I need a quote? Because
       | otherwise I don't see how to determine it without creating an
       | account which is needlessly gatekeeping.
        
         | bmodel wrote:
         | We're still in our beta so it's entirely free for now (we can't
         | promise a bug-free experience)! You have to make an account but
         | it won't require payment details.
         | 
         | Down the line we want to move to a pay-as-you-go model.
        
       | Cieric wrote:
       | This is interesting, but I'm more interested in self-hosting. I
        | already have a lot of GPUs (some running, some not). Does this
       | have a self-hosting option so I can use the GPUs I already have?
        
         | cpeterson42 wrote:
         | We don't support self hosting yet but the same technology
         | should work well here. Many of the same benefits apply in a
         | self-hosted setting, namely efficient workload scheduling, GPU-
         | sharing, and ease-of-use. Definitely open to this possibility
         | in the future!
        
         | covi wrote:
         | If you want to use your own GPUs or cloud accounts but with a
         | great dev experience, see SkyPilot.
        
         | ellis0n wrote:
         | You can rent out your GPUs in the cloud with services like
          | Akash Network and rent GPUs at thundercompute.com... the
          | manager's path, almost like self-hosting :)
        
       | cpeterson42 wrote:
       | We created a discord for the latest updates, bug reports, feature
       | suggestions, and memes. We will try to respond to any issues and
       | suggestions as quickly as we can! Feel free to join here:
       | https://discord.gg/nwuETS9jJK
        
       | throwaway888abc wrote:
        | Does it work for gaming on Windows? Or even Linux?
        
         | cpeterson42 wrote:
         | In theory yes. In practice, however, latency between the CPU
         | and remote GPU makes this impractical
        
         | boxerbk wrote:
         | You could use a remote streaming protocol, like Parsec, for
          | that. You'd need your own cloud account and to connect
          | directly to a GPU-enabled cloud machine, but that would work
          | to let you game.
        
       | rubatuga wrote:
       | What ML packages do you support? In the comments below it says
       | you do not support Vulkan or OpenGL. Does this support AMD GPUs
       | as well?
        
         | bmodel wrote:
          | We have tested this with PyTorch and Hugging Face and it is
          | mostly stable (we know there are issues with PyCUDA and JAX).
          | In theory this should work with any library; however, we're
          | still actively developing this, so bugs will show up.
        
       | the_reader wrote:
        | Would it be possible to mix it with Blender?
        
         | bmodel wrote:
          | At the moment our tech is Linux-only, so it would not work with
          | Blender.
         | 
         | Down the line, we could see this being used for batched render
         | jobs (i.e. to replace a render farm).
        
           | comex wrote:
           | Blender can run on Linux...
        
             | bmodel wrote:
             | Oh nice, I didn't know that! In that case it might work,
             | you could try running `tnr run ./blender` (replace the
             | ./blender with how you'd launch blender from the CLI) to
             | see what happens. We haven't tested it so I can't make
             | promises about performance or stability :)
        
               | chmod775 wrote:
               | _Disclaimer: I only have a passing familiarity with
               | Blender, so I might be wrong on some counts._
               | 
               | I think you'd want to run the blender GUI locally and
               | only call out to a headless rendering server ("render
               | farm") that uses your service under the hood to get the
               | actual render.
               | 
               | This separation is already something blender supports,
               | and you could for instance use Blender on Windows despite
               | your render farm using Linux servers.
               | 
               | Cloud rendering is adjacent to what you're offering, and
               | it should be trivial for you to expand into that space by
               | just figuring out the setup and preparing a guide for
               | users wishing to do that with your service.
        
       | teaearlgraycold wrote:
       | This could be perfect for us. We need very limited bandwidth but
       | have high compute needs.
        
         | bmodel wrote:
         | Awesome, we'd love to chat! You can reach us at
         | founders@thundercompute.com or join the discord
         | https://discord.gg/nwuETS9jJK!
        
         | goku-goku wrote:
         | Feel free to reach out www.juicelabs.co
        
       | bkitano19 wrote:
       | this is nuts
        
         | cpeterson42 wrote:
         | We think so too, big things coming :)
        
         | goku-goku wrote:
         | www.juicelabs.co
        
       | dishsoap wrote:
       | For anyone curious about how this actually works, it looks like a
       | library is injected into your process to hook these functions [1]
       | in order to forward them to the service.
       | 
       | [1] https://pastebin.com/raw/kCYmXr5A
        
         | almostgotcaught wrote:
         | How did you figure out these were hooked? I'm assuming some
         | flag that tells ld/ldd to tell you when some symbol is rebound?
         | Also I thought a symbol has to be a weak symbol to be rebound
         | and assuming nvidia doesn't expose weak symbols (why would
         | they) the implication is that their thing is basically
         | LD_PRELOADed?
        
           | yarpen_z wrote:
           | Yes. While I don't know what they do internally, API remoting
           | has been used for GPUs since at least rCUDA - that's over 10
           | years ago.
           | 
            | The LD_PRELOAD trick allows you to intercept and virtualize
            | calls to the CUDA runtime.
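            | 
            | A toy version of that trick (illustrative only; a real
            | interposer like rCUDA forwards the call to a remote GPU
            | server instead of just logging it):
            | 
            |   /* gcc -shared -fPIC hook.c -o hook.so -ldl
            |      LD_PRELOAD=./hook.so python train.py     */
            |   #define _GNU_SOURCE
            |   #include <dlfcn.h>
            |   #include <stdio.h>
            | 
            |   typedef int (*cuda_malloc_fn)(void **, size_t);
            | 
            |   /* same symbol name as libcudart's cudaMalloc,
            |      so the preloaded copy is resolved first */
            |   int cudaMalloc(void **ptr, size_t size)
            |   {
            |       static cuda_malloc_fn real;
            |       if (!real)
            |           real = (cuda_malloc_fn)
            |               dlsym(RTLD_NEXT, "cudaMalloc");
            |       fprintf(stderr, "cudaMalloc(%zu)\n", size);
            |       return real(ptr, size); /* or forward it */
            |   }
            | 
            | No weak symbols needed: the dynamic linker simply resolves
            | the symbol from the preloaded library before libcudart.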
        
         | the8472 wrote:
         | Ah, I assumed/hoped they had some magic that would manage to
         | forward a whole PCIe device.
        
       | Zambyte wrote:
       | Reminds me of Plan9 :)
        
         | K0IN wrote:
         | can you elaborate a bit on why? (noob here)
        
           | Zambyte wrote:
           | In Plan 9 everything is a file (for real this time). Remote
           | file systems are accessible through the 9P protocol (still
           | used in modern systems! I know it's used in QEMU and WSL).
           | Every process has its own view of the filesystem called a
           | namespace. The implication of these three features is that
           | remote resources can be transparently accessed as local
           | resources by applications.
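            | 
            | For a concrete taste of that on a modern system (a minimal
            | sketch; the server address and mount point are made up),
            | Linux's v9fs client will happily mount a 9P export over TCP:
            | 
            |   #include <stdio.h>
            |   #include <sys/mount.h>
            | 
            |   int main(void)
            |   {
            |       /* hypothetical 9P server at 10.0.0.2 */
            |       if (mount("10.0.0.2", "/mnt/nine", "9p", 0,
            |                 "trans=tcp,port=564") != 0)
            |           perror("mount 9p");
            |       return 0;
            |   }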
        
       | radarsat1 wrote:
       | I'm confused, if this operates at the CPU/GPU boundary doesn't it
       | create a massive I/O bottleneck for any dataset that doesn't fit
       | into VRAM? I'm probably misunderstanding how it works but if it
       | intercepts GPU i/o then it must stream your entire dataset on
       | every epoch to a remote machine, which sounds wasteful, probably
       | I'm not getting this right.
        
         | bmodel wrote:
         | That understanding of the system is correct. To make it
         | practical we've implemented a bunch of optimizations to
         | minimize I/O cost. You can see how it performs on inference
         | with BERT here: https://youtu.be/qsOBFQZtsFM?t=69.
         | 
         | The overheads are larger for training compared to inference,
         | and we are implementing more optimizations to approach native
         | performance.
        
           | radarsat1 wrote:
           | Aah ok thanks, that was my basic misunderstanding, my mind
           | just jumped straight to my current training needs but for
           | inference it makes a lot of sense. Thanks for the
           | clarification.
        
           | semitones wrote:
           | > to approach native performance.
           | 
           | The same way one "approaches the sun" when they take the
           | stairs?
        
             | the8472 wrote:
             | I guess there's non-negligible optimization potential, e.g.
             | by doing hash-based caching. If the same data gets uploaded
             | twice they can have the blob already sitting somewhere
             | closer to the machine.
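              | 
              | Roughly like this (sketch only; server_has and send_blob
              | are hypothetical helpers): key each upload by a content
              | hash and skip the transfer when the remote side already
              | has the bytes.
              | 
              |   #include <stddef.h>
              |   #include <stdint.h>
              | 
              |   /* FNV-1a, standing in for a real hash */
              |   static uint64_t blob_hash(const void *p,
              |                             size_t n)
              |   {
              |       const unsigned char *b = p;
              |       uint64_t h = 1469598103934665603ULL;
              |       while (n--) {
              |           h ^= *b++;
              |           h *= 1099511628211ULL;
              |       }
              |       return h;
              |   }
              | 
              |   /* hypothetical transport helpers */
              |   int  server_has(int sock, uint64_t h);
              |   void send_blob(int sock, uint64_t h,
              |                  const void *d, size_t n);
              | 
              |   void upload(int sock, const void *d,
              |               size_t n)
              |   {
              |       uint64_t h = blob_hash(d, n);
              |       if (!server_has(sock, h))
              |           send_blob(sock, h, d, n);
              |   }
              | 
              | For an epoch-style training loop that streams the same
              | files repeatedly, that would turn the second and later
              | epochs into hash lookups.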
        
             | ruined wrote:
             | yes. my building has stairs and i find them useful because
             | usually i don't need to go to the sun
        
           | ranger_danger wrote:
           | Is DirectX support possible any time soon? This would be
           | _huge_ for Windows VMs on Linux...
        
             | zozbot234 wrote:
             | You could use unofficial Windows drivers for virtio-gpu,
             | that are specifically intended for VM use.
        
               | ranger_danger wrote:
               | Indeed, it's just that it's not very feature-complete or
               | stable yet.
        
       | winecamera wrote:
       | I saw that in the tnr CLI, there are hints of an option to self-
       | host a GPU. Is this going to be a released feature?
        
         | cpeterson42 wrote:
         | We don't support self-hosting yet but are considering adding it
         | in the future. We're a small team working as hard as we can :)
         | 
         | Curious where you see this in the CLI, may be an oversight on
         | our part. If you can join the Discord and point us to this bug
         | we would really appreciate it!
        
       | test20240809 wrote:
       | pocl (Portable Computing Language) [1] provides a remote backend
       | [2] that allows for serialization and forwarding of OpenCL
       | commands over a network.
       | 
       | Another solution is qCUDA [3] which is more specialized towards
       | CUDA.
       | 
       | In addition to these solutions, various virtualization solutions
       | today provide some sort of serialization mechanism for GPU
       | commands, so they can be transferred to another host (or
       | process). [4]
       | 
       | One example is the QEMU-based Android Emulator. It is using
       | special translator libraries and a "QEMU Pipe" to efficiently
       | communicate GPU commands from the virtualized Android OS to the
       | host OS [5].
       | 
       | The new Cuttlefish Android emulator [6] uses Gallium3D for
       | transport and the virglrenderer library [7].
       | 
       | I'd expect that the current virtio-gpu implementation in QEMU [8]
        | might make this job even easier, because it includes Android's
        | gfxstream [9] (formerly called "Vulkan Cereal"), which
       | should already support communication over network sockets out of
       | the box.
       | 
       | [1] https://github.com/pocl/pocl
       | 
       | [2] https://portablecl.org/docs/html/remote.html
       | 
       | [3] https://github.com/coldfunction/qCUDA
       | 
       | [4] https://www.linaro.org/blog/a-closer-look-at-virtio-and-
       | gpu-...
       | 
       | [5]
       | https://android.googlesource.com/platform/external/qemu/+/em...
       | 
       | [6] https://source.android.com/docs/devices/cuttlefish/gpu
       | 
       | [7]
       | https://cs.android.com/android/platform/superproject/main/+/...
       | 
       | [8] https://www.qemu.org/docs/master/system/devices/virtio-
       | gpu.h...
       | 
       | [9]
       | https://android.googlesource.com/platform/hardware/google/gf...
        
         | fpoling wrote:
         | Zscaler uses a similar approach in their remote browser. WebGL
         | in the local browser exposed as a GPU to a Chromium instance in
         | the cloud.
        
       | mmsc wrote:
       | What's it like to actually use this for any meaningful
       | throughput? Can this be used for hash cracking? Every time I
       | think about virtual GPUs over a network, I think about botnets.
       | Specifically from
       | https://www.hpcwire.com/2012/12/06/gpu_monster_shreds_passwo...
       | "Gosney first had to convince Mosix co-creator Professor Amnon
       | Barak that he was not going to "turn the world into a giant
       | botnet.""
        
         | cpeterson42 wrote:
         | This is definitely an interesting thought experiment, however
         | in practice our system is closer to AWS than a botnet, as the
          | GPUs are not distributed. This technology does lend itself to
          | some interesting applications we are exploring, such as
          | creating very flexible clusters within data centers.
        
       | m3kw9 wrote:
        | So won't that make the network the prohibitive bottleneck? Your
        | memory bandwidth is 1 Gbps max.
        
         | teaearlgraycold wrote:
         | Cloud hosts will offer 10Gb/s. Anyway, in my experience with
         | training LoRAs and running DINOv2 inference you don't need much
         | bandwidth. We are usually sitting at around 10-30MB/s per GPU.
        
       | userbinator wrote:
       | It's impressive that this is even possible, but I wonder what
       | happens if the network connection goes down or is anything but
       | 100% stable? In my experience drivers react badly to even a local
       | GPU that isn't behaving.
        
       | tamimio wrote:
        | I'm more interested in using tools like hashcat. Any benchmarks
        | on these? The docs link returns an error.
        
       | delijati wrote:
        | Even a directly attached eGPU via Thunderbolt 4 was, after some
        | time, too slow for machine learning, aka training. As I now work
        | fully remote, I just have a beefy midi tower. Some context about
        | eGPUs [1].
        | 
        | But hey, I'm happy to be proven wrong ;)
       | 
       | [1] https://news.ycombinator.com/item?id=38890182#38905888
        
       | xyst wrote:
       | Exciting. But would definitely like to see a self hosted option.
        
       | ellis0n wrote:
        | In 2008, I had a powerful server with a Xeon CPU, but the
       | motherboard had no slots for a graphics card. I also had a
       | computer with a powerful graphics card but a weak Core 2 Duo. I
       | had the idea of passing the graphics card over the network using
       | Linux drivers. This concept has now been realized in this
       | project. Good job!
        
       | somat wrote:
        | What makes me sad is that the original SGI engineers who
        | developed GLX were very careful to use X11 mechanisms for the
        | GPU transport, so it was fairly trivial to send the GL stream
        | over the network to render on your graphics card. "Run on the
        | supercomputer down the hall, render on your workstation." More
        | recent driver development has not shown such care, and this is
        | usually no longer possible.
        | 
        | I am not sure how useful it was in reality (usually if you had a
        | nice graphics card you also had a nice CPU), but I had fun
        | playing around with it. There was something fascinating about
        | getting accelerated graphics on a program running in the machine
        | room. I was able to get GLQuake running like this once.
        
       ___________________________________________________________________
       (page generated 2024-08-10 23:01 UTC)