[HN Gopher] Show HN: vGPU and SR-IOV on Consumer GPUs
       ___________________________________________________________________
        
       Show HN: vGPU and SR-IOV on Consumer GPUs
        
       Author : ArcVRArthur
       Score  : 105 points
       Date   : 2021-10-21 13:55 UTC (9 hours ago)
        
 (HTM) web link (arccompute.com)
 (TXT) w3m dump (arccompute.com)
        
       | AaronFriel wrote:
       | This article almost got my hopes up too much! I'm curious whether
       | Intel's "Arc" GPUs will support the same or if they'll go down the
       | same path as Nvidia in locking down virtual function support.
        
       | COGlory wrote:
       | Very impressive. You should also post this to the Level1Techs
       | forum
        
         | ArcVRArthur wrote:
         | Wendell tweeted about LibVF.IO and it made my whole month haha:
         | https://twitter.com/tekwendell/status/1449054328766013440
         | 
         | Also I posted in the Level1Techs forum!
         | https://forum.level1techs.com/t/libvf-io-a-commodity-gpu-mul...
        
       | ArtWomb wrote:
       | Whoa! Multi GPU is so hard. This could be the start of something.
       | Looks like it makes use of nvidia's capture api?
        
       | gh123man wrote:
       | This is insanely impressive. Having tried to set up GPU passthrough
       | in proxmox a few years ago, it was an absolute disaster. I would
       | love to see this kind of approach more widely supported by other
       | hypervisors!
       | 
       | It's a real shame consumer GPUs are arbitrarily locked down when
       | the enterprise counterparts (often with the exact same chip) have
       | much better support for virtualization.
        
         | kfprt wrote:
         | The enterprise versions of these products have a lot of bugs
         | and gotchas because so few use the feature.
        
           | ArcVRArthur wrote:
           | I'm making this because I had the same problem. My hope is
           | that we can move the needle a little bit.
        
             | kfprt wrote:
             | As far as I'm concerned anyone that's working on
             | virtualized GPUs is doing the Lord's work.
        
         | ArcVRArthur wrote:
         | Ya the lock out is absolutely arbitrary. There is zero physical
         | difference between the consumer and server chips for these
         | features. I think actually there's a lot of benefit to
         | consumers by having these features enabled! I talk about that a
         | bit here in our Xorg Developer Conference 2021 talk:
         | https://www.youtube.com/watch?v=8pVrTyLqV_I
         | 
         | We're going to try to add support for more distributions in the
         | coming days.
         | 
         | Right now we've got support in our install script for Ubuntu
         | 20.04 hosts and arbitrary guest operating systems (Windows
         | guests work best so far) but if people on GitHub are posting
         | issues asking for support for other systems I'll try my best to
         | get to those.
         | 
         | I'm going to try to add official support for Arch, PopOS, and
         | Fedora as I know some people who I think would use it on those
         | systems and a few others.
        
           | kipchak wrote:
           | Is the process of unlocking these features on Nvidia GPUs
           | similar to what something like the vgpu_unlock tool is doing?[1]
           | No affiliation, just came across it trying to find a
           | replacement for the deprecated RemoteFX vGPU and am out of my
           | depth.
           | 
           | [1]https://github.com/DualCoder/vgpu_unlock
        
             | my123 wrote:
             | For replacing RemoteFX vGPU, what you might want is
             | https://forum.level1techs.com/t/2-gamers-1-gpu-with-hyper-v-...
             | 
             | (which is the direct successor)
             | 
             | The advantage is that it ships inbox in Windows and doesn't
             | need license hacks or anything. It works cross-vendor too.
             | 
             | However, it needs the host OS to be Windows (with Hyper-V
             | being used).
        
               | kipchak wrote:
               | Interesting! News to me, the Microsoft documentation I
               | was looking at didn't make any mention of GPU-P but that
               | seems like a perfect fit. Was looking at an old Grid K2 to
               | avoid Nvidia licensing and direct pass through for high
               | use VMs but as we're full Microsoft (for better or worse)
               | their solution probably makes more sense.
               | 
               | edit - apparently while this works on Server 2019 direct
               | pass through is the only officially "supported" option. I
               | wonder if this is a stepping on partners' toes sort of
               | situation?
        
               | my123 wrote:
               | GPU-P is also the GPU acceleration infrastructure used
               | for GPU acceleration in WSL2 and for Windows Sandbox too.
        
               | ArcVRArthur wrote:
               | I'm not entirely sure on how wide the vendor support is
               | but I wouldn't be entirely surprised if they were
               | upsetting some folks with it. Happily we're just a little
               | team right now making stuff that's useful for ourselves
               | so we don't have all the same pressures big companies
               | have. I hope if we ever grow we'll act in the same spirit
               | (I'll try my best to see that we do anyway).
        
               | ArcVRArthur wrote:
               | I think Hyper-V's GPU-P is great! I think Microsoft's
               | hypervisor team is one of the most talented in the world,
               | and honestly I think I could learn a lot from them.
               | 
               | One of the benefits to using our approach instead of
               | Microsoft's is that our tools are free open source
               | software and (my biased opinion) I think we have an
               | easier user setup. :)
               | 
               | Some day I would love to read Hyper-V's GPU-P code as I
               | think they did a rather good job overall.
        
             | ArcVRArthur wrote:
             | vGPU_Unlock's Merged driver is an optional package you can
             | include but if you don't want to use it there's no explicit
             | dependency. We actually enable these features using a
             | vendor neutral API called VFIO-Mdev:
             | 
             | https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/drive...
             | 
             | Here's a few examples of YAML for use with different GPU
             | vendors:
             | 
             | Intel: https://github.com/Arc-Compute/libvf.io/blob/master/example/...
             | Nvidia: https://github.com/Arc-Compute/libvf.io/blob/master/example/...
             | AMD: https://github.com/Arc-Compute/libvf.io/blob/master/example/...
             | 
             | The odd one out is AMD that uses a different API due to the
             | fact that the vendor has largely ignored standard open
             | source interfaces in the kernel. We're still supporting
             | that API but unfortunately there are very few AMD cards
             | that work due to the fact that they refuse to release open
             | source code to support their newer cards and they have
             | locked out these features at the firmware level on consumer
             | cards. Fortunately Nvidia and Intel GPUs are very well
             | suited to this functionality and we've got support for most
             | recent consumer cards from both!
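
For readers who want to see what the VFIO-Mdev interface mentioned above exposes, here is a minimal sketch that lists the mediated device types a host advertises through sysfs. It assumes the standard kernel layout under /sys/class/mdev_bus/<parent>/mdev_supported_types/<type>/ with `name` and `available_instances` attributes. The function name `list_mdev_types` and the `sysfs_root` parameter are hypothetical and exist only so the sketch can be pointed at a test directory. This is not LibVF.IO's own code.

```python
from pathlib import Path


def list_mdev_types(sysfs_root="/sys/class/mdev_bus"):
    """Return {parent_device: [(type_id, name, available_instances), ...]}.

    Hypothetical helper: walks the mdev sysfs tree and reads the per-type
    attributes that the kernel's mediated device framework exposes.
    """
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result  # no mdev-capable driver is loaded
    for parent in sorted(root.iterdir()):
        types_dir = parent / "mdev_supported_types"
        if not types_dir.is_dir():
            continue
        entries = []
        for type_dir in sorted(types_dir.iterdir()):
            name_file = type_dir / "name"
            avail_file = type_dir / "available_instances"
            # "name" is optional in the mdev ABI, so fall back to "".
            name = name_file.read_text().strip() if name_file.is_file() else ""
            avail = int(avail_file.read_text()) if avail_file.is_file() else 0
            entries.append((type_dir.name, name, avail))
        result[parent.name] = entries
    return result


if __name__ == "__main__":
    for parent, types in list_mdev_types().items():
        for type_id, name, avail in types:
            print(f"{parent}: {type_id} ({name}), {avail} instance(s) available")
```

On a host without an mdev-capable driver loaded the script prints nothing, which is itself a quick way to tell whether the vendor driver has registered any virtual function types.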
        
       | liuliu wrote:
       | Great! This still requires vGPU support, and the merged driver
       | approach, last time I tried it, wouldn't support CUDA on the host (I
       | was probably the first one who tried the merged driver thing with
       | vgpu_unlock?).
       | 
       | Looking forward to someone writing a Vulkan driver on Windows that
       | just shuttles down to the Linux host. virgl used to be a
       | promising project ...
        
         | ArcVRArthur wrote:
         | CUDA does work on the host in our testing. :) We also run
         | Vulkan/DirectX/OpenGL at full performance in the guest! It's
         | WAY faster than Virgl.
        
       | awesnvadsome wrote:
       | This seems awesome! I have a passthrough setup with a very old
       | card and a much newer one for games in a windows vm, it'll be
       | nice to look into getting this set up and I can reduce the power
       | draw on my system which was causing some problems...
       | 
       | Is this something that could work with/be integrated with libvirt
       | for easy configuration? It'd be neat to set it up with my current
       | install, although not at all a real problem.
        
         | ArcVRArthur wrote:
         | We actually are a replacement for Libvirt that tries to
         | simplify a few things. Take a look at LibVF.IO's user API. I
         | tried to make it a bit more human friendly, like Docker:
         | https://github.com/Arc-Compute/libvf.io/blob/master/example/...
        
           | awesnvadsome wrote:
           | Neat! Certainly more readable than the massive xml file!
           | Thanks for the hard work!
           | 
           | E: The code generating the qemu commands is also quite
           | readable, so I think migrating my install will be quite
           | doable, and if need be I can add args manually.
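
To make the idea of generating a QEMU command line with manually added args concrete, here is a rough sketch of how such an argument list could be assembled for a guest that receives a mediated device. It is an illustration only and not the code in the libvf.io repository. The function name, its parameters, and the qcow2/virtio disk choice are assumptions. The QEMU options themselves (`-enable-kvm`, `-machine`, `-cpu`, `-smp`, `-m`, `-drive`, and `-device vfio-pci,sysfsdev=...`) are standard.

```python
def qemu_args_for_mdev(mdev_uuid, disk_image, memory_mb=8192, cpus=4,
                       extra_args=None):
    """Build an argv list for qemu-system-x86_64 with one mdev attached.

    Hypothetical helper: mdev_uuid is the UUID of an already created
    mediated device, disk_image is assumed to be a qcow2 file.
    """
    sysfsdev = f"/sys/bus/mdev/devices/{mdev_uuid}"
    args = [
        "qemu-system-x86_64",
        "-enable-kvm",
        "-machine", "q35",
        "-cpu", "host",
        "-smp", str(cpus),
        "-m", str(memory_mb),  # QEMU reads a bare number as MiB
        "-drive", f"file={disk_image},format=qcow2,if=virtio",
        "-device", f"vfio-pci,sysfsdev={sysfsdev}",
    ]
    # Manually supplied arguments go last so they are easy to spot.
    args.extend(extra_args or [])
    return args


if __name__ == "__main__":
    argv = qemu_args_for_mdev(
        "00000000-0000-0000-0000-000000000000",  # placeholder UUID
        "guest.qcow2",
        extra_args=["-nographic"],
    )
    print(" ".join(argv))
```

Returning a list rather than a shell string means the result can be handed straight to `subprocess.run` without quoting concerns.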
        
       | uberduper wrote:
       | I'm familiar with linux virtualization, gpu passthrough, etc.
       | I've never heard of arcd and they've made no attempt in this doc
       | or on their git to explain what it is or why it exists as, I
       | assume, a replacement (or wrapper?) for qemu.
       | 
       | My past experience with looking-glass is that it falls on its
       | face at anything > 1440p@60Hz. I'm interested in vGPU for my
       | linux VMs (spice is slow and sdl/gtk display is flakey) but for
       | gaming, I don't want looking-glass and prefer to just do the
       | passthrough thing with a KVM switch.
        
         | posix_me_less wrote:
         | > sdl/gtk display is flakey
         | 
         | How is it compared to spice/qxl - does it have some advantages?
        
           | uberduper wrote:
           | In my case with 4k monitors and virgl, it's substantially
           | faster and a desktop env feels native. Spice feels quite
           | laggy to me.
           | 
           | The downside is that the sdl or gtk interface is coupled to
           | the running vm and if they crash, the vm goes too. And in my
           | experience, they crash a lot. I've been unable to get a two
           | monitor setup working with either the sdl or gtk display but
           | it works fine via spice.
        
         | ArcVRArthur wrote:
         | arcd is actually the part of the program that you call in your
         | shell to use our library's functions. Arc Compute is the name
         | of our company so arcd seemed like a reasonable name. We also
         | considered calling it vfd but so far we've settled on arcd.
        
         | ArcVRArthur wrote:
         | We are still trying to make the documentation better. We're
         | actually hiring right now so if you can recommend someone who
         | might be able to help with that I'd love to talk to them. :)
         | 
         | Also looking-glass is an option under introspection: in the
         | yaml. You can use whatever type of virtual display you like
         | best. We chose looking glass as the default because in our
         | testing it was the most performant.
        
         | xt00 wrote:
         | Yea what / where is arcd? is that a closed source portion of
         | this stack that this depends upon? not super clear what is
         | really open source about this stuff -- this looks like a pile
         | of scripts to run arcd?
         | 
         | edit: ok thanks so it's qemu-system-x86_64
         | 
         | Have you guys tried to run something like android in the vm on
         | an arm host? or what are the limitations there?
        
           | ArcVRArthur wrote:
           | All the source code for arcd is here: https://github.com/Arc-
           | Compute/libvf.io/tree/master/src
           | 
           | :)
        
       | ncmncm wrote:
       | They had me until YAML.
        
         | anentropic wrote:
         | yes, but interestingly it seems to be https://nimyaml.org/ :)
        
       | belval wrote:
       | Just to clarify because I've failed at doing pretty much the
       | setup that you are describing with my 1080Ti. Does this still
       | require the vgpu_unlock changes for Nvidia cards or is this
       | something that bypasses the need for it entirely?
        
         | ArcVRArthur wrote:
         | My main development rig actually uses a 1080Ti!
         | 
         | The normal vGPU_Unlock steps aren't a part of the user setup
         | process, rather the optional merged driver package is built
         | with a C version of the code that's been prepackaged. If you
         | decide to use the vGPU_Unlock merged driver option during setup
         | you won't have to do the (somewhat intense) process of
         | vGPU_Unlock setup, you'll just skip straight to the end result.
         | :)
        
       | omreaderhn wrote:
       | This looks really nice. Does this work with Ampere GPUs?
        
         | ArcVRArthur wrote:
         | We're supporting Ampere in our own VPS cloud (arccompute.com)
         | but there are some hurdles on getting it working on consumer
         | Ampere gear in the nv merged driver due to changes in the use
         | of SR-IOV APIs - that is to say Nvidia has now enabled SR-IOV
         | whereas before they were doing their entirely different thing
         | to achieve similar functionality. We've got SR-IOV APIs working
         | with our use of mdevType: "sriovdev" in our yaml configuration
         | layer so from our perspective it's already possible to use but
         | unfortunately until the nv merged drivers work with Ampere we
         | won't be able to do a lot on our end. Right now I think the
         | vGPU_Unlock guys are working on the 3090 and we'll be sure to
         | support it once it's ready. I'll also try to remember to post
         | about it on GitHub once I've tested it out for myself and
         | confirmed our software works with it.
         | 
         | By the way, we're hiring GPU driver engineers and kernel
         | hackers!
         | 
         | Please reach out to me if you know anyone who might be
         | interested. My email is: arthur@arccompute.com
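
For context on the SR-IOV path described in the comment above: on Linux, a PCI device that exposes the SR-IOV capability gets `sriov_totalvfs` and `sriov_numvfs` attributes in sysfs. The sketch below only reads them, to check whether a given card advertises virtual functions at all. The function name and the `sysfs_root` parameter are hypothetical, and the PCI address in the usage line is a placeholder. How LibVF.IO's `mdevType: "sriovdev"` setting uses this interface internally is not shown in the thread, so nothing here should be read as its implementation.

```python
from pathlib import Path


def sriov_vf_counts(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return (enabled_vfs, total_vfs), or None if SR-IOV is not exposed.

    Hypothetical helper: pci_address is a string such as "0000:01:00.0".
    """
    dev = Path(sysfs_root) / pci_address
    total_file = dev / "sriov_totalvfs"
    if not total_file.is_file():
        return None  # device (or its driver) does not expose SR-IOV
    total = int(total_file.read_text())
    num_file = dev / "sriov_numvfs"
    enabled = int(num_file.read_text()) if num_file.is_file() else 0
    return enabled, total


if __name__ == "__main__":
    print(sriov_vf_counts("0000:01:00.0"))  # placeholder address
```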
        
           | rirze wrote:
           | Super thrilled to hear you guys working on the 3090! I'll be
           | keeping an eye out for any progress
        
       | csdvrx wrote:
       | That's very impressive!
       | 
       | You may want to do the same for NVMe: creating several namespaces
       | is not supported on most consumer drives, while laptops can
       | rarely have more than 1 NVMe (same problem as with the GPUs: a
       | passthrough requires having 2 of them)
       | 
       | Being able to split the NVMe drive not by partition but by
       | namespace would let each OS see a "full drive".
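
Whether a particular drive can create extra namespaces is reported in its Identify Controller data: bit 3 of the OACS field indicates namespace management support, and NN is the maximum number of namespaces. A small sketch for checking this follows. It assumes the input is the JSON printed by `nvme id-ctrl <device> --output-format=json` from nvme-cli and that the fields appear under the keys "oacs" and "nn". The function name is hypothetical.

```python
import json

OACS_NAMESPACE_MGMT = 1 << 3  # OACS bit 3: namespace management supported


def namespace_support(id_ctrl_json):
    """Return (supports_namespace_management, max_namespaces).

    Hypothetical helper: id_ctrl_json is the JSON text of an Identify
    Controller dump, assumed to contain integer "oacs" and "nn" keys.
    """
    data = json.loads(id_ctrl_json)
    supports = bool(int(data["oacs"]) & OACS_NAMESPACE_MGMT)
    return supports, int(data["nn"])


if __name__ == "__main__":
    sample = '{"oacs": 23, "nn": 1}'  # made-up values for illustration
    print(namespace_support(sample))
```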
        
       | DiabloD3 wrote:
       | I tried doing this years ago, and never quite got it to work.
       | 
       | Some of the software involved in that article simply didn't exist
       | yet, and GPUs weren't shipping with SR-IOV support yet (instead,
       | I did Intel iGPU for Linux fbcon, real AMD GPU fed directly to
       | the Windows VM with PCI-E Passthrough). In the end, I bailed on
       | that dream and moved the Linux install to its own smaller
       | machine, and ran Windows bare on the big machine.
       | 
       | The problem was, if the GPU locked up hard, and GPUs back then
       | would not respond to PCI Device Reset, if it wasn't something
       | that merely re-initializing it on VM restart would fix... I had
       | to restart the _entire_ machine, thus defeating the purpose of
       | having Windows in the VM in the first place!
       | 
       | All my long-lived processes now run on the stand-alone Linux
       | machine, and anything that is free to explode runs on my Windows
       | machine. Windows gets wonky? Restart, ssh back into my screen
       | sessions, reopen the browser, restart a bunch of cloud slaved
       | apps, tada.
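
On the reset problem described above: the Linux kernel creates a `reset` file under a PCI device's sysfs directory only when it has found some reset method for that device, so its presence is a quick first check before committing a card to passthrough. The sketch below performs just that check. The function name and `sysfs_root` parameter are hypothetical and the address is a placeholder. Note that the file's presence does not guarantee a hung GPU will actually recover when reset.

```python
from pathlib import Path


def has_reset_method(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return True if the kernel exposes a reset attribute for the device.

    Hypothetical helper: pci_address is a string such as "0000:01:00.0".
    Writing "1" to the reset file (as root) would trigger the reset; this
    function only checks that the file exists.
    """
    return (Path(sysfs_root) / pci_address / "reset").exists()


if __name__ == "__main__":
    print(has_reset_method("0000:01:00.0"))  # placeholder address
```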
        
         | ArcVRArthur wrote:
         | I got into writing this actually because graphics sharing was
         | such a bad experience that I would usually run a Windows host
         | and then Linux guests in something like VirtualBox or VMWare
         | Workstation. I always wanted a host with fault tolerant guests
         | but GPU sharing features seemed to be locked to "enterprise
         | GPUs" and it wouldn't work on my consumer GPU. This solves some
         | of those problems I was having and lets me run Linux on my host
         | with a full performance Windows guest (at least for me it
         | does).
        
       | kfprt wrote:
       | Note Intel GVT-g is only available on <=Gen9 GPUs. Gen11 and 12
       | are not supported.
        
         | ArcVRArthur wrote:
         | It also works on Intel's ARC and DG1/DG2 discrete GPUs.
         | 
         | Our software also works on most consumer Nvidia cards and also
         | some consumer AMD cards.
        
           | kfprt wrote:
           | Do you have any specific knowledge of Intel's support for
           | graphics virtualization on ARC GPUs?
           | 
           | What are the security implications of LibVF and would it be
           | compatible with the QubesOS security model?
        
             | ArcVRArthur wrote:
             | Ya, they are enabling GVT-g on all their GPUs going forward
             | including the ARC branded devices (not just the embedded
             | GPUs with Xe branding). They have no plans to lock out
             | those features on consumer devices. In fact I bought a $700
             | computer from Best Buy that had an intel discrete GPU in it
             | for development. :)
             | 
             | Ya, I have been talking with the Qubes guys. I'm trying my
             | best to figure out how I can help them ship something with
             | LibVF.IO but they are currently based on Xen which uses
             | Libxl rather than VFIO which may present some challenges.
        
         | orangepurple wrote:
         | GVT-G has hard resolution limitations and was never stable with
         | QEMU on Linux
        
           | ArcVRArthur wrote:
           | GVT-g uses KVMgt. Intel has not yet upstreamed their patches.
        
       | rektide wrote:
       | Really hoping AMD eventually does the right thing here. Not that
       | it particularly matters seeing as how decent AMD video cards have
       | been unpurchaseable for 18 months now.
       | 
       | Consumers should have the ability to use their hardware well too.
       | Selling the same thing at 2X the price differentiating only on
       | virtualization capabilities is not a moral path.
       | 
       | > _We remain hopeful that AMD will recognize forthcoming changes
       | in GPU virtualization with the creation of open standards such as
       | Auxiliary Domains (AUX Domains), Mdev (VFIO-Mdev developed by
       | Nvidia, RedHat, and Intel), and Alternative Routing-ID
       | Interpretation (ARI) especially in light of Intel's market
       | entrance with their ARC line of GPUs supporting Intel Graphics
       | Virtualization Technology (GVT-g)._
       | 
       | Really cool to hear there are a bunch of vGPU-related efforts
       | underway! That's so great.
        
         | ArcVRArthur wrote:
         | I have to pinch myself from time to time to remind myself I'm
         | not dreaming. I honestly feel like I'm working on the coolest
         | project in the world with some of the smartest people I've ever
         | met. If you know anyone who wants to help us work on open
         | source vGPU stuff please reach out to me at
         | arthur@arccompute.com.
         | 
         | We're hiring!!!
        
       ___________________________________________________________________
       (page generated 2021-10-21 23:02 UTC)