[HN Gopher] NVIDIA Transitions Fully Towards Open-Source Linux G...
___________________________________________________________________
NVIDIA Transitions Fully Towards Open-Source Linux GPU Kernel
Modules
Author : shaicoleman
Score : 821 points
Date : 2024-07-17 18:40 UTC (1 day ago)
(HTM) web link (developer.nvidia.com)
(TXT) w3m dump (developer.nvidia.com)
| hypeatei wrote:
| How is the NVIDIA driver situation on Linux these days? I built a
| new desktop with an AMD GPU since I didn't want to deal with all
| the weirdness of closed source or lacking/obsolete open source
| drivers.
| tadasv wrote:
| great. rtx 4090 works out of the box after installing drivers
| from non-free. That's on debian bookworm.
| jppittma wrote:
| 4070 worked out of the box on my arch system. I used the closed
| source drivers and X11 and I've not encountered a single
| problem.
|
| My prediction is that it will continue to improve if only
| because people want to run nvidia on workstations.
| jcranmer wrote:
| I built my new-ish computer with an AMD GPU because I trusted
| in-kernel drivers better than out-of-kernel DKMS drivers.
|
| That said, my previous experience with the DKMS driver stuff
| hasn't been bad. If you use Nvidia's proprietary driver stack,
| then things should generally be fine. The worst issues are that
| Nvidia has (historically, at least; it might be different for
| newer cards) refused to implement some graphics features that
| everybody else uses, which means that you basically need
| entirely separate codepaths for Nvidia in window managers, and
| some of them have basically said "fuck no" to doing that.
| mepian wrote:
| The current stable proprietary driver is a nightmare on Wayland
| with my 3070, constant flickering and stuttering everywhere.
 | Apparently the upcoming version 555 is much better; I'm
 | sticking with X11 until it comes out. I haven't tried the
 | open-source one yet, and I'm not sure if it supports my GPU at
 | all.
| llmblockchain wrote:
| I have a 3070 on X and it has been great.
| levkk wrote:
 | Same setup here. Multiple displays don't work well for me.
 | One of the displays often isn't detected after resuming
 | from the screen saver.
| llmblockchain wrote:
| I have two monitors connected to the 3070 and it works
 | well. The only issue I had was suspending: the GPU would
 | "fall off the bus" and not get its power back when the PC
 | woke up. I had to add the kernel parameter "pcie_aspm=off"
 | to prevent the GPU from falling asleep.
|
| So... not perfect, but it works.
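 |
 | For reference, a minimal sketch of how such a kernel
 | parameter is typically added on a GRUB-based distro like
 | Debian/Ubuntu (the pcie_aspm=off value is the one from
 | above; the exact file and update command may differ per
 | distro):
 |
 |     # /etc/default/grub - append the parameter to the default cmdline
 |     GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pcie_aspm=off"
 |
 |     # then regenerate the GRUB config and reboot
 |     sudo update-grub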
| josephg wrote:
| Huh. I'm using 2 monitors connected to a 4090 on Linux
| mint - which is still using X11. It works flawlessly,
| including DPI scaling. Wake from sleep is fine too.
|
| I haven't tried wayland yet. Sounds like it might be time
| soon given other comments in this thread.
| misterbishop wrote:
| this is resolved in 555 (currently running 555.58.02). my
| asus zephyrus g15 w/ 3060 is looking real good on Fedora 40.
| there's still optimizations needed around clocking, power,
| and thermals. but the graphics presentation layer has no
| issues on wayland. that's with hybrid/optimus/prime
| switching, which has NEVER worked seamlessly for me on any
| laptop on linux going back to 2010. gnome window animations
| remain snappy and not glitchy while running a game. i'm
| getting 60fps+ running baldurs gate 3 @ 1440p on the low
| preset.
| robviren wrote:
 | Had a similar experience with my Legion 5i 3070 with Wayland
 | and Nvidia 555, but my HDMI out is all screwed up now, of
 | course - it was working on 550. One step forward and one
 | step back.
| misterbishop wrote:
| is there a mux switch?
| bcrescimanno wrote:
| The 555 version is the current version. It was officially
| released on June 27.
|
| https://www.phoronix.com/news/NVIDIA-555.58-Linux-Driver
| JasonSage wrote:
| In defense of the parent, upcoming can still be a relative
| term, albeit a bit misleading. For example: I'm running the
| 550 drivers still because my upstream nixos-unstable
| doesn't have 555 for me yet.
| mepian wrote:
| Yep, I'm on openSUSE Tumbleweed, and it's not rolled out
| there yet. I would rather wait than update my drivers
| out-of-band.
| SushiHippie wrote:
| The versions that nixos provides are based on the files
| in this repo
|
| https://github.com/aaronp24/nvidia-versions
|
 | See: https://github.com/NixOS/nixpkgs/blob/9355fa86e6f27422963132...
|
| You could also opt to use the latest driver instead of
| stable: https://nixos.wiki/wiki/Nvidia
| mananaysiempre wrote:
| > nixos-unstable doesn't have 555
|
| Version 555.58.02 is under "latest" in nixos-unstable as
| of about three weeks ago[1]. (Somebody should check with
| qyliss if she knows the PR tracker is dead... But the
| last nixos-unstable bump was two days ago, so it's
| there.)
|
 | [1] https://github.com/NixOS/nixpkgs/commit/4e15c4a8ad30c02d6c26...
| JasonSage wrote:
| `nvidia-smi` shows that my driver version is 550.78. I
| ran `nixos-rebuild switch --upgrade` yesterday. My nixos
| channel is `nixos-unstable`.
|
| Do you know something I don't? I'd love to be on the
| latest version.
|
 | I should have written my post better; it implies that 555
 | does not exist in nixpkgs, which I never meant. There's
 | certainly a phrasing that captures what I'm seeing more
 | accurately.
| atrus wrote:
| Are you using flakes? If you don't do `nix flake update`
| there won't be all that much to update.
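 |
 | For a flake-based setup, the update step is separate from
 | the rebuild - a typical sequence, assuming the flake lives
 | in the current directory:
 |
 |     nix flake update                       # bump flake.lock inputs, incl. nixpkgs
 |     sudo nixos-rebuild switch --flake .    # rebuild against the updated inputs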
| JasonSage wrote:
| I am! I forgot about this. Mental model check happening.
|
| (Still on 550.)
| mananaysiempre wrote:
 | I did not mean to chastise you or anything, just to
 | suggest that you could have a newer driver if you had
 | missed the possibility.
|
| The thing is, AFAIU, NVIDIA has several release channels
| for their Linux driver[1] and 555 is not (yet?) the
| "production" one, which is what NixOS defaults to (550
| is). If you want a different degree of freshness for your
| NVIDIA driver, you need to say so explicitly[2]. The
| necessary incantation should be
| hardware.nvidia.package =
| config.boot.kernelPackages.nvidiaPackages.latest;
|
 | This is somewhat similar to how you get a newer kernel by
 | setting boot.kernelPackages to linuxPackages_latest, in
 | case you've ever done that.
|
| [1] https://www.nvidia.com/en-us/drivers/unix/
|
| [2] https://nixos.wiki/wiki/Nvidia
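 |
 | For context, a minimal configuration.nix fragment showing
 | where that incantation lives (these are the standard NixOS
 | NVIDIA options; treat it as a sketch, not a complete
 | config):
 |
 |     { config, ... }:
 |     {
 |       # proprietary NVIDIA driver, using the "latest" release
 |       # channel instead of the default "production" one
 |       services.xserver.videoDrivers = [ "nvidia" ];
 |       hardware.nvidia.package =
 |         config.boot.kernelPackages.nvidiaPackages.latest;
 |     }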
| JasonSage wrote:
| I had this configuration but was lacking a flake update
| to move my nixpkgs forward despite the channel, which I
| can understand much better looking back.
|
| Thanks for the additional info, this HM thread has helped
| me quite a bit.
| zxexz wrote:
| I love NixOS, and the nvidia-x11 package is truly
| wonderful and captures so many options. But having such a
| complex package makes updating and regression testing
 | take time. For ML stuff I ended up using it as the basis
 | for an overlay and ripping out literally everything I
 | don't need, which usually makes it a matter of minutes to
 | make the changes required to upgrade when a new driver is
 | released. I'm running completely headless because these
 | are H100 nodes, and I just need persistenced,
 | fabricmanager, and GDRMA (which wasn't working at all,
 | causing me to go down this rabbit hole of stripping
 | everything away until I could figure out why).
| postcert wrote:
| I was going to say specialisations might be useful for
| you to keep a previous driver version around for testing
| but you might be past that point!
|
 | Having the ability to keep alternate configurations for
 | $previous_kernel and $nvidia_stable has been super
 | helpful in diagnosing instead of rolling back.
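 |
 | A minimal sketch of that pattern, in case it helps anyone
 | (specialisation is a standard NixOS option; the entry name
 | "prev-driver" and the choice of the production channel are
 | illustrative):
 |
 |     { config, ... }:
 |     {
 |       # extra boot entry that keeps the default driver channel
 |       # around as a fallback while testing a newer one
 |       specialisation.prev-driver.configuration = {
 |         hardware.nvidia.package =
 |           config.boot.kernelPackages.nvidiaPackages.production;
 |       };
 |     }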
| gmokki wrote:
 | I switched to Wayland 10 years ago when it became an option
 | on Fedora. The first thing I had to do was drop NVIDIA and
 | switch to an Intel GPU, and for the past 5 years an AMD GPU.
 | It makes a big difference if the hardware is supported by
 | the upstream kernel.
 |
 | Maybe the NVIDIA drivers have kind of worked on the
 | 12-month-old kernels that Ubuntu uses on average.
| anon291 wrote:
| I've literally never had an issue in decades of using NVIDIA
| and linux. They're closed source, but the drivers work very
| consistently for me. NVIDIA's just the only option if you want
| something actually good and to run ML workloads as well.
| sqeaky wrote:
| > but the drivers work very consistently for me
|
| The problem with comments like this is that you never know if
| you will be me or you on your graphics card or laptop.
|
| I have tried nvidia a few times and kept getting burnt. AMD
| just works. I don't get the fastest ML machine, but I am just
| a tinkerer there and OpenCL works fine for my little toy apps
| and my 7900XTX blazes through every wine game.
|
 | If you need it professionally then you need it, warts and
 | all. For any casual user, that 10% extra gaming performance
 | needs to be weighed against reliability.
| Workaccount2 wrote:
| It also depends heavily on the user.
|
| A mechanic might say "This car has never given me a
| problem" because the mechanic doesn't consider cleaning an
| idle bypass circuit or adjusting valve clearances to be a
| "problem". To 99% percent of the population though, those
| are expensive and annoying problems because they have no
| idea what those words even mean, much less the ability to
| troubleshoot, diagnose, and repair.
| chasil wrote:
| If you use a search engine for "Torvalds Nvidia" you will
| discern a certain attitude towards Nvidia as a
| corporation and its products.
|
| This might provide you a suggestion that alternate
| manufacturers should be considered.
|
 | I have confirmed this to be the case on Google and Bing,
 | so DuckDuckGo and Startpage will also exhibit this
 | phenomenon.
| Dylan16807 wrote:
| An opinion on support from over ten years ago is not a
| very strong suggestion.
| chasil wrote:
| Your problem there is that both search engines place this
| image and backstory at the top of the results, so neither
| Google nor Bing agree with any of you.
|
| If you think they're wrong, be sure to let them know.
| lyu07282 wrote:
 | What Torvalds was complaining about is absolutely true,
 | but the problem is that most users do not give a shit
 | about those issues. Torvalds' disagreement wasn't about
 | bugs in, or complaints about the quality of, the
 | proprietary driver; he complained about Nvidia's lack of
 | open-source contributions and bad behavior towards the
 | kernel developer community. But users don't care if they
 | run a proprietary driver as long as it works (and it does
 | work fine for most people).
|
 | So you see now why that's not very relevant to the end-
 | user experiences they were talking about?
| chasil wrote:
| No.
| Dylan16807 wrote:
| Do you think Google and Bing are endorsing top results,
| and in particular endorsing a result like that in the
| specific context of what manufacturers I consider buying
| from?
|
| That's the only way they would be disagreeing with me.
| dahart wrote:
| Torvalds has said nasty mean things to a lot of people in
| the past, and expressed regret over his temper &
| hyperbole. Try searching for something more recent
| https://youtu.be/wvQ0N56pW74
| lyu07282 wrote:
 | A lot of it probably has to do with not really
 | understanding their distribution's package manager, and
 | LKMs specifically. I also always suspected that most
 | Linux users don't know whether they are using Wayland or
 | X11, and that the issues they had were actually Wayland-
 | specific ones they wouldn't have had with Nvidia/X11. And
 | come to think of it, how would they even know it's a GPU
 | driver issue in the first place? Guess I'm the mechanic
 | in your analogy.
| sqeaky wrote:
| When I run Gentoo or Arch, I know. But when I run Ubuntu
| or Fedora, should I have needed to know?
|
 | On plenty of distros, "I want to install it and forget
 | about it" is reasonable, and on both Gentoo and Ubuntu I
 | have rebooted from a working system into a system where
 | the display stopped working; at least on Gentoo I was
 | ready, because I broke it somehow.
| lyu07282 wrote:
 | Absolutely. I once had an issue with a kernel/user-space
 | driver version mismatch in Ubuntu; trivial to fix, and
 | the kernel logs tell you what's wrong. But yeah, I get
 | that most users don't read their kernel logs, and it
 | shouldn't be an expectation for normal users of Linux to
 | do so. The experiences are just very different; it's why
 | the car mechanic analogy fits so well.
 |
 | I think it also got so much better over time. I've been
 | using Linux since Debian woody (22 years ago), and the
 | stuff you had to deal with back then heavily skews my
 | perspective on what users today see as unacceptable
 | brokenness in the Nvidia driver.
| anon291 wrote:
| I've run NixOS for almost a decade now and I honestly
| would not recommend anything else. I've had many issues
| with booting on almost every distro. They're about as
| reliable as Windows in that regard. NixOS has been
| absolutely rock solid; beyond anything I could possibly
| have hoped for. In the extremely rare case my system
| would not boot, I've either found a hardware problem that
| would affect anyone, or I could just revert to a previous
| system revision and boot up. Never had any problem. No
| longer use anything else because it's just too risky
| vetinari wrote:
 | If there's an issue with Nvidia/Wayland and there isn't
 | with AMD/Wayland or Intel/Wayland, then it is an Nvidia
 | issue, not a Wayland one.
| lmm wrote:
| > AMD just works. I don't get the fastest ML machine, but I
| am just a tinkerer there and OpenCL works fine for my
| little toy apps and my 7900XTX blazes through every wine
| game.
|
| That's the opposite of my experience. I'd love to support
| open-source. But the AMD experience is just too flaky, too
| card-dependent. NVidia is rock-solid (maybe not for
| Wayland, but I never wanted Wayland in the first place).
| sqeaky wrote:
| What kind of flakiness? The only AMD GPU problem I have
| had involved a lightning strike killing a card while I
| was gaming.
|
 | My nvidia problems are generally software- and update-
 | related. The NVidia stuff usually works on popular
 | distros, but as soon as anything custom or a surprise
 | update happens, there is a chance things break.
| bobajeff wrote:
 | I did, when my card stopped being supported by all the
 | distros because it was too old, while the legacy driver
 | didn't fully work the same.
| l33tman wrote:
| Same here, been using the nvidia binary drivers on a dozen
| computers with various other HW and distros for decades with
| never any problems whatsoever.
| pizza234 wrote:
| Up to a couple of years ago, before permanently moving to AMD
| GPUs, I couldn't even boot Ubuntu with an Nvida GPU. This was
| because Ubuntu booted by default with Nouveau, which didn't
| support a few/several series (I had at least two different
| series).
|
| The cards worked fine with binary drivers once the system was
| installed, but AFAIR, I had to integrate the binary driver
| packages in the Ubuntu ISO in order to boot.
|
| I presume that now, the situation is much better, but
| necessiting binary drivers can be a problem in itself.
| resoluteteeth wrote:
| Are you using wayland or are you still on x11? My experience
| was that the closed source drivers were fine with x11 but a
| nightmare with wayland.
| Keyframe wrote:
| Me too. Now I have a laptop with discrete nvidia and an eGPU
| with 3090 in it, a desktop with 4090, another laptop with
| another discrete nvidia.. all switching combinations work,
| acceleration works, game performance is on par with windows
| (even with proton to within a small percentage or even
| sometimes better). All out of the box with stock Ubuntu and
| installing driver from Nvidia site.
|
| The only "trick" is I'm still on X11 and probably will stay.
| Note that I did try wayland on few occasions but I steered
| away (mostly due to other issues with it at the time).
| isatty wrote:
 | Likewise. Rock solid for decades with intel + nvidia
 | proprietary drivers, even when doing things like hot-
 | plugging for passthroughs.
| anon291 wrote:
| Yeah I once worked at a cloud gaming company that used Wine
| on Linux on NVIDIA to stream cloud games. They were the
| only real option for multi-game performance, and very rock
| solid in terms of uptime. I truly have no idea what people
| are talking about. Yes I use X11.
| art0rz wrote:
| I've been running Arch with KDE under Wayland on two different
| laptops both with NVIDIA GPUs using proprietary drivers for
| years and have not run into issues. Maybe I'm lucky? It's been
| flawless for me.
| lyu07282 wrote:
| The experiences always vary quite a lot, it depends so much
| on what you do with it. For example discord doesn't support
| screen sharing with Wayland, it's just one small example but
| those can add up over time. Another example is display
| rotation which was broken in kde for a long time (recently
| fixed).
| green-salt wrote:
| Whatever pop_os uses has been quite stable for my 4070.
| tormeh wrote:
| Pop uses X by default because of Nvidia.
| segmondy wrote:
 | Plug, install, then play. I've got 3 different Nvidia GPU
 | sets, all running without any issue; nothing crazy to do but
 | follow the installation instructions.
| anonym29 wrote:
| To some of us, running any closed source software in userland
| qualifies as quite crazy indeed.
| DaoVeles wrote:
 | I have never had an issue with them. That said, I typically
 | go mid-range on cards, so they are usually a hardened
 | architecture after a year or two of being in the high end.
| mathfailure wrote:
 | Depends on the version of the drivers: the 550 version
 | results in a black screen (you have to kill and restart the
 | X server) after waking up from sleep. The 535 version
 | doesn't have this bug. Don't know about 555.
|
| Also tearing is a bitch. Still. Even with
| ForceCompositionPipeline.
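 |
 | For anyone searching: ForceCompositionPipeline is usually
 | enabled either in the MetaMode line of xorg.conf or at
 | runtime via nvidia-settings, roughly as below (the mode
 | string is the common example; adjust for your output and
 | resolution):
 |
 |     nvidia-settings --assign CurrentMetaMode="nvidia-auto-select +0+0 { ForceCompositionPipeline = On }"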
| drdaeman wrote:
| 3090 owner here.
|
 | Wayland is an even worse mess than it normally is. It used
 | to flicker real bad before 555.58.02, less so with the
 | latest driver - but it still has some glitches with games. A
 | bunch of older Electron apps still fail to render anything
 | and require hardware acceleration to be disabled. I gave up
 | trying to make it all work - I can't get rid of all the
 | flicker and drawing issues, plus Wayland seems to be a real
 | pain in the ass with HiDPI displays.
|
| X11 sort of works, but I had to entirely disable DPMS or one of
| my monitors never comes back online after going to sleep. I
| thought it was my KVM messing up, but that happened even with a
| direct connection... no idea what's going on there.
|
| CUDA works fine, save for the regular version compatibility
| hiccups.
| senectus1 wrote:
 | 4070 Ti Super here. X11 is fine, I have zero issues.
 |
 | Wayland is mostly fine, though I get some window-frame
 | glitches when maximizing windows to the monitor, and another
 | issue that I'm pretty sure is Wayland, but it has only
 | happened a couple of times and it locks the whole device up.
 | I can't prove it yet.
| devwastaken wrote:
 | KDE Plasma 6 + Nvidia beta 555 works well. I have to make
 | .desktop files to launch some applications explicitly under
 | Wayland.
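 |
 | A sketch of such a .desktop override (the app name and
 | binary are placeholders; --ozone-platform=wayland is the
 | flag commonly used to force Chromium/Electron apps onto
 | native Wayland):
 |
 |     # ~/.local/share/applications/someapp-wayland.desktop
 |     [Desktop Entry]
 |     Type=Application
 |     Name=SomeApp (Wayland)
 |     Exec=someapp --enable-features=UseOzonePlatform --ozone-platform=wayland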
| adrian_b wrote:
 | I am not using Wayland and I do not have any intention to use
 | it; therefore I do not care about any problems caused by
 | Wayland not supporting NVIDIA, or about people demanding that
 | NVIDIA must support Wayland.
|
| I am using only Linux or FreeBSD on all my laptop, desktop or
| server computers.
|
| On desktop and server computers I did not ever have the
| slightest difficulty with the NVIDIA proprietary drivers,
| either for OpenGL or for CUDA applications or for video
| decoding/encoding or for multiple monitor support, with high
| resolution and high color depth, on either Gentoo/Funtoo Linux
| or FreeBSD, during the last two decades. I also have AMD GPUs,
| which I use for compute applications (because they are older
| models, which still had FP64 support). For graphics
| applications they frequently had annoying bugs, unlike NVIDIA
| (however my AMD GPUs have been older models, preceding RDNA,
| which might be better supported by the open-source AMD
| drivers).
|
| The only computers on which I had problems with NVIDIA on Linux
| were those laptops that used the NVIDIA Optimus method of
 | coexistence with the Intel integrated GPUs. Many years ago I
 | needed a couple of days to properly configure the drivers and
 | additional software so that the NVIDIA GPU was selected when
 | desired, instead of the Intel iGPU. I do not know if any
| laptops with NVIDIA Optimus still exist. The laptops that I
| bought later had video outputs directly from the NVIDIA GPU, so
| there was no difference between them and desktops and the
| NVIDIA drivers worked flawlessly.
|
 | On both Gentoo/Funtoo Linux and FreeBSD I never had to do
 | anything other than run the driver update command, and
 | everything worked fine. Moreover, NVIDIA has always provided a
| nice GUI application "NVIDIA X Server Settings", which provides
| a lot of useful information and which makes very easy any
| configuration tasks, like setting the desired positions of
| multiple monitors. A few years ago there was nothing equivalent
| for the AMD or Intel GPU drivers, but that might have changed
| meanwhile.
| tgsovlerkhgsel wrote:
| My experience with an AMD iGPU on Linux was so bad that my next
| laptop will be Intel. Horrible instability to the point where I
| could reliably crash my machine by using Google Maps for a few
| minutes, on both Chrome and Firefox. It got fixed eventually -
| with the next Ubuntu release, so I had a computer where I was
| afraid to use anything with WebGL for half a year.
| brrrrrm wrote:
 | "Kernel" is an overloaded term for GPUs. This is about the
 | Linux kernel.
| karamanolev wrote:
| "... Linux GPU Kernel Modules" is pretty unambiguous to me.
| brrrrrm wrote:
| Yep the title was updated.
| brrrrrm wrote:
 | Guh, wish I could delete this now that the title was updated.
 | The original title (shown on the linked page) wasn't super
 | clear.
| berkeleyjunk wrote:
| As someone who is pretty skeptical and reads the fine print, I
| think this is a good move and I really do not see a downside
| (other than the fact that this probably strengthens the nVidia
| monoculture).
| vlovich123 wrote:
 | AFAIK, all they did was move the closed-source user-space
 | driver code to their opaque firmware blob, leaving a thin
 | shim in the kernel.
|
| In essence I don't believe that much has really changed here.
| stkdump wrote:
| But the firmware runs directly on the hardware, right? So
| they effectively rearchitected their system to move what used
| to be 'above' the kernel to 'below' the kernel, which seems
| like a huge effort.
| vlovich123 wrote:
 | It's some effort, but I bet they added a classical serial
 | CPU to run the existing code. In fact, [1] suggests that's
 | exactly what they did. I suspect they had other reasons to
 | add the GSP, so the amortized cost of moving the driver
 | code to firmware was actually not that large, all things
 | considered, and in the long term it reduces their costs
 | (e.g. they further reduce the burden of supporting multiple
 | OSes, they can theoretically improve performance further,
 | etc.)
|
 | [1] https://download.nvidia.com/XFree86/Linux-x86_64/525.78.01/R...
| p_l wrote:
 | That's exactly what happened - the Turing microarchitecture
 | brought in a new[1] "GSP" which is capable enough to run
 | the task. AFAIK a similar architecture exists on Apple
 | M-series, where the GPU runs its own instance of an RTOS
 | talking with the "application OS" over RPC.
 |
 | [1] The Turing GSP is not the first "classical serial CPU"
 | in nvidia chips, it's just the first that has enough juice
 | to do the task. Unfortunately, without recalling the name
 | of the component, it seems impossible to find it again,
 | thanks to search results being full of nvidia ARM and GSP
 | pages...
| mepian wrote:
| >the name of the component
|
| Falcon?
| p_l wrote:
| THANK YOU, that was the name I was forgetting :)
|
 | Here's[1] a presentation from nvidia regarding a plan
 | (unsure if done or not) for replacing Falcon with RISC-V;
 | [2] suggests the GSP is in fact the "NV-RISC" mentioned in
 | [1]. Some work on reversing Falcon was apparently done
 | for Switch hacking[3]?
 |
 | [1] https://riscv.org/wp-content/uploads/2016/07/Tue1100_Nvidia_...
 | [2] https://www.techpowerup.com/291088/nvidia-unlocks-gpu-system...
 | [3] https://github.com/vbe0201/faucon
| knotimpressed wrote:
| Would you happen to have a source or any further readings
| about Apple M-series GPUs running their own RTOS
| instance?
| p_l wrote:
 | The Asahi Linux documentation has a pretty good writeup.
|
| The GPU is described here[1] and the mailbox interface
| used generally between various components is described
| here [2]
|
| [1]
| https://github.com/AsahiLinux/docs/wiki/HW%3AAGX#overview
|
| [2] https://github.com/AsahiLinux/docs/wiki/HW%3AASC
| imtringued wrote:
| Why? It should make it much easier to support Nvidia GPUs
| on Windows, Linux, Arm/x86/RISC-V and more OSes with a
| single firmware codebase per GPU now.
| stkdump wrote:
| Yes makes sense, in the long run it should make their
| life easier. I just suspect that the move itself was a
| big effort. But probably they can afford that nowadays.
| adrian_b wrote:
 | Having all of the kernel, more precisely all of the
 | privileged code, be open source is much more important for
 | security than having all of the firmware of the peripheral
 | devices be open source.
|
| Any closed-source privileged code cannot be audited and it
| may contain either intentional backdoors, or, more likely,
| bugs that can cause various undesirable effects, like crashes
| or privilege escalation.
|
| On the other hand, in a properly designed modern computer any
| bad firmware of a peripheral device cannot have a worse
| effect than making that peripheral unusable.
|
| The kernel should take care, e.g. by using the I/O MMU, that
| the peripheral cannot access anything where it could do
| damage, like the DRAM not assigned to it or the non-volatile
| memory (e.g. SSDs) or the network interfaces for
| communicating with external parties.
|
 | Even when the peripheral is as important as the display, a
 | crash in its firmware would have no effect if the kernel had
 | reserved some key combination to reset the GPU (while I am
 | not aware of such a useful feature in Linux, its effect can
 | frequently be achieved by switching, e.g. with Alt+F1, to a
 | virtual console and then back to the GUI, the saving and
 | restoring of the GPU state together with the switching of the
 | video modes being enough to clear some corruption caused by a
 | buggy GPU driver or a buggy mouse or keyboard driver).
|
 | In conclusion, making the NVIDIA kernel driver open source
 | does not deserve to have its importance minimized. It is an
 | important contribution to a more secure OS kernel.
|
| The only closed-source firmware that must be feared is that
| which comes from the CPU manufacturer, e.g. from Intel, AMD,
| Apple or Qualcomm.
|
| All such firmware currently includes various features for
| remote management that are not publicly documented, so you
| can never be sure if they can be properly disabled,
| especially when the remote management can be done wirelessly,
| like through the WiFi interface of the Intel laptop CPUs, so
| you cannot interpose an external firewall to filter the
| network traffic of any "magic" packets.
|
| A paranoid laptop user can circumvent the lack of control
| over the firmware blobs from the CPU manufacturer by
| disconnecting the internal antennas and using an external
| cheap and small single-board computer for all wired and
| wireless network access, which must run a firewall with tight
| rules. Such a SBC should be chosen among those for which
| complete hardware documentation is provided, i.e. including
| its schematics.
| saagarjha wrote:
 | Did you run this through an LLM? I'm not sure what the point
| is of arguing with yourself and bringing up points that
| seem tangential to what you started off talking about
| (...security of GPUs?)
| adrian_b wrote:
| I have not argued with myself. I do not see what made you
| believe this.
|
| I have argued with "I don't believe that much has really
| changed here", which is the text to which I have replied.
|
| As I have explained, an open-source kernel module, even
| together with closed-source device firmware, is much more
| secure than a closed-source kernel module.
|
| Therefore the truth is that a lot has changed here,
| contrary to the statement to which I have replied, as
| this change makes the OS kernel much more secure.
| stragies wrote:
| Everything you wrote assumes the IOMMUs across the board to
| be 100% correctly implemented without errors/bugdoors.
|
| People used to believe similar things about Hyperthreading,
| glitchability, ME, Cisco, boot-loaders, ... the list goes
| on.
| adrian_b wrote:
| There still is a huge difference between running
| privileged code on the CPU, for which there is nothing
| limiting what it can do, and code that runs on a device,
| which should normally be contained by the I/O MMU, except
| if the I/O MMU is buggy.
|
| The functions of an I/O MMU for checking and filtering
| the transfers are very simple, so the probability of non-
| intentional bugs is extremely small in comparison with
| the other things enumerated by you.
| stragies wrote:
 | Agreed that the feature set of an IOMMU is fairly small,
 | but is this function not usually included in one of the
 | chipset ICs, which run a lot of other code/functions
 | alongside a (hopefully) faithfully correct IOMMU routine?
 |
 | Which - to my eyes - would increase the possibility of
 | other system parts mucking with IOMMU restrictions,
 | and/or triggering bugs.
| bradyriddle wrote:
| I remember Nvidia getting hacked pretty bad a few years ago.
| IIRC, the hackers threatened to release everything they had
| unless they open sourced their drivers. Maybe they got what they
| wanted.
|
 | [0] https://portswigger.net/daily-swig/nvidia-hackers-allegedly-...
| nicce wrote:
 | Kernel modules are not the user-space drivers, which are
 | still proprietary.
| porphyra wrote:
| Much of the black magic has been moved from the drivers to
| the firmware anyway.
| bradyriddle wrote:
| Ooops. Missed that part.
|
| Re-reading that story is kind of wild. I don't know how
| valuable what they allegedly got would be (silicon, graphics
| and chipset files) but the hackers accused Nvidia of 'hacking
| back' and encrypting their data.
|
 | Reminds me of a story I heard about Nvidia hiring a private
 | military to guard their cards after entire shipments started
 | getting 'lost' somewhere in Asia.
| spookie wrote:
| Wait what? That PMC story got me. Where can I find more
| info on that lmao?
| bradyriddle wrote:
 | I'd heard the story firsthand from a guy in San Jose.
 | Never looked it up until now. This is the closest thing I
 | could find to it, in which case it sounds like it's been
 | debunked.
 |
 | [0] https://www.pcgamer.com/no-half-a-million-geforce-rtx-30-ser...
 |
 | [1] https://www.geeknetic.es/Noticia/20794/Encuentran-en-Corea-5...
| dralley wrote:
| I doubt it. It's probably a matter of constantly being prodded
| by their industry partners (i.e. Red Hat), constantly being
| shamed by the community, and reducing the amount of maintenance
| they need to do to keep their driver stack updated and working
| on new kernels.
|
 | The meat of the drivers is still proprietary; this just
 | allows them to be loaded without a proprietary kernel
 | module.
| p_l wrote:
 | I suspect it's mainly the reduced maintenance and the
 | reduction of the workload needed for support, especially
 | with more platforms coming to be supported (not so long ago
 | there was no ARM64 nvidia support; now they are shipping
 | their own ARM64 servers!)
 |
 | What really changed the situation is that Turing-architecture
 | GPUs bring a new, more powerful management CPU, which has
 | enough capacity to essentially run the OS-agnostic parts of
 | the driver that used to be provided as a blob on Linux.
| knotimpressed wrote:
| Am I correct in reading that as Turing architecture cards
| include a small CPU on the GPU board, running parts of the
| driver/other code?
| p_l wrote:
 | In the Turing microarchitecture, nVidia replaced their old
 | "Falcon" CPU with an NV-RISCV RV64 chip, running various
 | internal tasks.
 |
 | The "Open Drivers" from nVidia include different firmware
 | that utilizes the new-found performance.
| matheusmoreira wrote:
| How well isolated is this secondary computer? Do we have
| reason to fear the proprietary software running on it?
| p_l wrote:
| As well isolated as anything else on the bus.
|
 | So you'd better actually use the IOMMU.
| stragies wrote:
| Ah, yes, the magical IOMMU controller, that everybody
| just assumes to be implemented perfectly across the
| board. I'm expecting this to be like Hyperthreading,
| where we find out 20 years later, that the feature was
| faulty/maybe_bugdoored since inception in many/most/all
| implementations.
|
| Same thing with USB3/TB-controllers, NPUs, etc that
| everybody just expects to be perfectly implemented to
| spec, with flawless firmwares.
| p_l wrote:
 | It's not perfect or anything, but it's usually a step
 | up[1], and the funniest thing is that GPUs generally had
 | fewer ... "interesting" compute facilities to jump over
 | from - they were just _usually_ easier to access. My first
 | 64-bit laptop, my first Android smartphone, and the first
 | few iPhones had more MIPS32le cores with possible DMA
 | access to memory than main CPU cores, and that was just
 | counting one component of many (the _wifi chip_).
|
 | Also, Hyperthreading wasn't itself faulty or "bugdoored".
 | The tricks necessary to get high performance out of CPUs
 | were, and then there was Intel deciding to drop various
 | good precautions in the name of still higher single-core
 | performance.
 |
 | Fortunately, after several years, IOMMU availability is
 | becoming more common (the current laptop I'm writing this
 | on seems to have proper separate groups for every device).
|
 | [1] There's always the OpenBSD school of navel-gazing about
 | writing "secure" C code, becoming slowly obsolescent
 | thanks to being behind in performance and features, and
 | ultimately getting pwned because the C focus and the
 | refusal to implement "complex" mitigating features
 | results in a pwnable SMTPd running as root.
| stragies wrote:
 | All fine and well, but I always come back to "If I were a
 | manufacturer/creator of some work/device/software that
 | does something in the plausible realm of
 | 'telecommunication', how do I make sure that my product
 | can always comply with
 | https://en.wikipedia.org/wiki/Lawful_interception
 | requests? Allow for ingress/egress of data/commands at as
 | low a level as possible!"
 |
 | So as the director of a chipset-maker, it would seem like
 | a no-brainer to me to have to tell my engineers,
 | unfortunately, not to fix some exploitable bug in the
 | IOMMU/chipset. Unless I want to never sell devices that
 | could potentially be used to move citizens' internet
 | packets around in a large-scale deployment.
 |
 | And to implement/not fix something similar in other layers
 | as well, e.g. the ME.
| p_l wrote:
| If your product is supposed to comply with Lawful
| Interception, you're going to implement proper LI
| interfaces, not leave bullshit DMA bugs in.
|
| The very point of Lawful Interception involves explicit,
| described interfaces, so that all parties involved can do
| the work.
|
| The systems with LI interfaces also often end up in
| jurisdictions that simultaneously put high penalties on
| giving access to them without specific authorizations - I
| know, I had to sign some really interesting legalese once
 | due to working in an environment where we had to balance
| both Lawful Interception, post-facto access to data, _and
| telecommunications privacy_ laws.
|
 | Leaving backdoors like that is for _Unlawful_
 | Interception, and the danger of such approaches was
 | greatly exposed in the form of Chinese intelligence
 | services exploiting the NSA backdoor in Juniper routers
 | (the infamous Dual_EC_DRBG RNG).
| matheusmoreira wrote:
| > you better actually use IOMMU
|
| Is this feature commonly present on PC hardware? I've
| only ever read about it in the context of smartphone
 | security. I've also read that nvidia doesn't like this
 | sort of thing because it allows virtualizing their cards,
 | which is supposed to be an "enterprise" feature.
| brendank310 wrote:
| Relatively common nowadays. It used to be delineated as a
| feature in Intel chips as part of their vPro line, but I
| think it's baked in. Generally an IOMMU is needed for
| performant PCI passthrough to VMs, and Windows uses it
| for DeviceGuard which tries to prevent DMA attacks.
| p_l wrote:
| Seems to me that Zen 4 has no issues at all, but
| bridges/switches require additional interfaces to further
| fan-out access controls.
| wtallis wrote:
| Mainstream consumer x86 processors have had IOMMU
| capability for over a decade, but for the first few years
| it was commonly disabled on certain parts for product
| segmentation (eg. i5-3570K had overclocking but no IOMMU,
| i5-3570 had IOMMU but limited overclocking). That
| practice died off approximately when Thunderbolt started
| to catch on, because not having an IOMMU when using
| Thunderbolt would have been very bad.
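 |
 | A quick way to check on Linux whether the IOMMU is enabled
 | and how devices are grouped - a minimal sketch using only
 | standard sysfs paths and lspci (empty output usually means
 | the IOMMU is disabled in firmware or by the kernel):
 |
 |     for d in /sys/kernel/iommu_groups/*/devices/*; do
 |         g=${d#/sys/kernel/iommu_groups/}; g=${g%%/*}   # group number
 |         printf 'IOMMU group %s: ' "$g"
 |         lspci -nns "${d##*/}"                          # device at that BDF
 |     done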
| kabes wrote:
| It's hard to believe one of the highest valued companies in
| the world cares about being shamed for not having open source
| drivers.
| nailer wrote:
| Having products that require a bunch of extra work due to
| proprietary drivers, especially when their competitors
| don't require that work, is not good.
| josefx wrote:
| The biggest chunk of that "extra work" would be
| installing Linux in the first place, given that almost
| everything comes with Windows out of the box. An
| additional "sudo apt install nvidia-drivers" isn't going
| to stop anyone who already got that far.
| sam_bristow wrote:
| Does the "everything comes with Windows out of the box"
| still apply for the servers and workstations where I
| imagine the vast majority of these high-end GPUs are
| going these days?
| nailer wrote:
| Most cloud instances come with Linux out of the box.
| Arch-TK wrote:
| Tainted kernel. Having to sort out secure boot problems
| caused by use of an out of tree module. DKMS. Annoying
| weird issues with different kernel versions and problems
| running the bleeding edge.
| commodoreboxer wrote:
| They care when it affects their bottom line, and customers
| leaving for the competition does that.
|
 | I don't know if that's what's happening here, honestly, and
 | you're right that they don't care about being shamed. But
 | building a reputation of being hard to work with and to
 | target, especially in a growing market like Linux (still
 | tiny, but growing nonetheless, and becoming significantly
 | more important in the areas where non-gaming GPU use is
 | concerned), can start to erode sales and B2B relationships
 | - the latter particularly if you make the programmers and
 | PMs hate using your products.
| bryanlarsen wrote:
| > in a growing market like Linux
|
| Isn't Linux 80% of their market? ML et al is 80% of their
| sales, and ~99% of that is Linux.
| fngjdflmdflg wrote:
| True, although note that the Linux market itself is
| increasing in size due to ML. Maybe "increasingly
| dominant market" is a better phrase here.
| bryanlarsen wrote:
| Hah, good point. The OP was pedantically correct. The
| implication in "growing market share" is that "market
| share" is small, but that's definitely reading between
| the lines!
| lmm wrote:
| Right, and that's where most of their growth is.
| gessha wrote:
| > customers leaving for the competition does that
|
| What competition?
|
 | I do agree that companies don't really care about public
 | sentiment as long as business goes on as usual. Nvidia
 | is printing money with their data center hardware [1],
 | which accounts for half of their yearly revenue.
 |
 | [1] https://nvidianews.nvidia.com/news/nvidia-announces-financia...
| ZeroCool2u wrote:
 | I mean, I've personally given our Nvidia rep some light-
 | hearted shit for it. Told him I'd appreciate it if he passed
 | the feedback up the chain. Can't hurt to provide feedback!
| chillfox wrote:
| Nvidia has historically given zero fucks about the opinions
| of their partners.
|
| So my guess is it's to do with LLMs. They are all in on AI,
| and having more of their code be part of training sets could
| make tools like ChatGPT/Claude/Copilot better at generating
| code for Nvidia GPUs.
| jmorenoamor wrote:
 | I also see this as the main reason. GPU drivers for Linux,
 | as far as I know, were just a niche use case; maybe CUDA
 | planted a small seed, and the AI hype is the flower. Now
 | the industry, not the users, demands drivers, so this
 | became a demanded feature instead of a niche user wish.
 |
 | A bit sad, but hey, it's welcome anyway.
| da_chicken wrote:
| Yup. nVidia wants those fat compute center checks to keep
| coming in. It's an unsaturated market, unlike gaming
| consoles, home gaming PCs, and design/production
| workstations. They got a taste of that blockchain dollar,
| and now AI looks to double down on the demand.
|
| The best solution is to have the industry eat their
| dogfood.
| justinclift wrote:
| For Nvidia, the most likely reason they've strongly avoided
| Open Sourcing their drivers isn't anything like that.
|
 | It's simply a function of their history. They _used_ to have
 | high-priced, professional-level graphics cards ("Nvidia
 | Quadro") using exactly the same chips as their consumer
 | graphics cards.
|
| The BIOS of the cards was different, enabling different
| features. So people wanting those features cheaply would buy
| the consumer graphics cards and flash the matching Quadro BIOS
| to them. Worked perfectly fine.
|
| Nvidia naturally wasn't happy about those "lost sales", so
| began a game of whack-a-mole to stop BIOS flashing from
| working. They did stuff like adding resistors to the boards to
| tell the card whether it was a Geforce or Quadro card, and when
| that was promptly reverse engineered they started getting
| creative in other ways.
|
 | Meanwhile, they _couldn't_ really open source their drivers,
 | because then people could see what the "Geforce vs Quadro"
 | software checks were. That would open the door to software
 | countermeasures being developed.
|
| ---
|
| In the most recent few years the professional cards and gaming
| cards now use different chips. So the BIOS tricks are no longer
| relevant.
|
| Which means Nvidia can "safely" Open Source their drivers now,
| and they've begun doing so.
|
| --
|
| Note that this is a copy of my comment from several months ago,
| as it's just as relevant now as it was then:
| https://news.ycombinator.com/item?id=38418278
| 1oooqooq wrote:
| interesting timing to recall that story. now the same trick
| is used for h100 vs whatever the throttled-for-embargo-wink-
| wink Chinese version is called.
|
 | but those companies are really averse to open sourcing
 | because they can't be sure they own all the code. it's
 | decades of copy-pasting reference implementations, after all
| rfoo wrote:
| > now the same trick is used for h100 vs whatever the
| throttled-for-embargo-wink-wink Chinese version
|
| No. H20 is a different chip designed to be less compute-
| dense (by having different combinations of SM/L2$/HBM
| controller). It is not a throttled chip.
|
| A800 and H800 are A100/H100 with some area of the chip
| _physically_ blown up and reconfigured. They are also not
| simply throttled.
| 1oooqooq wrote:
| that's what nvidia told everyone in mar 23... but there's
| a reason why h800 were included last minute on the
| embargo in oct 23.
| rfoo wrote:
| That's not what NVIDIA claimed, that's what I have
| personally verified.
|
| > there's a reason why h800 were included last minute
|
| No. Oct 22 restrictions are by itself significantly
| easier than Oct 23 one. NVIDIA _just need_ to kill 4
| NVLink lanes off A100 and you get A800. For H100 you kill
| some more NVLink until on paper NVLink bandwidth is
| roughly at A800 level again and then voila.
|
| BIS is certainly pissed off by NVIDIA's attempt at being
| creative to sell the best possible product to China. So
| they actually lowered allowed compute number AGAIN in Oct
| 23. That's what killed H800.
| SuperNinKenDo wrote:
 | Very interesting, thanks for the perspective. I suspect the
 | recent loss of face they experienced with the transition to
 | Wayland, happening around the time this motivation
 | evaporated, probably plays a part too, though.
 |
 | I swore off ever again buying Nvidia, or any laptop that
 | comes with Nvidia, after all this. Maybe in 10 years they'll
 | have managed to right the brand perceptions of people like
 | myself.
| CamperBob2 wrote:
| The explanation could also be as simple as fear of patent
| trolls.
| nicman23 wrote:
| they did release it. a magic drive i have seen, but totally do
| not own, has it
| enoeht wrote:
| didn't they say that many times before?
| vlovich123 wrote:
 | Not sure, but with the Turing series they support a
 | cryptographically signed binary blob that they load onto the
 | GPU. So where before their kernel driver was a thin shim for
 | the user-space driver, now it's a thin shim for the black-
 | box firmware loaded on the GPU.
| p_l wrote:
 | The scope of what the kernel interface provides didn't
 | change, but what was previously a blob wrapped by a source-
 | provided "OS interface layer" has now moved to run on the
 | GSP (RISC-V based) inside the GPU.
| creata wrote:
 | Huh. Sway and Wayland were such a nightmare on Nvidia that
 | they convinced me to switch to AMD. I wonder if it's better
 | now.
|
| (IIRC the main issue was
| https://gitlab.freedesktop.org/xorg/xserver/-/issues/1317 , which
| is now complete.)
| snailmailman wrote:
| Better as of extremely recently. Explicit sync fixes most of
| the issues with flickering that I've had on Wayland. I've been
| using the latest (beta?) driver for a while because of it.
|
| I'm using Hyprland though so explicit sync support isn't
| _entirely_ there for me yet. It's actively being worked on. But
| in the last few months it's gotten a lot better
| JasonSage wrote:
| > Better as of extremely recently.
|
| Yup. Anecdotally, I see a lot of folks trying to run
| wine/games on Wayland reporting flickering issues that are
| gone as of version 555, which is the most recent release save
| for 560 coming out this week. It's a good time to be on the
| bleeding edge.
| hulitu wrote:
| You can always use X11. /s
| bornfreddy wrote:
| I know that was a joke, but - as someone who is still on
| X, what am I missing? Any practical advantages to using
| Wayland when using a single monitor on desktop computer?
| vetinari wrote:
| Even that single monitor can be hidpi, vrr or hdr (this
| one is still wip).
| Arch-TK wrote:
| I have a 165 DPI monitor. This honestly just works with
| far less hassle on X. I don't have to listen to anyone
| try to explain to me how fractional scaling doesn't make
| sense (real explanation for why it wasn't supported). I
| don't have to deal with some silly explanation for why
| XWayland applications just can't be non-blurry with a
| fractional or non-1 scaling factor. I can just set the
| DPI to the value I calculated and things work in 99% of
| cases. In 0.9% of the remaining cases I need to set an
| environment variable or pass a flag to fix a buggy
| application and in the 0.1% of cases I need to make a
| change to the code.
|
| VRR has always worked for me on single monitor X. I use
| it on my gaming computer (so about twice a year).
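 |
 | For the curious, the usual way to pin the DPI on X is an
 | Xresources entry - a minimal example with the 165 value
 | from above (some toolkits additionally want environment
 | variables set, per the remaining cases mentioned):
 |
 |     # ~/.Xresources - reload with: xrdb -merge ~/.Xresources
 |     Xft.dpi: 165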
| asyx wrote:
| I think it's X11 stuff that is using Vulkan for rendering
| that is still flickering in 555. This probably affects
| pretty much all of Proton / Wine gaming.
| doix wrote:
 | Any specific examples that you know should be broken? I
 | am on X11 with the 555 drivers and an nvidia gpu. I don't
 | have any flickering when I'm gaming; it's actually why I
 | stay on X11 instead of transitioning to wayland.
| johnny22 wrote:
| They are probably talking about running the game in a
| wayland session via xwayland, since wine's wayland driver
| is not part of proton yet.
| Fr0styMatt88 wrote:
| On latest NixOS unstable and KDE + Wayland is still a bit
| of a dumpster fire for me (3070 + latest NV drivers). In
| particular there's a buffer wait bug in EGL that needs
| fixing on the Nvidia side that causes the Plasma UI to
| become unresponsive. Panels are also broken for me, with
| icons not showing.
|
 | Having said that, the latest is a pain on X11 right now as
 | well, with frequent crashing of Plasma, which at least
 | restarts itself.
|
| There's a lot of bleeding on the bleeding edge right at
| this moment :)
| JasonSage wrote:
| That's interesting, maybe it's hardware-dependent? I'm
| doing nixos + KDE + Wayland and I've had almost no issues
| in day-to-day usage and productivity.
|
| I agree with you that there's a lot of bleeding. Linux is
| nicer than it used to be and there's less fiddling
| required to get to a usable base, but still plenty of
| fiddling as you get into more niche usage, especially
| when it involves any GPU hardware/software. Yet somehow
| one can run Elden Ring on Steam via Proton with a few
| mouse clicks and no issues, which would've been
| inconceivable to me only a few years ago.
| Fr0styMatt88 wrote:
| Yeah it's pretty awesome overall. I think the issues are
| from a few things on my end:
|
| - I've upgraded through a few iterations starting with
| Plasma 6, so my dotfiles might be a bit wonky. I'm not
| using Home Manager so my dotfiles are stateful.
|
| - Could be very particular to my dock setup as I have two
| docks + one of the clock widgets.
|
| - Could be the particular wallpaper I'm using (it's one
| of the dynamic ones that comes with KDE).
|
| - It wouldn't surprise me if it's related to audio
| somehow as I have Bluetooth set-up for when I need it.
|
| I'm sure it'll settle soon enough :)
| postcert wrote:
| I've been having a similar flakiness with plasma on Nixos
| (proprietary + 3070 as well). Sadly can't say whether it
| did{n't} happen on another distro as I last used Arch
| around the v535 driver.
|
 | I found it funny how silently it would fail at times.
 | After coming out of a game or focusing on something, I'd
 | scratch my head as to where the docks/background went.
 | I'd say you're lucky in that it recovered itself;
 | generally I needed to run `plasmashell` in the alt+f2 run
 | prompt.
| joecool1029 wrote:
 | It's still buggy with sway on nvidia. I really thought the
 | 555 driver would iron out the last of the issues, but it
 | still has further to go. I've since switched to KDE Plasma 6
 | on wayland and it's been great, not buggy at all.
| XorNot wrote:
| Easy Linux use is what keeps me firmly on AMD. This move may
| earn them a customer.
| modzu wrote:
| why switch to amd and not just switch to X? :D
| account42 wrote:
| Why not both?
| whalesalad wrote:
| once you go Wayland you usually don't go back :)
| kiney wrote:
 | I tested wayland for a while to see what the hype is about.
 | No upside, lots of small workflows broken. Back to Xorg it
 | was.
| Animats wrote:
| NVidia revenue is now 78% from "AI" devices.[1] NVidia's market
| cap is now US$2.92 trillion. (Yes, trillion.) Only Apple and
| Microsoft can beat that. Their ROI climbed from about 10% to 90%
| in the last two years. That growth has all been on the AI side.
|
| Open-sourcing graphics drivers may indicate that NVidia is moving
| away from GPUs for graphics. That's not where the money is now.
|
 | [1] https://www.visualcapitalist.com/nvidia-revenue-by-product-l...
|
| [2] https://www.macrotrends.net/stocks/charts/NVDA/nvidia/roi
| joe_the_user wrote:
| Well, Nvidia seems to be claiming in the article that this is
| everything, not just graphics drivers: _" NVIDIA GPUs share a
| common driver architecture and capability set. The same driver
| for your desktop or laptop runs the world's most advanced AI
| workloads in the cloud. It's been incredibly important to us
| that we get it just right."_
|
 | And _For cutting-edge platforms such as NVIDIA Grace Hopper or
 | NVIDIA Blackwell, you must use the open-source GPU kernel
 | modules. The proprietary drivers are unsupported on these
 | platforms._ (These are the two most advanced NVIDIA
 | architectures currently.)
| Animats wrote:
| That's interesting. I've been expecting the AI cards to
| diverge more from the graphics cards. AI doesn't need
| triangle fill, Z-buffering, HDMI out, etc. 16 bit 4x4
| multiply/add units are probably enough. What's going on in
| that area?
| p_l wrote:
 | TL;DR - there seems to be not that much improvement from
 | dropping the "graphics-only" parts of the chip if you
 | already have a GPU, as opposed to breaking into the AI
 | market with your first product.
|
| 1. nVidia compute dominance is not due to hyperfocus on AI
| (that's Google's TPU for you, or things like intel's NPU in
| Meteor Lake), but because CUDA offers considerable general
| purpose compute. In fact, considerable revenue came and
| still comes from non-AI compute. This also means that if
| you figure out a novel mechanism for AI that isn't based
| around 4x4 matrix addition, or which mixes it with various
| other operations, you can do them inline. This also
| includes any pre and post processing you might want to do
| on the data.
|
| 2. The whole advantage they have in software ecosystem
| builds upon their PTX assembly. Having it compile to CPU
| and only implement the specific variant of one or two
| instructions that map to "tensor cores" would be pretty
| much nonsensical (especially given that AI is not the only
| market they target with tensor cores - DSP for example is
| another).
|
 | Additionally, a huge part of why nvidia built such a strong
 | ecosystem is that you could take the cheapest G80-based
 | card and just start learning CUDA. Only some highest-end
 | features are limited to the most expensive cards, like RDMA
 | and NVMe integration.
|
| Compare this with AMD, where for many purposes only the
| most expensive compute-only cards are really supported. Or
| specialized AI only chips that are often programmable
| either in very low-level way or essentially as "set a graph
| of large-scale matrix operations that are limited subset of
| operations exposed by Torch/Tensorflow" (Google TPU, Intel
| Meteor Lake NPU, etc).
|
 | 3. CUDA literally began with how the evolution of the shader
 | model led to a general-purpose "shader processor" instead of
 | specialized vertex and pixel processors. The space taken by
 | specialized graphics-only hardware that isn't also usable
 | for general-purpose compute is pretty minimal, although
 | some of it is omitted, AFAIK, in compute-only cards.
|
 | In fact, some of the "graphics only" things like
 | Z-buffering are done by the same logic that is used for
 | compute (with a limited amount of operations done by the
 | fixed-function ROP block), and certain fixed-function
 | graphical components like texture mapping units are also
 | used for high-performance array access.
|
| 4. Simplified manufacturing and logistics - nVidia uses
| essentially the same chips in most compute and graphics
| cards, possibly with minor changes achieved by changing
| chicken bits to route pins to different functions (as you
| mentioned, you don't need DP-outs of RTX4090 on an L40
| card, but you can probably reuse the SERDES units to run
| NVLink on the same pins).
| orbital-decay wrote:
| It indicates nothing; they started it a few years ago, before
| that. They just transferred the most important parts of their
| driver to the (closed source) firmware, to be handled by the
| onboard ARM CPU, and open sourced the rest.
| floam wrote:
| NVIDIA Transitions Fully Towards Open-Source GPU Kernel Modules
|
| or
|
| NVIDIA Transitions Towards Fully Open-Source GPU Kernel Modules?
| j4hdufd8 wrote:
| haven't read it but probably the former
| throwadobe wrote:
| "towards" basically negates the "fully" before it for all
| real intents and purposes
| slashdave wrote:
| Not much point in a "partially" open-source kernel module.
| floam wrote:
| But "fully towards" is pretty ambiguous, like an entire
| partial implementation.
|
| Anyhow I read the article, I think they're saying fully as in
| exclusively, like there eventually will not be both a closed
| source and open source driver co-maintained. So "fully open
| source" does make more sense. The current driver situation IS
| partially open source, because their offerings currently
| include open and closed source drivers and in the future the
| closed source drivers may be deprecated?
| einpoklum wrote:
| See my answer. It's not going to be fully-open-source
| drivers, it's rather that all drivers will have open-source
| kernel modules.
| slashdave wrote:
| You can argue against proprietary firmware, but is this
| all that different from other types of devices?
| einpoklum wrote:
| Other device manufacturers with proprietary drivers don't
| engage in publicity stunts to make it sound like their
| drivers are FOSS or that they embrace FOSS (or just OSS).
| pluto_modadic wrote:
| damn, only for new GPUs.
| mynameisvlad wrote:
| For varying definitions of "new". It supports Turing and up,
| which was released in 2018 with the 20xx line. That's two
| generations back at this point.
| sillywalk wrote:
| From the github repo[0]:
|
| Most of NVIDIA's kernel modules are split into two
| components:
|
| - An "OS-agnostic" component: this is the component of each
| kernel module that is independent of operating system.
|
| - A "kernel interface layer": this is the component of each
| kernel module that is specific to the Linux kernel version
| and configuration.
|
| When packaged in the NVIDIA .run installation package, the OS-
| agnostic component is provided as a binary:
|
| [0] https://github.com/NVIDIA/open-gpu-kernel-modules
| p_l wrote:
| That was the "classic" drivers.
|
| The new open source ones effectively move majority of the OS-
| agnostic component to run as blob on-GPU.
| arghwhat wrote:
| Not quite - it moves some logic to the GSP firmware, but
| the user-space driver is still a significant portion of the
| code.
|
| The exciting bit there is the work on NVK.
| p_l wrote:
| Yes, I was not including the userspace driver in this, as
| it's a bit "out of scope" for the conversation :D
| benjiweber wrote:
| I wonder if we'll ever get HDCP on nvidia. As much as I
| enjoy 480p video from streaming services.
| viraptor wrote:
| Which service goes that low? The ones I know limit you from
| using 4k, but anything up to 1080p works fine.
| 9991 wrote:
| Nonsense that a 1080p limit is acceptable for (and accepted
| by) paying customers.
| viraptor wrote:
| Depends. I disagree with HDCP in theory on ideological
| grounds. In practice, my main movie device is below 720p
| (projector), so it will take another decade before it
| affects me in any way.
| ozgrakkurt wrote:
| Just download it to your PC. It is a better user experience
| and costs less.
| smcleod wrote:
| So does this mean actually getting rid of the binary blobs of
| microcode that are in their current 'open' drivers?
| p_l wrote:
| No, it means the blob from the "closed" drivers is moved to run
| on GSP.
| risho wrote:
| does this mean you will be able to use NVK/Mesa and CUDA at
| the same time? The non-Mesa proprietary side of nvidia's
| linux drivers is such a mess, and NVK is improving by the
| day, but I really need CUDA.
| john2x wrote:
| Maybe that's one way to retain engineers who are effectively
| millionaires.
| magicloop wrote:
| Remember that time when Linus looked at the camera and gave
| Nvidia the finger. Has that time now passed? Is it time to
| reconcile? Or are there still some gotchas?
| jaimex2 wrote:
| These are kernel modules, not the actual drivers. So the
| finger remains up.
| jcalvinowens wrote:
| Throwing the tarball over the wall and saying "fetch!" is
| meaningless to me. Until they actually contribute a driver to the
| upstream kernel, I'll be buying AMD.
| aseipp wrote:
| You can just use Nouveau and NVK for that if you only need
| workstation graphics (and the open-gpu-kernel-modules
| source code and the separate GSP release have been a big
| uplift for Nouveau too, at least).
| jcalvinowens wrote:
| Nouveau is great, and I absolutely admire what the community
| around it has been able to achieve. But I can't imagine
| choosing that over AMD's first class upstream driver support
| today.
| xyst wrote:
| Nvidia has finally realized they couldn't write drivers for
| their own hardware, especially for Linux.
|
| Never thought I would see the day.
| TeMPOraL wrote:
| Suddenly they went from powering gaming to being the winners of
| the AI revolution; AI is Serious Cloud Stuff, and Serious Cloud
| Stuff means Linux, so...
| shmerl wrote:
| That's not upstream yet. But they have supposedly shown
| some interest in Nova too.
| asaiacai wrote:
| I really hope this makes it easier to install/upgrade NVIDIA
| drivers on Linux. It's a nightmare to figure out version
| mismatches between drivers, utils, container-runtime...
| einpoklum wrote:
| From my limited experience with their open-sourcing of kernel
| modules so far: It doesn't make things easier; but - the silver
| lining is that, for the most part, it doesn't make installation
| and configuration harder! Which is no small thing actually.
| riddley wrote:
| A nightmare how? When I used their cards, I'd just download
| the .run and run it. Done.
| jaimex2 wrote:
| After a reboot, of course :)
|
| Everything breaks immediately otherwise.
| amelius wrote:
| And when it doesn't work, what do you do then?
|
| Exactly, that's when the nightmare starts.
| einpoklum wrote:
| The title of this statement is misleading:
|
| NVIDIA is not transitioning to open-source drivers for its GPUs;
| most or all user-space parts of the drivers (and most importantly
| for me, libcuda.so) are closed-source; and as I understand from
| others, most of the logic is now in a binary blob that gets sent
| to the GPU.
|
| Now, I'm sure this open-sourcing has its uses, but for
| people who want to do something like a different hardware
| backend for CUDA with the same API, or to clear up
| "corners" of the API semantics, or to write things in a
| different language without going through the C API - this
| does not help us.
| qalmakka wrote:
| Well, it is something, even if it's still only the kernel
| module, and it will probably never be upstreamed anyway.
| CivBase wrote:
| Too late for me. I tried switching to Linux years ago but failed
| because of the awful state of NVIDIA's drivers. Switched to AMD
| last year and it's been a breeze ever since.
|
| Gaming on Linux with an NVIDIA card (especially an old one) is
| awful. Of course Linux gamers aren't the demographic driving this
| recent change of heart so I expect it to stay awful for a while
| yet.
| doctoboggan wrote:
| My guess is Meta and/or Amazon told Nvidia that they would
| contribute considerable resources to development as long as the
| results were open source. Both companies' bottom lines would
| benefit from improved kernel modules, and like another commenter
| said elsewhere, Nvidia doesn't have much to lose.
| Narhem wrote:
| I can't wait to use linux without having to spend multiple
| weekends trying to get the right drivers to work.
| aussieguy1234 wrote:
| I'll update as soon as it's in NixOS unstable. Hopefully this
| will
| change the mind of the sway maintainers to start supporting
| Nvidia cards, I'm using i3 and X but would like to try out
| Wayland.
| jdonaldson wrote:
| It's kind of surprising that these haven't just been reverse
| engineered yet by language models.
| special-K wrote:
| That's simply not how LLMs work; they're actually awful at
| reverse engineering of any kind.
| jdonaldson wrote:
| Are you saying that they can't explain the contents of
| machine code in a human-readable format? Are you saying
| that they can't be used in a system that iteratively
| evaluates combinations of inputs and checks their results?
| matheusmoreira wrote:
| The transition is not done until their drivers are upstreamed
| into
| the mainline kernel and ALL features work out of the box,
| especially power management and hybrid graphics.
| rldjbpin wrote:
| mind the wording they've used here - "fully towards open-source"
| and not "towards fully open-source".
|
| big difference. almost nobody is going to give you the sauce
| hidden behind blobs. but i hope the dumb issues of the past
| (imagine using it on laptops with switchable graphics) go away
| slowly with this and it is not only for pleasing the enterprise
| crowd.
| n3storm wrote:
| I read "NVIDIA transitions fully Torvalds..."
| nikolayasdf123 wrote:
| hope linux gets first class open source gpu drivers.. and dare I
| hope that Go adds native support for GPUs too
| gigatexal wrote:
| will this mean that we'll be able to remove the arbitrary
| distinctions between quadro and geforce cards maybe by hacking
| some configs or such in the drivers?
| Varloom wrote:
| They know the CUDA monopoly won't last forever.
| aseipp wrote:
| CUDA lives in userspace; this kernel driver release does not
| contain any of that. It's still very useful to release an open
| source DKMS driver, but this doesn't change anything at all
| about the CUDA situation.
| muhehe wrote:
| What is GPU kernel module? Is it something like a driver for GPU?
| qalmakka wrote:
| Yes. In modern operating systems, GPU drivers usually
| consist of a kernel component that is loaded inside the
| kernel or in a privileged context, and a userspace
| component that talks with it and implements the
| GPU-specific part of the APIs that the windowing system and
| applications use. In the case of NVIDIA, they have decided
| to drop their proprietary kernel module in favour of an
| open one. Unfortunately, it's out of tree.
|
| In Linux and BSD, you usually get all of your drivers with the
| system; you don't have to install anything, it's all mostly
| plug and play. For instance, this has been the case for AMD and
| Intel GPUs, which have a 100% open source stack. NVIDIA is
| particularly annoying due to the need to install the drivers
| separately and the fact they've got different implementations
| of things compared to anyone else, so NVIDIA users are often
| left behind by FOSS projects due to GeForce cards being more
| annoying to work with.
| muhehe wrote:
| Thanks. I'm not well versed in these things. It sounded
| like something you load into the GPU (it reminded me of an
| old HP printer, which required a firmware upload after
| starting).
| nicman23 wrote:
| they are worthless. the main code is in the userspace
| shanoaice wrote:
| There is little point in NVIDIA open-sourcing only the
| driver portion of their cards, since they heavily rely on
| proprietary firmware and userspace libraries (most
| important!) to do the real job. Firmware is a relatively
| small issue - this is mostly the same for AMD and Intel,
| since encapsulation reduces the work done on the driver
| side, and open-sourcing firmware could allow people to make
| some really unanticipated modifications which might heavily
| threaten even commercial card sales. Nonetheless, at least
| AMD still keeps a fair share of the work in the driver
| compared to Nvidia. The userspace libraries are the worst
| problem, since they handle a lot of GPU-control
| functionality and the graphics APIs, and they are still
| kept closed-source.
|
| The best we can hope is that improvements to NVK and Red
| Hat's Nova driver will put pressure on NVIDIA to release
| their user-space components.
| gpderetta wrote:
| It is meaningful because, as you note, it enables a fully
| opensource userspace driver. Of course the firmware is still
| proprietary and it increasingly contains more and more logic.
| sscarduzio wrote:
| Which in a way is good, because the hardware will
| increasingly perform identically on Linux and on Windows.
| pabs3 wrote:
| The firmware is also signed, so you can't even do reverse
| engineering to replace it.
| matheusmoreira wrote:
| Doesn't seem like a bad tradeoff so long as the proprietary
| stuff is kept completely isolated with no access to any other
| parts of my system.
| justinclift wrote:
| Personally, I somewhat wonder about that. The (proprietary)
| firmware which runs on the GPU seems like it'll have access
| to do things over the GPU's PCIe bus, including reading
| system memory and accessing other devices (including
| network gear). Reading the memory of remote hosts (i.e.
| RDMA) is also a thing which Nvidia GPUs can do.
| foresto wrote:
| Is that not solvable using an IOMMU (assuming hardware
| that has one)?
| justinclift wrote:
| No idea personally. :)
| bayindirh wrote:
| The GLX libraries are the elephant(s) in the room. Open-
| source kernel modules mean nothing without these libraries.
| On the other hand, AMD and Intel use "platform GLX"
| natively, and with great success.
| paulmd wrote:
| the open kernel driver also fundamentally breaks the
| limitation about geforce gpus not being licensed for use in
| the datacenter. that provision is a _driver provision_ and
| CUDA does not follow the same license as the driver... really
| the only significant limitation is that you aren't allowed
| to use the CUDA toolkit to develop for non-NVIDIA hardware,
| and some license notice requirements if you redistribute the
| sample projects or other sample sourcecode. and yeah they
| paid to develop it, it's proprietary source code, that's
| reasonable overall.
|
| https://docs.nvidia.com/cuda/eula/index.html
|
| ctrl-f "datacenter": none
|
| so yeah, I'm not sure where the assertion of "no progress"
| and "nothing meaningful" and "this changes nothing" come
| from, other than pure fanboyism/anti-fans. before you
| couldn't write a libre CUDA userland even if you wanted to -
| the kernel side wasn't there. And now you can, and this
| allows retiming and clock-up of supported gpus even with
| nouveau-style libre userlands. Which of course don't grow on
| trees, but it's still progress.
|
| honestly it's kinda embarrassing that grown-ass adults are
| still getting their positions from what is functionally just
| some sick burn in a 2004 viral video or whatever, to the
| extent they actively oppose the company moving in the
| direction of libre software _at all_. but I think with the
| "linus torvalds" citers, you just can't reason those people
| out of a position that they didn't reason themselves into.
| Not only is it an emotionally-driven (and fanboy-driven)
| mindset, but it's literally not even their own position to
| begin with, it's just something they're absorbing from
| youtube via osmosis.
|
| Apple debates and NVIDIA debates always come down to the
| anti-fans bringing down the discourse. It's honestly sad.
| https://paulgraham.com/fh.html
|
| it also generally speaks to the long-term success and
| intellectual victory of the GPL/FSF that people see
| proprietary software as somehow inherently bad and
| illegitimate... even when source is available, in some cases.
| Like CUDA's toolchain and libraries/ecosystem is pretty much
| the ideal example of a company paying to develop a solution
| that would not otherwise have been developed, in a market
| that was (at the time) not really interested until NVIDIA
| went ahead and proved the value. You don't get to ret-con
| every single successful software project as being
| retroactively open-source just because you really really want
| to run it on a competitor's hardware. But people now have
| this mindset that if it's not libre then it's somehow
| illegitimate.
|
| Again, most CUDA stuff is distributed as source, if you want
| to modify and extend it you can do so, subject to the terms
| of the CUDA license... and that's not good enough either.
| matheusmoreira wrote:
| Why is the user space component required? Won't they provide
| sysfs interfaces to control the hardware?
| cesarb wrote:
| It's something common to all modern GPUs, not just NVIDIA:
| most of the logic is in a user-space library loaded by the
| OpenGL or Vulkan loader into each program. That library
| writes a stream of commands (plus all the necessary data)
| into a buffer in memory directly accessible to the GPU, and
| there's a single system call at the end to ask the
| operating system kernel to tell the GPU to start reading
| from that command buffer. That is, other than memory
| allocation and a few other privileged operations,
| user-space programs talk directly to the GPU.
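|
| As a rough sketch of that flow (the ioctl number, struct,
| and names below are invented for illustration, not any real
| driver's ABI):
|
|     #include <stdint.h>
|     #include <string.h>
|     #include <sys/ioctl.h>
|
|     struct gpu_submit {            /* hypothetical ABI */
|         uint64_t cmdbuf_gpu_addr;  /* where the GPU reads from */
|         uint32_t num_dwords;       /* command stream length */
|     };
|
|     /* cmdbuf points at memory the GPU can see, mapped
|      * earlier through a privileged allocation call. */
|     static void submit(int fd, uint32_t *cmdbuf,
|                        uint64_t gpu_addr,
|                        const uint32_t *cmds, uint32_t n) {
|         /* Building the command stream is plain memory
|          * writes - no syscall involved. */
|         memcpy(cmdbuf, cmds, n * sizeof(uint32_t));
|         /* One syscall at the end asks the kernel to point
|          * the GPU at the buffer. */
|         struct gpu_submit req = { gpu_addr, n };
|         ioctl(fd, 0xC010AF00 /* GPU_SUBMIT, hypothetical */,
|               &req);
|     }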
| AshamedCaptain wrote:
| I really don't know where this crap about "moving
| everything to the firmware" is coming from. The kernel part
| of the nvidia driver has always been small, and this is the
| only thing they are open-sourcing (they have been
| announcing it for months now...). The immense majority of
| the user-space driver is still closed, and no one has seen
| any indications that this may change.
|
| I see no indications either that nvidia or any of the rest
| of the manufacturers has moved any respectable amount of
| functionality to the firmware. If you look at the
| open-source drivers you can even confirm for yourself that
| the firmware does practically nothing -- the binary blobs
| for AMD cards are minuscule, for example, and long gone are
| the days of ATOMBIOS. The drivers are literally generating
| bytecode-level binaries for the shader units in the GPU;
| what do you expect the firmware could even do at this
| point? Re-optimize the compiler output?
|
| There was an example of a GPU that did move everything to
| the firmware -- the VideoCore on the Raspberry Pi, and it
| was clearly a completely distinct paradigm: the "driver"
| would almost literally pass OpenGL calls through to a
| mailbox, read by the VideoCore coprocessor (more powerful
| than the main ARM core!) that was basically running the
| actual driver as "firmware". Nothing I see on nvidia
| indicates a similar trend; otherwise RE-ing it would be
| trivial, as happened with the VC.
| ploxiln wrote:
| https://lwn.net/Articles/953144/
|
| > Recently, though, the company has rearchitected its
| products, adding a large RISC-V processor (the GPU system
| processor, or GSP) and moving much of the functionality once
| handled by drivers into the GSP firmware. The company allows
| that firmware to be used by Linux and shipped by
| distributors. This arrangement brings a number of advantages;
| for example, it is now possible for the kernel to do
| reclocking of NVIDIA GPUs, running them at full speed just
| like the proprietary drivers can. It is, he said, a big
| improvement over the Nouveau-only firmware that was provided
| previously.
|
| > There are a number of disadvantages too, though. The
| firmware provides no stable ABI, and a lot of the calls it
| provides are not documented. The firmware files themselves
| are large, in the range of 20-30MB, and two of them are
| required for any given device. That significantly bloats a
| system's /boot directory and initramfs image (which must
| provide every version of the firmware that the kernel might
| need), and forces the Nouveau developers to be strict and
| careful about picking up firmware updates.
| cpgxiii wrote:
| These aren't necessarily conflicting assessments. The
| addition of the GSP to Turing and later GPUs does mean that
| some behavior can be moved on-device from the drivers.
| Device initialization and management is an _important_
| piece of behavior, certainly, but in the context of all the
| work done by the Nvidia driver (both kernel- and
| user-space), it is a relatively tiny portion next to
| everything else (e.g. compiling/optimizing shaders and
| kernels, video encode/decode, etc).
| noch wrote:
| >> I see no indications either that either nvidia nor any
| of the rest of the manufacturers has moved any respectable
| amount of functionality to the firmware.
|
| Someone who believes this could easily prove that they are
| correct by "simply" taking their 4090 and documenting all
| its functionality, as was done with the [7900
| xtx](https://github.com/geohot/7900xtx).
|
| You can't say "I see no indications/evidence" unless you
| have proven that there is no evidence, no?
| paulmd wrote:
| so basically "if you really think there's no proof of a
| positive claim, then you won't mind conclusively proving
| the negation"?
|
| no, that's not how either logical propositions or burden
| of proof works
| v3ss0n wrote:
| Thank You Nvidia hacker! You did it! The Lapsus$ team
| threatened a few years back that if nvidia was not going to
| open-source their drivers, they were going to release
| nvidia's code. That led nvidia to release the first
| open-source kernel module a few months later, but it was
| quite incomplete. Now it seems they are open-sourcing more
| fully.
| sylware wrote:
| Hopefully, we get a plain and simple C99 user-space Vulkan
| implementation.
| exabrial wrote:
| Are Nvidia Grace CPUs even available? I thought it was
| interesting that they mentioned them.
| resource_waste wrote:
| This means Fedora can bundle it?
| gorkish wrote:
| This is great. I've been having to build my own .debs of the OSS
| driver for some time because of the crapola NVIDIA puts in their
| proprietary driver that prevents it from working in a VM as a
| passthrough device. (just a regular whole-card passthru, not
| trying to use GRID/vGPU on a consumer card or anything)
|
| NVIDIA can no longer get away with that nonsense when they have
| to show their code.
| K33P4D wrote:
| Does this mean we can aggressively volt-mod and add/replace
| memory modules to our liking?
___________________________________________________________________
(page generated 2024-07-18 23:07 UTC)