hngopher.com

       AMD Open-Source GPU Kernel Driver Above 5M Lines, Entire Linux
       Kernel at 34.8M
        
       Author : TangerineDream
       Score  : 114 points
       Date   : 2023-08-31 16:47 UTC (6 hours ago)
        
 (HTM) web link (www.phoronix.com)
 (TXT) w3m dump (www.phoronix.com)
        
       | pixelesque wrote:
       | It's not completely clear from the article, but: are the files
       | generated 'on-the-fly' during the build process (and therefore
       | not in git), or generated once (by AMD), and then committed?
        
         | mpreda wrote:
         | Pre-generated by AMD and committed, I assume.
         | 
         | If they were generated as part of the build, they would not be
         | counted as SLOC (not being "source").
        
         | rjsw wrote:
         | Not read the article but the files in the Linux tree have been
         | generated once by AMD.
        
       | tgsovlerkhgsel wrote:
       | ... and it doesn't work right. When you start googling for your
       | syslog entries you find countless reports spanning many kernel
       | versions of identical looking crashes, likely with different root
       | causes since all the message basically says is "the GPU hung".
        
       | shmerl wrote:
       | This could be expressed in binary format using way less space,
       | but expressing it in code / text, I suppose, makes it more
       | suitable to call it source.
        
       | 1letterunixname wrote:
       | Corporations don't incentivize good engineering, they incentivize
       | functionality at any cost. This leads to giant codebases, over-
       | engineering, bad engineering, fragility, unmaintainable and
       | useless code, and duplication. The FOSS/FLOSS community must push
       | back against the hot mess turds corporations want to dump into
       | their source.
        
       | Kab1r wrote:
       | I would much rather have a large amount of in-tree driver source
       | over a small driver with a "large" firmware binary.
        
       | Thaxll wrote:
       | Linux kernel is not made of 34M loc, most of it is drivers which
       | I hardly consider kernel code.
        
       | agloe_dreams wrote:
       | I'm not sure I get why the comparison to the Kernel is needed.
       | GPUs are wildly complex. Rendering is wildly complex. Managing
       | memory and data is complex. Managing connected hardware is
       | complex. I am not sure why anyone would expect a GPU Driver to be
       | small while also doing a billion things and playing games as well
       | as mature gaming platforms.
        
         | sylware wrote:
         | It is said that NVIDIA's hardware programming interface is much
         | simpler than AMD's.
         | 
         | If true, AMD is doing something wrong here. And yes, giga tons
         | of generated headers related to registers.
        
         | harry8 wrote:
         | If you're not intimately familiar with GPU drivers and what
         | goes on this gives you a very quick, back-of-the-envelope of
         | the size and complexity of the work involved. 1/7th the size
         | and complexity of the kernel for this one driver.
         | 
         | I raised an eyebrow but I have only the vaguest notion of how
         | the hardware works and what a driver might have to manage.
        
           | dralley wrote:
           | As the article pointed out, the vast majority of the lines of
           | code in the driver are autogenerated header files for things
           | like defining hardware registers. There's not much complexity
           | or logic in that type of code.
           | 
           | Probably if AMD wanted to spend the time, they could compress
           | it down to a fraction of its current size.
        
             | jmole wrote:
             | Right, if you have 7 different architectures, each with
             | its own register map, and then model-specific tweaks,
             | you're going to have a ton of code like that.
        
               | undersuit wrote:
               | We could just compile it into a proprietary blob like
               | Nvidia! /s
        
               | scns wrote:
               | > if you have 7 different architectures
               | 
               | GPU or CPU? If talking about the latter, only two [four]
               | should count (ARM & x86 [+ [* 2 64BitVersion]]). If you
               | meant the former, forget my comment.
        
               | benlwalker wrote:
               | Is it really that much code? I don't know GPU hardware,
               | but the NVMe spec header file in SPDK is around 4k
               | lines[0]. If there's 7 of them and they're twice as
               | complicated each, we're still well under 100k from
               | register map headers. I didn't actually look through
               | Linux to see how big they are, so maybe it is that much
               | more complex.
               | 
               | 0: https://github.com/spdk/spdk/blob/master/include/spdk/
               | nvme_s...
        
           | tester756 wrote:
           | >If you're not intimately familiar with GPU drivers and what
           | goes on this gives you a very quick, back-of-the-envelope of
           | the size and complexity of the work involved. 1/7th the size
           | and complexity of the kernel for this one driver.
           | 
           | ehh, no.
           | 
           | almost all of this is header files
           | 
           | >Meanwhile the open-source NVIDIA "Nouveau" driver is around
           | 201k (21.7k blank lines, 24.3k lines of comments, and 155k
           | lines of code). Or the Intel i915 DRM kernel graphics driver
           | is around 381k lines via the same cloc judgment.
           | 
           | so it seems like a GPU driver is around 1% of the kernel's code
           | 
           | and you start wondering why the kernel actually has this much
           | code if a GPU driver (of all software) needs just around 1%.
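
The proportions argued over in this subthread can be checked with a few lines of arithmetic. This is only a sketch using the rounded figures quoted in the thread and its title (34.8M for the kernel, "above 5M" for the AMD driver, 201k for Nouveau, 381k for i915); the constant and function names are made up for illustration.

```python
# Rounded line counts as quoted in the thread (assumed, not re-measured).
KERNEL_LOC = 34_800_000
AMDGPU_LOC = 5_000_000
I915_LOC = 381_000
NOUVEAU = {"blank": 21_700, "comment": 24_300, "code": 155_000}


def share(part, whole=KERNEL_LOC):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# The three cloc categories should add up to the quoted Nouveau total.
nouveau_total = sum(NOUVEAU.values())

if __name__ == "__main__":
    print(f"nouveau total: {nouveau_total}")
    print(f"amdgpu : {share(AMDGPU_LOC):.1f}% of the kernel")
    print(f"i915   : {share(I915_LOC):.1f}% of the kernel")
    print(f"nouveau: {share(nouveau_total):.2f}% of the kernel")
```

With these inputs the AMD driver comes out at roughly one seventh of the tree, while i915 is close to 1% and Nouveau is somewhat below that.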
        
             | nvm0n2 wrote:
             | The NVIDIA proprietary driver is about the same size
             | compiled as the Linux kernel, iirc.
             | 
             | The reason is, GPU drivers are basically complete operating
             | systems, just for the secondary computer we call the GPU
             | instead of the CPU.
        
             | boppo1 wrote:
             | Nouveau barely works iirc
        
       | 1-6 wrote:
       | "Of course, much of that is auto-generated header files... A
       | large portion of it with AMD continuing to introduce new auto-
       | generated header files with each new generation/version of a
       | given block. These verbose header files has been AMD's
       | alternative to creating exhaustive public documentation on their
       | GPUs that they were once known for."
       | 
       | So what's the point of saying that it's large?
        
         | dijit wrote:
         | I read it as a bit of a negative situation. So the reason for
         | mentioning it is to shame AMD into doing a more correct or sane
         | thing instead of spewing out enormous amounts of what is
         | basically repetitive noise.
         | 
         | Pointing out that enormity is important because source files
         | need to be stored, interpreted, versioned and parsed by
         | humans/IDEs. It has an externalised cost (but, then again,
         | isn't capitalism all _about_ externalising costs?)
        
         | throwaway193439 wrote:
         | Because it's large and large is difficult to maintain.
         | 
         | AMD maintains it but do we know how they are generated?
         | Probably not.
         | 
         | It's like a gift that stinks but you can't complain about
         | because it's a gift.
        
           | FirmwareBurner wrote:
           | _> but do we know how they are generated? Probably not_
           | 
           | Having worked in the semi industry, I can hazard a guess:
           | It's a spaghetti mess of cascading Perl scripts that parse
           | the Verilog/VHDL design files, with their development going
           | back 20+ years, full of comments like "don't touch this line
           | because it breaks another line, nobody knows why", and
           | maintained by a team where a gray-beard "Gandalf" engineer
           | wearing an ATI t-shirt, has most of the deep-down low-level
           | knowledge on how to un-fuck them whenever they get fucked,
           | pardon my french.
        
             | trollied wrote:
             | There's bound to be some tcl in there too...
        
             | baq wrote:
             | Having worked in the semi industry too, this is spot on
        
               | sshine wrote:
               | I haven't worked in the semi industry, but I've worked
               | with EEs and Perl programmers, and they do love that
               | undocumented lore. And the universe does reward you with
               | a grey beard after enough Perl.
        
             | trws wrote:
             | I have not seen these scripts, but can confirm that AMD has
             | a long history of such Perl scripts. Look at hipcc for a
             | current, moderately frustrating, example of this. Also the
             | last time I met one of the open source driver team in
             | person he was, in fact, wearing a classic red ATI
             | t-shirt straight in from Markham. Much of that team is
             | European now though from what I hear, and they're generally
             | a good bunch.
        
               | FirmwareBurner wrote:
               | _> ATI t-shirt straight in from Markham._
               | 
               | Curious how much of AMD Radeon GPU development is now
               | being done in Markham, Canada, as AFAIK, the modern Radeon
               | architecture stems from ATI's acquisition of ArtX[1], a
               | US-based spin-off of SGI, which was responsible for the
               | GPUs in the Nintendo GameCube, Wii and many other
               | innovations like programmable shaders, later found in
               | ATI/AMD GPUs.
               | 
               |  _> Much of that team is European now though from what I
               | hear, and they're generally a good bunch._
               | 
               | I didn't know AMD has a GPU design team in Europe. Where?
               | I know they had a fab in Germany and they have an office
               | for the Ryzen and Infinity Fabric R&D in Romania, but I
               | had no idea they do GPU stuff as well in Europe. Where is
               | that office?
               | 
               | [1] https://en.wikipedia.org/wiki/ArtX
        
               | ahartmetz wrote:
               | >I didn't know AMD has a GPU design team in Europe
               | 
               | AFAIK they don't, but the Linux driver guys seem to be
               | mostly German and Polish and such. And yeah, they are
               | doing good work. I half-expect AMD to reboot their
               | Windows driver from the Linux driver code base at some
               | point.
        
           | SXX wrote:
           | > AMD maintains it but do we know how they are generated?
           | Probably not.
           | 
           | Basically those files are generated from AMD GPU register
           | data files where the majority of registers are documented, but
           | there are of course a bunch of magic numbers as well, probably
           | because they belong to HDCP or other cases where
           | documentation is only available under NDA.
           | 
           | There have been a number of leaks of AMD internal
           | documentation, so anyone who is into GPU drivers can really
           | find a lot of information on their GPUs' internal workings.
           | 
           | I archived some of it many years ago and it was never
           | DMCA'ed:
           | 
           | https://github.com/ArseniyShestakov/rai-bonaire
           | 
           | Source was a talk on CCC.
        
       | mrweasel wrote:
       | So while the AMD driver is open source, the community is
       | basically excluded from contributing?
       | 
       | Should someone decide to start working through the code,
       | removing duplicate code and cleaning up headers, functions and
       | abstractions, their work would either be rejected or undone with
       | the next AMD code dump?
        
         | kube-system wrote:
         | A lot of open source projects work that way. Open source means
         | you get access to the source and get to make changes for your
         | own use. It doesn't mean you get to force anyone else to merge
         | your code.
        
           | mrweasel wrote:
           | > It doesn't mean you get to force anyone else to merge your
           | code.
           | 
           | Sure, you can fork the code if you really feel that strongly
           | about it. My main "issue" is that it basically removes one of
           | the big benefits of open source, that we can collaborate and
           | do better as a collective. If it's just a big code dump that
           | other kernel developers can't really touch it's more "source
           | code is available" than actual open source.
        
             | acrispino wrote:
             | It's not open source unless you can have it your way?
             | That's too picky, for me
        
             | elteto wrote:
             | The fundamental premise of open source is full access to
             | the source with the possibility to make changes and
             | redistribute those changes [0]. Anything else, including
             | collaboration to improve the code, is a nice cherry on top
             | but not consequential to the concept of open source.
             | 
             | [0] https://opensource.org/osd/
        
             | kube-system wrote:
             | Open source is a licensing model, not a community
             | organization model. Collaboration is not a benefit of open
             | source, it's a benefit of collaboration software and a
             | group of people who welcome collaboration. Almost all of
             | the people who like collaborating on software use open
             | source licensing. But there are _plenty_ of people who use
             | open source licensing who are not interested in
             | collaborating. For example, it is very normal for projects
             | maintained by someone with a narrow focus, or projects with
             | limited or formally organized resources to not accept PRs.
             | 
             | When you send someone a PR, you are demanding that they do
             | work for you to review and merge. Open source licensing
             | does not mandate that they do this. Heck, most open source
             | licenses even disclaim warranty to avoid obligating the
             | authors to do even the work that _the law_ would otherwise
             | require them to do. Now yes, some people will help you with
             | problems. This is because they're nice, not because it's
             | open source.
        
       | guardiangod wrote:
       | I am working on the kernel right now, the code is very pleasant
       | (as far as C code goes) to work with.
       | 
       | Whereas I worked on Chrome's V8 C++ code for a year and I still
       | could not say I understand more than half of it. Its complexity
       | is a factor more than the Linux kernel.
        
       | tiffanyh wrote:
       | As a comparison:
       | 
       |   FreeBSD: ~9M loc
       |   NetBSD:  ~7M loc
       |   OpenBSD: ~3M loc
       | 
       | And this _includes_ the base userland (not just kernel)
       | 
       | https://www.csoonline.com/article/564373/is-the-bsd-os-dying...
        
         | rjsw wrote:
         | NetBSD currently contains an older version of this driver, from
         | Linux 5.6. Checking just now, it comes to 2.2M loc. Running the
         | same test on the Linux 6.4 source tree does give me the
         | reported 5M loc.
         | 
         | Maybe the figures you quote exclude things imported from
         | elsewhere like gcc and llvm; I get a figure of 75M loc for base
         | + kernel of NetBSD-10.
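
For readers wondering what such a "loc" test measures: tools like cloc split each file into blank, comment, and code lines, which is why figures differ depending on what is counted. Below is a deliberately simplified sketch of that classification for C-style text. It is not how cloc itself is implemented, the function name is invented, and it assumes comments start at the beginning of a line (a line with code followed by a trailing comment counts as code).

```python
def count_lines(text):
    """Classify lines of C-style source as blank, comment, or code.

    Simplification (assumption): only comments that begin a line are
    recognised; block comments opened mid-line are not tracked.
    """
    blank = comment = code = 0
    in_block = False  # True while inside a /* ... */ block comment
    for raw in text.splitlines():
        line = raw.strip()
        if in_block:
            comment += 1
            if "*/" in line:
                in_block = False
        elif not line:
            blank += 1
        elif line.startswith("//"):
            comment += 1
        elif line.startswith("/*"):
            comment += 1
            if "*/" not in line[2:]:
                in_block = True
        else:
            code += 1
    return {"blank": blank, "comment": comment, "code": code}


# Hypothetical header fragment, in the spirit of a register definition file.
SAMPLE = """\
/* Register offsets
 * (hypothetical)
 */
#define mmFOO 0x0001

// mask
#define FOO_MASK 0x00ff
"""

if __name__ == "__main__":
    print(count_lines(SAMPLE))
```

Summing such counts over a directory tree gives totals of the kind quoted above; whether generated headers, blank lines, or imported toolchains are included changes the result considerably.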
        
       | somat wrote:
       | The OpenBSD situation is even worse. Over there the driver is
       | bigger than the rest of the kernel.
       | 
       | Don't get me wrong, I use the driver every day and AMD is
       | definitely one of the good guys for making an open source driver,
       | and those who ported it are absolute heroes. However... Sometimes
       | I wish AMD had tied down the ISA of their cards a little better.
       | Narrowed the interface, if you will. As it is, the driver is so
       | big because there is this combinatorial explosion of generated
       | header files.
       | 
       | https://flak.tedunangst.com/post/watc
        
         | sroussey wrote:
         | There is no business reason to restrict themselves on the ISA,
         | and it would make their hardware less performant compared to
         | the competition, which would not be so bound.
        
           | dogma1138 wrote:
           | The competition is very much bound to a rather narrow ISA
           | which is why CUDA is forward and backwards compatible whilst
           | ROCm isn't.
           | 
           | ROCm will be pointless until at least forward compatibility
           | is guaranteed by design.
        
             | nimish wrote:
             | CUDA is subsequently compiled to the hardware assembly at
             | runtime isn't it? Like precompiled shaders.
             | 
             | C++ -> PTX -> Hardware ISA
        
         | Laaas wrote:
         | The issue is that the generated code is checked in. Surely
         | there's a better solution.
        
           | tux3 wrote:
           | They _could_ in theory post the (no doubt) Perl scripts that
           | generate those headers from the HDL along with the relevant
           | source files, but I imagine that would be a _very_ hard sell.
           | And probably not much more helpful to the kernel, as no one
           | reads those headers anyways, and the compile time will not
           | improve by shuffling where the generation step happens.
           | 
           | It may be more practical to rework the scripts to try to find
           | ways to reduce the verbosity and redundancy. The actual .c
           | driver code probably doesn't need every copy of every line
           | in all those .h files.
        
             | alfalfasprout wrote:
             | As long as it's deterministic there should be no issue
             | checking in the generators right?
        
               | tux3 wrote:
               | The generators themselves probably not, but the
               | definition of all those registers is from hardware. These
               | kinds of code generators convert input source files that
               | describe the hardware into C header files for the
               | software.
               | 
               | But I expect AMD would be skittish about open-sourcing
               | anything that could even remotely be construed as HDL,
               | even if it's just dry lists of registers. Open sourcing
               | the drivers is one thing, but the hardware itself is
               | another.
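
To make the idea of "input files that describe the hardware converted into C header files" concrete, here is a minimal sketch of such a generator. Everything in it is an assumption for illustration: the `Field` and `Register` types, the `DEMO_CNTL` register, and the macro naming are hypothetical, and nothing here reflects AMD's actual tooling or input format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str    # bit-field name within the register
    shift: int   # position of the lowest bit
    width: int   # number of bits


@dataclass(frozen=True)
class Register:
    name: str
    offset: int
    fields: tuple


def render_header(block, registers):
    """Render a C header with one offset macro per register and
    shift/mask macros per field (naming scheme is illustrative)."""
    guard = f"_{block.upper()}_REGS_H_"
    lines = [f"#ifndef {guard}", f"#define {guard}", ""]
    for reg in registers:
        lines.append(f"#define {reg.name} 0x{reg.offset:04x}")
        for fld in reg.fields:
            mask = ((1 << fld.width) - 1) << fld.shift
            lines.append(f"#define {reg.name}__{fld.name}__SHIFT 0x{fld.shift:x}")
            lines.append(f"#define {reg.name}__{fld.name}_MASK 0x{mask:08x}L")
    lines += ["", f"#endif /* {guard} */"]
    return "\n".join(lines) + "\n"


# Hypothetical register description standing in for the hardware-side input.
DEMO = [
    Register("DEMO_CNTL", 0x10, (Field("ENABLE", 0, 1), Field("MODE", 4, 3))),
]

if __name__ == "__main__":
    print(render_header("demo", DEMO), end="")
```

Even this toy input yields five macro lines for a single register with two fields, which gives a feel for how thousands of registers across many hardware generations turn into millions of header lines.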
        
       | trevithick wrote:
       | Does a graphical representation of the files in the Linux kernel
       | exist anywhere? Like a graphical file explorer but for the
       | different kernel components.
        
         | fooker wrote:
         | Yeah the files are organized into a directory hierarchy, pretty
         | cool tech! :-)
         | 
         | And there are great tools for exploring directories of files,
         | my current favorite is dolphin with two or three panes.
        
         | Sjonny wrote:
         | You can run WinDirStat (or a similar tool) on a checkout to get
         | an idea.
        
           | trevithick wrote:
           | Yeah, I guess this is the answer. When I posted the question
           | I had this[1] in mind, and was thinking of something like
           | that with simplified labels maybe. But I guess the file
           | structure is so organized it would explain itself to anyone
           | interested in this kind of thing.
           | 
           | [1] https://upload.wikimedia.org/wikipedia/commons/d/d5/GNOME
           | _Di...
        
         | pavon wrote:
         | Wikipedia has a graph showing high-level breakdown of the
         | kernel tree, and the size of the components[1]
         | 
         | [1]
         | https://en.wikipedia.org/wiki/Linux_kernel#/media/File:Sanke...
        
       | Osiris wrote:
       | Why are GPU drivers baked into the kernel?
       | 
       | Wouldn't it be better to load them in such a way that a crash in
       | the GPU driver can be recovered from as opposed to crashing the
       | whole system?
       | 
       | Other operating systems load GPU drivers separately.
        
         | bendhoefs wrote:
         | Why should having the GPU drivers checked into the same
         | repository mean that they can't be loaded and unloaded
         | dynamically?
        
       | amelius wrote:
       | How much of it is generated code?
        
       ___________________________________________________________________
       (page generated 2023-08-31 23:01 UTC)