[HN Gopher] Open-source drivers according to Habana
       ___________________________________________________________________
        
       Open-source drivers according to Habana
        
       Author : Aissen
       Score  : 73 points
       Date   : 2022-03-31 14:42 UTC (8 hours ago)
        
 (HTM) web link (threedots.ovh)
 (TXT) w3m dump (threedots.ovh)
        
       | naoqj wrote:
       | ...and then they'll complain that corporations would rather
       | maintain their own forks of the kernel.
        
         | Aissen wrote:
         | ...and their customers will simply not buy their product if it
         | needs a custom kernel.
        
       | yboris wrote:
       | Do I understand this right? It seems like Habana has some super-
       | efficient and fast ML hardware, but you can't just drop in an ML
       | project and start using it? For example, only a subset of
       | TensorFlow or PyTorch is supported?
       | 
       | Is that right? If you want to use their hardware, you need to
       | jump through some hoops?
       | 
       | https://docs.habana.ai/en/latest/Tensorflow_User_Guide/Tenso...
       | 
       | https://docs.habana.ai/en/latest/PyTorch_User_Guide/PyTorch_...
        
         | j16sdiz wrote:
         | If you want to use that, you need the closed-source driver.
         | 
          | The open-source driver is the bare minimum needed to keep
          | Linux kernel developers happy.
        
           | yboris wrote:
           | But my point is that even if you use some closed-source
           | driver or whatever is required to "set up" your project - you
           | still can't re-use your regular code wholesale, but will have
           | to modify to fit within whatever features they support.
           | Right?
        
             | mirker wrote:
              | The APIs of TensorFlow and PyTorch are quite large. Even
              | TensorFlow has a dialect specifically for TPU, so the
              | notion of accelerator independence (especially when
              | considering performance) has not been realized yet, though
              | work toward it continues.
             | 
             | Anyway, how could you reuse your code wholesale when one of
             | the most common operations is calling ".cuda()" on a
             | tensor?
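The portability problem mirker points at can be sketched with a toy stand-in for a tensor API (plain Python, not real PyTorch; the `Tensor` class and the "hpu" device name are hypothetical, used only to illustrate hardcoded vs. device-parametric call sites):

```python
class Tensor:
    """Toy tensor that only tracks which device it lives on."""
    def __init__(self, data, device="cpu"):
        self.data = data
        self.device = device

    def cuda(self):
        # Hardcodes one vendor's accelerator: this is the call that
        # breaks when the code moves to non-NVIDIA hardware.
        return Tensor(self.data, device="cuda")

    def to(self, device):
        # Device-parametric: the same call site works for "cpu",
        # "cuda", or a hypothetical "hpu" (Habana) backend.
        return Tensor(self.data, device=device)

# Non-portable style: assumes CUDA everywhere.
t = Tensor([1, 2, 3]).cuda()

# Portable style: the target device is configuration, not code.
device = "hpu"  # could come from a config file or CLI flag
u = Tensor([1, 2, 3]).to(device)
print(t.device, u.device)  # cuda hpu
```

Code written in the first style has to be edited at every `.cuda()` call site before it can run on anything else, which is why "reuse wholesale" rarely works.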
        
             | my123 wrote:
              | Unsupported layers will transparently fall back to CPU
              | execution. You can then choose to implement the ones that
              | can be implemented as TPC kernels. It's fundamentally much
              | less flexible than a GPU.
             | 
             | Efficiency of Habana hardware isn't that great either, but
             | that's another story... (they're still using TSMC 16nm in
             | 2022 notably). Where Habana has an advantage in some
             | workloads is cost.
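The transparent fallback my123 describes can be illustrated with a minimal dispatcher sketch (plain Python; the op names and the supported-op table are invented for illustration, not taken from Habana's actual stack):

```python
# Ops the (hypothetical) accelerator backend actually implements.
HPU_SUPPORTED = {"matmul", "relu"}

def run_op(op, *args):
    """Dispatch an op to the accelerator if supported, else to CPU."""
    if op in HPU_SUPPORTED:
        backend = "hpu"
    else:
        # Transparent fallback: the caller never notices, but data
        # round-trips through host memory, which costs performance.
        backend = "cpu"
    return backend, f"{op}({', '.join(map(str, args))})"

print(run_op("matmul", "a", "b"))  # ('hpu', 'matmul(a, b)')
print(run_op("topk", "a"))         # ('cpu', 'topk(a)')
```

The model still runs end to end, but every op that lands in the fallback branch erodes the accelerator's advantage unless someone writes a TPC kernel for it.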
        
       | sjmm1989 wrote:
       | Okay, I clearly don't understand everything going on in this one.
        | Why? There must be something going on that explains why we are
        | letting people get away with breaking the rules the rest of us
        | are supposed to follow. Must be. Why?
       | 
       | Because if I understand anything at all about open source, it's
       | that you don't get to post closed source as if it is open source;
       | even if you try to resort to shenanigans and trickery to get
       | around the rules.
       | 
       | So my question is this. Why are we letting Intel get away with
       | loopholing the rules the rest of us would have to follow? Seems
       | to me the best thing to do here is punish them like the petulant
       | child they are being. Erase their code, and tell them to politely
       | fuck off. Or at least more politely than Linus Torvalds did with
       | Nvidia.
       | 
       | Also, I don't see why letting them put their code up is of any
       | use in the first place if they are just going to do things like
       | this to essentially break it. Like seriously folks, what in the
       | ever living fuck?
       | 
       | If it were up to me, I'd have people hacking them just to show
       | them their place, and putting EVERYTHING up online for all to
       | use. And that would be after doing something like sending
       | evidence of wrong doings to the justice departments of every
       | single nation who wants to take a chew out of them. (of which
       | there are likely many)
       | 
       | Why?
       | 
        | Because the fact that we let companies like Microsoft and Intel
        | continuously get away with their bullshit is exactly why they
        | keep trying to pull more of it. It's not rocket science, folks.
       | 
       | Time to apply the brake to their bullshitmobile and hard.
        
         | Topgamer7 wrote:
         | > If it were up to me, I'd have people hacking them
         | 
         | Cut this garbage shit out. Hack on the driver implementation
         | instead.
         | 
         | > Because if I understand anything at all about open source,
         | it's that you don't get to post closed source as if it is open
         | source
         | 
          | If it was upstreamed into the Linux tree, then they have
          | open-sourced it under the appropriate license. So the driver
          | is not fully functional, but what they have included could be
          | the building blocks for someone to add the additional
          | functionality.
         | 
         | It is a shitty practice, and if you want to drive adoption,
         | this isn't going to do it.
         | 
         | At least they didn't make a marketing statement that they're
         | open source purveyors and software saviors.
        
       | j16sdiz wrote:
       | Background story: https://lwn.net/Articles/867168/
       | 
        | TLDR: Intel wants some huge DRI-related changes in the Linux
        | kernel. The DRI maintainer insists there needs to be at least
        | one open user-mode user, so we have this proof-of-concept
        | driver.
        
         | drewg123 wrote:
         | Thanks for this, without this I had no clue what the article
         | was referring to.
         | 
         | It would be nice if there was a fairly stable kernel API (or
         | ABI) so drivers like this didn't _have_ to be in the kernel.
         | Out of tree drivers are a nightmare to maintain.
         | 
         | I maintained some out-of-tree drivers for years at Myricom
         | (version of myri10ge with valuable features nak'ed by netdev,
         | MX HPC drivers). Doing this was a massive PITA. Pretty much
         | every minor release brought with it some critical function
         | changing the number of arguments, changing names, etc. RHEL
          | updates were my own special version of hell, since their
          | kernel X.Y.Z in no way resembled the upstream X.Y.Z. It got
          | so bad supporting 2.6.9 through 3.x that the shim layer for
          | the Linux driver was almost as big as the entire FreeBSD
          | driver (where nobody cared that I implemented LRO).
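The shim-layer pattern drewg123 describes (one driver body, per-kernel-version adapters) is language-independent; here is a minimal sketch of the idea in Python rather than kernel C, with the version cutoff and function signatures invented for illustration:

```python
# Pretend the "kernel" exposed alloc(dev, size) before version 4.10
# and alloc(dev, size, flags) from 4.10 on. The driver body should
# not have to care which one it was built against.
KERNEL_VERSION = (5, 10)  # assumed; would come from the build config

if KERNEL_VERSION >= (4, 10):
    def _kernel_alloc(dev, size, flags):
        return f"new_alloc({dev}, {size}, {flags})"

    def compat_alloc(dev, size):
        # Adapt the old two-argument call to the new signature.
        return _kernel_alloc(dev, size, 0)
else:
    def _kernel_alloc(dev, size):
        return f"old_alloc({dev}, {size})"

    def compat_alloc(dev, size):
        return _kernel_alloc(dev, size)

# The driver body calls only the shim. Every upstream signature
# change adds another branch here, which is how the shim ends up
# rivaling the driver itself in size.
print(compat_alloc("eth0", 4096))  # new_alloc(eth0, 4096, 0)
```

In real out-of-tree drivers the same branching is done with `LINUX_VERSION_CODE` preprocessor checks in C, one per incompatible kernel release.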
        
           | 10000truths wrote:
           | This is by design. Linux doesn't _want_ to pay the
           | maintenance and performance costs of guaranteeing a stable
            | in-kernel API/ABI:
           | 
           | https://www.kernel.org/doc/Documentation/process/stable-
           | api-...
        
             | charcircuit wrote:
             | You can make the same argument about user space
             | compatibility. It's extra work and may prevent some
             | improvements, but it's nice to have.
        
           | josephcsible wrote:
           | > It would be nice if there was a fairly stable kernel API
           | (or ABI) so drivers like this didn't _have_ to be in the
           | kernel.
           | 
           | Why? We _want_ as many drivers as possible to be in the
           | kernel.
        
         | aseipp wrote:
         | There's another preceding case here, from 2020, involving
         | Qualcomm submitting a similar driver (and to some extent
         | Microsoft), which is quietly linked to and worth looking at for
         | some other history: https://lwn.net/Articles/821817/
         | 
         | The situation there is that Habana already had their driver
         | submitted very early I suppose, and I guess the resistance
         | wasn't high enough at the time to keep it out. Qualcomm later
         | came around and their own AI 100 driver was rejected, on
         | similar grounds that would have kept Habana out, had they been
         | applied at the time. (Airlie even called out Greg at this
         | time.)
         | 
         | The later scuffle (your OP link) is because the Habana driver
         | eventually wanted to adopt DMA-BUF and P2P-DMA support in the
         | driver, which the original developers intended for the GPU
         | subsystem, so they consider this over the line, because the
         | criteria for new GPU drivers is "a testable open source
         | userspace". So, that work was rejected, but the driver itself
         | wasn't pulled entirely. Just that particular series of patches
         | was not applied.
         | 
         | Microsoft had a weirdly similar case where they wanted a
         | virtualization driver for Linux that would effectively pass GPU
         | compute through to Windows hosts running under HyperV, for the
         | purposes of running Machine Learning compute workloads -- not
         | graphics. (The underlying Windows component to handle these
         | tasks _does_ use DirectX, but only the DirectCompute part of
          | it.) But, it wasn't rejected out of hand on the same
         | principle; it's more like a VFIO passthrough device
         | conceptually, and didn't need to use any DRI/DRM specific
         | subsystems to accomplish that. But the basic outline is the
         | same where the userspace component would be closed source, so
         | the driver is just connecting a binary blob to a binary blob.
         | It doesn't use any deeply involved APIs, but it's also not very
         | useful for anyone except the WSL team. It's a bit of an
          | in-between case where it isn't quite the same thing, but it's
         | not _not_ the same thing. Strange one.
         | 
         | As of right now, looking at upstream:
         | 
         | - Habana now has DMA-BUF support, as of late last year, so
          | presumably the minimal userspace given above was "good
          | enough" for upstream, since they can at least run minimal
          | testing on the driver paths:
         | https://github.com/torvalds/linux/commit/a9498ee575fa116e289...
         | 
         | - Microsoft's DXGI/whatever-it's-called driver for compute is
         | still not upstream, but I think they ship it with their custom-
         | by-default WSL2 kernel (`wsl -e uname -a` gives me
         | `5.10.16.3-microsoft-standard-WSL2` right now). It was not
         | rejected out of hand but they also didn't seem to mind if it
          | didn't land immediately. I have no idea what its status is.
         | 
         | - Qualcomm's driver for AI 100 was completely rejected
         | immediately and I do not know of any further attempts to
         | upstream it.
         | 
         | - And there are probably even more cases of this. I believe
         | Xilinx has a driver for their (similarly closed) compiler +
         | runtime stack included in Vitis, and I doubt it's going
         | upstream soon (xocl/xcmgmnt)
         | 
         | So the rules in general aren't particularly conclusive. But it
         | looks like most accelerator designs will eventually fall under
         | the rules of the graphics subsystem, if they seek to scale
         | through P2P/DMA designs. As a result of that, a lot of people
         | will probably get blocked, but Habana to some extent got a
         | first-mover advantage, I think.
         | 
          | Arguably, people who consider the drivers a problem and want
          | to complain about SynapseAI Core being unsuitable for
          | production use should also lay a bit of the blame on the
          | Linux developers. I think this isn't an unreasonable
          | position.
         | 
         | But ultimately this comes down to there being two different
          | desires among people: the kernel developers' concerns _aren't_
         | that every userspace stack for every accelerator, shipped to
         | every production user, is fully open source. That might be the
         | concern of some people who are _users_ of the kernel and Linux
         | (including some kernel developers themselves), but not  "them"
         | at large. Their concern might be more accurately stated as:
         | they have enough tooling and information to maintain their own
         | codebase and APIs reliably, given the hardware drivers they
         | have. These are not the same objective, and this is a good
         | example of that.
        
       | AshamedCaptain wrote:
       | Kernel maintainers tend to refuse drivers that only work with
        | proprietary user-space, so I guess this is just one way to
        | work around that.
        
         | hansendc wrote:
         | It's not just drivers. It's really about ensuring that the
         | folks that maintain the kernel have a way to test the code they
         | maintain. The reasons that we (the kernel maintainers) have for
         | this requirement are varied. But, for me, it's really nice to
         | have at least one open source implementation that can _test_
         | the kernel code. Without that, the kernel code can bit rot too
         | easily.
         | 
         | Even better is if an open source implementation is _in_ the
          | kernel tree, like in tools/testing/selftests. That makes it
         | even less likely that the kernel code gets broken.
         | 
         | Disclaimer: I work on Linux at Intel, although not on drivers
         | like this Habana one.
        
         | 10000truths wrote:
         | Or they could pull an NVidia, and dedicate a whole in-house
         | kernel team to maintaining an out-of-tree kernel module.
        
           | my123 wrote:
           | For NVIDIA? More than one.
           | 
           | The Tegra stack uses a totally separate out-of-tree but GPLv2
           | kernel module (which also works on some dGPU SKUs). It's
           | available at https://nv-tegra.nvidia.com/r/gitweb?p=linux-
           | nvgpu.git;a=sum...
           | 
           | And then there's the partially closed kernel module stack,
           | which is a different code base...
        
         | eikenberry wrote:
          | One of the points of having drivers in the kernel is that it
          | means the kernel can actually run on that hardware. In
          | addition to allowing for testing, as others have pointed out,
          | it is also a way to make sure that drivers aren't used to
          | restrict access to the hardware. It ensures the freedom of
          | the platform.
        
       ___________________________________________________________________
       (page generated 2022-03-31 23:01 UTC)