[HN Gopher] AMD proposes an FPGA subsystem user-space interface ...
       ___________________________________________________________________
        
       AMD proposes an FPGA subsystem user-space interface for Linux
        
       Author : mikhael
       Score  : 106 points
       Date   : 2024-01-04 18:16 UTC (4 hours ago)
        
 (HTM) web link (www.phoronix.com)
 (TXT) w3m dump (www.phoronix.com)
        
       | gen3 wrote:
       | I think this would be a really cool feature. Imagine editing
       | video and being able to add an updated h265 encoder
        
         | buildbot wrote:
         | There is something kinda like this already - a much less
         | integrated, but works via v4l2:
         | https://developer.ridgerun.com/wiki/index.php/V4L2_FPGA
        
         | johnea wrote:
         | I only know the Xilinx-specific side, but even given the support
         | for runtime updates to the FPGA logic that already exists in the
         | Xilinx kernels, there are serious roadblocks to updating only
         | portions of the FPGA array.
         | 
         | The logic must be very carefully implemented to allow
         | overwriting only specific regions of the FPGA.
         | 
         | If you're re-flashing the entire logic array, it's pretty
         | straightforward. If you want to leave some logic in place and add
         | additional functionality, this is much more difficult.
         | 
         | This is really a limitation of the FPGA logic array
         | implementation, not a Linux shortcoming...
        
           | jeffreygoesto wrote:
           | Partial reconfiguration has been possible for twenty-something
           | years, but only a small percentage of problems really benefit
           | from it; see e.g. https://www.fpgakey.com/tutorial/section742
        
       | jauntywundrkind wrote:
       | It'll be interesting to see if AMD can help unleash what Xilinx
       | was. At some point, leaving FPGAs as cloistered inaccessible
       | technology - even well supported cloistered technology - becomes
       | a risk.
       | 
       | There's been some incremental improvement in low-level Linux
       | support over the past year. Good to see, but so far, actually
       | using FPGAs is still all Vivado & closed systems. I think there's
       | so much possibility left on the table by not supporting openfpga
       | alternatives, not embracing yosys/openpnr/openroad/&al.
        
       | buildbot wrote:
       | I hope AMD sees the light and helps F4PGA develop a more
       | complete open source toolchain for their FPGAs
       | (https://f4pga.org). With this subsystem and an open source
       | compilation flow, FPGA experiments would be way easier.
        
         | UncleOxidant wrote:
         | Unfortunately this will never happen.
        
           | hwbehrens wrote:
           | I don't know much about the ins-and-outs of the FPGA
           | ecosystem -- can you explain why you think this kind of
           | collaboration would be impossible? Is it a technical
           | roadblock, a philosophical difference of opinion, a business
           | decision, etc?
        
             | PaulHoule wrote:
             | My guess: if the bitstream format was documented
             | competitors would know how the device works and be able to
             | prove their patents are being violated.
             | 
             | FPGA vendors will also justify inertia in that current FPGA
             | users don't seem to be deterred by the bad tools because of
             | the economics of their business.
             | 
             | Some think a lot of hobby users would try FPGA if the
             | toolset was easier to pick up but there are not enough of
             | those folks to keep Radio Shack or even Fry's alive and
             | they will be buying $5-$150 parts, not the much more
             | powerful $10,000-$100,000 parts.
        
               | JoshTriplett wrote:
               | > My guess: if the bitstream format was documented
               | competitors would know how the device works and be able
               | to prove their patents are being violated.
               | 
               | This has been the persistent argument for many years from
               | companies who say they can't release Open Source graphics
               | drivers.
               | 
               | > FPGA vendors will also justify inertia in that current
               | FPGA users don't seem to be deterred by the bad tools
               | because of the economics of their business.
               | 
               | Want hundreds of times as many FPGA users? Make it easy
               | for an FPGA to be used for transparent acceleration, by
               | making it easy for Open Source libraries to build and
               | ship FPGA bitstreams that serve as accelerators for their
               | data handling. Imagine if compression libraries,
               | databases, and many other kinds of libraries could
               | transparently take advantage of an FPGA if available to
               | process data many times faster. Then there'd be a benefit
               | to shipping an FPGA in many servers, and many client
               | systems as well.
        
               | MrCottenBall wrote:
               | I don't think that even with accessible tools there is
               | going to be hundreds of times more users. At the end of
               | the day, they are niche products with niche uses. The
               | average random person isn't going out of their way to
               | make accelerators or some kind of pipeline tool or
               | circuit to do something random that couldn't just easily
               | be achieved with a micro controller or Arduino.
               | 
               | Those that are willing to go out of their way to design a
               | custom circuit or something else on an FPGA are in my
               | opinion the type already dedicated enough or driven
               | enough to not be deterred by crappy tools.
               | 
               | The work you do on FPGAs is already a filter enough so
               | that I don't think anyone is getting past that and then
               | giving up because the tools suck.
        
               | UncleOxidant wrote:
               | > The work you do on FPGAs is already a filter enough so
               | that I don't think anyone is getting past that and then
               | giving up because the tools suck.
               | 
               | About 10 years ago I was doing some FPGA development in a
               | startup. We were using Vivado. It seemed like we spent
               | about 30% of our time working around Vivado bugs. I come
               | from a hardware background originally and then got into
               | software development (EDA tools) later on. After the
               | startup gig ended I could've gone more in the direction
               | of FPGA development. I decided not to because the tools
               | suck and life is too short to deal with that day in and
               | day out. And it's not simply that the FPGA vendor tools
               | are some of the buggiest software known to humankind,
               | it's that the FPGA vendors don't care to make them
               | better.
        
               | nerpderp82 wrote:
               | I think you are looking at it from the wrong perspective.
               | 
               | It doesn't take 100x the devs to make FPGA compelling on
               | the desktop or the server. Just like bespoke accelerators
               | in Apple Silicon are used behind a library, so too will
               | the accelerators implemented via FPGAs. The program
               | itself can be copied a billion times.
               | 
               | Your argument can be made for GPUs as well, the users
               | (end users) aren't the ones writing the shaders, but GPUs
               | are used by hundreds of millions of people.
        
               | EMIRELADERO wrote:
               | > This has been the persistent argument for many years
               | from companies who say they can't release Open Source
               | graphics drivers.
               | 
               | What? How can any company claim that with the patent
               | thing at play? Wouldn't that just be admitting they're
               | violating patents, therefore making the closed-sourceness
               | reason moot in the first place?
               | 
               | Moreover, wouldn't any sufficiently-interested
               | patentholder just reverse-engineer the compiled binary
               | and arrive at the supposed infringement on their own?
        
               | mschuster91 wrote:
               | > Moreover, wouldn't any sufficiently-interested
               | patentholder just reverse-engineer the compiled binary
               | and arrive at the supposed infringement on their own?
               | 
               | It's a MAD (mutually assured destruction) situation. You
               | can rest assured that everyone knows about everyone
               | else's rotting corpses in the storage locker... the first
               | one to chicken out to the feds will get blasted to pieces
               | just like everyone else.
               | 
               | My personal opinion is that today's patent systems can go
               | and die in a fire for all I care, right after copyright.
        
               | aseipp wrote:
               | The use cases you write about are mostly constrained by
               | design, not software. Configuration of SRAM based FPGAs
               | is rather slow because it requires a scan chain to
               | program each logical element and shift config bits into,
               | and doing it faster requires even more circuitry. You
               | need to multiplex things onto the fabric in practice, you
               | can't "context switch" AKA temporally multiplex very
               | well, you have to spatially multiplex. But FPGAs are
               | already area intensive; a k-LUT needs 2^k SRAM bits for
               | the table, each bit being 6 transistors, on top of the
               | scan chain to program it, and the registers and latches
               | that go with the LUT in a typical logic element, and
               | routing crossbars, and so on. Assuming K=6 then a single
               | LUT would be like ~100x transistor overhead compared to a
               | CMOS NAND gate (not a 1-to-1 comparison, just a
               | ballpark). The SRAM requirements alone are problematic
               | because it scales far, far worse than logic. If you're
               | talking about a modern 7/5/3-nm wafer, area = money, and
               | that's a shitload of money. So, what part of the system
               | architecture do you even put the FPGA on? In the core
               | complex? It can't be too big, then; your users are 99%
               | better off with that area going to more cache and branch
               | prediction. Put it on an older process and stuff it in
               | the package? Packaging isn't free. Maybe just on the PCB?
               | "Bump in the wire" to the NIC, or RAM, or storage? That
               | limits the use, but an out-of-line design means there's
               | less bandwidth available as input/outputs are shared.
               | There are benefits and costs to them all and they all
               | change the use cases and interface a bit. Now keep in
               | mind you might have multiple parallel bitstreams you want
               | to run. All of these choices impact how the software can
               | interface with it and what capabilities it has.
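
A rough sketch of the LUT arithmetic above, in Python. It counts only the configuration SRAM (2^k bits at 6 transistors each) and compares it with a 2-input static CMOS NAND gate, taken here as 4 transistors. The function names are made up for illustration. The scan chain, registers, latches and routing mentioned in the comment are deliberately left out, so this ratio is a lower bound on the ballpark rather than a measured figure.

```python
# Transistor budget of a k-input LUT's configuration SRAM versus one NAND gate.

NAND2_TRANSISTORS = 4  # assumption: 2-input static CMOS NAND gate


def lut_sram_bits(k: int) -> int:
    """A k-input LUT stores one output bit per input combination."""
    return 2 ** k


def lut_sram_transistors(k: int, transistors_per_bit: int = 6) -> int:
    """SRAM transistors for the truth table, assuming 6T cells."""
    return lut_sram_bits(k) * transistors_per_bit


def overhead_vs_nand(k: int) -> float:
    """Ratio of LUT SRAM transistors to a single NAND2 gate."""
    return lut_sram_transistors(k) / NAND2_TRANSISTORS


if __name__ == "__main__":
    for k in (4, 5, 6):
        print(k, lut_sram_bits(k), lut_sram_transistors(k), overhead_vs_nand(k))
```

For k=6 this gives 64 bits and 384 transistors, a ratio of 96, which lines up with the "~100x" ballpark before any of the omitted circuitry is counted.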
               | 
               | Example: Modern DDR5 has something like 64GB of bandwidth
               | per channel; assuming your design is inline on the bus
               | running at something like 500MHz, you'd need a 128-bit
               | bus, per channel. That clock rate might require deep
               | pipelining, further increasing area requirements, so you
               | can't fit as much other stuff. Otherwise, you need a
               | wider bus and to go slower, but wider buses often scale
               | sub-linearly in terms of area and routing congestion; a
               | 256-bit bus will be more than twice as expensive and
               | difficult to route as a 128 bit one due to limited
               | routing tracks, etc. So maybe you can hit that target,
               | but then you're too routing congested, so you can't fit
               | as many channels as you want in. Ergo, you need
               | bigger/more FPGAs, or serious optimization and redesign.
               | There's no immediate win. You need to explore/napkin math
               | the design space to find the best solution on the pareto
               | frontier, typically. Or just buy a FPGA that's massive
               | overkill, AKA "buy a faster PC", the typical software
               | programmer's solution. But it really isn't plug and play
               | or anything close to that.
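
A back-of-the-envelope version of the bus-width estimate above, in Python. It assumes one transfer per clock cycle and no protocol overhead, and the helper name is hypothetical. One caveat about units: the 128-bit figure falls out when the "64" is read as gigabits per second; read as gigabytes per second, the same clock would need a bus eight times wider.

```python
# Bus width needed to carry a given bandwidth inline at a given fabric clock.

GBIT = 1e9        # bits per gigabit
CLOCK_HZ = 500e6  # fabric clock from the example: 500 MHz


def bus_width_bits(bandwidth_bits_per_s: float, clock_hz: float) -> float:
    """Bits that must move per cycle, assuming one transfer per clock."""
    return bandwidth_bits_per_s / clock_hz


# Reading "64" as gigabits per second.
width_if_gigabits = bus_width_bits(64 * GBIT, CLOCK_HZ)

# Reading "64" as gigabytes per second (8 bits per byte).
width_if_gigabytes = bus_width_bits(64 * 8 * GBIT, CLOCK_HZ)

if __name__ == "__main__":
    print(width_if_gigabits, width_if_gigabytes)
```

Halving the clock doubles the required width in either reading, which is the "wider bus, slower clock" trade-off the comment describes.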
               | 
               | It's similar to other niche things, like in-memory GPU
               | databases. They are not held back by CUDA being
               | proprietary. That fact does suck, but it's not really
               | relevant in the grand scheme. They are held back by
               | physical design dictating that parallel accelerators need
               | loads of fast memory to feed the execution units, fast
               | memory is super expensive and takes up a lot of space on
               | the PCB resulting in a physical upper bound on density,
               | and that the working set for such databases typically
               | grows much, much faster than the rate at which GPU memory
               | performance/price drops. Past the point of no return
               | (working set > VRAM), their advantages rapidly vanish.
               | Their limitations are in the design, not the software.
               | 
               | FPGAs taught me a lot about hardware/software design. I
               | really like them and want more people to use them. I'm
               | really excited there are fully FOSS flows, even if they
               | have giant limitations. But they are pretty niche and
               | have serious physical design factors to account for; I
               | say that as someone who contributes to, uses, and loves
               | the open-source tools for what they are, and even was
               | lucky enough to play with them for work.
        
               | EMIRELADERO wrote:
               | > My guess: if the bitstream format was documented
               | competitors would know how the device works and be able
               | to prove their patents are being violated.
               | 
               | Assuming the patentholder had sufficient and warranted
               | suspicions, wouldn't they initiate legal action and get
               | the actual source/hardware design files through discovery
               | anyway?
        
               | 127361 wrote:
               | There are existing reverse engineering efforts for AMD
               | Xilinx devices:
               | 
               | https://f4pga.org/
               | 
               | https://f4pga.readthedocs.io/projects/prjxray/en/latest/arch...
               | 
               | https://github.com/f4pga/prjxray
               | 
               | https://github.com/epfl-vlsc/bitfiltrator
        
               | UncleOxidant wrote:
               | > My guess: if the bitstream format was documented
               | competitors would know how the device works and be able
               | to prove their patents are being violated.
               | 
               | _Maybe_. There certainly is a lot of "secret sauce"
               | energy around the bitstream formats. Primarily I think
               | they guard the bitstream format to help ensure vendor
               | lockin. Imagine if there were open tools that could
               | easily target FPGAs from multiple vendors so that users
               | could choose the most cost effective solution. The FPGA
               | vendors don't want that.
        
               | someguydave wrote:
               | Yep, FPGAs would become cheap commodities like DRAM
        
       | johnea wrote:
       | Devicetree overlays and configfs have supported this in the Xilinx
       | kernels for some time.
       | 
       | I guess what they're talking about is upstreaming all of this...
        
         | monocasa wrote:
         | Do those existing kernel patches support dynamic devicetree
         | overlays in a running system?
        
       | synergy20 wrote:
       | https://www.kernel.org/doc/html/latest/driver-api/fpga/index...
       | is the mentioned linux fpga framework, it would be very useful if
       | a userspace API works.
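
For context on what the existing framework already exposes to user space, here is a minimal read-only sketch in Python. It assumes the `name` and `state` attributes under `/sys/class/fpga_manager/<fpga>/` described in the kernel's sysfs ABI documentation for the FPGA manager class. The helper name is made up, the sketch does not program anything, and it does not show the interface proposed in the article.

```python
from pathlib import Path


def read_fpga_managers(root: str = "/sys/class/fpga_manager") -> dict:
    """Return {manager: {"name": ..., "state": ...}} for each FPGA manager.

    Returns an empty dict when the class directory is absent, for example
    on a machine without an FPGA manager driver loaded.
    """
    base = Path(root)
    if not base.is_dir():
        return {}
    managers = {}
    for mgr in sorted(base.iterdir()):
        info = {}
        for attr in ("name", "state"):
            attr_file = mgr / attr
            # Missing attributes are reported as None rather than raising.
            info[attr] = attr_file.read_text().strip() if attr_file.is_file() else None
        managers[mgr.name] = info
    return managers
```

Calling `read_fpga_managers()` on a board with a loaded FPGA manager driver should list each manager with its reported state. The `root` parameter exists only so the function can be pointed at a test directory.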
        
       | spacedcowboy wrote:
       | I wrote an internal proposal up for Apple to do this, about 17
       | years ago, while working on Aperture. Didn't get very far :(
        
         | xattt wrote:
         | Was there any uptake when the Afterburner card came around?
        
         | ApolIllo wrote:
         | I loved Aperture!
        
       | tuckerpo wrote:
       | On-the-fly bitstream programming of FPGAs from a parallel Linux
       | board used to mean holding the FPGA in reset and bit-banging the
       | entire bitstream via the FPGA's passive or active serial
       | interface, which is a blocking process that takes quite a bit of
       | time.
       | 
       | Be very nifty if this takes off.
        
       | aseipp wrote:
       | Yeah, the existing FPGA device subsystem has been pretty much
       | worthless for upstream kernels for years now and the lack of a
       | good userspace API is a big reason, so in practice everyone just
       | bitbangs the bitstream over some QSPI interface and pulls the
       | reset manually. Hopefully this can see the light of day at some
       | point.
        
       | andrewstuart wrote:
       | What are the facts about how long AMD provides drivers for?
       | 
       | Nvidia provide drivers for what feels like a VERY long time.
       | 
       | I have the feeling that AMD gives up on providing drivers after
       | only a few years of product life.
       | 
       | But these are only hunches, anyone know the facts?
        
         | bryanlarsen wrote:
         | Still on the front page:
         | https://news.ycombinator.com/item?id=38868265
        
           | andrewstuart wrote:
           | Are you saying AMD provides drivers for 22 years just because
           | that's a headline on HN?
           | 
           | I don't believe that.
           | 
           | My question stands - anyone have FACTS?
        
             | bryanlarsen wrote:
             | I'm saying that you're posting your question on the wrong
             | page. Your question is off topic here.
        
               | andrewstuart wrote:
               | OH yes I see.
        
       | falserum wrote:
       | My current take is "program with if statements all the way
       | down"->CPU, "Multiply these humongous matrices multiple times per
       | second"->GPU.
       | 
       | What is FPGA for?
        
         | packetlost wrote:
         | If you're thinking in terms of instructions or mathematics
         | operations, you're thinking too high level for what an FPGA is
         | and does.
         | 
         | FPGAs are often times for prototyping hardware designs, but
         | also for bespoke parallelized or hardware operations that are
         | too low of yield to justify ASIC development.
        
         | icf80 wrote:
         | custom software in hardware, multiple times faster re-implement
         | your own GPU?
        
         | fayalalebrun wrote:
         | With FPGA, unlike with GPUs, you can achieve significant
         | speedup of algorithms where parallelization is very difficult.
         | This is thanks to a technique called pipelining, where you can
         | perform several steps of a sequential computation at the same
         | time.
         | 
         | An example of this is video decoding/encoding, which is
         | commonly implemented by dedicated hardware.
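
A toy model of the pipelining point above, in Python. It assumes every stage takes exactly one clock cycle and the pipeline never stalls, which real designs only approximate, and the function names are illustrative. Each item still passes through the stages in order, but different items occupy different stages at the same time.

```python
# Cycle counts for pushing a stream of items through a chain of stages.


def cycles_sequential(items: int, stages: int) -> int:
    """One item at a time: each item occupies all stages before the next starts."""
    return items * stages


def cycles_pipelined(items: int, stages: int) -> int:
    """Pipelined: fill the pipeline once, then one item completes per cycle."""
    if items == 0:
        return 0
    return stages + items - 1


if __name__ == "__main__":
    n, s = 1000, 5
    print(cycles_sequential(n, s), cycles_pipelined(n, s))
```

With 1000 items and 5 stages the model gives 5000 cycles unpipelined and 1004 pipelined, so throughput approaches one item per cycle even though each individual item still takes 5 cycles end to end.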
        
       | Lramseyer wrote:
       | > One of the major valid concerns that were raised with this
       | configfs interface was security as it opens up the interface to
       | users for modifying the live device tree.
       | 
       | This has always felt like a gaping security hole waiting to be
       | explored.
       | 
       | Modern, high end FPGAs have a feature known as Raw SerDes, which
       | in essence allows you to bypass a PCIe or Ethernet controller and
       | use those lanes (yes, PCIe lanes) to your heart's desire
       | ...provided you can design a working communication protocol.
       | Difficult, but not impossible by any means.
       | 
       | So if you wanted to, you could design your own PCIe controller
       | and give it whatever device ID, vendor ID, memory space, or
       | capability space you want! Normally these things are not writable
       | on a PCIe controller. But if you designed your own, you could
       | write them to whatever you want and spoof device types, memory
       | spaces, or driver bindings, and probably get yourself access to
       | memory you shouldn't be touching. While I don't know how the
       | linux kernel would handle these potentially out of spec
       | conditions, it never sat right with me from a security
       | standpoint.
        
         | mschuster91 wrote:
         | > But if you designed your own, you could write them to
         | whatever you want and spoof device types, memory spaces, or
         | driver bindings, and probably get yourself access to memory you
         | shouldn't be touching.
         | 
         | Not in a system with a properly configured IOMMU unit. That
         | stuff got some serious attention back in the old Thunderbolt 2
         | era, when people discovered that yes, it's PCIe under the hood
         | and yes, having no IOMMU protection yields an attacker an
         | instant-0wn.
        
         | adwn wrote:
         | The built-in Gigabit transceiver cores, which you'd have to use
         | for the PCIe protocol, are connected to very specific IO pins
         | on the FPGA. If the PCIe slots on your mainboard aren't already
         | routed to those pins, then the FPGA will never be able to
         | "bypass" the regular PCIe or Ethernet interfaces. Conversely,
         | if they _are_ connected to those pins, then the regular PCIe
         | and Ethernet interfaces won't be able to use that PCIe slot.
         | So no, your security concerns are unwarranted.
         | 
         | > _a feature known as Raw SerDes_
         | 
         | I have never heard anyone use the term "raw serdes" for hard
         | transceiver IP cores.
        
           | Lramseyer wrote:
           | You're attaching your design to the MAC layer inside the
           | FPGA, not to the IO pins, so it's the PIPE interface or
           | something similar that you would need to communicate with.
           | And yes, you can bypass the PCIe or Ethernet controller on
           | various models of FPGAs.
        
         | dooglius wrote:
         | FPGAs are no different than any other hardware in this regard;
         | in fact I suspect if you can hack the firmware on most pcie
         | cards you could do that stuff too (unless there is an IOMMU)
        
         | schmidtleonard wrote:
         | > Modern, high end FPGAs have a feature known as Raw SerDes
         | 
         | That's like saying "did you know that advanced microprocessors
         | have the capability to bypass I2C and output voltages directly
         | on the pins?!?!?!?"
         | 
         | First of all, it's backwards. The physical layer comes before
         | the protocols and is always there at the base. Second of all,
         | the world does not exclusively run on I2C. Some people want SPI
         | busses or to toggle transistors with GPIO. That's fine. Sure,
         | gate it behind different permissions, but don't just rant at
         | what you don't understand.
         | 
         | If you want a concrete example where serdes access is
         | important, look up JESD204b, but in general there are loads of
         | real-time systems or bespoke processing applications where it
         | makes sense to dispense with the complex and temperamental
         | packet-switched infrastructure in places where that complexity
         | and nondeterministic behavior is likely to cause more trouble
         | than good. There are also applications to backplane connections
         | (if you are encapsulating PCIe, you want to run slightly faster
         | than the PCIe so the PCIe can run at full bandwidth), even to
         | the development of next-gen PCIe itself. It's not magic, it is
         | not delivered by a stork, it needs to be prototyped, and that's
         | another thing FPGAs are used for.
        
       ___________________________________________________________________
       (page generated 2024-01-04 23:00 UTC)