[HN Gopher] AMD proposes an FPGA subsystem user-space interface ...
___________________________________________________________________
AMD proposes an FPGA subsystem user-space interface for Linux
Author : mikhael
Score : 106 points
Date : 2024-01-04 18:16 UTC (4 hours ago)
(HTM) web link (www.phoronix.com)
(TXT) w3m dump (www.phoronix.com)
| gen3 wrote:
| I think this would be a really cool feature. Imagine editing
| video and being able to add an updated h265 encoder
| buildbot wrote:
| There is something kinda like this already. It's much less
| integrated, but it works via v4l2:
| https://developer.ridgerun.com/wiki/index.php/V4L2_FPGA
| johnea wrote:
| I only know the Xilinx specifics, but even given the support
| for runtime updates to the FPGA logic that already exists in the
| Xilinx kernels, there are serious roadblocks to updating only
| portions of the FPGA array.
|
| The logic must be very carefully implemented to allow
| overwriting only specific regions of the FPGA.
|
| If you're re-flashing the entire logic array, it's pretty
| straightforward. If you want to leave some logic in and add
| additional functionality, this is much more difficult.
|
| This is really a limitation of the FPGA logic array
| implementation, not a Linux shortcoming...
| jeffreygoesto wrote:
| Partial reconfiguration has been possible for twenty-something
| years, but only a small percentage of problems really benefit
| from it, see e.g. https://www.fpgakey.com/tutorial/section742
| jauntywundrkind wrote:
| It'll be interesting to see if AMD can help unleash what Xilinx
| was. At some point, leaving FPGAs as cloistered inaccessible
| technology - even well supported cloistered technology - becomes
| a risk.
|
| There's been some incremental improvement with low level Linux
| support across the past year. Good to see, but so far, actually
| using FPGAs is still all Vivado & closed systems. I think there's
| so much possibility left on the table by not supporting openfpga
| alternatives, not embracing yosys/openpnr/openroad/&al.
| buildbot wrote:
| I hope AMD sees the light and helps F4PGA develop a more
| complete open source toolchain for their FPGAs
| (https://f4pga.org). With this subsystem and an open source
| compilation flow, FPGA experiments would be way easier.
| UncleOxidant wrote:
| Unfortunately this will never happen.
| hwbehrens wrote:
| I don't know much about the ins-and-outs of the FPGA
| ecosystem -- can you explain why you think this kind of
| collaboration would be impossible? Is it a technical
| roadblock, a philosophical difference of opinion, a business
| decision, etc?
| PaulHoule wrote:
| My guess: if the bitstream format was documented
| competitors would know how the device works and be able to
| prove their patents are being violated.
|
| FPGA vendors will also justify inertia in that current FPGA
| users don't seem to be deterred by the bad tools because of
| the economics of their business.
|
| Some think a lot of hobby users would try FPGA if the
| toolset was easier to pick up but there are not enough of
| those folks to keep Radio Shack or even Fry's alive and
| they will be buying $5-$150 parts, not the much more
| powerful $10,000-$100,000 parts.
| JoshTriplett wrote:
| > My guess: if the bitstream format was documented
| competitors would know how the device works and be able
| to prove their patents are being violated.
|
| This has been the persistent argument for many years from
| companies who say they can't release Open Source graphics
| drivers.
|
| > FPGA vendors will also justify inertia in that current
| FPGA users don't seem to be deterred by the bad tools
| because of the economics of their business.
|
| Want hundreds of times as many FPGA users? Make it easy
| for an FPGA to be used for transparent acceleration, by
| making it easy for Open Source libraries to build and
| ship FPGA bitstreams that serve as accelerators for their
| data handling. Imagine if compression libraries,
| databases, and many other kinds of libraries could
| transparently take advantage of an FPGA if available to
| process data many times faster. Then there'd be a benefit
| to shipping an FPGA in many servers, and many client
| systems as well.
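|
| A minimal sketch of what "transparently take advantage of an FPGA
| if available" could look like from a library's point of view. The
| device node name and the offload function are hypothetical; no real
| driver or bitstream is assumed, and the offload path is left
| unimplemented so the software fallback is what actually runs.

```python
import os
import zlib

# Hypothetical device node that an FPGA compression accelerator might expose.
ACCEL_DEVICE = "/dev/hypothetical-fpga-deflate0"


def accelerator_available(path: str = ACCEL_DEVICE) -> bool:
    """Return True if the (hypothetical) accelerator device node exists."""
    return os.path.exists(path)


def _compress_on_fpga(data: bytes, path: str) -> bytes:
    """Placeholder for a real offload path; nothing is implemented here."""
    raise NotImplementedError("FPGA offload is not implemented in this sketch")


def compress(data: bytes, device: str = ACCEL_DEVICE) -> bytes:
    """Compress data, preferring the accelerator and falling back to zlib."""
    if accelerator_available(device):
        try:
            return _compress_on_fpga(data, device)
        except (NotImplementedError, OSError):
            pass  # fall through to the software path
    return zlib.compress(data)
```

| Callers only ever see compress(); whether an accelerator was used
| is an implementation detail, which is the property the comment is
| asking for.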
| MrCottenBall wrote:
| I don't think that even with accessible tools there are
| going to be hundreds of times more users. At the end of
| the day, they are niche products with niche uses. The
| average random person isn't going out of their way to
| make accelerators or some kind of pipeline tool or
| circuit to do something random that couldn't just easily
| be achieved with a microcontroller or Arduino.
|
| Those that are willing to go out of their way to design a
| custom circuit or something else on an FPGA are in my
| opinion the type already dedicated enough or driven
| enough to not be deterred by crappy tools.
|
| The work you do on FPGAs is already a filter enough so
| that I don't think anyone is getting past that and then
| giving up because the tools suck.
| UncleOxidant wrote:
| > The work you do on FPGAs is already a filter enough so
| that I don't think anyone is getting past that and then
| giving up because the tools suck.
|
| About 10 years ago I was doing some FPGA development in a
| startup. We were using Vivado. It seemed like we spent
| about 30% of our time working around Vivado bugs. I come
| from a hardware background originally and then got into
| software development (EDA tools) later on. After the
| startup gig ended I could've gone more in the direction
| of FPGA development. I decided not to because the tools
| suck and life is too short to deal with that day in and
| day out. And it's not simply that the FPGA vendor tools
| are some of the buggiest software known to humankind,
| it's that the FPGA vendors don't care to make them
| better.
| nerpderp82 wrote:
| I think you are looking at it from the wrong perspective.
|
| It doesn't take 100x the devs to make FPGA compelling on
| the desktop or the server. Just like bespoke accelerators
| in Apple Silicon are used behind a library, so too will
| the accelerators implemented via FPGAs. The program
| itself can be copied a billion times.
|
| Your argument can be made for GPUs as well, the users
| (end users) aren't the ones writing the shaders, but GPUs
| are used by hundreds of millions of people.
| EMIRELADERO wrote:
| > This has been the persistent argument for many years
| from companies who say they can't release Open Source
| graphics drivers.
|
| What? How can any company claim that with the patent
| thing at play? Wouldn't that just be admitting they're
| violating patents, therefore making the closed-sourceness
| reason moot in the first place?
|
| Moreover, wouldn't any sufficiently-interested
| patentholder just reverse-engineer the compiled binary
| and arrive at the supposed infringement on their own?
| mschuster91 wrote:
| > Moreover, wouldn't any sufficiently-interested
| patentholder just reverse-engineer the compiled binary
| and arrive at the supposed infringement on their own?
|
| It's a MAD (mutually assured destruction) situation. You
| can rest assured that everyone knows about everyone
| else's rotting corpses in the storage locker... the first
| one to chicken out to the feds will get blasted to pieces
| just like everyone else.
|
| My personal opinion is that today's patent systems can go
| and die in a fire for all I care, right after copyright.
| aseipp wrote:
| The use cases you write about are mostly constrained by
| design, not software. Configuration of SRAM based FPGAs
| is rather slow because it requires a scan chain to
| program each logical element and shift config bits into,
| and doing it faster requires even more circuitry. You
| need to multiplex things onto the fabric in practice, you
| can't "context switch" AKA temporally multiplex very
| well, you have to spatially multiplex. But FPGAs are
| already area intensive; a k-LUT needs 2^k SRAM bits for
| the table, each bit being 6 transistors, on top of the
| scan chain to program it, and the registers and latches
| that go with the LUT in a typical logic element, and
| routing crossbars, and so on. Assuming K=6 then a single
| LUT would be like ~100x transistor overhead compared to a
| CMOS NAND gate (not a 1-to-1 comparison, just a
| ballpark). The SRAM requirements alone are problematic
| because it scales far, far worse than logic. If you're
| talking about a modern 7/5/3-nm wafer, area = money, and
| that's a shitload of money. So, what part of the system
| architecture do you even put the FPGA on? In the core
| complex? It can't be too big, then; your users are 99%
| better off with that area going to more cache and branch
| prediction. Put it on an older process and stuff it in
| the package? Packaging isn't free. Maybe just on the PCB?
| "Bump in the wire" to the NIC, or RAM, or storage? That
| limits the use, but an out-of-line design means there's
| less bandwidth available as input/outputs are shared.
| There are benefits and costs to them all and they all
| change the use cases and interface a bit. Now keep in
| mind you might have multiple parallel bitstreams you want
| to run. All of these choices impact how the software can
| interface with it and what capabilities it has.
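|
| The "~100x" ballpark above can be reproduced from the numbers in
| the comment (2^k SRAM bits, 6 transistors per bit, k=6). The only
| added assumption is that a static CMOS 2-input NAND uses 4
| transistors; the scan chain, registers, and routing are left out,
| so this counts the truth table alone.

```python
def lut_table_transistors(k: int, transistors_per_bit: int = 6) -> int:
    """Transistors in the LUT truth table alone: 2**k SRAM bits, 6T each."""
    return (2 ** k) * transistors_per_bit


# Assumption: static CMOS 2-input NAND = 2 PMOS + 2 NMOS transistors.
NAND2_TRANSISTORS = 4

table = lut_table_transistors(6)
ratio = table / NAND2_TRANSISTORS
print(table, ratio)  # 384 96.0
```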
|
| Example: Modern DDR5 has something like 64GB of bandwidth
| per channel; assuming your design is inline on the bus
| running at something like 500MHz, you'd need a 128-bit
| bus, per channel. That clock rate might require deep
| pipelining, further increasing area requirements, so you
| can't fit as much other stuff. Otherwise, you need a
| wider bus and to go slower, but wider buses often scale
| super-linearly in terms of area and routing congestion; a
| 256-bit bus will be more than twice as expensive and
| difficult to route as a 128 bit one due to limited
| routing tracks, etc. So maybe you can hit that target,
| but then you're too routing congested, so you can't fit
| as many channels as you want in. Ergo, you need
| bigger/more FPGAs, or serious optimization and redesign.
| There's no immediate win. You need to explore/napkin math
| the design space to find the best solution on the pareto
| frontier, typically. Or just buy a FPGA that's massive
| overkill, AKA "buy a faster PC", the typical software
| programmer's solution. But it really isn't plug and play
| or anything close to that.
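|
| The bus-width napkin math above can be checked directly. This
| sketch assumes one transfer per clock and no protocol overhead.
| The comment writes "64GB"; the 128-bit figure matches 64 Gbit/s
| (8 GB/s) at 500MHz, while reading the figure as 64 gigabytes per
| second would need a 1024-bit bus at the same clock.

```python
def bus_width_bits(bandwidth_bits_per_s: float, clock_hz: float) -> float:
    """Bus width needed to sustain a bandwidth at one transfer per clock."""
    return bandwidth_bits_per_s / clock_hz


CLOCK_HZ = 500e6  # 500 MHz fabric clock, as in the comment

width_if_gigabits = bus_width_bits(64e9, CLOCK_HZ)       # 64 Gbit/s
width_if_gigabytes = bus_width_bits(64e9 * 8, CLOCK_HZ)  # 64 GByte/s

print(width_if_gigabits)   # 128.0
print(width_if_gigabytes)  # 1024.0
```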
|
| It's similar to other niche things, like in-memory GPU
| databases. They are not held back by CUDA being
| proprietary. That fact does suck, but it's not really
| relevant in the grand scheme. They are held back by
| physical design dictating that parallel accelerators need
| loads of fast memory to feed the execution units, fast
| memory is super expensive and takes up a lot of space on
| the PCB resulting in a physical upper bound on density,
| and that the working set for such databases typically
| grows much, much faster than the rate at which GPU memory
| performance/price drops. Past the point of no return
| (working set > VRAM), their advantages rapidly vanish.
| Their limitations are in the design, not the software.
|
| FPGAs taught me a lot about hardware/software design. I
| really like them and want more people to use them. I'm
| really excited there are fully FOSS flows, even if they
| have giant limitations. But they are pretty niche and
| have serious physical design factors to account for; I
| say that as someone who contributes to, uses, and loves
| the open-source tools for what they are, and even was
| lucky enough to play with them for work.
| EMIRELADERO wrote:
| > My guess: if the bitstream format was documented
| competitors would know how the device works and be able
| to prove their patents are being violated.
|
| Assuming the patentholder had sufficient and warranted
| suspicions, wouldn't they initiate legal action and get
| the actual source/hardware design files through discovery
| anyway?
| 127361 wrote:
| There are existing reverse engineering efforts for AMD
| Xilinx devices:
|
| https://f4pga.org/
|
| https://f4pga.readthedocs.io/projects/prjxray/en/latest/arch...
|
| https://github.com/f4pga/prjxray
|
| https://github.com/epfl-vlsc/bitfiltrator
| UncleOxidant wrote:
| > My guess: if the bitstream format was documented
| competitors would know how the device works and be able
| to prove their patents are being violated.
|
| _Maybe_. There certainly is a lot of "secret sauce"
| energy around the bitstream formats. Primarily I think
| they guard the bitstream format to help ensure vendor
| lockin. Imagine if there were open tools that could
| easily target FPGAs from multiple vendors so that users
| could choose the most cost effective solution. The FPGA
| vendors don't want that.
| someguydave wrote:
| Yep, FPGAs would become cheap commodities like DRAM
| johnea wrote:
| Devicetree overlays and configfs have supported this in the Xilinx
| kernels for some time.
|
| I guess what they're talking about is upstreaming all of this...
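|
| For readers who have not seen it, a rough sketch of how the
| configfs overlay flow is typically driven from user space on
| vendor kernels. The mount point, the directory layout, and the
| "dtbo" attribute name are assumptions based on the out-of-tree
| configfs overlay patches; none of this is an upstream interface,
| and the helper name is hypothetical.

```python
import os

# Assumed location of the out-of-tree configfs overlay interface.
OVERLAY_ROOT = "/sys/kernel/config/device-tree/overlays"


def apply_overlay(name: str, dtbo_path: str, root: str = OVERLAY_ROOT) -> str:
    """Create an overlay directory and write a compiled .dtbo blob into it.

    On a kernel carrying the configfs overlay patches, creating the
    directory makes the kernel populate its attributes, and writing the
    blob to the 'dtbo' attribute applies the overlay (assumed behavior).
    """
    overlay_dir = os.path.join(root, name)
    os.makedirs(overlay_dir, exist_ok=True)
    with open(dtbo_path, "rb") as src:
        blob = src.read()
    with open(os.path.join(overlay_dir, "dtbo"), "wb") as dst:
        dst.write(blob)
    return overlay_dir
```

| This needs root, and it is exactly the kind of live device tree
| modification that raises the security questions discussed further
| down the thread.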
| monocasa wrote:
| Do those existing kernel patches support dynamic devicetree
| overlays in a running system?
| synergy20 wrote:
| https://www.kernel.org/doc/html/latest/driver-api/fpga/index...
| is the mentioned linux fpga framework, it would be very useful if
| a userspace API works.
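|
| What the existing framework exposes to user space today is mostly
| read-only status. A small sketch that lists FPGA managers and
| reads their "name" and "state" sysfs attributes; the class
| directory is taken as a parameter so the function can be pointed
| at a test tree, and the function name is made up for this example.

```python
import os

FPGA_MGR_CLASS = "/sys/class/fpga_manager"


def read_fpga_managers(class_dir: str = FPGA_MGR_CLASS) -> dict:
    """Map each manager entry (e.g. 'fpga0') to its name and state."""
    result = {}
    if not os.path.isdir(class_dir):
        return result  # no FPGA manager class on this system
    for entry in sorted(os.listdir(class_dir)):
        info = {}
        for attr in ("name", "state"):
            try:
                with open(os.path.join(class_dir, entry, attr)) as f:
                    info[attr] = f.read().strip()
            except OSError:
                info[attr] = None  # attribute missing or unreadable
        result[entry] = info
    return result
```

| Loading a bitstream is the part that has no upstream user-space
| path, which is the gap the proposal is about.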
| spacedcowboy wrote:
| I wrote an internal proposal up for Apple to do this, about 17
| years ago, while working on Aperture. Didn't get very far :(
| xattt wrote:
| Was there any uptake when the Afterburner card came around?
| ApolIllo wrote:
| I loved Aperture!
| tuckerpo wrote:
| On-the-fly bitstream programming of FPGAs from a parallel Linux
| board used to mean holding the FPGA in reset and bit-banging the
| entire bitstream via the FPGA's passive or active serial
| interface, which is a blocking process that takes quite a bit of
| time.
|
| Be very nifty if this takes off.
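|
| To make the "blocking and slow" point concrete, here is a sketch
| of the inner loop of a bit-banged serial configuration. The two
| callbacks stand in for GPIO writes and are hypothetical, and
| LSB-first bit order is an assumption; the order and the
| reset/handshake sequence depend on the device's configuration
| guide.

```python
from typing import Callable, Iterator


def bits_lsb_first(bitstream: bytes) -> Iterator[int]:
    """Yield every bit of the bitstream, least significant bit first."""
    for byte in bitstream:
        for i in range(8):
            yield (byte >> i) & 1


def bitbang(bitstream: bytes,
            set_data: Callable[[int], None],
            pulse_clock: Callable[[], None]) -> int:
    """Shift the bitstream out one bit per clock pulse; return bits sent."""
    sent = 0
    for bit in bits_lsb_first(bitstream):
        set_data(bit)   # drive the data pin
        pulse_clock()   # one full clock cycle per bit
        sent += 1
    return sent
```

| Every bit costs at least two GPIO writes and the CPU is busy for
| the whole transfer, so a bitstream of tens of megabits takes a
| noticeable amount of time.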
| aseipp wrote:
| Yeah, the existing FPGA device subsystem has been pretty much
| worthless for upstream kernels for years now and the lack of a
| good userspace API is a big reason, so in practice everyone just
| bitbangs the bitstream over some QSPI interface and pulls the
| reset manually. Hopefully this can see the light of day at some
| point.
| andrewstuart wrote:
| What are the facts about how long AMD provides drivers for?
|
| Nvidia provide drivers for what feels like a VERY long time.
|
| I have the feeling that AMD gives up on providing drivers after
| only a few years of product life.
|
| But these are only hunches, anyone know the facts?
| bryanlarsen wrote:
| Still on the front page:
| https://news.ycombinator.com/item?id=38868265
| andrewstuart wrote:
| Are you saying AMD provides drivers for 22 years just because
| that's a headline on HN?
|
| I don't believe that.
|
| My question stands - anyone have FACTS?
| bryanlarsen wrote:
| I'm saying that you're posting your question on the wrong
| page. Your question is off topic here.
| andrewstuart wrote:
| OH yes I see.
| falserum wrote:
| My current take is "program with if statements all the way
| down"->CPU, "Multiply these humongous matrices multiple times per
| second"->GPU.
|
| What is FPGA for?
| packetlost wrote:
| If you're thinking in terms of instructions or mathematics
| operations, you're thinking too high level for what an FPGA is
| and does.
|
| FPGAs are often used for prototyping hardware designs, but
| also for bespoke parallelized or hardware operations that are
| too low-yield to justify ASIC development.
| icf80 wrote:
| custom software in hardware, multiple times faster.
| re-implement your own GPU?
| fayalalebrun wrote:
| With FPGA, unlike with GPUs, you can achieve significant
| speedup of algorithms where parallelization is very difficult.
| This is thanks to a technique called pipelining, where you can
| perform several steps of a sequential computation at the same
| time.
|
| An example of this is video decoding/encoding, which is
| commonly implemented by dedicated hardware.
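|
| A toy cycle-level model of the pipelining idea described above:
| each item still passes through every stage in order, but on any
| given cycle each stage is busy with a different item. The stage
| functions and names are illustrative only, and at least one stage
| is assumed.

```python
_EMPTY = object()  # marks a pipeline slot with no item in it


def sequential_cycles(items: int, stages: int) -> int:
    """Cycles if each item must finish all stages before the next starts."""
    return items * stages


def pipelined_cycles(items: int, stages: int) -> int:
    """Cycles with one item entering per cycle: fill latency plus the rest."""
    return 0 if items == 0 else stages + items - 1


def simulate_pipeline(inputs, stage_fns):
    """Run inputs through the stages, one stage step per item per cycle."""
    queue = list(inputs)
    slots = [_EMPTY] * len(stage_fns)  # slots[i]: input waiting at stage i
    outputs, cycles = [], 0
    while queue or any(s is not _EMPTY for s in slots):
        if slots[0] is _EMPTY and queue:
            slots[0] = queue.pop(0)  # a new item enters the first stage
        # All stages work in the same cycle, each on its own item.
        results = [fn(s) if s is not _EMPTY else _EMPTY
                   for fn, s in zip(stage_fns, slots)]
        if results[-1] is not _EMPTY:
            outputs.append(results[-1])  # item leaves the last stage
        slots = [_EMPTY] + results[:-1]  # everything else moves one stage on
        cycles += 1
    return outputs, cycles
```

| With three stages and four items, simulate_pipeline finishes in 6
| cycles where the one-at-a-time schedule needs 12, and the gap
| grows with the number of items.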
| Lramseyer wrote:
| > One of the major valid concerns that were raised with this
| configfs interface was security as it opens up the interface to
| users for modifying the live device tree.
|
| This has always felt like a gaping security hole waiting to be
| explored.
|
| Modern, high end FPGAs have a feature known as Raw SerDes, which
| in essence allows you to bypass a PCIe or Ethernet controller and
| use those lanes (yes, PCIe lanes) to your heart's desire
| ...provided you can design a working communication protocol.
| Difficult, but not impossible by any means.
|
| So if you wanted to, you could design your own PCIe controller
| and give it whatever device ID, vendor ID, memory space, or
| capability space you want! Normally these things are not writable
| on a PCIe controller. But if you designed your own, you could
| write them to whatever you want and spoof device types, memory
| spaces, or driver bindings, and probably get yourself access to
| memory you shouldn't be touching. While I don't know how the
| linux kernel would handle these potentially out of spec
| conditions, it never sat right with me from a security
| standpoint.
| mschuster91 wrote:
| > But if you designed your own, you could write them to
| whatever you want and spoof device types, memory spaces, or
| driver bindings, and probably get yourself access to memory you
| shouldn't be touching.
|
| Not in a system with a properly configured IOMMU. That
| stuff got some serious attention back in the old Thunderbolt 2
| era, when people discovered that yes, it's PCIe under the hood
| and yes, having no IOMMU protection yields an attacker an
| instant-0wn.
| adwn wrote:
| The built-in Gigabit transceiver cores, which you'd have to use
| for the PCIe protocol, are connected to very specific IO pins
| on the FPGA. If the PCIe slots on your mainboard aren't already
| routed to those pins, then the FPGA will never be able to
| "bypass" the regular PCIe or Ethernet interfaces. Conversely,
| if they _are_ connected to those pins, then the regular PCIe
| and Ethernet interfaces won't be able to use that PCIe slot.
| So no, your security concerns are unwarranted.
|
| > _a feature known as Raw SerDes_
|
| I have never heard anyone use the term "raw serdes" for hard
| transceiver IP cores.
| Lramseyer wrote:
| You're attaching your design to the MAC layer inside the
| FPGA, not to the IO pins, so it's the PIPE interface or
| something similar that you would need to communicate with.
| And yes, you can bypass the PCIe or Ethernet controller on
| various models of FPGAs.
| dooglius wrote:
| FPGAs are no different than any other hardware in this regard;
| in fact I suspect if you can hack the firmware on most pcie
| cards you could do that stuff too (unless there is an IOMMU)
| schmidtleonard wrote:
| > Modern, high end FPGAs have a feature known as Raw SerDes
|
| That's like saying "did you know that advanced microprocessors
| have the capability to bypass I2C and output voltages directly
| on the pins?!?!?!?"
|
| First of all, it's backwards. The physical layer comes before
| the protocols and is always there at the base. Second of all,
| the world does not exclusively run on I2C. Some people want SPI
| busses or to toggle transistors with GPIO. That's fine. Sure,
| gate it behind different permissions, but don't just rant at
| what you don't understand.
|
| If you want a concrete example where serdes access is
| important, look up JESD204B, but in general there are loads of
| real-time systems or bespoke processing applications where it
| makes sense to dispense with the complex and temperamental
| packet-switched infrastructure in places where that complexity
| and nondeterministic behavior is likely to cause more trouble
| than good. There are also applications to backplane connections
| (if you are encapsulating PCIe, you want to run slightly faster
| than the PCIe so the PCIe can run at full bandwidth), even to
| the development of next-gen PCIe itself. It's not magic, it is
| not delivered by a stork, it needs to be prototyped, and that's
| another thing FPGAs are used for.
___________________________________________________________________
(page generated 2024-01-04 23:00 UTC)