[HN Gopher] FuryGpu - Custom PCIe FPGA GPU
       ___________________________________________________________________
        
       FuryGpu - Custom PCIe FPGA GPU
        
       Author : argulane
       Score  : 303 points
       Date   : 2024-03-27 08:37 UTC (14 hours ago)
        
 (HTM) web link (www.furygpu.com)
 (TXT) w3m dump (www.furygpu.com)
        
       | snvzz wrote:
       | Pipeline seems retro, but far better than nothing.
       | 
       | There's no open hardware GPU to speak of. Depending on license
       | (can't find information?), this could be the first, and a
       | starting point for more.
        
         | crote wrote:
         | It all depends on your definition of "open", of course. As far
         | as I know there is no open-source toolchain for any remotely-
         | recent FPGA, so you're still stuck with proprietary (paid?)
         | tooling to actually modify it. You're pretty much out of luck
         | if you need more than an iCE40 UP5k.
        
           | rwmj wrote:
           | At least some Xilinx 7-series FPGAs have been reverse
           | engineered: https://yosyshq.readthedocs.io/projects/yosys/en/
           | latest/cmd/...
        
             | robinsonb5 wrote:
             | There's been some interesting recent work to get the QMTech
             | Kintex7-325 board (among others) supported under
             | yosys/nextpnr - https://github.com/openXC7 It works well
             | enough now to build a RISC-V SoC capable of running Linux.
        
           | snvzz wrote:
           | >You're pretty much out of luck if you need more than an
           | iCE40 UP5k.
           | 
           | Lattice ECP5 (which goes up to 85k LUT or so?) and Nexus have
           | more than decent support.
           | 
           | Gowin FPGAs are supported via project apicula up to 20k LUT
           | models. Some new models go above 200k LUT so there's hope
           | there.
        
             | robinsonb5 wrote:
             | Yeah I've used yosys / nextpnr on an ECP5-85 with great
             | results - it's pretty mature and dependable now.
        
           | bajsejohannes wrote:
           | The up-and-coming GateMate seems interesting to me. They are
           | leaning heavily on open source tooling.
           | 
           | chip: https://colognechip.com/programmable-logic/gatemate/
           | board: https://www.olimex.com/Products/FPGA/GateMate/GateMate
           | A1-EVB...
        
         | monocasa wrote:
         | > There's no open hardware GPU to speak of. Depending on
         | license (can't find information?), this could be the first, and
         | a starting point for more.
         | 
         | There's this which is about the same kind of GPU
         | 
         | https://github.com/asicguy/gplgpu
        
         | mips_r4300i wrote:
         | Ticket2Ride Number9 is a fixed function GPU from the late 90s
         | that was completely open sourced under GPL
        
         | Hazematman wrote:
         | There's also Nyuzi which is more GPGPU focused
         | https://github.com/jbush001/NyuziProcessor, but the author also
         | experimented with having it do 3D graphics.
        
       | jamesu wrote:
       | Similarly there is this: https://github.com/ToNi3141/Rasterix
       | 
       | Would be neat if someone made an FPGA GPU which had a shader
       | pipeline honestly.
        
         | actionfromafar wrote:
         | How good would a Ryzen with 32 cores be if it did just
         | graphics?
        
           | tux3 wrote:
           | You can run Crysis in software rendering on a high core count
           | AMD CPU.
           | 
           | It's a terrible use of the hardware and the performance is far
           | from stellar, but you _can_!
        
           | immibis wrote:
           | Wasn't Intel Larrabee something like that? Get a bunch of
           | dumb x86 cores together and tell them to do graphics?
        
             | actionfromafar wrote:
             | I'm so sad Larrabee or similar things never took off. No,
             | it might not have benchmarked well against contemporary
             | graphics cards, but I think these matrixes of x86 cores
             | could have come to great use for cool things not
             | necessarily related to graphics.
        
               | fancyfredbot wrote:
               | Intel launched Larrabee as Xeon Phi for non-graphics
               | purposes. Turns out it wasn't especially good at those
               | either. You can still pick one up on eBay today for not
               | very much.
        
               | Y_Y wrote:
               | The novelty of sshing into a PCI card is nice though. I
               | remember trying to use them on an HPC cluster: all the
               | convenience of wrangling GPUs but at a fraction of the
               | performance
        
               | actionfromafar wrote:
               | That's where we have to agree to (potentially) disagree.
               | I lament that these _or similar_ designs didn't last
               | longer in the market, so people could learn how to
               | harness them.
               | 
               | Imagine for instance hard real time tasks, each one task
               | running on its own separate core.
        
               | rjsw wrote:
               | I think Intel should have made more effort to get cheap
               | Larrabee dev boards onto the market, they could have been
               | using chips that didn't run at full speed or with too
               | many broken cores to sell at full price.
        
               | bee_rider wrote:
               | Probably not aided by the fact that conventional Xeon
               | core counts were sneaking up on them--not quite caught
               | up, but anybody could see the trajectory--and offered a
               | much more familiar environment.
        
               | actionfromafar wrote:
               | Yes, I agree. Still unfortunate. I think the concept was
               | very promising. But Intel had no appetite for burning
               | money on it to see where it would go in the long run.
        
             | erik wrote:
             | Larrabee was mostly x86 cores, but it did have
             | sampling/texturing hardware because it's way more efficient
             | to do those particular things in the 3d pipeline with
             | dedicated hardware.
        
           | __alexs wrote:
           | 15 fps on an oldish Epyc 64 core
           | https://www.youtube.com/watch?v=2tn0bZcQf0E
        
         | danbruc wrote:
         | If you are going to that effort, you might also want a decent
         | resolution. Say we aim for one megapixel (720p) and 30 frames
         | per second, then we have to calculate 27.7 megapixel per
         | second. If you get your FPGA to run at 500 MHz, that gives you
         | 18 clock cycles per pixel. So you would probably want something
         | like 100 cores keeping in mind that we also have to run vertex
         | shaders. We also need quick access to a sizable amount of
         | memory and I am not sure if one can get away with integer or
         | fixed-point arithmetic, or whether floating-point
         | arithmetic is pretty much necessary. Another complication that
         | I would expect is that it is probably much easier to build a
         | long execution pipeline if you are implementing a fixed
         | function pipeline as compared to a programmable processor.
         | Things like out-of-order execution are probably best off-loaded
         | to the compiler in order to keep the design simpler and more
         | compact.
         | 
         | So my guess is that it would be quite challenging to implement
         | a modern GPU in an affordable FPGA if you want more than a
         | proof of concept.
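
The cycle budget in the comment above can be checked with a few lines of arithmetic. This is only a sketch: the function name `cycles_per_pixel` is made up for illustration, and it assumes one shaded sample per output pixel with no overdraw, which a real renderer would not get for free.

```python
def cycles_per_pixel(width, height, fps, clock_hz):
    """Clock cycles available per output pixel for a single core.

    Assumes every pixel is shaded exactly once per frame (no overdraw,
    no vertex work), so this is an optimistic upper bound.
    """
    pixels_per_second = width * height * fps
    return clock_hz / pixels_per_second


# Figures taken from the comment: 720p, 30 frames per second, 500 MHz.
pixel_rate = 1280 * 720 * 30
budget = cycles_per_pixel(1280, 720, 30, 500e6)
print(f"{pixel_rate / 1e6:.1f} Mpixel/s, {budget:.1f} cycles per pixel")
```

Under those assumptions a single core gets roughly 18 cycles per pixel, which is why the comment goes on to suggest many cores working in parallel.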
        
           | d_tr wrote:
           | There's a new board by Trenz with a Versal chip which can do
           | 440 GFLOPS just with the DSP58 slices (the lowest speed
           | grade) and it costs under 1000 Euros, but you also need to
           | buy a Vivado license currently.
           | 
           | Cheaper boards are definitely possible since there are
           | smaller parts in that family, but they need to offer support
           | for some of them in the free version of Vivado...
        
           | PfhorSlayer wrote:
           | You've nailed the problem directly on the head. For hitting
           | 60Hz in FuryGpu, I actually render at 640x360 and then pixel-
           | double (well, pixel->quad) the output to the full 720p. Even
           | with my GPU cores running at 400MHz and the texture units at
           | 480MHz with fully fixed-function pipelines, it can still
           | struggle to keep up at times.
           | 
           | I do not doubt that a shader core could be built, but I have
           | reservations about the ability to run it fast enough or have
           | as many of them as would be needed to get similar performance
           | out of them. FuryGpu does its front-end (everything up
           | through primitive assembly) in full fp32. Because that's just
           | a simple fixed modelview-projection matrix transform it can
           | be done relatively quickly, but having every single
           | vertex/pixel able to run full fp32 shader instructions
           | requires the ability to cover instruction latency with
           | additional data sets - it gets complicated, _fast_!
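
As a rough illustration of the pixel->quad trick described above, here is a software sketch of nearest-neighbour 2x upscaling. The name `pixel_quad` and the list-of-rows frame layout are assumptions made for the example; it is not how FuryGpu implements the step in hardware.

```python
def pixel_quad(frame):
    """Upscale a frame 2x in each direction by repeating every pixel.

    `frame` is a list of rows, each row a list of pixel values. Every
    source pixel becomes a 2x2 block in the output.
    """
    out = []
    for row in frame:
        wide = [pixel for pixel in row for _ in range(2)]  # double horizontally
        out.append(wide)
        out.append(list(wide))  # repeat the widened row to double vertically
    return out


# A 640x360 render target becomes a 1280x720 output frame.
small = [[0] * 640 for _ in range(360)]
big = pixel_quad(small)
print(len(big[0]), "x", len(big))
```

Rendering at 640x360 means shading a quarter as many pixels as native 720p, which is where the headroom for 60Hz comes from.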
        
       | gchadwick wrote:
       | Cool! I found the hello blog here illuminating to understand the
       | creator's intentions: https://www.furygpu.com/blog/hello
       | 
       | As I read it, it's just a fun hobby project for them first and
       | foremost and looks like they're intending to write a whole bunch
       | more about how they built it.
       | 
       | It's certainly an impressive piece of work, in particular as
       | they've got the full stack working, a windows driver implementing
       | a custom graphics API and then quake running on top of that. A
       | shame they've not got some DX/GL support but I can certainly
       | understand why they went the custom API route.
       | 
       | I wonder if they'll open source the design?
        
         | PfhorSlayer wrote:
         | I'm in the process of actually trying to work out what would be
         | feasible performance-wise if I were to spend the considerable
         | effort to add the features required for base D3D support. It's
         | not looking good, unfortunately. Beyond just "shaders", there
         | are a _significant_ number of other requirements that even just
         | the OS's window manager needs to function _at all_. It's all
         | built up on 20+ years of evolving tech and for the normal
         | players in this space (AMD, Nvidia, Intel, Imagination, etc.)
         | it's always been an iterative process.
        
       | iAkashPaul wrote:
       | FPGAs for native FP4 will change the entire landscape
        
         | luma wrote:
         | How so?
        
           | iAkashPaul wrote:
           | Reduced memory requirements, dropping higher precision IP
           | blocks for starters
        
           | CamperBob2 wrote:
           | 4-bit values (or 6-bit values, nowadays) are
           | interesting because they're small enough to address a single
           | LUT, which is the lowest-level atomic element of an FPGA.
           | That gives them major advantages in the timing and resource-
           | usage departments.
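
To make the LUT point concrete: a 4-input LUT is essentially a 16-entry truth table indexed by its input bits, so any 1-bit function of a 4-bit value fits in one LUT. The sketch below models that in software. The names `make_lut4` and `lut4_eval` are made up for illustration and do not correspond to any vendor primitive.

```python
def make_lut4(func):
    """Pack a 4-input boolean function into a 16-bit truth table."""
    table = 0
    for addr in range(16):
        # Split the 4-bit address into individual input bits, LSB first.
        bits = [(addr >> i) & 1 for i in range(4)]
        if func(*bits):
            table |= 1 << addr
    return table


def lut4_eval(table, value):
    """Look up the output bit for a 4-bit input value."""
    return (table >> (value & 0xF)) & 1


# Example: 4-input parity (XOR of all four bits).
parity = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(parity), lut4_eval(parity, 0b0111))
```

The lookup is a single table access however complicated the function is, which is one way to read the timing advantage mentioned above.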
        
         | jsheard wrote:
         | Very briefly, until someone makes an ASIC that does the same
         | thing and FPGAs are relegated to niche use-cases once again.
         | 
         | FPGAs only make long-term sense in applications that are so
         | low-volume that it's not worth spinning an ASIC for them.
        
           | iAkashPaul wrote:
           | Absolutely
        
         | Y_Y wrote:
         | Four-bit floats are not as useful as Nvidia would have you
         | believe. Like structured sparsity it's mainly a trick to make
         | newer-gen cards look faster in the absence of an improvement in
         | the underlying tech. If you're using it for NN inference you
         | have to carefully tune the weights to get good accuracy and it
         | offers nothing over fixed-point.
        
         | blacklion wrote:
         | Entire landscape of open graphic chips?
         | 
         | Not every GPU should be used to train or infer so-called AI.
         | 
         | Please, stop, we need some hardware to put images on the
         | screens.
        
       | spuz wrote:
       | This looks like an incredible achievement. I'd love to see some
       | photos of the physical device. I'm also slightly confused about
       | which FPGA module is being used. The blog mentions the Xilinx
       | Kria SoMs but if you follow the links to the specs of those
       | modules, you see they have ARM SoCs rather than Xilinx FPGAs. The
       | whole world of FPGAs is pretty unfamiliar to me so maybe I'm
       | missing something.
       | 
       | https://www.amd.com/en/products/system-on-modules/kria/k26/k...
        
         | crote wrote:
         | > you see they have ARM SoCs rather than Xilinx FPGAs
         | 
         | It's a mixed chip: FPGA and traditional SoC glued together.
         | This means you don't have a softcore MCU taking up precious FPGA
         | resources just to do some basic management tasks.
        
           | spuz wrote:
           | Ah that makes sense. It's slightly ironic then that the ARM
           | SoC includes a Mali GPU which presumably easily outperforms
           | what can be achieved with the FPGA.
        
           | chrsw wrote:
           | I didn't see any mention of what the software on the Zynq's
           | ARM core is doing, which made me wonder why use Zynq at all.
        
             | PfhorSlayer wrote:
             | The hardened DisplayPort IP is connected to the ARM cores,
             | and requires a significant amount of configuration and
             | setup. FuryGpu's firmware primarily handles interfacing
             | with that block: setting up descriptor sets to DMA video
             | frame and audio data from memory (where the GPU has written
             | it for video, or where the host has DMA'd it for audio),
             | responding to requests to reconfigure things for different
             | resolutions, etc. There's also a small command processor
             | there that lets me do various things that building out
             | hardware for doesn't make sense - moving memory around with
             | the hardened DMA peripheral, setting up memory buffers used
             | internally by the GPU, etc. If I ever need to expose a VGA
             | interface in order to have motherboards treat this as a
             | primary graphics output device during boot, I'd also be
             | handling all of that in the firmware.
        
         | chiral-anomaly wrote:
         | Xilinx doesn't mention the exact FPGA p/n used in the Kria
         | SoMs. However according to their public specs they appear to
         | match [1] the ZU3EG-UBVA530-2L and ZU5EV-SFVC784-2L devices,
         | with the latter being the only one featuring PCIe support.
         | 
         | Designing and bringing-up the FPGA board as described in the
         | blog post is already a high bar to clear. I hope the author
         | will at some point publish schematics and sources.
         | 
         | [1] https://docs.amd.com/v/u/en-US/zynq-ultrascale-plus-
         | product-...
        
         | PfhorSlayer wrote:
         | You're in luck! https://imgur.com/a/BE0h9cZ
         | 
         | As mentioned in the rest of this thread, the Kria SoMs are FPGA
         | fabric with hardened ARM cores running the show. Beyond just
         | being what was available (for _oh so_ cheap, the Kria devboards
         | are like $350!), these devices also include things like
         | hardened DisplayPort IP attached to the ARM cores allowing me
         | to offload things like video output and audio to the firmware.
         | A previous version of this project was running on a Zynq 7020,
         | for which I needed to write my own HDMI stuff that, while not
         | super complicated, takes up a fair amount of logic and also
         | gets way more complex if it needs to be configurable.
        
       | wpwpwpw wrote:
       | Excellent job. Would be amazing if this became an open source
       | hardware project.
        
       | sylware wrote:
       | Hopefully their hardware programming model is going full hardware
       | circular command/interrupt buffers (even for GPU register
       | programming).
       | 
       | That is how it is done on AMD GPUs; that said, I have no idea what
       | the Nvidia hardware programming model is.
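
For readers unfamiliar with the idea, a circular command buffer can be sketched as a fixed array plus a write pointer owned by the producer (the driver) and a read pointer owned by the consumer (the GPU). This is a generic software model with made-up names (`CommandRing`, `submit`, `consume`). It does not describe FuryGpu's or AMD's actual register interface.

```python
class CommandRing:
    """Fixed-size ring buffer with separate read and write pointers."""

    def __init__(self, size):
        self.slots = [None] * size
        self.size = size
        self.wptr = 0  # advanced only by the producer (e.g. the driver)
        self.rptr = 0  # advanced only by the consumer (e.g. the GPU)

    def free_slots(self):
        # One slot is kept empty so that wptr == rptr always means "empty".
        return self.size - 1 - ((self.wptr - self.rptr) % self.size)

    def submit(self, cmd):
        """Queue a command; returns False if the ring is full."""
        if self.free_slots() == 0:
            return False
        self.slots[self.wptr] = cmd
        self.wptr = (self.wptr + 1) % self.size
        return True

    def consume(self):
        """Fetch the next command, or None if the ring is empty."""
        if self.rptr == self.wptr:
            return None
        cmd = self.slots[self.rptr]
        self.rptr = (self.rptr + 1) % self.size
        return cmd


ring = CommandRing(4)
accepted = [ring.submit(c) for c in ("clear", "draw", "present", "extra")]
print(accepted, ring.consume())
```

Because each side writes only its own pointer, producer and consumer can run concurrently without a lock, which is part of the appeal of the scheme.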
        
       | codedokode wrote:
       | "UltraScale" in name assumes ultra price? FPGAs seem to be an
       | expensive toy.
        
         | mattalex wrote:
         | Not in the grand scheme of things: you can get fpga dev boards
         | for $50 that are already useable for this type of thing (you
         | can go even lower, but those aren't really useable for "CPU
         | like" operation and are closer to "a whole lot of logic gates
         | in a single chip"). Of course the "industry grade" solutions
         | pack significantly more of a punch, but they can also be had
         | for <$500.
        
         | varispeed wrote:
         | Ages ago I bought TinyFPGA, which is like £40 and I was able
         | to synthesize RISC-V cpu on it. It was fun.
        
         | nxobject wrote:
         | It's worth mentioning that it's easy enough to find absurdly
         | cheap (~$20) early-generation dev boards for Zynq FPGAs with
         | embedded ARM cores on Aliexpress, shucked from obsolete Bitcoin
         | miners [1]. Interfaces include SD, Ethernet, 3 banks of GPIO.
         | 
         | [1] https://github.com/xjtuecho/EBAZ4205
        
           | thrtythreeforty wrote:
           | Zynq is deeply annoying to work with, though. Unfortunately
           | the hard ARM core bootloads the FPGA fabric, rather than the
           | other way around (or having the option to initialize both
           | separately). This means you have to muck with software on the
           | target to update FPGA bitstreams.
        
             | CamperBob2 wrote:
             | Isn't it mostly just boilerplate code that does the FPGA
             | configuration, though?
        
         | PfhorSlayer wrote:
         | In general, yes. _However_, the Kria series are amazingly good
         | deals for what you get - a quite powerful Zynq US+ part _and_ a
         | dev board for like $350.
        
       | MalphasWats wrote:
       | It's incredible how influential Ben Eater's breadboard computer
       | series has been in hobby electronics. I've been similarly
       | inspired to try to design my own "retro" CPU.
       | 
       | I desperately want something as easy to plug into things as the
       | 6502, but with _jussst_ a little more capability - few more
       | registers, hardware division, that sort of thing. It's a really
       | daunting task.
       | 
       | I always end up coming back to _just use an MCU and be done with
       | it_, and then I hit the How To Generate Graphics problem.
        
         | jsheard wrote:
         | I've been looking into graphics on MCUs and was disappointed to
         | learn that the little "NeoChrom" GPU they're putting on newer
         | STM32 parts is completely undocumented. Historically they have
         | been good about not putting black boxes in their chips, but I
         | guess it's probably an IP block they've licensed from a third
         | party.
        
           | gchadwick wrote:
           | The RP2040 is a great MCU for playing with graphics as it can
           | bit bang VGA and DVI/HDMI. There's some info on the DVI here:
           | https://github.com/Wren6991/PicoDVI
           | 
           | I wrote a couple of articles on how to do bit banged VGA on
           | the RP2040 from scratch:
           | https://gregchadwick.co.uk/blog/playing-with-the-pico-pt5/
           | and https://gregchadwick.co.uk/blog/playing-with-the-pico-
           | pt6/ plus an intro to PIO
           | https://gregchadwick.co.uk/blog/playing-with-the-pico-pt4/
        
             | jsheard wrote:
             | You can do something similar on STM32 parts that have an
             | LCD controller, which can be abused to drive a VGA DAC or a
             | DVI encoder chip. The LCD controller at least is fully
             | documented, but many of their parts pair that with a small
             | GPU, which _would_ be an advantage over the GPU-less
             | RP2040... if there were any public documentation at all for
             | the GPU :(
        
             | CarVac wrote:
             | I used "composite" (actually monochrome) video output
             | software someone wrote on the RP2040 for an optional
             | feature on the PhobGCC custom gamecube controller
             | motherboard to allow easy calibration, configuration, and
             | high-frequency input recording and graphing.
             | 
             | Pictures of the output here:
             | https://github.com/PhobGCC/PhobGCC-
             | doc/blob/main/For_Users/P...
        
           | unwind wrote:
           | Agreed. It is so, so, _so_ very disappointing. I was deeply
           | surprised (in a non-pleasant way) when I first opened up a
           | Reference Manual for one of those chips and saw that the GPU
           | chapter was, like, four pages. :(
        
             | nick__m wrote:
             | On the ST forum the company clearly said that they will
             | only release to some selected partners. That's sad.
        
         | verticalscaler wrote:
         | True, can't think of much else this popular.
         | 
         | He started posting videos again recently with some regularity
         | after a lull. Audience is in the low hundreds of thousands. I
         | assume fewer than 100k actually finish videos and fewer still
         | do anything with it.
         | 
         | Hobby electronics seems surprisingly small in this era.
        
           | hedora wrote:
           | I wonder if there's much overlap between people that watch
           | YouTube to get deep technical content (instead of reading),
           | and people that care about hobby electronics.
           | 
           | I'm having trouble wrapping my head around how / why you'd
           | use youtube to present analog electrical engineering formulas
           | and pin out diagrams instead of using latex or a diagram.
        
             | robinsonb5 wrote:
             | I consider YouTube (or rather, video in general) a
             | fantastic platform for showcasing something cool,
             | demonstrating what it can do, and even demonstrating how to
             | drive a piece of software - but for actual technical
             | learning I loathe the video format - it's so hard to skim,
             | re-read, pause, recap and digest at your own speed.
             | 
             | The best compromise seems to be webpages with readable
             | technical info and animated video illustrations - such as
             | the one posted here yesterday about how radio works.
        
             | jpc0 wrote:
             | For some things there is a lot of nuance lost in just
             | writing. The unknown unknowns.
             | 
             | There have been a lot of times where I am showing someone
             | new to my field something and they stop me before I get to
             | what I thought was the "educational" point and ask what I
             | just did.
             | 
             | Video can portray that pretty well because the information
             | is there for you to see; with a schematic or write-up, if
             | the author didn't put it there, the information isn't there.
        
           | TillE wrote:
           | Even if you're not much of a tinkerer, Ben Eater's videos are
           | massively helpful if you want to truly understand how
           | computers work. As long as you come in knowing the rudiments
           | of digital electronics, just watching his stuff is a whole
           | education in 8-bit computer design. You won't quite learn how
           | _modern_ computers work with their fancy caches and pipelines
           | and such, but it's a really strong foundation to build on.
           | 
           | I've built stuff with microcontrollers (partially aided by
           | techniques learned here), but that was very purpose-driven
           | and I'm not super interested in just messing around for fun.
        
         | MenhirMike wrote:
         | I was about to recommend the Parallax Propeller (the first one
         | that's available in DIP format), but arguably, that one is way
         | more complex to program for (and also significantly more
         | powerful, and at that point you might as well look into an
         | ESP32 and that is "just use an MCU" :))
         | 
         | And yeah, video output is a significant issue because of the
         | required bandwidth for digital outputs (unless you're okay with
         | composite or VGA outputs, I guess they can still be done with
         | readily available chips?). The recent Commander X16 settled for
         | an FPGA for this.
        
           | MalphasWats wrote:
           | I feel like the CX16 lost its way about a week after the
           | project started and it suddenly became an expensive FPGA-
           | based blob. But at the same time, I'm not sure what other
           | option there is for a project like that.
           | 
           | I always got the impression that David sort of got railroaded
           | by the other members of the team that wanted to keep adding
           | features and MOAR POWAH, and didn't have a huge amount of
           | choice because those features quickly scoped out of his own
           | areas of knowledge.
        
             | MenhirMike wrote:
             | I think so too - it must have been a great learning
             | experience for him though, but for me, the idea of "The
             | best C64-like computer that ever existed" died pretty
             | quickly.
             | 
             | He also did run into a similar problem that I ran into when
             | I tried something like that as well: Sound Chips. Building
             | a system around a Yamaha FM Synthesizer is perfect, but I
             | found as well that most of the chips out there are broken,
             | fake, or both and that no one else makes them anymore.
             | Which makes sense because if you want a sound chip in this
             | day, you use an AC97 or HD Audio codec and call it a day,
             | but that goes against that spirit.
             | 
             | I think that the spirit of hobby electronics is really
             | found in FPGAs these days instead of rarer and rarer DIP
             | parts. Which is a bit sad, but I guess that's just the
             | passage of time. I wonder if that's how some people felt in
             | the 70s when CPUs replaced many distinct layouts, or if
             | they rejoiced and embraced it instead.
             | 
             | I've given up trying to build a system on a breadboard and
             | think that MiSTer is the modern equivalent of that.
        
               | dragontamer wrote:
               | > I think that the spirit of hobby electronics is really
               | found in FPGAs these days instead of rarer and rarer DIP
               | parts. Which is a bit sad, but I guess that's just the
               | passage of time. I wonder if that's how some people felt
               | in the 70s when CPUs replaced many distinct layouts, or
               | if they rejoiced and embraced it instead.
               | 
               | Microcontrollers have taken over. When 8kB SRAM and 20MHz
               | microcontrollers exist below 50-cents and at minuscule
               | 25mm^2 chip sizes drawing only 500uA of current...
               | there's very little reason to use a collection of 30
               | chips to do equivalent functionality.
               | 
               | Except performance. If you need performance then bam,
               | FPGA land comes in and Zynq just has too much performance
               | at too low a cost (though not quite as low as the
               | microcontroller gang).
               | 
               | ----------
               | 
               | Hobby Electronics is great now. You have so many usable
               | parts at very low costs. A lot of problems are "solved"
               | yes, but that's a good thing. That means you can focus on
               | solving your hobby problem rather than trying to invent a
               | new display driver or something.
        
               | gnramires wrote:
               | Another advantage of hobby anything is that you can just
               | do, and reinvent whatever you want. Sure, fast CPUs/MCUs
               | exist now and can do whatever you want. But if you feel
               | like reinventing the wheel just for the sake of it, no
               | one will stop you![1]
               | 
               | I do think some people that remember fondly the _user
               | experience_ of those old machines might be better served
               | by using modern machines (like a raspberry pi or even a
               | standard pc) in a different way instead of trying to use
               | old hardware. That's from the good old Turing machine
               | universality (you can simulate practically any machine
               | you like using newer hardware, if what you're interested
               | in is software). You can even add artificial limitations
               | like PICO-8 or TIC-80 does.
               | 
               | See also uxn:
               | 
               | https://100r.co/site/uxn.html
               | 
               | and (WIP) picotron:
               | 
               | https://www.lexaloffle.com/picotron.php
               | 
               | I think there's a general concept here of making
               | 'Operating environments' that are pleasant to work within
               | (or have fun limitations), which I think are more
               | practical than a dedicated Operating System optionally
               | with a dedicated machine. Plus (unless you particularly
               | want to!) you don't need to worry about all the complex
               | parts of operating systems like network stacks, drivers
               | and such.
               | 
               | [1] Maybe we should call that Hobby universality (or
               | immortality?) :P If it's already been made/discovered,
               | you can always make it again just for fun.
        
             | rzzzt wrote:
             | The first choice was the Gameduino, also an FPGA-based
             | solution. I have misplaced my bookmark for the
             | documentation covering the previous hardware revision, but
             | current version 3X is MOAR POWAH just on its own, this
             | seems to be a natural tendency:
             | https://excamera.com/sphinx/gameduino3/index.html#about-
             | game...
             | 
             | Edit: found it!
             | https://excamera.com/sphinx/gameduino/index.html
        
               | erik wrote:
               | Modern retro computer designs run into the problem of
               | generating a video signal. Ideally you'd have a tile and
               | sprite based rendering. And you'd like to support HDMI or
               | at least VGA. But there are no modern parts that offer
               | this and building the functionality out of discrete
               | components is impractical and unwieldy.
               | 
               | An FPGA is really just the right tool for solving the
               | video problem. Or some projects do it with a micro-
               | controller. But it's sort of too bad as it kind of
               | undercuts the spirit of the whole design. If your video
               | processor is orders of magnitude more powerful than the
               | rest of the computer, then one starts to ask why not just
               | implement the entire computer inside the video processor?
        
               | MenhirMike wrote:
               | It's one of the funny things of the Raspberry Pi Pico W:
               | The Infineon CYW4343 has an integrated ARM Cortex-M3 CPU,
               | so the WiFi/BT chip is technically more advanced than the
               | actual RP2040 (which is a Cortex-M0+) and also has more
               | built-in ROM/RAM than what's on the Pico board for the
               | RP2040 to use.
               | 
               | And yeah, you can't really buy sprite-based video chips
               | anymore, and you don't even have to worry about stuff
               | like "Sprites per Scanline" because you can get a proper
               | framebuffer for essentially free - but now you might as
               | well go further and use one microprocessor to be the CPU,
               | GPU, and FM Synthesizer Sound Chip and "just" add the
               | logic to generate the actual video/audio signals.
        
         | PfhorSlayer wrote:
         | Funny enough, that's _exactly_ where this project started.
         | After I built his 8 bit breadboard computer, I started looking
         | into what might be involved in making something a bit more
         | interesting. Can't do a whole lot of high-speed anything with
         | discrete logic gates, so I figured learning what I could do
         | with an FPGA would be far more interesting.
        
         | bArray wrote:
         | Registers can be worked around by using the stack and/or
         | memory. Division could always be implemented as a simple
         | function. It's part of the fun of working at that level.
         | 
         | Regarding graphics, initially output serial. Abstract the
         | problem away until you are ready to deal with it. If you sneak
         | up on an Arduino and make it scream, you can make it into a
         | very basic VGA graphics card [1]. Even easier is ESP32 to VGA
         | (also gives keyboard and mouse) [2].
         | 
         | [1] https://www.instructables.com/Arduino-Basic-PC-With-VGA-
         | Outp...
         | 
         | [2] https://www.aliexpress.us/item/1005006222846299.html
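A minimal sketch of the "division as a simple function" idea from the comment above, assuming an 8-bit unsigned machine with no divide instruction. The function name and the `bits` parameter are illustrative, not taken from any project in this thread. It uses only shifts, compares, and subtracts, which is roughly what a small CPU would have available.

```python
def divmod_shift_subtract(dividend: int, divisor: int, bits: int = 8) -> tuple[int, int]:
    """Unsigned long division using only shifts, compares, and subtracts.

    Hypothetical helper: `bits` is the assumed register width.
    Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if not 0 <= dividend < (1 << bits) or divisor < 0:
        raise ValueError("operands must be unsigned and fit the register width")

    quotient = 0
    remainder = 0
    # Walk the dividend from its most significant bit to its least.
    for i in range(bits - 1, -1, -1):
        # Bring the next dividend bit down into the running remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # If the divisor fits, subtract it and record a 1 in the quotient.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

The loop always runs `bits` iterations, so on real hardware the cost is fixed and predictable, at the price of being slower than a hardware divider.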
        
       | bloatfish wrote:
       | This is insane! As a hobby hardware designer myself, I _can_
       | imagine how much work must have gone into reaching this stage.
       | Well done!
        
       | nxobject wrote:
       | I hope the author goes into some detail about how he implements
       | the PCIe interface! I doubt I'll ever do hardware work at that
       | level of sophistication, but for general cultural awareness I
       | think it's worth looking under the hood of PCIe.
        
         | gorkish wrote:
         | The FPGA he is using has native PCIe, so usually all you get on
         | this front is an interface to a vendor-proprietary IP block.
         | The state of open interfaces in FPGA land is abysmal. I think
         | the best I've seen fully open source is a gigabit MAC.
        
           | 0xcde4c3db wrote:
           | There is an open-source DisplayPort transmitter [1] that
           | apparently supports multiple 2.7 Gbps lanes (albeit using
           | family-specific SERDES/differential transceiver blocks, but I
           | doubt that's avoidable at these speeds). This isn't PCIe, but
           | it's also surprisingly close to PCIe 1.0 (2.5 Gbps/lane, and
           | IIRC they use the same 8b/10b code and scrambling algorithm).
           | 
           | [1] https://github.com/hamsternz/FPGA_DisplayPort
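To put the two line rates in the comment above side by side, here is a small sketch of the arithmetic. It assumes 8b/10b coding on both links (10 line bits carry 8 payload bits) and ignores all higher-level protocol overhead. The function name is illustrative.

```python
def payload_mbps(line_rate_mbps: int, lanes: int = 1) -> int:
    """Payload rate in Mbit/s after 8b/10b coding.

    Every 10 bits on the wire carry 8 bits of data, so the usable
    rate is 80% of the raw line rate. Packet and framing overhead
    above the line code is deliberately ignored here.
    """
    return line_rate_mbps * lanes * 8 // 10


pcie_gen1 = payload_mbps(2500)        # 2000 Mbit/s, i.e. 250 MB/s per lane
displayport = payload_mbps(2700)      # 2160 Mbit/s per lane
displayport_x4 = payload_mbps(2700, lanes=4)  # 8640 Mbit/s over four lanes
```

The per-lane difference is only 8%, which is the sense in which the two are "surprisingly close" at the physical layer.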
        
         | PfhorSlayer wrote:
         | Next blog post will be covering exactly that! Probably going to
         | do a multi-part series - first one will be the PCB
         | schematic/layout, then the FPGA interfaces and testing,
         | followed by Windows drivers.
        
       | notorandit wrote:
       | It needs to be very fancy to write text in light gray on white.
       | 
       | I am not sure your product will be a success.
       | 
       | I am sure your web design skills need a good overhaul.
        
       | KallDrexx wrote:
       | This is my dream!
       | 
       | The last year I've been working on a 2d focused GPU for I/O
       | constrained microcontrollers
       | (https://github.com/KallDrexx/microgpu). I've been able to
       | utilize this to get user interfaces on slow SPI machines to
       | render on large displays, and it's been fascinating to work on.
       | 
       | But seeing the limitation of processor pipelines I've had the
       | thought for a while that FPGAs could make this faster. I've
       | recently gotten some low end FPGAs to start learning to try and
       | turn my microgpu from an ESP32 based one to an FPGA one.
       | 
       | I don't know if I'll ever get to this level due to kids and free
       | time constraints, but man, I would love to get even a hundredth
       | of this level.
        
         | Chabsff wrote:
         | You probably know this already, but for anyone else curious
         | about going down that road: For this type of use, it's
         | definitely worth it to constrain yourself to FPGAs with
         | dedicated high-bandwidth transceivers. A "basic" 1080p RGB
         | signal at 60 Hz requires some high-frequency signal processing
         | that's really hard to contend with in pure FPGA-land.
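A rough sketch of why 1080p at 60 Hz is demanding, compared with the 640x480 mode mentioned in the reply. The total frame sizes (including blanking) are the standard 2200x1125 and 800x525 timings. The serial bit rate assumes a DVI/HDMI-style TMDS link, where each colour channel sends 10 bits per pixel clock; that link type is an assumption, since the comment does not name one. Class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoMode:
    name: str
    h_total: int      # pixels per line, including horizontal blanking
    v_total: int      # lines per frame, including vertical blanking
    refresh_hz: int

    def pixel_clock_hz(self) -> int:
        """Pixels that must be produced per second, blanking included."""
        return self.h_total * self.v_total * self.refresh_hz

    def tmds_bit_rate_hz(self) -> int:
        """Bits per second on one TMDS data channel (10 bits per pixel)."""
        return self.pixel_clock_hz() * 10


VGA_480P = VideoMode("640x480@60", h_total=800, v_total=525, refresh_hz=60)
FULL_HD = VideoMode("1920x1080@60", h_total=2200, v_total=1125, refresh_hz=60)

for mode in (VGA_480P, FULL_HD):
    print(f"{mode.name}: pixel clock {mode.pixel_clock_hz() / 1e6:.1f} MHz, "
          f"{mode.tmds_bit_rate_hz() / 1e9:.3f} Gbit/s per TMDS channel")
```

A 25.2 MHz pixel clock is comfortable for ordinary FPGA fabric and I/O, while roughly 1.5 Gbit/s per channel is where dedicated serializers or transceivers start to matter.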
        
           | KallDrexx wrote:
           | That's good to know actually. I'm still very very early in my
           | FPGA adaption (learning the FPGA basics) and I am intending
           | to start with standard 640x480 VGA before expanding.
        
       | detuur wrote:
       | I can't believe that this is the closest we have to a compact,
       | stand-alone GPU option. There's nothing like an M.2 format GPU out
       | there. All I want is a stand-alone M.2 GPU with modest
       | performance, something on the level of embedded GPUs like Intel
       | UHD Graphics, AMD Radeon, or Qualcomm's Adreno.
       | 
       | I have an idea for a small embedded product which needs a lot of
       | compute and networking, but only very modest graphical
       | capabilities. The NXP Layerscape LX2160A [1] would be perfect,
       | but I have to pass on it because it doesn't come with an embedded
       | GPU. I just want a small GPU!
       | 
       | [1]: https://www.nxp.com/products/processors-and-
       | microcontrollers...
        
         | magixx wrote:
         | What about MXM GPUs that used to be found in gaming laptops? I
         | know the standard is very niche and thus expensive ($400 for a
         | 3080M used on eBay) but it does exist and you could convert
         | them to PCIe and thus M.2.
        
         | t-3 wrote:
         | Maybe a little bit too low-powered for you, but:
         | https://www.matrixorbital.com/ftdi-eve
        
         | cpgxiii wrote:
         | There's at least one m.2 GPU based on the Silicon Motion SM750
         | controller made by Asrock Rack. Similar products exist for
         | mPCIe form factor.
         | 
         | Performance is nowhere near a modern iGPU, because an iGPU has
         | access to all of the system memory and caches and power budget,
         | and a simple m.2 device has none of that. Even low-end PCIe
         | GPUs (single slot, half-length/half-height) struggle to
         | outperform better iGPUs and really only make sense when you
         | have to use them for basic display functionality.
        
       | PfhorSlayer wrote:
       | So, this is my project! Was somewhat hoping to wait until there
       | was a bit more content up on the site before it started doing the
       | rounds, but here we are! :)
       | 
       | To answer what seems to be the most common question I get asked
       | about this, I am intending on open-sourcing the entire stack (PCB
       | schematic/layout, all the HDL, Windows WDDM drivers, API runtime
       | drivers, and Quake ported to use the API) at some point, but
       | there are a number of legal issues that need to be cleared (with
       | respect to my job) and I need to decide the rest of the
       | particulars (license, etc.) - this stuff is not what I do for a
       | living, but it's tangentially-related enough that I need to cover
       | my ass.
       | 
       | The first commit for this project was on August 22, 2021. It's
       | been a bit over two and a half years I've been working on this,
       | and while I didn't write anything up during that process, there
       | are a fair number of videos in my YouTube FuryGpu playlist
       | (https://www.youtube.com/playlist?list=PL4FPA1MeZF440A9CFfMJ7...)
       | that can kind of give you an idea of how things progressed.
       | 
       | The next set of blog posts that are in the works concern the PCIe
       | interface. It'll probably be a multi-part series starting at the
       | PCB schematic/layout and moving through the FPGA design and
       | ending with the Windows drivers. No timeline on when that'll be
       | done, though. After having written just that post on how the
       | Texture Units work, I've got even more respect for those that can
       | write up technical stuff like that with any sort of timing
       | consistency.
       | 
       | I'll answer the remaining questions in the threads where they
       | were asked.
       | 
       | Thanks for the interest!
        
         | michaelt wrote:
         | Googling the Xilinx Zynq UltraScale+ it seems kinda expensive.
         | 
         | Of course plenty of hobbies let people spend thousands (or
         | more) so there's nothing wrong with that if you've got the
         | money. But is it the end target for your project? Or do you
         | have ambitions to go beyond that?
        
           | kanetw wrote:
           | The Kria SOM in use here is like $300.
        
           | PfhorSlayer wrote:
           | Let's be clear here, this is a _toy_. Beyond being a fun
           | project to work on that could maybe get my foot in the door
           | were I ever to decide to change careers and move into
           | hardware design, this is not going to change the GPU
           | landscape or compete with any of the commercial players. What
           | it _might_ do is pave the way for others to do interesting
           | things in this space. A board with all of the video hardware
           | that you can plug into a computer with all the infrastructure
           | available to play around with accelerating graphics could be
           | a fun, if _extremely_ niche, product. That would also require
           | a *significant* time and money investment from me, and
           | that's not something I necessarily want to deal with. When this
           | is eventually open-sourced, those who really _are_ interested
           | could make their own boards.
           | 
           | One thing to note is that while the US+ line is
           | generally quite expensive (the higher end parts sit in the
           | five-figures range for a one-off purchase! No one actually
           | buying these is paying that price, but still!), the Kria SOMs
           | are quite cheap in comparison. They've got a reasonably-
           | powerful Zynq US+ for about $400, or just $350ish for the dev
           | boards (which do not expose some of the high-speed interfaces
           | like PCIe). I'm starting to sound like a Xilinx shill given
           | how many times I've re-stated this, but for anyone serious
           | about getting into this kind of thing, those devboards are an
           | amazing deal.
        
             | belter wrote:
             | "...I'm doing a (free) operating system (just a hobby,
             | won't be big and professional like gnu) for 386(486) AT
             | clones..."
        
           | 0xcde4c3db wrote:
           | I've been told by several people that distributor pricing for
           | FPGAs is _ridiculously_ inflated compared to what direct
           | customers pay, and considering that one can apparently get a
           | dev board on AliExpress for about $110 [1] while Digikey
           | lists the FPGA alone for about $1880 [2], I believe it (this
           | example isn't an UltraScale chip, but it is significantly
           | bigger than the usual low-end Zynq 7000 boards sold to
           | undergrads and tinkerers).
           | 
           | [1] https://www.aliexpress.us/item/3256806069467487.html
           | 
           | [2] https://www.digikey.com/en/products/detail/amd/XC7K325T-1
           | FFG...
        
             | bangaladore wrote:
             | I have some first- and second-hand experience with this,
             | and you are correct. I'm not sure who benefits from this
             | practice. It's anywhere from 5-25x cheaper in even small-
             | ish quantities.
        
         | rustybolt wrote:
         | I have seen semi-regular updates from you on discord and it is
         | awesome to see how far this project has come (and also a bit
         | frustrating to see how relatively little progress I have made
         | on my FPGA projects in the same time!). I was hoping you'd do a
         | writeup, can't wait!
        
         | ruslan wrote:
         | How much does it depend on hard IP blocks? I mean, can it be
         | ported to FPGAs from other vendors, like the Lattice ECP5? Did
         | you implement PCIe in HDL or use a vendor-specific IP block?
         | Please provide some resource utilization statistics. Thanks.
        
           | PfhorSlayer wrote:
           | Implementing PCIe in the fabric without using the hard IP
           | would be foolish, and definitely not the kind of thing I'd
           | enjoy spending my time on! The design makes extensive use of
           | the DSP48E2 and various BRAM/URAM blocks available in the
           | fabric. I don't have exact numbers off the top of my head,
           | but roughly it's ~500 DSP units (primarily for
           | multiplication), ~70k LUTs, ~135k FFs, and ~90 BRAMs. Porting
           | it to a different device would be a pretty significant
           | undertaking, but would not be impossible. Many of the DSP
           | resources are inferred, but there is a lot of timing stuff
           | that depends on the DSP48E2's behavior - multiple register
           | stages following the multiplies, the inputs are sized
           | appropriately for those specific DSP capabilities, etc.
        
         | pocak wrote:
         | In the post about the texture unit, that ROM table for mip
         | level address offsets seems to use quite a bit of space. Have
         | you considered making the mip base addresses a part of the
         | texture spec instead?
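For readers unfamiliar with the trade-off in the question above, here is a sketch of how mip level offsets can be computed. It assumes a square power-of-two texture with levels packed back to back from largest to smallest; this is a common generic layout and not necessarily the one FuryGpu uses. The function name and parameters are illustrative.

```python
def mip_offsets(size: int, bytes_per_texel: int = 4) -> list[int]:
    """Byte offset of every mip level, relative to the texture base.

    Assumes a square power-of-two texture whose levels are stored
    contiguously, level 0 (full size) first, down to 1x1.
    """
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")

    offsets = []
    offset = 0
    while size >= 1:
        offsets.append(offset)
        offset += size * size * bytes_per_texel  # bytes used by this level
        size //= 2                               # next level is half as wide
    return offsets


# An 8x8 RGBA8 texture has four levels: 8x8, 4x4, 2x2, 1x1.
offsets_8x8 = mip_offsets(8)  # [0, 256, 320, 336]
```

Because the offsets depend only on the texture size and format, they can either be precomputed into a lookup table indexed by size and level, or written into each texture's descriptor when it is created, which is the alternative the comment asks about.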
        
       | raphlinus wrote:
       | Very cool project, and I love to see more work in this space.
       | 
       | Something else to look at is the Vortex project from Georgia
       | Tech[1]. Rather than recapitulating the fixed-function past of
       | GPU design, I think it looks toward the future, as it's at heart
       | a highly parallel computer, based on RISC-V with some extensions
       | to handle GPU workloads better. The boards it runs on are a few
       | thousand dollars, so it's not exactly hobbyist friendly, but it
       | certainly is more accessible than closed, proprietary
       | development. There's a 2.0 release that just landed a few months
       | ago.
       | 
       | [1]: https://vortex.cc.gatech.edu/
        
       ___________________________________________________________________
       (page generated 2024-03-27 23:00 UTC)