[HN Gopher] FuryGpu - Custom PCIe FPGA GPU
___________________________________________________________________
FuryGpu - Custom PCIe FPGA GPU
Author : argulane
Score : 303 points
Date : 2024-03-27 08:37 UTC (14 hours ago)
(HTM) web link (www.furygpu.com)
(TXT) w3m dump (www.furygpu.com)
| snvzz wrote:
| Pipeline seems retro, but far better than nothing.
|
| There's no open hardware GPU to speak of. Depending on license
| (can't find information?), this could be the first, and a
| starting point for more.
| crote wrote:
| It all depends on your definition of "open", of course. As far
| as I know there is no open-source toolchain for any remotely-
| recent FPGA, so you're still stuck with proprietary (paid?)
| tooling to actually modify it. You're pretty much out of luck
| if you need more than an iCE40 UP5k.
| rwmj wrote:
| At least some Xilinx 7-series FPGAs have been reverse
| engineered: https://yosyshq.readthedocs.io/projects/yosys/en/
| latest/cmd/...
| robinsonb5 wrote:
| There's been some interesting recent work to get the QMTech
| Kintex7-325 board (among others) supported under
| yosys/nextpnr - https://github.com/openXC7 It works well
| enough now to build a RISC-V SoC capable of running Linux.
| snvzz wrote:
| >You're pretty much out of luck if you need more than an
| iCE40 UP5k.
|
| Lattice ECP5 (which goes up to 85k LUT or so?) and Nexus have
| more than decent support.
|
| Gowin FPGAs are supported via project apicula up to 20k LUT
| models. Some new models go above 200k LUT so there's hope
| there.
| robinsonb5 wrote:
| Yeah I've used yosys / nextpnr on an ECP5-85 with great
| results - it's pretty mature and dependable now.
| bajsejohannes wrote:
| The up-and-coming GateMate seems interesting to me. They are
| leaning heavily on open source tooling.
|
| chip: https://colognechip.com/programmable-logic/gatemate/
| board: https://www.olimex.com/Products/FPGA/GateMate/GateMate
| A1-EVB...
| monocasa wrote:
| > There's no open hardware GPU to speak of. Depending on
| license (can't find information?), this could be the first, and
| a starting point for more.
|
| There's this which is about the same kind of GPU
|
| https://github.com/asicguy/gplgpu
| mips_r4300i wrote:
| Ticket2Ride Number9 is a fixed function GPU from the late 90s
| that was completely open sourced under GPL
| Hazematman wrote:
| There's also Nyuzi which is more GPGPU focused
| https://github.com/jbush001/NyuziProcessor, but the author also
| experimented with having it do 3D graphics.
| jamesu wrote:
| Similarly there is this: https://github.com/ToNi3141/Rasterix
|
| Would be neat if someone made an FPGA GPU which had a shader
| pipeline honestly.
| actionfromafar wrote:
| How good would a Ryzen with 32 cores be if it did just
| graphics?
| tux3 wrote:
| You can run Crysis in software rendering on a high core count
| AMD CPU.
|
| It's terrible use of the hardware and the performance is far
| from stellar, but you _can_!
| immibis wrote:
| Wasn't Intel Larrabee something like that? Get a bunch of
| dumb x86 cores together and tell them to do graphics?
| actionfromafar wrote:
| I'm so sad Larrabee or similar things never took off. No,
| it might not have benchmarked well against contemporary
| graphics cards, but I think these matrixes of x86 cores
| could have come to great use for cool things not
| necessarily related to graphics.
| fancyfredbot wrote:
| Intel launched Larrabee as Xeon Phi for non-graphics
| purposes. Turns out it wasn't especially good at those
| either. You can still pick one up on eBay today for not
| very much.
| Y_Y wrote:
| The novelty of sshing into a PCI card is nice though. I
| remember trying to use them at a hpc cluster, all the
| convenience of wrangling GPUs but at a fraction of the
| performance
| actionfromafar wrote:
| That's where we have to agree to (potentially) disagree.
| I lament that these _or similar_ designs didn't last
| longer in the market, so people could learn how to
| harness them.
|
| Imagine for instance hard real time tasks, each one task
| running on its own separate core.
| rjsw wrote:
| I think Intel should have made more effort to get cheap
| Larrabee dev boards onto the market, they could have been
| using chips that didn't run at full speed or with too
| many broken cores to sell at full price.
| bee_rider wrote:
| Probably not aided by the fact that conventional Xeon
| core counts were sneaking up on them--not quite caught
| up, but anybody could see the trajectory--and offered a
| much more familiar environment.
| actionfromafar wrote:
| Yes, I agree. Still unfortunate. I think the concept was
| very promising. But Intel had no appetite for burning
| money on it to see where it would go in the long run.
| erik wrote:
| Larrabee was mostly x86 cores, but it did have
| sampling/texturing hardware because it's way more efficient
| to do those particular things in the 3d pipeline with
| dedicated hardware.
| __alexs wrote:
| 15 fps on an oldish Epyc 64 core
| https://www.youtube.com/watch?v=2tn0bZcQf0E
| danbruc wrote:
| If you are going to that effort, you might also want a decent
| resolution. Say we aim for roughly one megapixel (720p) at 30 frames
| per second; then we have to calculate about 27.6 megapixels per
| second. If you get your FPGA to run at 500 MHz, that gives you
| 18 clock cycles per pixel. So you would probably want something
| like 100 cores, keeping in mind that we also have to run vertex
| shaders. We also need quick access to a sizable amount of
| memory, and I am not sure whether one can get away with integer
| or fixed-point arithmetic or whether floating-point
| arithmetic is pretty much necessary. Another complication that
| I would expect is that it is probably much easier to build a
| long execution pipeline if you are implementing a fixed
| function pipeline as compared to a programmable processor.
| Things like out-of-order execution are probably best off-loaded
| to the compiler in order to keep the design simpler and more
| compact.
|
| So my guess is that it would be quite challenging to implement
| a modern GPU in an affordable FPGA if you want more than a
| proof of concept.
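|
| A rough sketch of the arithmetic above, assuming a single pipeline
| that must touch every pixel once per frame. The function name is
| illustrative and does not come from the thread.
|
```python
def cycles_per_pixel(width, height, fps, clock_hz):
    """Clock cycles available per output pixel for one pipeline."""
    pixels_per_second = width * height * fps
    return clock_hz / pixels_per_second


# 1280x720 at 30 frames per second on a 500 MHz clock,
# the figures used in the comment above.
budget = cycles_per_pixel(1280, 720, 30, 500e6)
print(f"{budget:.1f} cycles per pixel")
```
| Any overdraw, vertex work, or memory stalls would come out of that
| same budget, which is why the estimate lands on many parallel cores.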
| d_tr wrote:
| There's a new board by Trenz with a Versal chip which can do
| 440 GFLOPS just with the DSP58 slices (the lowest speed
| grade) and it costs under 1000 Euros, but you also need to
| buy a Vivado license currently.
|
| Cheaper boards are definitely possible since there are
| smaller parts in that family, but they need to offer support
| for some of them in the free version of Vivado...
| PfhorSlayer wrote:
| You've nailed the problem directly on the head. For hitting
| 60Hz in FuryGpu, I actually render at 640x360 and then pixel-
| double (well, pixel->quad) the output to the full 720p. Even
| with my GPU cores running at 400MHz and the texture units at
| 480MHz with fully fixed-function pipelines, it can still
| struggle to keep up at times.
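|
| The pixel->quad step can be pictured with a minimal sketch, assuming
| a frame stored as a list of rows. This is only an illustration of 2x
| nearest-neighbour scaling, not FuryGpu's actual scan-out logic.
|
```python
def pixel_quad_upscale(frame):
    """Duplicate every pixel into a 2x2 quad (2x nearest-neighbour upscale)."""
    out = []
    for row in frame:
        doubled = [px for px in row for _ in range(2)]  # repeat horizontally
        out.append(doubled)
        out.append(list(doubled))                       # repeat vertically
    return out


small = [[1, 2],
         [3, 4]]
big = pixel_quad_upscale(small)
# big == [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```
| A 640x360 input therefore becomes 1280x720 while only a quarter of
| the pixels are actually rendered.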
|
| I do not doubt that a shader core could be built, but I have
| reservations about the ability to run it fast enough or have
| as many of them as would be needed to get similar performance
| out of them. FuryGpu does its front-end (everything up
| through primitive assembly) in full fp32. Because that's just
| a simple fixed modelview-projection matrix transform it can
| be done relatively quickly, but having every single
| vertex/pixel able to run full fp32 shader instructions
| requires the ability to cover instruction latency with
| additional data sets - it gets complicated, _fast_!
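|
| For readers unfamiliar with the term, a fixed modelview-projection
| transform is one 4x4 matrix multiply per vertex. The sketch below
| assumes a row-major matrix and adds the usual perspective divide;
| both are assumptions, the names are hypothetical, and Python floats
| are fp64 rather than the fp32 mentioned above.
|
```python
def mat_vec4(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def transform_vertex(mvp, position):
    """Apply a modelview-projection matrix, then the perspective divide."""
    x, y, z = position
    cx, cy, cz, cw = mat_vec4(mvp, [x, y, z, 1.0])
    return (cx / cw, cy / cw, cz / cw)


IDENTITY = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

ndc = transform_vertex(IDENTITY, (0.5, -0.25, 0.75))
```
| The work per vertex is fixed and known in advance, which is what
| makes it far easier to pipeline than arbitrary shader code.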
| gchadwick wrote:
| Cool! I found the hello blog here illuminating to understand the
| creator's intentions: https://www.furygpu.com/blog/hello
|
| As I read it, it's just a fun hobby project for them first and
| foremost and looks like they're intending to write a whole bunch
| more about how they built it.
|
| It's certainly an impressive piece of work, in particular as
| they've got the full stack working, a windows driver implementing
| a custom graphics API and then quake running on top of that. A
| shame they've not got some DX/GL support but I can certainly
| understand why they went the custom API route.
|
| I wonder if they'll open source the design?
| PfhorSlayer wrote:
| I'm in the process of actually trying to work out what would be
| feasible performance-wise if I were to spend the considerable
| effort to add the features required for base D3D support. It's
| not looking good, unfortunately. Beyond just "shaders", there
| are a _significant_ number of other requirements that even just
| the OS's window manager needs to function _at all_. It's all
| built up on 20+ years of evolving tech and for the normal
| players in this space (AMD, Nvidia, Intel, Imagination, etc.)
| it's always been an iterative process.
| iAkashPaul wrote:
| FPGAs for native FP4 will change the entire landscape
| luma wrote:
| How so?
| iAkashPaul wrote:
| Reduced memory requirements, dropping higher precision IP
| blocks for starters
| CamperBob2 wrote:
| 4-bit values (or 6-bit values, nowadays) are
| interesting because they're small enough to address a single
| LUT, which is the lowest-level atomic element of an FPGA.
| That gives them major advantages in the timing and resource-
| usage departments.
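|
| To make the LUT point concrete: a 4-input LUT behaves like a 16-entry
| truth table indexed by its inputs. The sketch below is a software
| model with hypothetical names, not vendor tooling; real parts use 4-
| or 6-input LUTs depending on the family.
|
```python
def make_lut4(func):
    """Precompute the 16-entry truth table of a 4-input boolean function."""
    return [func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) & 1
            for i in range(16)]


def lut4_eval(table, a, b, c, d):
    """Look up the output bit: the four inputs form the table address."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


# Any function of four bits fits in one table, e.g. 4-bit parity.
parity = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
bit = lut4_eval(parity, 1, 0, 1, 1)
```
| A 4-bit operand is exactly one table address, so an arbitrary
| function of it costs a single lookup per output bit.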
| jsheard wrote:
| Very briefly, until someone makes an ASIC that does the same
| thing and FPGAs are relegated to niche use-cases once again.
|
| FPGAs only make long-term sense in applications that are so
| low-volume that it's not worth spinning an ASIC for them.
| iAkashPaul wrote:
| Absolutely
| Y_Y wrote:
| Four-bit floats are not as useful as Nvidia would have you
| believe. Like structured sparsity it's mainly a trick to make
| newer-gen cards look faster in the absence of an improvement in
| the underlying tech. If you're using it for NN inference you
| have to carefully tune the weights to get good accuracy and it
| offers nothing over fixed-point.
| blacklion wrote:
| Entire landscape of open graphic chips?
|
| Not every GPU should be used to train or infer so-called AI.
|
| Please, stop, we need some hardware to put images on the
| screens.
| spuz wrote:
| This looks like an incredible achievement. I'd love to see some
| photos of the physical device. I'm also slightly confused about
| which FPGA module is being used. The blog mentions the Xilinx
| Kria SoMs but if you follow the links to the specs of those
| modules, you see they have ARM SoCs rather than Xilinx FPGAs. The
| whole world of FPGAs is pretty unfamiliar to me so maybe I'm
| missing something.
|
| https://www.amd.com/en/products/system-on-modules/kria/k26/k...
| crote wrote:
| > you see they have ARM SoCs rather than Xilinx FPGAs
|
| It's a mixed chip: FPGA and traditional SoC glued together.
| This means you don't have a softcore MCU taking up precious FPGA
| resources just to do some basic management tasks.
| spuz wrote:
| Ah that makes sense. It's slightly ironic then that the ARM
| SoC includes a Mali GPU which presumably easily outperforms
| what can be achieved with the FPGA.
| chrsw wrote:
| I didn't see any mention of what the software on the Zynq's
| ARM core is doing, which made me wonder why use Zynq at all.
| PfhorSlayer wrote:
| The hardened DisplayPort IP is connected to the ARM cores,
| and requires a significant amount of configuration and
| setup. FuryGpu's firmware primarily handles interfacing
| with that block: setting up descriptor sets to DMA video
| frame and audio data from memory (where the GPU has written
| it for video, or where the host has DMA'd it for audio),
| responding to requests to reconfigure things for different
| resolutions, etc. There's also a small command processor
| there that lets me do various things that building out
| hardware for doesn't make sense - moving memory around with
| the hardened DMA peripheral, setting up memory buffers used
| internally by the GPU, etc. If I ever need to expose a VGA
| interface in order to have motherboards treat this as a
| primary graphics output device during boot, I'd also be
| handling all of that in the firmware.
| chiral-anomaly wrote:
| Xilinx doesn't mention the exact FPGA p/n used in the Kria
| SoMs. However according to their public specs they appear to
| match [1] the ZU3EG-UBVA530-2L and ZU5EV-SFVC784-2L devices,
| with the latter being the only one featuring PCIe support.
|
| Designing and bringing-up the FPGA board as described in the
| blog post is already a high bar to clear. I hope the author
| will at some point publish schematics and sources.
|
| [1] https://docs.amd.com/v/u/en-US/zynq-ultrascale-plus-
| product-...
| PfhorSlayer wrote:
| You're in luck! https://imgur.com/a/BE0h9cZ
|
| As mentioned in the rest of this thread, the Kria SoMs are FPGA
| fabric with hardened ARM cores running the show. Beyond just
| being what was available (for _oh so_ cheap, the Kria devboards
| are like $350!), these devices also include things like
| hardened DisplayPort IP attached to the ARM cores allowing me
| to offload things like video output and audio to the firmware.
| A previous version of this project was running on a Zynq 7020,
| for which I needed to write my own HDMI stuff that, while not
| super complicated, takes up a fair amount of logic and also
| gets way more complex if it needs to be configurable.
| wpwpwpw wrote:
| Excellent job. Would be amazing if this became an open source
| hardware project.
| sylware wrote:
| Hopefully their hardware programming model is going full hardware
| circular command/interrupt buffers (even for GPU register
| programming).
|
| This is how it is done on AMD GPUs; that said, I have no idea what
| the Nvidia hardware programming model is.
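|
| For anyone unfamiliar with the idea, a circular command buffer is a
| fixed-size ring with a write pointer advanced by the driver and a
| read pointer advanced by the hardware. The sketch below is a generic
| software model with hypothetical names; it does not describe AMD's
| or FuryGpu's actual register interface.
|
```python
class CommandRing:
    """Fixed-size ring: the producer advances `write`, the consumer `read`."""

    def __init__(self, size):
        self.slots = [None] * size
        self.size = size
        self.write = 0  # next slot the driver will fill
        self.read = 0   # next slot the device will consume

    def push(self, cmd):
        """Queue a command; returns False when the ring is full."""
        nxt = (self.write + 1) % self.size
        if nxt == self.read:
            return False  # one slot stays empty so full != empty
        self.slots[self.write] = cmd
        self.write = nxt
        return True

    def pop(self):
        """Consume the oldest command, or return None when empty."""
        if self.read == self.write:
            return None
        cmd = self.slots[self.read]
        self.read = (self.read + 1) % self.size
        return cmd


ring = CommandRing(4)
ring.push("set_register")
ring.push("draw")
first = ring.pop()
```
| In hardware the two pointers would typically live in registers or
| shared memory, with an interrupt or polling loop signalling progress.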
| codedokode wrote:
| "UltraScale" in name assumes ultra price? FPGAs seem to be an
| expensive toy.
| mattalex wrote:
| Not in the grand scheme of things: you can get fpga dev boards
| for $50 that are already useable for this type of thing (you
| can go even lower, but those aren't really useable for "CPU
| like" operation and are closer to "a whole lot of logic gates
| in a single chip"). Of course the "industry grade" solutions
| pack significantly more of a punch, but they can also be had
| for <$500.
| varispeed wrote:
| Ages ago I bought TinyFPGA, which is like £40 and I was able
| to synthesize RISC-V cpu on it. It was fun.
| nxobject wrote:
| It's worth mentioning that it's easy enough to find absurdly
| cheap (~$20) early-generation dev boards for Zynq FPGAs with
| embedded ARM cores on Aliexpress, shucked from obsolete Bitcoin
| miners [1]. Interfaces include SD, Ethernet, 3 banks of GPIO.
|
| [1] https://github.com/xjtuecho/EBAZ4205
| thrtythreeforty wrote:
| Zynq is deeply annoying to work with, though. Unfortunately
| the hard ARM core bootloads the FPGA fabric, rather than the
| other way around (or having the option to initialize both
| separately). This means you have to muck with software on the
| target to update FPGA bitstreams.
| CamperBob2 wrote:
| Isn't it mostly just boilerplate code that does the FPGA
| configuration, though?
| PfhorSlayer wrote:
| In general, yes. _However_, the Kria series are amazingly good
| deals for what you get - a quite powerful Zynq US+ part _and_ a
| dev board for like $350.
| MalphasWats wrote:
| It's incredible how influential Ben Eater's breadboard computer
| series has been in hobby electronics. I've been similarly
| inspired to try to design my own "retro" CPU.
|
| I desperately want something as easy to plug into things as the
| 6502, but with _jussst_ a little more capability - few more
| registers, hardware division, that sort of thing. It's a really
| daunting task.
|
| I always end up coming back to _just use an MCU and be done with
| it_, and then I hit the How To Generate Graphics problem.
| jsheard wrote:
| I've been looking into graphics on MCUs and was disappointed to
| learn that the little "NeoChrom" GPU they're putting on newer
| STM32 parts is completely undocumented. Historically they have
| been good about not putting black boxes in their chips, but I
| guess it's probably an IP block they've licensed from a third
| party.
| gchadwick wrote:
| The RP2040 is a great MCU for playing with graphics as it can
| bit bang VGA and DVI/HDMI. There's some info on the DVI here:
| https://github.com/Wren6991/PicoDVI
|
| I wrote a couple of articles on how to do bit banged VGA on
| the RP2040 from scratch:
| https://gregchadwick.co.uk/blog/playing-with-the-pico-pt5/
| and https://gregchadwick.co.uk/blog/playing-with-the-pico-
| pt6/ plus an intro to PIO
| https://gregchadwick.co.uk/blog/playing-with-the-pico-pt4/
| jsheard wrote:
| You can do something similar on STM32 parts that have an
| LCD controller, which can be abused to drive a VGA DAC or a
| DVI encoder chip. The LCD controller at least is fully
| documented, but many of their parts pair that with a small
| GPU, which _would_ be an advantage over the GPU-less
| RP2040... if there were any public documentation at all for
| the GPU :(
| CarVac wrote:
| I used "composite" (actually monochrome) video output
| software someone wrote on the RP2040 for an optional
| feature on the PhobGCC custom gamecube controller
| motherboard to allow easy calibration, configuration, and
| high-frequency input recording and graphing.
|
| Pictures of the output here:
| https://github.com/PhobGCC/PhobGCC-
| doc/blob/main/For_Users/P...
| unwind wrote:
| Agreed. It is so, so, _so_ very disappointing. I was deeply
| surprised (in a non-pleasant way) when I first opened up a
| Reference Manual for one of those chips and saw that the GPU
| chapter was, like, four pages. :(
| nick__m wrote:
| On the ST forum the company clearly said that they will
| only release to some selected partners. That's sad.
| verticalscaler wrote:
| True, can't think of much else this popular.
|
| He started posting videos again recently with some regularity
| after a lull. Audience is in the low hundreds of thousands. I
| assume fewer than 100k actually finish videos and fewer still
| do anything with it.
|
| Hobby electronics seems surprisingly small in this era.
| hedora wrote:
| I wonder if there's much overlap between people that watch
| YouTube to get deep technical content (instead of reading),
| and people that care about hobby electronics.
|
| I'm having trouble wrapping my head around how / why you'd
| use youtube to present analog electrical engineering formulas
| and pin out diagrams instead of using latex or a diagram.
| robinsonb5 wrote:
| I consider YouTube (or rather, video in general) a
| fantastic platform for showcasing something cool,
| demonstrating what it can do, and even demonstrating how to
| drive a piece of software - but for actual technical
| learning I loathe the video format - it's so hard to skim,
| re-read, pause, recap and digest at your own speed.
|
| The best compromise seems to be webpages with readable
| technical info and animated video illustrations - such as
| the one posted here yesterday about how radio works.
| jpc0 wrote:
| For some things there is a lot of nuance lost in just
| writing. The unknown unknowns.
|
| There have been a lot of times where I am showing someone
| new to my field something and they stop me before I get to
| what I thought was the "educational" point and ask what I
| just did.
|
| Video can portray that pretty well because the information
| is there for you to see; with a schematic or write-up, if
| the author didn't put it there, the information isn't there.
| TillE wrote:
| Even if you're not much of a tinkerer, Ben Eater's videos are
| massively helpful if you want to truly understand how
| computers work. As long as you come in knowing the rudiments
| of digital electronics, just watching his stuff is a whole
| education in 8-bit computer design. You won't quite learn how
| _modern_ computers work with their fancy caches and pipelines
| and such, but it's a really strong foundation to build on.
|
| I've built stuff with microcontrollers (partially aided by
| techniques learned here), but that was very purpose-driven
| and I'm not super interested in just messing around for fun.
| MenhirMike wrote:
| I was about to recommend the Parallax Propeller (the first one
| that's available in DIP format), but arguably, that one is way
| more complex to program for (and also significantly more
| powerful, and at that point you might as well look into an
| ESP32 and that is "just use an MCU" :))
|
| And yeah, video output is a significant issue because of the
| required bandwidth for digital outputs (unless you're okay with
| composite or VGA outputs, I guess they can still be done with
| readily available chips?). The recent Commander X16 settled for
| an FPGA for this.
| MalphasWats wrote:
| I feel like the CX16 lost its way about a week after the
| project started and it suddenly became an expensive FPGA-
| based blob. But at the same time, I'm not sure what other
| option there is for a project like that.
|
| I always got the impression that David sort of got railroaded
| by the other members of the team that wanted to keep adding
| features and MOAR POWAH, and didn't have a huge amount of
| choice because those features quickly scoped out of his own
| areas of knowledge.
| MenhirMike wrote:
| I think so too - it must have been a great learning
| experience for him though, but for me, the idea of "The
| best C64-like computer that ever existed" died pretty
| quickly.
|
| He also did run into a similar problem that I ran into when
| I tried something like that as well: Sound Chips. Building
| a system around a Yamaha FM Synthesizer is perfect, but I
| found as well that most of the chips out there are broken,
| fake, or both and that no one else makes them anymore.
| Which makes sense because if you want a sound chip in this
| day, you use an AC97 or HD Audio codec and call it a day,
| but that goes against that spirit.
|
| I think that the spirit of hobby electronics is really
| found in FPGAs these days instead of rarer and rarer DIP
| parts. Which is a bit sad, but I guess that's just the
| passage of time. I wonder if that's how some people felt in
| the 70s when CPUs replaced many distinct layouts, or if
| they rejoiced and embraced it instead.
|
| I've given up trying to build a system on a breadboard and
| think that MiSTer is the modern equivalent of that.
| dragontamer wrote:
| > I think that the spirit of hobby electronics is really
| found in FPGAs these days instead of rarer and rarer DIP
| parts. Which is a bit sad, but I guess that's just the
| passage of time. I wonder if that's how some people felt
| in the 70s when CPUs replaced many distinct layouts, or
| if they rejoiced and embraced it instead.
|
| Microcontrollers have taken over. When 8kB SRAM and 20MHz
| microcontrollers exist below 50-cents and at minuscule
| 25mm^2 chip sizes drawing only 500uA of current...
| there's very little reason to use a collection of 30
| chips to do equivalent functionality.
|
| Except performance. If you need performance then bam,
| FPGA land comes in and Zynq just has too much performance
| at too low a cost (though not quite as low as the
| microcontroller gang).
|
| ----------
|
| Hobby Electronics is great now. You have so many usable
| parts at very low costs. A lot of problems are "solved"
| yes, but that's a good thing. That means you can focus on
| solving your hobby problem rather than trying to invent a
| new display driver or something.
| gnramires wrote:
| Another advantage of hobby anything is that you can just
| do, and reinvent whatever you want. Sure, fast CPUs/MCUs
| exist now and can do whatever you want. But if you feel
| like reinventing the wheel just for the sake of it, no
| one will stop you![1]
|
| I do think some people that remember fondly the _user
| experience_ of those old machines might be better served
| by using modern machines (like a raspberry pi or even a
| standard pc) in a different way instead of trying to use
| old hardware. That's from the good old Turing machine
| universality (you can simulate practically any machine
| you like using newer hardware, if what you're interested
| in is software). You can even add artificial limitations
| like PICO-8 or TIC-80 does.
|
| See also uxn:
|
| https://100r.co/site/uxn.html
|
| and (WIP) picotron:
|
| https://www.lexaloffle.com/picotron.php
|
| I think there's a general concept here of making
| 'Operating environments' that are pleasant to work within
| (or have fun limitations), which I think are more
| practical than a dedicated Operating System optionally
| with a dedicated machine. Plus (unless you particularly
| want to!) you don't need to worry about all the complex
| parts of operating systems like network stacks, drivers
| and such.
|
| [1] Maybe we should call that Hobby universality (or
| immortality?) :P If it's already been made/discovered,
| you can always make it again just for fun.
| rzzzt wrote:
| The first choice was the Gameduino, also an FPGA-based
| solution. I have misplaced my bookmark for the
| documentation covering the previous hardware revision, but
| current version 3X is MOAR POWAH just on its own, this
| seems to be a natural tendency:
| https://excamera.com/sphinx/gameduino3/index.html#about-
| game...
|
| Edit: found it!
| https://excamera.com/sphinx/gameduino/index.html
| erik wrote:
| Modern retro computer designs run into the problem of
| generating a video signal. Ideally you'd have a tile and
| sprite based rendering. And you'd like to support HDMI or
| at least VGA. But there are no modern parts that offer
| this and building the functionality out of discrete
| components is impractical and unwieldy.
|
| A FPGA is really just the right tool for solving the
| video problem. Or some projects do it with a micro-
| controller. But it's sort of too bad as it kind of
| undercuts the spirit of the whole design. If your video
| processor is orders of magnitude more powerful than the
| rest of the computer, then one starts to ask why not just
| implement the entire computer inside the video processor?
| MenhirMike wrote:
| It's one of the funny things of the Raspberry Pi Pico W:
| The Infineon CYW4343 has an integrated ARM Cortex-M3 CPU,
| so the WiFi/BT chip is technically more advanced than the
| actual RP2040 (which is a Cortex-M0+) and also has more
| built-in ROM/RAM than what's on the Pico board for the
| RP2040 to use.
|
| And yeah, you can't really buy sprite-based video chips
| anymore, and you don't even have to worry about stuff
| like "Sprites per Scanline" because you can get a proper
| framebuffer for essentially free - but now you might as
| well go further and use one microprocessor to be the CPU,
| GPU, and FM Synthesizer Sound Chip and "just" add the
| logic to generate the actual video/audio signals.
| PfhorSlayer wrote:
| Funny enough, that's _exactly_ where this project started.
| After I built his 8 bit breadboard computer, I started looking
| into what might be involved in making something a bit more
| interesting. Can't do a whole lot of high-speed anything with
| discrete logic gates, so I figured learning what I could do
| with an FPGA would be far more interesting.
| bArray wrote:
| Registers can be worked around by using the stack and/or
| memory. Division could always be implemented as a simple
| function. It's part of the fun of working at that level.
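|
| As a sketch of what "division as a simple function" can look like on
| a CPU without a divide instruction, here is unsigned shift-and-
| subtract (restoring) division. The 16-bit default width and the
| function name are assumptions for illustration.
|
```python
def udiv(dividend, divisor, bits=16):
    """Unsigned shift-and-subtract division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder


q, r = udiv(100, 7)
# q == 14, r == 2
```
| The loop uses only shifts, compares, and subtracts, so it maps onto
| a handful of instructions per bit on an 8-bit CPU.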
|
| Regarding graphics, initially output serial. Abstract the
| problem away until you are ready to deal with it. If you sneak
| up on an Arduino and make it scream, you can make it into a
| very basic VGA graphics card [1]. Even easier is ESP32 to VGA
| (also gives keyboard and mouse) [2].
|
| [1] https://www.instructables.com/Arduino-Basic-PC-With-VGA-
| Outp...
|
| [2] https://www.aliexpress.us/item/1005006222846299.html
| bloatfish wrote:
| This is insane! As a hobby hardware designer myself, I _can_
| imagine how much work must have gone into reaching this stage.
| Well done!
| nxobject wrote:
| I hope the author goes into some detail about how he implements
| the PCIe interface! I doubt I'll ever do hardware work at that
| level of sophistication, but for general cultural awareness I
| think it's worth looking under the hood of PCIe.
| gorkish wrote:
| The FPGA he is using has native pcie so usually all you get on
| this front is an interface to a vendor proprietary ip block.
| The state of open interfaces in FPGA land is abysmal. I think
| the best I've seen fully open source is a gigabit MAC
| 0xcde4c3db wrote:
| There is an open-source DisplayPort transmitter [1] that
| apparently supports multiple 2.7 Gbps lanes (albeit using
| family-specific SERDES/differential transceiver blocks, but I
| doubt that's avoidable at these speeds). This isn't PCIe, but
| it's also surprisingly close to PCIe 1.0 (2.5 Gbps/lane, and
| IIRC they use the same 8b/10b code and scrambling algorithm).
|
| [1] https://github.com/hamsternz/FPGA_DisplayPort
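|
| To put the line rates above in perspective: with 8b/10b coding only
| 8 of every 10 bits on the wire carry data. A quick sketch of that
| arithmetic (the function name is illustrative):
|
```python
def payload_gbps(line_rate_gbps, lanes=1):
    """Usable data rate after 8b/10b coding (8 data bits per 10 line bits)."""
    return line_rate_gbps * lanes * 8 / 10


dp_lane = payload_gbps(2.7)    # one 2.7 Gbps DisplayPort lane
pcie1_lane = payload_gbps(2.5)  # one PCIe 1.0 lane
print(f"DisplayPort: {dp_lane:.2f} Gbps, PCIe 1.0: {pcie1_lane:.2f} Gbps per lane")
```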
| PfhorSlayer wrote:
| Next blog post will be covering exactly that! Probably going to
| do a multi-part series - first one will be the PCB
| schematic/layout, then the FPGA interfaces and testing,
| followed by Windows drivers.
| notorandit wrote:
| It needs to be very fancy to write text in light gray on white.
|
| I am not sure your product will be a success.
|
| I am sure your web design skills need a good overhaul.
| KallDrexx wrote:
| This is my dream!
|
| The last year I've been working on a 2d focused GPU for I/O
| constrained microcontrollers
| (https://github.com/KallDrexx/microgpu). I've been able to
| utilize this to get user interfaces on slow SPI machines to
| render on large displays, and it's been fascinating to work on.
|
| But seeing the limitation of processor pipelines I've had the
| thought for a while that FPGAs could make this faster. I've
| recently gotten some low end FPGAs to start learning to try and
| turn my microgpu from an ESP32 based one to an FPGA one.
|
| I don't know if I'll ever get to this level due to kids and free
| time constraints, but man, I would love to get even a hundredth
| of this level.
| Chabsff wrote:
| You probably know this already, but for anyone else curious
| about going down that road: For this type of use, it's
| definitely worth it to constrain yourself to FPGAs with
| dedicated high-bandwidth transceivers. A "basic" 1080p RGB
| signal at 60hz requires some high-frequency signal processing
| that's really hard to contend with in pure FPGA-land.
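|
| A rough sketch of why 1080p at 60 Hz is demanding, assuming the
| common timings of 2200x1125 total pixels per frame (including
| blanking) for 1080p and 800x525 for 640x480, and a DVI/HDMI-style
| link that sends 10 bits per pixel on each data lane. These timing
| figures are assumptions, not taken from the thread.
|
```python
def pixel_clock_hz(h_total, v_total, refresh_hz):
    """Pixel clock = total pixels per frame (with blanking) x refresh rate."""
    return h_total * v_total * refresh_hz


def tmds_lane_bps(pixel_clock):
    """Serial bit rate per data lane when each pixel takes 10 bits per lane."""
    return pixel_clock * 10


vga = pixel_clock_hz(800, 525, 60)        # 640x480 visible
full_hd = pixel_clock_hz(2200, 1125, 60)  # 1920x1080 visible
print(f"VGA: {vga / 1e6:.1f} MHz, 1080p: {full_hd / 1e6:.1f} MHz, "
      f"{tmds_lane_bps(full_hd) / 1e9:.3f} Gbps per lane")
```
| The per-lane serial rate is what pushes designs toward dedicated
| transceivers; 640x480 stays within reach of ordinary I/O pins.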
| KallDrexx wrote:
| That's good to know actually. I'm still very very early in my
| FPGA adaption (learning the fpga basics) and I am intending
| to start with standard 640x480 VGA before expanding.
| detuur wrote:
| I can't believe that this is the closest we have to a compact,
| stand-alone GPU option. There's nothing like a M.2 format GPU out
| there. All I want is a stand-alone M.2 GPU with modest
| performance, something on the level of embedded GPUs like Intel
| UHD Graphics, AMD Radeon, or Qualcomm's Adreno.
|
| I have an idea for a small embedded product which needs a lot of
| compute and networking, but only very modest graphical
| capabilities. The NXP Layerscape LX2160A [1] would be perfect,
| but I have to pass on it because it doesn't come with an embedded
| GPU. I just want a small GPU!
|
| [1]: https://www.nxp.com/products/processors-and-
| microcontrollers...
| magixx wrote:
| What about MXM GPUs that used to be found in gaming laptops? I
| know the standard is very niche and thus expensive ($400 for a
| 3080M used on ebay) but it does exist and you could convert
| them to PCI-E and thus m.2
| t-3 wrote:
| Maybe a little bit too low-powered for you, but:
| https://www.matrixorbital.com/ftdi-eve
| cpgxiii wrote:
| There's at least one m.2 GPU based on the Silicon Motion SM750
| controller made by Asrock Rack. Similar products exist for
| mPCIe form factor.
|
| Performance is nowhere near a modern iGPU, because an iGPU has
| access to all of the system memory and caches and power budget,
| and a simple m.2 device has none of that. Even low-end PCIe
| GPUs (single slot, half-length/half-height) struggle to
| outperform better iGPUs and really only make sense when you
| have to use them for basic display functionality.
| PfhorSlayer wrote:
| So, this is my project! Was somewhat hoping to wait until there
| was a bit more content up on the site before it started doing the
| rounds, but here we are! :)
|
| To answer what seems to be the most common question I get asked
| about this, I am intending on open-sourcing the entire stack (PCB
| schematic/layout, all the HDL, Windows WDDM drivers, API runtime
| drivers, and Quake ported to use the API) at some point, but
| there are a number of legal issues that need to be cleared (with
| respect to my job) and I need to decide the rest of the
| particulars (license, etc.) - this stuff is not what I do for a
| living, but it's tangentially-related enough that I need to cover
| my ass.
|
| The first commit for this project was on August 22, 2021. It's
| been a bit over two and a half years I've been working on this,
| and while I didn't write anything up during that process, there
| are a fair number of videos in my YouTube FuryGpu playlist
| (https://www.youtube.com/playlist?list=PL4FPA1MeZF440A9CFfMJ7...)
| that can kind of give you an idea of how things progressed.
|
| The next set of blog posts that are in the works concern the PCIe
| interface. It'll probably be a multi-part series starting at the
| PCB schematic/layout and moving through the FPGA design and
| ending with the Windows drivers. No timeline on when that'll be
| done, though. After having written just that post on how the
| Texture Units work, I've got even more respect for those that can
| write up technical stuff like that with any sort of timing
| consistency.
|
| I'll answer the remaining questions in the threads where they
| were asked.
|
| Thanks for the interest!
| michaelt wrote:
| Googling the Xilinx Zynq UltraScale+ it seems kinda expensive.
|
| Of course plenty of hobbies let people spend thousands (or
| more) so there's nothing wrong with that if you've got the
| money. But is it the end target for your project? Or do you
| have ambitions to go beyond that?
| kanetw wrote:
| The Kria SOM in use here is like $300.
| PfhorSlayer wrote:
| Let's be clear here, this is a _toy_. Beyond being a fun
| project to work on that could maybe get my foot in the door
| were I ever to decide to change careers and move into
| hardware design, this is not going to change the GPU
| landscape or compete with any of the commercial players. What
| it _might_ do is pave the way for others to do interesting
| things in this space. A board with all of the video hardware
| that you can plug into a computer with all the infrastructure
| available to play around with accelerating graphics could be
| a fun, if _extremely_ niche, product. That would also require
| a *significant* time and money investment from me, and that's
| not something I necessarily want to deal with. When this
| is eventually open-sourced, those who really _are_ interested
| could make their own boards.
|
| One thing to note is that while the US+ line is
| generally quite expensive (the higher end parts sit in the
| five-figures range for a one-off purchase! No one actually
| buying these is paying that price, but still!), the Kria SOMs
| are quite cheap in comparison. They've got a reasonably-powerful
| Zynq US+ for about $400, or just $350ish for the dev
| boards (which do not expose some of the high-speed interfaces
| like PCIe). I'm starting to sound like a Xilinx shill given
| how many times I've re-stated this, but for anyone serious
| about getting into this kind of thing, those devboards are an
| amazing deal.
| belter wrote:
| "...I'm doing a (free) operating system (just a hobby,
| won't be big and professional like gnu) for 386(486) AT
| clones..."
| 0xcde4c3db wrote:
| I've been told by several people that distributor pricing for
| FPGAs is _ridiculously_ inflated compared to what direct
| customers pay, and considering that one can apparently get a
| dev board on AliExpress for about $110 [1] while Digikey
| lists the FPGA alone for about $1880 [2], I believe it (this
| example isn't an UltraScale chip, but it is significantly
| bigger than the usual low-end Zynq 7000 boards sold to
| undergrads and tinkerers).
|
| [1] https://www.aliexpress.us/item/3256806069467487.html
|
| [2] https://www.digikey.com/en/products/detail/amd/XC7K325T-1
| FFG...
| bangaladore wrote:
| I have some first- and second-hand experience with this,
| and you are correct. I'm not sure who benefits from this
| practice. It's anywhere from 5-25x cheaper in even small-
| ish quantities.
| rustybolt wrote:
| I have seen semi-regular updates from you on discord and it is
| awesome to see how far this project has come (and also a bit
| frustrating to see how relatively little progress I have made
| on my FPGA projects in the same time!). I was hoping you'd do a
| writeup, can't wait!
| ruslan wrote:
| How much does it depend on hard IP blocks? I mean, can it be
| ported to FPGAs from other vendors, like Lattice ECP5? Did you
| implement PCIe in HDL or use a vendor-specific IP block?
| Please provide some resource utilization statistics. Thanks.
| PfhorSlayer wrote:
| Implementing PCIe in the fabric without using the hard IP
| would be foolish, and definitely not the kind of thing I'd
| enjoy spending my time on! The design makes extensive use of
| the DSP48E2 and various BRAM/URAM blocks available in the
| fabric. I don't have exact numbers off the top of my head,
| but roughly it's ~500 DSP units (primarily for
| multiplication), ~70k LUTs, ~135k FFs, and ~90 BRAMs. Porting
| it to a different device would be a pretty significant
| undertaking, but would not be impossible. Many of the DSP
| resources are inferred, but there is a lot of timing stuff
| that depends on the DSP48E2's behavior - multiple register
| stages following the multiplies, the inputs are sized
| appropriately for those specific DSP capabilities, etc.
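|
| To illustrate the two dependencies mentioned above (operand sizing
| and register stages after the multiply), here is a toy behavioural
| model. It is only a sketch and not FuryGpu's HDL: the class name and
| the choice of three stages are assumptions, and the only hardware
| fact used is that the DSP48E2 multiplier takes signed 27-bit and
| 18-bit operands.

```python
from collections import deque

A_BITS, B_BITS = 27, 18  # DSP48E2 signed multiplier operand widths


def fits_signed(value, bits):
    """True if value is representable as a two's-complement number of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


class PipelinedMultiplier:
    """Hypothetical model: a product becomes visible `stages` clocks after its inputs."""

    def __init__(self, stages=3):
        # Each slot stands for one register stage; None means "not valid yet".
        self.pipe = deque([None] * stages, maxlen=stages)

    def clock(self, a, b):
        if not (fits_signed(a, A_BITS) and fits_signed(b, B_BITS)):
            raise ValueError("operand too wide for a single 27x18 multiplier")
        out = self.pipe[0]        # oldest product leaves the pipeline
        self.pipe.append(a * b)   # new product enters, oldest slot is dropped
        return out


mul = PipelinedMultiplier(stages=3)
inputs = [(2, 3), (4, 5), (1, 1), (0, 0), (0, 0)]
print([mul.clock(a, b) for a, b in inputs])
```

| Any logic consuming the products has to be scheduled around that
| fixed latency, which is the kind of timing assumption that would
| need revisiting on a device with a different DSP block.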
| pocak wrote:
| In the post about the texture unit, that ROM table for mip
| level address offsets seems to use quite a bit of space. Have
| you considered making the mip base addresses a part of the
| texture spec instead?
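|
| For anyone unfamiliar with what such a table holds, here is a
| minimal sketch of how per-level offsets fall out of the texture
| size. It assumes square power-of-two textures with levels packed
| back to back and offsets counted in texels; the actual FuryGpu
| layout may differ, and the function name is hypothetical.

```python
def mip_offsets(base_size):
    """Texel offset of every mip level, largest level first.

    Assumes a square, power-of-two texture whose levels are stored
    contiguously, so level k starts where levels 0..k-1 end.
    """
    if base_size <= 0 or base_size & (base_size - 1):
        raise ValueError("base_size must be a positive power of two")
    offsets = []
    offset = 0
    size = base_size
    while size >= 1:
        offsets.append(offset)
        offset += size * size  # texels occupied by this level
        size //= 2
    return offsets


print(mip_offsets(8))  # levels of 8x8, 4x4, 2x2, 1x1
```

| A ROM needs one such list per supported base size, whereas storing
| base addresses in the texture spec moves that cost into per-texture
| state instead.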
| raphlinus wrote:
| Very cool project, and I love to see more work in this space.
|
| Something else to look at is the Vortex project from Georgia
| Tech[1]. Rather than recapitulating the fixed-function past of
| GPU design, I think it looks toward the future, as it's at heart
| a highly parallel computer, based on RISC-V with some extensions
| to handle GPU workloads better. The boards it runs on are a few
| thousand dollars, so it's not exactly hobbyist friendly, but it
| certainly is more accessible than closed, proprietary
| development. There's a 2.0 release that just landed a few months
| ago.
|
| [1]: https://vortex.cc.gatech.edu/
___________________________________________________________________
(page generated 2024-03-27 23:00 UTC)