[HN Gopher] Swan - A Lightweight Language Model Execution Environment Using FPGA
___________________________________________________________________
Swan - A Lightweight Language Model Execution Environment Using
FPGA
Author : daremocooon
Score : 35 points
Date : 2024-04-24 12:15 UTC (1 day ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| zachbee wrote:
| I'm curious what the motivation is here -- unfortunately, the dev
| blog is all in Chinese and I can't read it. If it's mostly to
| show a proof-of-concept of LLMs on an FPGA, that's awesome!
|
| But if this is targeting real-world applications, I'd have
| concerns about price-to-performance. High-level synthesis tools
| often result in fairly poor performance compared to writing
| Verilog or SystemVerilog. Also, AI-focused SoCs like the Nvidia
| Jetson usually offer better price-to-performance and performance-
| per-watt than FPGA systems like the KV260.
|
| Focusing on specialized transformer architectures with high
| sparsity or aggressive quantization could give FPGAs an
| advantage over AI chips, though.
|
| Not to toot my own horn, but I recently wrote up a piece on
| open-source FPGA development that goes a bit deeper into some
| of these insights, and into why AI might not be the best use
| case for open-source FPGA applications:
| https://www.zach.be/p/how-to-build-a-commercial-open-source
| imtringued wrote:
| AMD hasn't shipped their "high compute" SoMs yet, so there is
| little point in building inference around them. Using
| programmable logic for machine learning is a complete waste
| anyway, since Xilinx never shied away from sprinkling lots of
| "AI Engines" onto their bigger FPGAs, to the point where buying
| the FPGA just for the AI Engines might be worth it: hundreds of
| VLIW cores pack a serious punch for running numerical
| simulations.
| UncleOxidant wrote:
| There have been some recent investigations into bitnets (1- or
| 2-bit weights for NNs, including LLMs) showing that 1.58-bit
| weights (ternary values: -1, 0, 1) can achieve very good
| results. Effectively that's 2 bits of storage per weight. The
| problem is that doing 2-bit math on a CPU or GPU isn't very
| efficient (lots of shifting & masking), but doing 2-bit math on
| an FPGA is easy and space-efficient. Another bonus is that the
| matrix multiplications are replaced by additions, since
| multiplying by -1, 0, or 1 is just a negation, a skip, or a
| pass-through. Right now, if you want to investigate these
| smaller weight sizes, FPGAs are probably the best option.
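|
| To illustrate (a minimal NumPy sketch of the idea, not code
| from Swan): with ternary weights, each output element is just
| the sum of activations whose weight is +1 minus the sum of
| those whose weight is -1:
|
|     import numpy as np
|
|     # Ternary (1.58-bit) mat-vec: weights in {-1, 0, +1}, so
|     # the product reduces to adds/subtracts -- no multipliers.
|     def ternary_matvec(W, x):
|         pos = (W == 1)    # weights that add the activation
|         neg = (W == -1)   # weights that subtract it
|         return (pos * x).sum(axis=1) - (neg * x).sum(axis=1)
|
|     rng = np.random.default_rng(0)
|     W = rng.integers(-1, 2, size=(4, 8)).astype(np.int8)
|     x = rng.standard_normal(8).astype(np.float32)
|     assert np.allclose(ternary_matvec(W, x), W @ x)
|
| On an FPGA those adds/subtracts map onto plain adder trees,
| which is exactly what LUT fabric is good at.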
|
| > High-level synthesis tools often result in fairly poor
| performance compared to writing Verilog or SystemVerilog.
|
| Agreed.
___________________________________________________________________
(page generated 2024-04-25 23:00 UTC)