[HN Gopher] Swan - A Lightweight Language Model Execution Enviro...
       ___________________________________________________________________
        
       Swan - A Lightweight Language Model Execution Environment Using
       FPGA
        
       Author : daremocooon
       Score  : 35 points
       Date   : 2024-04-24 12:15 UTC (1 day ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | zachbee wrote:
        | I'm curious what the motivation is here -- unfortunately, the
        | dev blog is all in Chinese and I can't read it. If it's mostly
        | to show a proof-of-concept of LLMs on an FPGA, that's awesome!
       | 
       | But if this is targeting real-world applications, I'd have
       | concerns about price-to-performance. High-level synthesis tools
       | often result in fairly poor performance compared to writing
       | Verilog or SystemVerilog. Also, AI-focused SoCs like the Nvidia
       | Jetson usually offer better price-to-performance and performance-
       | per-watt than FPGA systems like the KV260.
       | 
        | Focusing on specialized transformer architectures with high
        | sparsity or aggressive quantization could potentially give
        | FPGAs an advantage over AI chips, though.
       | 
       | Not to toot my own horn, but I wrote up a piece on open-source
       | FPGA development recently going a bit deeper into some of these
       | insights, and why AI might not be the best use-case for open-
       | source FPGA applications: https://www.zach.be/p/how-to-build-a-
       | commercial-open-source
        
         | imtringued wrote:
          | AMD hasn't shipped their "high compute" SoMs, so there is
          | little point in building inference around them. Using
          | programmable logic for machine learning is a complete waste,
          | since Xilinx never shied away from sprinkling lots of "AI
          | Engines" onto their bigger FPGAs -- to the point where
          | buying the FPGA just for the AI Engines might be worth it,
          | because hundreds of VLIW cores pack a serious punch for
          | running numerical simulations.
        
         | UncleOxidant wrote:
          | There have been some recent investigations into bitnets (1-
          | or 2-bit weights for NNs, including LLMs) showing that
          | ternary weights (values -1, 0, 1 -- about log2(3) ~ 1.58
          | bits of information per weight) can achieve very good
          | results; in practice each weight is stored in 2 bits. The
          | problem is that doing 2-bit math on a CPU or GPU isn't very
          | efficient (lots of shifting & masking), but doing 2-bit
          | math on an FPGA is really easy and space-efficient. Another
          | bonus is that the multiplications inside the matrix
          | products reduce to additions and subtractions. Right now,
          | if you want to investigate these smaller weight sizes,
          | FPGAs are probably the best option.
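          | 
          | To make that concrete, here's a rough Python sketch of the
          | packing idea (the 2-bit encoding 00=0, 01=+1, 10=-1 and the
          | function names are just illustrative choices of mine, not
          | anything from the Swan repo): pack ternary weights four to
          | a byte, then compute a dot product using only adds and
          | subtracts.
          | 
          |   import numpy as np
          | 
          |   def pack_ternary(w):
          |       # Map {-1, 0, +1} to 2-bit codes and pack 4 per byte.
          |       codes = np.where(w == 1, 0b01,
          |               np.where(w == -1, 0b10, 0)).astype(np.uint8)
          |       codes = np.pad(codes, (0, (-len(codes)) % 4))
          |       b = codes.reshape(-1, 4)
          |       return (b[:, 0] | (b[:, 1] << 2) |
          |               (b[:, 2] << 4) | (b[:, 3] << 6)).astype(np.uint8)
          | 
          |   def ternary_dot(packed, x):
          |       # Multiply-free dot product: each 2-bit weight only
          |       # selects +x, -x, or nothing.
          |       acc = 0.0
          |       for i, xi in enumerate(x):
          |           code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
          |           if code == 0b01:
          |               acc += xi   # weight +1
          |           elif code == 0b10:
          |               acc -= xi   # weight -1
          |       return acc
          | 
          |   w = np.array([1, -1, 0, 1, -1])
          |   x = np.array([0.5, 2.0, 3.0, -1.0, 0.25])
          |   assert ternary_dot(pack_ternary(w), x) == w @ x  # -2.75
          | 
          | On an FPGA that inner loop becomes a wide array of tiny
          | add/subtract units fed by 2-bit weight streams, which is
          | why it maps so much better there than onto CPU/GPU integer
          | units.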
         | 
         | > High-level synthesis tools often result in fairly poor
         | performance compared to writing Verilog or SystemVerilog.
         | 
         | Agreed.
        
       ___________________________________________________________________
       (page generated 2024-04-25 23:00 UTC)