https://github.com/trevorpogue/algebraic-nnhw

trevorpogue / algebraic-nnhw
AI acceleration using matrix multiplication with half the
multiplications

ieeexplore.ieee.org/document/10323219

This repository contains the source code for ML hardware
architectures that require nearly half as many multiplier units to
achieve the same performance. They do this by executing alternative
inner-product algorithms that trade nearly half the multiplications
for cheap low-bitwidth additions, while still producing output
identical to that of the conventional inner product. This raises the
theoretical throughput and compute-efficiency limits of ML
accelerators. See the following journal publication for full
details:

T. E. Pogue and N. Nicolici, "Fast Inner-Product Algorithms and
Architectures for Deep Neural Network Accelerators," in IEEE
Transactions on Computers, vol. 73, no. 2, pp. 495-509, Feb. 2024,
doi: 10.1109/TC.2023.3334140.

Article URL: https://ieeexplore.ieee.org/document/10323219

Open-access version: https://arxiv.org/abs/2311.12224
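As a rough illustration of the trade described above, the following
is a minimal Python sketch of Winograd's 1968 fast inner-product (FIP)
identity as it is usually stated in the literature. It is not code
from this repository, and the function names are hypothetical. For
even-length vectors a and b, each pair of element positions costs one
multiplication of pre-added operands, and two correction terms, alpha
(depending only on a) and beta (depending only on b), cancel the
unwanted cross products:

```python
def baseline_inner_product(a, b):
    """Conventional inner product: one multiplication per element pair."""
    return sum(x * y for x, y in zip(a, b))


def fip_inner_product(a, b):
    """Winograd's fast inner product for equal, even-length vectors.

    The main term uses len(a) // 2 multiplications, since
    (a0 + b1) * (a1 + b0) = a0*b0 + a1*b1 + a0*a1 + b0*b1.
    """
    if len(a) != len(b) or len(a) % 2:
        raise ValueError("vectors must have equal, even length")
    pairs = range(0, len(a), 2)
    # Main term: one multiplication per pair of element positions.
    main = sum((a[k] + b[k + 1]) * (a[k + 1] + b[k]) for k in pairs)
    # Correction terms: alpha depends only on a, beta only on b.
    alpha = sum(a[k] * a[k + 1] for k in pairs)
    beta = sum(b[k] * b[k + 1] for k in pairs)
    return main - alpha - beta


print(fip_inner_product([1, 2, 3, 4], [5, 6, 7, 8]))       # 70
print(baseline_inner_product([1, 2, 3, 4], [5, 6, 7, 8]))  # 70
```

For a single inner product this gains nothing, because alpha and beta
cost multiplications too; the saving appears when they are reused
across many inner products, as in matrix multiplication.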

Abstract: We introduce a new algorithm called the Free-pipeline Fast
Inner Product (FFIP) and its hardware architecture that improve an
under-explored fast inner-product algorithm (FIP) proposed by
Winograd in 1968. Unlike the unrelated Winograd minimal filtering
algorithms for convolutional layers, FIP is applicable to all machine
learning (ML) model layers that can mainly decompose to matrix
multiplication, including fully-connected, convolutional, recurrent,
and attention/transformer layers. We implement FIP for the first time
in an ML accelerator, then present our FFIP algorithm and generalized
architecture which inherently improve FIP's clock frequency and, as a
consequence, throughput for a similar hardware cost. Finally, we
contribute ML-specific optimizations for the FIP and FFIP algorithms
and architectures. We show that FFIP can be seamlessly incorporated
into traditional fixed-point systolic array ML accelerators to
achieve the same throughput with half the number of
multiply-accumulate (MAC) units, or it can double the maximum
systolic array size that can fit onto devices with a fixed hardware
budget. Our FFIP implementation for non-sparse ML models with 8 to
16-bit fixed-point inputs achieves higher throughput and compute
efficiency than the best-in-class prior solutions on the same type of
compute platform.
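To make the matrix-multiplication point concrete, here is a
hypothetical pure-Python sketch (not the repository's compiler or RTL)
that applies the FIP identity to C = A B and counts scalar
multiplications. The row terms alpha are computed once per row of A
and the column terms beta once per column of B, so an (n x m) by
(m x p) product needs n*p*m/2 + n*m/2 + p*m/2 multiplications instead
of n*p*m, which approaches half for large n and p:

```python
def fip_matmul(A, B):
    """Compute C = A @ B (A: n x m, B: m x p, m even) with Winograd's FIP.

    Returns (C, mults), where mults counts scalar multiplications.
    """
    n, m, p = len(A), len(B), len(B[0])
    if m % 2 or any(len(row) != m for row in A):
        raise ValueError("inner dimensions must match and be even")
    mults = 0
    # alpha[i]: computed once per row of A, reused for all p columns.
    alpha = [sum(row[k] * row[k + 1] for k in range(0, m, 2)) for row in A]
    mults += n * (m // 2)
    # beta[j]: computed once per column of B, reused for all n rows.
    beta = [sum(B[k][j] * B[k + 1][j] for k in range(0, m, 2))
            for j in range(p)]
    mults += p * (m // 2)
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            # One multiplication per pair of k-indices.
            main = sum((A[i][k] + B[k + 1][j]) * (A[i][k + 1] + B[k][j])
                       for k in range(0, m, 2))
            mults += m // 2
            C[i][j] = main - alpha[i] - beta[j]
    return C, mults


A = [[i * 4 + j for j in range(4)] for i in range(4)]
I = [[int(i == j) for j in range(4)] for i in range(4)]
C, mults = fip_matmul(A, I)
print(C == A, mults)  # True 48 (the conventional method needs 64)
```

If B holds fixed layer weights during inference, the beta terms depend
only on those weights and could in principle be computed ahead of
time; see the paper for the ML-specific optimizations actually used.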

The following diagram shows an overview of the ML accelerator system
implemented in this source code:

          [Figure: overview of the ML accelerator system]

The FIP and FFIP systolic-array/MXU processing elements (PEs) shown
below in (b) and (c) implement the FIP and FFIP inner-product
algorithms. Each one individually provides the same effective
computational power as the two baseline PEs shown in (a) combined,
which implement the baseline inner product used in previous
systolic-array ML accelerators:

          [Figure: (a) two baseline PEs, (b) FIP PE, (c) FFIP PE]
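The equivalence between one FIP PE and two baseline PEs can be
modelled behaviourally. The sketch below is a hypothetical Python
model (class and function names are illustrative, and it ignores
pipelining, bitwidths, and wherever the real design handles the
correction terms). Each cycle, two baseline PEs perform two
multiplications, while the FIP PE consumes the same two element pairs
using two pre-additions and a single multiplication:

```python
class BaselinePE:
    """Hypothetical model of a baseline PE: one multiplier, one MAC per cycle."""
    multipliers = 1

    def __init__(self):
        self.acc = 0

    def step(self, a, b):
        self.acc += a * b  # one multiplication per cycle


class FIPPE:
    """Hypothetical model of a FIP PE: one multiplier fed by two pre-adders."""
    multipliers = 1

    def __init__(self):
        self.acc = 0

    def step(self, a_pair, b_pair):
        (a0, a1), (b0, b1) = a_pair, b_pair
        self.acc += (a0 + b1) * (a1 + b0)  # still one multiplication per cycle


def run(a, b):
    """Feed two element pairs per cycle to two baseline PEs and to one FIP PE."""
    pe_even, pe_odd, fip = BaselinePE(), BaselinePE(), FIPPE()
    cycles = len(a) // 2
    for c in range(cycles):
        k = 2 * c
        pe_even.step(a[k], b[k])
        pe_odd.step(a[k + 1], b[k + 1])
        fip.step((a[k], a[k + 1]), (b[k], b[k + 1]))
    # The FIP accumulator also holds a-only and b-only cross terms.
    alpha = sum(a[k] * a[k + 1] for k in range(0, len(a), 2))
    beta = sum(b[k] * b[k + 1] for k in range(0, len(b), 2))
    return pe_even.acc + pe_odd.acc, fip.acc - alpha - beta, cycles


mults_baseline = 2 * BaselinePE.multipliers
mults_fip = FIPPE.multipliers
print(run([1, 2, 3, 4], [5, 6, 7, 8]), mults_baseline, mults_fip)  # (70, 70, 2) 2 1
```

Both configurations finish in the same number of cycles, but the FIP
configuration uses one multiplier instead of two, paying instead with
extra additions and the separate alpha/beta correction.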

The following diagram of the MXU/systolic array shows how the PEs are
connected:

          [Figure: MXU/systolic array showing how the PEs are connected]
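The array's exact dataflow is not spelled out in the text, so the
following is only a generic sketch: a cycle-level Python model of an
output-stationary n x p systolic array (an assumed dataflow that may
differ from the repository's MXU). Rows of A enter from the left and
columns of B from the top, each skewed by one cycle per row/column.
With fip=True, each PE receives a pair of k-indices per cycle and
performs one FIP multiplication, so the accumulation depth halves:

```python
def systolic_matmul(A, B, fip=False):
    """Cycle-level toy model of an output-stationary n x p systolic array.

    PE (i, j) accumulates C[i][j]. With fip=True each PE consumes a pair of
    k-indices per cycle and does one multiplication of pre-added operands.
    Returns (C, cycles).
    """
    n, m, p = len(A), len(B), len(B[0])
    step = 2 if fip else 1
    if m % step:
        raise ValueError("inner dimension must be even when fip=True")
    depth = m // step          # operand groups each PE accumulates
    zero = (0,) * step         # bubble fed outside the valid window
    acc = [[0] * p for _ in range(n)]
    a_reg = [[zero] * p for _ in range(n)]  # operands latched last cycle
    b_reg = [[zero] * p for _ in range(n)]
    cycles = depth + n + p - 2
    for t in range(cycles):
        new_a = [[zero] * p for _ in range(n)]
        new_b = [[zero] * p for _ in range(n)]
        for i in range(n):
            for j in range(p):
                if j == 0:  # left edge: row i of A, skewed by i cycles
                    k = t - i
                    a_in = (tuple(A[i][step * k + s] for s in range(step))
                            if 0 <= k < depth else zero)
                else:       # otherwise take the left neighbour's operand
                    a_in = a_reg[i][j - 1]
                if i == 0:  # top edge: column j of B, skewed by j cycles
                    k = t - j
                    b_in = (tuple(B[step * k + s][j] for s in range(step))
                            if 0 <= k < depth else zero)
                else:       # otherwise take the upper neighbour's operand
                    b_in = b_reg[i - 1][j]
                if fip:     # one multiplier, two pre-adders
                    acc[i][j] += (a_in[0] + b_in[1]) * (a_in[1] + b_in[0])
                else:       # one conventional MAC
                    acc[i][j] += a_in[0] * b_in[0]
                new_a[i][j], new_b[i][j] = a_in, b_in
        a_reg, b_reg = new_a, new_b
    if fip:
        # Remove the row-only (alpha) and column-only (beta) cross terms.
        beta = [sum(B[k][j] * B[k + 1][j] for k in range(0, m, 2))
                for j in range(p)]
        for i in range(n):
            alpha = sum(A[i][k] * A[i][k + 1] for k in range(0, m, 2))
            for j in range(p):
                acc[i][j] -= alpha + beta[j]
    return acc, cycles


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))            # ([[19, 22], [43, 50]], 4)
print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], fip=True))  # ([[19, 22], [43, 50]], 3)
```

In this toy model, with the same number of single-multiplier PEs, the
FIP array needs m/2 + n + p - 2 cycles instead of m + n + p - 2; the
correction terms are subtracted at the end purely for illustration.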

The source code organization is as follows:

  * compiler
      + A compiler that translates Python model descriptions into
        accelerator instructions for running the model on the
        accelerator. This part also includes code for interfacing with
        a PCIe driver to initiate model execution on the accelerator,
        read back results and performance counters, and test the
        correctness of the results.
  * rtl
      + Synthesizable SystemVerilog RTL.
  * sim
      + Scripts for setting up simulation environments for testing.
  * tests
      + UVM-based testbench source code for verifying the accelerator
        in simulation using Cocotb.
  * utils
      + General development utilities and aids (additional Python
        packages and scripts) that the author created for this
        project.

The files rtl/top/define.svh and rtl/top/pkg.sv contain a number of
configurable parameters, including:

  * FIP_METHOD (in define.svh), which selects the systolic array type
    (baseline, FIP, or FFIP)
  * SZI and SZJ, which define the systolic array height and width
  * LAYERIO_WIDTH and WEIGHT_WIDTH, which define the input bitwidths
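For back-of-the-envelope sizing, the hypothetical Python dataclass
below mirrors a few of these parameters. The field names, the string
encoding of FIP_METHOD, and both helper methods are illustrative
assumptions, not the repository's API. It assumes that SZI x SZJ
counts PEs, that each FIP/FFIP PE matches two baseline PEs (as stated
for the PE diagrams), and that a plain pre-adder widens the multiplier
operands by one bit; the actual RTL may size operands differently,
especially for FFIP:

```python
from dataclasses import dataclass
from typing import Tuple

METHODS = ("baseline", "fip", "ffip")


@dataclass(frozen=True)
class MxuConfig:
    """Hypothetical Python mirror of a few RTL parameters (not repo code)."""
    fip_method: str     # FIP_METHOD: "baseline", "fip" or "ffip" (encoding assumed)
    szi: int            # SZI: systolic array height
    szj: int            # SZJ: systolic array width
    layerio_width: int  # LAYERIO_WIDTH: layer input bitwidth
    weight_width: int   # WEIGHT_WIDTH: weight bitwidth

    def __post_init__(self):
        if self.fip_method not in METHODS:
            raise ValueError(f"fip_method must be one of {METHODS}")

    def multiplier_operand_widths(self) -> Tuple[int, int]:
        """Bitwidths of the two operands fed to each PE multiplier."""
        if self.fip_method == "baseline":
            return (self.layerio_width, self.weight_width)
        # Assumed plain pre-adder: a + b on two signed (or two unsigned)
        # values needs one bit more than the wider input.
        w = max(self.layerio_width, self.weight_width) + 1
        return (w, w)

    def effective_macs_per_cycle(self) -> int:
        """Assumes SZI x SZJ counts PEs and each FIP/FFIP PE matches
        two baseline PEs."""
        pes = self.szi * self.szj
        return pes if self.fip_method == "baseline" else 2 * pes


cfg = MxuConfig("ffip", szi=64, szj=64, layerio_width=8, weight_width=8)
print(cfg.multiplier_operand_widths())  # (9, 9)
print(cfg.effective_macs_per_cycle())   # 8192
```

The values 64 and 8 are arbitrary examples, not the repository's
defaults.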

The directory rtl/arith includes mxu.sv and mac_array.sv, which
contain the RTL for the baseline, FIP, and FFIP systolic array
architectures (selected by the value of the parameter FIP_METHOD).
