https://github.com/facebookincubator/AITemplate
# AITemplate

AITemplate (AIT) is a Python framework that transforms deep neural networks into CUDA (NVIDIA GPU) / HIP (AMD GPU) C++ code for lightning-fast inference serving. AITemplate highlights include:

* **High performance**: close to roofline fp16 TensorCore (NVIDIA GPU) / MatrixCore (AMD GPU) performance on major models, including ResNet, MaskRCNN, BERT, VisionTransformer, Stable Diffusion, etc.
* **Unified, open, and flexible**: seamless fp16 deep neural network models for NVIDIA and AMD GPUs; fully open-source, Lego-style, easily extendable high-performance primitives for new model support. AITemplate supports a significantly more comprehensive range of fusions than existing solutions on both GPU platforms.

## More about AITemplate

### Excellent Backward Capability

AITemplate doesn't depend on third-party libraries or runtimes such as cuBLAS, cuDNN, rocBLAS, MIOpen, TensorRT, or MIGraphX. Each model is compiled into a self-contained portable binary, which can be used in any software environment with the same hardware.

### Horizontal Fusion

AITemplate provides unique advanced horizontal fusion: it can fuse parallel GEMM, LayerNorm, and other operators with different input shapes into a single GPU kernel.

### Vertical Fusion

AITemplate provides strong vertical fusion: it can fuse a large range of operations into TensorCore/MatrixCore operations, including elementwise operations, reductions, and layout permutations. AITemplate also provides back-to-back-style TensorCore/MatrixCore operation fusion.

### Memory Fusion

AITemplate provides innovative memory fusions: it can fuse GEMM, LayerNorm, and other operators, followed by memory operations such as concatenation, split, and slice, into a single operator.

### Working w/wo PyTorch

The AITemplate-generated Python runtime can take PyTorch tensors as inputs and outputs without an extra copy. For environments without PyTorch, the AITemplate Python/C++ runtime is self-contained.

### Extensions without suffering

AITemplate provides a straightforward approach to writing codegen extensions. To add a new operator or a new fused kernel to AITemplate, most of the time one only needs to add two Python files: one for the graph node definition and another for the backend codegen. A CUDA/HIP kernel in a text header file can be used directly in the codegen.

## FX2AIT

FX2AIT is a Python-based tool that converts PyTorch models into the AITemplate (AIT) engine for lightning-fast inference serving. Using FX2AIT's built-in AITLowerer, partial AIT acceleration can be achieved for models that contain operators unsupported by AITemplate. Key features of FX2AIT include:

* **Easy conversion**: FX2AIT requires only a PyTorch model and inputs for conversion, generating an "AITModule" output for inference serving.
* **Expanded support**: AITemplate does not support all PyTorch operators. FX2AIT's AITLowerer offers partial AIT conversion for models with unsupported operators. Check fx2ait/fx2ait/example/03_lowering_split for more information.

More information can be found at https://github.com/facebookincubator/AITemplate/tree/main/fx2ait.

## Installation

Hardware requirements:

* **NVIDIA**: AIT is only tested on SM80+ GPUs (Ampere, etc.). Not all kernels work with older SM75/SM70 (T4/V100) GPUs.
* **AMD**: AIT is only tested on CDNA2 (MI-210/250) GPUs. There may be compiler issues on older CDNA1 (MI-100) GPUs.

### Clone the code

When cloning the code, please use the following command to also clone the submodules:

```shell
git clone --recursive https://github.com/facebookincubator/AITemplate
```

### Docker Image

We highly recommend using AITemplate with Docker to avoid accidentally using the wrong version of NVCC or HIPCC.

* CUDA: `./docker/build.sh cuda`
* ROCm: `DOCKER_BUILDKIT=1 ./docker/build.sh rocm`

This will build a Docker image with the tag `ait:latest`.

### From Source

The following commands create a Python wheel for AITemplate. Please ensure you have the correct CUDA/ROCm compiler installed:

* CUDA: CUDA 11.6
* ROCm: tested on ROCm 5.2.3 with a customized HIPCC build, using the commands in docker/Dockerfile.rocm#L87-L96

An incorrect compiler will lead to performance regressions. Please check that all submodules are cloned correctly before proceeding to the next step.

```shell
cd python
python setup.py bdist_wheel
pip install dist/*.whl --force-reinstall
```

## Getting Started

Check out the AITemplate Documentation for the API reference. There are a few tutorials for onboarding:

* 01: How to run inference on a PyTorch model with AIT
* 02: How to add an op to the AIT codegen
* 03: How to visualize AIT's optimizations

## Examples & Performance

AITemplate provides the following model templates and reference performance data on A100/MI-250:

* 01_ResNet-50 with PyTorch Image Models (TIMM)
* 02_MaskRCNN-FPN with Detectron2
* 03_BERT with Hugging Face Transformers
* 04_Vision Transformer with PyTorch Image Models (TIMM)
* 05_Stable Diffusion with Hugging Face Diffusers

## Release

All current development updates can be seen in the AITemplate repository. Releases are not on a set schedule and will only be tagged for significant feature releases.

Mid-term plan:

* Better dynamic shape support: focus on dynamic sequence lengths in Transformers; add symbolic shape support.
* More automatic graph passes: relieve the need to manually rewrite models to obtain the best performance.
* Quantization: fp8/int8/int4.
* Sparsity pruning for GEMM.
* PT2 integration: Aten2AIT is under active development.

Long-term plan:

* Automatic ONNX, OpenXLA, and other model format conversion.
* Composable Kernel CPU extension on AVX2/AVX-512 for AMD Epyc CPUs.

## Contributing

Check our contributing guide to learn how to contribute to the project.

## The Team

AITemplate is currently maintained by Meta engineers: Ying Zhang, Yang Chen, Terry Chen, Mu-Chu Lee, Max Podkorytov, and Adnan Akhundov.

AITemplate was co-created by Meta engineers Bing Xu, Ying Zhang, Hao Lu, Yang Chen, and Terry Chen, with major contributions from many more talented engineers. A non-exhaustive list: Mike Iovine, Mu-Chu Lee, Scott Wolchok, Oleg Khabinov, Shirong Wu, Huaming Li, Hui Guo, Zhijing Li, and Max Podkorytov. We also want to thank Andrew Tulloch, Yinghai Lu, and Lu Fang for the valuable discussions.

FX2AIT and Aten2AIT are co-created and maintained by Meta engineers Wei Wei, Shirong Wu, and Zhijing Li.

## Acknowledgements

The AITemplate team works closely with the NVIDIA CUTLASS team (led by Andrew Kerr and Haicheng Wu) and the AMD Composable Kernel team (led by Chao Liu and Jing Zhang). We co-designed many advanced GPU optimizations specialized for each platform; none of this would have been possible without our close collaboration.

## License

AITemplate is licensed under the Apache 2.0 License.
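As an aside, the memory-fusion idea described above can be illustrated with a small plain-Python sketch. This is a conceptual illustration only, not AITemplate's API or generated code: a GEMM whose result feeds a concatenation can write directly into the correct slice of a preallocated output buffer, eliminating the intermediate tensor and the copy that an explicit concat would perform.

```python
# Conceptual sketch of GEMM + concat "memory fusion" (plain Python,
# not AITemplate code): the fused GEMM writes its result straight into
# a slice of the final buffer, skipping the intermediate tensor.

def matmul(a, b):
    """Naive row-major matmul, for illustration only."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def gemm_into(a, b, out, col_offset):
    """'Fused' GEMM: write results in place into a column slice of `out`."""
    n, k, m = len(a), len(b), len(b[0])
    for i in range(n):
        for j in range(m):
            out[i][col_offset + j] = sum(a[i][p] * b[p][j] for p in range(k))

A  = [[1.0, 2.0], [3.0, 4.0]]
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[2.0], [1.0]]

# Unfused: two GEMMs, then an explicit concat (an extra copy of both results).
naive = [r1 + r2 for r1, r2 in zip(matmul(A, W1), matmul(A, W2))]

# Fused: preallocate the concatenated buffer; each GEMM writes in place.
fused = [[0.0] * 3 for _ in range(2)]
gemm_into(A, W1, fused, 0)
gemm_into(A, W2, fused, 2)

assert fused == naive  # same result, no intermediate tensors or concat copy
```

In AITemplate the same reasoning is applied at codegen time on real GPU kernels, so the concat/split/slice node disappears from the compiled graph entirely.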