https://github.com/facebookincubator/AITemplate
# AITemplate

AITemplate (AIT) is a Python framework that transforms deep neural networks into CUDA (NVIDIA GPU) / HIP (AMD GPU) C++ code for lightning-fast inference serving. AITemplate highlights include:

* **High performance**: close to roofline fp16 TensorCore (NVIDIA GPU) / MatrixCore (AMD GPU) performance on major models, including ResNet, MaskRCNN, BERT, VisionTransformer, Stable Diffusion, etc.
* **Unified, open, and flexible**: seamless fp16 deep neural network models for NVIDIA and AMD GPUs; fully open-source, Lego-style, easily extendable high-performance primitives for new model support. AITemplate supports a significantly more comprehensive range of fusions than existing solutions on both GPU platforms.

## More about AITemplate

### Excellent Backward Capability

AITemplate doesn't depend on third-party libraries or runtimes such as cuBLAS, cuDNN, rocBLAS, MIOpen, TensorRT, or MIGraphX. Each model is compiled into a self-contained portable binary, which can be used in any software environment with the same hardware.

### Horizontal Fusion

AITemplate provides unique advanced horizontal fusion: it can fuse parallel GEMM, LayerNorm, and other operators with different input shapes into a single GPU kernel.

### Vertical Fusion

AITemplate provides strong vertical fusion: it can fuse a large range of operations into TensorCore/MatrixCore operations, including elementwise operations, reductions, and layout permutations. AITemplate also provides back-to-back-style TensorCore/MatrixCore operation fusion.

### Memory Fusion

AITemplate provides innovative memory fusions: it can fuse GEMM, LayerNorm, and other operators, followed by memory operations such as concatenation, split, and slice, into a single operator.

### Working w/wo PyTorch

The AITemplate-generated Python runtime can take PyTorch tensors as inputs and outputs without an extra copy. For environments without PyTorch, the AITemplate Python/C++ runtime is self-contained.

### Extensions without suffering

AITemplate provides a straightforward approach to writing codegen extensions. To add a new operator or a new fused kernel to AITemplate, most of the time one only needs to add two Python files: one for the graph node definition and another for the backend codegen. A CUDA/HIP kernel in a text header file can be used directly in the codegen.

## FX2AIT

FX2AIT is a Python-based tool that converts PyTorch models into the AITemplate (AIT) engine for lightning-fast inference serving. Using FX2AIT's built-in AITLowerer, partial AIT acceleration can be achieved for models that contain operators unsupported by AITemplate. Key features of FX2AIT include:

* **Easy conversion**: FX2AIT requires only a PyTorch model and inputs for conversion, generating an "AITModule" output for inference serving.
* **Expanded support**: AITemplate does not support all PyTorch operators. FX2AIT's AITLowerer offers partial AIT conversion for models with unsupported operators. Check fx2ait/fx2ait/example/03_lowering_split for more information.

More information can be found at https://github.com/facebookincubator/AITemplate/tree/main/fx2ait.

## Installation

Hardware requirements:

* **NVIDIA**: AIT is only tested on SM80+ GPUs (Ampere, etc.). Not all kernels work with older SM75/SM70 (T4/V100) GPUs.
* **AMD**: AIT is only tested on CDNA2 (MI-210/250) GPUs. There may be compiler issues on older CDNA1 (MI-100) GPUs.

### Clone the code

When cloning the code, please use the following command to also clone the submodules:

```shell
git clone --recursive https://github.com/facebookincubator/AITemplate
```

### Docker Image

We highly recommend using AITemplate with Docker to avoid accidentally using the wrong version of NVCC or HIPCC.

* CUDA: `./docker/build.sh cuda`
* ROCm: `DOCKER_BUILDKIT=1 ./docker/build.sh rocm`

This will build a Docker image with the tag `ait:latest`.

### From Source

The following commands create a Python wheel for AITemplate. Please ensure you have the correct CUDA/ROCm compiler installed:

* CUDA: CUDA 11.6
* ROCm: tested on ROCm 5.2.3 with a customized HIPCC build, using the commands in docker/Dockerfile.rocm#L87-L96

An incorrect compiler will lead to performance regressions. Please check that all submodules are cloned correctly before proceeding to the next step.

```shell
cd python
python setup.py bdist_wheel
pip install dist/*.whl --force-reinstall
```

## Getting Started

Check out the AITemplate Documentation for the API reference. There are a few tutorials for onboarding:

* 01: How to run inference on a PyTorch model with AIT
* 02: How to add an op to the AIT codegen
* 03: How to visualize AIT's optimizations

## Examples & Performance

AITemplate provides the following model templates and reference performance data on A100/MI-250:

* 01_ResNet-50 with PyTorch Image Models (TIMM)
* 02_MaskRCNN-FPN with Detectron2
* 03_BERT with Hugging Face Transformers
* 04_Vision Transformer with PyTorch Image Models (TIMM)
* 05_Stable Diffusion with Hugging Face Diffusers

## Release

All current development updates can be seen in the AITemplate repository. Releases are not on a set schedule and will only be tagged for significant feature releases.

Mid-term plan:

* Better dynamic shape support: focus on dynamic sequence lengths in Transformers; add symbolic shape support.
* More automatic graph passes: relieve the need to manually rewrite models to obtain the best performance.
* Quantization: fp8/int8/int4.
* Sparsity pruning for GEMM.
* PT2 integration: Aten2AIT is under active development.

Long-term plan:

* Automatic ONNX, OpenXLA, and other model format conversion.
* Composable Kernel CPU extension on AVX2/AVX-512 for AMD Epyc CPUs.

## Contributing

Check our contributing guide to learn how to contribute to the project.

## The Team

AITemplate is currently maintained by Meta engineers: Ying Zhang, Yang Chen, Terry Chen, Mu-Chu Lee, Max Podkorytov, and Adnan Akhundov.

AITemplate was co-created by Meta engineers Bing Xu, Ying Zhang, Hao Lu, Yang Chen, and Terry Chen, with major contributions from many more talented engineers. A non-exhaustive list: Mike Iovine, Mu-Chu Lee, Scott Wolchok, Oleg Khabinov, Shirong Wu, Huaming Li, Hui Guo, Zhijing Li, and Max Podkorytov. We also want to thank Andrew Tulloch, Yinghai Lu, and Lu Fang for the valuable discussions.

FX2AIT and Aten2AIT are co-created and maintained by Meta engineers Wei Wei, Shirong Wu, and Zhijing Li.

## Acknowledgements

The AITemplate team works closely with the NVIDIA CUTLASS team (led by Andrew Kerr and Haicheng Wu) and the AMD Composable Kernel team (led by Chao Liu and Jing Zhang). We co-designed many advanced GPU optimizations specialized for each platform; none of this would have been possible without our close collaboration.

## License

AITemplate is licensed under the Apache 2.0 License.
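As an aside, the memory-fusion idea described above can be illustrated with a small plain-Python sketch. This is a conceptual illustration only, not AITemplate's API or generated code: a GEMM whose result feeds a concatenation can write directly into the correct slice of a preallocated output buffer, eliminating the intermediate tensor and the copy that an explicit concat would perform.

```python
# Conceptual sketch of GEMM + concat "memory fusion" (plain Python,
# not AITemplate code): the fused GEMM writes its result straight into
# a slice of the final buffer, skipping the intermediate tensor.

def matmul(a, b):
    """Naive row-major matmul, for illustration only."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def gemm_into(a, b, out, col_offset):
    """'Fused' GEMM: write results in place into a column slice of `out`."""
    n, k, m = len(a), len(b), len(b[0])
    for i in range(n):
        for j in range(m):
            out[i][col_offset + j] = sum(a[i][p] * b[p][j] for p in range(k))

A  = [[1.0, 2.0], [3.0, 4.0]]
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[2.0], [1.0]]

# Unfused: two GEMMs, then an explicit concat (an extra copy of both results).
naive = [r1 + r2 for r1, r2 in zip(matmul(A, W1), matmul(A, W2))]

# Fused: preallocate the concatenated buffer; each GEMM writes in place.
fused = [[0.0] * 3 for _ in range(2)]
gemm_into(A, W1, fused, 0)
gemm_into(A, W2, fused, 2)

assert fused == naive  # same result, no intermediate tensors or concat copy
```

In AITemplate the same reasoning is applied at codegen time on real GPU kernels, so the concat/split/slice node disappears from the compiled graph entirely.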