apple/ml-ane-transformers (https://github.com/apple/ml-ane-transformers): Reference implementation of the Transformer architecture optimized for Apple Neural Engine (ANE)
# Apple Neural Engine (ANE) Transformers

Use `ane_transformers` as a reference PyTorch implementation if you are considering deploying your Transformer models on Apple devices with an A14 or newer or an M1 or newer chip, to achieve up to 10x faster inference and up to 14x lower peak memory consumption compared to baseline implementations.

`ane_transformers.reference` comprises a standalone reference implementation, and `ane_transformers.huggingface` comprises optimized versions of Hugging Face model classes such as `distilbert`, demonstrating how the optimization principles laid out in our research article apply to existing third-party implementations.

Please check out our research article for a detailed explanation of the optimizations, as well as interactive figures to explore latency and peak memory consumption data from our case study: deploying the Hugging Face `distilbert` model on various devices and operating system versions. The figures below are non-interactive snapshots from the research article for an iPhone 13 with iOS 16.0 installed:

(Figures: latency and peak memory consumption comparisons, iPhone 13, iOS 16.0)

## Tutorial: Optimized Deployment of Hugging Face distilbert

This tutorial is a step-by-step guide to the model deployment process from the case study in our research article. The same code was used to generate the Hugging Face `distilbert` performance data in the figures above.
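One of the optimization principles discussed in the research article is mapping sequence data into a channels-first `(B, C, 1, S)` layout that the ANE handles efficiently, instead of the usual `(B, S, C)` transformer layout. The snippet below is an illustrative sketch of that layout transform (it is not code from this repository, and uses NumPy purely for demonstration):

```python
import numpy as np

# Standard transformer activation layout: (batch, sequence, channels)
B, S, C = 1, 128, 768
x = np.random.rand(B, S, C).astype(np.float32)

# ANE-friendly layout from the research article: (batch, channels, 1, sequence)
x_ane = x.transpose(0, 2, 1)[:, :, None, :]

print(x_ane.shape)  # (1, 768, 1, 128)
```

The singleton third dimension lets 1D sequence operations be expressed as 2D convolutions, which is how the optimized model classes replace linear layers.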
To begin the optimizations, we initialize the baseline model as follows:

```python
import transformers

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
baseline_model = transformers.AutoModelForSequenceClassification.from_pretrained(
    model_name,
    return_dict=False,
    torchscript=True,
).eval()
```

Then we initialize the mathematically equivalent but optimized model, and restore its parameters from those of the baseline model:

```python
from ane_transformers.huggingface import distilbert as ane_distilbert

optimized_model = ane_distilbert.DistilBertForSequenceClassification(
    baseline_model.config).eval()
optimized_model.load_state_dict(baseline_model.state_dict())
```

Next we create sample inputs for the model:

```python
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
tokenized = tokenizer(
    ["Sample input text to trace the model"],
    return_tensors="pt",
    max_length=128,  # token sequence length
    padding="max_length",
)
```

We then trace the optimized model to obtain the input format (TorchScript) expected by the coremltools conversion tool:

```python
import torch

traced_optimized_model = torch.jit.trace(
    optimized_model,
    (tokenized["input_ids"], tokenized["attention_mask"])
)
```

Finally, we use coremltools to generate the Core ML model package file and save it:

```python
import coremltools as ct
import numpy as np

ane_mlpackage_obj = ct.convert(
    traced_optimized_model,
    convert_to="mlprogram",
    inputs=[
        ct.TensorType(
            f"input_{name}",
            shape=tensor.shape,
            dtype=np.int32,
        ) for name, tensor in tokenized.items()
    ],
    compute_units=ct.ComputeUnit.ALL,
)
out_path = "HuggingFace_ane_transformers_distilbert_seqLen128_batchSize1.mlpackage"
ane_mlpackage_obj.save(out_path)
```

To verify performance, developers can now launch Xcode and simply add this model package file as a resource in their projects.
After clicking on the Performance tab, the developer can generate a performance report on locally available devices, for example, on the Mac that is running Xcode or on another Apple device connected to that Mac. The figure below shows a performance report generated for this model on an iPhone 13 Pro Max with iOS 16.0 installed.

(Figure: Xcode performance report for the optimized model on iPhone 13 Pro Max, iOS 16.0)

Based on the figure above, latency improves by a factor of 2.84 for the sequence length of 128 and batch size of 1 chosen for this tutorial. Higher sequence lengths, such as 512, and batch sizes, such as 8, yield up to 10x lower latency and 14x lower peak memory consumption. Please refer to Figure 2 in our research article for detailed and interactive performance data.

Note that load and compilation times increase because the optimized model contains more operations, but these are one-time costs, and the user experience is not affected if the model is loaded asynchronously.

Note that 4 of the 606 operations in the optimized model are executed on the CPU. These are the embedding-lookup-related operations, which are more efficient to run on the CPU for this particular model configuration.

## A Note on Unit Tests

The unit tests measure, among other things, the ANE speed-up factor. Since the device spec for this reference implementation is M1 or newer chips for the Mac and A14 or newer chips for the iPhone and iPad, the speed-up unit tests print a warning message when executed on devices outside this spec. Even if the model is generated on an out-of-spec Mac, it should work as expected on in-spec devices.

## Installation & Troubleshooting

- Fastest: `pip install ane_transformers`
- Locally editable: `pip install -e .`
- If installation fails with `ERROR: Failed building wheel for tokenizers` or `error: can't find Rust compiler`, please follow this solution.