apple/ml-ane-transformers (https://github.com/apple/ml-ane-transformers): Reference implementation of the Transformer architecture optimized for Apple Neural Engine (ANE)
# Apple Neural Engine (ANE) Transformers

Use `ane_transformers` as a reference PyTorch implementation if you are considering deploying your Transformer models on Apple devices with an A14 or newer or an M1 or newer chip, to achieve up to 10x faster inference and up to 14x lower peak memory consumption compared to baseline implementations.

`ane_transformers.reference` comprises a standalone reference implementation, and `ane_transformers.huggingface` comprises optimized versions of Hugging Face model classes such as `distilbert`, demonstrating how the optimization principles laid out in our research article apply to existing third-party implementations.

Please check out our research article for a detailed explanation of the optimizations, as well as interactive figures to explore latency and peak memory consumption data from our case study: deploying the Hugging Face `distilbert` model on various devices and operating system versions. The figures below are non-interactive snapshots from the research article for an iPhone 13 with iOS 16.0 installed:

(Figures: latency and peak memory consumption comparisons, iPhone 13, iOS 16.0)

## Tutorial: Optimized Deployment of Hugging Face distilbert

This tutorial is a step-by-step guide to the model deployment process from the case study in our research article. The same code was used to generate the Hugging Face `distilbert` performance data in the figures above.
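One of the optimization principles discussed in the research article is mapping sequence data into a channels-first `(B, C, 1, S)` layout that the ANE handles efficiently, instead of the usual `(B, S, C)` transformer layout. The snippet below is an illustrative sketch of that layout transform (it is not code from this repository, and uses NumPy purely for demonstration):

```python
import numpy as np

# Standard transformer activation layout: (batch, sequence, channels)
B, S, C = 1, 128, 768
x = np.random.rand(B, S, C).astype(np.float32)

# ANE-friendly layout from the research article: (batch, channels, 1, sequence)
x_ane = x.transpose(0, 2, 1)[:, :, None, :]

print(x_ane.shape)  # (1, 768, 1, 128)
```

The singleton third dimension lets 1D sequence operations be expressed as 2D convolutions, which is how the optimized model classes replace linear layers.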
To begin the optimizations, we initialize the baseline model as follows:

```python
import transformers

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
baseline_model = transformers.AutoModelForSequenceClassification.from_pretrained(
    model_name,
    return_dict=False,
    torchscript=True,
).eval()
```

Then we initialize the mathematically equivalent but optimized model, and restore its parameters from those of the baseline model:

```python
from ane_transformers.huggingface import distilbert as ane_distilbert

optimized_model = ane_distilbert.DistilBertForSequenceClassification(
    baseline_model.config).eval()
optimized_model.load_state_dict(baseline_model.state_dict())
```

Next we create sample inputs for the model:

```python
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
tokenized = tokenizer(
    ["Sample input text to trace the model"],
    return_tensors="pt",
    max_length=128,  # token sequence length
    padding="max_length",
)
```

We then trace the optimized model to obtain the input format (TorchScript) expected by the coremltools conversion tool:

```python
import torch

traced_optimized_model = torch.jit.trace(
    optimized_model,
    (tokenized["input_ids"], tokenized["attention_mask"])
)
```

Finally, we use coremltools to generate the Core ML model package file and save it:

```python
import coremltools as ct
import numpy as np

ane_mlpackage_obj = ct.convert(
    traced_optimized_model,
    convert_to="mlprogram",
    inputs=[
        ct.TensorType(
            f"input_{name}",
            shape=tensor.shape,
            dtype=np.int32,
        ) for name, tensor in tokenized.items()
    ],
    compute_units=ct.ComputeUnit.ALL,
)
out_path = "HuggingFace_ane_transformers_distilbert_seqLen128_batchSize1.mlpackage"
ane_mlpackage_obj.save(out_path)
```

To verify performance, developers can now launch Xcode and simply add this model package file as a resource in their projects.
After clicking on the Performance tab, the developer can generate a performance report on locally available devices, for example, on the Mac that is running Xcode or on another Apple device connected to that Mac. The figure below shows a performance report generated for this model on an iPhone 13 Pro Max with iOS 16.0 installed.

(Figure: Xcode performance report for the optimized model on iPhone 13 Pro Max, iOS 16.0)

Based on the figure above, latency improves by a factor of 2.84 for the sequence length of 128 and batch size of 1 chosen for this tutorial. Higher sequence lengths, such as 512, and batch sizes, such as 8, yield up to 10x lower latency and 14x lower peak memory consumption. Please refer to Figure 2 in our research article for detailed and interactive performance data.

Note that load and compilation times increase because the optimized model contains more operations, but these are one-time costs, and the user experience is not affected if the model is loaded asynchronously.

Note that 4 of the 606 operations in the optimized model are executed on the CPU. These are the embedding-lookup-related operations, which are more efficient to run on the CPU for this particular model configuration.

## A Note on Unit Tests

The unit tests measure, among other things, the ANE speed-up factor. Since the device spec for this reference implementation is M1 or newer chips for the Mac and A14 or newer chips for the iPhone and iPad, the speed-up unit tests print a warning message when executed on devices outside this spec. Even if the model is generated on an out-of-spec Mac, it should work as expected on in-spec devices.

## Installation & Troubleshooting

- Fastest: `pip install ane_transformers`
- Locally editable: `pip install -e .`
- If installation fails with `ERROR: Failed building wheel for tokenizers` or `error: can't find Rust compiler`, please follow this solution.