# Transformers.js

State-of-the-art Machine Learning for the web. Run Transformers directly in your browser, with no need for a server!

Transformers.js is designed to be functionally equivalent to Hugging Face's `transformers` Python library, meaning you can run the same pretrained models using a very similar API. These models support common tasks in different modalities, such as:

- **Natural Language Processing**: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.
- **Computer Vision**: image classification, object detection, and segmentation.
- **Audio**: automatic speech recognition and audio classification.
- **Multimodal**: zero-shot image classification.
Transformers.js uses ONNX Runtime to run models in the browser. The best part about it is that you can easily convert your pretrained PyTorch, TensorFlow, or JAX models to ONNX using Optimum.

For more information, check out the full documentation at https://huggingface.co/docs/transformers.js.

## Quick tour

It's super simple to translate from existing code! Just like the Python library, we support the `pipeline` API. Pipelines group together a pretrained model with preprocessing of inputs and postprocessing of outputs, making it the easiest way to run models with the library.

Python (original):

```python
from transformers import pipeline

# Allocate a pipeline for sentiment-analysis
pipe = pipeline('sentiment-analysis')

out = pipe('I love transformers!')
# [{'label': 'POSITIVE', 'score': 0.999806941}]
```

JavaScript (ours):

```javascript
import { pipeline } from '@xenova/transformers';

// Allocate a pipeline for sentiment-analysis
let pipe = await pipeline('sentiment-analysis');

let out = await pipe('I love transformers!');
// [{'label': 'POSITIVE', 'score': 0.999817686}]
```

You can also use a different model by specifying the model id or path as the second argument to the `pipeline` function. For example:

```javascript
// Use a different model for sentiment-analysis
let pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment');
```

## Installation

To install via NPM, run:

```bash
npm i @xenova/transformers
```

Alternatively, you can use it in vanilla JS, without any bundler, by using a CDN or static hosting. For example, using ES Modules, you can import the library with:

```html
<script type="module">
    import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.17.0';
</script>
```

## Examples

Want to jump straight in? Get started with one of our sample applications/templates:

| Name | Description | Links |
|------|-------------|-------|
| Whisper Web | Speech recognition w/ Whisper | code, demo |
| Doodle Dash | Real-time sketch-recognition game | blog, code, demo |
| Code Playground | In-browser code completion website | code, demo |
| Semantic Image Search (client-side) | Search for images with text | code, demo |
| Semantic Image Search (server-side) | Search for images with text (Supabase) | code, demo |
| Vanilla JavaScript | In-browser object detection | video, code, demo |
| React | Multilingual translation website | code, demo |
| Text to speech (client-side) | In-browser speech synthesis | code, demo |
| Browser extension | Text classification extension | code |
| Electron | Text classification application | code |
| Next.js (client-side) | Sentiment analysis (in-browser inference) | code, demo |
| Next.js (server-side) | Sentiment analysis (Node.js inference) | code, demo |
| Node.js | Sentiment analysis API | code |
| Demo site | A collection of demos | code, demo |

Check out the Transformers.js template on Hugging Face to get started in one click!

## Custom usage

By default, Transformers.js uses hosted pretrained models and precompiled WASM binaries, which should work out-of-the-box. You can customize this as follows:

### Settings

```javascript
import { env } from '@xenova/transformers';

// Specify a custom location for models (defaults to '/models/').
env.localModelPath = '/path/to/models/';

// Disable the loading of remote models from the Hugging Face Hub:
env.allowRemoteModels = false;

// Set location of .wasm files. Defaults to use a CDN.
env.backends.onnx.wasm.wasmPaths = '/path/to/files/';
```

For a full list of available settings, check out the API Reference.
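As a concrete (hypothetical) combination of the settings above, the sketch below runs a pipeline fully offline. `my-org/my-model` is a placeholder for any model you have converted to ONNX and hosted yourself; the `env` options used are the documented ones shown above:

```javascript
import { env, pipeline } from '@xenova/transformers';

// Serve models from your own static directory instead of the Hugging Face Hub.
env.localModelPath = '/models/';

// Block all requests to the Hub; only locally hosted files will be used.
env.allowRemoteModels = false;

// 'my-org/my-model' is a placeholder: the files are expected under
// /models/my-org/my-model/ with the same folder layout as on the Hub.
const classifier = await pipeline('sentiment-analysis', 'my-org/my-model');

const output = await classifier('Local inference, no server required!');
console.log(output); // e.g. [{ label: 'POSITIVE', score: 0.99 }]
```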
### Convert your models to ONNX

We recommend using our conversion script to convert your PyTorch, TensorFlow, or JAX models to ONNX in a single command. Behind the scenes, it uses Optimum to perform conversion and quantization of your model.

```bash
python -m scripts.convert --quantize --model_id <model_name_or_path>
```

For example, convert and quantize bert-base-uncased using:

```bash
python -m scripts.convert --quantize --model_id bert-base-uncased
```

This will save the following files to `./models/`:

```
bert-base-uncased/
├── config.json
├── tokenizer.json
├── tokenizer_config.json
└── onnx/
    ├── model.onnx
    └── model_quantized.onnx
```

For the full list of supported architectures, see the Optimum documentation.

## Supported tasks/models

Here is the list of all tasks and architectures currently supported by Transformers.js. If you don't see your task/model listed here or it is not yet supported, feel free to open up a feature request here.

To find compatible models on the Hub, select the "transformers.js" library tag in the filter menu (or visit this link). You can refine your search by selecting the task you're interested in (e.g., text-classification).

### Tasks

#### Natural Language Processing

| Task | ID | Description | Supported? |
|------|----|-------------|------------|
| Fill-Mask | `fill-mask` | Masking some of the words in a sentence and predicting which words should replace those masks. | ✅ (docs, models) |
| Question Answering | `question-answering` | Retrieve the answer to a question from a given text. | ✅ (docs, models) |
| Sentence Similarity | `sentence-similarity` | Determining how similar two texts are. | ✅ (docs, models) |
| Summarization | `summarization` | Producing a shorter version of a document while preserving its important information. | ✅ (docs, models) |
| Table Question Answering | `table-question-answering` | Answering a question about information from a given table. | ❌ |
| Text Classification | `text-classification` or `sentiment-analysis` | Assigning a label or class to a given text. | ✅ (docs, models) |
| Text Generation | `text-generation` | Producing new text by predicting the next word in a sequence. | ✅ (docs, models) |
| Text-to-text Generation | `text2text-generation` | Converting one text sequence into another text sequence. | ✅ (docs, models) |
| Token Classification | `token-classification` or `ner` | Assigning a label to each token in a text. | ✅ (docs, models) |
| Translation | `translation` | Converting text from one language to another. | ✅ (docs, models) |
| Zero-Shot Classification | `zero-shot-classification` | Classifying text into classes that are unseen during training. | ✅ (docs, models) |
| Feature Extraction | `feature-extraction` | Transforming raw data into numerical features that can be processed while preserving the information in the original dataset. | ✅ (docs, models) |

#### Vision

| Task | ID | Description | Supported? |
|------|----|-------------|------------|
| Depth Estimation | `depth-estimation` | Predicting the depth of objects present in an image. | ✅ (docs, models) |
| Image Classification | `image-classification` | Assigning a label or class to an entire image. | ✅ (docs, models) |
| Image Segmentation | `image-segmentation` | Divides an image into segments where each pixel is mapped to an object. This task has multiple variants such as instance segmentation, panoptic segmentation and semantic segmentation. | ✅ (docs, models) |
| Image-to-Image | `image-to-image` | Transforming a source image to match the characteristics of a target image or a target image domain. | ✅ (docs, models) |
| Mask Generation | `mask-generation` | Generate masks for the objects in an image. | ❌ |
| Object Detection | `object-detection` | Identify objects of certain defined classes within an image. | ✅ (docs, models) |
| Video Classification | n/a | Assigning a label or class to an entire video. | ❌ |
| Unconditional Image Generation | n/a | Generating images with no condition in any context (like a prompt text or another image). | ❌ |
| Image Feature Extraction | `image-feature-extraction` | Transforming raw data into numerical features that can be processed while preserving the information in the original image. | ✅ (docs, models) |
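To make one of the vision rows above concrete, here is a minimal sketch of the `object-detection` task, using one of the converted checkpoints hosted under the Xenova namespace on the Hub (the image URL is a placeholder):

```javascript
import { pipeline } from '@xenova/transformers';

// DETR object detector, converted to ONNX and hosted on the Hub.
const detector = await pipeline('object-detection', 'Xenova/detr-resnet-50');

// The input may be a URL, a file path, or raw image data; this URL is a stand-in.
const output = await detector('https://example.com/street-scene.jpg', { threshold: 0.9 });

// Each detection contains a label, a confidence score, and a bounding box, e.g.
// [{ score: 0.99, label: 'car', box: { xmin: ..., ymin: ..., xmax: ..., ymax: ... } }, ...]
console.log(output);
```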
#### Audio

| Task | ID | Description | Supported? |
|------|----|-------------|------------|
| Audio Classification | `audio-classification` | Assigning a label or class to a given audio. | ✅ (docs, models) |
| Audio-to-Audio | n/a | Generating audio from an input audio source. | ❌ |
| Automatic Speech Recognition | `automatic-speech-recognition` | Transcribing a given audio into text. | ✅ (docs, models) |
| Text-to-Speech | `text-to-speech` or `text-to-audio` | Generating natural-sounding speech given text input. | ✅ (docs, models) |

#### Tabular

| Task | ID | Description | Supported? |
|------|----|-------------|------------|
| Tabular Classification | n/a | Classifying a target category (a group) based on a set of attributes. | ❌ |
| Tabular Regression | n/a | Predicting a numerical value given a set of attributes. | ❌ |

#### Multimodal

| Task | ID | Description | Supported? |
|------|----|-------------|------------|
| Document Question Answering | `document-question-answering` | Answering questions on document images. | ✅ (docs, models) |
| Image-to-Text | `image-to-text` | Output text from a given image. | ✅ (docs, models) |
| Text-to-Image | `text-to-image` | Generates images from input text. | ❌ |
| Visual Question Answering | `visual-question-answering` | Answering open-ended questions based on an image. | ❌ |
| Zero-Shot Audio Classification | `zero-shot-audio-classification` | Classifying audios into classes that are unseen during training. | ✅ (docs, models) |
| Zero-Shot Image Classification | `zero-shot-image-classification` | Classifying images into classes that are unseen during training. | ✅ (docs, models) |
| Zero-Shot Object Detection | `zero-shot-object-detection` | Identify objects of classes that are unseen during training. | ✅ (docs, models) |

#### Reinforcement Learning

| Task | ID | Description | Supported? |
|------|----|-------------|------------|
| Reinforcement Learning | n/a | Learning from actions by interacting with an environment through trial and error and receiving rewards (negative or positive) as feedback. | ❌ |
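Similarly, here is a minimal sketch of the `automatic-speech-recognition` task from the audio table above, using a converted Whisper checkpoint from the Hub (the audio URL is a placeholder):

```javascript
import { pipeline } from '@xenova/transformers';

// Whisper (tiny, English-only), converted to ONNX for in-browser transcription.
const transcriber = await pipeline(
    'automatic-speech-recognition',
    'Xenova/whisper-tiny.en',
);

// URLs, file paths, and raw Float32Array audio are all accepted; this URL is a stand-in.
const output = await transcriber('https://example.com/speech.wav');
console.log(output); // e.g. { text: 'And so my fellow Americans, ...' }
```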
### Models

1. **ALBERT** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
2. **Audio Spectrogram Transformer** (from MIT) released with the paper AST: Audio Spectrogram Transformer by Yuan Gong, Yu-An Chung, James Glass.
3. **BART** (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
4. **BEiT** (from Microsoft) released with the paper BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei.
5. **BERT** (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
6. **Blenderbot** (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
7. **BlenderbotSmall** (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
8. **BLOOM** (from BigScience workshop) released by the BigScience Workshop.
9. **CamemBERT** (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
10. **Chinese-CLIP** (from OFA-Sys) released with the paper Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese by An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou.
11. **CLAP** (from LAION-AI) released with the paper Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
12. **CLIP** (from OpenAI) released with the paper Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
13. **CLIPSeg** (from University of Göttingen) released with the paper Image Segmentation Using Text and Image Prompts by Timo Lüddecke and Alexander Ecker.
14. **CodeGen** (from Salesforce) released with the paper A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
15. **CodeLlama** (from MetaAI) released with the paper Code Llama: Open Foundation Models for Code by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jeremy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
16. **ConvBERT** (from YituTech) released with the paper ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
17. **ConvNeXT** (from Facebook AI) released with the paper A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
18. **ConvNeXTV2** (from Facebook AI) released with the paper ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders by Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie.
19. **DeBERTa** (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
20. **DeBERTa-v2** (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
21. **DeiT** (from Facebook) released with the paper Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
22. **Depth Anything** (from University of Hong Kong and TikTok) released with the paper Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data by Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
23. **DETR** (from Facebook) released with the paper End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
24. **DINOv2** (from Meta AI) released with the paper DINOv2: Learning Robust Visual Features without Supervision by Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jégou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski.
25. **DistilBERT** (from HuggingFace), released together with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT and a German version of DistilBERT.
26. **DiT** (from Microsoft Research) released with the paper DiT: Self-supervised Pre-training for Document Image Transformer by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
27. **Donut** (from NAVER), released together with the paper OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
28. **DPT** (from Intel Labs) released with the paper Vision Transformers for Dense Prediction by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
29. **EfficientNet** (from Google Brain) released with the paper EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks by Mingxing Tan, Quoc V. Le.
30. **ELECTRA** (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
31. **ESM** (from Meta AI) are transformer protein language models. ESM-1b was released with the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. ESM-1v was released with the paper Language models enable zero-shot prediction of the effects of mutations on protein function by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. ESM-2 and ESMFold were released with the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
32. **Falcon** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme.
33. **FLAN-T5** (from Google AI) released in the repository google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.
34. **GLPN** (from KAIST) released with the paper Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
35. **GPT Neo** (from EleutherAI) released in the repository EleutherAI/gpt-neo by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
36. **GPT NeoX** (from EleutherAI) released with the paper GPT-NeoX-20B: An Open-Source Autoregressive Language Model by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach.
37. **GPT-2** (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
38. **GPT-J** (from EleutherAI) released in the repository kingoflolz/mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.
39. **GPTBigCode** (from BigCode) released with the paper SantaCoder: don't reach for the stars! by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Muñoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.
40. **HerBERT** (from Allegro.pl, AGH University of Science and Technology) released with the paper KLEJ: Comprehensive Benchmark for Polish Language Understanding by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, Ireneusz Gawlik.
41. **Hubert** (from Facebook) released with the paper HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
42. **LongT5** (from Google AI) released with the paper LongT5: Efficient Text-To-Text Transformer for Long Sequences by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
43. **LLaMA** (from The FAIR team of Meta AI) released with the paper LLaMA: Open and Efficient Foundation Language Models by Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample.
44. **Llama2** (from The FAIR team of Meta AI) released with the paper Llama2: Open Foundation and Fine-Tuned Chat Models by Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom.
45. **M2M100** (from Facebook) released with the paper Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Çelebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
46. **MarianMT** Machine translation models trained using OPUS data by Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.
47. **mBART** (from Facebook) released with the paper Multilingual Denoising Pre-training for Neural Machine Translation by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
48. **mBART-50** (from Facebook) released with the paper Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
49. **Mistral** (from Mistral AI) by The Mistral AI team: Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.
50. **MMS** (from Facebook) released with the paper Scaling Speech Technology to 1,000+ Languages by Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski, Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli.
51. **MobileBERT** (from CMU/Google Brain) released with the paper MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
52. **MobileViT** (from Apple) released with the paper MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari.
53. **MPNet** (from Microsoft Research) released with the paper MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
54. **MPT** (from MosaicML) released with the repository llm-foundry by the MosaicML NLP Team.
55. **MT5** (from Google AI) released with the paper mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
56. **NLLB** (from Meta) released with the paper No Language Left Behind: Scaling Human-Centered Machine Translation by the NLLB team.
57. **Nougat** (from Meta AI) released with the paper Nougat: Neural Optical Understanding for Academic Documents by Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic.
58. **OPT** (from Meta AI) released with the paper OPT: Open Pre-trained Transformer Language Models by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
59. **OWL-ViT** (from Google AI) released with the paper Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
60. **OWLv2** (from Google AI) released with the paper Scaling Open-Vocabulary Object Detection by Matthias Minderer, Alexey Gritsenko, Neil Houlsby.
61. **Phi** (from Microsoft) released with the papers Textbooks Are All You Need by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, and Textbooks Are All You Need II: phi-1.5 technical report by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
62. **Qwen2** (from the Qwen team, Alibaba Group) released with the paper Qwen Technical Report by Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou and Tianhang Zhu.
63. **ResNet** (from Microsoft Research) released with the paper Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
64. **RoBERTa** (from Facebook), released together with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
65. **RoFormer** (from ZhuiyiTechnology), released together with the paper RoFormer: Enhanced Transformer with Rotary Position Embedding by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
66. **SegFormer** (from NVIDIA) released with the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
67. **Segment Anything** (from Meta AI) released with the paper Segment Anything by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick.
68. **SigLIP** (from Google AI) released with the paper Sigmoid Loss for Language Image Pre-Training by Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer.
69. **SpeechT5** (from Microsoft Research) released with the paper SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
70. **SqueezeBERT** (from Berkeley) released with the paper SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
71. **StableLm** (from Stability AI) released with the paper StableLM 3B 4E1T (Technical Report) by Jonathan Tow, Marco Bellagente, Dakota Mahan, Carlos Riquelme Ruiz, Duy Phung, Maksym Zhuravinskyi, Nathan Cooper, Nikhil Pinnaparaju, Reshinth Adithyan, and James Baicoianu.
72. **Starcoder2** (from BigCode team) released with the paper StarCoder 2 and The Stack v2: The Next Generation by Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade, Wenhao Yu, Lucas Krauss, Naman Jain, Yixuan Su, Xuanli He, Manan Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang, Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze, Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh, Yacine Jernite, Carlos Muñoz Ferrandis, Lingming Zhang, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, and Harm de Vries.
73. **Swin Transformer** (from Microsoft) released with the paper Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
74. **Swin2SR** (from University of Würzburg) released with the paper Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi, Radu Timofte.
75. **T5** (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
76. **T5v1.1** (from Google AI) released in the repository google-research/text-to-text-transfer-transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
77. **Table Transformer** (from Microsoft Research) released with the paper PubTables-1M: Towards Comprehensive Table Extraction From Unstructured Documents by Brandon Smock, Rohith Pesala, Robin Abraham.
78. **TrOCR** (from Microsoft), released together with the paper TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
79. **UniSpeech** (from Microsoft Research) released with the paper UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
80. **UniSpeechSat** (from Microsoft Research) released with the paper UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
81. **Vision Transformer (ViT)** (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
82. **ViTMatte** (from HUST-VL) released with the paper ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang.
83. **VITS** (from Kakao Enterprise) released with the paper Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech by Jaehyeon Kim, Jungil Kong, Juhee Son.
84. **Wav2Vec2** (from Facebook AI) released with the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
85. **Wav2Vec2-BERT** (from Meta AI) released with the paper Seamless: Multilingual Expressive and Streaming Speech Translation by the Seamless Communication team.
86. **WavLM** (from Microsoft Research) released with the paper WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
87. **Whisper** (from OpenAI) released with the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.
88. **XLM** (from Facebook) released together with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
89. **XLM-RoBERTa** (from Facebook AI), released together with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
90. **YOLOS** (from Huazhong University of Science & Technology) released with the paper You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.