Transformers.js
State-of-the-art Machine Learning for the web. Run Transformers
directly in your browser, with no need for a server!
Transformers.js is designed to be functionally equivalent to Hugging
Face's transformers Python library, meaning you can run the same
pretrained models using a very similar API. These models support
common tasks in different modalities, such as:
* Natural Language Processing: text classification, named entity
recognition, question answering, language modeling,
summarization, translation, multiple choice, and text generation.
* Computer Vision: image classification, object detection, and
  segmentation.
* Audio: automatic speech recognition and audio classification.
* Multimodal: zero-shot image classification.
Transformers.js uses ONNX Runtime to run models in the browser. The
best part is that you can easily convert your pretrained PyTorch,
TensorFlow, or JAX models to ONNX using Optimum.
For more information, check out the full documentation.
Quick tour
It's super simple to translate from existing code! Just like the
Python library, we support the pipeline API. Pipelines group together
a pretrained model with preprocessing of inputs and postprocessing of
outputs, making it the easiest way to run models with the library.
Python (original):

from transformers import pipeline

# Allocate a pipeline for sentiment-analysis
pipe = pipeline('sentiment-analysis')
out = pipe('I love transformers!')
# [{'label': 'POSITIVE', 'score': 0.999806941}]

JavaScript (ours):

import { pipeline } from '@xenova/transformers';

// Allocate a pipeline for sentiment-analysis
let pipe = await pipeline('sentiment-analysis');
let out = await pipe('I love transformers!');
// [{'label': 'POSITIVE', 'score': 0.999817686}]
You can also use a different model by specifying the model id or path
as the second argument to the pipeline function. For example:
// Use a different model for sentiment-analysis
let pipe = await pipeline('sentiment-analysis', 'Xenova/bert-base-multilingual-uncased-sentiment');
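A third options argument controls how the model is loaded. A minimal
sketch, assuming the quantized and progress_callback options from the
2.x pipeline API:

// Both option names below come from the 2.x API; verify them against
// the version you have installed.
import { pipeline } from '@xenova/transformers';

let classifier = await pipeline(
    'sentiment-analysis',
    'Xenova/bert-base-multilingual-uncased-sentiment',
    {
        quantized: false, // load full-precision weights instead of the (default) quantized ones
        progress_callback: (data) => console.log(data), // receives download/loading progress updates
    }
);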
Installation
To install via NPM, run:
npm i @xenova/transformers
Alternatively, you can use it in vanilla JS, without any bundler, by
using a CDN or static hosting. For example, using ES Modules, you can
import the library with:

<script type="module">
  import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.17.0';
</script>
Examples
Want to jump straight in? Get started with one of our sample
applications/templates:
* Whisper Web: Speech recognition w/ Whisper (code, demo)
* Doodle Dash: Real-time sketch-recognition game (blog, code, demo)
* Code Playground: In-browser code completion website (code, demo)
* Semantic Image Search (client-side): Search for images with text (code, demo)
* Semantic Image Search (server-side): Search for images with text, using Supabase (code, demo)
* Vanilla JavaScript: In-browser object detection (video, code, demo)
* React: Multilingual translation website (code, demo)
* Text to speech (client-side): In-browser speech synthesis (code, demo)
* Browser extension: Text classification extension (code)
* Electron: Text classification application (code)
* Next.js (client-side): Sentiment analysis with in-browser inference (code, demo)
* Next.js (server-side): Sentiment analysis with Node.js inference (code, demo)
* Node.js: Sentiment analysis API (code)
* Demo site: A collection of demos (code, demo)
Check out the Transformers.js template on Hugging Face to get started
in one click!
Custom usage
By default, Transformers.js uses hosted pretrained models and
precompiled WASM binaries, which should work out-of-the-box. You can
customize this as follows:
Settings
import { env } from '@xenova/transformers';
// Specify a custom location for models (defaults to '/models/').
env.localModelPath = '/path/to/models/';
// Disable the loading of remote models from the Hugging Face Hub:
env.allowRemoteModels = false;
// Set location of .wasm files. Defaults to use a CDN.
env.backends.onnx.wasm.wasmPaths = '/path/to/files/';
For a full list of available settings, check out the API Reference.
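For example, a minimal sketch of a fully offline setup, assuming you
serve both the model files and the .wasm binaries yourself (the two
paths below are placeholders):

import { pipeline, env } from '@xenova/transformers';

// Configure the environment before constructing any pipelines.
env.allowRemoteModels = false;                // never contact the Hugging Face Hub
env.localModelPath = '/models/';              // placeholder: where you host the model files
env.backends.onnx.wasm.wasmPaths = '/wasm/';  // placeholder: where you host the ONNX Runtime binaries

// Model files are now resolved under /models/ instead of being downloaded.
let classifier = await pipeline('sentiment-analysis');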
Convert your models to ONNX
We recommend using our conversion script to convert your PyTorch,
TensorFlow, or JAX models to ONNX in a single command. Behind the
scenes, it uses Optimum to perform conversion and quantization of
your model.
python -m scripts.convert --quantize --model_id <model_name_or_path>
For example, convert and quantize bert-base-uncased using:
python -m scripts.convert --quantize --model_id bert-base-uncased
This will save the following files to ./models/:
bert-base-uncased/
├── config.json
├── tokenizer.json
├── tokenizer_config.json
└── onnx/
    ├── model.onnx
    └── model_quantized.onnx
For the full list of supported architectures, see the Optimum
documentation.
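The converted output can then be consumed directly by Transformers.js.
A minimal Node.js sketch, assuming the ./models/ layout shown above:

import { pipeline, env } from '@xenova/transformers';

// Point the library at the conversion script's output directory.
env.localModelPath = './models/';
env.allowRemoteModels = false;

// 'bert-base-uncased' resolves to ./models/bert-base-uncased/
// (onnx/model_quantized.onnx is used by default).
let extractor = await pipeline('feature-extraction', 'bert-base-uncased');
let output = await extractor('Hello world!', { pooling: 'mean', normalize: true });
console.log(output.dims); // e.g. [1, 768] for a BERT-base embedding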
Supported tasks/models
Here is the list of all tasks and architectures currently supported
by Transformers.js. If you don't see your task/model listed here or
it is not yet supported, feel free to open up a feature request here.
To find compatible models on the Hub, select the "transformers.js"
library tag in the filter menu (or visit this link). You can refine
your search by selecting the task you're interested in (e.g.,
text-classification).
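Each task ID below is the string you pass as the first argument to
pipeline(). As a quick, hedged illustration in the browser (the audio
URL is a placeholder):

import { pipeline } from '@xenova/transformers';

// 'automatic-speech-recognition' is the task ID from the Audio table below.
let transcriber = await pipeline('automatic-speech-recognition');
let { text } = await transcriber('https://example.com/audio.wav'); // placeholder URL
console.log(text);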
Tasks

Natural Language Processing

* Fill-Mask (fill-mask): Masking some of the words in a sentence and predicting which words should replace those masks. Supported (docs, models).
* Question Answering (question-answering): Retrieve the answer to a question from a given text. Supported (docs, models).
* Sentence Similarity (sentence-similarity): Determining how similar two texts are. Supported (docs, models).
* Summarization (summarization): Producing a shorter version of a document while preserving its important information. Supported (docs, models).
* Table Question Answering (table-question-answering): Answering a question about information from a given table. Not yet supported.
* Text Classification (text-classification or sentiment-analysis): Assigning a label or class to a given text. Supported (docs, models).
* Text Generation (text-generation): Producing new text by predicting the next word in a sequence. Supported (docs, models).
* Text-to-text Generation (text2text-generation): Converting one text sequence into another text sequence. Supported (docs, models).
* Token Classification (token-classification or ner): Assigning a label to each token in a text. Supported (docs, models).
* Translation (translation): Converting text from one language to another. Supported (docs, models).
* Zero-Shot Classification (zero-shot-classification): Classifying text into classes that are unseen during training. Supported (docs, models).
* Feature Extraction (feature-extraction): Transforming raw data into numerical features that can be processed while preserving the information in the original dataset. Supported (docs, models).
Vision

* Depth Estimation (depth-estimation): Predicting the depth of objects present in an image. Supported (docs, models).
* Image Classification (image-classification): Assigning a label or class to an entire image. Supported (docs, models).
* Image Segmentation (image-segmentation): Divides an image into segments where each pixel is mapped to an object. This task has multiple variants such as instance segmentation, panoptic segmentation and semantic segmentation. Supported (docs, models).
* Image-to-Image (image-to-image): Transforming a source image to match the characteristics of a target image or a target image domain. Supported (docs, models).
* Mask Generation (mask-generation): Generate masks for the objects in an image. Not yet supported.
* Object Detection (object-detection): Identify objects of certain defined classes within an image. Supported (docs, models).
* Video Classification (n/a): Assigning a label or class to an entire video. Not yet supported.
* Unconditional Image Generation (n/a): Generating images with no condition in any context (like a prompt text or another image). Not yet supported.
* Image Feature Extraction (image-feature-extraction): Transforming raw data into numerical features that can be processed while preserving the information in the original image. Supported (docs, models).
Audio

* Audio Classification (audio-classification): Assigning a label or class to a given audio. Supported (docs, models).
* Audio-to-Audio (n/a): Generating audio from an input audio source. Not yet supported.
* Automatic Speech Recognition (automatic-speech-recognition): Transcribing a given audio into text. Supported (docs, models).
* Text-to-Speech (text-to-speech or text-to-audio): Generating natural-sounding speech given text input. Supported (docs, models).
Tabular

* Tabular Classification (n/a): Classifying a target category (a group) based on a set of attributes. Not yet supported.
* Tabular Regression (n/a): Predicting a numerical value given a set of attributes. Not yet supported.
Multimodal

* Document Question Answering (document-question-answering): Answering questions on document images. Supported (docs, models).
* Image-to-Text (image-to-text): Output text from a given image. Supported (docs, models).
* Text-to-Image (text-to-image): Generates images from input text. Not yet supported.
* Visual Question Answering (visual-question-answering): Answering open-ended questions based on an image. Not yet supported.
* Zero-Shot Audio Classification (zero-shot-audio-classification): Classifying audios into classes that are unseen during training. Supported (docs, models).
* Zero-Shot Image Classification (zero-shot-image-classification): Classifying images into classes that are unseen during training. Supported (docs, models).
* Zero-Shot Object Detection (zero-shot-object-detection): Identify objects of classes that are unseen during training. Supported (docs, models).
Reinforcement Learning

* Reinforcement Learning (n/a): Learning from actions by interacting with an environment through trial and error and receiving rewards (negative or positive) as feedback. Not yet supported.
Models
1. ALBERT (from Google Research and the Toyota Technological
Institute at Chicago) released with the paper ALBERT: A Lite BERT
for Self-supervised Learning of Language Representations, by
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel,
Piyush Sharma, Radu Soricut.
2. Audio Spectrogram Transformer (from MIT) released with the paper
AST: Audio Spectrogram Transformer by Yuan Gong, Yu-An Chung,
James Glass.
3. BART (from Facebook) released with the paper BART: Denoising
Sequence-to-Sequence Pre-training for Natural Language
Generation, Translation, and Comprehension by Mike Lewis, Yinhan
Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer
Levy, Ves Stoyanov and Luke Zettlemoyer.
4. BEiT (from Microsoft) released with the paper BEiT: BERT
Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu
Wei.
5. BERT (from Google) released with the paper BERT: Pre-training of
Deep Bidirectional Transformers for Language Understanding by
Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
6. Blenderbot (from Facebook) released with the paper Recipes for
building an open-domain chatbot by Stephen Roller, Emily Dinan,
Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle
Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
7. BlenderbotSmall (from Facebook) released with the paper Recipes
for building an open-domain chatbot by Stephen Roller, Emily
Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu,
Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason
Weston.
8. BLOOM (from BigScience workshop) released by the BigScience
Workshop.
9. CamemBERT (from Inria/Facebook/Sorbonne) released with the paper
CamemBERT: a Tasty French Language Model by Louis Martin*,
Benjamin Muller*, Pedro Javier Ortiz Suarez*, Yoann Dupont,
Laurent Romary, Eric Villemonte de la Clergerie, Djame Seddah and
Benoit Sagot.
10. Chinese-CLIP (from OFA-Sys) released with the paper Chinese CLIP:
Contrastive Vision-Language Pretraining in Chinese by An Yang,
Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou,
Chang Zhou.
11. CLAP (from LAION-AI) released with the paper Large-scale
Contrastive Language-Audio Pretraining with Feature Fusion and
Keyword-to-Caption Augmentation by Yusong Wu, Ke Chen, Tianyu
Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
12. CLIP (from OpenAI) released with the paper Learning Transferable
Visual Models From Natural Language Supervision by Alec Radford,
Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh,
Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin,
Jack Clark, Gretchen Krueger, Ilya Sutskever.
13. CLIPSeg (from University of Gottingen) released with the paper
Image Segmentation Using Text and Image Prompts by Timo Luddecke
and Alexander Ecker.
14. CodeGen (from Salesforce) released with the paper A
Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo
Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio
Savarese, Caiming Xiong.
15. CodeLlama (from MetaAI) released with the paper Code Llama: Open
Foundation Models for Code by Baptiste Roziere, Jonas Gehring,
Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi
Adi, Jingyu Liu, Tal Remez, Jeremy Rapin, Artyom Kozhevnikov,
Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton
Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Defossez, Jade
Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier,
Thomas Scialom, Gabriel Synnaeve.
16. ConvBERT (from YituTech) released with the paper ConvBERT:
Improving BERT with Span-based Dynamic Convolution by Zihang
Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng,
Shuicheng Yan.
17. ConvNeXT (from Facebook AI) released with the paper A ConvNet for
the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph
Feichtenhofer, Trevor Darrell, Saining Xie.
18. ConvNeXTV2 (from Facebook AI) released with the paper ConvNeXt
V2: Co-designing and Scaling ConvNets with Masked Autoencoders by
Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang
Liu, In So Kweon, Saining Xie.
19. DeBERTa (from Microsoft) released with the paper DeBERTa:
Decoding-enhanced BERT with Disentangled Attention by Pengcheng
He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
20. DeBERTa-v2 (from Microsoft) released with the paper DeBERTa:
Decoding-enhanced BERT with Disentangled Attention by Pengcheng
He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
21. DeiT (from Facebook) released with the paper Training
data-efficient image transformers & distillation through
attention by Hugo Touvron, Matthieu Cord, Matthijs Douze,
Francisco Massa, Alexandre Sablayrolles, Herve Jegou.
22. Depth Anything (from University of Hong Kong and TikTok) released
with the paper Depth Anything: Unleashing the Power of
Large-Scale Unlabeled Data by Lihe Yang, Bingyi Kang, Zilong
Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao.
23. DETR (from Facebook) released with the paper End-to-End Object
Detection with Transformers by Nicolas Carion, Francisco Massa,
Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey
Zagoruyko.
24. DINOv2 (from Meta AI) released with the paper DINOv2: Learning
Robust Visual Features without Supervision by Maxime Oquab,
Timothee Darcet, Theo Moutakanni, Huy Vo, Marc Szafraniec, Vasil
Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa,
Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech
Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra,
Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve
Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr
Bojanowski.
25. DistilBERT (from HuggingFace), released together with the paper
DistilBERT, a distilled version of BERT: smaller, faster, cheaper
and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The
same method has been applied to compress GPT2 into DistilGPT2,
RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT
and a German version of DistilBERT.
26. DiT (from Microsoft Research) released with the paper DiT:
Self-supervised Pre-training for Document Image Transformer by
Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
27. Donut (from NAVER), released together with the paper OCR-free
Document Understanding Transformer by Geewook Kim, Teakgyu Hong,
Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok
Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
28. DPT (from Intel Labs) released with the paper Vision Transformers
for Dense Prediction by Rene Ranftl, Alexey Bochkovskiy, Vladlen
Koltun.
29. EfficientNet (from Google Brain) released with the paper
EfficientNet: Rethinking Model Scaling for Convolutional Neural
Networks by Mingxing Tan, Quoc V. Le.
30. ELECTRA (from Google Research/Stanford University) released with
the paper ELECTRA: Pre-training text encoders as discriminators
rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V.
Le, Christopher D. Manning.
31. ESM (from Meta AI) are transformer protein language models.
ESM-1b was released with the paper Biological structure and
function emerge from scaling unsupervised learning to 250 million
protein sequences by Alexander Rives, Joshua Meier, Tom Sercu,
Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C.
Lawrence Zitnick, Jerry Ma, and Rob Fergus. ESM-1v was released
with the paper Language models enable zero-shot prediction of the
effects of mutations on protein function by Joshua Meier, Roshan
Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives.
ESM-2 and ESMFold were released with the paper Language models of
protein sequences at the scale of evolution enable accurate
structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian
Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam
Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
32. Falcon (from Technology Innovation Institute) by Almazrouei,
Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and
Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane
and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and
Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and
Penedo, Guilherme.
33. FLAN-T5 (from Google AI) released in the repository
google-research/t5x by Hyung Won Chung, Le Hou, Shayne Longpre,
Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa
Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu,
Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery,
Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping
Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean,
Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei
34. GLPN (from KAIST) released with the paper Global-Local Path
Networks for Monocular Depth Estimation with Vertical CutDepth by
Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan
Chun, Junmo Kim.
35. GPT Neo (from EleutherAI) released in the repository EleutherAI/
gpt-neo by Sid Black, Stella Biderman, Leo Gao, Phil Wang and
Connor Leahy.
36. GPT NeoX (from EleutherAI) released with the paper GPT-NeoX-20B:
An Open-Source Autoregressive Language Model by Sid Black, Stella
Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence
Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang,
Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria
Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
37. GPT-2 (from OpenAI) released with the paper Language Models are
Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*,
Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
38. GPT-J (from EleutherAI) released in the repository kingoflolz/
mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.
39. GPTBigCode (from BigCode) released with the paper SantaCoder:
don't reach for the stars! by Loubna Ben Allal, Raymond Li, Denis
Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz
Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey,
Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel
Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry
Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni,
Bernardo Garcia del Rio, Qian Liu, Shamik Bose, Urvashi
Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco
Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish
Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite,
Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von
Werra.
40. HerBERT (from Allegro.pl, AGH University of Science and
Technology) released with the paper KLEJ: Comprehensive Benchmark
for Polish Language Understanding by Piotr Rybak, Robert
Mroczkowski, Janusz Tracz, Ireneusz Gawlik.
41. Hubert (from Facebook) released with the paper HuBERT:
Self-Supervised Speech Representation Learning by Masked
Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte,
Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov,
Abdelrahman Mohamed.
42. LongT5 (from Google AI) released with the paper LongT5: Efficient
Text-To-Text Transformer for Long Sequences by Mandy Guo, Joshua
Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan
Sung, Yinfei Yang.
43. LLaMA (from The FAIR team of Meta AI) released with the paper
LLaMA: Open and Efficient Foundation Language Models by Hugo
Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet,
Marie-Anne Lachaux, Timothee Lacroix, Baptiste Roziere, Naman
Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand
Joulin, Edouard Grave, Guillaume Lample.
44. Llama2 (from The FAIR team of Meta AI) released with the paper
Llama2: Open Foundation and Fine-Tuned Chat Models by Hugo
Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad
Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra,
Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher,
Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David
Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller,
Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn,
    Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor
    Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh
    Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana
    Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor
    Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew
    Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan
    Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian,
    Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian
    Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang,
    Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez,
    Robert Stojnic, Sergey Edunov, Thomas Scialom.
45. M2M100 (from Facebook) released with the paper Beyond
English-Centric Multilingual Machine Translation by Angela Fan,
Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky,
Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek,
Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky,
Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
46. MarianMT Machine translation models trained using OPUS data by
Jorg Tiedemann. The Marian Framework is being developed by the
Microsoft Translator Team.
47. mBART (from Facebook) released with the paper Multilingual
Denoising Pre-training for Neural Machine Translation by Yinhan
Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan
Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
48. mBART-50 (from Facebook) released with the paper Multilingual
Translation with Extensible Multilingual Pretraining and
Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen,
Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
49. Mistral (from Mistral AI) by The Mistral AI team: Albert Jiang,
Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra
Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna
Lengyel, Guillaume Lample, Lelio Renard Lavaud, Lucile Saulnier,
Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril,
Thomas Wang, Timothee Lacroix, William El Sayed.
50. MMS (from Facebook) released with the paper Scaling Speech
Technology to 1,000+ Languages by Vineel Pratap, Andros Tjandra,
Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky,
Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, Alexei Baevski,
Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael
Auli.
51. MobileBERT (from CMU/Google Brain) released with the paper
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited
Devices by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu,
Yiming Yang, and Denny Zhou.
52. MobileViT (from Apple) released with the paper MobileViT:
Light-weight, General-purpose, and Mobile-friendly Vision
Transformer by Sachin Mehta and Mohammad Rastegari.
53. MPNet (from Microsoft Research) released with the paper MPNet:
Masked and Permuted Pre-training for Language Understanding by
Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
54. MPT (from MosaicML) released with the repository llm-foundry by
    the MosaicML NLP Team.
55. MT5 (from Google AI) released with the paper mT5: A massively
multilingual pre-trained text-to-text transformer by Linting Xue,
Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya
Siddhant, Aditya Barua, Colin Raffel.
56. NLLB (from Meta) released with the paper No Language Left Behind:
Scaling Human-Centered Machine Translation by the NLLB team.
57. Nougat (from Meta AI) released with the paper Nougat: Neural
Optical Understanding for Academic Documents by Lukas Blecher,
Guillem Cucurull, Thomas Scialom, Robert Stojnic.
58. OPT (from Meta AI) released with the paper OPT: Open Pre-trained
Transformer Language Models by Susan Zhang, Stephen Roller, Naman
Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
59. OWL-ViT (from Google AI) released with the paper Simple
Open-Vocabulary Object Detection with Vision Transformers by
Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann,
Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag
Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai,
Thomas Kipf, and Neil Houlsby.
60. OWLv2 (from Google AI) released with the paper Scaling
Open-Vocabulary Object Detection by Matthias Minderer, Alexey
Gritsenko, Neil Houlsby.
61. Phi (from Microsoft) released with the papers Textbooks Are All
    You Need by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio Cesar
    Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan
    Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi,
    Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sebastien
    Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi
    Li, and Textbooks Are All You Need II: phi-1.5 technical report
    by Yuanzhi Li, Sebastien Bubeck, Ronen Eldan, Allie Del Giorno,
    Suriya Gunasekar and Yin Tat Lee.
62. Qwen2 (from the Qwen team, Alibaba Group) released with the paper
Qwen Technical Report by Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu
Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei
Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin,
Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui
Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan,
Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu,
Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang,
Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang,
Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren
Zhou, Xiaohuan Zhou and Tianhang Zhu.
63. ResNet (from Microsoft Research) released with the paper Deep
Residual Learning for Image Recognition by Kaiming He, Xiangyu
Zhang, Shaoqing Ren, Jian Sun.
64. RoBERTa (from Facebook), released together with the paper
RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan
Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen,
Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
65. RoFormer (from ZhuiyiTechnology), released together with the
paper RoFormer: Enhanced Transformer with Rotary Position
Embedding by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen
and Yunfeng Liu.
66. SegFormer (from NVIDIA) released with the paper SegFormer: Simple
and Efficient Design for Semantic Segmentation with Transformers
by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M.
Alvarez, Ping Luo.
67. Segment Anything (from Meta AI) released with the paper Segment
Anything by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi
Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer
Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
68. SigLIP (from Google AI) released with the paper Sigmoid Loss for
Language Image Pre-Training by Xiaohua Zhai, Basil Mustafa,
Alexander Kolesnikov, Lucas Beyer.
69. SpeechT5 (from Microsoft Research) released with the paper
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken
Language Processing by Junyi Ao, Rui Wang, Long Zhou, Chengyi
Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang,
Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
70. SqueezeBERT (from Berkeley) released with the paper SqueezeBERT:
What can computer vision teach NLP about efficient neural
networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna,
and Kurt W. Keutzer.
71. StableLm (from Stability AI) released with the paper StableLM 3B
4E1T (Technical Report) by Jonathan Tow, Marco Bellagente, Dakota
Mahan, Carlos Riquelme Ruiz, Duy Phung, Maksym Zhuravinskyi,
Nathan Cooper, Nikhil Pinnaparaju, Reshinth Adithyan, and James
Baicoianu.
72. Starcoder2 (from BigCode team) released with the paper StarCoder
2 and The Stack v2: The Next Generation by Anton Lozhkov, Raymond
Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier,
Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei,
Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes
Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil
Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu,
Terry Yue Zhuo, Evgenii Zheltonozhskii, Nii Osae Osae Dade,
Wenhao Yu, Lucas Krauss, Naman Jain, Yixuan Su, Xuanli He, Manan
Dey, Edoardo Abati, Yekun Chai, Niklas Muennighoff, Xiangru Tang,
Muhtasham Oblokulov, Christopher Akiki, Marc Marone, Chenghao
Mou, Mayank Mishra, Alex Gu, Binyuan Hui, Tri Dao, Armel Zebaze,
Olivier Dehaene, Nicolas Patry, Canwen Xu, Julian McAuley, Han
Hu, Torsten Scholak, Sebastien Paquet, Jennifer Robinson, Carolyn
Jane Anderson, Nicolas Chapados, Mostofa Patwary, Nima Tajbakhsh,
Yacine Jernite, Carlos Munoz Ferrandis, Lingming Zhang, Sean
Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, and Harm de
Vries.
73. Swin Transformer (from Microsoft) released with the paper Swin
Transformer: Hierarchical Vision Transformer using Shifted
Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng
Zhang, Stephen Lin, Baining Guo.
74. Swin2SR (from University of Wurzburg) released with the paper
Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution
and Restoration by Marcos V. Conde, Ui-Jin Choi, Maxime Burchi,
Radu Timofte.
75. T5 (from Google AI) released with the paper Exploring the Limits
of Transfer Learning with a Unified Text-to-Text Transformer by
Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee
and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li
and Peter J. Liu.
76. T5v1.1 (from Google AI) released in the repository
google-research/text-to-text-transfer-transformer by Colin Raffel
and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan
Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J.
Liu.
77. Table Transformer (from Microsoft Research) released with the
paper PubTables-1M: Towards Comprehensive Table Extraction From
Unstructured Documents by Brandon Smock, Rohith Pesala, Robin
Abraham.
78. TrOCR (from Microsoft), released together with the paper TrOCR:
Transformer-based Optical Character Recognition with Pre-trained
Models by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei
Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
79. UniSpeech (from Microsoft Research) released with the paper
UniSpeech: Unified Speech Representation Learning with Labeled
and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi
Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
80. UniSpeechSat (from Microsoft Research) released with the paper
UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH
SPEAKER AWARE PRE-TRAINING by Sanyuan Chen, Yu Wu, Chengyi Wang,
Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu
Wei, Jinyu Li, Xiangzhan Yu.
81. Vision Transformer (ViT) (from Google AI) released with the paper
An Image is Worth 16x16 Words: Transformers for Image Recognition
at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander
Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner,
Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain
Gelly, Jakob Uszkoreit, Neil Houlsby.
82. ViTMatte (from HUST-VL) released with the paper ViTMatte:
Boosting Image Matting with Pretrained Plain Vision Transformers
by Jingfeng Yao, Xinggang Wang, Shusheng Yang, Baoyuan Wang.
83. VITS (from Kakao Enterprise) released with the paper Conditional
Variational Autoencoder with Adversarial Learning for End-to-End
Text-to-Speech by Jaehyeon Kim, Jungil Kong, Juhee Son.
84. Wav2Vec2 (from Facebook AI) released with the paper wav2vec 2.0:
A Framework for Self-Supervised Learning of Speech
Representations by Alexei Baevski, Henry Zhou, Abdelrahman
Mohamed, Michael Auli.
85. Wav2Vec2-BERT (from Meta AI) released with the paper Seamless:
Multilingual Expressive and Streaming Speech Translation by the
Seamless Communication team.
86. WavLM (from Microsoft Research) released with the paper WavLM:
Large-Scale Self-Supervised Pre-Training for Full Stack Speech
Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu,
Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka,
Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian,
Jian Wu, Michael Zeng, Furu Wei.
87. Whisper (from OpenAI) released with the paper Robust Speech
Recognition via Large-Scale Weak Supervision by Alec Radford,
Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya
Sutskever.
88. XLM (from Facebook) released together with the paper
Cross-lingual Language Model Pretraining by Guillaume Lample and
Alexis Conneau.
89. XLM-RoBERTa (from Facebook AI), released together with the paper
Unsupervised Cross-lingual Representation Learning at Scale by
Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav
Chaudhary, Guillaume Wenzek, Francisco Guzman, Edouard Grave,
Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
90. YOLOS (from Huazhong University of Science & Technology) released
with the paper You Only Look at One Sequence: Rethinking
Transformer in Vision through Object Detection by Yuxin Fang,
Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu,
Jianwei Niu, Wenyu Liu.