https://petals.dev/

Petals: Run large language models at home, BitTorrent-style

* Generate text with Llama 2 (70B), Falcon (180B), BLOOM (176B), or their derivatives, and fine-tune them for your own tasks -- using a consumer-grade GPU or Google Colab.
* You load a small part of the model, then join a network of people serving the other parts. Single-batch inference runs at up to 6 tokens/sec for Llama 2 (70B) and up to 4 tokens/sec for Falcon (180B) -- fast enough for chatbots and interactive apps.
* Beyond classic LLM APIs: you can employ custom fine-tuning and sampling methods, execute custom paths through the model, or inspect its hidden states. You get the convenience of an API with the flexibility of the PyTorch and Transformers interfaces.

Try now in Colab | Docs on GitHub

Want to contribute? Check out the available models and help host one of them!

Join our Discord or subscribe via email to follow Petals development. We send updates once every few months. No spam.

Featured on: TechCrunch

This project is a part of the BigScience research workshop.
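The second bullet above describes the core idea: each participant hosts a contiguous span of the model's transformer blocks, and a client routes its activations through a chain of servers that together cover every block. The sketch below illustrates that routing idea in plain Python; all names (`servers`, `find_route`) are hypothetical illustrations, not the actual Petals API.

```python
# Illustrative sketch of BitTorrent-style model partitioning (hypothetical
# names, not the real Petals API): each server holds a contiguous span of
# transformer blocks; a client chains spans that cover the whole model.

NUM_BLOCKS = 80  # e.g. Llama 2 (70B) has 80 transformer blocks

# Each hypothetical server announces which span of blocks it serves.
servers = {
    "server_a": range(0, 30),
    "server_b": range(30, 55),
    "server_c": range(55, 80),
}

def find_route(servers, num_blocks):
    """Greedily pick servers whose spans cover blocks 0..num_blocks-1."""
    route, next_block = [], 0
    while next_block < num_blocks:
        # Among servers that host the next needed block, pick the one
        # whose span reaches furthest, to minimize the number of hops.
        candidates = [
            (name, span) for name, span in servers.items()
            if span.start <= next_block < span.stop
        ]
        if not candidates:
            raise RuntimeError(f"no server hosts block {next_block}")
        name, span = max(candidates, key=lambda item: item[1].stop)
        route.append((name, next_block, span.stop))
        next_block = span.stop
    return route

route = find_route(servers, NUM_BLOCKS)
print(route)  # [('server_a', 0, 30), ('server_b', 30, 55), ('server_c', 55, 80)]
```

During inference, the client would send its activations to each server in the route in turn, which is why per-token latency depends on the number of hops rather than on any single machine holding the full model.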