https://petals.dev/

Petals: Run large language models at home, BitTorrent-style

* Generate text with Llama 2 (70B), Falcon (180B), BLOOM (176B), or their derivatives, and fine-tune them for your own tasks -- using a consumer-grade GPU or Google Colab.
* You load a small part of the model, then join a network of people serving the other parts. Single-batch inference runs at up to 6 tokens/sec for Llama 2 (70B) and up to 4 tokens/sec for Falcon (180B) -- fast enough for chatbots and interactive apps.
* Beyond classic LLM APIs: you can employ custom fine-tuning and sampling methods, execute custom paths through the model, or inspect its hidden states. You get the convenience of an API with the flexibility of the PyTorch and Transformers interfaces.

Try now in Colab | Docs on GitHub

Want to contribute? Check out the available models and help host one of them!

Join our Discord or subscribe via email to follow Petals development. We send updates once every few months. No spam.

Featured on: TechCrunch

This project is a part of the BigScience research workshop.
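The second bullet above describes the core idea: each participant hosts a contiguous span of the model's transformer blocks, and a client routes its activations through a chain of servers that together cover every block. The sketch below illustrates that routing idea in plain Python; all names (`servers`, `find_route`) are hypothetical illustrations, not the actual Petals API.

```python
# Illustrative sketch of BitTorrent-style model partitioning (hypothetical
# names, not the real Petals API): each server holds a contiguous span of
# transformer blocks; a client chains spans that cover the whole model.

NUM_BLOCKS = 80  # e.g. Llama 2 (70B) has 80 transformer blocks

# Each hypothetical server announces which span of blocks it serves.
servers = {
    "server_a": range(0, 30),
    "server_b": range(30, 55),
    "server_c": range(55, 80),
}

def find_route(servers, num_blocks):
    """Greedily pick servers whose spans cover blocks 0..num_blocks-1."""
    route, next_block = [], 0
    while next_block < num_blocks:
        # Among servers that host the next needed block, pick the one
        # whose span reaches furthest, to minimize the number of hops.
        candidates = [
            (name, span) for name, span in servers.items()
            if span.start <= next_block < span.stop
        ]
        if not candidates:
            raise RuntimeError(f"no server hosts block {next_block}")
        name, span = max(candidates, key=lambda item: item[1].stop)
        route.append((name, next_block, span.stop))
        next_block = span.stop
    return route

route = find_route(servers, NUM_BLOCKS)
print(route)  # [('server_a', 0, 30), ('server_b', 30, 55), ('server_c', 55, 80)]
```

During inference, the client would send its activations to each server in the route in turn, which is why per-token latency depends on the number of hops rather than on any single machine holding the full model.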