https://play.ht/news/introducing-play-3-0-mini/


October 11, 2024

Introducing Play 3.0 mini - A lightweight, reliable and
cost-efficient Multilingual Text-to-Speech model


Today we're releasing our most capable and conversational voice model
yet. It can speak in 30+ languages using any voice or accent, with
industry-leading speed and accuracy. We're also releasing 50+ new
conversational AI voices across languages.

Our mission is to make voice AI accessible, personal and capable for
all. Part of that mission is to advance the current state of
interactive voice technology in conversational AI and elevate user
experience.

When you're building real-time applications with TTS, a few things
really matter: latency, reliability, quality and naturalness of
speech. Our previous-generation models already led on latency and
naturalness of speech. Play 3.0 mini makes significant improvements
to reliability and audio quality while still being the fastest and
most conversational voice model.

Play 3.0 mini is the first in a series of efficient multilingual AI
text-to-speech models we plan to release over the coming months. Our
goal is to make the models smaller and more cost-efficient so they
can run on devices and at scale.

Play 3.0 mini is our fastest, most conversational speech model yet

3.0 mini achieves a mean time to first byte (TTFB) of 189
milliseconds, making it our fastest AI text-to-speech model. It
supports text-in streaming from LLMs and audio-out streaming, and can
be used via our HTTP REST API, WebSockets API or SDKs. 3.0 mini is
also more efficient than Play 2.0, running inference 28% faster.
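To check a TTFB figure against your own setup, you can time how long a
streaming request takes to deliver its first audio chunk and average
over several runs. The sketch below is illustrative only:
`fake_tts_stream` is a hypothetical stand-in for whatever streaming
call you actually use (REST, WebSockets or an SDK), and the approach
is an assumption, not PlayHT's benchmarking method.

```python
import statistics
import time
from typing import Callable, Iterable, Iterator


def time_to_first_byte(start_stream: Callable[[], Iterable[bytes]]) -> float:
    """Seconds from issuing the request until the first non-empty audio chunk arrives."""
    start = time.perf_counter()
    for chunk in start_stream():
        if chunk:
            return time.perf_counter() - start
    raise RuntimeError("stream ended without producing audio")


def mean_ttfb_ms(start_stream: Callable[[], Iterable[bytes]], runs: int = 5) -> float:
    """Average TTFB over several independent requests, in milliseconds."""
    samples = [time_to_first_byte(start_stream) * 1000 for _ in range(runs)]
    return statistics.mean(samples)


# Hypothetical stand-in for a real streaming TTS request (no network access).
def fake_tts_stream() -> Iterator[bytes]:
    time.sleep(0.05)        # simulated delay before the first audio chunk
    yield b"\x00" * 1024    # first audio chunk
    yield b"\x00" * 1024    # later chunks are ignored by the TTFB timer


if __name__ == "__main__":
    print(f"mean TTFB: {mean_ttfb_ms(fake_tts_stream, runs=3):.0f} ms")
```

Measuring to the first non-empty chunk, rather than to the end of the
response, is what makes the number meaningful for streaming: it is the
delay a listener hears before audio starts.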

Play 3.0 mini supports 30+ languages across any voice

Play 3.0 mini now supports more than 30 languages, many with multiple
male and female voice options out of the box. Our English, Japanese,
Hindi, Arabic, Spanish, Italian, German, French, and Portuguese voices
are available now for production use cases, through both our API and
our playground. Additionally, Afrikaans, Bulgarian, Croatian, Czech,
Hebrew, Hungarian, Indonesian, Malay, Mandarin, Polish, Serbian,
Swedish, Tagalog, Thai, Turkish, Ukrainian, Urdu, and Xhosa are
available for testing.

Play 3.0 mini is more accurate

Our goal with Play 3.0 mini was to build the best TTS model for
conversational AI. To achieve this, the model had to outperform
competitor models in latency and accuracy while generating speech in
the most conversational tone.

LLMs hallucinate, and voice LLMs are no different. In a voice LLM, a
hallucination can be an extra word or number in the output audio that
is not in the input text, or a word or number from the input that is
skipped. Sometimes it is just a random sound in the audio. This makes
it difficult to use generative voice models reliably.
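One generic way to catch these errors (an assumption for
illustration, not necessarily how PlayHT evaluates its models) is to
transcribe the generated audio with a speech recognizer and align the
transcript against the input text. The sketch below covers only the
alignment step, using Python's standard `difflib`; the transcription
step is assumed and not shown, and `hallucination_report` is a
hypothetical helper name.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Lowercase and split into word/number tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hallucination_report(input_text: str, transcript: str) -> dict[str, list[str]]:
    """Align the TTS input with an ASR transcript of the generated audio.

    'missed' lists input tokens that never made it into the audio;
    'extra' lists tokens heard in the audio that were not in the input.
    """
    ref, hyp = normalize(input_text), normalize(transcript)
    report: dict[str, list[str]] = {"missed": [], "extra": []}
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("delete", "replace"):
            report["missed"].extend(ref[i1:i2])
        if op in ("insert", "replace"):
            report["extra"].extend(hyp[j1:j2])
    return report
```

For example, an input of "Your gate is B 12 at 2:45 p.m." transcribed
as "Your gate is B 12 12 at 2:45." reports the repeated "12" as extra
and "p", "m" as missed.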

Here are some challenging text prompts that most TTS models struggle
to get right:

    "Okay, so your flight UA2390 from San Francisco to Las Vegas on
    November 3rd is confirmed. And, your ticket number is F X 2, 3 9
    A, 7 R T. The flight is scheduled to depart at 2:45 p.m. Is there
    anything else I can assist you with?"

    "Now, when people RSVP, they can call the event coordinator at 
    555 342 1234, but if they need more details, they can also call
    the backup number, which is 416 789 0123."

    "I've successfully processed your order and I'd like to confirm
    your product ID. It is A as in Alpha, 1, 2, 3, B as in Bravo, 5,
    6, 7, Z as in Zulu, 8, 9, 0, X as in X-ray."

3.0 mini was fine-tuned specifically on a diverse dataset of
alphanumeric phrases to make it reliable for critical use cases where
important information such as phone numbers, passport numbers, dates
and currencies can't be misread.
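The third prompt above spells its product ID out with phonetic words.
If you want that style for your own IDs, one option is to generate the
text client-side before sending it to the model. The hypothetical
`spell_for_speech` helper below sketches that idea using the NATO
phonetic alphabet; it is not part of any PlayHT SDK.

```python
# NATO phonetic alphabet, used to disambiguate letters when read aloud.
NATO = {
    "A": "Alpha", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliett",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}


def spell_for_speech(code: str) -> str:
    """Render an ID as 'A as in Alpha, 1, 2, ...' so each character is voiced separately."""
    parts = []
    for ch in code.upper():
        if ch in NATO:
            parts.append(f"{ch} as in {NATO[ch]}")
        elif ch in "0123456789":
            parts.append(ch)
        # anything else (spaces, dashes, punctuation) is dropped
    return ", ".join(parts)
```

Applied to the raw ID `A123B567Z890X`, this produces the same wording
as the product-ID prompt above.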

Play 3.0 mini reads alphanumeric sequences more naturally

We've trained the model to read numbers and acronyms the way humans
do. The model adjusts its pace and slows down on alphanumeric
characters: phone numbers, for instance, are read out with more
natural pacing, as are acronyms and abbreviations. This makes the
overall conversational experience more natural.

    "Alright, let's troubleshoot your laptop issue. First, let's
    confirm your device's ID so we're on the same page. The I D is
    894-d94-774-496-438-9b0-d2. Did I get that right?"
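The model handles this pacing on its own, but the sample prompts also
show a text convention that makes grouping explicit: the ticket number
is written as "F X 2, 3 9 A, 7 R T", with spaces between characters
and commas between groups. Below is a minimal sketch of a hypothetical
helper that produces that format from a raw ID. It assumes dashes,
dots and spaces mark group boundaries and that longer runs are split
every three characters.

```python
import re


def chunk_for_speech(identifier: str, group_size: int = 3) -> str:
    """Space out characters and put a comma (a short pause) between groups.

    Existing separators (dash, dot, whitespace) are treated as group
    boundaries; undelimited runs are split every `group_size` characters.
    """
    groups = []
    for run in re.split(r"[\s\-.]+", identifier.strip()):
        for i in range(0, len(run), group_size):
            groups.append(" ".join(run[i:i + group_size].upper()))
    return ", ".join(groups)
```

With this sketch, `chunk_for_speech("FX239A7RT")` gives
"F X 2, 3 9 A, 7 R T", and the device ID above becomes
"8 9 4, D 9 4, 7 7 4, 4 9 6, 4 3 8, 9 B 0, D 2".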

Play 3.0 mini achieves the best voice similarity for voice cloning

When cloning voices, close often isn't good enough. Play 3.0 voice
cloning achieves state-of-the-art performance, accurately reproducing
the accent, tone, and inflection of cloned voices. In benchmarking
with a popular open-source embedding model, we lead competitor models
by a wide margin in similarity to the original voice. Try it for
yourself by cloning your own voice and talking to yourself on
https://play.ai
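The post doesn't name the embedding model used for this benchmark. As
a general illustration, speaker similarity is commonly scored as the
cosine similarity between a speaker embedding of the original
recording and one of the cloned audio. The sketch below assumes you
already have those embeddings as lists of floats from some
speaker-embedding model (not shown); the function names are
illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two speaker embeddings (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embedding has zero length")
    return dot / (norm_a * norm_b)


def mean_similarity(reference: Sequence[float], clones: Sequence[Sequence[float]]) -> float:
    """Average similarity of several cloned utterances to the original speaker's embedding."""
    if not clones:
        raise ValueError("need at least one cloned embedding")
    return sum(cosine_similarity(reference, c) for c in clones) / len(clones)
```

Averaging over several cloned utterances gives a steadier score than
a single sample, since any one clip can vary in content and prosody.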

WebSockets API Support

3.0 mini's API now supports WebSockets, which significantly reduces
the overhead of opening and closing HTTP connections and makes it
easier than ever to enable text-in streaming from LLMs or other
sources.
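With a persistent WebSocket, a client can forward an LLM's output as
it is generated instead of waiting for the full reply. A common
pattern (an assumption here, not a documented PlayHT requirement) is
to buffer tokens until a sentence boundary so that each message sent
to the TTS model is a natural phrase. The sketch below shows only that
buffering step; the WebSocket connection and message format are
defined in PlayHT's API documentation and are not reproduced here.

```python
import re
from typing import Iterable, Iterator

# Sentence punctuation followed by whitespace marks a flush point.
_BOUNDARY = re.compile(r"[.!?]\s")


def sentence_chunks(tokens: Iterable[str], max_chars: int = 200) -> Iterator[str]:
    """Group streamed LLM tokens into sentence-sized pieces for a TTS WebSocket.

    Flushes at sentence boundaries, or once the buffer reaches max_chars,
    so audio can start before the LLM has finished its whole reply.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        # Emit every complete sentence currently in the buffer.
        while (match := _BOUNDARY.search(buffer)) is not None:
            cut = match.end()
            yield buffer[:cut].strip()
            buffer = buffer[cut:]
        # Guard against very long runs with no punctuation.
        if len(buffer) >= max_chars:
            if buffer.strip():
                yield buffer.strip()
            buffer = ""
    # Flush whatever is left when the LLM stream ends.
    if buffer.strip():
        yield buffer.strip()
```

The boundary rule is deliberately naive: an abbreviation such as
"p.m." in the middle of a sentence triggers an early flush, which is
usually harmless for speech but may be worth tuning for your content.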

Play 3.0 mini is a cost efficient model

We're happy to announce reduced pricing for our higher-volume Startup
and Growth tiers, and we've introduced a new Pro tier at $49 a month
for businesses with more modest requirements. Check out our new
pricing table.

We look forward to seeing what you build with us! If you have custom,
high-volume requirements, feel free to contact our sales team.


(c) 2024 PlayHT
