[HN Gopher] Launch HN: Retell AI (YC W24) - Conversational Speech API for Your LLM
___________________________________________________________________
Launch HN: Retell AI (YC W24) - Conversational Speech API for Your
LLM
Hey HN, we're the co-founders of Retell AI
(https://www.retellai.com/). We are building a conversational
speech engine to help developers build natural-sounding voice AI.
Our API abstracts away the complexities of AI voice conversations,
so you can make your voice application the best at what it does.
Here's a demo video: https://www.youtube.com/watch?v=0LT64_mgkro.

With the advent of LLMs and recent breakthroughs in speech
synthesis, conversational voice AI has just gotten good enough to
create really exciting use cases. However, developers often
underestimate what's required to build a good, natural-sounding
conversational voice AI. Many simply stitch together ASR (speech-
to-text), an LLM, and TTS (text-to-speech) and expect a great
experience (a sketch of that naive loop is at the end of this
post). It turns out it's not that simple. There's more going on in
conversation than we consciously realize: knowing when to speak
and when to listen, handling interruptions, responding within
0-200 ms, and using backchanneling phrases (e.g., "yeah", "uh
huh") to signal that you're listening. These are natural for
humans, but hard for AI to get right. Developers spend hundreds of
hours on the conversation experience but still end up with poor
results: 4-5 s latencies, inappropriate cutoffs, the AI speaking
over the caller, etc.

So we built Retell AI. We follow the overall paradigm of speech-
to-text, LLM, and text-to-speech components, but add conversation
models in between to orchestrate the conversation while allowing
maximum configurability for developers at each step. You can think
of our models as adding a "domain expert" layer for the dynamics
of conversation itself. Retell is designed for you to bring your
own LLM into our pipeline. Currently we achieve about 800 ms end-
to-end latency, handle interruptions, perform speech isolation,
and offer many customization options (e.g., speaking rate, voice
temperature, ambient sound).

We created a guest account for HN, so you can try our playground
with a 10-minute free trial without logging in:
https://beta.retellai.com/dashboard/hn (playground tutorial:
https://docs.retellai.com/guide/dashboard). Our product is usage-
based, priced at $0.10-0.17/min. Our main product is a developer-
facing API, but you can try it without writing code (e.g., create
agents, connect to a phone number) via our dashboard. If you want
to test it in production, feel free to self-serve with our API
documentation. One of our customers just launched, and you can
view their demo:
https://www.loom.com/share/64f09a53bf6d4b3799e5ebd08b23fec4?...

We are thrilled to see what our users are building with our API,
and we're excited to show our product to the community. We look
forward to your feedback!
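
P.S. To make the "stitching" point above concrete, here is a
minimal sketch of the naive turn-based loop many developers start
with. The names transcribe/complete/synthesize are placeholders
for whatever ASR/LLM/TTS providers you pick, not part of any real
API:

    # Naive "ASR -> LLM -> TTS" loop; every stage blocks on the previous
    # one, so the latencies simply add up (easily 3-5 s per turn).
    def transcribe(audio: bytes) -> str:
        """Placeholder for a blocking speech-to-text request."""
        raise NotImplementedError

    def complete(prompt: str) -> str:
        """Placeholder for a blocking LLM completion request."""
        raise NotImplementedError

    def synthesize(text: str) -> bytes:
        """Placeholder for a blocking text-to-speech request."""
        raise NotImplementedError

    def handle_turn(audio: bytes) -> bytes:
        # Nothing here decides when the caller has actually stopped
        # talking, nothing handles interruptions, and nothing
        # backchannels while the LLM is thinking -- that is the gap the
        # conversation-orchestration layer is meant to fill.
        text = transcribe(audio)
        reply = complete(text)
        return synthesize(reply)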
Author : yanyan_evie
Score : 196 points
Date : 2024-02-21 13:18 UTC (9 hours ago)
| ohadron wrote:
| I was skeptical but the demo is incredible
| (https://beta.retellai.com/home-agent)
| yanyan_evie wrote:
| glad you like the demo!
| stuartjohnson12 wrote:
| Wow, I agree! That was beyond expectations. The only let-down
| was the AI contradicted itself when I tried layering on
| conditionals. It was something like this:
|
| "What time works"
|
| "Morning on tuesday would be best, but I can also do afternoon"
|
| "I'm sorry, I didn't catch what time in the afternoon you
| wanted"
|
| "No, I said the morning"
|
| "I'm having a hard time hearing you. What time in the morning
| did you want?"
|
| "10am"
|
| And from there things were fine. It seemed very rigid on
| picking a time and didn't suggest times when I laid out a
| range.
| yanyan_evie wrote:
| Great point! There's some room for the prompt to improve~
| debarshri wrote:
| Is there different language support too?
| yanyan_evie wrote:
| It's definitely on our roadmap. After the core product--the
| voice AI part--becomes humanlike enough, we will support
| multilingual capabilities.
| debarshri wrote:
| Spanish would be very helpful
| yanyan_evie wrote:
| Yes, if you don't mind, you could leave your email on the
| waitlist in the footer of the website. We'll keep you
| posted!
| vrc wrote:
| I second this request. Specifically, for a lot of
| applications, English-only "good enough" might not suffice
| (e.g., a dental office with English-speaking employees and
| predominantly English-speaking customers still gets callers
| who don't speak it well). Between a stilted conversation with
| a non-native speaker (broken English to English) and a
| slightly incorrect conversation in the caller's native
| language, there might be more tolerance for some of the
| hiccups that AI has. As in, I might get more information from
| a poor conversation in a person's native language than from
| us trying to communicate in their poor English.
| yanyan_evie wrote:
| Yes. Great point
| monkeydust wrote:
| Just tried it. It's impressive but needs work - trying to book
| an appointment out in 3 weeks, it acknowledged that but could
| not confirm an exact date and time. Still impressed.
| yanyan_evie wrote:
| Thanks!! see you then
| yanyan_evie wrote:
| If it does not work, try this link:
| https://calendly.com/retell-ai/retell-ai-user?month=2024-02
| niblettc wrote:
| This is incredible and terrifying at the same time. Does it
| support long context? As in, can I voice chat with an instance of
| an agent, and then later in a different chat refer to items
| discussed in the previous chat? Can I also type / text with the
| agent and have it recall items from a previous session?
| yanyan_evie wrote:
| That's an interesting point! We did consider adding memory to
| the voice agent, and we have use cases like an AI therapy
| session wanting to know the previous conversations with the
| patient. Adding the previous chat would be very helpful as
| well.
| Cheer2171 wrote:
| > an AI therapy session
|
| oh no
| yanyan_evie wrote:
| The use case I recall involves a nonprofit organization
| focused on preventing suicide. They are hoping for an AI
| therapy solution capable of listening to patients and
| picking up the phone when no human is available. This isn't
| entirely unacceptable because one of the therapist's roles
| is to listen to problems, so AI can effectively substitute
| in this aspect.
| toomuchtodo wrote:
| You're not wrong, and I agree this is a great use case,
| but consider calling it crisis response vs a therapist. A
| therapist is there to help you dig deep, over a long
| time, crisis response is a tactical mechanism to prevent
| imminent self harm.
|
| Amazing product, looking forward to working with it.
| Delumine wrote:
| Wonder what the justifications for the different voice prices
| are...
| yanyan_evie wrote:
| The different providers have different prices: OpenAI TTS and
| Deepgram are cheaper; ElevenLabs is more expensive.
| echelon wrote:
| You'll be able to build your own high quality, low latency
| voices at scale.
| dang wrote:
| [stub for offtopicness]
| Xavier_L wrote:
| Cool!
| yanyan_evie wrote:
| Thank you!
| liangludev wrote:
| cool
| yanyan_evie wrote:
| thank you!
| 369316020 wrote:
| Very cool
| langyou wrote:
| Amazing, tried the dental front desk from playground. The voice
| sounds very natural and could hardly tell it's AI-generated.
| yanyan_evie wrote:
| glad you like it :)
| threeseed wrote:
| Why is every comment here from an account with no other
| comments ?
| dang wrote:
| Ugh. Sorry. Probably some of their users found out about this
| thread.
|
| I'm going to move all of this to an offtopic stub and
| collapse it.
|
| We tell founders to make sure this doesn't happen (see
| https://news.ycombinator.com/yli.html) but I probably need to
| make the message louder. Not everyone understands that the
| culture of HN doesn't work this way.
| productlordtr wrote:
| Do you hire?
| yanyan_evie wrote:
| Thanks for asking. We are not hiring at this stage.
| xgantan wrote:
| With a little more tweaking and training, the voice AI will sound
| like her in https://en.wikipedia.org/wiki/Her_(film).
| yanyan_evie wrote:
| yes... one of our favorite movies
| blakeburch wrote:
| Just tried the dental appointment example. Voice sounds great!
| But I found two issues worth sharing:
|
| - I told it I wasn't available until next year. We confirmed a
| date. It said Feb 4th, next year. I asked it when next year
| was and it gave me the definition. On further prying, it told
| me the current year was 2022, so next year was 2023. For a
| scheduling use case, it should be date/availability/time zone
| aware.
|
| - At the end, it got into a loop saying "I apologize for the
| confusion. Let me double check your records...". After staying
| silent, it said "it looks like we've been disconnected". I
| said "no, I was waiting for you to check my records". The loop
| repeated. I eventually asked how long it would take to check
| my records and it told me "a few minutes" but still went
| through the "disconnected" message.
| yanyan_evie wrote:
| Thanks for the great feedback! Absolutely, with a fine-tuned
| LLM or a better prompt, we can make the responses more
| reasonable. We'll make a note to update our demo prompt
| accordingly!
| blindgeek wrote:
| My friend group and I have been playing with LLMs:
| https://news.ycombinator.com/item?id=39208451. We tend to hang
| out in multi-user voice chat sometimes, and I've speculated that
| it would be interesting to hook an LLM to ASR and TTS and bring
| it into our voice chat. Yeah, naive, and to be honest, I'm not
| even sure where to start. Have you tried bringing your
| conversational LLM into a multi-person conversation?
| yanyan_evie wrote:
| It's a great idea. We have a use case where a customer wants
| to add a voice agent to Zoom. We could schedule a call to talk
| about the technical design.
| nextworddev wrote:
| How does this compare to vocode, another YC company?
| yanyan_evie wrote:
| If you have your own LLM, our feature is the most customizable.
| And since we don't own an LLM, we'll focus on making our Voice
| AI as human-like as possible.
| yanyan_evie wrote:
| I think Vocode will focus more on open-source libraries; they
| have tons of integrations. We don't have any integrations - we
| focus only on the voice AI API part and leave the LLM part to
| the customer.
| jamesmcintyre wrote:
| Until this demo the most impressive conversational experiences
| I've seen were Pi and Livekit's Kitt demo
| (https://livekit.io/kitt). I do not think KITT was quite as
| fast in response time (as Retell) but incredibly impressive
| for being fully open source and open to any choice of APIs
| (imagine KITT with the Groq API + Deepgram's Aura for super
| low latency).
|
| Retell focusing on all of the other weird/unpredictable aspects
| of human conversation sounds super interesting and the demo's
| incredible.
|
| Things are moving so fast, wow.
| yanyan_evie wrote:
| Thanks! Will push harder
| russ wrote:
| We recently made it a lot easier to build your own KITT too:
| https://github.com/livekit/agents
| djyaz1200 wrote:
| I tried the demo, and it got confused and disconnected, but it's
| a cool proof of concept. Suggest bumping up the happiness emotion
| on the agent, and a Calendly integration would immediately unlock
| a lot of use cases. Good luck!
| yanyan_evie wrote:
| Thanks for the suggestion! Will take a look into the confusion
| problem
| bricee98 wrote:
| The demo was incredible, and this seems perfect for my current
| project. I am going to try to integrate this as soon as I can to
| see if it works for me. How responsive can I expect the support
| to be?
| yanyan_evie wrote:
| We pride ourselves on being very responsive! We usually create
| a Slack group with users actively integrating and answer any
| questions ASAP
| _fw wrote:
| This is absolutely wild - I got chills when I thought about the
| fact I'm talking to a computer. Congratulations on flying
| straight over uncanny valley.
| AustinZzx wrote:
| Thanks for the support, we still have a lot of work ahead of us
| to make it better!
| user_7832 wrote:
| It's really good, but the AI cracks still show up. Trying the
| demo therapist, I mentioned I'm not finding a job. It suggested
| finding a career counsellor and said "it would get back as soon
| as possible"... yeah, no it didn't. It claimed to be "working on
| it" but would say "I'm here if you want to speak...". It clearly
| doesn't understand what it's saying, it feels like bing's ai
| would be "better" at not claiming to do a task it can't.
| ywj7931 wrote:
| Thanks for trying that out! Retell focuses on making the AI
| sound like a human; it's the developer's LLM that is
| responsible for making it think smart. The therapist in the
| dashboard is for demo purposes only, and ideally some
| developers will plug in their own great AI-therapist LLM to
| make it more human-like :)
| samstave wrote:
| With respect to alignment, it should be a fundamental
| requirement that a speech AI is ____REQUIRED____ to honestly
| inform a Human when they are speaking to an AI.
|
| Can you please ensure, going forward, that you have the
| universal "truth", as it were, of having your system always
| identify itself as AI when "prompted" (irrespective of what
| the app/dev has built - your API should ensure that if the
| "safeword" is used, it shall reveal it's AI)?
|
| --
|
| _"trust me, if you ask them if they are a cop, they legally
| have to tell you they are a cop"_ (courts have ruled it's
| legal for cops to lie to you) etc....
|
| (It should be like those tones on a hold call that remind you
| you're still on hold... but instead it's a constant reminder
| that this bitch is AI.) -- There should be some root-level
| escape word that requires any use of this tool to connect you
| to a Human. That word used to be "operator", MANY times, but
| still...
|
| Maybe if a conversation with an elderly Human goes on with too
| many "huh? I can't hear you" or "I don't understand, can you
| repeat that" questions, your AI knows it's talking to a
| non-tech Human, and it should re-MIND the Human that it's just
| an AI (meaning no sympathy, no emotion, it will not stop until
| you are dead), etc...
|
| Guardrails, motherfucker! Do you speak it?
| AustinZzx wrote:
| Good point. Currently, our product does not contain an LLM, as
| we are purely a voice API -- instead, the developer brings in
| their own LLM solution and gets to decide what to say. This
| would be a great guardrail to build in for all sorts of
| reasons; we'll see how we can suggest that our users adopt it.
| samstave wrote:
| May I please understand your architecture?
|
| A dev builds an app that pipes to your API and you spit it
| back out? If so, ensure that whatever you spit out identifies
| itself to whomever is listening....
|
| --
|
| Please explain the architecture of how your system works (or
| link me if I missed it).
|
| ----
|
| Shortest and most important law ever written:
|
| "An AI must identify itself as AI when asked by Humans."
|
| 0. Law of robotics.
|
| ------
|
| @austin
|
| -
|
| Cool - so I'm on an important call with [your customer] and
| your system has an outage?
|
| How is this handled? A dropped call?
|
| (I am not being cynical - I'm being someone who is allergic to
| post mortems.)
|
| ----
|
| EDIT: you need to stop using the term "user" in anything you
| market or describe. Full stop.
|
| The reason: in the case of your product, the USER is the
| motherhecker on the phone listening to anything your CUSTOMER
| is spewing at them VIA your API.
|
| The USER is the one making IRL >>>DECISIONS<<< based on what
| they hear from your system.
|
| Your CUSTOMER is whom you receive money from.
|
| THEIR customer is whom they get money from to pay you.
|
| The USER is the end-point Human, who doesn't even know you
| exist.
| AustinZzx wrote:
| We handle the audio bytes in / out, and also connect to
| your user's server for response. We handle the interaction
| and decide when to listen and when to talk, and send live
| updates to our users. When a response is needed, we ask for
| it and get it from our user.
|
| Our homepage https://www.retellai.com/ has a GIF on it that
| illustrates this point.
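|
| Roughly, the developer side is a small server we call into: we
| push live updates, and when a response is needed we ask your
| server for the next thing to say. A hypothetical sketch (the
| route and event names here are made up -- the real message
| schema is in our docs):
|
|     # Hypothetical sketch only; event names and the route are made up.
|     from fastapi import FastAPI, WebSocket, WebSocketDisconnect
|
|     app = FastAPI()
|
|     @app.websocket("/voice-agent/{call_id}")
|     async def voice_agent(ws: WebSocket, call_id: str):
|         await ws.accept()
|         try:
|             while True:
|                 event = await ws.receive_json()  # live call updates
|                 if event.get("type") == "response_needed":
|                     # Plug in your own LLM here; a canned reply
|                     # stands in for it in this sketch.
|                     await ws.send_json({
|                         "type": "response",
|                         "text": "Sure, let me check that for you.",
|                     })
|         except WebSocketDisconnect:
|             pass  # call ended
|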
| AustinZzx wrote:
| Nice catch on the wording -- customer is indeed more accurate
| than user.
|
| For outage handling: we strive to keep 99.9+% uptime, and in
| the case of a dropped call, the agent would hang up if using a
| phone, and might have different error handling on the web
| depending on how the customer handles it.
| esafak wrote:
| Congratulations! How do you position yourselves against Google
| Duplex/Dialogflow and other competitors?
| https://cloud.google.com/dialogflow
| AustinZzx wrote:
| We strive to make conversation humanlike, so maybe less contact
| center ops development, but more focus on performance and
| customizability of voice interactions. As a startup, our edge
| over big tech is being nimble and executing fast.
| esafak wrote:
| I would keep working on positioning; I feel that your
| language is woolly at times:
|
| > we focus most of our energy on innovating the AI
| conversation experience, making it more magical day by day.
| We pride ourselves on wowing our customers when they
| experience our product themselves.
|
| This is not useful; you already have testimonials to show
| what customers think.
|
| Maybe convert that first FAQ point about differentiation into
| a table comparing you against the closest competitors. Since
| you talk about performance you should measure it. Use a
| standard benchmark if there is one for your field.
| AustinZzx wrote:
| Good point, note taken. Benchmarking is a great tool to show
| differentiation. BTW, apart from what we think is important
| ourselves (latency, mean opinion score, etc.), would you mind
| sharing what you would want to see in such a benchmark? One
| key metric I like to keep an eye on is the end conversion rate
| of using the product, but that's very use-case specific.
| gfodor wrote:
| Can you share what you're doing for TTS? Is it a proprietary
| fully pretrained in-house model, a fine tuned open source one, or
| a commercially licensed one?
| AustinZzx wrote:
| For TTS, we are currently integrating with different providers
| like ElevenLabs, OpenAI TTS, etc. We do have plans down the
| road to train our own TTS model.
| gfodor wrote:
| Ah thank you! What's the lowest latency option you have found
| so far?
| 101008 wrote:
| I would feel deceived if I were a customer of any company or
| office that uses this. If I take the trouble to call by phone,
| it's because I want to speak with a person. If I wanted to talk
| to a machine, I would send an email, talk to a chatbot, or even
| try to communicate with the company through social media. Calling
| by phone implies that I am investing time and effort, and I
| expect the same from the other side.
| AustinZzx wrote:
| Totally understandable that most people would want to chat
| with a human agent (I sometimes share the same feeling).
| However, I do think a major reason for that is that voice bots
| were bad before: they could not understand you or get things
| done, and felt like a waste of time. With advancements in
| voice AI and LLMs, I'm confident that there will be more use
| cases where talking to a voice bot is not a bad experience.
| vages wrote:
| No. LLMs are worse for customer experience than their
| predecessors: LLMs confabulate, and their language is so
| smooth that you often need expertise to catch them in it.
|
| People call customer service because they don't know what to
| do. It would be better for most customers to talk to a bot
| that they can catch making a mistake.
|
| Recent example: https://bc.ctvnews.ca/air-canada-s-chatbot-
| gave-a-b-c-man-th...
| AustinZzx wrote:
| Yes, I agree there are problems with LLMs (hallucinations,
| persona, etc.), and that's exciting because it means room for
| improvement and opportunities. I know many people who are
| working hard in that field trying to make LLMs converse
| better.
|
| For example:
|
| - "Hallucinations / LLMs confabulate": techniques like RAG can
| help.
|
| - "Language is so smooth that you often need expertise to
| catch them in it": fine-tuning and prompt engineering can
| help.
| jp42 wrote:
| Personally I think if the bot can get things done, then I
| won't mind. I just hope these bots don't keep repeating the
| same things without getting anything done.
| monkeydust wrote:
| It's a good point and one the bot industry has not really
| figured out - forget voice bots, I'm talking about those
| annoying ones telecom companies throw up. My immediate
| reaction when I get a bot is to throw in a bunch of garbage to
| get routed to a human as fast as possible. When they get
| better, perhaps I might change my behaviour.
| ywj7931 wrote:
| Personally speaking, when I called the DMV and was asked to
| wait for 40 minutes, if an AI can help me solve that problem, I
| wouldn't mind. But I definitely understand that different
| people have different expectations.
| aik wrote:
| Completely disagree. You're not making a phone call in most
| cases for entertainment purposes. If the options are wait in
| line for 20 minutes or speak to an actually useful bot, I would
| take the latter in 100% of cases.
| sidcool wrote:
| Congrats on launching. It feels very natural and the demo call
| was good.
| AustinZzx wrote:
| Thanks for the support. Means a lot to us.
| plutosmoon wrote:
| Nice work. You seem to have addressed some of the challenges that
| arise in teaching computers to speak.
|
| This blog breaks it down well:
|
| https://www.papercup.com/blog/realistic-synthetic-voices
| AustinZzx wrote:
| We are actively working on that. Thanks for the support.
| Gulipad wrote:
| Wow, this is sweet! With a little better latency and less
| perfection, it'd be well over the uncanny valley (not that it
| wouldn't fool many people as-is). Are you planning to add more
| "human" elements like filler words or disfluencies? If anything
| it feels too perfect to be human. Awesome stuff!
|
| P.S: I tried to fool the Dental Office demo trying to book on
| Sunday or outside of the slots it had indicated, and it did a
| better job than many humans would have :)
| yanyan_evie wrote:
| Yes, we do plan to make the responses more conversational by
| adding pauses, filler words, slight stuttering, etc. This is
| also a high priority for us to work on.
| aik wrote:
| Curious what model the dentist bot is running on? Tried it out,
| was surprisingly good, though eventually it contradicted itself
| (booked a slot it said previously was not available). (I get
| that's the programming but am curious especially given the
| latency is really great).
| AustinZzx wrote:
| The demo uses a simple GPT-3.5 Turbo setup.
| nnf wrote:
| This is very interesting. One thing I wondered about the per-
| minute pricing is how to keep a phone agent like this from being
| kept on the phone in order to run up a bill for a company using
| it. It'd be very inexpensive to make many automated calls to an
| AI bot like the dentist receptionist in the demo, and to just
| play a recording of someone asking questions designed to keep the
| bot on the phone.
|
| As a customer of a service like Retell (though of course not
| specific to Retell itself), how might one go about setting up
| rules to keep a phone conversation from going on for too long? At
| $0.17 per minute, a 6-minute call will cost just over $1, or about
| $10 per hour. Assuming the AI receptionist can take calls outside
| of business hours (which would be a nice thing for a business to
| offer), then such a malicious/time-wasting caller could start at
| closing time (5pm) and continue nonstop until opening time the
| next day (8am), with that 15 hour span costing the business $150
| for billable AI time. If the receptionist is available on
| weekends (from Friday at 5pm until Monday at 8am), that's a
| 63-hour stretch of time, or $630. And if the phone system can
| handle 10 calls in parallel, the dentist could come in Monday
| morning to an AI receptionist bill of over $6,300 for a single
| weekend (63 hours x $10 per hour x 10 lines).
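|
| For what it's worth, that worst case as a few lines of
| arithmetic (using the top of the published per-minute range):
|
|     per_minute = 0.17             # top of the $0.10-0.17/min range
|     per_hour = per_minute * 60    # ~$10.20/hour
|     weekend_hours = 63            # Friday 5pm through Monday 8am
|     parallel_lines = 10
|     print(per_hour * weekend_hours * parallel_lines)  # about $6,426
|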
|
| This is in no way a reflection on Retell (I think the service is
| compelling and the usage-based pricing is fair, and with that
| being the only cost, it's approachable and easy for people to try
| out). The problem of when to end a call is one I hadn't
| considered until now. Of course you could waste the time of a
| human receptionist who is being paid an hourly wage by the
| business, but that receptionist is going to hang up on you when
| it becomes clear you're just wasting their time. But an AI bot
| may not know when to hang up, or may be prevented from doing so
| by its programming if the human (or recording) on the other end
| of the line is asking it not to hang up. You could say it
| shouldn't ever take more than five minutes to book a dentist
| appointment, but what if the person has questions about the
| available dental procedures, or what if it's a person who can't
| hear well or a non-native speaker who has trouble understanding
| and needs the receptionist to repeat certain things? A human can
| handle that easily, but it seems difficult to program limits like
| this in a phone system.
| AustinZzx wrote:
| This can be handled with function calling and other LLM
| features. We support an input signal for closing the call, so
| you can have a rule-based (timer) system or an LLM-based
| end-call function and use that to hang up.
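|
| For instance (a hypothetical sketch -- the tool name and
| fields are made up, and wiring it to the end-call signal is up
| to you), an OpenAI-style function definition plus a simple
| timer backstop could look like:
|
|     import time
|
|     # Tool the LLM can call when the conversation should end.
|     END_CALL_TOOL = {
|         "type": "function",
|         "function": {
|             "name": "end_call",
|             "description": "Hang up once the caller's request is "
|                            "resolved or the caller says goodbye.",
|             "parameters": {
|                 "type": "object",
|                 "properties": {"reason": {"type": "string"}},
|                 "required": ["reason"],
|             },
|         },
|     }
|
|     MAX_CALL_SECONDS = 10 * 60  # rule-based backstop
|
|     def should_hang_up(started_at: float, llm_ended: bool) -> bool:
|         too_long = time.time() - started_at > MAX_CALL_SECONDS
|         return llm_ended or too_long
|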
| nsokolsky wrote:
| What stops it for regular human-operated phone lines?
| lalala6_89 wrote:
| If I record screen and audio when playing video games with my
| friends, will I be able to fine-tune an LLM+audio model on that
| dataset?
|
| It'll be like that San Junipero episode of Black Mirror -
| immortality in a dark way.
| AustinZzx wrote:
| You certainly could, given you play video games for long enough
| to gather the needed data lol.
| thimkerbell wrote:
| Is there societal value that this product is harming?
| zachbee wrote:
| What's the difference between your product and Gridspace's? I get
| the sense that your offering is more developer-focused, but I'm
| curious if there are any technical differences.
| yanyan_evie wrote:
| I believe Gridspace is an IVR solution, not one based on LLMs,
| so it's challenging to ask questions that deviate from the
| initial settings. We're using an LLM to generate responses,
| which makes the conversation smoother.
| JustinGu wrote:
| Wow this is incredible. I've worked a bit in the conversational
| LLM space and one of the hardest problems we were struggling with
| was human interruption handling. From the demo it seems like you
| guys have it down. Can't wait to see where this goes :) BTW I
| don't think the demo works on mobile - I tried it in Safari on
| iOS and got no response.
| yanyan_evie wrote:
| It might ask for permission to use the microphone. If you can't
| find it, try going to the website's homepage, where you can
| enter your phone number to receive a call.
| monkeydust wrote:
| https://github.com/vocodedev/vocode-python
|
| If you're looking for the phone flavour, this is also very
| good. Curious which is 'better'.
| JustinGu wrote:
| Retell is much stronger at handling human interruptions
| gsharma wrote:
| It seems to be broken on iOS Safari. I got no response after
| accepting the microphone prompt.
| yanyan_evie wrote:
| Thanks for the feedback. We will look into it
| JustinGu wrote:
| Yep, I gave permission for both my mac and phone but I got to
| try the demo out on mac anyways
| CuriouslyC wrote:
| This is interesting but having a piece like my speech engine tied
| to a specific model provider is a non-starter. I'll probably
| become a customer at some point if you guys just make a cheap API
| for streaming natural voice from LLM text output, if open source
| tools don't solve that problem conclusively before then.
| AustinZzx wrote:
| Could you elaborate a bit on "my speech engine tied to a
| specific model provider"? Sorry, I might be lacking some
| context on what you are referring to here.
| CuriouslyC wrote:
| I will be in the market for a text-to-speech engine, but from
| looking at the website it seems the model Retell is trying to
| push is "use our all-in-one model + text-to-speech
| service" which is problematic when my choice of model and
| control over how that model runs is at the core of my
| product, and text to speech is a "nice to have" feature. I
| want an endpoint that I can fire off text to in a streaming
| mode, where it'll buffer that streaming text a little and
| then stream out a beautiful, natural sounding voice with
| appropriate emotion and intonation in a voice of my design.
| I'm sure I'm not really Retell's ideal customer, and they're
| going after lucrative "all in one" customers that just want
| to build on top of a batteries-included product.
| yanyan_evie wrote:
| If you are looking for a text-to-speech solution, you could
| use ElevenLabs' turbo model.
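|
| Something along these lines works for that (a rough sketch --
| please check ElevenLabs' current docs for the exact endpoint
| and fields, which may have changed):
|
|     import requests
|
|     VOICE_ID = "your-voice-id"  # placeholder
|     resp = requests.post(
|         f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream",
|         headers={"xi-api-key": "YOUR_API_KEY"},
|         json={"text": "Hi, thanks for calling!",
|               "model_id": "eleven_turbo_v2"},
|         stream=True,
|     )
|     with open("reply.mp3", "wb") as f:
|         for chunk in resp.iter_content(chunk_size=4096):
|             f.write(chunk)
|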
| nsokolsky wrote:
| The demo is nice but it makes me wonder: why would a company have
| a fully automated voice line rather than a booking interface? As
| a customer I'm never happy to call a company to make a
| reservation. I'd be extra annoyed if an AI picked up and I had to
| go through the motions of a conversation instead of doing two
| clicks in a Web UI.
| AustinZzx wrote:
| I totally get that clicking in a web UI is super convenient in
| many scenarios, and I think GUI and voice can co-exist and
| create synergy. Suppose an AI voice agent can solve your
| problem, cater to your needs, and interact like a human. In
| that case, I believe it would be super helpful in many
| scenarios (like others mentioned, waiting on the line for 40
| minutes is a pretty bad experience). There are also new
| opportunities in voice, like AI companions, AI assistants,
| etc., that we see starting to emerge.
| yanyan_evie wrote:
| Yes, for booking appointments, a simple interface might do the
| trick. However, we've seen many excellent use cases of our API
| that prevent repetitive tasks and help companies save money,
| like AI logistics assistants, pre-surgery data collection, AI
| tutor and AI therapists. I believe the future will bring even
| more voice interface applications. Imagine not having to
| navigate complex UIs; you could easily book a flight or a hotel
| just by speaking. Also, older people might prefer phone calls
| over navigating UI interfaces.
| Jommi wrote:
| Sad that people think working on crypto is a waste of time yet
| here we are making antiquated contact methodologies even harder
| to prune out.
| tin7in wrote:
| Very cool! Have you looked into call center agents use cases?
| AustinZzx wrote:
| Yes, we are a developer tool, and we certainly get interest
| from clients working on customer satisfaction agents, call
| center agent training, etc.
| intalentive wrote:
| In the "Select Voice" dialog, all the DeepGram clips end in a
| loud click. Might want to fix that.
| yanyan_evie wrote:
| Roger that! Will fix it.
| nprateem wrote:
| I'm always curious for things like this where people get training
| data.
| omeze wrote:
| the actual conversational flow is awesome. 800ms is only a little
| worse than internet audio latency (commonly 300-500ms on services
| like Discord or even in-game audio for things like Valorant)!
| Also cool that you can bring your own LLM and audio provider.
| awesome product!
| AustinZzx wrote:
| Thank you for the support!
| yanyan_evie wrote:
| Glad you're into the "bring your own LLM" feature--it's tough
| to fine-tune an LLM, but it's definitely worth it for the
| improved results.
___________________________________________________________________
(page generated 2024-02-21 23:00 UTC)