[HN Gopher] ElevenReader
___________________________________________________________________
ElevenReader
Author : mfiguiere
Score : 250 points
Date : 2025-02-12 06:10 UTC (16 hours ago)
(HTM) web link (elevenreader.io)
(TXT) w3m dump (elevenreader.io)
| theothertimcook wrote:
| This is so impressive.
|
| No audiobook exists, drop epub into ElevenReader and have Burt
| Reynolds read it to you, honestly better than some human
| narrators.
| emptysongglass wrote:
| I would never trust the company that acquired Omnivore only to
| sunset it with 2 weeks notice to retrieve data.
|
| Companies won't stop pulling this garbage unless we stop
| supporting them.
| echelon wrote:
| You can fight back by supporting and advocating for open source
| foundation text to speech models. XTTS, GptSoVits, Tortoise,
| Zonos, etc.
|
| Open source models drive proprietary foundation models' margin
| to zero.
|
| The only reason elevenlabs became a unicorn was their margin.
| If they became a commodity, they'd find themselves in a deep
| pit.
| qnleigh wrote:
| Sounds good. Do any of these have iOS or Android apps?
| james-bcn wrote:
| OMG I didn't realize that had happened. That sucks. Omnivore
| was great. But now I'm really glad I didn't make it part of my
| processes.
| agnishom wrote:
| This is my main gripe with this company
| podgietaru wrote:
| I want to say, a lot of effort has been made recently to allow
| you to Self-Host Omnivore. I have done a lot to move it over so
| that all the features are self-hostable, including rewriting
| the entire PDF stack. I received a lot of support from the devs
| doing this too.
|
| I know the decisions of the Dev team were disappointing, but
| it's also worth pointing out that the site was kept up until
| around last month - despite the warning stating that it'd be
| down in November.
|
| Omnivore could have shut down their code base, and prevented
| self-hosting entirely. I'm glad they didn't.
| letmeinhere wrote:
| What's the contribution model moving forward? I see the
| repository is still active, but is it not still under
| Eleven's control? How will it evolve when they stop accepting
| pull requests?
| podgietaru wrote:
| It won't be under Eleven's control, part of the deal I
| believe. They're allowed to remain open source. Not folded
| into ElevenLabs.
|
| As for contribution model, it's still something I'm trying
| to figure out. For the moment, it was just trying to get a
| self host build ready and working.
|
| But I have admin rights to the repo, and am not working for
| ElevenLabs, nor officially Omnivore. I was just a
| contributor before.
| nmca wrote:
| I've listened to a few audiobooks on long drives, and have been
| surprised how hard it is to find good voices on audible. Often a
| book that might otherwise be good has a prohibitively annoying
| tone. So honestly the exciting thing here is the customisation.
|
| That said, even in their cherries the emphasis still isn't quite
| right in the Tolkien example.
| unbecoming wrote:
| As a first impression, French-sounding names should be read as
| French-sounding, even in English text. The voice per se is ok,
| but as delivery goes (pausing, title vs content), it could be
| better.
| barrell wrote:
| Been using eleven labs for several years now. I was really
| impressed with their multilingual model a few years ago.
|
| Since then, they've released a few cheaper models, but the
| quality suffers greatly (they still have the old models though so
| it's not an issue). They've also been releasing a ton of
| different products around TTS.
|
| I don't mean this as a criticism -- I just am curious why SOTA
| TTS has not improved from one model by one company several years
| ago, and why even said company isn't able to improve on that
| model.
| BoorishBears wrote:
| The biggest challenge with TTS is high quality voice data. The
| architectures of closed providers still mostly trace their
| roots to stuff like Tortoise with a few exceptions.
|
| Which is why it's especially ridiculous ElevenLabs allows
| professionals to upload their voices, charges users of those
| voices a _minimum_ of $50 per million characters, likely pays
| under $1 for the compute... and then passes on a whopping $2
| back to the professional.
|
| I think the next disruptive TTS competitor is going to form out
| of just offering to pay better rates than ElevenLabs to their
| PVC users.
|
| Finetuning established architectures on cleaner synthetic data
| is already getting open source models increasingly competitive,
| so getting top PVC samples from the source would likely put you
| right about where they are today.
| limo11 wrote:
| Rev share is up to 20% on default rates (depending on notice
| period). With custom rates they can make their voice more
| expensive and earn up to $0.2 for every 1000 characters. So
| you can do the math.
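
Taking "do the math" literally, here is a minimal sketch using only the figures quoted in this subthread. The $50 minimum and $2 payout are BoorishBears' numbers, and the 20% share and $0.2 per 1000 characters are limo11's. None of them are verified against ElevenLabs' actual terms, and the constant names are made up for illustration.

```python
# Figures as quoted by commenters in this thread; not verified against
# ElevenLabs' actual payout terms.
MIN_PRICE_PER_M_CHARS = 50.0     # USD per million characters (BoorishBears)
CLAIMED_PAYOUT_PER_M = 2.0       # USD per million characters (BoorishBears)
MAX_DEFAULT_REV_SHARE = 0.20     # "up to 20% on default rates" (limo11)
MAX_CUSTOM_PER_1K_CHARS = 0.20   # USD per 1000 characters (limo11)


def share(payout, price):
    """Fraction of the price that reaches the voice owner."""
    return payout / price


# Share implied by BoorishBears' figures.
claimed_share = share(CLAIMED_PAYOUT_PER_M, MIN_PRICE_PER_M_CHARS)

# Best case on default rates, applied to the quoted minimum price.
max_default_payout = MAX_DEFAULT_REV_SHARE * MIN_PRICE_PER_M_CHARS

# Best case on custom rates: one million characters is 1000 blocks of 1000.
max_custom_payout = MAX_CUSTOM_PER_1K_CHARS * 1000

print(f"claimed share: {claimed_share:.0%}")
print(f"max default payout per million chars: ${max_default_payout:.2f}")
print(f"max custom payout per million chars: ${max_custom_payout:.2f}")
```

On these numbers, $2 out of $50 is a 4% share, 20% of the $50 minimum is $10 per million characters, and $0.2 per 1000 characters is $200 per million. The two commenters may be describing different pricing tiers, so the figures are not directly comparable.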
| BoorishBears wrote:
| The math is you're paying a pittance considering the insane
| margins involved and the fact you're using their voices in
| a flywheel that's actively obsoleting them.
|
| Edit: And since you're concerned we might not be aware of
| Elevenlabs' generous terms... why is your documentation so
| cagey about them?
| https://elevenlabs.io/docs/product-guides/voices/payouts#thi...
|
| I see users need to keep paying you a subscription fee in
| order to even get their payouts... but "up to 20%" isn't
| saying particularly much without the kind of details that
| should probably be on that page.
|
| -
|
| Considering how much your company owes to an open source
| model, it's also impressive how little you've returned to
| the commons.
|
| But no worries, the top comment under this post is an open
| source model that was finetuned for a couple of thousand
| dollars by a single dude soliciting the public for random
| voice samples.
|
| If Google has no moat, you're out to sea.
| brookst wrote:
| Why would you pay more than necessary to attract the voice
| talent you need? There aren't (m)any businesses that pay
| multiples of market rates just to be nice.
| BoorishBears wrote:
| There are plenty of businesses that assign integrity a non-
| zero value, because most businesses reflect people.
|
| Maybe you're in a bubble devoid of that kind of thinking,
| so it seems very foreign or quaint.
|
| Even then it's short-sighted thinking at best: the "market
| rate" is not some magic self-optimizing number.
|
| Underpaying their creators is just creating the opportunity
| for someone to take the best of them on better terms.
|
| -
|
| Elevenlabs is also able to raise trivially in this
| environment: you'd think while they're still floating out
| here without a moat other than high quality data, they'd
| _overpay_ if anything and make narrators feel like royalty
| until they're replaced.
|
| This isn't unlike Uber initially paying drivers massive
| bonuses and undercharging riders until they were able to
| leverage their massive network to increase prices past what
| the taxi providers they had decimated were charging. But in
| this case the marginal cost of providing the service is so
| low they don't even have to lose money to run a similar
| play, just take less of it. (in other words, even ruthless
| greed is not antithetical to paying these folks better)
| bjackman wrote:
| Really glad these products are appearing!
|
| So much of my time for "reading" is in a context where I can't
| physically read, so audiobooks are incredibly useful. But being
| limited to the set of books that gets recorded by the publisher
| is a real shame.
|
| Haven't tried it yet but AI TTS seems basically perfect now so
| I'm very optimistic this will work great.
| VierScar wrote:
| I'm interested for this reason too, even listened to AI TTS
| books before, but the issue is that they are very monotonous.
| The tone almost never changes, nor the pacing, it's all
| delivered with almost no variation which makes listening dull
| and easy to lose focus
| rapind wrote:
| I recommend John Doe if using eleven labs. Maybe too much
| variation, but I like it.
| milofeynman wrote:
| This raises an interesting question around the rights of the
| author/publisher and who they sold their ebook rights to. If in 3
| years we have a perfect AI voice that can read any book as well
| as or better than mid-level narrators, why would you ever buy an
| audiobook when you could just buy the ebook and pick your
| voice(s)? What a time to be alive
| evrenesat wrote:
| The rise of streaming has made CDs and other offline media
| obsolete and publishing rights for them largely irrelevant.
| Audiobooks are likely to face a similar demise. One by one, all
| the frictions, I mean the colours of life, are fading away,
| sacrificed for the sake of convenience.
|
| Edit: I think the effect of the invention of vinyl on live
| performers is more akin to how the commoditisation of HQ TTS
| will be detrimental to audiobook narrators.
| wiether wrote:
| I guess it's the same with other jobs: AI will replace the
| mid/low quality workers, but the good ones will keep delivering
| something AI can't.
|
| Two audiobooks that come to my mind:
|
| - The Lord of the Rings series read by Andy Serkis; not only
| does he perfectly switch between each character's voice, but
| the feeling of listening to "Gollum" for hours is something else
| altogether
|
| - David Goggins' books; the audiobook version is completely
| different than the book, since he's not just reading the book,
| and overall it makes the content easier to digest
| vunderba wrote:
| I don't know if you remember but some of the earlier Kindles
| had both speakers and TTS built in but were sadly pressured to
| remove the feature.
|
| https://chasingperfection.co.uk/post/2013/01/14/text-to-spee...
| jnsaff2 wrote:
| It seems that this is using one of the less refined models. In
| English it sounds like a 4th grader reading in front of a class.
| Kinda stilted word by word voicing with static pauses between
| words and no variation in intonation. Tried with two voices and
| both are the same.
| stavros wrote:
| Well, you get what you pay for...
| ipsum2 wrote:
| I use ElevenReader on a weekly basis, and it sounds fine.
| Definitely not what you describe.
| csantini wrote:
| You can get pretty close with open source software:
|
| https://claudio.uk/posts/audiblez-v4.html
| rapind wrote:
| Oh wow. Thanks for posting! Samples sound great (on par with
| eleven by my untrained ear). Will definitely use this.
| neom wrote:
| How does it hold up on long stuff? I use Elevenlabs Studio
| daily and once things start to get chapters long, the
| voice can really start to go off the rails. I'd say they've
| solved a lot of this over the past 2/3 months, but it does
| still happen on long stuff.
| masteruvpuppetz wrote:
| > the voice can really start to go off the rails.
|
| Do you mean the AI gets tired?
| zaptrem wrote:
| In autoregressive models error accumulates over time. He
| likely means the voice starts to make odd sounds/gets lower
| quality. It would be really interesting if OP could share a
| clip of this phenomenon!
| neom wrote:
| Various different things can happen. It would take me
| quite some time to dig up examples, but at least with
| elevenlabs you don't get the clicks and pops you get on
| notebook LM, for example. 11labs instability comes in
| the form of intonation, pitch, accent, garbled words or
| even, once, language. I've only seen it happen in the 3k+
| word gens I've done, usually around the 75%
| point of the narration of whatever I've converted, and on
| average lasting a couple of seconds tops.
| wrsh07 wrote:
| Yeah - I've experienced this with eleven reader (I don't
| think you can gen text this long anymore using the reader
| app, lol) but switching voices fixed it for me
|
| I can go back and try to repro and get a recording....
| csantini wrote:
| It holds up well, because Audiblez uses sentence splitting
| (via Spacy models) before audio synthesis
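
A minimal sketch of the idea csantini describes: split the text into sentences first, then synthesize each one on its own, so any drift is confined to a single sentence. This is not Audiblez's code. A naive regex stands in for the spaCy sentence splitter, and `synthesize` is a hypothetical placeholder for a real TTS call.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    A stand-in for a proper sentence segmenter; it will mis-split
    abbreviations such as "e.g. this".
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def synthesize(sentence):
    """Hypothetical placeholder for a TTS call; returns fake 'audio' bytes."""
    return sentence.encode("utf-8")


def narrate(text):
    """Synthesize each sentence independently and return the audio chunks.

    Because every call starts fresh, errors cannot accumulate across
    sentence boundaries the way they can in one long generation.
    """
    return [synthesize(s) for s in split_sentences(text)]


chunks = narrate("It holds up well. Does it drift? Not across sentences!")
print(len(chunks))
```

The trade-off is that prosody cannot carry across sentence boundaries either, which may be part of why some listeners in this thread find long passages monotonous.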
| ultrasounder wrote:
| Bravo!
| simongray wrote:
| Oh no, it doesn't run on Apple Silicon. That's too bad.
| _joel wrote:
| > On my M2 MacBook Pro, on CPU, it takes about 1 hour, at a
| rate of about 60 characters per second.
|
| Umm, it does.
| simongray wrote:
| My bad. I misread the official website:
|
| > We don't currently support Apple Silicon, as there is not
| yet a Kokoro implementation in MLX. As soon as it will be
| available, we will support it.
|
| I thought that meant that it didn't support Apple Silicon
| in general, but they were just talking about GPU support.
| fl0id wrote:
| though they wouldn't need to use MLX, could also use
| pytorch etc
| csantini wrote:
| It works on Apple Silicon, but it doesn't use the GPU.
| Because Kokoro has not been implemented yet in MLX
| simongray wrote:
| Ah my bad! I just read the "We don't currently support
| Apple Silicon" on the official website, but I didn't
| realise that only pertains to GPU support.
| eamag wrote:
| I wrote about a similar model for MLX that can run on
| Apple silicon https://eamag.me/2025/Voice-Cloning
| csantini wrote:
| Hi eamag, this sounds great! I'm gonna try to add it to
| Audiblez
| mhuffman wrote:
| >Oh no, it doesn't run on Apple Silicon. That's too bad.
|
| Interesting, because the hero image is a Mac App screenshot.
| tonyhart7 wrote:
| good, now how I can use this on mobile??
| csantini wrote:
| Generate the audiobook on a laptop and then listen to it on
| mobile
| tonyhart7 wrote:
| this is the easy way, but I want the hard way
| laurentlb wrote:
| Interesting! This uses the Kokoro-82M model, which has a pretty
| good quality, but the set of languages is still quite limited.
| anonymous344 wrote:
| does this run on linux machine also?
| nkmnz wrote:
| third line on the page right below the first image says: >
| Audiblez 4.2 running on MacOSX via wxWidgets. Linux and
| Windows are supported too
| __rito__ wrote:
| Is there a pricing page? I am not seeing any.
| jampekka wrote:
| In the FAQ:
|
| > Is the app free?
|
| > Yes. The app is completely free to download and use today.
| Listening to content on the app will not consume credits from
| your monthly web plan. We do plan to eventually launch some
| premium version of the app, but even then we will maintain a
| generous free plan.
| mkmk3 wrote:
| Damn, tried a unicornriot article [1] and it just skipped several
| paragraphs past the grisly stuff.
|
| Can anyone else confirm?
|
| [1] - https://unicornriot.ninja/2024/sextortion-coms-inside-a-
| vile...
| ravetcofx wrote:
| Important article but horrific content. It seems to read it all
| for me.
| mkmk3 wrote:
| For sure, I saw where it was skipping and I wouldn't have
| been surprised if it were intentional, but good to disprove.
| Thanks for checking, have a good day
| limo11 wrote:
| Did you have some iconic voice selected? It skipped most likely
| due to inappropriate content. You can try with some non iconic
| voice
| mkmk3 wrote:
| Wasn't using an iconic voice but it does seem to be voice
| specific, good call
| macco wrote:
| How is the quality compared to speechify?
|
| I use it to listen to PDFs. It works, but has plenty of hiccups
| with headers, footers and colons.
| limo11 wrote:
| Way better
| sky2224 wrote:
| The video shows scenarios of people listening to pdfs of pretty
| dense material (e.g., computer science, bio mechanics).
|
| Does anyone here actually have positive results doing this? It
| seems to me listening to anything that's even remotely complex
| with the intent of learning it just isn't something that's
| feasible.
| nice__two wrote:
| That's my biggest gripe with audiobooks: good for fiction, not
| so good for learning.
| yreg wrote:
| For me they are actually best for non-fiction, but it has to
| be books. Papers are too information dense.
|
| I get easily distracted and lose attention while listening to
| an audiobook. This is usually problematic with fiction,
| because suddenly I don't know who this new character is or
| what's happening. And rewinding to the precise position where
| I stopped paying attention is of course much more difficult
| than in written text.
|
| I found that non-fiction books work great for me, because
| even if you ignore a page or two it makes no difference, the
| author keeps repeating their point and propping it up with
| many arguments anyway.
| woodson wrote:
| I used to have papers read to me via TTS when I had a long
| commute. This was before the current crop of neural TTS, mind
| you, so the quality and naturalness wasn't as good, but it was
| good enough to tolerate and to get the gist of a paper. It
| failed terribly on equations, of course, but that's often not
| too important on the first reading.
| qnleigh wrote:
| It depends a lot on the paper. I've been using a TTS app to
| read papers for years. Papers that are really equation dense,
| convey the key ideas in figures or get too detailed aren't
| listenable. But sometimes review articles or papers with one
| clear message hit that sweet spot and are very listenable.
| There's one topic where everything I know about it I learned by
| listening to a review article on a long run. It was actually
| quite pleasant!
| neom wrote:
| Severe dyslexia here, but ask me about any conversation or
| audio book or class I've listened to. Gimme anything audio and
| gimme it at 1.5x plz! I spend so much money gen'ing audio these
| days but it's soooo nice to be able to learn so quickly now.
| b33f wrote:
| Is this streaming server-side audio or is the TTS running locally
| on device ? Can it work offline ?
| yawnxyz wrote:
| all server-side
|
| you could build your local TTS using kokoro browser though --
| https://huggingface.co/spaces/webml-community/kokoro-webgpu
| t0lo wrote:
| This is definitely the future, I'm worried about the electric
| slip and slide world we're heading into though, where everything
| is completely spoonfed and consumptive. I can't help but think
| we're heading back into animalism.
| falcor84 wrote:
| > heading back into animalism
|
| Could you expand upon this? Any milestones towards that which
| we should be mindful of?
| t0lo wrote:
| Technology is pretty quickly and apparently not only coming
| for our critical thinking, but our agency
|
| With llms, "knowing things" is already starting to feel like
| a thing of the past, not to me, but to a lot of others; there's
| no longer an incentive to "switch on".
|
| Why should a kid learn anything if a robot is instantly
| better at everything? Maths got replaced by calculators, deep
| critical thinking will get replaced by llms a lot of the
| time, which are word calculators, which is the closest thing
| we have to a logic calculator.
|
| This is more passive autopilot software, which further
| promotes learning as something you 'consume' rather than
| something you seek.
|
| The public consciousness has absolutely taken a September 11
| tier nosedive since social media; we're approaching what I
| term cultural schizophrenia, which I posted about on my blog
| which I deleted, but I've readded it if you're interested
| [https://substack.com/home/post/p-156983317]. There are no
| contextualisers in the media to give the right emphasis to
| the right things.
|
| This is just my perspective, from what I've seen from other
| younger people of my age. We are heading into extremely
| interesting times; every profoundly destabilising thing
| we've speculated about is happening at the exact same time.
| We desperately need visionaries in politics.
|
| Basically I'm not doing too hot
| falcor84 wrote:
| Some good observations there, but I'm still unclear on why
| you used the term "animalism" - none of that seems to me at
| all similar to how other species engage with the world.
| brookst wrote:
| I'm sorry you're stressed, but please at least consider
| that you may be falling into the generational "kids these
| days" trap. I'm old, so I have lived through the world
| being on the brink of disasters caused by AI, social media,
| gay marriage, violent video games, the internet in general,
| cell phones, pagers, nuclear weapons, television. Probably
| a bunch more world-ending crises I forgot.
|
| The world is changing, but then again it always has been.
| IMO some things will get better, some will get worse, but
| the overall arc of human health and prosperity will
| continue upwards. There is less poverty, less starvation,
| more opportunity today than ever... even though some
| aspects of the world are bad and getting worse. That's the
| way it's always been.
| Kabukks wrote:
| Last time I tried Elevenlabs for German text, it got a lot of
| numbers and dates wrong.
|
| E. g. saying "1963" when the actual year in the text was 1967.
| Yeah, the voices sound very realistic. But I'm not sure how
| useful that is if you can't trust the spoken words.
|
| Does anyone know if it got better in the last weeks?
| aeroniero wrote:
| Yes, it's better now, at least on the Reader app that I've
| tried.
| jeswin wrote:
| The ad shows someone listening to an article or a story while
| driving a large vehicle - this is unsafe (depending on the
| individual). It's not like listening to music.
| yreg wrote:
| I'm curious, is there evidence it is unsafe?
| jeswin wrote:
| I can listen to a song while coding. I can't listen to a
| podcast while coding. A podcast demands way more attention
| than a song.
|
| 1.2 million people die in road accidents, and most of them
| are children and young people. Even more are seriously
| injured.
| cess11 wrote:
| Are you saying you can't drive a car if passengers are
| talking?
|
| If that's the case, maybe a driver's license isn't your
| thing?
| yreg wrote:
| That's a hypothesis but not evidence. I can present a
| counter-hypothesis: I fall asleep while listening to music
| (or staying in silence). Listening to spoken word keeps me
| awake or at least helps me notice that I'm getting tired.
|
| 1.2 million people die in road accidents, and most of them
| are children and young people. Even more are seriously
| injured.
| brokensegue wrote:
| People listen to talk radio while driving all the time
| vunderba wrote:
| If there were any substantial evidence of this, they would
| have shut down the entire A.M. spectrum 50 years ago.
| mozzieman wrote:
| The best I've heard, but still too monotone over time compared to
| real productions. I felt blown away at first, but listening to a
| chapter or two gets difficult. Just a matter of time most likely
| until it becomes as good as or better than the real thing.
| benrutter wrote:
| I've been looking for a good and convenient way to read papers
| that are published in PDF for a while.
|
| Ideally, I'd be able to strip out the text content and send it to
| my kindle in readable form. Since apparently that's science
| fiction, this looks like a really good plan B! Will definitely
| give it a go.
| elashri wrote:
| You can jailbreak your Kindle [1] and install KOReader [2], and
| this will allow you to do this science fiction.
|
| [1] https://kindlemodding.org/jailbreaking/WinterBreak/
|
| [2] https://koreader.rocks/
| janpmz wrote:
| You can try https://www.pdftomp3.com/ as well.
| billbrown wrote:
| Readwise Reader does PDFs very well (and apparently can do TTS
| on them, but I've never tried that).
| https://docs.readwise.io/reader/docs/faqs/text-to-speech
| hiAndrewQuinn wrote:
| This is excellent. I just tested the Finnish voices on my simple
| news archive [1], and the pronunciation was quite good and clear.
|
| It's unfortunate that I can't export audio clips locally;
| otherwise I would immediately look into using this for generating
| my Finnish flashcard decks from the same material [2]. I've
| thought about doing the same with the audio and video feeds
| included with this news broadcast, but getting Whisper to sync up
| properly with what's written down and cutting up the raw audio in
| that way still seems like more effort than I'm willing to invest
| right now.
|
| [1]: https://hiandrewquinn.github.io/selkouutiset-archive/
|
| [2]: https://github.com/Selkouutiset-Archive/selkokortti
| gwd wrote:
| > It's unfortunate that I can't export audio clips locally;
| otherwise I would immediately look into using this for
| generating my Finnish flashcard decks from the same material
| [2].
|
| elevenlabs has an API which seemed quite reasonable when I
| looked into it. A bit of python should get you what you want
| pretty quickly.
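
A rough sketch of the "bit of python" gwd mentions, using only the standard library. The endpoint path, the `xi-api-key` header and the `eleven_multilingual_v2` model id are recalled from ElevenLabs' public API documentation and are assumptions here; check them against the current docs before relying on this. The voice id and API key are placeholders.

```python
import json
import urllib.request

# Assumed endpoint shape; verify against the current ElevenLabs API docs.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text, voice_id, api_key, model_id="eleven_multilingual_v2"):
    """Build (but do not send) a text-to-speech request for one clip."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def save_clip(request, path):
    """Send the request and write the returned audio bytes to a file.

    This performs a network call and consumes API credits.
    """
    with urllib.request.urlopen(request) as resp, open(path, "wb") as out:
        out.write(resp.read())


# Placeholder values; nothing is sent until save_clip() is called.
req = build_tts_request("Hyvää huomenta", "YOUR_VOICE_ID", "YOUR_API_KEY")
print(req.full_url)
```

For a flashcard deck, one request per sentence (calling `save_clip(req, "card_001.mp3")` and so on) would sidestep the alignment problem of cutting up a long recording.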
| hiAndrewQuinn wrote:
| Oh! I'll look into that, thanks.
| darkwater wrote:
| I know I'm growing old but this is the kind of tech application
| that I don't like. Arts should be the last thing to be 100% fully
| done by a program. Enhancing capabilities in artists? Hell yeah.
| Replacing completely voice actors? No, thanks.
| ramonverse wrote:
| AI voice is literally the only way I have to "read" an obscure
| article during 1h non-static commutes.
| darkwater wrote:
| I understand. It can do things that weren't previously
| possible, but it will also replace things that were done by
| humans, by artists, before. Overall, in my opinion, it is still
| a loss.
| nathanyukai wrote:
| "replace things that were done by humans" isn't a loss by
| itself, if it frees up human labour to do other things. If
| humans replaced by AI can't find better things to do, such
| that it makes them poorer, or anti-social, it's a loss but
| not necessarily AI's fault.
| Martinussen wrote:
| Doesn't apply to all situations, but "replace things that
| were done by humans" in _arts_ can absolutely be a loss
| by itself. Making graphics/speech/video a commodity
| doesn't replace designers, voice actors, or directors,
| but we've definitely seen it can directly harm them and
| the people that enjoy their work.
|
| > can't find better things to do, such that it makes them
| poorer, or anti-social its a loss
|
| I feel like this misses the point a bit - lost
| income/sustainability for artists is obviously a big
| issue we'll be facing, but looking for a performance
| indicator in an artistic endeavour doesn't really get you
| anywhere. There's more ways to value a painting than
| "what the market would pay" and "potential heat output as
| firewood", right?
| brookst wrote:
| How do you feel about what word processors did to the
| typist career?
| add-sub-mul-div wrote:
| How do you feel about replacing general labor, period,
| and doing so for a class that no longer maintains a
| semblance of a social safety net? Do you think there's a
| difference between displacing one profession and
| displacing most professions at once?
|
| Do you people ever step out of the abstract and think
| about the actual context you're living in?
| reustle wrote:
| I think calling this art is a stretch, as they usually
| aren't the author.
|
| By automating it, it lowers the barrier to access this type
| of audio content for the masses. If you want to choose to
| pay someone to read something for you, the market allows
| that. This feels like a net gain.
| darkwater wrote:
| If the AI content is good enough, nobody will use it, or
| at least not in the numbers that Audible et similia had
| before. It will just be a tiny minority following their
| principles.
|
| We lived this already with social networks. Initially us
| tech enthusiasts were all like "it will democratize
| access to news, it will democratize producing the news!
| curated work will still be there, it's a net gain". And
| we all saw how it actually developed. As someone on the
| Internet said, I want AI to do my laundry and repetitive
| tasks so I can do art or other more interesting things, I
| don't want AI to do arts and force me to do laundry by
| hand because due to AI taking my job now I don't have
| money to pay for a washing machine.
| haswell wrote:
| > _I think calling this art is a stretch, as they usually
| aren't the author._
|
| I can't even remotely agree.
|
| Narrating a book is absolutely an art. Listen to a book
| narrated by Stephen Fry, and all other books will sound
| awful. Considerable care and craft goes into a well-read
| book.
|
| But this is why I'm actually _excited_ about good TTS
| tools. Not because I want to displace Stephen Fry, but
| because there are so many books read by awful narrators
| and something like ElevenReader would be a huge step up
| in quality.
|
| I share the parent commenter's concerns about the
| displacement of artists, but I'm less convinced that TTS
| tools are a net negative.
| noizejoy wrote:
| > I think calling this art is a stretch, as they usually
| aren't the author.
|
| So I guess in your worldview a concert violinist also
| doesn't make art, when they are playing a Mozart
| composition?
| msh wrote:
| I feel conflicted about this. I somewhat agree with you, but
| on the other hand not needing voice actors is a big help to
| people with disabilities that prevent them from reading.
| Kerbiter wrote:
| Would've been valid if TTS was, indeed, art, but it's not.
| Audiobooks won't be able to replace TTS in e-readers just
| because they need to be produced first. And I don't think my
| mom would be able to find an audiobook of all the Russian
| books, or, especially, articles she's reading, and especially
| synchronise it with the actual book in her reader app.
| vunderba wrote:
| Of all the criticisms leveled against GenAI, I'd say making the
| case against "TTS on-demand" would probably be the weakest.
|
| Having natural sounding TTS enhances accessibility for blind
| users, enables language localizations, etc. It's 100% a win
| even though there will be (and already is) disruption in the VA
| community.
| woadwarrior01 wrote:
| Hasn't this been around for ~4 months? Interesting to see this
| here, since their competitor Zyphra, just released two Apache 2.0
| licensed open weights TTS models yesterday[1].
|
| [1]: https://news.ycombinator.com/item?id=43004589
| crakhamster01 wrote:
| The generative podcasts feature feels so dystopian. I didn't
| realize this SNL skit was based off of a real product lol
|
| https://www.youtube.com/watch?v=ua4rYsMdC4U
| juliendorra wrote:
| You should try it with your own voice! (By first creating a
| custom voice on the web interface. The quick basic clone should
| be enough).
|
| I found that it's my preferred way to use their reader, as it
| makes the reading more neutral and transparent for my brain.
| Klaster_1 wrote:
| Personally, I can't stand my voice when I hear its recording. I
| wish there was a way to easily tune it to sound more like what
| you hear. Maybe even use that adjusted voice during calls.
| layer8 wrote:
| Most people don't like hearing their own voice (how it sounds
| in reality, not in your head).
| leumon wrote:
| Unfortunately the app is not compatible with Android 15.
| jacek wrote:
| I love the idea as I listen to a lot of podcasts and an
| occasional audiobook.
|
| The first impression is not that great. There's nothing natural
| about the voice. While individual words and phrases sound good,
| there's still no decent cadence and intonation. Feels flat and
| robotic.
|
| However, I will definitely experiment some more.
| yapyap wrote:
| yeah, no thanks.
|
| if you are reading for information, I guess if this helps, sure
| go ahead.
|
| when reading for pleasure, this is not it though.
| reustle wrote:
| I've been using this for a few weeks, it works great. Can't wait
| until this is built natively into browsers or even the OS (ios
| voice is currently terrible)
| frontalier wrote:
| ios voice works better than read-aloud from chatgpt does. it
| sucks but doesn't fail after the first paragraph or so
| cube2222 wrote:
| So, I wanted to like this, but frankly the quality isn't
| fantastic.
|
| The text to speech is alright, but it lacks almost any emotion,
| and it reads everything literally, which when the article/pdf has
| a weird layout, or has figures, doesn't sound natural. Though I
| expect they're just not using their top-of-the-line models for
| this - I've had much more luck pushing a pdf through Claude to
| generate the "verbal version" (which is mostly literal, but also
| describes the layout and figures) and then the result through the
| top-of-the-line ElevenLabs model.
|
| Now, I've also checked out the podcast feature, and it's pretty
| clear they first do a textual generation, and then a simple text
| to speech. Again, lack of emotion, very mechanical flow.
|
| I made a podcast of a technical article[0] in both ElevenLabs
| reader and Google's NotebookLM, and the NotebookLM podcast is a
| night-and-day improvement - maybe they use a better model, maybe
| they use straight "article to podcast" end-to-end multimodal
| generation, I don't know, but the quality, flow, emotion, is just
| on a completely different level. I had to quickly turn off the
| ElevenLabs-generated podcast cause I couldn't keep listening to
| it, while NotebookLM's one is legitimately enjoyable.
|
| Now to finish on a more positive note, fingers crossed for the
| ElevenLabs team improving this, and us getting some competition
| in the area of article-to-audio, both podcast-style, and direct!
| I think, in general, it's a very promising product direction.
| Feature-wise, I would also love to get a daily overview podcast
| based on all my RSS feed articles for a given day.
|
| [0]: https://huggingface.co/blog/modernbert
| andrewstuart wrote:
| TTS seemed to take a great leap forward a few years ago and seems
| to have stalled again.
|
| Services are expensive and in most cases the voices are easily
| detectable as not human. I would find it very hard to listen to
| such voices for a long period of time.
|
| Even ElevenLabs voices which seem to be known as the best have
| only a few that are really good quality but even then they're
| very, very far from the capabilities of a human.
| wink wrote:
| > Application error: a client-side exception has occurred (see
| the browser console for more information).
|
| Probably because I have WebGL disabled in this browser. Not
| exactly sure what they're doing with it on the landing page,
| maybe the fluffy effects.
| whazor wrote:
| Is there any technology that can do separate voices for each
| individual person speaking in an audiobook?
| xnx wrote:
| Zonos is a new open weights text-to-speech model that has quality
| at least as good as ElevenLabs:
| https://www.zyphra.com/post/beta-release-of-zonos-v0-1
| pg5 wrote:
| When I type anything in their demo, it replies "I'm sorry but I
| can't I'm sorry but I can't..."
| waynenilsen wrote:
| TTS is increasingly being commoditized.
|
| Kokoro was posted and it works on webgpu, absolutely incredible
| quality for where it can run
|
| https://news.ycombinator.com/item?id=42973769
| knowaveragejoe wrote:
| Kokoro hasn't released their encoder, but they are already
| moving on to a newer model. Hopefully they release that!
| nialv7 wrote:
| Hey, they are using Mamba! Happy to see Mamba is used at least
| somewhere :/
| davidanekstein wrote:
| I use ElevenLabs to narrate tutorials for my app and I'm a happy
| customer thus far.
|
| Here is an example:
| https://youtube.com/shorts/UKjqrydITLA?si=iC7ehp6LmlLH0M-U
| layer8 wrote:
| Does this work for reading articles on websites?
| cooper_ganglia wrote:
| The company I work for has been using ElevenLabs to translate
| hour-long programs into Spanish, French, Portuguese, Greek,
| German, and Chinese. We have a large international audience, so
| it's worked great for this purpose!
|
| Before, we were hiring people to translate, and then hiring
| others to dub the audio. Now, our files are automatically
| translated and spoken in the voice of the actual speaker, and we
| just have a small Quality Control team of native speakers quickly
| verify the results are accurate. We've reduced costs and
| increased the quality of our translated media.
| rickcarlino wrote:
| I wish there was a reader app that was serious about text to
| speech. This is not it, unfortunately. Reader apps need to focus
| on a text to speech experience that is identical to a music player
| so that you can use the app while in hands free situations. The app
| is also hard to use as a "read it later" tool on iOS.
|
| I was really hoping they would fix these issues by now because it
| was promising. This app truly does feel like a portfolio demo app
| for a text to speech engine company rather than an actual reader
| app.
|
| UPDATE: yes, I have actually used the app, no it does not work
| well. See replies for details.
| rickcarlino wrote:
| It's interesting that they show people going on runs and
| driving cars in their demo videos. I'm pretty sure nobody
| developing that app has actually gone on a run or driven a car
| while using their app.
| wrsh07 wrote:
| Wow really? I use it all the time for ~equivalent activities
| rickcarlino wrote:
| How long are the articles you are reading? I'm reading blog
| articles rather than long form content. My queue is in the
| hundreds and the articles vary in length from two minutes
| to 20 minutes. I found it really annoying to need to push
| buttons while driving to skip or auto play the next
| article.
| wrsh07 wrote:
| Yeah, mostly super long form stuff. If it's only 2
| minutes it's faster for me to just read it than to open
| it in their reader
|
| Fwiw, I would use their app way more if it were better.
| Right now I use it for 1-2 long form articles at a time,
| I am sometimes willing to push buttons in order to stay
| focused but will bail out to eg my podcasts app if that
| becomes untenable
| billbrown wrote:
| I find Readwise Reader to be a great RIL tool and I've used
| their TTS on my phone. I can't say I use it enough to know if
| it addresses your needs so I share this as "this might work for
| you." https://docs.readwise.io/reader/docs/faqs/text-to-speech
| jhiggins777 wrote:
| Have you used it? I use it for both hands free and read later.
| When I'm on a webpage I just use the safari share sheet to send
| it to ElevenLabs Reader and then just listen whenever I have
| time.
| rickcarlino wrote:
| Let's say I have 20 articles of two minutes length each. On the
| iOS app, there are no next buttons and it
| does not automatically play the next article. If I am on a
| long drive, or I am running for two hours with my phone in my
| bag, I would need to reach into my bag and open the app every
| time and click the next article. If I don't like the
| article I am listening to, there is no way to skip to the
| next article using integrated controls on a Bluetooth device.
| These features already exist on apps like Pocket.
| Slippery_John wrote:
| Speechify is pretty good. You gotta pay to get the most out of
| it, but I use it enough to justify it. (Mostly for an
| egregiously long serial novel.) Sometimes there's jank, but the
| support and dev teams are super responsive.
| culi wrote:
| I only heard of Eleven today. Downloaded and tried it and I was
| actually shocked by how well it works. It works perfectly with
| my headphones and I can skip forwards or backwards as I want. I
| can change the speed of the voice (tho, that does get a little
| buggy). I just put in a random Aeon article and was shocked how
| quickly it did everything. Even giving me an audio length.
| dyauspitr wrote:
| I don't know- I used the app last night as an audiobook reader
| before going to bed and it had automatic chapter detection, a
| sleep timer and you could even click on a word and it would
| start reading from there. It's pretty solid.
| Kerbiter wrote:
| Would've been great as a TTS component that could be installed
| and used in existing e-readers.
| freefaler wrote:
| Android Moon Reader Pro TTS plugin works OK for me.
| sys32768 wrote:
| I was briefly excited to try this on out-of-print books I find on
| Google Books, but alas the OCR in Acrobat PRO is super glitchy.
|
| I need to find some AI-assist OCR to fix tons of mistakes like
| "186o" for 1860 or "gla)" for glad.
| eigenvalue wrote:
| I made a site like that, fixmydocuments.com
|
| Also check out my open source project for that:
|
| https://github.com/Dicklesworthstone/llm_aided_ocr
| sys32768 wrote:
| Will definitely check it out.
|
| Hyphenated words, page numbers and chapter titles seem to be
| my main issue. I can easily do search and replace on chapter
| titles though.
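
The OCR glitches described above ("186o" for 1860, words hyphenated across line breaks) can partly be handled with plain regular expressions before reaching for an LLM. The following is only a rough sketch: the look-alike map and both function names are assumptions made for illustration, and it does not attempt dictionary-style fixes such as "gla)" for "glad".

```python
import re

# Letters that OCR commonly confuses with digits.
# This map is an assumption and is deliberately incomplete.
LOOKALIKES = str.maketrans({"o": "0", "O": "0", "l": "1", "I": "1", "S": "5"})


def fix_numeric_tokens(text: str) -> str:
    """Replace look-alike letters inside tokens that are mostly digits."""

    def repair(match: re.Match) -> str:
        token = match.group(0)
        digits = sum(ch.isdigit() for ch in token)
        # Only touch tokens where digits are the clear majority,
        # so ordinary words such as "SOS" are left alone.
        if digits * 2 > len(token):
            return token.translate(LOOKALIKES)
        return token

    return re.sub(r"\b[0-9oOlIS]{2,}\b", repair, text)


def join_hyphenated_breaks(text: str) -> str:
    """Rejoin words split across a line break, e.g. 'exam-' + newline + 'ple'.

    Caveat: this also merges genuine compounds that happen to break
    at the hyphen, so the output still needs a proofread.
    """
    return re.sub(r"([a-z])-\n([a-z])", r"\1\2", text)


if __name__ == "__main__":
    sample = "Printed in 186o, this exam-\nple shows both fixes."
    print(join_hyphenated_breaks(fix_numeric_tokens(sample)))
```

Restricting the digit repair to tokens that are already mostly digits keeps it from rewriting ordinary words, at the cost of missing badly damaged numbers.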
| dazzaji wrote:
| I rely on ElevenReader several times a week for quick text to
| voice on snippets of text I'm working on or sometimes on full web
| pages when I hand it a url. It's quick and easy to use and the
| performance and quality is high.
| smoothbenny wrote:
| Tried this app last week w/ an EPUB. It read all of the drop caps
| as individual letters, before moving on to the remaining portion
| of the word. It said "tilde" before each item in an unordered
| list. Too distracting to be of any practical use for me, unless
| there's a setting I missed.
| _qua wrote:
| I recognize and appreciate that this is free right now. But
| surely it won't always be. And I can't keep paying $10-20/mo for
| every individual AI tool.
| berbec wrote:
| I have used Moon+ Reader [1] for years with the built-in Android
| TTS service. It works very well, is free, and sounds good enough
| for me.
|
| 1: https://moondownload.com/
| randysalami wrote:
| I've actually used this extensively for months now since it's
| free and works with PDFs I've downloaded off the internet. I was
| so frustrated with ridiculously overpriced TTS (must pay for
| annual sub! no monthly) when I found this gem.
|
| My main use case is comp. sci and philosophy books. I download
| PDFs of varying quality off the internet onto my phone and import
| them into this app. The text translation is always solid but for
| the former, graphs and diagrams really break it. It's a tricky
| problem because these often are important to the text so skipping
| them (for the app) isn't ideal but the current solution just
| makes the reader goof up. I think it would be cool if the model
| could identify these objects and maybe generate some text
| describing the object and TTSing that. Minor gripe and for the
| latter, it's perfect.
|
| I've probably used this app for 70 reading hours at 1.5x speed
| across long road trips and walking my dog at the park. I've
| gotten through numerous books I wouldn't have and for free. I'm
| happy!
|
| (annoying bug I find often: it seems certain characters or tokens
| just break it and it freezes. I need to manually skip ahead
| hoping it doesn't get stuck again. Really detracts from the hands
| free nature and is difficult to manage while driving)
| flakiness wrote:
| Are there any good papers from which I can learn the recent
| development of TTS tech?
| jdlyga wrote:
| The voices are excellent, but the app needs work. It lost my
| place in a book a few times, so I switched back to VoiceDream
| (don't use VoiceDream, it stinks unless you're a legacy
| purchaser).
| wedn3sday wrote:
| I immediately copy/pasted in some smut to check if it was going
| to lecture me on my moral failings and was pleasantly shocked to
| find a corporate AI model that did what I asked without pushing
| puritanical nonsense on me.
| zoba wrote:
| I've been enjoying this app except I could not find a way to
| export the content to an audio file. I want to send the content
| to others - I'd even take a link to a website with a Play button
| (just not one that forces an app download)
| BeetleB wrote:
| If you want free/ultracheap, the Google Cloud TTS is good enough
| for simple use cases. You get enough free minutes that it may end
| up being free (I think I've paid a cent so far).
|
| Some of their voices sound very artificial, some very real. I've
| been slowly making a list of the good ones.
|
| I use it to convert long articles into audio, and have a script
| to add it to my podcast feed to listen to while driving:
|
| https://blog.nawaz.org/posts/2024/Apr/reading-articles-via-p...
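
The "add it to my podcast feed" step mentioned above could look roughly like the sketch below. It is not the script from the linked post: it assumes the audio file has already been generated and uploaded somewhere, that the feed is a plain RSS 2.0 document, and the function name `add_episode` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def add_episode(rss_xml: str, title: str, audio_url: str,
                size_bytes: int, published: datetime) -> str:
    """Append one <item> with an audio <enclosure> to an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 feed: missing <channel>")

    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    # Using the audio URL as the GUID is an assumption; any stable ID works.
    ET.SubElement(item, "guid").text = audio_url
    # RSS expects RFC 822 style dates, which format_datetime produces.
    ET.SubElement(item, "pubDate").text = format_datetime(published)
    ET.SubElement(item, "enclosure", url=audio_url,
                  length=str(size_bytes), type="audio/mpeg")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    feed = '<rss version="2.0"><channel><title>Articles</title></channel></rss>'
    print(add_episode(feed, "Some long article",
                      "https://example.com/audio/article.mp3", 1234567,
                      datetime(2025, 2, 12, tzinfo=timezone.utc)))
```

Podcast apps generally pick up a new episode from the `<enclosure>` element, so the feed file only needs to be re-uploaded after each call.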
| kvn8888 wrote:
| Chirp (HD) gives you $30 per 1M characters for free on the free
| tier also
| BeetleB wrote:
| I'd have to analyze my usage. For me, having used it for over
| a year cost me a penny. If I can ensure my total cost is less
| than $1/month, I'll consider it if the quality is really
| good. The Google one is "good enough", but not great.
|
| One other feature I'd really like: Having the AI figure out
| who is saying what and use different voices (e.g. one voice
| for overall narrator, and separate voices for each person who
| is quoted in the article).
|
| Not sure if any of the solutions out there do that
| automatically without my guidance.
|
| (Still probably wouldn't pay more than $2/mo for it - I just
| don't use it often enough to justify paying much).
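
A first step toward the per-speaker voices wished for above is simply separating quoted speech from narration, so each segment can be sent to a different TTS voice. The sketch below does only that split, using straight double quotes; deciding who is speaking each quote is the hard part and is not attempted. The function name and the "narrator"/"quote" labels are assumptions for illustration.

```python
import re


def split_narration_and_quotes(text: str) -> list[tuple[str, str]]:
    """Split text into ("narrator", ...) and ("quote", ...) segments.

    Only straight double quotes are recognised; curly quotes and
    nested quotations would need extra handling.
    """
    segments = []
    # The capturing group makes re.split keep the quoted parts.
    for part in re.split(r'("[^"]*")', text):
        part = part.strip()
        if not part:
            continue
        is_quote = len(part) >= 2 and part.startswith('"') and part.endswith('"')
        segments.append(("quote" if is_quote else "narrator", part))
    return segments


if __name__ == "__main__":
    for role, segment in split_narration_and_quotes(
            'She said, "It works." Then he left.'):
        print(role, "->", segment)
```

Each segment could then be synthesized with a voice chosen by its role and the resulting audio clips concatenated in order.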
| kvn8888 wrote:
| The audio quality is amazing. It's transformer based. I use
| it occasionally
| wombatpm wrote:
| You start doing that for text from ebooks and Audible is
| going to want to have words with you.
| ratedgene wrote:
| Honestly, why isn't this same service baked into my OS? the
| reader there is really atrocious, but I imagine even for a single
| voice a pretty small model can be downloaded and made available
| as a plugin for the reader app.
| codybontecou wrote:
| I just wish this had a Chrome Extension so I can listen to
| articles while on my computer.
___________________________________________________________________
(page generated 2025-02-12 23:01 UTC)