hngopher.com

       [HN Gopher] ElevenReader
       ___________________________________________________________________
        
       ElevenReader
        
       Author : mfiguiere
       Score  : 250 points
       Date   : 2025-02-12 06:10 UTC (16 hours ago)
        
 (HTM) web link (elevenreader.io)
 (TXT) w3m dump (elevenreader.io)
        
       | theothertimcook wrote:
       | This is so impressive.
       | 
       | No audiobook exists, drop epub into ElevenReader and have Burt
       | Reynolds read it to you, honestly better than some human
       | narrators.
        
       | emptysongglass wrote:
       | I would never trust the company that acquired Omnivore only to
       | sunset it with 2 weeks notice to retrieve data.
       | 
       | Companies won't stop pulling this garbage unless we stop
       | supporting them.
        
         | echelon wrote:
         | You can fight back by supporting and advocating for open source
         | foundation text to speech models. XTTS, GptSoVits, Tortoise,
         | Zonos, etc.
         | 
         | Open source models drive proprietary foundation models' margin
         | to zero.
         | 
         | The only reason elevenlabs became a unicorn was their margin.
         | If they became a commodity, they'd find themselves in a deep
         | pit.
        
           | qnleigh wrote:
           | Sounds good. Do any of these have iOS or Android apps?
        
         | james-bcn wrote:
         | OMG I didn't realize that had happened. That sucks. Omnivore
         | was great. But now I'm really glad I didn't make it part of my
         | processes.
        
         | agnishom wrote:
         | This is my main gripe with this company
        
         | podgietaru wrote:
         | I want to say, a lot of effort has been made recently to allow
         | you to Self-Host Omnivore. I have done a lot to move it over so
         | that all the features are self-hostable, including rewriting
         | the entire PDF stack. I received a lot of support from the devs
         | doing this too.
         | 
         | I know the decisions of the Dev team were disappointing, but
         | it's also worth pointing out that the site was kept up until
         | around last month - despite the warning stating that it'd be down
         | in November.
         | 
         | Omnivore could have shut down their code base, and prevented
         | self-hosting entirely. I'm glad they didn't.
        
           | letmeinhere wrote:
           | What's the contribution model moving forward? I see the
           | repository is still active, but is it not still under
           | Eleven's control? How will it evolve when they stop accepting
           | pull requests?
        
             | podgietaru wrote:
             | It won't be under Eleven's control, part of the deal I
             | believe. They're allowed to remain opensource. Not folded
             | into ElevenLabs.
             | 
             | As for contribution model, it's still something I'm trying
             | to figure out. For the moment, it was just trying to get a
             | self host build ready and working.
             | 
             | But I have admin rights to the repo, and am not working for
             | ElevenLabs, nor officially Omnivore. I was just a
             | contributor before.
        
       | nmca wrote:
       | I've listened to a few audiobooks on long drives, and have been
       | surprised how hard it is to find good voices on audible. Often a
       | book that might otherwise be good has a prohibitively annoying
       | tone. So honestly the exciting thing here is the customisation.
       | 
       | That said, even in their cherries the emphasis still isn't quite
       | right in the Tolkien example.
        
       | unbecoming wrote:
       | As a first impression, French sounding names should be read as
       | French sounding, even in English text. The voice per se is ok,
       | but as delivery goes (pausing, title vs content), it could be
       | better.
        
       | barrell wrote:
       | Been using eleven labs for several years now. I was really
       | impressed with their multilingual model a few years ago.
       | 
       | Since then, they've released a few cheaper models, but the
       | quality suffers greatly (they still have the old models though so
       | it's not an issue). They've also been releasing a ton of
       | different products around TTS.
       | 
       | I don't mean this as a criticism -- I just am curious why SOTA
       | TTS has not improved from one model by one company several years
       | ago, and why even said company isn't able to improve on that
       | model.
        
         | BoorishBears wrote:
         | The biggest challenge with TTS is high quality voice data. The
         | architectures of closed providers still mostly trace their
         | roots to stuff like Tortoise with a few exceptions.
         | 
         | Which is why it's especially ridiculous ElevenLabs allows
         | professionals to upload their voices, charges users of those
         | voices a _minimum_ of $50 per million characters, likely pays
         | under $1 for the compute... and then passes on a whopping $2
         | back to the professional.
         | 
         | I think the next disruptive TTS competitor is going to form out
         | of just offering to pay better rates than ElevenLabs to their
         | PVC users.
         | 
         | Finetuning established architectures on cleaner synthetic data
         | is already getting open source models increasingly competitive,
         | so getting top PVC samples from the source would likely put you
         | right about where they are today.
        
           | limo11 wrote:
           | Rev share is up to 20% on default rates (depending on notice
           | period). With custom rates they can make their voice more
           | expensive and earn up to $0.2 for every 1000 characters. So
           | you can do the math.
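
A rough sketch of the arithmetic being argued over here, in Python. The
dollar figures are only the ones commenters quote in this thread ($50 per
million characters charged, $2 passed on, custom rates of up to $0.2 per
1000 characters); they are not checked against ElevenLabs' actual terms,
and the function names are made up for illustration.

```python
# Figures are the ones quoted by commenters in this thread; they are
# assumptions for illustration, not verified payout terms.

def payout_share(payout_per_million: float, price_per_million: float) -> float:
    """Fraction of what the user pays that reaches the voice owner."""
    return payout_per_million / price_per_million


def per_thousand_to_per_million(rate_per_thousand: float) -> float:
    """Convert a per-1000-character rate to a per-million-character rate."""
    return rate_per_thousand * 1000


# BoorishBears: at least $50 per million characters charged, $2 passed on.
claimed_share = payout_share(2.0, 50.0)

# limo11: custom rates can earn up to $0.2 per 1000 characters.
custom_cap = per_thousand_to_per_million(0.2)

print(f"share on the quoted default figures: {claimed_share:.0%}")
print(f"custom-rate ceiling per million characters: ${custom_cap:.2f}")
```

The two comments use different units (per million versus per 1000
characters), so converting to a common base is most of "the math".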
        
             | BoorishBears wrote:
             | The math is you're paying a pittance considering the insane
             | margins involved and the fact you're using their voices in
             | a flywheel that's actively obsoleting them.
             | 
             | Edit: And since you're concerned we might not be aware of
             | Elevenlabs' generous terms... why is your documentation so
             | cagey about them? https://elevenlabs.io/docs/product-
             | guides/voices/payouts#thi...
             | 
             | I see users need to keep paying you a subscription fee in
             | order to even get their payouts... but "up to 20%" isn't
             | saying particularly much without the kind of details that
             | should probably be on that page.
             | 
             | -
             | 
             | Considering how much your company owes to an open source
             | model, it's also impressive how little you've returned to
             | the commons.
             | 
             | But no worries, the top comment under this post is an open
             | source model that was finetuned for a couple of thousand
             | dollars by a single dude soliciting the public for random
             | voice samples.
             | 
             | If Google has no moat, you're out to sea.
        
           | brookst wrote:
           | Why would you pay more than necessary to attract the voice
           | talent you need? There aren't (m)any businesses that pay
           | multiples of market rates just to be nice.
        
             | BoorishBears wrote:
             | There are plenty of businesses that assign integrity a non-
             | zero value, because most businesses reflect people.
             | 
             | Maybe you're in a bubble devoid of that kind of thinking,
             | so it seems very foreign or quaint.
             | 
             | Even then it's short-sighted thinking at best: the "market
             | rate" is not some magic self-optimizing number.
             | 
             | Underpaying their creators is just creating the opportunity
             | for someone to take the best of them on better terms.
             | 
             | -
             | 
             | Elevenlabs is also able to raise trivially in this
             | environment: you'd think while they're still floating out
             | here without a moat other than high quality data, they'd
             | _overpay_ if anything and make narrators feel like royalty
             | until they're replaced.
             | 
             | This isn't unlike Uber initially paying drivers massive
             | bonuses and undercharging riders until they were able to
             | leverage their massive network to increase prices past what
             | the taxi providers they had decimated were charging. But in
             | this case the marginal cost of providing the service is so
             | low they don't even have to lose money to run a similar
             | play, just take less of it. (in other words, even ruthless
             | greed is not antithetical to paying these folks better)
        
       | bjackman wrote:
       | Really glad these products are appearing!
       | 
       | So much of my time for "reading" is in a context where I can't
       | physically read, so audiobooks are incredibly useful. But being
       | limited to the set of books that gets recorded by the publisher
       | is a real shame.
       | 
       | Haven't tried it yet but AI TTS seems basically perfect now so
       | I'm very optimistic this will work great.
        
         | VierScar wrote:
         | I'm interested for this reason too, even listened to AI TTS
         | books before, but the issue is that they are very monotonous.
         | The tone almost never changes, nor the pacing, it's all
         | delivered with almost no variation which makes listening dull
         | and easy to lose focus
        
           | rapind wrote:
           | I recommend John Doe if using eleven labs. Maybe too much
           | variation, but I like it.
        
       | milofeynman wrote:
       | This raises an interesting question around the rights of the
       | author/publisher and who they sold their ebook rights to. If in 3
       | years we have a perfect AI voice that can read any book as good
       | or better than mid-level narrators, why would you ever buy an
       | audiobook when you could just buy the ebook and pick your
       | voice(s). What a time to be alive
        
         | evrenesat wrote:
         | The rise of streaming has made CDs and other offline media
         | obsolete and publishing rights for them largely irrelevant.
         | Audiobooks are likely to face a similar demise. One by one, all
         | the frictions, I mean the colours of life, are fading away,
         | sacrificed for the sake of convenience.
         | 
         | Edit: I think the effect of the invention of vinyl on live
         | performers is more akin to how the commoditisation of HQ TTS
         | will be detrimental to audiobook narrators.
        
         | wiether wrote:
         | I guess it's the same with other jobs: AI will replace the
         | mid/low quality workers, but the good ones will keep delivering
         | something AI can't.
         | 
         | Two audiobooks that come to my mind:
         | 
         | - The Lord of the Rings series read by Andy Serkis; not only
         | does he perfectly switch between each character's voice, but
         | the feeling of listening to "Gollum" for hours is something
         | else altogether
         | 
         | - David Goggins' books; the audiobook version is completely
         | different than the book, since he's not just reading the book,
         | and overall it makes the content easier to digest
        
         | vunderba wrote:
         | I don't know if you remember but some of the earlier Kindles
         | had both speakers and TTS built in but were sadly pressured to
         | remove the feature.
         | 
         | https://chasingperfection.co.uk/post/2013/01/14/text-to-spee...
        
       | jnsaff2 wrote:
       | It seems that this is using one of the less refined models. In
       | English it sounds like a 4th grader reading in front of a class.
       | Kinda stilted word by word voicing with static pauses between
       | words and no variation in intonation. Tried with two voices and
       | both are the same.
        
         | stavros wrote:
         | Well, you get what you pay for...
        
         | ipsum2 wrote:
         | I use ElevenReader on a weekly basis, and it sounds fine.
         | Definitely not what you describe.
        
       | csantini wrote:
       | You can get pretty close with open source software:
       | 
       | https://claudio.uk/posts/audiblez-v4.html
        
         | rapind wrote:
         | Oh wow. Thanks for posting! Samples sound great (on par with
         | eleven by my untrained ear). Will definitely use this.
        
         | neom wrote:
         | How does it hold up on long stuff? I use Elevenlabs Studio
         | daily and once things start to get into the chapters long, the
         | voice can really start to go off the rails. I'd say they've
         | solved a lot of this over the past 2/3 months, but it does
         | still happen on long stuff.
        
           | masteruvpuppetz wrote:
           | >> the voice can really start to go off the rails.
           | 
           | Do you mean the AI gets tired?
        
             | zaptrem wrote:
             | In autoregressive models error accumulates over time. He
             | likely means the voice starts to make odd sounds/gets lower
             | quality. It would be really interesting if OP could share a
             | clip of this phenomenon!
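
A toy illustration of the error-accumulation point, in Python. This is
not a TTS model: it is a seeded random walk, assumed here only as a
stand-in for "each step builds on the previous output", and the names
and noise level are invented for the sketch.

```python
import random


def drift_after(steps: int, noise: float, rng: random.Random) -> float:
    """Accumulate one small random error per step, as a self-conditioned
    process would, and return how far the result has drifted."""
    error = 0.0
    for _ in range(steps):
        error += rng.gauss(0.0, noise)  # each step inherits earlier error
    return abs(error)


def mean_drift(steps: int, noise: float = 0.01,
               trials: int = 500, seed: int = 0) -> float:
    """Average drift over many independent runs of the same length."""
    rng = random.Random(seed)
    return sum(drift_after(steps, noise, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(mean_drift(n), 4))
```

In this toy the average drift grows with the number of steps, which is
the intuition behind long generations being more likely to wander.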
        
               | neom wrote:
               | Various different things can happen, it would take me
               | quite some time to dig up examples but at least with
               | elevenlabs you don't get the clicks and pops you get like
               | on notebook LM for example. 11labs instability comes in
               | the forms of intonation, pitch, accent, garbled words or
               | even, once, language. I've only seen it happen in the 3k+
               | word gens I've done, usually around the 75% point of the
               | narration of whatever I've converted, and on average
               | lasting a couple of seconds tops.
        
               | wrsh07 wrote:
               | Yeah - I've experienced this with eleven reader (I don't
               | think you can gen text this long anymore using the reader
               | app, lol) but switching voices fixed it for me
               | 
               | I can go back and try to repro and get a recording....
        
           | csantini wrote:
           | It holds up well, because Audiblez uses sentence splitting
           | (via Spacy models) before audio synthesis
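
A minimal sketch of the split-then-synthesize idea, in Python. It is not
Audiblez's code: a simple regex stands in for the spaCy sentence
segmentation mentioned above, and `synthesize` is a hypothetical
placeholder for whatever TTS call is actually used.

```python
import re

# Stand-in splitter (assumption): break on whitespace that follows ., ! or ?.
# A real pipeline would use a proper sentence segmenter such as spaCy's.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Return the non-empty sentences found in text."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def synthesize(sentence: str) -> bytes:
    """Hypothetical placeholder for a TTS call; returns fake audio bytes."""
    return sentence.encode("utf-8")


def narrate(text: str) -> list[bytes]:
    """Synthesize each sentence on its own, so a glitch in one clip
    has no way to carry over into the next."""
    return [synthesize(s) for s in split_sentences(text)]


if __name__ == "__main__":
    clips = narrate("Hello there. How are you? Fine!")
    print(len(clips), "clips")
```

The clips would then be concatenated in order; keeping each synthesis
call short is what limits how far any one generation can wander.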
        
         | ultrasounder wrote:
         | Bravo!
        
         | simongray wrote:
         | Oh no, it doesn't run on Apple Silicon. That's too bad.
        
           | _joel wrote:
           | > On my M2 MacBook Pro, on CPU, it takes about 1 hour, at a
           | rate of about 60 characters per second.
           | 
           | Umm, it does.
        
             | simongray wrote:
             | My bad. I misread the official website:
             | 
             | > We don't currently support Apple Silicon, as there is not
             | yet a Kokoro implementation in MLX. As soon as it will be
             | available, we will support it.
             | 
             | I thought that meant that it didn't support Apple Silicon
             | in general, but they were just talking about GPU support.
        
               | fl0id wrote:
               | though they wouldn't need to use MLX, could also use
               | pytorch etc
        
           | csantini wrote:
           | It works on Apple Silicon, but it doesn't use the GPU.
           | Because Kokoro has not been implemented yet in MLX
        
             | simongray wrote:
             | Ah my bad! I just read the "We don't currently support
             | Apple Silicon" on the official website, but I didn't
             | realise that only pertains to GPU support.
        
           | eamag wrote:
           | I wrote about a similar model for MLX that can run on
           | Apple silicon https://eamag.me/2025/Voice-Cloning
        
             | csantini wrote:
             | Hi eamag, this sounds great! I'm gonna try to add it to
             | Audiblez
        
           | mhuffman wrote:
           | >Oh no, it doesn't run on Apple Silicon. That's too bad.
           | 
           | Interesting, because the hero image is a Mac App screenshot.
        
         | tonyhart7 wrote:
         | good, now how I can use this on mobile??
        
           | csantini wrote:
           | Generate the audiobook on a laptop and then listen to it on
           | mobile
        
             | tonyhart7 wrote:
             | this is the easy way, but I want the hard way
        
         | laurentlb wrote:
         | Interesting! This uses the Kokoro-82M model, which has a pretty
         | good quality, but the set of languages is still quite limited.
        
         | anonymous344 wrote:
         | does this run on linux machine also?
        
           | nkmnz wrote:
           | third line on the page right below the first image says: >
           | Audiblez 4.2 running on MacOSX via wxWidgets. Linux and
           | Windows are supported too
        
       | __rito__ wrote:
       | Is there a pricing page? I am not seeing any.
        
         | jampekka wrote:
         | In the FAQ:
         | 
         | > Is the app free?
         | 
         | > Yes. The app is completely free to download and use today.
         | Listening to content on the app will not consume credits from
         | your monthly web plan. We do plan to eventually launch some
         | premium version of the app, but even then we will maintain a
         | generous free plan.
        
       | mkmk3 wrote:
       | Damn, tried a unicornriot article [1] and it just skipped several
       | paragraphs past the grisly stuff.
       | 
       | Can anyone else confirm?
       | 
       | [1] - https://unicornriot.ninja/2024/sextortion-coms-inside-a-
       | vile...
        
         | ravetcofx wrote:
         | Important article but horrific content. It seems to read it all
         | for me.
        
           | mkmk3 wrote:
           | For sure, I saw where it was skipping and I wouldn't have
           | been surprised if it were intentional, but good to disprove.
           | Thanks for checking, have a good day
        
         | limo11 wrote:
         | Did you have some iconic voice selected? It skipped most likely
         | due to inappropriate content. You can try with some non iconic
         | voice
        
           | mkmk3 wrote:
           | Wasn't using an iconic voice but it does seem to be voice
           | specific, good call
        
       | macco wrote:
       | How is the quality compared to speechify?
       | 
       | I use it to listen to PDFs. It works, but has plenty of hiccups
       | with headers, footers and colons.
        
         | limo11 wrote:
         | Way better
        
       | sky2224 wrote:
       | The video shows scenarios of people listening to pdfs of pretty
       | dense material (e.g., computer science, bio mechanics).
       | 
       | Does anyone here actually have positive results doing this? It
       | seems to me listening to anything that's even remotely complex
       | with the intent of learning it just isn't something that's
       | feasible.
        
         | nice__two wrote:
         | That's my biggest gripe with audiobooks: good for fiction, not
         | so good for learning.
        
           | yreg wrote:
           | For me they are actually best for non-fiction, but it has to
           | be books. Papers are too information dense.
           | 
           | I get easily distracted and lose attention while listening to
           | an audiobook. This is usually problematic with fiction,
           | because suddenly I don't know who this new character is or
           | what's happening. And rewinding to the precise position where
           | I stopped paying attention is of course much more difficult
           | than in written text.
           | 
           | I found that non-fiction books work great for me, because
           | even if you ignore a page or two it makes no difference, the
           | author keeps repeating their point and propping it up with
           | many arguments anyway.
        
         | woodson wrote:
         | I used to have papers read to me via TTS when I had a long
         | commute. This was before the current crop of neural TTS, mind
         | you, so the quality and naturalness wasn't as good, but it was
         | good enough to tolerate and to get the gist of a paper. It
         | failed terribly on equations, of course, but that's often not
         | too important on the first reading.
        
         | qnleigh wrote:
         | It depends a lot on the paper. I've been using a TTS app to
         | read papers for years. Papers that are really equation dense,
         | convey the key ideas in figures or get too detailed aren't
         | listenable. But sometimes review articles or papers with one
         | clear message hit that sweet spot and are very listenable.
         | There's one topic where everything I know about it I learned by
         | listening to a review article on a long run. It was actually
         | quite pleasant!
        
         | neom wrote:
         | Severe dyslexia here, but ask me about any conversation or
         | audio book or class I've listened to. Gimme anything audio and
         | gimme it at 1.5x plz! I spend so much money gen'ing audio these
         | days but it's soooo nice to be able to learn so quickly now.
        
       | b33f wrote:
       | Is this streaming server-side audio or is the TTS running locally
       | on device ? Can it work offline ?
        
         | yawnxyz wrote:
         | all server-side
         | 
         | you could build your local TTS using kokoro browser though --
         | https://huggingface.co/spaces/webml-community/kokoro-webgpu
        
       | t0lo wrote:
       | This is definitely the future, I'm worried about the electric
       | slip and slide world we're heading into though, where everything
       | is completely spoonfed and consumptive. I can't help but think
       | we're heading back into animalism.
        
         | falcor84 wrote:
         | > heading back into animalism
         | 
         | Could you expand upon this? Any milestones towards that which
         | we should be mindful of?
        
           | t0lo wrote:
           | Technology is pretty quickly and apparently not only coming
           | for our critical thinking, but our agency
           | 
           | With llms, "knowing things" is already starting to feel like
           | a thing of the past, not to me, but to a lot of others, there's
           | no longer an incentive to "switch on".
           | 
           | Why should a kid learn anything if a robot is instantly
           | better at everything? Maths got replaced by calculators, deep
           | critical thinking will get replaced by llms a lot of the
           | time, which are word calculators, which is the closest thing
           | we have to a logic calculator.
           | 
           | This is more passive autopilot software, which further
           | promotes learning as something you 'consume' rather than
           | something you seek.
           | 
           | The public consciousness has absolutely taken a September 11
           | tier nosedive since social media, we're approaching what I
           | term cultural schizophrenia, which I posted about on my blog
           | which I deleted, but I've readded it if you're interested
           | [https://substack.com/home/post/p-156983317]. There's no
           | contextualisers in the media to give the right emphasis to
           | the right things.
           | 
           | This is just my perspective, from what I've seen from other
           | younger people of my age. We are heading into extremely
           | interesting times, every profoundly destabilising thing
           | we've speculated about is happening at the exact same time.
           | We desperately need visionaries in politics.
           | 
           | Basically I'm not doing too hot
        
             | falcor84 wrote:
             | Some good observations there, but I'm still unclear on why
             | you used the term "animalism" - none of that seems to me at
             | all similar to how other species engage with the world.
        
             | brookst wrote:
             | I'm sorry you're stressed, but please at least consider
             | that you may be falling into the generational "kids these
             | days" trap. I'm old, so I have lived through the world
             | being on the brink of disasters caused by AI, social media,
             | gay marriage, violent video games, the internet in general,
             | cell phones, pagers, nuclear weapons, television. Probably
             | a bunch more world-ending crises I forgot.
             | 
             | The world is changing, but then again it always has been.
             | IMO some things will get better, some will get worse, but
             | the overall arc of human health and prosperity will
             | continue upwards. There is less poverty, less starvation,
             | more opportunity today than ever... even though some
             | aspects of the world are bad and getting worse. That's the
             | way it's always been.
        
       | Kabukks wrote:
       | Last time I tried Elevenlabs for German text, it got a lot of
       | numbers and dates wrong.
       | 
       | E.g. saying "1963" when the actual year in the text was 1967.
       | Yeah, the voices sound very realistic. But I'm not sure how
       | useful that is if you can't trust the spoken words.
       | 
       | Does anyone know if it got better in the last weeks?
        
         | aeroniero wrote:
         | Yes, it's better now, at least on the Reader app that I've
         | tried.
        
       | jeswin wrote:
       | The ad shows someone listening to an article or a story while
       | driving a large vehicle - this is unsafe (depending on the
       | individual). It's not like listening to music.
        
         | yreg wrote:
         | I'm curious, is there evidence it is unsafe?
        
           | jeswin wrote:
           | I can listen to a song while coding. I can't listen to a
           | podcast while coding. A podcast demands way more attention
           | than a song.
           | 
           | 1.2 million people die in road accidents, and most of them
           | are children and young people. Even more are seriously
           | injured.
        
             | cess11 wrote:
             | Are you saying you can't drive a car if passengers are
             | talking?
             | 
             | If that's the case, maybe a driver's license isn't your
             | thing?
        
             | yreg wrote:
             | That's a hypothesis but not evidence. I can present a
             | counter-hypothesis: I fall asleep while listening to music
             | (or staying in silence). Listening to spoken word keeps me
             | awake or at least helps me notice that I'm getting tired.
        
             | brokensegue wrote:
             | People listen to talk radio while driving all the time
        
             | vunderba wrote:
             | If there were any substantial evidence of this, they would
             | have shut down the entire A.M. spectrum 50 years ago.
        
       | mozzieman wrote:
       | The best I've heard, but still too monotone over time compared to
       | real productions. I felt blown away at first, but listening to a
       | chapter or two gets difficult. Just a matter of time, most likely,
       | until it becomes as good as or better than the real thing.
        
       | benrutter wrote:
       | I've been looking for a good and convenient way to read papers
       | that are published in PDF for a while.
       | 
       | Ideally, I'd be able to strip out the text content and send it to
       | my kindle in readable form. Since apparently that's science
       | fiction, this looks like a really good plan B! Will definitely
       | give it a go.
        
         | elashri wrote:
         | You can jailbreak your Kindle [1] and install KOReader [2] and
         | this will allow you to do this science fiction.
         | 
         | [1] https://kindlemodding.org/jailbreaking/WinterBreak/
         | 
         | [2] https://koreader.rocks/
        
         | janpmz wrote:
         | You can try https://www.pdftomp3.com/ as well.
        
         | billbrown wrote:
         | Readwise Reader does PDFs very well (and apparently can do TTS
         | on them, but I've never tried that).
         | https://docs.readwise.io/reader/docs/faqs/text-to-speech
        
       | hiAndrewQuinn wrote:
       | This is excellent. I just tested the Finnish voices on my simple
       | news archive [1], and the pronunciation was quite good and clear.
       | 
       | It's unfortunate that I can't export audio clips locally;
       | otherwise I would immediately look into using this for generating
       | my Finnish flashcard decks from the same material [2]. I've
       | thought about doing the same with the audio and video feeds
       | included with this news broadcast, but getting Whisper to sync up
       | properly with what's written down and cutting up the raw audio in
       | that way still seems like more effort than I'm willing to invest
       | right now.
       | 
       | [1]: https://hiandrewquinn.github.io/selkouutiset-archive/
       | 
       | [2]: https://github.com/Selkouutiset-Archive/selkokortti
        
         | gwd wrote:
         | > It's unfortunate that I can't export audio clips locally;
         | otherwise I would immediately look into using this for
         | generating my Finnish flashcard decks from the same material
         | [2].
         | 
         | elevenlabs has an API which seemed quite reasonable when I
         | looked into it. A bit of python should get you what you want
         | pretty quickly.
        
           | hiAndrewQuinn wrote:
           | Oh! I'll look into that, thanks.
        
       | darkwater wrote:
       | I know I'm growing old but this is the kind of tech application
       | that I don't like. Arts should be the last thing to be 100% fully
       | done by a program. Enhancing capabilities in artists? Hell yeah.
       | Replacing completely voice actors? No, thanks.
        
         | ramonverse wrote:
         | AI voice is literally the only way I have to "read" an obscure
         | article during 1h non-static commutes.
        
           | darkwater wrote:
           | I understand. It can do things that weren't previously
           | possible, but it will also replace things that were done by
           | humans, by artists before. Overall, in my opinion, it is
           | still a loss.
        
             | nathanyukai wrote:
             | "replace things that were done by humans" isn't a loss by
             | itself, if it frees up human labour to do other things. If
             | humans replaced by AI can't find better things to do, such
             | that it makes them poorer, or anti-social it's a loss but
             | not necessarily AI's fault.
        
               | Martinussen wrote:
               | Doesn't apply to all situations, but "replace things that
               | were done by humans" in _arts_ can absolutely be a loss
               | by itself. Making graphics/speech/video a commodity
               | doesn't replace designers, voice actors, or directors,
               | but we've definitely seen it can directly harm them and
               | the people that enjoy their work.
               | 
               | > can't find better things to do, such that it makes them
               | poorer, or anti-social it's a loss
               | 
               | I feel like this misses the point a bit - lost
               | income/sustainability for artists is obviously a big
               | issue we'll be facing, but looking for a performance
               | indicator in an artistic endeavour doesn't really get you
               | anywhere. There's more ways to value a painting than
               | "what the market would pay" and "potential heat output as
               | firewood", right?
        
               | brookst wrote:
               | How do you feel about what word processors did to the
               | typist career?
        
               | add-sub-mul-div wrote:
               | How do you feel about replacing general labor, period,
               | and doing so for a class that no longer maintains a
               | semblance of a social safety net? Do you think there's a
               | difference between displacing one profession and
               | displacing most professions at once?
               | 
               | Do you people ever step out of the abstract and think
               | about the actual context you're living in?
        
             | reustle wrote:
             | I think calling this art is a stretch, as they usually
             | aren't the author.
             | 
             | By automating it, it lowers the barrier to access this type
             | of audio content for the masses. If you want to choose to
             | pay someone to read something for you, the market allows
             | that. This feels like a net gain.
        
               | darkwater wrote:
               | If the AI content is good enough, nobody will use it, or
               | at least not in the numbers that Audible et similia had
               | before. It will just be a tiny minority following their
               | principles.
               | 
               | We lived this already with social networks. Initially us
               | tech enthusiasts were all like "it will democratize
               | access to news, it will democratize producing the news!
               | curated work will still be there, it's a net gain". And
               | we all saw how it actually developed. As someone on the
               | Internet said, I want AI to do my laundry and repetitive
               | tasks so I can do art or other more interesting things, I
               | don't want AI to do arts and force me to do laundry by
               | hand because due to AI taking my job now I don't have
               | money to pay for a washing machine.
        
               | haswell wrote:
               | > _I think calling this art is a stretch, as they usually
               | aren't the author._
               | 
               | I can't even remotely agree.
               | 
               | Narrating a book is absolutely an art. Listen to a book
               | narrated by Stephen Fry, and all other books will sound
               | awful. Considerable care and craft goes into a well-read
               | book.
               | 
               | But this is why I'm actually _excited_ about good TTS
               | tools. Not because I want to displace Stephen Fry, but
               | because there are so many books read by awful narrators
               | and something like ElevenReader would be a huge step up
               | in quality.
               | 
               | I share the parent commenter's concerns about the
               | displacement of artists, but I'm less convinced that TTS
               | tools are a net negative.
        
               | noizejoy wrote:
               | > I think calling this art is a stretch, as they usually
               | aren't the author.
               | 
               | So I guess in your worldview a concert violinist also
               | doesn't make art, when they are playing a Mozart
               | composition?
        
         | msh wrote:
         | I feel conflicted about this. I somewhat agree with you, but on
         | the other hand not needing voice actors is a big help to people
         | with disabilities that prevent them from reading.
        
         | Kerbiter wrote:
         | Would've been valid if TTS was, indeed, art, but it's not.
         | Audiobooks won't be able to replace TTS in e-readers just
         | because they need to be produced first. And I don't think my
         | mom would be able to find an audiobook of all the Russian
         | books, or, especially, articles she's reading, and especially
         | synchronise it with the actual book in her reader app.
        
         | vunderba wrote:
         | Of all the criticisms leveled against GenAI, I'd say making the
         | case against "TTS on-demand" would probably be the weakest.
         | 
         | Having natural sounding TTS enhances accessibility for blind
         | users, enables language localizations, etc. It's 100% a win
         | even though there will be (and already is) disruption in the VA
         | community.
        
       | woadwarrior01 wrote:
       | Hasn't this been around for ~4 months? Interesting to see this
       | here, since their competitor Zyphra just released two Apache 2.0
       | licensed open weights TTS models yesterday[1].
       | 
       | [1]: https://news.ycombinator.com/item?id=43004589
        
       | crakhamster01 wrote:
       | The generative podcasts feature feels so dystopian. I didn't
       | realize this SNL skit was based off of a real product lol
       | 
       | https://www.youtube.com/watch?v=ua4rYsMdC4U
        
       | juliendorra wrote:
       | You should try it with your own voice! (By first creating a
       | custom voice on the web interface. The quick basic clone should
       | be enough).
       | 
       | I found that it's my preferred way to use their reader, as it
       | makes the reading more neutral and transparent for my brain.
        
         | Klaster_1 wrote:
         | Personally, I can't stand my voice when I hear its recording. I
         | wish there was a way to easily tune it to sound more like what
         | you hear. Maybe even use that adjusted voice during calls.
        
         | layer8 wrote:
         | Most people don't like hearing their own voice (how it sounds
         | in reality, not in your head).
        
       | leumon wrote:
       | Unfortunately the app is not compatible with Android 15.
        
       | jacek wrote:
       | I love the idea as I listen to a lot of podcasts and an
       | occasional audiobook.
       | 
       | The first impression is not that great. There's nothing natural
       | about the voice. While individual words and phrases sound good,
       | there's still no decent cadence and intonation. Feels flat and
       | robotic.
       | 
       | However, I will definitely experiment some more.
        
       | yapyap wrote:
       | yeah, no thanks.
       | 
       | if you are reading for information, I guess if this helps, sure
       | go ahead.
       | 
       | when reading for pleasure, this is not it though.
        
       | reustle wrote:
       | I've been using this for a few weeks, it works great. Can't wait
       | until this is built natively into browsers or even the OS (ios
       | voice is currently terrible)
        
         | frontalier wrote:
         | ios voice works better than read-aloud from chatgpt does. it
         | sucks but doesn't fail after the first paragraph or so
        
       | cube2222 wrote:
       | So, I wanted to like this, but frankly the quality isn't
       | fantastic.
       | 
       | The text to speech is alright, but it lacks almost any emotion,
       | and it reads everything literally, which when the article/pdf has
       | a weird layout, or has figures, doesn't sound natural. Though I
       | expect they're just not using their top-of-the-line models for
       | this - I've had much more luck pushing a pdf through Claude to
       | generate the "verbal version" (which is mostly literal, but also
       | describes the layout and figures) and then the result through the
       | top-of-the-line ElevenLabs model.
       | 
       | Now, I've also checked out the podcast feature, and it's pretty
       | clear they first do a textual generation, and then a simple text
       | to speech. Again, lack of emotion, very mechanical flow.
       | 
       | I made a podcast of a technical article[0] in both ElevenLabs
       | reader and Google's NotebookLM, and the NotebookLM podcast is a
       | night-and-day improvement - maybe they use a better model, maybe
       | they use straight "article to podcast" end-to-end multimodal
       | generation, I don't know, but the quality, flow, emotion, is just
       | on a completely different level. I had to quickly turn off the
       | ElevenLabs-generated podcast cause I couldn't keep listening to
       | it, while NotebookLM's one is legitimately enjoyable.
       | 
       | Now to finish on a more positive note, fingers crossed for the
       | ElevenLabs team improving this, and us getting some competition
       | in the area of article-to-audio, both podcast-style, and direct!
       | I think, in general, it's a very promising product direction.
       | Feature-wise, I would also love to get a daily overview podcast
       | based on all my RSS feed articles for a given day.
       | 
       | [0]: https://huggingface.co/blog/modernbert
        
       | andrewstuart wrote:
       | TTS seemed to take a great leap forward a few years ago and seems
       | to have stalled again.
       | 
       | Services are expensive and in most cases the voices are easily
       | detectable as not human. I would find it very hard to listen to
       | such voices for a long period of time.
       | 
       | Even ElevenLabs voices which seem to be known as the best have
       | only a few that are really good quality but even then they're
       | very, very far from the capabilities of a human.
        
       | wink wrote:
       | > Application error: a client-side exception has occurred (see
       | the browser console for more information).
       | 
       | Probably because I have WebGL disabled in this browser. Not
       | exactly sure what they're doing with it on the landing page,
       | maybe the fluffy effects.
        
       | whazor wrote:
       | Is there any technology that can do separate voices for each
       | individual person speaking in an audiobook?
        
       | xnx wrote:
       | Zonos is a new open weights text-to-speech model that has quality
       | at least as good as ElevenLabs:
       | https://www.zyphra.com/post/beta-release-of-zonos-v0-1
        
         | pg5 wrote:
         | When I type anything in their demo, it replies "I'm sorry but I
         | can't I'm sorry but I can't..."
        
         | waynenilsen wrote:
         | TTS is increasingly being commoditized.
         | 
         | Kokoro was posted and it works on webgpu, absolutely incredible
         | quality for where it can run
         | 
         | https://news.ycombinator.com/item?id=42973769
        
           | knowaveragejoe wrote:
           | Kokoro hasn't released their encoder, but they are already
           | moving on to a newer model. Hopefully they release that!
        
         | nialv7 wrote:
         | Hey, they are using Mamba! Happy to see Mamba is used at least
         | somewhere :/
        
       | davidanekstein wrote:
       | I use ElevenLabs to narrate tutorials for my app and I'm a happy
       | customer thus far.
       | 
       | Here is an example:
       | https://youtube.com/shorts/UKjqrydITLA?si=iC7ehp6LmlLH0M-U
        
       | layer8 wrote:
       | Does this work for reading articles on websites?
        
       | cooper_ganglia wrote:
       | The company I work for has been using ElevenLabs to translate
       | hour-long programs into Spanish, French, Portuguese, Greek,
       | German, and Chinese. We have a large international audience, so
       | it's worked great for this purpose!
       | 
       | Before, we were hiring people to translate, and then hiring
       | others to dub the audio. Now, our files are automatically
       | translated and spoken in the voice of the actual speaker, and we
       | just have a small Quality Control team of native speakers quickly
       | verify the results are accurate. We've reduced costs and
       | increased the quality of our translated media.
        
       | rickcarlino wrote:
       | I wish there was a reader app that was serious about text to
       | speech. This is not it, unfortunately. Reader apps need to focus on a
       | text to speech experience that is identical to a music player so
       | that you can use the app while in hands free situations. The app
       | is also hard to use as a "read it later" tool on iOS.
       | 
       | I was really hoping they would fix these issues by now because it
       | was promising. This app truly does feel like a portfolio demo app
       | for a text to speech engine company rather than an actual reader
       | app.
       | 
       | UPDATE: yes, I have actually used the app, no it does not work
       | well. See replies for details.
        
         | rickcarlino wrote:
         | It's interesting that they show people going on runs and
         | driving cars in their demo videos. I'm pretty sure nobody
         | developing that app has actually gone on a run or driven a car
         | while using their app.
        
           | wrsh07 wrote:
           | Wow really? I use it all the time for ~equivalent activities
        
             | rickcarlino wrote:
             | How long are the articles you are reading? I'm reading blog
             | articles rather than long form content. My queue is in the
             | hundreds and the articles vary in length from two minutes
             | to 20 minutes. I found it really annoying to need to push
             | buttons while driving to skip or auto play the next
             | article.
        
               | wrsh07 wrote:
               | Yeah, mostly super long form stuff. If it's only 2
               | minutes it's faster for me to just read it than to open
               | it in their reader
               | 
               | Fwiw, I would use their app way more if it were better.
               | Right now I use it for 1-2 long form articles at a time,
               | I am sometimes willing to push buttons in order to stay
               | focused but will bail out to eg my podcasts app if that
               | becomes untenable
        
         | billbrown wrote:
         | I find Readwise Reader to be a great RIL tool and I've used
         | their TTS on my phone. I can't say I use it enough to know if
         | it addresses your needs so I share this as "this might work for
         | you." https://docs.readwise.io/reader/docs/faqs/text-to-speech
        
         | jhiggins777 wrote:
         | Have you used it? I use it for both hands free and read later.
         | When I'm on a webpage I just use the safari share sheet to send
         | it to ElevenLabs Reader and then just listen whenever I have
         | time.
        
           | rickcarlino wrote:
           | Let's say I have 20 articles of two minutes each. On the iOS
           | app, there are no next buttons and it does not automatically
           | play the next article. If I am on a long drive, or I am
           | running for two hours with my phone in my bag, I would need to
           | reach into my bag and open the app every time and click the
           | next article. If I don't like the article I am listening to,
           | there is no way to skip to the next article using integrated
           | controls on a Bluetooth device. These features already exist
           | on apps like Pocket.
        
         | Slippery_John wrote:
         | Speechify is pretty good. You gotta pay to get the most out of
         | it, but I use it enough to justify it. (Mostly for an
         | egregiously long serial novel.) Sometimes there's jank, but the
         | support and dev teams are super responsive.
        
         | culi wrote:
         | I only heard of Eleven today. Downloaded and tried it and I was
         | actually shocked by how well it works. It works perfectly with
         | my headphones and I can skip forwards or backwards as I want. I
         | can change the speed of the voice (tho, that does get a little
         | buggy). I just put in a random Aeon article and was shocked how
         | quickly it did everything. Even giving me an audio length
        
         | dyauspitr wrote:
         | I don't know- I used the app last night as an audiobook reader
         | before going to bed and it had automatic chapter detection, a
         | sleep timer and you could even click on a word and it would
         | start reading from there. It's pretty solid.
        
       | Kerbiter wrote:
       | Would've been great as a TTS component that could be installed
       | and used in existing e-readers.
        
         | freefaler wrote:
         | Android Moon Reader Pro TTS plugin works OK for me.
        
       | sys32768 wrote:
       | I was briefly excited to try this on out-of-print books I find on
       | Google Books, but alas the OCR in Acrobat PRO is super glitchy.
       | 
       | I need to find some AI-assist OCR to fix tons of mistakes like
       | "186o" for 1860 or "gla)" for glad.
        
         | eigenvalue wrote:
         | I made a site like that, fixmydocuments.com
         | 
         | Also check out my open source project for that:
         | 
         | https://github.com/Dicklesworthstone/llm_aided_ocr
        
           | sys32768 wrote:
           | Will definitely check it out.
           | 
           | Hyphenated words, page numbers and chapter titles seem to be
           | my main issue. I can easily do search and replace on chapter
           | titles though.
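
The cleanup described above (hyphenated words, page numbers, chapter titles, and digit slips like "186o") can be partly handled with plain regular expressions before reaching for an LLM. What follows is only a rough sketch under stated assumptions: `clean_ocr_text` is a hypothetical helper, not part of any tool mentioned in this thread, and it assumes the OCR output is available as one string per page. It cannot repair errors like "gla)" for "glad", and it will also merge genuine compounds that happen to break at a line end.

```python
import re


def clean_ocr_text(pages, chapter_titles=()):
    """Join OCR'd pages, dropping page numbers and running chapter titles.

    `pages` is a list of strings, one per page (an assumption about how
    the OCR output is stored). `chapter_titles` are headers to strip.
    """
    titles = {t.strip().lower() for t in chapter_titles}
    kept = []
    for page in pages:
        for line in page.splitlines():
            stripped = line.strip()
            if re.fullmatch(r"\d{1,4}", stripped):
                continue  # a line holding only a page number
            if stripped.lower() in titles:
                continue  # a running chapter title
            kept.append(stripped)
    text = "\n".join(kept)
    # Rejoin words split across a line break: "foun-\nded" -> "founded".
    # Heuristic: this also merges real compounds such as "well-\nknown".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Letter "o" read in place of zero at the end or middle of a number.
    text = re.sub(r"(?<=\d)[oO](?=\d|\b)", "0", text)
    return text
```

Anything the regexes miss would still need a dictionary check or an LLM pass such as the project linked above.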
        
       | dazzaji wrote:
       | I rely on ElevenReader several times a week for quick text to
       | voice on snippets of text I'm working on or sometimes on full web
       | pages when I hand it a url. It's quick and easy to use and the
       | performance and quality are high.
        
       | smoothbenny wrote:
       | Tried this app last week w/ an EPUB. It read all of the drop caps
       | as individual letters, before moving on to the remaining portion
       | of the word. It said "tilde" before each item in an unordered
       | list. Too distracting to be of any practical use for me, unless
       | there's a setting I missed.
        
       | _qua wrote:
       | I recognize and appreciate that this is free right now. But
       | surely it won't always be. And I can't keep paying $10-20/mo for
       | every individual AI tool.
        
       | berbec wrote:
       | I have used Moon+ Reader [1] for years with the built-in Android
       | TTS service. It works very well, is free, and sounds good enough
       | for me.
       | 
       | 1: https://moondownload.com/
        
       | randysalami wrote:
       | I've actually used this extensively for months now since it's
       | free and works with PDFs I've downloaded off the internet. I was
       | so frustrated with ridiculously overpriced TTS (must pay for
       | annual sub! no monthly) when I found this gem.
       | 
       | My main use case is comp. sci and philosophy books. I download
       | PDFs of varying quality off the internet onto my phone and import
       | them into this app. The text translation is always solid but for
       | the former, graphs and diagrams really break it. It's a tricky
       | problem because these often are important to the text so skipping
       | them (for the app) isn't ideal but the current solution just
       | makes the reader goof up. I think it would be cool if the model
       | could identify these objects and maybe generate some text
       | describing the object and TTSing that. Minor gripe and for the
       | latter, it's perfect.
       | 
       | I've probably used this app for 70 reading hours at 1.5x speed
       | across long road trips and walking my dog at the park. I've
       | gotten through numerous books I wouldn't have and for free. I'm
       | happy!
       | 
       | (annoying bug I find often: it seems certain characters or tokens
       | just break it and it freezes. I need to manually skip ahead
       | hoping it doesn't get stuck again. Really detracts from the hands
       | free nature and is difficult to manage while driving)
        
       | flakiness wrote:
       | Are there any good papers from which I can learn the recent
       | development of TTS tech?
        
       | jdlyga wrote:
       | The voices are excellent, but the app needs work. It lost my
       | place in a book a few times, so I switched back to VoiceDream
       | (don't use VoiceDream, it stinks unless you're a legacy
       | purchaser).
        
       | wedn3sday wrote:
       | I immediately copy/pasted in some smut to check if it was going
       | to lecture me on my moral failings and was pleasantly shocked to
       | find a corporate AI model that did what I asked without pushing
       | puritanical nonsense on me.
        
       | zoba wrote:
       | I've been enjoying this app except I could not find a way to
       | export the content to an audio file. I want to send the content
       | to others - I'd even take a link to a website with a Play button
       | (just not one that forces an app download)
        
       | BeetleB wrote:
       | If you want free/ultracheap, the Google Cloud TTS is good enough
       | for simple use cases. You get enough free minutes that it may end
       | up being free (I think I've paid a cent so far).
       | 
       | Some of their voices sound very artificial, some very real. I've
       | been slowly making a list of the good ones.
       | 
       | I use it to convert long articles into audio, and have a script
       | to add it to my podcast feed to listen to while driving:
       | 
       | https://blog.nawaz.org/posts/2024/Apr/reading-articles-via-p...
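
For readers curious what "add it to my podcast feed" might involve, here is a minimal sketch using only the Python standard library. It is not the script from the linked post: `add_episode` is a hypothetical helper, the feed is assumed to be a plain RSS 2.0 document, and the audio file is assumed to be an MP3 already hosted at a public URL.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def add_episode(feed_xml, title, audio_url, size_bytes, published):
    """Append one <item> with an audio enclosure to an RSS 2.0 feed string."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 feed: missing <channel>")
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    # The audio URL doubles as a unique id for the episode.
    ET.SubElement(item, "guid").text = audio_url
    # RSS expects RFC 822 style dates; format_datetime produces them.
    ET.SubElement(item, "pubDate").text = format_datetime(published)
    ET.SubElement(
        item,
        "enclosure",
        {"url": audio_url, "length": str(size_bytes), "type": "audio/mpeg"},
    )
    return ET.tostring(root, encoding="unicode")
```

A podcast app subscribed to the feed URL would then pick up each new item on its next refresh.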
        
         | kvn8888 wrote:
         | Chirp (HD) gives you $30 per 1M characters for free on the free
         | tier also
        
           | BeetleB wrote:
           | I'd have to analyze my usage. For me, having used it for over
           | a year cost me a penny. If I can ensure my total cost is less
           | than $1/month, I'll consider it if the quality is really
           | good. The Google one is "good enough", but not great.
           | 
           | One other feature I'd really like: Having the AI figure out
           | who is saying what and use different voices (e.g. one voice
           | for overall narrator, and separate voices for each person who
           | is quoted in the article).
           | 
           | Not sure if any of the solutions out there do that
           | automatically without my guidance.
           | 
           | (Still probably wouldn't pay more than $2/mo for it - I just
           | don't use it often enough to justify paying much).
        
             | kvn8888 wrote:
             | The audio quality is amazing. It's transformer based. I use
             | it occasionally
        
             | wombatpm wrote:
             | You start doing that for text from ebooks and Audible is
             | going to want to have words with you.
        
       | ratedgene wrote:
       | Honestly, why isn't this same service baked into my OS? the
       | reader there is really atrocious, but I imagine even for a single
       | voice a pretty small model can be downloaded and made available
       | as a plugin for the reader app.
        
       | codybontecou wrote:
       | I just wish this had a Chrome Extension so I can listen to
       | articles while on my computer.
        
       ___________________________________________________________________
       (page generated 2025-02-12 23:01 UTC)