[HN Gopher] Show HN: I generated 70k audiobooks with OpenAI Text...
       ___________________________________________________________________
        
       Show HN: I generated 70k audiobooks with OpenAI Text-to-Speech
        
       Hey HN. I'm Ivan, a hacker from Ukraine.  For about a year, I've
       been working on Listenly -- an app for listening to text content with
       OpenAI's natural-sounding text-to-speech model.  At some point, I
       realized that it would be cool to take all the public domain
       e-books and create audio versions of them. So I did it... kind of.
       It would cost an immense amount of money to generate all the audio
       right away (OpenAI TTS costs approximately $0.84/hour of audio;
       11labs, for comparison, is 10 times more expensive). So, I took a
       more gradual approach.  I took all the metadata from the Project
       Gutenberg catalog (it's about 70GB of dirty XML), cleaned it, put
       it into my database, and created a browsable catalog. When the
       first user visits a book page on Listenly, I download the full text
       of the book, save it in my cloud storage, and calculate the price
       for audio generation based on the book's length. Then, if the user
       decides to purchase it, we generate the audio.  I know it's not
       perfect.  I've burned out a couple of times already while doing it.
       But still, I need to show it to the world. And I'll be glad to hear
       your feedback.  Peace.
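
A rough sketch of how a per-book price could be derived from text length, not the author's actual code. The $0.84/hour figure comes from the post; the characters-per-hour ratio and the function name `estimate_generation_cost` are assumptions chosen for illustration.

```python
# Figure quoted in the post: approximate OpenAI TTS cost per hour of audio.
COST_PER_AUDIO_HOUR_USD = 0.84

# ASSUMPTION: how many characters of text become one hour of narration.
# This ratio is illustrative and would need to be measured on real output.
CHARS_PER_AUDIO_HOUR = 56_000


def estimate_generation_cost(
    text: str,
    chars_per_hour: int = CHARS_PER_AUDIO_HOUR,
    cost_per_hour: float = COST_PER_AUDIO_HOUR_USD,
) -> float:
    """Estimate the API cost in USD of narrating `text`, rounded to cents."""
    estimated_hours = len(text) / chars_per_hour
    return round(estimated_hours * cost_per_hour, 2)
```

Because the estimate depends only on the character count, it can be computed once when the book text is first downloaded and stored next to the catalog entry.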
        
       Author : evan_ry
       Score  : 87 points
       Date   : 2024-07-14 15:07 UTC (7 hours ago)
        
 (HTM) web link (listenly.io)
        
       | cranberryturkey wrote:
       | is your code on github somewhere?
        
         | evan_ry wrote:
         | No, it's closed-source for now. You think I should open-source
         | it?
        
           | cranberryturkey wrote:
           | I would love to contribute and run a version on my server.
           | Also you need a search engine. Your list is too long to click
           | through categories. The cool thing is you have monetized it
           | -- virtually no open source projects have monetization built
           | in.
           | 
           | If you don't want to open source it, send me an email:
           | anthony@chovy.com -- i'd like to collaborate with you
           | privately if I can run my own instance.
        
             | evan_ry wrote:
             | Interesting. We can meet and talk!
             | 
             | I'm really not sure about fully open-sourcing it. It's
             | generally good for developer-focused products, but for
             | Listenly... I just can't see the benefits. But I might be
             | very wrong.
        
               | cranberryturkey wrote:
               | There probably aren't a lot of benefits unless you want
               | to showcase your skills to employers.
               | 
               | Anyway, hit me up on https://t.me/chovy2 or
               | https://fightclub.profullstack.com -- I'd like to help
               | you out.
        
       | lobito14 wrote:
       | Why is a Google account the only login option? Also, why only a
       | dark theme when so many users have difficulty reading on dark
       | backgrounds?
        
         | evan_ry wrote:
         | Sorry for the inconvenience :(
         | 
         | Additional auth providers and UI theming were not a priority,
         | and frankly, this is the first time I have received such a
         | request.
         | 
         | But you're right, I definitely will do it.
        
       | scandox wrote:
       | How much listening have you done to the results? How do you feel
       | about the results? Just interested because I've listened to quite
       | a few AI readings (sometimes without knowing ahead of time) and
       | I'm still sort of processing my reactions.
        
         | cco wrote:
         | I'm curious about this as well, I listened to a few of the
         | samples and they seem to be of ok quality. But having
         | experimented with this myself a bit, things can be fine until
         | it chokes and you get very weird emphasis, emotion etc.
         | 
         | Side note, this almost feels like something 2012 Google would
         | have done, a la their scanning of the Library of Congress.
         | Something to show off their text-to-speech.
        
         | evan_ry wrote:
         | I personally actively listen to shorter-form content. I have
         | already listened to around 30 of Paul Graham's essays and the
         | "Shape Up" mini-book from Basecamp (both are free on
         | listenly.io/public-library).
         | 
         | Jason Cohen's blog posts and TechCrunch "Startup Weekly"
         | newsletter are also great to listen to.
         | 
         | In terms of books, I'm not as active. But actually, I like
         | Churchill's books much better with this AI narration than any
         | that I've found on Audible. It looks like they're trying to
         | narrate Churchill's books as if Churchill would, and it's not a
         | good thing.
         | 
         | I think it's already very good in terms of sound quality. If
         | not for fiction, then for professional literature, it's just
         | great.
         | 
         | Some people have already purchased some books, finding them
         | through Google (it is indexing all the pages right now, but it
         | is taking some time, as there are 100,000+ pages for all the
         | books, authors, and subjects).
        
       | qup wrote:
       | Nice job Ivan.
       | 
       | I expect your costs to go down over time, which is nice.
        
       | mattferderer wrote:
       | I love how much better listening to books with AI has become.
       | 
       | Have you done any attempts at multiple narrators telling a story?
       | 
       | Microsoft's Azure has a great tool for doing this but it's time
       | consuming as you have to take all the text & match it to the
       | narrator by hand. OpenAI's last big demo kind of showed using
       | voice chat to change narrator voices on the fly.
       | 
       | I think it would be awesome if you could submit a book, have a
       | simple tool parse through & find all the speakers. Then let you
       | sample how each one sounds with a brief description of what the
       | person is like. Basically you get to have each voice do an
       | audition & you pick your favorites. Then it goes through page by
       | page generating audio based on the voices selected.
       | 
       | I'm not suggesting this feature for the app. I'm just throwing
       | out this idea as one I've been thinking about. There have been a
       | lot of books I've wanted to listen to but don't have time to sit
       | down & read.
        
         | evan_ry wrote:
         | Yeah, I think it should be possible technically. Put all the
         | chapters through LLM and ask it to add markup for different
         | characters/voices.
         | 
         | Right now, my paid users are listening mostly to non-fiction,
         | so it seems like they don't need it.
         | 
         | But this whole Project Gutenberg saga is kinda diluting
         | everything, and I need to think which users/market to focus on.
         | 
         | Will see :)
        
       | scosman wrote:
       | Great project.
       | 
       | Pricing: maybe try a mobile app with monthly subscription?
       | Something for recurring revenue.
       | 
       | Features: can you generate at 1.5x speed? Might be more natural
       | than the playback speed up options and be a nice differentiator.
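
A hedged sketch of what generating at 1.5x might look like. OpenAI's speech endpoint documents a `speed` parameter; the helper name `build_speech_request` is hypothetical, and the 0.25-4.0 speed range and 4096-character input cap are as documented at the time of writing, so they should be verified before relying on them. The function only builds the JSON payload and sends nothing.

```python
# ASSUMPTION: limits as documented for OpenAI's speech endpoint; verify them.
MIN_SPEED, MAX_SPEED = 0.25, 4.0
MAX_INPUT_CHARS = 4096


def build_speech_request(
    text: str,
    voice: str = "onyx",      # the voice the thread says Listenly uses
    speed: float = 1.5,
    model: str = "tts-1",
) -> dict:
    """Build the JSON body for a text-to-speech request at a given speed."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input must be at most {MAX_INPUT_CHARS} characters")
    return {"model": model, "voice": voice, "input": text, "speed": speed}
```

Audio generated at a given speed is fixed at that rate, so each speed a listener might want would need its own generation pass and its own API cost.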
        
         | forgotpasagain wrote:
         | +1 on this. Even if the subscription isn't ideal for most
         | users, you will get more active users and a better feedback
         | loop with it.
        
           | evan_ry wrote:
           | Really interesting take!
        
         | evan_ry wrote:
         | I really thought that 1.5x playback speed would be the same as
         | 1.5x generation speed. Wow. Looks like I was wrong.
         | 
         | Regarding the subscription -- I thought that no subscription
         | was actually a competitive advantage, but now so many people
         | are telling me to do it, that I'm really not sure anymore.
        
       | harrisonjackson wrote:
       | Ah, nice! I've been doing something similar to convert web novels
       | --> epub --> mp3/m4b --> sorta a graphic novel --> sorta a video
       | / slide show
       | 
       | Here is pride and prejudice and up the thread you can see another
       | web novel example:
       | 
       | https://twitter.com/HarrisonJackson/status/18109373574214537...
       | 
       | ElevenLabs has so many great voice models but is super expensive.
       | I want to experiment with some oss voice models and even train my
       | own but not sure on a great starting point with that. Play.ht has
       | some good voices, too.
       | 
       | Seeing some of the results here with the OpenAI TTS, I will
       | probably switch at least the narrator to use one of these to save
       | some money.
        
         | evan_ry wrote:
         | This is very cool!
         | 
         | I think you should try OpenAI's voices for characters too.
         | They're really good at catching the emotions. They even can
         | scream! https://x.com/ivryb/status/1780210661189992877
        
       | toddmorey wrote:
       | I definitely support your goal: take all the public domain
       | e-books and create audio versions for them. I think the "on-
       | demand" approach is kinda brilliant. Once a book is requested,
       | how long does it take to generate the audio file? Does it happen
       | in one shot?
       | 
       | I sadly found an AI audio project I don't support: This person
       | was instead summarizing popular books into 10 minutes of audio.
       | Basically trying to SEO better than the author and I know the
       | authors aren't compensated. That just left me feeling sad. (I
       | know book summaries for busy people have been a thing for a
       | while, but this just all felt so opportunistic.)
       | 
       | As I search podcasts these days, I'm finding more and more of
       | these low-effort, "doesn't take more than a few minutes to set
       | up, why not" type AI-generated spam cannons. Been hard for a
       | while but it's about to get REALLY hard to separate the wheat
       | from the chaff.
        
         | evan_ry wrote:
         | Right now, I'm splitting all the text into 4,000-character
         | chunks (OpenAI TTS limitation), and converting them into audio
         | "on-demand".
         | 
         | When it's like 1-2 minutes before the end of the current chunk
         | -- I'm starting to generate the next one, for a seamless
         | transition.
         | 
         | One chunk is taking about 30-40 seconds to generate (OpenAI API
         | is 20-30s, Azure OpenAI API is ~40s).
         | 
         | I was planning to convert the whole book (just by queuing and
         | parallelizing the requests) and concatenate it into a single
         | MP3 (or an MP3 for each chapter), but it's not ready yet.
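
A minimal sketch of the chunking and look-ahead described above, not the author's actual code. It assumes splitting at sentence boundaries (the thread only gives the 4,000-character chunk size) and a hypothetical 90-second prefetch lead inside the stated 1-2 minute window. The names `split_into_chunks` and `should_prefetch_next` are illustrative.

```python
import re

MAX_CHARS = 4000  # chunk size mentioned in the thread


def split_into_chunks(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `max_chars`."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit has to be hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def should_prefetch_next(
    position_s: float, duration_s: float, lead_s: float = 90.0
) -> bool:
    """Return True once playback is within `lead_s` seconds of the chunk end."""
    return duration_s - position_s <= lead_s
```

Splitting on sentence boundaries rather than at a fixed offset keeps each request from starting or ending mid-sentence, which should make the transition between chunks less audible.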
        
         | meiraleal wrote:
         | I think it would be fair IF writers also paid royalties to
         | authors of books in the same genre/subjects they have read.
        
         | jokethrowaway wrote:
         | I like to watch short movie recaps on youtube instead of the
         | whole things.
         | 
         | I also read summaries of books for research purposes or for
         | dull school homework.
         | 
         | They both have a place before or after ai.
        
       | notsure357 wrote:
       | Are there any books among Project Gutenberg books that haven't
       | already been performed as an audiobook? Assuming that all of the
       | popular books in Project Gutenberg have an audiobook available to
       | purchase read by a human which is probably better quality or at
       | least more likely to be better quality, why would I want to pay
       | money for this instead? I don't see the value proposition here.
        
         | evan_ry wrote:
         | You're right about the popular books, but the long tail of
         | not-so-popular ones doesn't have a human audio version, and
         | probably never will.
         | 
         | Plus, sometimes available human narrations are so bad that you
         | really would like to listen to an AI one (I've experienced it
         | with Churchill's audiobooks on Audible).
         | 
         | I don't know if it will work. It felt like it should work, at
         | least for pSEO.
         | 
         | I got my first two audiobook purchases two weeks after I
         | submitted the sitemap to Google. It was some romantic novels.
         | But now it's flatlined again.
         | 
         | Will see...
        
         | agf wrote:
         | I see why you'd think a human-read one would be better, but in
         | my experience that's not the case. It's not that easy to read
         | out loud and actually sound good.
         | 
         | I've spent a fair amount of time listening to free audiobooks
         | (https://archive.org/details/librivoxaudio) including many that
         | are out of copyright like these, as opposed to modern but in
         | the public domain.
         | 
         | After listening to a few minutes of "Frankenstein" on his site,
         | I would say that these OpenAI generated voices sound better
         | than almost all of the human-read ones on Librivox, both in
         | audio and performance quality -- these are voices that are
         | designed to sound good, and they succeed at that.
        
       | jjcm wrote:
       | Very cool, and nice work on this! I used to record wikipedia's
       | articles in audio format to help those who had trouble reading,
       | so I'm a huge fan of anything that makes public domain work more
       | accessible.
       | 
       | As a rabid audiobook consumer, I do have a couple of suggestions.
       | 
       | An easy one - currently you only use the Onyx voice from OpenAI.
       | I'd recommend that at the very least you match the gender of the
       | voice to the gender of the author. I find this is pretty common
       | with published audiobooks, and I find it helps bring out the tone
       | of the author more.
       | 
       | A harder one - most great audiobook narrators change their voice
       | depending on the character speaking. If you really wanted to go
       | in depth here, parsing the text by character and matching them to
       | a voice would go a long way in making these more listenable. It
       | would be fairly straightforward (albeit more expensive) to parse
       | these books with an LLM and ask it to add inline markdown for the
       | right voice options for each speaking character.
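
One way the per-character markup idea could work downstream, as a sketch only. The `[voice=name]` tag syntax is a hypothetical format an LLM might be asked to emit (nothing in the thread specifies one), and `parse_voice_markup` is an illustrative name.

```python
import re

# HYPOTHETICAL tag format: "[voice=elizabeth]" switches the active voice.
VOICE_TAG = re.compile(r"\[voice=([a-z_]+)\]")


def parse_voice_markup(
    marked_up: str, default_voice: str = "narrator"
) -> list[tuple[str, str]]:
    """Split tagged text into (voice, text) segments in reading order."""
    segments: list[tuple[str, str]] = []
    voice = default_voice
    position = 0
    for match in VOICE_TAG.finditer(marked_up):
        # Text before this tag belongs to the previously active voice.
        text = marked_up[position:match.start()].strip()
        if text:
            segments.append((voice, text))
        voice = match.group(1)
        position = match.end()
    tail = marked_up[position:].strip()
    if tail:
        segments.append((voice, tail))
    return segments
```

Each segment could then be sent to the TTS API with the voice mapped to that character, and the resulting audio clips concatenated in order.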
        
         | evan_ry wrote:
         | I've been postponing the development of the voice selector for
         | like 3 months already. Something more important is always
         | popping up xD
        
           | jjcm wrote:
           | Totally fair. Solo dev is hard, and those priority choices
           | are always a challenge. Remember that you have more context
           | than anyone else suggesting things here - I'm sure that 3mo
           | delay is for a reason. Great work so far!
        
         | jobigoud wrote:
         | I wonder if we are ripe for the following:
         | 
         | Given a great narration in one language, have a model annotate
         | the tone and emotion of the narrator for each sentence, and re-
         | apply these emotions to the voice synthesis for a target
         | language, on the translated version.
         | 
         | Narration/recitation is such an orthogonal axis to the story
         | and literary style, and an integral part of the experience.
        
         | delichon wrote:
         | > I'd recommend that at the very least you match the gender of
         | the voice to the gender of the author.
         | 
         | I prefer the voice to match the protagonist. Or better yet an
         | audio play with the narrator voice plus a voice matched to each
         | speaker.
         | 
         | This is the kind of bikeshedding that AI text-to-voice can make
         | moot. We can all have it our own way. That's an argument for
         | generating the voice just in time rather than as a batch. But
         | as long as such tools aren't ubiquitous this batch is a great
         | public service.
        
       | 42lux wrote:
       | Did you know that Microsoft did basically the same thing for free
       | last year?
       | 
       | https://marhamilresearch4.blob.core.windows.net/gutenberg-pu...
        
         | evan_ry wrote:
         | Holy sh*t, nope, didn't know it
        
         | evan_ry wrote:
         | Their TTS model is worse than OpenAI's though
        
           | code51 wrote:
           | Both will seem dull going forward. TTS will feel more natural
           | every passing year so the current spending for any TTS model
           | will seem kind of wasteful after 1-2 years.
        
       | frankohn wrote:
       | I created a similar project for the book _Madame Bovary_ , but in
       | French using the ElevenLabs API.
       | 
       | A sample of the first chapter is available here:
       | 
       | https://fairpublishing.org/index.php/ebooks/sample-audiobook...
       | 
       | The voice quality and pronunciation are excellent. However, the
       | system struggles with acting, so the tone and emotional
       | expression are often wrong during dialogues. Additionally, I have
       | to fragment the text into short paragraphs, making it challenging
       | to set appropriate break durations, resulting in an unnatural
       | rhythm.
       | 
       | Despite the technical quality and my appreciation for the reading
       | voice, I won't continue in this direction.
       | 
       | ElevenLabs is quite expensive, but it would be worth it if the
       | final result were good enough for listeners to purchase the
       | audiobook.
       | 
       | I don't know if using OpenAI's API in English would yield better
       | results. However, OpenAI's performance in non-English languages
       | is not satisfactory.
        
         | evan_ry wrote:
         | In general, it is not great for fiction right now; it needs a
         | lot of improvement. But for history/philosophy/science books
         | it's great.
         | 
         | And yeah, OpenAI's model is bad for non-English languages. At
         | least, for now...
        
         | jokethrowaway wrote:
         | Bark is better in expressing the right emotions, but the voice
         | quality and hallucinations are bad.
         | 
         | Maybe generating a bunch of runs and then asking the users to
         | vote could get us the best narrated book overall.
        
       | gooseyman wrote:
       | Once generated (i.e., a user pays for the audio to be generated),
       | does it become available to the public? If so, very cool!
        
         | mikae1 wrote:
         | If it works that way it's a rather nice setup. Would love to
         | have an answer from the developer.
        
           | evan_ry wrote:
           | Right now, it's not working like that.
           | 
           | I was thinking about it.
           | 
           | On the one hand, I want to make money. On the other hand, I
           | understand that making everything available for free would be
           | much more aligned with the Project Gutenberg philosophy.
           | 
           | I left my job and am living on savings, and in the last year
           | Listenly made only $400 ~= $35 MRR. Although I was not doing
           | much marketing.
           | 
           | I'm dreaming of it making $1k, $3k, $5k MRR.
           | 
           | Right now, I set the price to be 50% of the API cost, so I
           | would make a profit starting from the 3rd same book purchase.
           | 
           | But maybe I should make it a fully social project, get some
           | donations, and treat it as "lead magnet" to monetize
           | something else. I'm open to your suggestions!
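
The break-even claim above can be checked with a short calculation. This sketch assumes the audio for a book is generated once and reused for later buyers, and it ignores storage and payment fees; `first_profitable_purchase` is an illustrative name.

```python
import math


def first_profitable_purchase(price_fraction: float = 0.5) -> int:
    """Return the purchase number at which revenue first exceeds API cost.

    `price_fraction` is the sale price as a fraction of the one-time API
    cost. After n sales, revenue is n * price_fraction * cost, so profit
    starts at the smallest n with n * price_fraction > 1.
    """
    if price_fraction <= 0:
        raise ValueError("price_fraction must be positive")
    return math.floor(1 / price_fraction) + 1
```

At 50% of the API cost, two purchases only recover the generation cost, and the third is the first to yield a profit, matching the figure in the comment.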
        
             | jokethrowaway wrote:
             | I do AI consulting and I did some audio related projects
             | where I basically resold ElevenLabs + quality control. EL
             | is much better than OpenAI imho.
             | 
             | Monetizing is good but there is no value proposition in the
             | product.
             | 
             | The chances I'll get something I'd like to listen to are low
             | because:
             | 
             | - AI errors
             | - AI lack of emotion
             | - You picked a voice I've heard in thousands of automatically
             |   generated youtube videos and that I came to hate.
             | 
             | There is no chance I'd buy this, I'd rather buy an
             | audiobook made by a human.
             | 
             | Now, people may not understand that - but then they'll be
             | disappointed, bother you for a refund (chargebacks are $15
             | a pop if you don't) or just speak badly about the project.
             | Repeating sales potential is pretty bad imho.
             | 
             | I hope I don't come across as rude.
             | 
             | If you are really set on this idea I'd recommend generating
             | 1 book, making it perfect until it reads like it
             | should and then selling it on as many platforms as you can
             | (Amazon mainly I guess). Maybe use a custom cloned voice so
             | it will sound unique and consistent across all books. You
             | don't need a website but you have one so you might as well
             | use it for marketing and maybe to gauge interest for the
             | next book to process.
             | 
             | An audiobook is a good product in itself.
        
       | laurent_du wrote:
       | I think there may be an issue with data collection. I tried
       | listening to some of pg's articles but they were cut off right in
       | the beginning, see e.g. 005 Lisp for Web Applications.
        
         | evan_ry wrote:
         | In this particular case, it's just that the blog post is
         | basically just a link: https://paulgraham.com/lwba.html
        
       | saberience wrote:
       | I guess you didn't hear about Librivox? Which allows anyone to
       | provide voiceovers for Project Gutenberg books. Much better than
       | AI generated voice in my experience.
        
         | evan_ry wrote:
         | I don't think they conflict.
         | 
         | Maybe AI-generated books should also be a part of Librivox.
         | 
         | I tried to listen to some, but the quality of narration was
         | bad.
        
       | dv35z wrote:
       | If you're interested in further text to speech missions, I just
       | got Piper (open-source text-to-speech engine) running happily in
       | a Docker container on my Mac. Effectively "free", high quality,
       | fast-generating text-to-speech.
       | 
       | Check out their voice samples:
       | https://rhasspy.github.io/piper-samples/ (or make your own).
       | 
       | Happy to help you set it up locally...
       | 
       | https://github.com/rhasspy/piper
        
         | evan_ry wrote:
         | I don't think it's high quality, tbh
         | 
         | Much less enjoyable than with OpenAI TTS.
        
       | saberience wrote:
       | Also, I find it quite unethical that you're charging for public
       | domain books. It's frankly gross, in my opinion.
        
         | mikae1 wrote:
         | If the audio generation is paid for, by the first listener, it
         | will be available to everybody for free? No?
        
         | evan_ry wrote:
         | Well, someone has to pay for API calls :D
         | 
         | I was thinking about launching a Kickstarter campaign and
         | making the whole library free for everyone. But I need more
         | feedback. I don't know if it's viable.
        
       | j45 wrote:
       | Is there any open source text to speech library that's starting
       | to be half close or decent for something like this?
        
         | jokethrowaway wrote:
         | xTTS is not open source but you can download it and use it for
         | some things - and it's the nicest sounding one.
         | 
         | Bark has potential but the voice quality is pretty off.
         | 
         | The tortoise fork which improves the model and restores cloning
         | (the author of tortoise decided it was too dangerous and
         | crippled the project) is ok with some voices but it takes a lot
         | of tries.
         | 
         | Voicebox from Meta is pretty good, comparable quality to
         | ElevenLabs, but it's research-only for now.
         | 
         | Pretty sad overall.
        
       | ukuina wrote:
       | Can you support Apple Pay?
       | 
       | https://docs.lemonsqueezy.com/help/checkout/payment-methods#...
        
         | evan_ry wrote:
         | Seems like some kind of bug on the LemonSqueezy side. It is
         | enabled in the store settings, but I also cannot see it. Will
         | open a ticket.
        
       | dmje wrote:
       | So the model is - "first person pays, rest of community gets that
       | audio for free", have I understood that right?
       | 
       | Cos if so - cool, that's a lovely model. And you should make more
       | of it. There's a definite feel good factor associated with this.
       | You could probably also charge a bit more - $5 for a thing I get
       | alone vs $10 for a thing that I get but everyone else gets for
       | free too seems a no brainer incentive to me.
       | 
       | FWIW I find Omnivore[0] to be really compellingly realistic TTS.
       | I don't know what they use but it's pretty great imo.
       | 
       | [0] https://omnivore.app/
        
         | akudha wrote:
         | Another way to do this would be to crowd fund. Instead of 1
         | person paying $50 per book (just a random number, I dunno how
         | much it costs) 10 people can pay $5 each. 11th person onwards
         | can get it for free.
         | 
         | You could also get some credits from these companies in return
         | for advertising "this book is sponsored by blah company"
        
           | evan_ry wrote:
           | I answered this here:
           | https://news.ycombinator.com/item?id=40963194
           | 
           | I like the idea of letting people donate the audio they
           | purchased to the community.
           | 
           | Although I'm scared that I'll have no money.
        
         | DidYaWipe wrote:
         | Or, the first person underwrites the initial generation, and
         | then gets some credits as subsequent people pay a small amount.
        
         | evan_ry wrote:
         | I checked out the omnivore TTS.
         | 
         | They're using a previous generation of TTS models, which most
         | of the reader apps are using. They're reasonable, cheap, but
         | sound noticeably worse than OpenAI's or 11Labs. I don't like
         | them.
        
       | eplatzek wrote:
       | I did some spot checks and the cadence and intonation of their
       | speech feels so natural. The sentences flow. It's the best I've
       | ever heard. Thanks for doing this.
        
       | jkbbwr wrote:
       | Honestly? The quality of the output is as expected. I wondered
       | how it would manage something like Shakespeare, which depends so
       | heavily on iambic pentameter. Instead, AI does what it usually
       | does, which is drone on at a slightly too fast speed, with no
       | natural pauses and no delivery. Honestly, as with most things, you
       | would be better off paying for a human performance than relying
       | on this.
       | 
       | I wish the OP well, and the project is nicely designed. But AI
       | simply isn't there for this yet, not without a lot of individual
       | hand holding and extra work.
        
         | evan_ry wrote:
         | You should try listening to some non-fiction, such as history,
         | philosophy, biographies, etc.
         | 
         | It's already great for that purpose.
        
           | jkbbwr wrote:
           | They are better, but they still sound slightly unnatural to
           | me, the pauses are in the wrong places, or not long enough.
           | It takes me out of focusing on the actual words
        
       ___________________________________________________________________
       (page generated 2024-07-14 23:01 UTC)