hngopher.com

       [HN Gopher] Google Illuminate: Books and papers turned into audio
       ___________________________________________________________________
        
       Google Illuminate: Books and papers turned into audio
        
       Author : leblancfg
       Score  : 275 points
       Date   : 2024-09-10 16:22 UTC (6 hours ago)
        
 (HTM) web link (illuminate.google.com)
        
       | fny wrote:
       | Very clever use case. I'm presuming the set up here is as
       | follows:
       | 
       | - LLM-driven back and forth with the paper as context
       | 
       | - Text-to-speech
       | 
       | Pricing for high quality text to speech with Google's studio
       | voices runs at USD 160.00 per 1M characters. Given that the
       | average 10 minute recording at the average 130 WPM is 1,300
       | words, and at 5 characters per word that is 6,500 characters,
       | we can estimate an audio cost of about $1.
       | LLM cost is probably about the same given the research paper
       | processing and conversation.
       | 
       | So only costs about $2-3 per 10 minute recording. Wild.
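
A rough check of the estimate above, as a minimal Python sketch. The
rate, speaking speed, and characters-per-word figures are the ones
quoted in the comment; the function name `tts_cost_usd` is hypothetical,
and real billing may count characters differently (spaces, punctuation,
markup).

```python
def tts_cost_usd(minutes: float,
                 words_per_minute: float = 130,
                 chars_per_word: float = 5,
                 usd_per_million_chars: float = 160.0) -> float:
    """Estimate text-to-speech cost for a recording of the given length."""
    # minutes -> words -> characters, using the comment's averages
    characters = minutes * words_per_minute * chars_per_word
    return characters * usd_per_million_chars / 1_000_000


cost = tts_cost_usd(10)
print(f"{cost:.2f}")  # 1.04
```

Any LLM cost would come on top of this figure, as the comment itself
guesses.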
        
         | paxys wrote:
         | Retail pricing != Google's actual cost.
        
           | jhickok wrote:
           | I would actually be surprised if companies are focusing on
           | profit at this stage.
        
         | wg0 wrote:
         | There's no guarantee that the discussion would be accurate.
         | This stems from how the LLMs work.
        
       | freefaler wrote:
       | Great idea. I wonder how long until we'd see a lot of
       | "autogenerated" podcasts with syndicated advertising inside
       | spamming the podcast space.
       | 
       | Like with robovoiced videos on YT reading some scraped content.
        
         | cut3 wrote:
         | Amazon has a project for this already, apparently they are
         | using voice actors to train it.
        
         | TranquilMarmot wrote:
         | Would you listen to an auto-generated podcast? Seems like
         | removing the humans from the equation kind of defeats the
         | purpose.
        
           | LordShredda wrote:
           | People have been reading bot spam for ages, and already watch
           | auto generated spam. I'd expect this to pick up once it gets
           | cheap enough
        
           | netghost wrote:
           | I don't know, it depends on whether I get to control the auto
           | generated podcast or someone else.
           | 
           | If I get to control it and I can have it draw in enough
           | interesting angles into something, I think it could be fun. I
           | wouldn't replace one of my favorites, but I'd gladly use
           | something that could generate creative new content.
        
           | Jeff_Brown wrote:
           | If it seemed full of annoying product placement, no. If the
           | content and presentation were sufficiently good, yes.
           | 
           | I believe (but then again I also want to believe, so make of
           | this what you will) that I'd be holding the AI to only the
           | same standards I hold humans to. It's not like I'm trying to
           | build a relationship to the speaker in either case.
        
           | AuthError wrote:
           | I would watch history pods for sure
        
           | pavel_lishin wrote:
           | If it gets good enough, you wouldn't even know.
        
           | freefaler wrote:
           | Being auto-generated is not the problem. I listen to a lot of
           | text-to-speech voiced articles and epub books now.
           | 
           | The problem is filtering/searching that massive catalog and
           | weeding the useless stuff out.
        
             | smeej wrote:
             | Are you doing that with "old-fashioned" TTS, or have you
             | found a good resource for uploading your own docs/epubs and
             | having them read back by one of these higher quality
             | synthesized voices? (I've been looking for the latter, but
             | not having much luck.)
        
               | freefaler wrote:
               | Just old-school TTS from Acapela, using a paid voice,
               | Heather. I got used to it before there was a wide
               | selection on Audible and it's ok.
               | 
               | You can't use audio for serious books or articles, but it
               | works for History, Biographies, Fiction, and random tech
               | articles bookmarked in Pocket. It's also locally
               | generated, so there's no latency, which is great.
               | 
               | Additionally, when you use a TTS engine, you can see the
               | text and easily copy the things you want to make a note
               | on later. With Audiobooks it's not possible.
        
               | staticman2 wrote:
               | Elevenlabs reader does AI voices for free, not sure if
               | they'll start charging at any point since I don't know
               | how this fits into their business model.
        
               | freefaler wrote:
               | It'll be great when the AI generation gets on device and
               | you won't need to pay per minute of text generated.
               | Elevenlabs would burn through the investors' money
               | someday and they'd stop subsidizing the reader voice
               | generation.
        
               | smeej wrote:
               | It won't run on GrapheneOS, and I don't have any other
               | Android phones. They hide behind "security," but I don't
               | buy it. What risk is there?
        
           | ertgbnm wrote:
           | Depends on what you are trying to get out of a podcast.
           | Most of the podcasts I listen to are because I want to learn
           | something new in an entertaining format. I'm not listening to
           | develop parasocial relationships with the hosts, so removing
           | that element could be a good thing for me.
           | 
           | Of course if you listen to podcasts because you like the
           | parasocial aspect or the celebrity interviews, then yeah...
           | Not really a point.
        
             | smeej wrote:
             | I don't know that "parasocial relationships" are the
             | primary reason people like having real hosts. I have a huge
             | list of things I've managed to change in my life because I
             | heard some other real person talking about how they were
             | possible. Listening to these people over time and realizing
             | there's nothing about them that's so special that it makes
             | things possible for them that aren't possible for me gets
             | me off my butt to set about the hard work of making the
             | changes I didn't otherwise realize were possible.
        
               | panarky wrote:
               | In the same way that corporations are people, my friend,
               | AI-generated and AI-voiced summaries of works by real
               | people are also people, my friend.
        
               | smeej wrote:
               | I don't think we're friends, bot...
        
               | hluska wrote:
               | You called a long term user a bot in the most rude way
               | imaginable. Not only are you bad at spotting bots, but
               | you're rude about it for no reason. Good for you - you
               | must feel very accomplished.
        
             | tiltowait wrote:
             | IMO, a lot of the best podcast content comes from a
             | spontaneous tangent. You'd lose those moments with
             | autogenerated podcasts.
        
               | OutOfHere wrote:
               | With regard to AI, it's easier to make a whole new
               | episode on a tangent. It works better this way.
        
           | culi wrote:
           | Maybe not a podcast, but I've often wished I could listen to
           | a paper or an article while on a long drive
        
             | phemartin wrote:
             | You may enjoy the product I've been working on...[0] it
             | lets you listen to articles and subscribe to any website.
             | 
             | [0] https://playtext.app
        
               | theologic wrote:
               | Cool app. The biggest issue for me is the voice sounds
               | very much like the typical system voice apps, when we are
               | seeing such leaps and bounds in the voice quality. But
               | your interface is simple and nice.
        
             | slashdave wrote:
             | Could be me, but the amount of attention I need to reserve
             | in order to properly read and understand a technical paper
             | makes this idea rather scary.
        
             | panarky wrote:
             | A great way to learn something is to listen to a
             | conversation among two to four well informed and articulate
             | people, where each person has a memorable personality and
             | each person has a different perspective about the topic.
             | 
             | This Google Illuminate experiment shows how just listening
             | to two voices discuss a technical paper for three minutes
             | is far more effective than reading a three-minute AI
             | summary of the paper.
             | 
             | Imagine if there were three or four voices, with varied
             | personalities, more humor and sarcasm, different priorities
             | and points of view, and even a little disagreement.
             | 
             | Then imagine you're not just listening to the conversation,
             | but you're participating in it. That seems like a pretty
             | amazing way to learn.
        
             | OutOfHere wrote:
             | Look up podgenai.
        
           | narrationbox wrote:
           | A lot of our customers use us [0] for that, it works pretty
           | well if executed properly. The voiceovers work best as
           | inserts into an existing podcast. If you see the articles of
           | major news orgs like NYT, they often have a (usually) machine
           | narrated voiceover.
           | 
           | [0] https://narrationbox.com
        
           | zoklet-enjoyer wrote:
           | I don't like podcasts that are conversations
        
           | tjr wrote:
           | I would be interested in seeing an AI developed to listen to
           | auto-generated podcasts, removing humans from the equation
           | altogether.
        
             | nine_k wrote:
             | Of course the whole point would be in adding an acoustic
             | side channel imperceptible to humans but affecting the
             | listening AI in interesting ways.
        
           | onlyrealcuzzo wrote:
           | Lots of people follow bots on Instagram and Twitter, etc.
           | 
           | Why not follow bots on YouTube and Spotify?
        
           | OutOfHere wrote:
           | I have been listening to podgenai for the past three+ months.
           | The point is to listen selectively to only the topics or
           | titles that interest you.
        
           | lxgr wrote:
           | Personally, probably not.
           | 
           | I actually quite often wish I could access a condensed
           | version of a few podcasts in text form. Sometimes there's
           | little nuggets of information dropped by hosts or guests that
           | don't make it onto any other medium.
           | 
           | When I do intentionally listen to podcasts (i.e. as opposed
           | to having to, because that's the only available form of some
           | content), I do so because I enjoy the style of the
           | conversation itself.
        
           | dredmorbius wrote:
           | I listen to a number of podcasts which are reading books,
           | stories, literature, etc. Having a professional actor read a
           | text has appeal (e.g., _Selected Shorts_ ), but many are
           | less-than-professional. A sufficiently-competent automated
           | text-to-speech would fit at least some roles.
           | 
           | There are a few podcasts for which I'd have greater interest
           | if the narration were by someone _other_ than the current
           | host....
           | 
           | There are also services such as the National Library for the
           | Blind (UK) and BARD (US) which provide books, including a
           | large number of audiobooks, for the blind. Automated text-to-
           | speech would make a vastly larger library available,
           | particularly of very recent publications, niche publications,
           | and long-since-out-of-print books. Such services _do_ take
           | requests, but tend to focus on works published within the
           | past five years.
        
             | blueboo wrote:
             | What are your favourites? A podcast curating great short
             | stories sounds interesting, done well
        
               | dredmorbius wrote:
               | "Selected Shorts" is up there. My principal complaint is
               | that episodes remain live for only a month or so. If you
               | happen to catch an episode you like you'll have to keep
               | it downloaded. All but certainly on account of copyright.
               | 
               | Various non-English pods as well, to maintain / increase
               | fluency. Germany has a good set via Deutschlandfunk. I've
               | found a few in other languages, though tending toward
               | advertising-supported, which is less than ideal.
               | 
               | Searching for stories, literature, children's stories (a
               | surprisingly good way to learn basic vocabulary, grammar,
               | and culture), and history in your target language of
               | choice tends to be a pretty good guide.
        
         | fallinditch wrote:
         | Wondercraft have been offering this service for a while, and
         | produce some of their own auto-generated podcasts including the
         | Hacker News Recap which does an excellent job of summarizing
         | the most engaged posts on HN.
         | https://www.wondercraft.ai/our-podcasts
        
           | swyx wrote:
           | also for papers there is https://papersread.ai/ which does
           | not get nearly enough attention imo (the reading is meh, but
           | the curation is ace)
        
           | mmsc wrote:
           | This is a bit meta for me. A year ago a website was posted
           | here on HN which allowed you to visit a random website with an
           | /ideas page. For some reason it would always land me on the
           | same website, which outlined something close to this. The
           | idea was something like an RSS feed that would summarize all
           | the entries in the feed for the day/week in the form of a
           | podcast.
           | 
           | I wonder if that was inspiration for Wondercraft.
        
         | evilkorn wrote:
         | I hate the robo voiced videos. I watch a lot of space content
         | and run into them often on the homepage. Usually easy to spot
         | with low views and 1k subs.
        
           | vletal wrote:
           | This sounds too good. It's not too far away from me having a
           | hard time wondering "is it just overly scripted corporate PR
           | podcast".
        
           | OutOfHere wrote:
           | That low-quality stuff has no relation to high-quality AI
           | created content.
        
         | OutOfHere wrote:
         | It isn't spam. It is the present and the future. Advertising
         | however is the spam.
        
       | oidar wrote:
       | The voice models for this are very good. I'd love to have
       | granular control over the output of a model like this locally.
        
         | willwade wrote:
         | Like SSML? See azure tts or google cloud tts, or ibm Watson or
         | even old school system tts like SAPI voices on windows. But I
         | hear you. In a typical VITS model system, SSML isn't standard.
         | Piper tts does have it on the roadmap.
        
           | oidar wrote:
           | I just want programmable prosody. Prosodic controls would
           | allow much more believable TTS - apple used to have it on the
           | earlier TTS models, but these new TTS models sound so natural
           | at the phoneme level, but the prosody is often jacked up so
           | that it's easily identifiable as artificial.
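
For the kind of control being discussed here, SSML defines `<prosody>`
(rate, pitch) and `<break>` elements. The sketch below only builds the
markup with Python's standard library; the helper name `build_ssml` and
its defaults are hypothetical, and how much of this a given engine
honors varies, so treat it as an illustration rather than any vendor's
API.

```python
import xml.etree.ElementTree as ET


def build_ssml(text, rate="medium", pitch="+0st", pause_ms=300):
    """Wrap text in an SSML <prosody> element followed by a pause."""
    speak = ET.Element("speak")
    # rate and pitch are standard <prosody> attributes in SSML
    prosody = ET.SubElement(speak, "prosody", {"rate": rate, "pitch": pitch})
    prosody.text = text
    # <break time="..."/> inserts a pause of the given duration
    ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    return ET.tostring(speak, encoding="unicode")


print(build_ssml("So, what did the paper find?", rate="slow", pitch="-2st"))
```

The resulting string would be passed as the SSML input of whichever TTS
service or engine is in use.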
        
       | smusamashah wrote:
       | Is that audio all generated? All the pauses, breaths, speed ups
       | and everything?
        
         | TranquilMarmot wrote:
         | From the "Help" modal:
         | 
         | "Illuminate is an experimental technology that uses AI to adapt
         | content to your learning preferences. Illuminate generates
         | audio with two AI-generated voices in conversation, discussing
         | the key points of select papers. Illuminate is currently
         | optimized for published computer science academic papers.
         | 
         | As an experimental product, the generated audio with two AI-
         | generated voices in conversation may not always perfectly
         | capture the nuances of the original research papers. Please be
         | aware that there may be occasional errors or inconsistencies
         | and that we are continually iterating to improve the user
         | experience."
        
           | smusamashah wrote:
           | Wow. I did not pick up on anything in the voice as a clue
           | that it's generated. So does that make it the current best
           | text to audio system?
        
             | Legend2440 wrote:
             | I don't know if Google's specifically is the best, but
             | these new GenAI-based text-to-speech systems blow away
             | everything else.
        
         | achow wrote:
         | GCP's text to speech options, equally amazing
         | 
         | https://cloud.google.com/text-to-speech/docs/voice-types#cha...
        
       | colesantiago wrote:
       | So podcasts are now automated, anything with a speaker or a
       | screen is now assumed to be not human.
       | 
       | Is this supposed to be a good thing that we want to accelerate
       | (e/acc) towards?
        
         | consf wrote:
         | I think it depends on how we balance AI innovation with
         | preserving human elements in media
        
         | Jeff_Brown wrote:
         | If I can tell where content came from, it's fine with me. If a
         | host of paid spammers or bots can astroturf an opinion and fool
         | me into thinking they are a wide demographic, that's a problem.
         | And it is -- but it predates LLMs.
        
         | thisoneworks wrote:
         | I honestly don't think this is all that big. What we are seeing
         | has been possible for more than 6 months now(?) with gpt4 and
         | elevenlabs, it's just put together in a nice little demo website
         | and with what seems like a multi-modal model(?) trained on
         | nytimes the daily episodes lol. And no i don't think this will
         | gain all that much traction. We will keep valuing authentic
         | human interaction more and more.
        
         | throwthrowuknow wrote:
         | Man, it's going to blow your mind when you realize that all the
         | talking heads aren't real and never were.
        
         | drivers99 wrote:
         | like Max Headroom
        
       | bluelightning2k wrote:
       | This is really cool. Although I wouldn't put money on a Google
       | project sticking around even if it was a full fledged product!
       | 
       | More of a tech demo than anything else.
       | 
       | What's wild about this is that the voices seem way better than
       | GCP's TTS that I've seen. Any way to get those voices as an API?
        
         | bluelightning2k wrote:
         | Self-answer but leaving in case anyone else has the same
         | question... seems there are some new options in GCP TTS. Both
         | "studio" and "journey" are new since I last checked (and I check
         | pretty often).
        
       | dlisboa wrote:
       | One problem I see with this is legitimizing LLM-extracted content
       | as canon. The realistic human speech masks the fact that the LLM
       | might be hallucinating or highlighting the wrong parts of a
       | book/paper as important.
        
         | gs17 wrote:
         | We'll have to see how it holds up for general books. The books
         | they highlighted are all very old and very famous, so the
         | training set of whatever LLM they use definitely has a huge
         | amount of human-written content about them, and the papers are
         | all relatively short.
        
         | shmatt wrote:
         | The top list of Apple Podcasts is full of real humans
         | intentionally lying or manipulating information, it makes me
         | worry much less about computer generated lies
        
           | dlisboa wrote:
           | Even if society is kinda collapsing that way people are still
           | less likely to listen to a random influencer's review of
           | biochemistry than a Professor in Biochemistry. These LLMs
           | know just as much about the topic they're summarizing as a
           | toddler, they should be treated with just as much skepticism.
           | 
           | There are hacks everywhere but humans lying sometimes have
           | implications (libel/slander) that we can control. Computers
           | are thought of in general society as devoid of bias and
           | "smart" so if they lie people are more likely to listen.
        
         | vanishingbee wrote:
         | Happens in the very first example:
         | 
         | [Attention is All You Need - 1:07]
         | 
         | > Voice A: How did the "Attention is All You Need" paper
         | address this sequential processing bottleneck of RNNs?
         | 
         | > Voice B: So, instead of going step-by-step like RNNs, they
         | introduced a model called the Transformer - hence the title.
         | 
         | What title? The paper is entitled "Attention is All You Need".
         | 
         | People are fooling themselves. These are stochastic parrots
         | cosplaying as academics.
        
           | aanet wrote:
           | I had the same exact thought - "Did this summary
           | misrepresent the title??" Indeed, it did. However, I thought
           | the end2end implementation was decent.
           | 
           | > These are stochastic parrots cosplaying as academics.
           | 
           | LOL
        
           | IanCal wrote:
           | It then goes on to explain right afterwards that the key
           | thing the transformer does is rely on a mechanism called
           | attention. It makes more sense in that context IMO.
        
           | wyldfire wrote:
           | I recently listened to this great episode of "This American
           | Life" [1] which talked about this very subject. It was
           | released in June 2023 which might be ancient history in terms
           | of AI. But it discusses whether LLMs are just parrots and is
           | a nice episode intended for general audiences so it is pretty
           | enjoyable. But experts are interviewed so it also seems
           | authoritative.
           | 
           | [1] https://www.thisamericanlife.org/803/greetings-people-of-ear...
        
         | nine_k wrote:
         | Frankly, humans also sometimes remember things incorrectly or
         | pay excess attention to the less significant topics while
         | discussing a book.
         | 
         | In this regard, LLMs are imperfect like ourselves, just to a
         | different extent.
        
       | consf wrote:
       | Can podcasts creators benefit from this tool? I think so...
        
       | alganet wrote:
       | Cool tech. Now we know that very soon no one will be able to
       | trust podcasts or video narration.
        
         | Legend2440 wrote:
         | You shouldn't have been trusting podcasts in the first place,
         | Joe Rogan says plenty of false things no AI required.
        
           | lelandfe wrote:
           | Sure, but now I - an idiot - can publish a podcast on...
           | "Bayesian Multilevel Models," and fool almost everyone into
           | thinking I know anything about it.
           | 
           | I've seen YouTubers provide tutorials on auto-creating
           | YouTube videos and podcast episodes on niche scientific
           | subjects, on how to build seemingly-reputable brands with
           | _zero_ ongoing effort. That is all totally novel. Being able
           | to lie or be wrong before is orthogonal to the real issue:
           | scale.
        
             | alganet wrote:
             | Scale has already been achieved with money (advertisement
             | revenue) and influence (politics agendas, fame) on a viral
             | platform.
             | 
             | What this tech brings is speed. If Google did it, someone
             | else will also do it.
        
             | throwthrowuknow wrote:
             | All the more reason to empower people to review, rate,
             | comment on, block, downvote, and otherwise signal when
             | something is incorrect.
        
           | alganet wrote:
           | It takes time for humans to say false things, record and edit
           | them.
           | 
           | This tech can allow "content creators" to spin hundreds of
           | podcasts with garbage simultaneously, saturating the search
           | space with nonsense. Similar to what is already being done
           | with text everywhere.
           | 
           | What makes one skeptical regarding conspiracist ideas is
           | access and visibility to more enlightened content. If that
           | access gets disrupted (it already has been), many people will
           | not be able to tell the difference, especially future
           | generations.
        
       | dgellow wrote:
       | Really impressive. The podcasting spam we will get from this will
       | be a pain, but really impressive demo
        
         | jhickok wrote:
         | I honestly think it could be the opposite, and we will have
         | entire high-quality works of fiction at our fingertips.
        
       | nxobject wrote:
       | A related experiment from Google: NotebookLM
       | (notebooklm.google.com), which takes a group of documents and
       | provides a RAG Gemini chatbot in return.
       | 
       | I wish Google would make these experiments more well-known!
        
         | timmg wrote:
         | You also might find a similar feature arriving in that
         | product.. soon.
        
       | ansk wrote:
       | Imagine reading a math or programming textbook where each
       | statement was true with probability 0.95.
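
To put a number on that thought experiment, here is a small Python
sketch. It assumes, purely for illustration, that statements are
independent and each is true with the same probability; neither
assumption comes from the comment, and the function names are
hypothetical.

```python
def prob_all_true(n_statements: int, p_each: float = 0.95) -> float:
    """Probability that n independent statements are all true."""
    return p_each ** n_statements


def statements_until_below(threshold: float = 0.5,
                           p_each: float = 0.95) -> int:
    """Smallest n for which prob_all_true(n) drops below the threshold."""
    n = 0
    while prob_all_true(n, p_each) >= threshold:
        n += 1
    return n


print(statements_until_below())        # 14
print(round(prob_all_true(100), 4))    # 0.0059
```

Under those assumptions, a chain of only 14 such statements is more
likely than not to contain at least one false one.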
        
         | sno129 wrote:
         | Plenty of mistakes in textbooks and research articles, it's
         | possible the probability is already even lower.
        
           | slashdave wrote:
           | That just means you are adding errors on top of existing
           | ones, hardly an improvement
        
         | throwthrowuknow wrote:
         | errata. Also real humans often make mistakes in live
         | interviews. The biggest difference is that eventually these
         | fake humans will have lower error rates than real ones.
        
           | contagiousflow wrote:
           | > eventually these fake humans will have lower error rates
           | than real ones
           | 
           | Source?
        
       | danesparza wrote:
       | I wonder how soon until this waitlisted service eventually gets
       | thrown on the trash heap that Google Reader is on.
       | 
       | Building trust with your users is important, Google.
        
       | syntaxing wrote:
       | I've been using the ElevenLabs Reader app to read some articles
       | during my drive and it's been amazing. It's great to be able to
       | listen to Money Stuff whenever I want to. The audio quality is
       | about 90% there. Occasionally, the tone of the sentence is wrong
       | (like surprised when it should be sad) and the wrong enunciation
       | (bow, like bowing down or tying a bow) but still very listenable.
        
       | bogwog wrote:
       | What does this accomplish? Who does this help? How does this make
       | the world a better place?
       | 
       | This only seems like it would be useful for spammers trying to
       | game platforms, which is silly because spam is probably the
       | number one thing bringing down the quality of Google's own
       | products and services.
        
       | nonrandomstring wrote:
       | I think I just discovered a new emotion. Simultaneous feelings of
       | excitement and disappointment.
       | 
       | No matter how great the idea, it's hard to stay excited for more
       | than a few microseconds at the sight of the word "Google". I can
       | already hear the gravediggers' shovels preparing a plot in the
       | Google graveyard, and hear the sobs of the people who built their
       | lives, workflows, even jobs and businesses around something that
       | will be tossed aside as soon as it stops being someone's pet
       | play-thing at Google.
       | 
       | A strange ambivalent feeling of hope already tarnished with
       | tragedy.
        
       | srameshc wrote:
       | We are working on something content driven (for an ad or
       | subscription model) with a lot of effort and time and I am
       | concerned how this technology will affect all that effort and
       | eventually monetization ideas. But I can see how helpful this
       | tool can be for learning new stuff.
        
       | timonoko wrote:
       | Works surprisingly well. I actually bothered to listen to
       | "discussions" about these boring-looking papers.
       | 
       | English is particularly bad to read aloud because it is like
       | the programming language Fortran, based on immutable tokens. If
       | you want tonal variety, you have to understand the content.
       | 
       | Some other languages modify the tokens themselves, so just one
       | word can be pompous, comical, uneducated etc.
        
       | albert_e wrote:
       | the player always starts at 30:00 for me and plays a 4 to 7
       | minute clip that seems complete but very brief
        
       | Ninjinka wrote:
       | the Lexification/Roganization/Dwarkeshing/Hubermanning of reading
        
       | srik wrote:
       | Nothing is real anymore.
        
         | airstrike wrote:
         | Might as well dive into the deep end of the metaverse
        
         | kornhole wrote:
         | AKA fake and gay
        
       | bitshiftfaced wrote:
       | Occasionally there's a podcast or video I'd like to listen to,
       | but one of the voices is either difficult to understand, or in
       | some way awful to listen to, or maybe the sound quality is really
       | bad. It would be nice to have an option for an automatically
       | redubbed audio.
        
         | wintermutestwin wrote:
         | I sure do wish podcasters would learn about compression. I am
         | constantly getting my ears blown out in the car from a podcast
         | with multiple speakers who are at different volumes.
        
           | swyx wrote:
           | podcaster here. what does compression have to do with it?
           | you're just talking about different levels from diff mics
        
             | semi-extrinsic wrote:
             | Probably a lot of the problem GP is describing comes from
             | people having inconsistent distance to their microphone,
             | moving around a lot. Then using an audio compressor effect
             | plugin is an appropriate answer.
             | 
             | I've often thought about adding a compressor pedal to my TV
             | sound system. It would be excellent for when you're
             | watching action movies with hard to hear dialogue mixed
             | with loud noises, and the kids are asleep, so you spend the
             | evening turning volume up and down eight times per minute.
        
             | drivers99 wrote:
             | Setting the levels equally to start would help, but doesn't
             | control when someone suddenly gets loud. With compression,
             | you can increase quiet sounds, decrease loud sounds, or
             | both.
             | 
             | https://en.wikipedia.org/wiki/Dynamic_range_compression
             | 
             | A type of compressor used to limit the maximum signal is a
             | limiter. "Limiters are common as a safety device in live
             | sound and broadcast applications to prevent sudden volume
             | peaks from occurring."
             | 
             | https://en.wikipedia.org/wiki/Limiter
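
The compressor and limiter behaviour described above can be sketched in a few lines of Python. This is a simplified static gain curve, not how any particular podcast tool or plugin works: real compressors track a smoothed signal envelope with attack and release times, which is omitted here. The function name `compress` and its default threshold and ratio are illustrative assumptions.

```python
import math


def compress(samples, threshold_db=-20.0, ratio=4.0, makeup_db=0.0):
    """Apply static downward compression to float samples in [-1.0, 1.0].

    Assumption: no attack/release smoothing; each sample is treated alone.
    """
    out = []
    for x in samples:
        level = abs(x)
        if level == 0.0:
            out.append(0.0)
            continue
        level_db = 20.0 * math.log10(level)
        if level_db > threshold_db:
            # Above the threshold, `ratio` dB of input gives 1 dB of output.
            level_db = threshold_db + (level_db - threshold_db) / ratio
        # Makeup gain raises everything, which is how quiet parts get louder.
        level_db += makeup_db
        out.append(math.copysign(10.0 ** (level_db / 20.0), x))
    return out


quiet_and_loud = compress([0.05, 1.0])
limited = compress([1.0], ratio=float("inf"))
```

With the defaults, the quiet sample is left alone and the full-scale sample is pulled down to about -15 dB. An infinite ratio pins anything above the threshold to the threshold itself, which is the limiter case.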
        
       | fabmilo wrote:
       | so much pleasantry so much fluff. reduce the noise. get to the
       | point.
        
       | ants_everywhere wrote:
       | This is a good idea and well executed. I think the hard part now
       | is pointing it in an appropriate direction.
       | 
       | If it's just used for generating low quality robo content like we
       | see on TikTok and YouTube then it's not so interesting.
        
       | RobMurray wrote:
       | I couldn't listen for more than a couple of minutes. It's the
       | usual repetitive, over wordy llm generated drivel.
        
       | franze wrote:
       | Oh, another Google Waitlist...
        
       | SeanAnderson wrote:
       | I'm fairly excited for this use case. I recently made the switch
       | from Audible to Libby for my audiobook needs. Overall, it's been
       | good/fine, but I get disappointed when the library only has text
       | copies of a book I want to listen to. Often times they aren't
       | especially popular books so it seems unlikely they'll get a
       | voiceover anytime soon. Using AI to narrate these books will
       | solve a real problem I experience currently :)
        
       | banach wrote:
       | I can see this working reasonably for text that you can
       | understand without referring to figures, and for texts for which
       | there is external content available that such a conversation
       | could be based on. For a new, say, math paper, without prose
       | interspersed, I'd be surprised if the generated conversation will
       | be worth much. On the other hand, that is a corner case and,
       | personally, I suspect I will be using this for the many texts
       | where all I need is a presentation of the material that is easy
       | to listen to.
        
       | aanet wrote:
       | What a fantastic idea! Great way to learn about those pesky
       | research papers I keep downloading (but never get to reading
       | them). I tried a few, e.g. Attention is All You Need, etc. The
       | summary was fantastic, and the discussion was, well, informative.
       | 
       | Does anyone know how the summary was generated? (text
       | summarization, I suppose?) Is there a bias towards "podcast-style
       | discussion"? Not that I'm complaining about it - just that I
       | found it helpful.
        
       | oulipo wrote:
       | Why not, if you could also interject with questions, remarks, or
       | "cut to the chase"-style requests.
       | 
       | Also it's weird that they focus only on AI papers in the demo,
       | and not more interesting social stuff, like environment
       | protection, climate change, etc
        
         | ftmch wrote:
         | Guess they want to avoid any political backlash that could
         | arise from topics like that, which will happen inevitably.
        
         | sandspar wrote:
         | Google's fingers get burned whenever it lets its AI touch
         | social topics.
        
       | leobg wrote:
       | I made something like this for my kids:
       | 
       | 1. Take a science book. I used one Einstein loved as a kid, in
       | German. But I can also use Asimov in English. Or anything else.
       | We'll handle language and outdated information on the LLM level.
       | 
       | 2. Extract the core ideas and narrative with an LLM and rewrite
       | it into a conversation, say, between a curious 7 year old girl
       | and her dad. We can take into account what my kids are interested
       | in, what they already know, facts from their own life,
       | comparisons with their surroundings etc. to make it more
       | engaging.
       | 
       | 3. Turn it into audio using Text-to-Speech (multiple voices).
        
         | flakiness wrote:
         | How do you get the source data (text) from a book? To me it is
         | the major roadblock for LLM-based commercial content
         | consumption.
        
           | leobg wrote:
           | Old books are on Gutenberg, archive.org etc.
           | 
           | Physical ones, I scan. Cutting the spine is easiest. But
           | today you can also just take pics with your phone.
           | 
           | Many retailers also sell EPUB. Which is just HTML.
           | 
           | Obviously, that's all for private consumption only. (Unless
           | you're OpenAI I guess. :-P)
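
As a rough illustration of the "EPUB is just HTML" point: an EPUB file is a ZIP archive containing (X)HTML documents, so the Python standard library is enough to pull out the text. This sketch assumes an unencrypted (DRM-free) file and simply reads the HTML files in name order. The real reading order is defined by the package's spine, which this sketch ignores. The name `epub_text` is made up for the example.

```python
import io
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def epub_text(source):
    """Return the plain text of every (X)HTML file in an EPUB archive.

    `source` is a path or a binary file object. Files are read in name
    order, which only approximates the book's real reading order.
    """
    chunks = []
    with zipfile.ZipFile(source) as book:
        for name in sorted(book.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextCollector()
                parser.feed(book.read(name).decode("utf-8", errors="replace"))
                parser.close()
                chunks.append(" ".join(parser.parts))
    return "\n\n".join(chunks)


# Tiny in-memory stand-in for a real EPUB, just to exercise the function.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr(
        "ch1.xhtml",
        "<html><body><h1>One</h1><p>Hello <b>world</b></p></body></html>",
    )
text = epub_text(demo)
```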
        
             | flakiness wrote:
             | Oh, you got serious! Salute to you from a lazy dad.
        
       | antirez wrote:
       | Related: [rumors] Audible is starting a pilot project to do just
       | that with the ebooks.
        
         | lxgr wrote:
         | At this point, this seems more like a question of "how
         | soon", not if.
        
       | OutOfHere wrote:
       | Can it make something bigger than 5 minutes?
        
       | Analemma_ wrote:
       | Books I can understand, but I'm genuinely curious: would anyone
       | here find it useful to hear scientific papers as narrated audio?
       | Maybe it depends on the field, but when I read e.g. an ML paper,
       | I almost always have to go through it line-by-line with a pen and
       | scratchpad, jumping back and forth and taking notes, to be sure
       | I've actually "got it". Sometimes I might read a paragraph a
       | dozen times. I can't see myself getting any value out of this,
       | but I'm interested if others would find it useful.
        
         | creativenolo wrote:
         | I'm not sure "hear scientific papers as narrated audio" best
         | describes what this is. From the link:
         | 
         | > Illuminate generates audio with two AI-generated voices in
         | conversation, discussing the key points of select papers.
        
       | motoxpro wrote:
       | This is insane! To be able to listen to a conversation to learn
       | about any topic is amazing. Maybe it's just me because I listen
       | to so many podcasts but this is Planet Money or The Indicator
       | from NPR about anything.
       | 
       | Definitely one of the coolest things I have seen an LLM do.
        
       | vincentpants wrote:
       | Listening to an AI generated discussion-based podcast on the
       | topic of anticipating the scraping of deceased people's digital
       | footprint to create an AI copy of your loved one makes the cells
       | that make up my body want to give up on fighting entropy.
        
       | alenwithoutproc wrote:
       | it would be really _cool if we'd have a clubhouse-style gen-ai
       | feed for hn or reddit comments to listen to.
       | 
       | _ to me
        
       | belval wrote:
       | I guess I am in my grouchy old person phase but all I could think
       | of was the Gilfoyle quote from Silicon Valley when presented
       | with a talking refrigerator.
       | 
       | > "Bad enough it has to talk, does it need fake vocal tics...?" -
       | Gilfoyle
       | 
       | Found it: https://youtu.be/APlmfdbjmUY?si=b4-rgkxeXigU_un_&t=179
        
         | drivers99 wrote:
         | I would want to select a voice without vocal fry, which one of
         | the voices in these demos has.
        
       | layman51 wrote:
       | Did anyone else notice that according to the generation info,
       | each recording was created on 12/31/69 at 4:00 PM?
        
         | oneepic wrote:
         | That lines up with 1/1/70 0:00 UTC, but that's also hilarious.
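
The arithmetic behind that observation can be checked with a short Python sketch. It assumes the page showed the time in US Pacific Standard Time (UTC-8) and that the underlying timestamp was 0, i.e. unset. Both are guesses based on the date reported above, not something the site confirms.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the page rendered times in US Pacific Standard Time (UTC-8).
PST = timezone(timedelta(hours=-8), "PST")


def epoch_zero_local(tz=PST):
    """Return Unix timestamp 0 as a timezone-aware datetime in `tz`."""
    return datetime.fromtimestamp(0, tz=tz)


stamp = epoch_zero_local()
print(stamp.strftime("%m/%d/%y %I:%M %p"))
```

A timestamp of 0 is midnight UTC on 1 January 1970. Eight hours earlier falls on the afternoon of 31 December 1969.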
        
       | e12e wrote:
       | Interesting - listening to the first example (Attention is all
       | you need)[1] - I wonder what illuminate would make of Fielding's
       | REST thesis?
       | 
       | [1] https://illuminate.google.com/home?pli=1&play=SKUdNc_PPLL8
        
       | CatWChainsaw wrote:
       | So it will immediately be trashed by GenAI bullshit and
       | killedbygoogle within three years, right?
        
       | elashri wrote:
       | One useful use case would be helping make academic papers more
       | accessible. It would also be useful for people to listen to arXiv
       | papers that seem interesting. It would be a useful tool in the
       | academic world, and also for students, who would have a more
       | accessible form of learning.
       | 
       | I already have a project idea: use the arXiv RSS API to fetch
       | interesting papers based on keywords (or some LLM summary), pass
       | them to something like Illuminate, and end up with a listening
       | queue for following the latest in the field. There will be some
       | problems with formatting, but then you could just open the PDF to
       | see the plots and equations.
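
A minimal sketch of the keyword-filtering half of that idea, using only the Python standard library. It uses arXiv's public Atom query endpoint rather than the RSS feeds mentioned above, which is a substitution made for this example. The function names `fetch_feed` and `build_queue` are hypothetical. Handing the results to Illuminate is left out because no public API for it is described in this thread.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ARXIV_API = "http://export.arxiv.org/api/query"


def fetch_feed(query, max_results=20):
    """Download an Atom feed from the arXiv API (needs network access)."""
    params = urllib.parse.urlencode(
        {"search_query": query, "start": 0, "max_results": max_results}
    )
    with urllib.request.urlopen(f"{ARXIV_API}?{params}") as resp:
        return resp.read()


def build_queue(atom_xml, keywords):
    """Return (title, link) pairs for entries that mention any keyword.

    `atom_xml` may be bytes or str. Matching is a plain case-insensitive
    substring test on title plus abstract, not an LLM summary.
    """
    wanted = [k.lower() for k in keywords]
    queue = []
    for entry in ET.fromstring(atom_xml).iter(f"{ATOM}entry"):
        # Titles in the feed can contain line breaks; collapse whitespace.
        title = " ".join((entry.findtext(f"{ATOM}title") or "").split())
        summary = entry.findtext(f"{ATOM}summary") or ""
        text = f"{title} {summary}".lower()
        if any(k in text for k in wanted):
            queue.append((title, entry.findtext(f"{ATOM}id") or ""))
    return queue
```

Usage would look like `build_queue(fetch_feed("cat:cs.LG"), ["attention", "speech"])`, with each returned link then opened or passed on by hand.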
        
       | yismail wrote:
       | I got in the beta a couple weeks ago and tried it out on some
       | papers [0]
       | 
       | [0] https://news.ycombinator.com/item?id=41020635
        
       | ElijahLynn wrote:
       | I've been meaning to read the Attention Is All You Need paper for
       | years and never have. I finally listened to the little generated
       | interview that is their first example. I think this is going to
       | be very, very useful to me!
        
       | yunohn wrote:
       | I listened to multiple demos; the pauses and vocal intonations
       | sound so fake. They're inserted at odd places where a real human
       | speaker would not put them.
        
       ___________________________________________________________________
       (page generated 2024-09-10 23:00 UTC)