[HN Gopher] Google Illuminate: Books and papers turned into audio
___________________________________________________________________
Google Illuminate: Books and papers turned into audio
Author : leblancfg
Score : 275 points
Date : 2024-09-10 16:22 UTC (6 hours ago)
(HTM) web link (illuminate.google.com)
(TXT) w3m dump (illuminate.google.com)
| fny wrote:
| Very clever use case. I'm presuming the set up here is as
| follows:
|
| - LLM-driven back and forth with the paper as context
|
| - Text-to-speech
|
| Pricing for high quality text to speech with Google's studio
| voices runs at USD 160.00/1M count. And given the average 10
| minute recording at the average 130 WPM is 1,300 words and at 5
| characters per word is 6500, we can estimate an audio cost of $1.
| LLM cost is probably about the same given the research paper
| processing and conversation.
|
| So only costs about $2-3 per 10 minute recording. Wild.
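The audio half of this estimate can be reproduced in a few lines. This is only a sketch of the commenter's arithmetic: it assumes the "count" in the quoted price means characters (as the comment's own math implies), the figures are the commenter's rather than confirmed pricing, and the function name is made up for illustration.

```python
def estimate_tts_cost(minutes, words_per_minute, chars_per_word, usd_per_million_chars):
    """Return (characters, cost in USD) for a recording of the given length."""
    chars = minutes * words_per_minute * chars_per_word
    cost = chars * usd_per_million_chars / 1_000_000
    return chars, cost


# Figures taken from the comment: 10 minutes, 130 WPM, 5 chars/word, USD 160 per 1M.
chars, cost = estimate_tts_cost(10, 130, 5, 160.00)
print(f"{chars} characters -> ${cost:.2f} of text-to-speech")
```

Doubling that for a similar LLM cost, as the comment suggests, gives roughly $2 per recording, the low end of the quoted range.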
| paxys wrote:
| Retail pricing != Google's actual cost.
| jhickok wrote:
| I would actually be surprised if companies are focusing on
| profit at this stage.
| wg0 wrote:
| There's no guarantee that the discussion would be accurate.
| This stems from how the LLMs work.
| freefaler wrote:
| Great idea. I wonder how long until we'd see a lot of
| "autogenerated" podcasts with syndicated advertising inside
| spamming the podcast space.
|
| Like with robovoiced videos on YT reading some scraped content.
| cut3 wrote:
| Amazon has a project for this already, apparently they are
| using voice actors to train it.
| TranquilMarmot wrote:
| Would you listen to an auto-generated podcast? Seems like
| removing the humans from the equation kind of defeats the
| purpose.
| LordShredda wrote:
| People have been reading bot spam for ages, and already watch
| auto generated spam. I'd expect this to pick up once it gets
| cheap enough
| netghost wrote:
| I don't know, it depends on whether I get to control the auto
| generated podcast or someone else.
|
| If I get to control it and I can have it draw in enough
| interesting angles into something, I think it could be fun. I
| wouldn't replace one of my favorites, but I'd gladly use
| something that could generate creative new content.
| Jeff_Brown wrote:
| If it seemed full of annoying product placement, no. If the
| content and presentation were sufficiently good, yes.
|
| I believe (but then again I also want to believe, so make of
| this what you will) that I'd be holding the AI to only the
| same standards I hold humans to. It's not like I'm trying to
| build a relationship to the speaker in either case.
| AuthError wrote:
| I would watch history pods for sure
| pavel_lishin wrote:
| If it gets good enough, you wouldn't even know.
| freefaler wrote:
| Being auto-generated is not the problem. I listen to a lot of
| text-to-speech voiced articles and epub books now.
|
| The problem is filtering/searching that massive catalog and
| weeding the useless stuff out.
| smeej wrote:
| Are you doing that with "old-fashioned" TTS, or have you
| found a good resource for uploading your own docs/epubs and
| having them read back by one of these higher quality
| synthesized voices? (I've been looking for the latter, but
| not having much luck.)
| freefaler wrote:
| Just old-school TTS from Acapela, a paid voice called Heather. I
| got used to it before there was a wide selection on
| Audible and it's ok.
|
| You can't use audio for serious books or articles, but it works
| for History, Biographies, Fiction, and random tech articles
| bookmarked in Pocket. It's locally generated, so there's no
| latency, which is great.
|
| Additionally, when you use a TTS engine, you can see the
| text and easily copy the things you want to make a note
| on later. With Audiobooks it's not possible.
| staticman2 wrote:
| Elevenlabs reader does AI voices for free, not sure if
| they'll start charging at any point since I don't know
| how this fits into their business model.
| freefaler wrote:
| It'll be great when the AI generation gets on device and
| you won't need to pay per minute of text generated.
| Elevenlabs would burn through the investors' money
| someday and they'd stop subsidizing the reader voice
| generation.
| smeej wrote:
| It won't run on GrapheneOS, and I don't have any other
| Android phones. They hide behind "security," but I don't
| buy it. What risk is there?
| ertgbnm wrote:
| Depends on what you are trying to get out of a podcast.
| Most of the podcasts I listen to are because I want to learn
| something new in an entertaining format. I'm not listening to
| develop parasocial relationships with the hosts, so removing
| that element could be a good thing for me.
|
| Of course if you listen to podcasts because you like the
| parasocial aspect or the celebrity interviews, then yeah...
| Not really a point.
| smeej wrote:
| I don't know that "parasocial relationships" are the
| primary reason people like having real hosts. I have a huge
| list of things I've managed to change in my life because I
| heard some other real person talking about how they were
| possible. Listening to these people over time and realizing
| there's nothing about them that's so special that it makes
| things possible for them that aren't possible for me gets
| me off my butt to set about the hard work of making the
| changes I didn't otherwise realize were possible.
| panarky wrote:
| In the same way that corporations are people, my friend,
| AI-generated and AI-voiced summaries of works by real
| people are also people, my friend.
| smeej wrote:
| I don't think we're friends, bot...
| hluska wrote:
| You called a long term user a bot in the most rude way
| imaginable. Not only are you bad at spotting bots, but
| you're rude about it for no reason. Good for you - you
| must feel very accomplished.
| tiltowait wrote:
| IMO, a lot of the best podcast content comes from a
| spontaneous tangent. You'd lose those moments with
| autogenerated podcasts.
| OutOfHere wrote:
| With regard to AI, it's easier to make a whole new
| episode on a tangent. It works better this way.
| culi wrote:
| Maybe not a podcast, but I've often wished I could listen to
| a paper or an article while on a long drive
| phemartin wrote:
| You may enjoy the product I've been working on...[0] it
| lets you listen to articles and subscribe to any website.
|
| [0] https://playtext.app
| theologic wrote:
| Cool app. The biggest issue for me is the voice sounds
| very much like the typical system voice apps, when we are
| seeing such leaps and bounds in the voice quality. But
| your interface is simple and nice.
| slashdave wrote:
| Could be me, but the amount of attention I need to reserve
| in order to properly read and understand a technical paper
| makes this idea rather scary.
| panarky wrote:
| A great way to learn something is to listen to a
| conversation among two to four well informed and articulate
| people, where each person has a memorable personality and
| each person has a different perspective about the topic.
|
| This Google Illuminate experiment shows how just listening
| to two voices discuss a technical paper for three minutes
| is far more effective than reading a three-minute AI
| summary of the paper.
|
| Imagine if there were three or four voices, with varied
| personalities, more humor and sarcasm, different priorities
| and points of view, and even a little disagreement.
|
| Then imagine you're not just listening to the conversation,
| but you're participating in it. That seems like a pretty
| amazing way to learn.
| OutOfHere wrote:
| Look up podgenai.
| narrationbox wrote:
| A lot of our customers use us [0] for that, it works pretty
| well if executed properly. The voiceovers work best as
| inserts into an existing podcast. If you see the articles of
| major news orgs like NYT, they often have a (usually) machine
| narrated voiceover.
|
| [0] https://narrationbox.com
| zoklet-enjoyer wrote:
| I don't like podcasts that are conversations
| tjr wrote:
| I would be interested in seeing an AI developed to listen to
| auto-generated podcasts, removing humans from the equation
| altogether.
| nine_k wrote:
| Of course the whole point would be in adding an acoustic
| side channel imperceptible to humans but affecting the
| listening AI in interesting ways.
| onlyrealcuzzo wrote:
| Lots of people follow bots on Instagram and Twitter, etc.
|
| Why not follow bots on YouTube and Spotify?
| OutOfHere wrote:
| I have been listening to podgenai for the past three+ months.
| The point is to listen selectively to only the topics or
| titles that interest you.
| lxgr wrote:
| Personally, probably not.
|
| I actually quite often wish I could access a condensed
| version of a few podcasts in text form. Sometimes there's
| little nuggets of information dropped by hosts or guests that
| don't make it onto any other medium.
|
| When I do intentionally listen to podcasts (i.e. as opposed
| to having to, because that's the only available form of some
| content), I do so because I enjoy the style of the
| conversation itself.
| dredmorbius wrote:
| I listen to a number of podcasts which are reading books,
| stories, literature, etc. Having a professional actor read a
| text has appeal (e.g., _Selected Shorts_ ), but many are
| less-than-professional. A sufficiently-competent automated
| text-to-speech would fit at least some roles.
|
| There are a few podcasts for which I'd have greater interest
| if the narration were by someone _other_ than the current
| host....
|
| There are also services such as the National Library for the
| Blind (UK) and BARD (US) which provide books, including a
| large number of audiobooks, for the blind. Automated text-to-
| speech would make a vastly larger library available,
| particularly of very recent publications, niche publications,
| and long-since-out-of-print books. Such services _do_ take
| requests, but tend to focus on works published within the
| past five years.
| blueboo wrote:
| What are your favourites? A podcast curating great short
| stories sounds interesting, done well
| dredmorbius wrote:
| "Selected Shorts" is up there. My principal complaint is
| that episodes remain live for only a month or so. If you
| happen to catch an episode you like you'll have to keep
| it downloaded. All but certainly on account of copyright.
|
| Various non-English pods as well, to maintain / increase
| fluency. Germany has a good set via Deutschlandfunk. I've
| found a few in other languages, though tending toward
| advertising-supported, which is less than ideal.
|
| Searching for stories, literature, children's stories (a
| surprisingly good way to learn basic vocabulary, grammar,
| and culture), and history in your target language of
| choice tends to be a pretty good guide.
| fallinditch wrote:
| Wondercraft have been offering this service for a while, and
| produce some of their own auto-generated podcasts including the
| Hacker News Recap which does an excellent job of summarizing
| the most engaged posts on HN.
| https://www.wondercraft.ai/our-podcasts
| swyx wrote:
| also for papers there is https://papersread.ai/ which does
| not get nearly enough attention imo (the reading is meh, but
| the curation is ace)
| mmsc wrote:
| This is a bit meta for me. A year ago a website was posted
| here on HN which allowed you to visit a random website with an
| /ideas page. For some reason it would always land me on the
| same website, which outlined something close to this. The
| idea was something like an RSS feed that would summarize all
| the entries in the feed for the day/week in the form of a
| podcast.
|
| I wonder if that was inspiration for Wondercraft.
| evilkorn wrote:
| I hate the robo voiced videos. I watch a lot of space content
| and run into them often on the homepage. Usually easy to spot
| with low views and 1k subs.
| vletal wrote:
| This sounds too good. It's not too far away from me having a
| hard time wondering "is it just overly scripted corporate PR
| podcast".
| OutOfHere wrote:
| That low-quality stuff has no relation to high-quality AI
| created content.
| OutOfHere wrote:
| It isn't spam. It is the present and the future. Advertising
| however is the spam.
| oidar wrote:
| The voice models for this are very good. I'd love to have
| granular control over the output of a model like this locally.
| willwade wrote:
| Like SSML? See azure tts or google cloud tts, or ibm Watson or
| even old school system tts like SAPI voices on windows. But I
| hear you. In a typical VITS model system, SSML isn't standard.
| Piper tts does have it on the roadmap.
| oidar wrote:
| I just want programmable prosody. Prosodic controls would
| allow much more believable TTS - apple used to have it on the
| earlier TTS models, but these new TTS models sound so natural
| at the phoneme level, but the prosody is often jacked up so
| that it's easily identifiable as artificial.
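For readers unfamiliar with SSML, the kind of prosody control discussed here looks roughly like the sketch below. It only builds a markup string using the standard `<speak>` and `<prosody>` elements; the helper name is made up for illustration, and which attribute values a given engine honours (or whether it accepts SSML at all) is an assumption that varies by vendor and model.

```python
from xml.sax.saxutils import escape, quoteattr


def with_prosody(text, rate="medium", pitch="+0st"):
    """Wrap text in an SSML <prosody> element inside a <speak> root."""
    return (
        "<speak>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}"  # escape &, <, > so the markup stays well-formed
        "</prosody>"
        "</speak>"
    )


# Slow the sentence down and lower the pitch by two semitones.
ssml = with_prosody("Wait & see.", rate="slow", pitch="-2st")
print(ssml)
```

The resulting string would then be passed to whatever TTS engine is in use, in place of plain text.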
| smusamashah wrote:
| Is that audio all generated? All the pauses, breaths, speed ups
| and everything?
| TranquilMarmot wrote:
| From the "Help" modal:
|
| "Illuminate is an experimental technology that uses AI to adapt
| content to your learning preferences. Illuminate generates
| audio with two AI-generated voices in conversation, discussing
| the key points of select papers. Illuminate is currently
| optimized for published computer science academic papers.
|
| As an experimental product, the generated audio with two AI-
| generated voices in conversation may not always perfectly
| capture the nuances of the original research papers. Please be
| aware that there may be occasional errors or inconsistencies
| and that we are continually iterating to improve the user
| experience."
| smusamashah wrote:
| Wow. I did not pick anything in the voice as a clue that it's
| generated. So does it make it current best text to audio
| system?
| Legend2440 wrote:
| I don't know if Google's specifically is the best, but
| these new GenAI-based text-to-speech systems blow away
| everything else.
| achow wrote:
| GCP's text to speech options, equally amazing
|
| https://cloud.google.com/text-to-speech/docs/voice-types#cha...
| colesantiago wrote:
| So podcasts are now automated, anything with a speaker or a
| screen is now assumed to be not human.
|
| Is this supposed to be a good thing that we want to accelerate
| (e/acc) towards?
| consf wrote:
| I think it depends on how we balance AI innovation with
| preserving human elements in media.
| Jeff_Brown wrote:
| If I can tell where content came from, it's fine with me. If a
| host of paid spammers or bots can astroturf an opinion and fool
| me into thinking they are a wide demographic, that's a problem.
| And it is -- but it predates LLMs.
| thisoneworks wrote:
| I honestly don't think this is all that big. What we are seeing
| has been possible for more than 6 months now(?) with gpt4 and
| elevenlabs, it's just put together in a nice little demo website
| and with what seems like a multi-modal model(?) trained on
| nytimes the daily episodes lol. And no i don't think this will
| gain all that much traction. We will keep valuing authentic
| human interaction more and more.
| throwthrowuknow wrote:
| Man, it's going to blow your mind when you realize that all the
| talking heads aren't real and never were.
| drivers99 wrote:
| like Max Headroom
| bluelightning2k wrote:
| This is really cool. Although I wouldn't put money on a Google
| project sticking around even if it was a full fledged product!
|
| More of a tech demo than anything else.
|
| What's wild about this is that the voices seem way better than
| GCP's TTS that I've seen. Any way to get those voices as an API?
| bluelightning2k wrote:
| Self-answer but leaving in case anyone else has the same
| question... seems there are some new options in GCP TTS. Both
| "studio" and "journey" are new since I last checked (and I check
| pretty often).
| dlisboa wrote:
| One problem I see with this is legitimizing LLM-extracted content
| as canon. The realistic human speech masks the fact that the LLM
| might be hallucinating or highlighting the wrong parts of a
| book/paper as important.
| gs17 wrote:
| We'll have to see how it holds up for general books. The books
| they highlighted are all very old and very famous, so the
| training set of whatever LLM they use definitely has a huge
| amount of human-written content about them, and the papers are
| all relatively short.
| shmatt wrote:
| The top list of Apple Podcasts is full of real humans
| intentionally lying or manipulating information, it makes me
| worry much less about computer generated lies
| dlisboa wrote:
| Even if society is kinda collapsing that way people are still
| less likely to listen to a random influencer's review of
| biochemistry than a Professor in Biochemistry. These LLMs
| know just as much about the topic they're summarizing as a
| toddler, they should be treated with just as much skepticism.
|
| There are hacks everywhere but humans lying sometimes have
| implications (libel/slander) that we can control. Computers
| are thought of in general society as devoid of bias and
| "smart" so if they lie people are more likely to listen.
| vanishingbee wrote:
| Happens in the very first example:
|
| [Attention is All You Need - 1:07]
|
| > Voice A: How did the "Attention is All You Need" paper
| address this sequential processing bottleneck of RNNs?
|
| > Voice B: So, instead of going step-by-step like RNNs, they
| introduced a model called the Transformer - hence the title.
|
| What title? The paper is entitled "Attention is All You Need".
|
| People are fooling themselves. These are stochastic parrots
| cosplaying as academics.
| aanet wrote:
| I had the same exact thought - "Did this summary mis-
| represent the title??" Indeed, it did. However, I thought the
| end2end implementation was decent.
|
| > These are stochastic parrots cosplaying as academics.
|
| LOL
| IanCal wrote:
| It then goes on to explain right afterwards that the key
| thing the transformer does is rely on a mechanism called
| attention. It makes more sense in that context IMO.
| wyldfire wrote:
| I recently listened to this great episode of "This American
| Life" [1] which talked about this very subject. It was
| released in June 2023 which might be ancient history in terms
| of AI. But it discusses whether LLMs are just parrots and is
| a nice episode intended for general audiences so it is pretty
| enjoyable. But experts are interviewed so it also seems
| authoritative.
|
| [1] https://www.thisamericanlife.org/803/greetings-people-of-
| ear...
| nine_k wrote:
| Frankly, humans also sometimes remember things incorrectly or
| pay excess attention to the less significant topics while
| discussing a book.
|
| In this regard, LLMs are imperfect like ourselves, just to a
| different extent.
| consf wrote:
| Can podcasts creators benefit from this tool? I think so...
| alganet wrote:
| Cool tech. Now we know that very soon no one will be able to
| trust podcasts or video narration.
| Legend2440 wrote:
| You shouldn't have been trusting podcasts in the first place,
| Joe Rogan says plenty of false things no AI required.
| lelandfe wrote:
| Sure, but now I - an idiot - can publish a podcast on...
| "Bayesian Multilevel Models," and fool almost everyone into
| thinking I know anything about it.
|
| I've seen YouTubers provide tutorials on auto-creating
| YouTube videos and podcast episodes on niche scientific
| subjects, on how to build seemingly-reputable brands with
| _zero_ ongoing effort. That is all totally novel. Being able
| to lie or be wrong before is orthogonal to the real issue:
| scale.
| alganet wrote:
| Scale has already been achieved with money (advertisement
| revenue) and influence (political agendas, fame) on a viral
| platform.
|
| What this tech brings is speed. If Google did it, someone
| else will also do it.
| throwthrowuknow wrote:
| All the more reason to empower people to review, rate,
| comment on, block, downvote, and otherwise signal when
| something is incorrect.
| alganet wrote:
| It takes time for humans to say false things, record and edit
| them.
|
| This tech can allow "content creators" to spin hundreds of
| podcasts with garbage simultaneously, saturating the search
| space with nonsense. Similar to what is already being done
| with text everywhere.
|
| What makes one skeptical of conspiracist ideas is access to,
| and visibility of, more enlightened content. If that
| access gets disrupted (it already has been), many people will
| not be able to tell the difference, especially future
| generations.
| dgellow wrote:
| Really impressive. The podcasting spam we will get from this will
| be a pain, but really impressive demo
| jhickok wrote:
| I honestly think it could be the opposite, and we will have
| entire high-quality works of fiction at our fingertips.
| nxobject wrote:
| A related experiment from Google: NotebookLM
| (notebooklm.google.com), which takes a group of documents and
| provides a RAG Gemini chatbot in return.
|
| I wish Google would make these experiments more well-known!
| timmg wrote:
| You also might find a similar feature arriving in that
| product.. soon.
| ansk wrote:
| Imagine reading a math or programming textbook where each
| statement was true with probability 0.95.
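To put a rough number on this thought experiment: if each statement is independently true with probability 0.95 (independence is an assumption added here, not something the comment states), the chance that a run of statements is entirely correct shrinks quickly. A minimal sketch, with a made-up function name:

```python
def prob_all_true(p_true, n_statements):
    """Probability that n independent statements are all true."""
    return p_true ** n_statements


# A chapter's worth of statements, each true with probability 0.95.
p = prob_all_true(0.95, 100)
print(f"chance that all 100 statements are true: {p:.4f}")
```

Under that assumption, a passage of 100 statements is error-free well under 1% of the time.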
| sno129 wrote:
| Plenty of mistakes in textbooks and research articles, it's
| possible the probability is already even lower.
| slashdave wrote:
| That just means you are adding errors on top of existing
| ones, hardly an improvement
| throwthrowuknow wrote:
| errata. Also real humans often make mistakes in live
| interviews. The biggest difference is that eventually these
| fake humans will have lower error rates than real ones.
| contagiousflow wrote:
| > eventually these fake humans will have lower error rates
| than real ones
|
| Source?
| danesparza wrote:
| I wonder how long until this waitlisted service eventually gets
| thrown on the trash heap that Google Reader is on.
|
| Building trust with your users is important, Google.
| syntaxing wrote:
| I've been using the ElevenLabs Reader app to read some articles
| during my drive and it's been amazing. It's great to be able to
| listen to Money Stuff whenever I want to. The audio quality is
| about 90% there. Occasionally, the tone of the sentence is wrong
| (like surprised when it should be sad) and the wrong enunciation
| (bow, like bowing down or tying a bow) but still very listenable.
| bogwog wrote:
| What does this accomplish? Who does this help? How does this make
| the world a better place?
|
| This only seems like it would be useful for spammers trying to
| game platforms, which is silly because spam is probably the
| number one thing bringing down the quality of Google's own
| products and services.
| nonrandomstring wrote:
| I think I just discovered a new emotion. Simultaneous feelings of
| excitement and disappointment.
|
| No matter how great the idea, it's hard to stay excited for more
| than a few microseconds at the sight of the word "Google". I can
| already hear the gravediggers' shovels preparing a plot in the
| Google graveyard, and hear the sobs of the people who built their
| lives, workflows, even jobs and businesses around something that
| will be tossed aside as soon as it stops being someone's pet
| play-thing at Google.
|
| A strange ambivalent feeling of hope already tarnished with
| tragedy.
| srameshc wrote:
| We are working on something content driven (for an ad or
| subscription model) with a lot of effort and time and I am
| concerned how this technology will affect all that effort and
| eventually monetization ideas. But I can see how helpful this
| tool can be for learning new stuff.
| timonoko wrote:
| Works surprisingly well. I actually bothered to listen to
| "discussions" about these boring-looking papers.
|
| English is particularly bad to read aloud because it is like
| the programming language Fortran, based on immutable tokens. If you
| want tonal variety, you have to understand the content.
|
| Some other languages modify the tokens themselves, so just one
| word can be pompous, comical, uneducated etc.
| albert_e wrote:
| the player always starts at 30:00 for me and plays a 4 to 7
| minute clip that seems complete but very brief
| Ninjinka wrote:
| the Lexification/Roganization/Dwarkeshing/Hubermanning of reading
| srik wrote:
| Nothing is real anymore.
| airstrike wrote:
| Might as well dive into the deep end of the metaverse
| kornhole wrote:
| AKA fake and gay
| bitshiftfaced wrote:
| Occasionally there's a podcast or video I'd like to listen to,
| but one of the voices is either difficult to understand, or in
| some way awful to listen to, or maybe the sound quality is really
| bad. It would be nice to have an option for an automatically
| redubbed audio.
| wintermutestwin wrote:
| I sure do wish podcasters would learn about compression. I am
| constantly getting my ears blown out in the car from a podcast
| with multiple speakers who are at different volumes.
| swyx wrote:
| podcaster here. what does compression have to do with it?
| youre just talking about different levels from diff mics
| semi-extrinsic wrote:
| Probably a lot of the problem GP is describing comes from
| people having inconsistent distance to their microphone,
| moving around a lot. Then using an audio compressor effect
| plugin is an appropriate answer.
|
| I've often thought about adding a compressor pedal to my TV
| sound system. It would be excellent for when you're
| watching action movies with hard to hear dialogue mixed
| with loud noises, and the kids are asleep, so you spend the
| evening turning volume up and down eight times per minute.
| drivers99 wrote:
| Setting the levels equally to start would help, but doesn't
| control when someone suddenly gets loud. With compression,
| you can increase quiet sounds, decrease loud sounds, or
| both.
|
| https://en.wikipedia.org/wiki/Dynamic_range_compression
|
| A type of compressor used to limit the maximum signal is a
| limiter. "Limiters are common as a safety device in live
| sound and broadcast applications to prevent sudden volume
| peaks from occurring."
|
| https://en.wikipedia.org/wiki/Limiter
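The difference between a compressor and a limiter can be sketched on raw sample values. This is a deliberately simplified illustration, not how a production plugin works: it applies a static gain curve per sample, with no attack/release smoothing, no decibel scale, and no make-up gain (the step that would raise the quiet parts afterwards). The function names, threshold, ratio, and ceiling are all made up for the example.

```python
def compress(samples, threshold=0.5, ratio=4.0):
    """Reduce the part of each sample's level that exceeds the threshold."""
    out = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            # Only 1/ratio of the overshoot above the threshold is kept.
            level = threshold + (level - threshold) / ratio
        out.append(level if s >= 0 else -level)
    return out


def limit(samples, ceiling=0.8):
    """Hard-limit every sample to the range [-ceiling, ceiling]."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


quiet_and_loud = [0.1, -0.2, 0.9, -1.0]
compressed = compress(quiet_and_loud)
limited = limit(quiet_and_loud)
print(compressed)
print(limited)
```

Quiet samples pass through unchanged while loud ones are pulled toward the threshold; the limiter is the extreme case where nothing is allowed above the ceiling.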
| fabmilo wrote:
| so much pleasantry so much fluff. reduce the noise. get to the
| point.
| ants_everywhere wrote:
| This is a good idea and well executed. I think the hard part now
| is pointing it in an appropriate direction.
|
| If it's just used for generating low quality robo content like we
| see on TikTok and YouTube then it's not so interesting.
| RobMurray wrote:
| I couldn't listen for more than a couple of minutes. It's the
| usual repetitive, over wordy llm generated drivel.
| franze wrote:
| Oh, another Google Waitlist...
| SeanAnderson wrote:
| I'm fairly excited for this use case. I recently made the switch
| from Audible to Libby for my audiobook needs. Overall, it's been
| good/fine, but I get disappointed when the library only has text
| copies of a book I want to listen to. Often times they aren't
| especially popular books so it seems unlikely they'll get a
| voiceover anytime soon. Using AI to narrate these books will
| solve a real problem I experience currently :)
| banach wrote:
| I can see this working reasonably for text that you can
| understand without referring to figures, and for texts for which
| there is external content available that such a conversation
| could be based on. For a new, say, math paper, without prose
| interspersed, I'd be surprised if the generated conversation will
| be worth much. On the other hand, that is a corner case and,
| personally, I suspect I will be using this for the many texts
| where all I need is a presentation of the material that is easy
| to listen to.
| aanet wrote:
| What a fantastic idea! Great way to learn about those pesky
| research papers I keep downloading (but never get to reading
| them). I tried a few, e.g. Attention is All You Need, etc. The
| summary was fantastic, and the discussion was, well, informative.
|
| Does anyone know how the summary was generated? (text
| summarization, I suppose?) Is there a bias towards "podcast-style
| discussion"? Not that I'm complaining about it - just that I
| found it helpful.
| oulipo wrote:
| Why not, if you could also interject with questions, remarks, or
| "cut to the chase" like remarks.
|
| Also it's weird that they focus only on AI papers in the demo,
| and not more interesting social stuff, like environment
| protection, climate change, etc
| ftmch wrote:
| Guess they want to avoid any political backlash that could
| arise from topics like that, which will happen inevitably.
| sandspar wrote:
| Google's fingers get burned whenever it lets its AI touch
| social topics.
| leobg wrote:
| I made something like this for my kids:
|
| 1. Take a science book. I used one Einstein loved as a kid, in
| German. But I can also use Asimov in English. Or anything else.
| We'll handle language and outdated information on the LLM level.
|
| 2. Extract the core ideas and narrative with an LLM and rewrite
| it into a conversation, say, between a curious 7 year old girl
| and her dad. We can take into account what my kids are interested
| in, what they already know, facts from their own life,
| comparisons with their surroundings etc. to make it more
| engaging.
|
| 3. Turn it into audio using Text-to-Speech (multiple voices).
| flakiness wrote:
| How do you get the source data (text) from a book? To me it is
| the major roadblock for LLM-based commercial content
| consumption.
| leobg wrote:
| Old books are on Gutenberg, archive.org etc.
|
| Physical ones, I scan. Cutting the spine is easiest. But
| today you can also just take pics with your phone.
|
| Many retailers also sell EPUB. Which is just HTML.
|
| Obviously, that's all for private consumption only. (Unless
| you're OpenAI I guess. :-P)
| flakiness wrote:
| Oh, you got serious! Salute to you from a lazy dad.
| antirez wrote:
| Related: [rumors] Audible is starting a pilot project to do just
| that with the ebooks.
| lxgr wrote:
| At this point, this seems more like a question of "how
| soon", not if.
| OutOfHere wrote:
| Can it make something bigger than 5 minutes?
| Analemma_ wrote:
| Books I can understand, but I'm genuinely curious: would anyone
| here find it useful to hear scientific papers as narrated audio?
| Maybe it depends on the field, but when I read e.g. an ML paper,
| I almost always have to go through it line-by-line with a pen and
| scratchpad, jumping back and forth and taking notes, to be sure
| I've actually "got it". Sometimes I might read a paragraph a
| dozen times. I can't see myself getting any value out of this,
| but I'm interested if others would find it useful.
| creativenolo wrote:
| I'm not sure "hear scientific papers as narrated audio" best
| describes what this is. From the link:
|
| > Illuminate generates audio with two AI-generated voices in
| conversation, discussing the key points of select papers.
| motoxpro wrote:
| This is insane! To be able to listen to a conversation to learn
| about any topic is amazing. Maybe it's just me because I listen
| to so many podcasts but this is Planet Money or The Indicator
| from NPR about anything.
|
| Definitely one of the coolest things I have seen an LLM do.
| vincentpants wrote:
| Listening to an AI generated discussion-based podcast on the
| topic of anticipating the scraping of deceased people's digital
| footprint to create an AI copy of your loved one makes the cells
| that make up my body want to give up on fighting entropy.
| alenwithoutproc wrote:
| it would be really cool* if we'd have a clubhouse-style gen-ai
| feed for hn or reddit comments to listen to.
|
| * to me
| belval wrote:
| I guess I am in my grouchy old person phase but all I could think
| of was the Gilfoyle quote from Silicon Valley when presented
| with a talking refrigerator.
|
| > "Bad enough it has to talk, does it need fake vocal tics...?" -
| Gilfoyle
|
| Found it: https://youtu.be/APlmfdbjmUY?si=b4-rgkxeXigU_un_&t=179
| drivers99 wrote:
| I would want to select a voice without vocal fry, which one of
| the voices in these demos has.
| layman51 wrote:
| Did anyone else notice that according to the generation info,
| each recording was created on 12/31/69 at 4:00 PM?
| oneepic wrote:
| That lines up with 1/1/70 0:00 UTC, but that's also hilarious.
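The two timestamps are the same instant: Unix time 0 shown in a UTC-8 timezone. A small sketch of that conversion follows; that the page is displaying an unset (zero) timestamp, and that the viewer was in a UTC-8 zone, are both guesses based on the comments rather than anything confirmed.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the viewer's local zone was UTC-8 (e.g. US Pacific Standard Time).
utc_minus_8 = timezone(timedelta(hours=-8))

epoch_utc = datetime.fromtimestamp(0, tz=timezone.utc)  # Unix time zero
epoch_local = epoch_utc.astimezone(utc_minus_8)         # same instant, local clock

print("UTC:  ", epoch_utc.isoformat())
print("UTC-8:", epoch_local.isoformat())
```

A "created" date of exactly this value is a common sign that a timestamp field was never filled in.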
| e12e wrote:
| Interesting - listening to the first example (Attention is all
| you need)[1] - I wonder what illuminate would make of Fielding's
| REST thesis?
|
| [1] https://illuminate.google.com/home?pli=1&play=SKUdNc_PPLL8
| CatWChainsaw wrote:
| So it will immediately be trashed by GenAI bullshit and
| killedbygoogle within three years, right?
| elashri wrote:
| One useful use case would be helping make academic papers more
| accessible. It would also be useful for people to listen to arxiv
| papers that seem interesting. It would be a useful tool in the
| academic world. Also useful for students, who would have a more
| accessible form of learning.
|
| I have a project idea already to use arxiv RSS API to fetch
| interesting papers based on keywords (or some LLM summary) and
| then pass it to something like illuminate and then you have a
| listening queue to follow latest in the field. Though there will
| be some problems with formatting but then you could just open the
| pdf to see the plots and equations.
| yismail wrote:
| I got in the beta a couple weeks ago and tried it out on some
| papers [0]
|
| [0] https://news.ycombinator.com/item?id=41020635
| ElijahLynn wrote:
| I've been meaning to read the Attention Is All You Need paper for
| years and never have. And I finally listened to that little
| generated interview as their first example. I think this is going
| to be very very useful to me!
| yunohn wrote:
| I listened to multiple demos, the pauses and vocal intonations
| sound so fake. They're inserted at odd times that a real human
| speaker would not.
___________________________________________________________________
(page generated 2024-09-10 23:00 UTC)