[HN Gopher] ArXiv Papers as Audiobooks
___________________________________________________________________
ArXiv Papers as Audiobooks
Author : Acsmaggart
Score : 73 points
Date : 2024-03-15 20:00 UTC (3 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| Acsmaggart wrote:
| I had been daydreaming a couple of weeks ago about being able to
| listen to papers while driving or doing repetitive tasks, and it
| looks like there is now a YouTube channel where these get posted:
|
| https://www.youtube.com/@ArxivPapers
|
| The pipeline seems to do a pretty good job of cleaning up the
| writing too, some ArXiv papers are a little rough.
|
| (I'm not the project owner)
| se4u wrote:
| Many years ago, I did that when I had a large paper reviewing
| load during my phd. My solution was simply to purchase an app
| called SayIt for like a dollar that read the pdf to me, worked
| really well.
|
| Nowadays I often pass the pdf through LLMs to get personalize
| (expand on jargon or contract the verbiage) and then read them.
| That gives me a better return on time spent.
| Uehreka wrote:
| When I've tried listening to YouTube videos explaining, say,
| Attention Is All You Need, I find that I cannot do it passively
| at all. The first 10 or so minutes I'm nodding along, folding
| laundry or doing dishes, then the presenter says something like
| "by reifying this tensor against the priors I was just talking
| about, we're able to--" and I have to pause, rewind a couple
| minutes, grab a piece of paper and actually engage with what's
| going on.
|
| I have to imagine listening to raw papers (not even someone like
| Andrei Karpathy interpreting and presenting it) would be even
| more difficult. I don't know if there's an easy way to passively
| consume academic literature at all. If it's important stuff, it
| will usually be pretty challenging.
| vibrio wrote:
| I came to post essentially this. I could listen to review
| articles in a area I'm familiar with, but listening to primary
| papers could never work for me.
| reader5000 wrote:
| Visual reading of dense papers also leads to failing to
| understand concepts or distractions.
| chaxor wrote:
| This is a great point. People will complain if LMs are
| applying to anything, but ultimately it improves
| accessibility, and allows for someone to dive deeper when
| needed.
|
| There will always be ways to misinterpret some academic work,
| and there are plenty of opportunities in the path of
| understanding a work to do that.
|
| Allowing someone to engage with a work _at all_ by lifting
| some barriers (visually impaired people's for exampld) should
| be acknowledged as an improvement, not discouraged
| continually for having some bugs.
| beacon294 wrote:
| I use a combined approach of pre listening then reading the
| technical writing later sometimes
| rhelz wrote:
| Just listening while doing nothing else is soporific, but I can
| imagine finding this invaluable if I had a long commute to
| work.
| chaxor wrote:
| There is definetly a way to make this happen though. Little bit
| o' whisper, Mixtral in some RAG, and you've got yourself a
| buddy to talk about the paper while it's reading it to you.
|
| Of course everyone will immediately say this is dangerous and
| it may mislead you by giving wrong explanations, etc etc. and
| then others will counter with 'it will definitely get better
| over time' (the best models as products are ~3 years behind the
| improvements being show in academic work for example). However,
| ultimately this is just a neat product to make, even if it has
| some bugs. Listening to TTS right now spends about half the
| time reading jumbled numbers from tables and listing off author
| names. So just tackling that alone (which this would do much
| better) would be valuable.
| julienchastang wrote:
| I am still trying to understand this, but it seems like the
| potential here is tremendous. For example, you can imagine
| producing audio tailored to the sophistication of the reader
| where a layperson may wish a more basic interpretation than a
| subject-matter expert. Really looking forward to seeing where
| this goes for the dissection and understanding of scientific
| publications.
| neuronexmachina wrote:
| The LLM prompts are pretty interesting, e.g.:
| https://github.com/imelnyk/ArxivPapers/blob/main/gpt/utils.p...
|
| > "You are an ArXiv paper audio paraphraser. Your primary goal is
| to rephrase the original paper content while preserving its
| overall meaning and structure, but simplifying along the way, and
| make it easier to understand. In the event that you encounter a
| mathematical expression, it is essential that you verbalize it in
| straightforward nonlatex terms, while remaining accurate, and in
| order to ensure that the reader can grasp the equation's meaning
| solely through your verbalization. Do not output any long latex
| expressions, summarize them in words."
| nicklecompte wrote:
| There's no way the prompt actually works, though. LLMs are not
| able to reliably "preserve the overall meaning" of things
| unless they're doing direct technical translation. The problem
| is going to be even worse with original research, because the
| LLM will try to summarize according to _old_ ideas from blog
| posts / etc in its training data, and not the _new_ ideas in
| the original research. In general document summarization is one
| of the _worst_ use cases for LLMs, both in terms of its
| reliability and the difficulty of finding errors - how would
| you know without reading the paper? I would be surprised if
| this prompt worked on a _single_ honest[1] paper that was
| written after the LLM was pretrained.
|
| The bit about translating LaTeX expressions into human-
| comprehensible math sentences is interesting and AFAIK should
| work on something like GPT-4. But that's just a case of
| technical translation. GPT-4 _definitely cannot_ "rephrase the
| overall paper... simplifying along the way." GPT-4 can't even
| summarize corporate reports without screwing up facts and
| figures - why on earth would you try to use it to summarize new
| scientific research?
|
| Stuff like this is why I'm so concerned about LLMs: this prompt
| doesn't work, and people using AI for this stuff is just
| automating ignorance. Very frustrating.
|
| [1] I say "honest" because this prompt would probably do ok on
| stuff coming out of a paper mill - the problem is _carefully
| stated original ideas._ GPT tears original ideas to shreds.
| calebkaiser wrote:
| I started working on a version of this just the other night--
| thank you for saving me the time! This is awesome.
| josh-sematic wrote:
| https://www.listening.com/ does this as a service.FWIW I haven't
| tried it myself.
|
| Edit: looks like they support a few traditional publishers as
| well.
| ipsum2 wrote:
| Whatever they're using for text to speech is rough. Probably
| using an open source model. The one used in OP (Google's) is a
| lot more listenable.
| Nowado wrote:
| Last time I tried it, app literally just read papers. As in
| parsed arxiv pdfs text2speech. It was an awful misunderstanding
| of the medium. Unless it was rebuilt significantly over last
| months, it's just bad.
| adi4213 wrote:
| We built Oration (https://oration.app) to improve on issues
| like this. It also generates a summarized version
| adi4213 wrote:
| Give oration (https://oration.app) a try! It's cheaper and many
| of our users found it a better option than Listening
| mathgradthrow wrote:
| Audiobooks make sense for thibgs which are communicated as fast
| as speech. Like stories.
| Almondsetat wrote:
| Papers are already difficult to process when reading them
| carefully multiple times, what even is the point of turning them
| into an audio version? I am genuinely at a loss, unless we are
| talking about blind people
| julienchastang wrote:
| The YouTube Channel may shed some light. As I understand this,
| it is not reading the paper, but interpreting or summarizing it
| with visual cues as to which section it is analyzing.
| Almondsetat wrote:
| I still don't get the purpose. If you have a video to watch
| it's not an audiobook anymore. Secondly, why not just read
| the abstract? The paper might contain formulas (need to be
| carefully read to understand) and data (need to be carefully
| read to understand). If you strip the paper of its scientific
| elements then only a series of badly justified steps remain,
| at which point you might as well just consider the abstract +
| conclusions paragraphs
| chaxor wrote:
| What if you want to hear about the latest arxiv updates
| while on your morning run?
|
| This seems like a fantastic idea for that purpose.
| julienchastang wrote:
| The choice of the word "audiobook" is really unfortunate.
| That's never mentioned on the GitHub project page. I find
| LLMs to do a decent job of summarizing text. Obviously, it
| depends on the audience. If it is a subject-matter expert,
| they may not be happy with the result, but a layperson
| might be.
| neuronexmachina wrote:
| It'd be interesting to also have these generate a slide
| presentation explaining a paper via some combination of
| presentation markdown, MermaidJS, and an image generator.
| mrkramer wrote:
| I had a similar idea but what happens when you stumble upon code,
| equations, tables, graphs etc.? Can LLM understand that as well?
|
| For example; you are listening to the paper with some text2speech
| model and then it stumbles open code snippet or table or
| graph....what should happen next? Should model skip it or prompt
| you to look at the graph or table or whatever. Or should you
| write some software that tries to interpret graphs and other non-
| text content.
| pulpfictional wrote:
| I've been looking for a good way to TTS longer PDFs and EPUBs
| into recordings so I may listen to them on the go. I'd like to
| take advantage of high quality TTS models but I'd prefer it to be
| one I may host myself.
|
| Haven't found the right way yet, I'm considering:
| https://github.com/MycroftAI/mimic3
| adi4213 wrote:
| If you're an iOS user, try https://oration.app
| hagbard_c wrote:
| I use Librera Reader [1] for this, it handles both ePub as well
| as PDF and then some. The quality of the TTS output is
| dependent on what you have on your (Android) device since that
| is what it uses. I tend to use Google's TTS with a male UK
| voice which I tune down (as in deeper voice) and speed up a
| bit. It mostly works fine, probably better for nonfiction than
| fiction but that is what I mostly use it for anyway. You can
| swap between reading on-screen and listening since it keeps
| position in the document while reading aloud.
|
| [1] https://f-droid.org/en/packages/com.foobnix.pro.pdf.reader/
___________________________________________________________________
(page generated 2024-03-15 23:00 UTC)