[HN Gopher] ArXiv Papers as Audiobooks
       ___________________________________________________________________
        
       ArXiv Papers as Audiobooks
        
       Author : Acsmaggart
       Score  : 73 points
       Date   : 2024-03-15 20:00 UTC (3 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | Acsmaggart wrote:
       | I had been daydreaming a couple of weeks ago about being able to
       | listen to papers while driving or doing repetitive tasks, and it
       | looks like there is now a YouTube channel where these get posted:
       | 
       | https://www.youtube.com/@ArxivPapers
       | 
        | The pipeline seems to do a pretty good job of cleaning up the
        | writing too; some ArXiv papers are a little rough.
       | 
       | (I'm not the project owner)
        
       | se4u wrote:
        | Many years ago, I did that when I had a large paper-reviewing
        | load during my PhD. My solution was simply to purchase an app
        | called SayIt for like a dollar that read the PDF to me; it
        | worked really well.
        | 
        | Nowadays I often pass the PDF through an LLM to personalize it
        | (expand on jargon or contract the verbiage) and then read it.
        | That gives me a better return on the time spent.
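        | 
        | A minimal sketch of what I mean, assuming pypdf for extraction
        | and the OpenAI Python client; the model name and the
        | instruction string are just placeholders, swap in whatever you
        | actually use:
        | 
        |     from pypdf import PdfReader
        |     from openai import OpenAI
        | 
        |     def personalize(pdf_path,
        |                     instruction="Expand on any jargon."):
        |         # naive full-text extraction of the PDF
        |         text = "\n".join(p.extract_text() or ""
        |                          for p in PdfReader(pdf_path).pages)
        |         client = OpenAI()  # needs OPENAI_API_KEY set
        |         resp = client.chat.completions.create(
        |             model="gpt-4",  # placeholder model name
        |             messages=[
        |                 {"role": "system",
        |                  "content": "Rewrite this paper for me. "
        |                             + instruction},
        |                 # crude truncation to stay inside the context
        |                 {"role": "user", "content": text[:60000]},
        |             ])
        |         return resp.choices[0].message.content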
        
       | Uehreka wrote:
       | When I've tried listening to YouTube videos explaining, say,
       | Attention Is All You Need, I find that I cannot do it passively
       | at all. The first 10 or so minutes I'm nodding along, folding
       | laundry or doing dishes, then the presenter says something like
       | "by reifying this tensor against the priors I was just talking
       | about, we're able to--" and I have to pause, rewind a couple
       | minutes, grab a piece of paper and actually engage with what's
       | going on.
       | 
       | I have to imagine listening to raw papers (not even someone like
        | Andrej Karpathy interpreting and presenting them) would be even
       | more difficult. I don't know if there's an easy way to passively
       | consume academic literature at all. If it's important stuff, it
       | will usually be pretty challenging.
        
         | vibrio wrote:
         | I came to post essentially this. I could listen to review
          | articles in an area I'm familiar with, but listening to primary
         | papers could never work for me.
        
         | reader5000 wrote:
          | Reading dense papers visually also leads to failing to
          | understand concepts, or to getting distracted.
        
           | chaxor wrote:
            | This is a great point. People will complain if LLMs are
            | applied to anything, but ultimately this improves
            | accessibility and allows someone to dive deeper when needed.
            | 
            | There will always be ways to misinterpret academic work, and
            | there are plenty of opportunities to do so along the path to
            | understanding it.
            | 
            | Allowing someone to engage with a work _at all_ by lifting
            | some barriers (for visually impaired people, for example)
            | should be acknowledged as an improvement, not discouraged
            | continually for having some bugs.
        
         | beacon294 wrote:
          | Sometimes I use a combined approach: listening first, then
          | reading the technical writing later.
        
         | rhelz wrote:
         | Just listening while doing nothing else is soporific, but I can
         | imagine finding this invaluable if I had a long commute to
         | work.
        
         | chaxor wrote:
          | There is definitely a way to make this happen, though. A
          | little bit o' Whisper, Mixtral in some RAG, and you've got
          | yourself a buddy to talk about the paper while it's reading
          | it to you.
          | 
          | Of course everyone will immediately say this is dangerous and
          | may mislead you by giving wrong explanations, etc., and then
          | others will counter with 'it will definitely get better over
          | time' (the best models shipped as products are ~3 years
          | behind the improvements being shown in academic work, for
          | example). However, ultimately this is just a neat product to
          | make, even if it has some bugs. Plain TTS right now spends
          | about half the time reading jumbled numbers from tables and
          | listing off author names. So just tackling that alone (which
          | this would do much better) would be valuable.
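          | 
          | A very rough sketch of that buddy, assuming
          | sentence-transformers for retrieval and a local Mixtral
          | served by Ollama (Whisper would sit in front to transcribe
          | the spoken question; the chunk size, model names, and
          | endpoint are just my assumptions, not the OP's pipeline):
          | 
          |     import numpy as np
          |     import requests
          |     from sentence_transformers import SentenceTransformer
          | 
          |     emb = SentenceTransformer("all-MiniLM-L6-v2")
          | 
          |     def ask_paper(text, question, k=3):
          |         # naive fixed-size chunking of the paper text
          |         chunks = [text[i:i + 1500]
          |                   for i in range(0, len(text), 1500)]
          |         cv = emb.encode(chunks, normalize_embeddings=True)
          |         qv = emb.encode([question],
          |                         normalize_embeddings=True)[0]
          |         top = np.argsort(cv @ qv)[-k:]  # best matches
          |         ctx = "\n---\n".join(chunks[i] for i in top)
          |         prompt = ("Context from the paper:\n" + ctx +
          |                   "\n\nQuestion: " + question)
          |         r = requests.post(
          |             "http://localhost:11434/api/generate",
          |             json={"model": "mixtral", "prompt": prompt,
          |                   "stream": False})
          |         return r.json()["response"]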
        
       | julienchastang wrote:
       | I am still trying to understand this, but it seems like the
       | potential here is tremendous. For example, you can imagine
       | producing audio tailored to the sophistication of the reader
        | where a layperson may want a more basic interpretation than a
       | subject-matter expert. Really looking forward to seeing where
       | this goes for the dissection and understanding of scientific
       | publications.
        
       | neuronexmachina wrote:
       | The LLM prompts are pretty interesting, e.g.:
       | https://github.com/imelnyk/ArxivPapers/blob/main/gpt/utils.p...
       | 
       | > "You are an ArXiv paper audio paraphraser. Your primary goal is
       | to rephrase the original paper content while preserving its
       | overall meaning and structure, but simplifying along the way, and
       | make it easier to understand. In the event that you encounter a
       | mathematical expression, it is essential that you verbalize it in
       | straightforward nonlatex terms, while remaining accurate, and in
       | order to ensure that the reader can grasp the equation's meaning
       | solely through your verbalization. Do not output any long latex
       | expressions, summarize them in words."
        
         | nicklecompte wrote:
         | There's no way the prompt actually works, though. LLMs are not
         | able to reliably "preserve the overall meaning" of things
         | unless they're doing direct technical translation. The problem
         | is going to be even worse with original research, because the
         | LLM will try to summarize according to _old_ ideas from blog
         | posts  / etc in its training data, and not the _new_ ideas in
         | the original research. In general document summarization is one
         | of the _worst_ use cases for LLMs, both in terms of its
         | reliability and the difficulty of finding errors - how would
         | you know without reading the paper? I would be surprised if
         | this prompt worked on a _single_ honest[1] paper that was
         | written after the LLM was pretrained.
         | 
         | The bit about translating LaTeX expressions into human-
         | comprehensible math sentences is interesting and AFAIK should
         | work on something like GPT-4. But that's just a case of
         | technical translation. GPT-4 _definitely cannot_ "rephrase the
         | overall paper... simplifying along the way." GPT-4 can't even
         | summarize corporate reports without screwing up facts and
         | figures - why on earth would you try to use it to summarize new
         | scientific research?
         | 
         | Stuff like this is why I'm so concerned about LLMs: this prompt
         | doesn't work, and people using AI for this stuff is just
         | automating ignorance. Very frustrating.
         | 
         | [1] I say "honest" because this prompt would probably do ok on
         | stuff coming out of a paper mill - the problem is _carefully
         | stated original ideas._ GPT tears original ideas to shreds.
        
       | calebkaiser wrote:
       | I started working on a version of this just the other night--
       | thank you for saving me the time! This is awesome.
        
       | josh-sematic wrote:
        | https://www.listening.com/ does this as a service. FWIW I
        | haven't tried it myself.
       | 
       | Edit: looks like they support a few traditional publishers as
       | well.
        
         | ipsum2 wrote:
         | Whatever they're using for text to speech is rough. Probably
         | using an open source model. The one used in OP (Google's) is a
         | lot more listenable.
        
         | Nowado wrote:
          | Last time I tried it, the app literally just read papers, as
          | in it parsed arXiv PDF text straight into text-to-speech. It
          | was an awful misunderstanding of the medium. Unless it was
          | rebuilt significantly over the last few months, it's just bad.
        
           | adi4213 wrote:
            | We built Oration (https://oration.app) to improve on issues
            | like this. It also generates a summarized version.
        
         | adi4213 wrote:
          | Give Oration (https://oration.app) a try! It's cheaper, and
          | many of our users found it a better option than Listening.
        
       | mathgradthrow wrote:
        | Audiobooks make sense for things which are communicated as fast
        | as speech. Like stories.
        
       | Almondsetat wrote:
        | Papers are already difficult to process when reading them
        | carefully multiple times, so what even is the point of turning
        | them into an audio version? I am genuinely at a loss, unless we
        | are talking about blind people.
        
         | julienchastang wrote:
         | The YouTube Channel may shed some light. As I understand this,
         | it is not reading the paper, but interpreting or summarizing it
         | with visual cues as to which section it is analyzing.
        
           | Almondsetat wrote:
           | I still don't get the purpose. If you have a video to watch
           | it's not an audiobook anymore. Secondly, why not just read
           | the abstract? The paper might contain formulas (need to be
           | carefully read to understand) and data (need to be carefully
           | read to understand). If you strip the paper of its scientific
            | elements, then only a series of badly justified steps
            | remains, at which point you might as well just consider the
            | abstract + conclusions paragraphs.
        
             | chaxor wrote:
             | What if you want to hear about the latest arxiv updates
             | while on your morning run?
             | 
             | This seems like a fantastic idea for that purpose.
        
             | julienchastang wrote:
             | The choice of the word "audiobook" is really unfortunate.
             | That's never mentioned on the GitHub project page. I find
             | LLMs to do a decent job of summarizing text. Obviously, it
             | depends on the audience. If it is a subject-matter expert,
             | they may not be happy with the result, but a layperson
             | might be.
        
       | neuronexmachina wrote:
       | It'd be interesting to also have these generate a slide
       | presentation explaining a paper via some combination of
       | presentation markdown, MermaidJS, and an image generator.
        
       | mrkramer wrote:
        | I had a similar idea, but what happens when you stumble upon
        | code, equations, tables, graphs, etc.? Can an LLM understand
        | those as well?
        | 
        | For example: you are listening to the paper with some
        | text-to-speech model and then it stumbles upon a code snippet
        | or a table or a graph... what should happen next? Should the
        | model skip it, or prompt you to look at the graph or table or
        | whatever? Or should you write some software that tries to
        | interpret graphs and other non-text content?
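        | 
        | One simple answer, sketched against the LaTeX source rather
        | than the PDF (the environment list and the cue wording are just
        | my guesses at a sensible default): don't read those blocks
        | aloud, and replace each one with a short spoken cue so the
        | listener knows to go look.
        | 
        |     import re
        | 
        |     ENVS = r"(table|figure|equation|align|verbatim|lstlisting)"
        | 
        |     def speakable(tex):
        |         # swap each table/figure/equation/code block for a
        |         # short spoken cue built from its caption, if any
        |         def cue(m):
        |             cap = re.search(r"\\caption\{([^}]*)\}",
        |                             m.group(0))
        |             label = cap.group(1) if cap else "no caption"
        |             return f" [A {m.group(1)} appears here: {label}.] "
        |         pattern = (r"\\begin\{" + ENVS + r"\*?\}.*?"
        |                    r"\\end\{" + ENVS + r"\*?\}")
        |         return re.sub(pattern, cue, tex, flags=re.S)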
        
       | pulpfictional wrote:
        | I've been looking for a good way to TTS longer PDFs and EPUBs
        | into recordings so I can listen to them on the go. I'd like to
        | take advantage of high-quality TTS models, but I'd prefer one I
        | can host myself.
        | 
        | I haven't found the right approach yet; I'm considering:
        | https://github.com/MycroftAI/mimic3
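        | 
        | If that pans out, the pipeline I have in mind is roughly the
        | sketch below, assuming pypdf for text extraction and that
        | mimic3's CLI takes text on stdin and writes a WAV to stdout (I
        | may be misremembering its flags and voice names, so check the
        | README; EPUBs would need their own extraction step):
        | 
        |     import subprocess
        |     from pypdf import PdfReader
        | 
        |     def pdf_to_wav(pdf_path, wav_path,
        |                    voice="en_UK/apope_low"):
        |         # pull the plain text out of the PDF
        |         text = "\n".join(p.extract_text() or ""
        |                          for p in PdfReader(pdf_path).pages)
        |         with open(wav_path, "wb") as out:
        |             subprocess.run(["mimic3", "--voice", voice],
        |                            input=text.encode(),
        |                            stdout=out, check=True)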
        
         | adi4213 wrote:
         | If you're an iOS user, try https://oration.app
        
         | hagbard_c wrote:
          | I use Librera Reader [1] for this; it handles ePub as well as
          | PDF, and then some. The quality of the TTS output depends on
          | what you have on your (Android) device, since that is what it
          | uses. I tend to use Google's TTS with a male UK voice, which
          | I tune down (as in a deeper voice) and speed up a bit. It
          | mostly works fine, probably better for nonfiction than
          | fiction, but that is what I mostly use it for anyway. You can
          | swap between reading on-screen and listening, since it keeps
          | your position in the document while reading aloud.
         | 
         | [1] https://f-droid.org/en/packages/com.foobnix.pro.pdf.reader/
        
       ___________________________________________________________________
       (page generated 2024-03-15 23:00 UTC)