[HN Gopher] Show HN: ML paper podcast generator using GPT and To...
       ___________________________________________________________________
        
       Show HN: ML paper podcast generator using GPT and Tortoise-TTS
        
       I built a pipeline that turns tweets about ML papers into a
       podcast.  Code's up here. Happy hacking.
       https://github.com/yacineMTB/scribepod
        
       Author : yacine_
       Score  : 55 points
       Date   : 2023-01-27 17:59 UTC (5 hours ago)
        
 (HTM) web link (scribepod.substack.com)
        
       | ax8080 wrote:
       | WOW! What do you use to generate the voice? It's SO scarily
       | similar to real podcasts. I couldn't find it in the sources
       | after a minute of looking.
        
         | ax8080 wrote:
         | And it's funny how it sometimes makes ahhrhrhrhrhhhhh sounds.
         | What is the reason behind that?
        
           | nielsinho wrote:
           | It happens quite often with TorToiSe that it collapses in
           | this way, especially for unseen tokens that wouldn't have
           | appeared in the training data, which likely consisted of a
           | lot of transcribed material and read text like audiobooks.
           | Trying to make it laugh by prompting it with "hahaha" (which
           | you won't really see in that kind of data) often gets you
           | demon and zombie noises.
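
The failure mode described above suggests checking a script for laughter-like or stretched-out words before sending it to a TTS model. The following is a minimal sketch of such a check, not something from the thread or from TorToiSe itself: the function name `risky_words` and both regular expressions are illustrative assumptions about what "unseen tokens" might look like.

```python
import re
from typing import List

# Assumed heuristics (not from the thread): tokens that rarely show up in
# audiobook-style or transcribed text and may make a TTS model collapse.
LAUGHTER = re.compile(r"(?:ha){2,}h?|(?:he){2,}h?", re.IGNORECASE)
STRETCHED = re.compile(r"(.)\1{3,}")  # same character four or more times in a row


def risky_words(text: str) -> List[str]:
    """Return words that look like written laughter or stretched-out sounds."""
    words = re.findall(r"[A-Za-z']+", text)
    return [w for w in words if LAUGHTER.fullmatch(w) or STRETCHED.search(w)]
```

A caller could drop the flagged words or rewrite them before synthesis; which of those works better is an open question the thread does not answer.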
        
         | yacine_ wrote:
         | That generation uses tortoise-tts. Play.ht has a model called
         | Peregrine; I've taken to using a script to call out to their
         | API. Super cool company & API. I just haven't had time to get
         | my next-gen version out.
        
         | qup wrote:
         | Play.ht
        
         | carlbarrdahl wrote:
         | It's making an api request to play.ht:
         | 
         | https://github.com/yacineMTB/scribepod/blob/master/playht.ts...
        
           | windsignaling wrote:
           | I wonder why the title says that it uses Tortoise TTS?
           | 
           | Also interesting that play.ht allows you to clone others'
           | voices.
        
           | tehsauce wrote:
           | How did they get to use the Joe Rogan voice though? It seems
           | that one isn't public?
        
             | nielsinho wrote:
             | It uses the TorToiSe TTS model for generation. It's simple
             | to generate conditioning voice latents using short audio
             | samples. Likely transcribed JRE episodes were part of the
             | TorToiSe training data, explaining how it's so good at
             | recreating his voice characteristics in particular.
        
       | pikseladam wrote:
       | I'm quite impressed and also shocked. Just wow! I believe we will
       | find more useful cases like that in the near future.
        
       | carlbarrdahl wrote:
       | Interesting to read the prompts used to generate these
       | conversations:
       | 
       | https://github.com/yacineMTB/scribepod/blob/master/lib/proce...
       | 
       | > Make the dialogue about this as long as possible.
        
         | yacine_ wrote:
           | No more intuitive interpreter than the English language
        
           | LawTalkingGuy wrote:
           | I'm borrowing your podcast prompts and one issue I get is
           | that Bob bounces between an asker of questions and a
           | co-announcer. I'm currently adding a bullet point to your
           | instructions which seems to be working so far.
           | 
           |     Only Alice has read the facts beforehand, Bob should
           |     never mention anything Alice hasn't said first.
           | 
           | I'm playing with this for popular science articles and am
           | using a two-stage process, one to extract the textual claims
           | from the article and another to rank them for inclusion
           | in the podcast. I found that just "summarize this" boiled
           | down the wrong things: that there were discoveries, not what
           | they were, for instance.
           | 
           |     Please make a bulleted list of all factual claims in
           |     this article, do not summarize or include any opinions
           |     or non-objective claims.
           | 
           | and then
           | 
           |     Combine similar or duplicate facts. Rate the facts by
           |     importance, objectivity, and checkability, and pick the
           |     top six for inclusion into a podcast for already
           |     semi-informed viewers.
           | 
           | When fed an article about the JWST, it produced these and
           | others:
           | 
           |     The two ancient galaxies were found billions of
           |     light-years behind a giant galaxy cluster called
           |     Abell 2744.
           |       -Importance: High, the location of the galaxies
           |        helps to understand the structure of the universe
           |       -Objectivity: High, this is a factual claim that can
           |        be confirmed by the telescope's observations
           |       -Checkability: High, the location of the galaxies
           |        can be verified through scientific data and
           |        observations.
           | 
           |     The two galaxies existed just 350 to 450 million years
           |     after the Big Bang.
           |       -Importance: High, this information tells us about
           |        the timeline of the universe and when certain
           |        galaxies formed
           |       -Objectivity: High, this is a factual claim that is
           |        supported by scientific data and research
           |       -Checkability: High, the age of the galaxies can be
           |        verified through scientific observations and data.
           | 
           | So far I'm just using the rankings myself to manually pick
           | the facts to discuss but I'm going to prompt it to discuss
           | them itself in context.
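
The two-stage process in the comment above can be sketched as two chained prompt calls. This is a rough illustration under stated assumptions, not the commenter's or scribepod's actual code: `complete` is a hypothetical callable that sends a prompt to some language model and returns its text, and the `"- "` bullet marker in `add_host_rule` is a guess at how the instruction list is formatted. The three prompt strings are quoted from the comment.

```python
from typing import Callable

# Prompts quoted from the comment above.
EXTRACT_PROMPT = (
    "Please make a bulleted list of all factual claims in this article, "
    "do not summarize or include any opinions or non-objective claims."
)
RANK_PROMPT = (
    "Combine similar or duplicate facts. Rate the facts by importance, "
    "objectivity, and checkability, and pick the top six for inclusion "
    "into a podcast for already semi-informed viewers."
)
HOST_RULE = (
    "Only Alice has read the facts beforehand, Bob should never mention "
    "anything Alice hasn't said first."
)


def add_host_rule(instructions: str) -> str:
    """Append the Alice/Bob rule as one more bullet (bullet style is assumed)."""
    return instructions.rstrip("\n") + "\n- " + HOST_RULE


def select_facts(article: str, complete: Callable[[str], str]) -> str:
    """Stage 1 extracts factual claims; stage 2 merges, rates, and picks six."""
    claims = complete(EXTRACT_PROMPT + "\n\n" + article)
    return complete(RANK_PROMPT + "\n\n" + claims)
```

Passing the model call in as `complete` keeps the sketch independent of any particular API; a caller would wrap whichever client they use in a function that takes a prompt string and returns the reply text.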
        
       | renderingprompt wrote:
       | Very impressive
        
       | 0x008 wrote:
       | that is ridiculously funny haha
        
       | georgeburdell wrote:
       | Horrifying. "Believe nothing of what you hear and only half of
       | what you see" has new meaning to me.
        
       | riskable wrote:
       | This is awesome, haha! It's already more accurate and
       | informative (with better sound quality) than 90% of the
       | podcasts that exist :D
        
       | TheCaptain4815 wrote:
       | This is quite an amazing use of this tech.
        
       ___________________________________________________________________
       (page generated 2023-01-27 23:01 UTC)