[HN Gopher] Lip Reading as a Service (Read Their Lips by Symphon...
       ___________________________________________________________________
        
       Lip Reading as a Service (Read Their Lips by Symphonic Labs)
        
       Author : draugadrotten
       Score  : 25 points
       Date   : 2024-09-10 17:20 UTC (5 hours ago)
        
 (HTM) web link (www.readtheirlips.com)
 (TXT) w3m dump (www.readtheirlips.com)
        
       | echelon wrote:
       | Great way to build labeled training data.
       | 
       | User-submitted videos (with audio for STT), user-crafted bounding
       | boxes (we might not need these soon), and user-guided RLHF.
       | 
       | The submitted videos are likely diverse, challenging (otherwise
       | the human might just do it), and representative of actual
       | customer problems.
        
         | indoordin0saur wrote:
         | Doesn't even need to be user-guided. Use videos that have
         | audio. You could have one AI that generates a transcript using
         | the audio/video and another that watches the video on mute and
         | tries to read the lips. Feedback would then be provided by the
         | AI that had access to the audio.
        
           | 0cf8612b2e1e wrote:
           | I am thinking of the millions of hours of TV news. Presenters
           | are almost always in the same position in the frame, and the
           | footage may already have high-quality transcripts.
        
       | pogue wrote:
       | Has anyone tried this with some video where they know what the
       | person is saying?
       | 
       | I'd be interested to know how accurate it is, and at what angles
       | it can read lips (front-facing, side, etc.).
       | 
       | Sounds promising if it works well. Imagine all the historical
       | videos without sound where you could finally find out what was
       | being said.
        
         | bluGill wrote:
         | Experienced lip readers are lucky to get half of what is said.
         | That's better than nothing, but not reliable enough to depend
         | on, so it's better to use something else if possible.
         | 
         | The classic example is that 'I love you' and 'island view' have
         | the same lip movements.
        
           | serf wrote:
           | Absolutely right.
           | 
           | My mother was a mostly-deaf lip-reader. She needed
           | conversational context in order to keep up 'legibly', and it
           | created a lot of fun between the two of us when, once in a
           | while, her guesses failed spectacularly and she would come up
           | with an oddball question or comment that had nothing to do
           | with the conversation.
           | 
           | With context, though, it's a great tool. She and I used to
           | watch crime dramas with the sound off late at night and never
           | miss a beat. It feels like the success rate is higher than 50%
           | when you're transcribing something with a lot of structural
           | context, but I don't know that formally.
           | 
           | It's still a tool I use in conversation. Even with good
           | hearing, it's tough to hear people in crowded restaurants or
           | concert venues; lip-reading helps immensely.
        
       | luma wrote:
       | Thinking through some potentially interesting sources of videos
       | where two people are talking but we don't know what was said,
       | well, I think this is a decent starting point:
       | https://www.youtube.com/watch?v=KLcfpU2cubo
       | 
       | Sadly, it doesn't work too well in this situation:
       | 
       | > That they didnt go through but i would tell you theyre just a
       | chill look at here lets do it chills with all of our great men
       | and they look at every chance they go oh do you want to the black
       | man well thats my gosh thats my gosh thats my gosh thats my gosh
       | thats my gosh thats my gosh thats
        
         | willwade wrote:
         | Are you kidding? That sounds very Trumpian.
        
       | tchock23 wrote:
       | Has anyone tried this out on Radiohead's "Just" video yet?
        
       | shrubble wrote:
       | Wondering how well this will perform on the viral video "Benny
       | Lava" and if it will be part of a group of videos used to create
       | a synthetic benchmark.
        
       ___________________________________________________________________
       (page generated 2024-09-10 23:01 UTC)