[HN Gopher] Lip Reading as a Service (Read Their Lips by Symphon...
___________________________________________________________________
Lip Reading as a Service (Read Their Lips by Symphonic Labs)
Author : draugadrotten
Score : 25 points
Date : 2024-09-10 17:20 UTC (5 hours ago)
(HTM) web link (www.readtheirlips.com)
(TXT) w3m dump (www.readtheirlips.com)
| echelon wrote:
| Great way to build labeled training data.
|
| User-submitted videos (with audio for STT), user-crafted bounding
| boxes (we might not need these soon), and user-guided RLHF.
|
| The submitted videos are likely diverse, challenging (otherwise
| the human might just do it), and representative of solving actual
| customer problems.
| indoordin0saur wrote:
| Doesn't even need to be user guided. Use videos that have
| audio. You could have one AI that generates a transcript using
| the audio/video and another that watches the video on mute and
| tries to read the lips. Feedback would then be provided by the
| AI that had access to the audio.
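The feedback loop described above needs some way to score the muted-video guess against the audio-derived transcript. A minimal sketch, assuming word error rate (WER) is used as that score; the thread does not say which metric would be used, and the function name `word_error_rate` is hypothetical:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    reference:  transcript from the model that had the audio (assumed trusted)
    hypothesis: transcript from the lip-reading model (video on mute)
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Classic dynamic-programming Levenshtein distance, computed over
    # words instead of characters, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from the guess
            insertion = curr[j - 1] + 1  # extra word in the guess
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Lower is better; 0.0 means the lip reader matched the audio transcript.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A lower score means the lip reader agreed more closely with the audio transcript, so the negated score could serve as the feedback signal. Note that WER can exceed 1.0 when the guess contains many extra words.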
| 0cf8612b2e1e wrote:
| I am thinking of the millions of hours of TV news. Presenters
| are almost always going to be in the same position in frame and
| may already have high-quality transcripts.
| pogue wrote:
| Has anyone tried this with some video where they know what the
| person is saying?
|
| I'd be interested to know how accurate it is, and what angles it
| can read lips from (front-facing, side, etc.).
|
| Sounds promising if it works well. Imagine all the historical
| videos without sound where you could finally try to find out what
| was being said.
| bluGill wrote:
| Experienced lip readers are lucky to get half of what is said.
| Better than nothing, but not reliable enough for anything, so
| it's better to use something else if possible.
|
| 'I love you' and 'island view' having the same lip movements is
| the classic example.
| serf wrote:
| absolutely right.
|
| my mother was a mostly-deaf lip-reader. She needed
| conversational context in order to keep up 'legibly', and it
| created a lot of fun between the two of us when, once in a
| while, her guesses failed spectacularly and she would come up
| with an oddball question or comment that had nothing to do with
| the conversation.
|
| With context, though, it's a great tool. She and I used to
| watch crime dramas with the sound off late at night and never
| miss a beat. It feels like if you're trying to transcribe
| something that has a lot of structural context, the success
| rate is higher than 50%, but I don't know that formally.
|
| It's still a tool I use in conversation. Even with good
| hearing it's tough to hear people in crowded restaurants or
| concert venues; lip-reading helps immensely.
| luma wrote:
| Thinking through some potentially interesting sources for videos
| where two people are talking but we don't know what was said,
| and, well, I think this is a decent starting point:
| https://www.youtube.com/watch?v=KLcfpU2cubo
|
| Sadly, it doesn't work too well in this situation:
|
| > That they didnt go through but i would tell you theyre just a
| chill look at here lets do it chills with all of our great men
| and they look at every chance they go oh do you want to the black
| man well thats my gosh thats my gosh thats my gosh thats my gosh
| thats my gosh thats my gosh thats
| willwade wrote:
| are you kidding? that sounds very trumpian
| tchock23 wrote:
| Has anyone tried this out on Radiohead's "Just" video yet?
| shrubble wrote:
| Wondering how well this will perform on the viral video "Benny
| Lava" and if it will be part of a group of videos used to create
| a synthetic benchmark.
___________________________________________________________________
(page generated 2024-09-10 23:01 UTC)