[HN Gopher] Weak supervision to isolate sign language communicators in crowded news videos
       ___________________________________________________________________
        
       Weak supervision to isolate sign language communicators in crowded
       news videos
        
       Author : matroid
       Score  : 27 points
       Date   : 2024-08-14 20:37 UTC (1 day ago)
        
 (HTM) web link (vrroom.github.io)
 (TXT) w3m dump (vrroom.github.io)
        
       | akira2501 wrote:
       | > I believe that we can solve continuous sign language
       | translation convincingly
       | 
       | American Sign Language is not English, in fact, it's not even
       | particularly close to English. Much of the language is conveyed
       | with body movements outside of the hands and fingers,
       | particularly with facial expressions and "named placeholders."
       | 
       | > All this is to say, that we need to build a 5000 hour scale
       | dataset for Sign Language Translation and we are good to go. But
       | where can we find this data? Luckily news broadcasters often
       | include special news segments for the hearing-impaired.
       | 
       | You need _way_ more than just 5000 hours of video. People who are
        | deaf or hard of hearing, in my experience, dislike the
       | interpreters in news broadcasts. It's very difficult, as an
       | interpreter, to provide _worthwhile_ translations of what is
       | being spoken _as_ it is being spoken.
       | 
       | It's more of a bad and broken transliteration that if you
       | struggle to think about you can parse out and understand.
       | 
       | The other issue is most interpreters are hearing and so use the
       | language slightly differently from actual deaf persons, and
       | training on this on news topics will make it very weak when it
       | comes to understanding and interpreting anything outside of this
       | context. ASL has "dialects" and "slang."
       | 
       | Hearing people always presume this will be simple. They should
        | really just take an ASL class and work with deaf and hearing
       | impaired people first.
        
         | bluGill wrote:
         | Lifeprint.org has plenty of free asl courses taught by a deaf
         | person. Highly recommended for everyone but as with any
         | language it takes a lot of study to be useful.
        
           | jazzyjackson wrote:
           | .org landed on a squatting page, I suppose you mean
           | https://lifeprint.com/asl101/lessons/lessons.htm
        
             | wonger_ wrote:
             | Just spent 5 minutes following along with the first video.
             | Very clear and friendly instructor
        
         | voidingw wrote:
         | The blog post references translating between English and Indian
         | Sign Language (ISL). I interpreted that to mean translating
         | between spoken English and ISL, not ASL and ISL.
         | 
         | Regardless, I'm curious how (dis)similar ISL is to ASL.
        
           | matroid wrote:
           | That is correct. We want to translate between English and
           | ISL. English, because it is by and large the language of the
            | Web, and I think we should try to connect ISL to it rather
            | than to Indian languages.
           | 
           | From my understanding, they are quite dissimilar. A person
           | who knows ISL will not understand ASL, for example.
        
         | al_borland wrote:
         | I know an interpreter who is a CODA. Her first language was
         | sign language, which I think helps a lot. I once asked her if
         | she thought in English or ASL and she said ASL.
         | 
         | During the pandemic she'd get very frustrated by the ASL she
         | saw on the news. Her mom and deaf friends couldn't understand
         | them. It wasn't long before she was on the news regularly to
         | make sure better information was going out. She kept getting
          | COVID because she refused to wear a mask while working;
          | covering up the face would make it more difficult to
          | convey the message. I had to respect the dedication.
        
         | matroid wrote:
         | Thanks for the feedback. You raise great points and this was
         | the reason why we wrote this post, so that we can hear from
         | people where the actual problem lies.
         | 
         | On a related note, this sort of explains why our model is
         | struggling to fit on 500 hours of our current dataset (even on
         | the training set). Even so, the current state of automatic
         | translation for Indian Sign Language is that, in-the-wild, even
         | individual words cannot be detected very well. We hope that
         | what we are building might at least improve the state-of-the-
         | art there.
         | 
         | > It's more of a bad and broken transliteration that if you
         | struggle to think about you can parse out and understand.
         | 
          | Can you elaborate a bit more on this? Do you think that if we
          | build a system that produces bad/broken transliteration and
          | funnel its output through ChatGPT, it might give meaningful
          | results? That is, ChatGPT might be able to correct for the
          | errors, since it is a strong language model.
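          | 
          | For concreteness, a rough sketch of what that funneling might
          | look like (the gloss sequence, prompt wording, and function
          | name below are made up for illustration, not our actual
          | system):

```python
# Hypothetical sketch: post-correcting a noisy, word-level sign-gloss
# sequence by handing it to an LLM. The gloss sequence and the prompt
# template are illustrative assumptions, not from any real pipeline.

def build_correction_prompt(glosses):
    """Build a prompt asking an LLM to turn a raw gloss sequence
    (possibly with recognition errors and missing function words)
    into fluent English."""
    gloss_text = " ".join(glosses)
    return (
        "Below is a word-by-word sign language gloss sequence from an "
        "automatic recognizer. It may contain errors and lacks function "
        "words. Rewrite it as one fluent English sentence.\n\n"
        + gloss_text
    )

# A plausible noisy output from a word-level recognizer:
prompt = build_correction_prompt(
    ["RAIN", "HEAVY", "SCHOOL", "CLOSE", "TOMORROW"]
)
print(prompt)
```

          | The hope would be that the language model fills in the
          | grammar that the recognizer cannot see; whether it corrects
          | or merely hallucinates over the recognition errors is exactly
          | the open question.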
        
           | wizzwizz4 wrote:
            | I think you think it's a magic box. There's no such thing
            | as a "strong language model", not in the way you're
           | using the concept.
           | 
           | > _We hope that what we are building might at least improve
           | the state-of-the-art there._
           | 
           | Do you have any theoretical arguments for how and why it
           | would improve it? If not, my concern is that you're just
           | sucking the air out of the room. (Research into "throw a
           | large language model at the problem" doesn't tend to produce
           | any insight that could be used by other approaches, and
           | doesn't tend to work, but it does funnel a lot of grant
           | funding into cloud providers' pockets.)
        
         | kobalsky wrote:
         | > It's more of a bad and broken transliteration that if you
         | struggle to think about you can parse out and understand.
         | 
            | It seems to be more common to see sign language interpreters
            | now. Is it just virtue signaling to have that instead of just
         | closed captions?
        
           | jallmann wrote:
           | Many deaf people do prefer sign language as an accessibility
           | option over reading captions, even if the interpreting can be
           | hit-or-miss.
        
           | matroid wrote:
           | Also, in India, many hearing-impaired people know only ISL.
        
         | WesternWind wrote:
         | Just to note this is for ISL, Indian Sign Language, not ASL,
         | American Sign Language.
        
       | jallmann wrote:
       | Sign languages have such enormous variability that I have always
       | thought having fluent sign language recognition / translation
       | probably means we have solved AGI.
       | 
       | Detecting the presence of sign language in a video is an
       | interesting subset of the problem and is important for building
       | out more diverse corpora. I would also try to find more
       | conversational sources of data, since news broadcasts can be
       | clinical as others have mentioned. Good luck.
        
       | hi-v-rocknroll wrote:
       | I'm wondering how long it will take for LLMs to be able to
       | generate complete ASL on-the-fly and put ASL translators out of a
       | job. The crux seems to be that ASL differs greatly from spoken
       | language.
        
       ___________________________________________________________________
       (page generated 2024-08-15 23:00 UTC)