[HN Gopher] Weak supervision to isolate sign language communicat...
___________________________________________________________________
Weak supervision to isolate sign language communicators in crowded
news videos
Author : matroid
Score : 27 points
Date : 2024-08-14 20:37 UTC (1 day ago)
(HTM) web link (vrroom.github.io)
(TXT) w3m dump (vrroom.github.io)
| akira2501 wrote:
| > I believe that we can solve continuous sign language
| translation convincingly
|
| American Sign Language is not English, in fact, it's not even
| particularly close to English. Much of the language is conveyed
| with body movements outside of the hands and fingers,
| particularly with facial expressions and "named placeholders."
|
| > All this is to say, that we need to build a 5000 hour scale
| dataset for Sign Language Translation and we are good to go. But
| where can we find this data? Luckily news broadcasters often
| include special news segments for the hearing-impaired.
|
| You need _way_ more than just 5000 hours of video. People who are
| deaf or hard of hearing, in my experience, dislike the
| interpreters in news broadcasts. It's very difficult, as an
| interpreter, to provide _worthwhile_ translations of what is
| being spoken _as_ it is being spoken.
|
| It's more of a bad and broken transliteration that if you
| struggle to think about you can parse out and understand.
|
| The other issue is most interpreters are hearing and so use the
| language slightly differently from actual deaf persons, and
| training on this on news topics will make it very weak when it
| comes to understanding and interpreting anything outside of this
| context. ASL has "dialects" and "slang."
|
| Hearing people always presume this will be simple. They should
| really just take an ASL class and work with deaf and hearing
| impaired people first.
| bluGill wrote:
| Lifeprint.org has plenty of free ASL courses taught by a deaf
| person. Highly recommended for everyone but as with any
| language it takes a lot of study to be useful.
| jazzyjackson wrote:
| .org landed on a squatting page, I suppose you mean
| https://lifeprint.com/asl101/lessons/lessons.htm
| wonger_ wrote:
| Just spent 5 minutes following along with the first video.
| Very clear and friendly instructor.
| voidingw wrote:
| The blog post references translating between English and Indian
| Sign Language (ISL). I interpreted that to mean translating
| between spoken English and ISL, not ASL and ISL.
|
| Regardless, I'm curious how (dis)similar ISL is to ASL.
| matroid wrote:
| That is correct. We want to translate between English and
| ISL. English, because it is by and large the language of the
| Web and I think we should try to connect ISL to it rather
| than Indian Languages.
|
| From my understanding, they are quite dissimilar. A person
| who knows ISL will not understand ASL, for example.
| al_borland wrote:
| I know an interpreter who is a CODA. Her first language was
| sign language, which I think helps a lot. I once asked her if
| she thought in English or ASL and she said ASL.
|
| During the pandemic she'd get very frustrated by the ASL she
| saw on the news. Her mom and deaf friends couldn't understand
| them. It wasn't long before she was on the news regularly to
| make sure better information was going out. She kept getting
| COVID, because she refused to wear a mask while working,
| because coving up the face would make it more difficult to
| convey the message. I had to respect the dedication.
| matroid wrote:
| Thanks for the feedback. You raise great points and this was
| the reason why we wrote this post, so that we can hear from
| people where the actual problem lies.
|
| On a related note, this sort of explains why our model is
| struggling to fit on 500 hours of our current dataset (even on
| the training set). Even so, the current state of automatic
| translation for Indian Sign Language is that, in-the-wild, even
| individual words cannot be detected very well. We hope that
| what we are building might at least improve the state-of-the-
| art there.
|
| > It's more of a bad and broken transliteration that if you
| struggle to think about you can parse out and understand.
|
| Can you elaborate a bit more on this? Do you think if we make a
| system for bad/broken transliteration and funnel it through
| ChatGPT, it might give meaningful results? That is, ChatGPT
| might be able to correct for errors as it is a strong language
| model.
| wizzwizz4 wrote:
| I think you think it's a magic box. There's actually no such
| thing as a "strong language model", not in the way you're
| using the concept.
|
| > _We hope that what we are building might at least improve
| the state-of-the-art there._
|
| Do you have any theoretical arguments for how and why it
| would improve it? If not, my concern is that you're just
| sucking the air out of the room. (Research into "throw a
| large language model at the problem" doesn't tend to produce
| any insight that could be used by other approaches, and
| doesn't tend to work, but it does funnel a lot of grant
| funding into cloud providers' pockets.)
| kobalsky wrote:
| > It's more of a bad and broken transliteration that if you
| struggle to think about you can parse out and understand.
|
| it seems to be more common to see sign language interpreters
| now. is it just virtue signaling to have that instead of just
| closed captions?
| jallmann wrote:
| Many deaf people do prefer sign language as an accessibility
| option over reading captions, even if the interpreting can be
| hit-or-miss.
| matroid wrote:
| Also, in India, many hearing-impaired people know only ISL.
| WesternWind wrote:
| Just to note this is for ISL, Indian Sign Language, not ASL,
| American Sign Language.
| jallmann wrote:
| Sign languages have such enormous variability that I have always
| thought having fluent sign language recognition / translation
| probably means we have solved AGI.
|
| Detecting the presence of sign language in a video is an
| interesting subset of the problem and is important for building
| out more diverse corpora. I would also try to find more
| conversational sources of data, since news broadcasts can be
| clinical as others have mentioned. Good luck.
| hi-v-rocknroll wrote:
| I'm wondering how long it will take for LLMs to be able to
| generate complete ASL on-the-fly and put ASL translators out of a
| job. The crux seems to be that ASL differs greatly from spoken
| language.
___________________________________________________________________
(page generated 2024-08-15 23:00 UTC)