[HN Gopher] Meta AI announces Massive Multilingual Speech code, ...
___________________________________________________________________
Meta AI announces Massive Multilingual Speech code, models for
1000+ languages
Author : crakenzak
Score : 343 points
Date : 2023-05-22 17:27 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| sacnoradhq wrote:
| FYI: Another round of massive layoffs at Meta this Wednesday.
| Stay-at-home lockdown ensues.
| gagabity wrote:
| I just want to translate a movie's audio from one language to
| another. What's the easiest way to do this at home?
| richard___ wrote:
| Meta is doing more for open AI than OpenAI.
| jchw wrote:
| I don't think anyone is too surprised by the fact that OpenAI
| went closed. As ironic as it may be, it was pretty obviously
| the only way they were going to continue to exist.
|
| On the other hand, it definitely underscores:
|
| - How blatantly companies exploit the "open source" concept
| with no remorse or even tacit acknowledgement; for many
| startups, open source is a great way to buy some goodwill and
| get some customers, then once they have what they want, close
| the doors and start asking for rent money. Nothing wrong with
| doing closed source or commercial software, but would OpenAI
| still have gained relevance if they hadn't started the way they
| did with the name they did?
|
| - How little anyone gives a shit; we all watched it happen, but
| apparently nobody really cares enough. Investors, customers,
| the general public, apparently all is fair in making money.
|
| I'm not suggesting OpenAI is especially evil, definitely not.
| In fact, the most depressing thing is that this sort of
| bait-and-switch is so commonplace and accepted now
| that it wasn't news or even interesting. It's just what we
| expect. Anything for a dollar.
|
| But maybe I still seem like I'm just being whiny. Okay, fair
| enough. But look at what's happening now; OpenAI wants heavy
| regulation on AI; in particular, they want to curtail and
| probably outright ban open source models, through whatever proxy
| necessary, using whatever tactics are needed to scare people
| into it. They may or may not get what they want, but I'm going
| to guess that if they do get it, ~nobody will care, and OpenAI
| will be raking in record profits while open source AI
| technology gets pushed underground.
|
| Oh I'd love to be wrong, but then again, it's not like there's
| anything particularly novel about this strategy. It's basically
| textbook at this point.
| jokethrowaway wrote:
| Because OSS doesn't work at scale; the economic incentives
| are not aligned.
|
| Sure, it may work sometimes with the goodwill of someone who
| cares, but 90% of OSS code is a dead portfolio the authors
| built with the hope of landing a tech job or skipping some
| algorithm questions.
|
| Sure, OSS allows people to experiment with crap for free
| (even though it's mostly big corps benefiting from OSS), but
| what about the negative effects OSS produces on small
| businesses?
|
| How many developers could spend their lives maintaining small
| parts of software instead of working in soulless
| corporations, if giving away your code for free (and without
| maintenance) wasn't so common? How much better would this
| code be compared to the wasteland of OSS projects? How much
| more secure could the entire ecosystem be? How many poor
| developers are working for free just in the hope of getting a
| job someday?
|
| We need to stop dreaming the OSS dream and start making
| things fairer for the developers involved.
| jchw wrote:
| In the long term, software is not very novel. Needs evolve
| quickly in the genesis of a new category of software, but
| it doesn't take very long for it to stabilize for many
| categories. That's why, in my estimation anyways, open
| source is actually _more_ sustainable in some cases:
| because when software becomes a commodity, it makes more
| sense for stakeholders to collaborate and collectively
| benefit from it.
|
| There is no "OSS dream" anymore--today, there is an OSS
| reality. We have some open source stuff that objectively
| works: there are business models that are more or less
| proven, at least as proven as any internet or software-
| oriented business model, and plenty of highly successful
| projects that deliver immense value to the world.
|
| Then again, some of it doesn't seem to work, and there are
| a lot of unknowns about how it works, what the dynamics
| will be, etc. But, if we're to call open source into
| question, we shouldn't forget to call proprietary software
| into question, too. Proprietary software has many
| seemingly-endemic issues that are hard to mitigate, and the
| business model has been shifting as of late. Software is
| now sold more as a subscription and a service than it is a
| product. The old business model of boxed software, it
| seems, has proven unsustainable for many participants.
|
| The main issue open source seems to have is really funding.
| It works well when open source software acts as a
| complement to some other commercial business, but it works
| poorly when the software itself is where all the value is.
| After all, if you, for example, just host an open source
| piece of software in exchange for money, you're effectively
| competing in the highly competitive web hosting business.
| It can work since you can obviously provide some value, but
| it's a precarious position. Thus very few companies really
| have a lot of money to put into the Linux desktop, at least
| not compared to Windows or macOS. It's
| complementary to some of them who use it as a developer
| workstation or something, but there are only a couple
| companies who I think genuinely have a good model. System76
| is definitely one of them, to be explicit about it.
|
| But rather than give up entirely, I propose something else:
| we should literally fund open source collectively.
| Obviously a lot of us already do: you can donate to
| projects like Blender or Krita monetarily, and you can
| donate your time and code as I'm sure many of us also do
| for random projects. But also, I think that open source
| should get (more) public funding. These projects arguably
| end up adding more value to the world than you put in them,
| and I think governments should take notice.
|
| Of course in some cases this has already taken shape for
| one reason or another. Consider Ghidra. Clearly released
| for the benefit of NSA's PR, but wait, why not release
| Ghidra? It's immensely useful: even at a fraction of the
| functionality of Hex-Rays' products, it's still invaluable
| to many parties who could simply never afford the annual
| cost of maintaining IDA Pro licenses, especially now that
| IDA is only available as a
| subscription.
|
| The way I see it, software and computers in general are
| still moving quite fast even though it's clearly stagnating
| compared to where it once was. But, as things stabilize,
| software will simply need to be updated less, because we
| simply aren't going to have reasons to. As it is, old
| versions of Photoshop are already perfectly serviceable for
| many jobs. And at that point, we're only going to need one
| decent open source release for some given category of work.
| Things will need occasional improvements and updates, but
| c'mon, there's not an unlimited amount of potential to
| squeeze out of e.g. an image editor, any more than out of
| a table saw or a hammer. At some point you hit the point of
| diminishing returns, and I think we're nearing it in some
| places, hence why the switch to subscription models is
| necessary to sustain software businesses.
|
| It's a myth that open source is held up by poor developers
| starving for their ideology. I'm sure a lot of that exists,
| but a lot of open source is also side projects, work
| subsidized by companies for one reason or another, projects
| with healthy revenue streams or donations, etc.
| nblgbg wrote:
| This is absolutely true: with PyTorch, LLaMA, a ton of models
| in vision, and now with this!
| visarga wrote:
| We're in an upside-down world: Meta is now cool. Actually, I
| always admired their open source projects, starting with
| React and PyTorch.
| sabareesh wrote:
| What an irony!
| s1k3s wrote:
| That's really not hard to do, is it?
| justapassenger wrote:
| For OpenAI, AI is the revenue source. For Meta, it's a tool
| used to build products.
|
| If OpenAI gives their stuff away, they lose customers. If Meta
| does it, they can have a community around it and a joint
| effort improving the tools that they'll then use for their
| internal products.
|
| OpenAI is modern (and most likely - very short lived) Microsoft
| in the AI space, while Meta tries to replicate Linux in the AI
| space.
| [deleted]
| rllearneratwork wrote:
| Real _Open_ AI lab.
| archon1410 wrote:
| ASR: "Automatic Speech Recognition"; also known as "Speech to
| Text" (STT)
|
| TTS: "Text to Speech"
|
| LID: "Language Identification"
|
| In case anyone else was confused about what the acronyms mean.
| lairv wrote:
| I don't have much knowledge about TTS models. Is it
| possible/affordable to fine-tune those models on your own voice?
| egberts1 wrote:
| Does it do American Sign Language, the fifth-largest language
| in the US?
|
| I didn't think so.
| crazygringo wrote:
| Perhaps you could help by inventing a written version of ASL,
| since one doesn't currently exist. [EDIT: I was wrong, sorry,
| see my response below.] Seems like that would be a prerequisite
| for a model based on written language.
|
| And of course, could you also create an entire corpus of
| training material in your written ASL?
| egberts1 wrote:
| It is called SignWriting, for all you naysayers.
|
| https://en.m.wikipedia.org/wiki/SignWriting
| crazygringo wrote:
| I stand corrected, thanks. There's a lot of info on the
| internet that says a written form of ASL doesn't exist,
| which is what I found when I Googled it.
|
| Looking into it, it seems very much at the experimental
| stage in terms of digital representation -- while Unicode
| symbols exist, they require being placed in 2D boxes (using
| a drawing tool like SVG). It seems like it's only in the
| past few years that there have been proposals for how to
| turn it into a linear canonical _text_ encoding?
|
| Is anyone actually using those linear encodings --
| SignPuddle or Formal SignWriting, they seem to be called --
| in the wild, outside of demonstration texts or academia?
| Especially since they only date to 2016 and 2019.
|
| Is there anywhere close at all to a corpus that Meta could
| train on? Because it still seems like the answer is no, but
| I also got my research wrong when Google gave no indication
| that SignWriting existed in the first place.
| egberts1 wrote:
| SignWriting has been documented in the National Deafness
| section of the California State University, Northridge
| South Library since 1968, and at Gallaudet University,
| Washington, D.C., since the 1950s.
| ccooffee wrote:
| Very fair point. I've never seen it used, but SignWriting
| is already in the Unicode standard[0] (at U+1D800 -
| U+1DAAF).
|
| I suspect Meta/Facebook doesn't have a lot of content to
| work off of. I've only been able to find community-
| generated examples of SignWriting on the official
| website[1], and none of those seem to be using Unicode
| characters. MMS is an audio-to-text tool, so it seems
| unlikely that it can be trivially expanded to take in
| visual data (pictures of text or video of ASL being
| performed).
|
| I suspect the process of turning viewed ASL into
| SignWriting text will be very difficult to automate. I
| would not be surprised if such a project would either use a
| different textual encoding or directly translate out to
| English (which also sounds terribly hard, but these LLM
| advances recently have surprised me).
|
| [0] https://www.unicode.org/charts/PDF/U1D800.pdf
|
| [1] https://signwriting.org/#RecentPostingsUS
| yamazakiwi wrote:
| Most hearing-impaired people have never heard of SignWriting
| or don't care to use it. You are right about its existence,
| for what it's worth.
| ccooffee wrote:
| American Sign Language, ISO 639-3 code 'ase', does not have a
| formal written or spoken grammar. Obviously, the list of MMS
| supported languages[0] does not include support for 'ase'
| because MMS is a tool for spoken language and, as an
| intermediate state, written language.
|
| [0]
| https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mm...
| armatav wrote:
| Imagine if we used these types of models for like 500 years and
| it locked their vocabulary in time, disallowing any further
| language blending; then somehow the servers turned off and nobody
| could communicate across language barriers anymore.
|
| Someone should write that down in some sort of short story
| involving a really tall structure.
| vlugorilla wrote:
| Would it be possible to have this in something "a la" whisper.cpp?
| ripvanwinkle wrote:
| Anyone know what hardware it takes to run this? Asking as an
| enthusiastic newbie
| detrites wrote:
| At a guess, based on model size, just about anything, even a
| Raspberry Pi 4 etc.
|
| As a comparison, the GGML port of Whisper (OpenAI's
| equivalent) runs in the browser via WASM:
| https://whisper.ggerganov.com/
| EvgeniyZh wrote:
| Most of the languages support only the LID (language
| identification) task. Still impressive.
| OkGoDoIt wrote:
| According to [1] on the accompanying blog post, this brings the
| Whisper 44.3 WER down to 18.7, although it's unclear to me how
| much better this is at primarily English speech recognition. I'd
| love to see a full comparison of accuracy improvements as well as
| a proper writeup of how much more power it takes to run this
| in production or on mobile vs. something like Whisper.
|
| [1]: https://scontent-
| sjc3-1.xx.fbcdn.net/v/t39.8562-6/346801894_...
| cleverwebble wrote:
| Wow, I didn't even know there were 7,000 documented languages
| in the world!
| _the_inflator wrote:
| Even Bavarian is covered... :D
| neom wrote:
| "According to the World Atlas of Languages' methodology, there
| are around 8324 languages, spoken or signed, documented by
| governments, public institutions and academic communities. Out
| of 8324, around 7000 languages are still in use."
|
| https://en.wal.unesco.org/discover/languages
| [deleted]
| mkl wrote:
| Most are at risk of extinction.
|
| "half of the languages spoken today have fewer than 10,000
| speakers and that a quarter have fewer than 1,000 speakers"
| (https://en.wikipedia.org/wiki/Language_death).
|
| "Today, on average, we lose one language in the world every
| six weeks. There are approximately 6800 languages. But four
| percent of the population speaks 96 percent of the languages,
| and 96 percent of the population speaks four percent of the
| languages. These four percent are spoken by large language
| groups and are therefore not at risk. But 96 percent of the
| languages we know are more or less at risk. You have to treat
| them like extinct species."
| (https://en.wikipedia.org/wiki/Language_preservation).
|
| "Over the past century alone, around 400 languages - about
| one every three months - have gone extinct, and most
| linguists estimate that 50% of the world's remaining 6,500
| languages will be gone by the end of this century (some put
| that figure as high as 90%, however). Today, the top ten
| languages in the world claim around half of the world's
| population."
| (https://www.bbc.com/future/article/20140606-why-we-must-
| save...).
| RheingoldRiver wrote:
| Wow, preserving almost-dead languages sounds like something
| that LLMs would be pretty appropriate for, right? We would
| primarily need as large a body of written text translated
| into both a "known" language and the dying language as
| possible.
| echelon wrote:
| > The MMS code and model weights are released under the CC-BY-NC
| 4.0 license.
|
| Huge bummer. Prevents almost everyone from using this and
| recouping their costs.
|
| I suppose motivated teams could reproduce the paper in a clean
| room, but that might also be subject to patents.
| fulafel wrote:
| My layman's reading is that the license does seem to allow
| recouping of costs, would be interested if there is a more
| nuanced interpretation to the contrary.
|
| (tangentially, like another comment briefly mentioned, are
| models actually copyrightable? Programs are, because they are
| human authored creative works.)
| reaperman wrote:
| I'd imagine you could use inference from this as training
| data for a commercial model, as that isn't currently
| protected under copyright.
| EMIRELADERO wrote:
| I'd go a step further and say that the models themselves
| probably aren't copyrightable.
| reaperman wrote:
| I mean I personally feel like almost all software is math
| and anything that isn't frontend isn't legally
| copyrightable or patentable. But the courts disagree with
| my interpretation.
|
| And if it were up to me, all copyrights and patents would
| have exponentially increasing annual fees to maintain, but
| that'd require a change in statutes.
|
| I think the jury is still out on your interpretation
| though, and I suspect the courts will fall in line with
| whatever the largest rights holders want.
| Etheryte wrote:
| I think this is a lot better than the other option, which
| would've been not releasing it at all. No company in the
| business of making money wants to give away their edge to
| would-be competitors for free.
| jewel wrote:
| Trying to attach a license to model weights seems counter-
| productive to me. If you argue they are copyrightable then
| surely they are derivative works of the training data, which is
| unlikely to all be public domain. Machine learning enthusiasts
| are better off lobbying for model weights being non-
| copyrightable, as they involve no creative input and are the
| result of a purely mechanical process.
|
| The code itself, on the other hand, would definitely be
| copyrighted and would need a clean-room implementation, as
| you said. The community could pool its resources and do it
| once, and license it under the AGPL to keep it and further
| improvements available to everyone.
| og_kalu wrote:
| Meta is on a roll. Any demo of how good the text-to-speech is?
| crakenzak wrote:
| Doesn't seem like there's a demo set up by them yet, but you
| can just download the model weights and run the inference
| yourself and compare it to OpenAI Whisper or anything you have
| access to.
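|
| A minimal sketch of what that looks like via the fairseq
| example script (the checkpoint name and the exact flags here
| are assumptions based on the repo's ASR example; check its
| README for the current interface):
|
|     # sketch: call the MMS ASR example script on one file
|     import subprocess
|
|     subprocess.run([
|         "python", "examples/mms/asr/infer/mms_infer.py",
|         "--model", "/path/to/mms1b_all.pt",  # downloaded checkpoint (assumed name)
|         "--lang", "eng",                     # ISO 639-3 language code
|         "--audio", "/path/to/sample.wav",    # 16 kHz mono audio
|     ], check=True)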
| og_kalu wrote:
| Yeah, thanks, just saw that. Fingers crossed it's really good
| for Korean. I'm just glad they released it at least. The
| other SOTA TTS models by big companies won't be seeing the
| light of day.
| stanislavb wrote:
| I still hate the name "meta". You?
| pruthvishetty wrote:
| This looks huge. Anyone know how this compares with Whisper in
| terms of quality and speed?
| crakenzak wrote:
| According to their blog post[1], MMS achieves roughly half
| the word error rate while supporting 11x more languages.
| Pretty impressive.
|
| [1] https://ai.facebook.com/blog/multilingual-model-speech-
| recog...
| youssefabdelm wrote:
| I wonder what the performance is on English specifically.
|
| Edit: Just checked the paper, it seems to be worse[1][2] but
| feel free to correct me.
|
| I feel like they should've just taken the Whisper
| architecture, scaled it, and scaled the dataset as they did.
|
| [1] Page: https://i.imgur.com/bq15Tno.png
|
| [2] Paper: https://scontent.fcai19-5.fna.fbcdn.net/v/t39.8562
| -6/3488279...
| whimsicalism wrote:
| My guess is wav2vec performs better on low-resource
| languages than Whisper.
| sacred_numbers wrote:
| It's worse on English and a lot of other common languages
| (see Appendix C of the paper). It does better on less
| common languages like Latvian or Tajik, though.
| rvz wrote:
| So many so-called overnight AI gurus hype their snake-oil
| products and scream 'Meta is dying' [0] and 'It is over for
| Meta', but few of them actually do research in AI and drive
| the field forward. This once again shows that Meta has always
| been a consistent contributor to AI research, especially in
| vision systems.
|
| All we can just do is take, take, take the code. But this time,
| the code's license is CC-BY-NC 4.0. Which simply means:
|
| Take it, but no grifting allowed.
|
| [0] https://news.ycombinator.com/item?id=31832221
| azinman2 wrote:
| Unless you're in a country that doesn't care about IP
| restrictions, in which case you can do whatever you want.
| m3kw9 wrote:
| The problem with all these model releases is they have no demos
| or even video of it working. It's all just download it and run
| it, like it's an app.
| crazygringo wrote:
| They're intended for researchers/professionals not consumers,
| and I'm not sure how a video is going to be helpful?
|
| And the issue with a live demo is that these are resource-
| intensive, they're not just webpages. It's an entire project to
| figure out how to host them, scale them to handle peaks, pay
| for them, implement rate-limiting, and so forth.
|
| For the intended audience, download-and-run-it doesn't seem
| like an issue at all. I don't see how any questions are going
| to be answered by a video.
| tmpz22 wrote:
| It's so weird to me that they'd do 99% of the effort and just
| skip the last 1% of work to provide a dumbed-down summary and
| instructions for broader appeal. Clearly these are released
| in part for public relations and industry clout.
|
| Don't get me wrong: they published this, it took a ton of
| work, and they didn't have to do it. But it's ultimately a
| form of gatekeeping that seems to come straight out of
| academia. And honestly, that part of academia sucks.
| sdenton4 wrote:
| A good research team is a handful of people working on a
| focused set of questions.
|
| A good product team is probably around a dozen people
| minimum? Especially if you need to hit the quality bar
| expected of a release from a BigCorp. You've got frontend,
| UX, and server components to design, in addition to the
| research part of the project. The last real app I worked on
| also included an app backend (ie, local db and web API
| access, separate from the display+UX logic) and product
| team. Oh yeah, also testing+qa, logging, and data analysis.
|
| And after all that investment, God help you if you ever
| decide the headcount costs more than keeping the lights on,
| and you discontinue the project...
|
| Public app releases are incredibly expensive, in other
| words, and throwing a model on GitHub is cheap.
| barking_biscuit wrote:
| We're all research professionals now.
| Ninjinka wrote:
| Here's the blog post with video of it working:
| https://ai.facebook.com/blog/multilingual-model-speech-recog...
| s1k3s wrote:
| I'd argue that having a "download and run" approach is so much
| better than videos or demos. Why do you think this is a
| problem?
| yjftsjthsd-h wrote:
| It's a much bigger investment. It'd be nice to at least see a
| video; that's easier than expecting people to download and
| run something just to see it.
| digging wrote:
| Why should we have any marketing materials, ever? Why do we
| show pictures of products? Sometimes people want to see the
| capabilities before they completely dive in and spend their
| time working on something.
| simonw wrote:
| Because to download and run it you need to have a laptop
| nearby to download and run it on, with the correct operating
| system and often additional dependencies too.
|
| I do most of my research reading on my phone. I want to be
| able to understand things without breaking out a laptop.
| s1k3s wrote:
| There's a paper associated with it that you can read on
| your phone. And I don't think demo videos are really
| associated with "research". I agree they could've added
| both, but let's be honest here you'll have demo videos on
| this in the next 12 hours for sure.
| simonw wrote:
| That's a related complaint: everyone continues to insist
| on releasing papers as PDFs, ignoring the fact that those
| are still pretty nasty to read on a mobile device.
|
| Sure, release a PDF (some people like those), but having
| an additional responsive web page version of a paper
| makes research much more readable to the majority of
| content consumption devices. It's 2023.
|
| I'll generally use https://www.arxiv-vanity.com/ to
| generate those but that doesn't work with this specific
| paper since it's not hosted on arXiv.
| hmoodie wrote:
| Would you like a back rub while I'm at it?
| gcr wrote:
| FWIW, I know the topic of the thread has deviated, but I
| share the frustration about reading PDFs on my device. In
| my case, it's an accessibility issue - I can't see well,
| so zooming with reflow would make my life materially
| better since I'd be able to read research papers on my
| morning commute.
|
| Sometimes users have needs that may seem superfluous and
| beg for a snarky reply, but there are often important
| reasons behind them, even though they may not be
| actionable.
|
| I'd pay $x000 for an app that does some sort of
| intelligent pdf-to-epub conversion that doesn't require
| human-in-the-loop management/checking.
| reaperman wrote:
| I assume this "competes" directly with
| https://sites.research.google/usm/ -- would be cool to see side-
| by-side benchmarks sometime! Maybe I should make those. I
| requested access to USM but have not been granted any access yet.
| minhazm wrote:
| Is there any indication that USM will be open sourced though?
| This is more so competing with Whisper.
|
| https://github.com/openai/whisper
| crakenzak wrote:
| Code:
| https://github.com/facebookresearch/fairseq/tree/main/exampl...
|
| Blog Post: https://ai.facebook.com/blog/multilingual-model-
| speech-recog...
|
| Paper: https://research.facebook.com/publications/scaling-speech-
| te...
|
| Languages coverage:
| https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mm...
| generalizations wrote:
| Given the STT, TTS, and translation models available to
| download in that GitHub repo, a real-life babelfish is 'only'
| some glue code away. Wild times we're living in.
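|
| A rough sketch of that glue, for the idea only -- asr(),
| translate(), and tts() are hypothetical wrappers you'd write
| around the downloaded checkpoints, not functions from the
| fairseq API:
|
|     def asr(audio: bytes, lang: str) -> str: ...             # wrap MMS ASR here
|     def translate(text: str, src: str, dst: str) -> str: ... # wrap e.g. NLLB here
|     def tts(text: str, lang: str) -> bytes: ...              # wrap MMS TTS here
|
|     def babelfish(audio_in: bytes, src: str, dst: str) -> bytes:
|         """Speech in one language -> speech in another."""
|         text = asr(audio_in, src)               # speech -> source-language text
|         translated = translate(text, src, dst)  # source -> target text
|         return tts(translated, dst)             # target text -> speech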
| bdidbdidhdidh wrote:
| Hope that glue is better than the duct tape Google uses on
| translate.google.com.
|
| It is still hopeless, and much worse than the old
| dictionary-based systems at gender/number/declensions in
| general.
|
| I sometimes use it just to not think about the grammar in
| some languages, and most times I end up doing a surprised
| double take at something completely inappropriate or
| offensive instead of my simple phrases.
| lukeschlather wrote:
| Google Translate is still inferior to ChatGPT 3.5. I
| suspect this style of model is significantly more expensive
| to run and Google doesn't want to give it away for free.
| Really, the only problem with ChatGPT is that it refuses to
| translate things that go against its nanny programming,
| which can make it almost worse than useless in some real-
| life situations.
| emporas wrote:
| I tried ChatGPT 3.5 against Google Translate, translating
| English to Greek, my native language, and they perform
| almost the same. The text was difficult science-fiction and
| fantasy material, and the results were tolerable. Roughly 50%
| of the text had to be manually rewritten.
|
| Maybe for more casual sentences and less difficult text they
| perform better; I haven't tried. Anyway, they are both better
| than nothing.
| gameshot911 wrote:
| Despite any shortcomings, Google Translate is still a
| technological marvel.
|
| Modern translation apps and GPS are godsends that make
| travel a million times easier. And they're free! It blows
| my mind. Traveling would be so much more incredibly
| difficult without them.
| mach1ne wrote:
| Nah, a full-on babelfish is simply not possible. The meaning of
| the beginning of a sentence can be modified retroactively by
| the end of the sentence. This means that the Babelfish must
| either be greatly delayed or awkwardly correct itself every
| once in a while.
| akiselev wrote:
| That's why the babelfish translates brainwaves rather than
| sounds, which is especially important for communicating
| with our nonverbal alien neighbors.
| reaperman wrote:
| Close enough for practical value. Yes, big downside. But
| personally I'd use it.
| mysterydip wrote:
| Between the options of "you can't talk to this person"
| and "your conversation will have some delays", I know
| which I'd choose.
| foota wrote:
| I wonder if an esolang exists that could be a universal
| target though (if a language can handle any ambiguity by
| appending, then it could always be output without
| backtracking).
| matsemann wrote:
| Ah, reminds me of learning German, where you can chuck all
| the verbs onto the end. There was this sentence we had as a
| fun toy example, with something like half a paragraph of
| verbs at the end, and you had to try and match them up with
| the beginning of the sentence.
|
| Edit: found a reference to it https://www.reddit.com/r/Germa
| n/comments/ul0xgt/just_for_fun...
| MauranKilom wrote:
| And sometimes that modification is not just "we don't know
| which verb it ends in" but "the whole structure is
| different than expected":
|
| https://en.wikipedia.org/wiki/Garden-path_sentence
| littlestymaar wrote:
| AFAIK STT is still very bad without speaker-specific fine-
| tuning, so it's not going to be a literal babelfish
| (translating in the ear of the receiver), but it could make
| you _speak_ many languages.
| cma wrote:
| Your interlocutor's earpiece could beam yours a delta for a
| finetuned model of their voice before they open their
| mouth. Except not compatible across iMessage users and
| whatsapp users or some other predictable silicon valley
| negative sum power play like that.
| simonw wrote:
| I loaded the language coverage into Datasette Lite and added
| some facets here:
|
| https://lite.datasette.io/?json=https://gist.github.com/simo...
|
| Here's how I did that:
| https://gist.github.com/simonw/63aa33ec827b093f9c6a2797df950...
|
| Here are the top 20 represented language families:
|
|     Niger-Congo       1,019
|     Austronesian        609
|     Sino-Tibetan        288
|     Indo-European       278
|     Afro-Asiatic        222
|     Trans-New Guinea    219
|     Otomanguean         149
|     Nilo-Saharan        131
|     Austro-Asiatic      100
|     Dravidian            60
|     Australian           51
|     Creole               45
|     Kra-Dai              43
|     Uto-Aztecan          41
|     Quechuan             36
|     Language isolate     35
|     Torricelli           32
|     Maipurean            31
|     Mayan                30
|     Sepik                30
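|
| If you'd rather skip Datasette, a rough equivalent of that
| facet count in plain Python. This assumes the coverage page
| has already been scraped into JSON records with a "Family"
| field, as in the gist above; the field name is an assumption:
|
|     import json
|     from collections import Counter
|
|     with open("language_coverage.json") as f:
|         rows = json.load(f)
|
|     # count languages per family and print the 20 largest
|     for family, n in Counter(r["Family"] for r in rows).most_common(20):
|         print(f"{family:20} {n:>5,}")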
| re5i5tor wrote:
| Thanks -- great way to visualize how massive this set of
| languages really is.
| jimmySixDOF wrote:
| All this great work and the prior SOTA translations, and Meta
| still only accepts North American English voice control in
| its VR equipment, lol.
| freediver wrote:
| Come to think about it, Meta is a much better name for an AI
| company than a VR company.
| Hamuko wrote:
| It's not VR, it's Metaverse; VR might involve experiences that
| are not bullshit.
| idiotsecant wrote:
| The metaverse, if we define it as an information layer that
| exists in parallel with the physical 'stuff' universe and is
| a seamless, effortless, and essential part of what we
| experience as reality, _will_ be an enormous part of our
| future. Meta might just be a few centuries ahead of the
| curve, which is just as bad as being a few centuries late.
| KaoruAoiShiho wrote:
| Centuries really? I think Meta's going to be just on time.
| jokethrowaway wrote:
| I think we may skip the whole VR experience and just have
| robots instead of virtual friends
| KaoruAoiShiho wrote:
| That's a large overlap rather than sequential.
| barking_biscuit wrote:
| If you define it like that, it's just the internet.
| alienlid wrote:
| If I have an extra MBP 16" 2020 hanging around (16GB RAM,
| quad-core i7), can I run this? I'd like to try the TTS
| capabilities! LMK if you've got any guides or instructions
| online I can check out :)
| leke wrote:
| I was kind of hoping for Interlingue, but was surprised to not
| even see Esperanto on the list.
| explodingcamera wrote:
| Esperanto is supported in their larger model
| samstave wrote:
| Random thought: could Esperanto (as a crypto-language) be
| turned into any sort of programming language? Could one,
| conceivably, program in Esperanto in any meaningful way?
| int_19h wrote:
| There's nothing particularly special about Esperanto in that
| regard compared to natural languages that it is modelled
| after.
| felipesabino wrote:
| I think you are more likely to do so in a more logical and
| structured language like Lojban.
| sigstoat wrote:
| can any of these models be coerced into just doing straight
| phonetic transcription? like spitting out IPA?
| lee101 wrote:
| [dead]
| eigenvalue wrote:
| I just wanted to test out the TTS locally on a powerful Ubuntu
| 22.04 machine, but the process for setting it up seems pretty
| broken and poorly documented. After 20 minutes of trying I
| finally gave up since I couldn't get the VITS dependency to build
| (despite having a fully updated machine with all required
| compilers). It seems like they never really bother to see if the
| stuff works on a fresh machine starting from scratch. Somehow for
| my own projects I'm always able to start from a fresh git clone
| and then directly install everything using this block of code:
|
| ```
| python3 -m venv venv                  # create a virtualenv
| source venv/bin/activate              # activate it
| python3 -m pip install --upgrade pip  # upgrade pip first
| python3 -m pip install wheel          # wheel, for building deps
| pip install -r requirements.txt       # install the requirements
| ```
|
| But whenever I try using these complicated ML models, it's
| usually an exercise in futility and endless mucking around with
| conda and other nonsense. It ends up not being worth it and I
| just move on. But it does feel like it doesn't need to be like
| this.
| dragonwriter wrote:
| Yeah, a lot of the releases from research groups are woefully
| poorly documented.
|
| Usually, some hero releases a friendly install system within a
| few days, though.
| qwertox wrote:
| I would like to use stuff like this as a side-project. Buy a
| Nvidia Geforce GPU and stick it into my 24/7 server and play
| around with it in my free time, to see what can be done.
|
| The issue with all these AI models is that there's no information
| on which GPU is enough for which task. I'm absolutely clueless if
| a single RTX 4000 SFF with its 20GB VRAM and only 70W of max
| power usage will be a waste of money, or really something great
| to do experiments on. Like do some ASR with Whisper, images with
| Stable Diffusion or load a LLM onto it, or this project here from
| Facebook.
|
| Renting a GPU in the cloud doesn't seem to be a solution for this
| use case, where you just want to let something run for a couple
| of days and see if it's useful for something.
| nharada wrote:
| Wait why is renting a GPU in the cloud not a solution? You can
| even try multiple options and see which ones are capable enough
| for your use case.
|
| Look into some barebones cloud GPU services, for example Lambda
| Labs which is significantly cheaper than AWS/GCP but offers
| basically nothing besides the machine with a GPU. You could
| even try something like Vast in which people rent out their
| personal GPU machines for cheap. Not something I'd use for
| uhhh...basically anything corporate, but for a personal project
| with no data security or uptime issues it would probably work
| great.
| fnord77 wrote:
| "but offers basically nothing besides the machine with a GPU"
|
| They must offer distributed storage that can accommodate
| massive models, though? How else would you have multiple GPUs
| working on training a single model?
| nomel wrote:
| Besides some great tooling out there if you want to roll
| your own, you can literally rent Windows/Linux computers
| with persistent disks. If you have good internet, you can
| even use one as a gaming PC, as I do.
| itake wrote:
| Is there an easy way to off-board the persistent disk to
| cheaper machines when you don't need the gpus?
|
| Like imagine, setting up and installing everything with the
| gpu attached, but when you're not using the gpu or all the
| cpu cores, you can disconnect them.
|
| If you have docs on how to do this, please let me know.
| haliskerbas wrote:
| What sort of frame rate and cost do you get? I have the
| highest tier GeForce now subscription and it sometimes
| drops to horrible conditions.
| itake wrote:
| My annoyance was managing state. I'd have to spend hours
| installing tools, downloading data, updating code, then when
| I want to go to bed I have to package it up and store as much
| as I can on s3 before shutting off the $$ server.
| ciberado wrote:
| I've played a lot with Stable Diffusion using AWS spot
| instances, mostly because it is the platform with which I'm
| more familiar. The Terraform script[0] should be easy to
| adapt to any other project of this kind.
|
| Let me know if you are interested, and maybe we can find
| time to work on it together :).
|
| [0] https://github.com/ciberado/stable-diffusion-webui-
| terraform...
| fnordpiglet wrote:
| aws s3 sync + image snapshot
| sneak wrote:
| This is what containers solve. Don't waste time manually
| installing things. Store state in a database via the app on
| a different host.
| machinawhite wrote:
| Well, then you're wasting time managing your containers.
| Have you ever used k8s? It's a full-time job, lol.
| kunwon1 wrote:
| Speaking as someone who has encountered similar
| difficulties, this response has strong 'Draw the rest of
| the owl' vibes
| sneak wrote:
| Speaking as someone who has solved these difficulties
| hundreds of times, "draw the rest of the owl" doesn't
| tell you the specific things to google to get detailed
| examples and tutorials on how millions of others have
| sidestepped these repeated issues.
| itake wrote:
| Yep... you spend hours messing around with docker
| containers and debugging all the weird build errors.
|
| I am less familiar with storing data in a db (for ml
| hosting concerns), but I'd imagine it would add overhead
| (as opposed to accessing files on disk).
|
| You also have to deal with hosting a db and configuring
| the schema.
| sneak wrote:
| You "spend hours messing around" with everything you
| don't know or understand at first. One could say the same
| about writing the software itself. At their core,
| Dockerfiles are just shell scripts with worse syntax, so
| it's not really that much more to learn. Once you get it
| done once, you don't have to screw around with it
| anymore, and you have it on any box you want in seconds.
|
| In either case you have to spend hours screwing around
| with your environment. If those hours result in a
| Dockerfile, then it's the last time. If they don't, then
| it's each time you want it on a new host (which as was
| correctly pointed out a pain in the ass).
|
| Storing data in a database vs in files on disk is like
| application development 101 and is pretty much a required
| skill period. It's required that you learn how to do this
| because almost all applications revolve around storing
| some kind of state and, as was noted, you can't
| reasonably expect it to persist on the app server without
| additional ops headaches.
|
| Many people will host dbs for you without you having to
| think about it. Schema is only required if you use a
| structured db (which is advisable) but it doesn't take
| that long.
| NBJack wrote:
| I applaud your experience, but honestly I agree with
| parent: knowledge acquisition for a side project may not
| be the best use of their time, especially if it
| significantly impedes actually launching/finishing a
| first iteration.
|
| It's a similar situation for most apps/services/startup
| ideas: you don't necessarily need a planet scale solution
| in the beginning. Containers are great and solve lots of
| problems, but they are not a panacea and come with their
| own drawbacks. Anecdotally, I personally wanted to make a
| small local 3 node Kubernetes cluster at one time on my
| beefy hypervisor. By the time I learned the ins and outs
| of Kubernetes networking, I lost momentum. It also didn't
| end up giving me what I wanted out of it. Educational,
| sure, but in the end not useful to me.
| simonw wrote:
| I'm having trouble imagining what data I would store in a
| database as opposed to a filesystem if my goal is to
| experiment with large models like Stable Diffusion.
| syntaxing wrote:
| I would recommend a 3090. It can handle everything a 4000
| series can albeit slightly slower, has enough VRAM to handle
| most things for fun, and can be bought for around $700.
| [deleted]
| ttt3ts wrote:
| You can finetune Whisper, Stable Diffusion, and LLMs up to
| about 15B parameters with 24GB of VRAM.
|
| Which leads to the question of what hardware to get. The best
| bang for the buck right now is definitely a used 3090 at
| ~$700. If you want more than 24GB of VRAM, just rent the
| hardware, as it will be cheaper.
|
| If you're not willing to drop $700, don't buy anything; just
| rent. I have had decent luck with vast.ai.
| NBJack wrote:
| There is the world of used Nvidia Teslas, like the M40. Very
| cheap, but some assembly required.
| sarabande wrote:
| I'm trying to use this on a 3M mp3 file to test ASR with
| language code deu, CPU only, and I keep getting this error --
| are there limits to the MMS inference?
|
|     File "fairseq/data/data_utils_fast.pyx", line 30, in fairseq.data.data_utils_fast.batch_by_size_vec
|       assert max_tokens <= 0 or np.max(num_tokens_vec) <= max_tokens, (
|     AssertionError: Sentences lengths should not exceed max_tokens=4000000
|
|     Traceback (most recent call last):
|       File "/home/xxx/fairseq/examples/mms/asr/infer/mms_infer.py", line 52, in <module>
|         process(args)
|       File "/home/xxx/fairseq/examples/mms/asr/infer/mms_infer.py", line 44, in process
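|
| (For what it's worth, the assertion suggests the input is
| simply too long for one batch: at the 16 kHz rate these
| models expect, max_tokens=4000000 samples is ~250 seconds.
| One unverified workaround is to split the file into shorter
| chunks and run inference per chunk -- pydub and the
| 200-second chunk size below are my choices, not anything
| prescribed by MMS:)
|
|     # split a long mp3 into ~200 s, 16 kHz mono wav chunks
|     from pydub import AudioSegment
|
|     audio = AudioSegment.from_mp3("input.mp3")
|     audio = audio.set_frame_rate(16000).set_channels(1)
|     chunk_ms = 200 * 1000  # 200 s per chunk, under the 4M-sample cap
|     for i, start in enumerate(range(0, len(audio), chunk_ms)):
|         chunk = audio[start:start + chunk_ms]
|         chunk.export(f"chunk_{i:03d}.wav", format="wav")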
___________________________________________________________________
(page generated 2023-05-22 23:00 UTC)