[HN Gopher] Meta AI announces Massive Multilingual Speech code, ...
       ___________________________________________________________________
        
       Meta AI announces Massive Multilingual Speech code, models for
       1000+ languages
        
       Author : crakenzak
       Score  : 343 points
       Date   : 2023-05-22 17:27 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | sacnoradhq wrote:
       | FYI: Another round of massive layoffs at Meta this Wednesday.
       | Stay-at-home lockdown ensues.
        
       | gagabity wrote:
        | I just want to translate a movie's audio from one language to
        | another. What's the easiest way to do this at home?
        
       | richard___ wrote:
        | Meta is doing more for open AI than OpenAI.
        
         | jchw wrote:
         | I don't think anyone is too surprised by the fact that OpenAI
         | went closed. As ironic as it may be, it was pretty obviously
         | the only way they were going to continue to exist.
         | 
         | On the other hand, it definitely underscores:
         | 
         | - How blatantly companies exploit the "open source" concept
         | with no remorse or even tacit acknowledgement; for many
         | startups, open source is a great way to buy some goodwill and
         | get some customers, then once they have what they want, close
         | the doors and start asking for rent money. Nothing wrong with
         | doing closed source or commercial software, but would OpenAI
         | still have gained relevance if they hadn't started the way they
         | did with the name they did?
         | 
         | - How little anyone gives a shit; we all watched it happen, but
         | apparently nobody really cares enough. Investors, customers,
         | the general public, apparently all is fair in making money.
         | 
         | I'm not suggesting OpenAI is especially evil, definitely not.
         | In fact, the most depressing thing is that doing this sort of
          | bait-and-switch maneuver is so commonplace and accepted now
         | that it wasn't news or even interesting. It's just what we
         | expect. Anything for a dollar.
         | 
         | But maybe I still seem like I'm just being whiny. Okay, fair
         | enough. But look at what's happening now; OpenAI wants heavy
         | regulation on AI, particularly they want to curtail and
         | probably just ban open source models, through whatever proxy
         | necessary, using whatever tactics are needed to scare people
         | into it. They may or may not get what they want, but I'm going
         | to guess that if they do get it, ~nobody will care, and OpenAI
         | will be raking in record profits while open source AI
         | technology gets pushed underground.
         | 
         | Oh I'd love to be wrong, but then again, it's not like there's
         | anything particularly novel about this strategy. It's basically
         | textbook at this point.
        
           | jokethrowaway wrote:
            | Because OSS doesn't work at scale; the economic incentives
            | are not aligned.
           | 
           | Sure, it may work sometimes with the goodwill of someone who
           | cares, but 90% of OSS code is a dead portfolio the authors
           | built with the hope of landing a tech job or skipping some
           | algorithm questions.
           | 
            | Sure, OSS allows people to experiment with crap for free
            | (even though it's mostly big corps benefiting from OSS), but
            | what about the negative effects OSS produces on small
            | businesses?
           | 
           | How many developers could spend their life maintaining small
           | parts of software instead of working in soulless
           | corporations, if giving away your code for free (and without
           | maintenance) wasn't so common? How much better would this
           | code be compared to the wasteland of OSS projects? How much
           | more secure could the entire ecosystem be? How many poor
           | developers are working for free just in the hope of getting a
           | job someday?
           | 
           | We need to stop dreaming the OSS dream and start making
           | things fairer for the developers involved.
        
             | jchw wrote:
             | In the long term, software is not very novel. Needs evolve
             | quickly in the genesis of a new category of software, but
             | it doesn't take very long for it to stabilize for many
             | categories. That's why, in my estimation anyways, open
             | source is actually _more_ sustainable in some cases:
             | because when software becomes a commodity, it makes more
             | sense for stakeholders to collaborate and collectively
             | benefit from it.
             | 
             | There is no "OSS dream" anymore--today, there is an OSS
             | reality. We have some open source stuff that objectively
             | works: there are business models that are more or less
             | proven, at least as proven as any internet or software-
             | oriented business model, and plenty of highly successful
             | projects that deliver immense value to the world.
             | 
             | Then again, some of it doesn't seem to work, and there are
             | a lot of unknowns about how it works, what the dynamics
             | will be, etc. But, if we're to call open source into
             | question, we shouldn't forget to call proprietary software
             | into question, too. Proprietary software has many
             | seemingly-endemic issues that are hard to mitigate, and the
             | business model has been shifting as of late. Software is
             | now sold more as a subscription and a service than it is a
             | product. The old business model of boxed software, it
             | seems, has proven unsustainable for many participants.
             | 
             | The main issue open source seems to have is really funding.
             | It works well when open source software acts as a
              | complement to some other commercial business, but it works
             | poorly when the software itself is where all the value is.
             | After all, if you, for example, just host an open source
             | piece of software in exchange for money, you're effectively
             | competing in the highly competitive web hosting business.
             | It can work since you can obviously provide some value, but
              | it's a precarious position. Thus very few companies really
              | have a lot of money to put into the Linux desktop, at
              | least compared to Windows or macOS. It's complementary for
              | some of them, who use it as a developer workstation or
              | something, but there are only a couple of companies that I
              | think genuinely have a good model. System76 is definitely
              | one of them, to be explicit about it.
             | 
             | But rather than give up entirely, I propose something else:
             | we should literally fund open source collectively.
             | Obviously a lot of us already do: you can donate to
             | projects like Blender or Krita monetarily, and you can
             | donate your time and code as I'm sure many of us also do
             | for random projects. But also, I think that open source
             | should get (more) public funding. These projects arguably
             | end up adding more value to the world than you put in them,
             | and I think governments should take notice.
             | 
             | Of course in some cases this has already taken shape for
             | one reason or another. Consider Ghidra. Clearly released
             | for the benefit of NSA's PR, but wait, why not release
             | Ghidra? It's immensely useful, especially given that even
              | at a fraction of the functionality of Hex-Rays products,
             | it's still extremely useful for many parties who could
             | simply never afford the annual cost of maintaining IDA Pro
             | licenses, especially now that it is only available as a
             | subscription.
             | 
              | The way I see it, software and computers in general are
              | still moving fast, though clearly stagnating compared to
              | where they once were. But, as things stabilize,
             | software will simply need to be updated less, because we
             | simply aren't going to have reasons to. As it is, old
             | versions of Photoshop are already perfectly serviceable for
             | many jobs. And at that point, we're only going to need one
             | decent open source release for some given category of work.
             | Things will need occasional improvements and updates, but
             | c'mon, there's not an unlimited amount of potential to
              | squeeze out of, e.g., an image editor, any more than out
              | of a table saw or a hammer. At some point you hit
             | diminishing returns, and I think we're nearing it in some
             | places, hence why the switch to subscription models is
             | necessary to sustain software businesses.
             | 
             | It's a myth that open source is held up by poor developers
             | starving for their ideology. I'm sure a lot of that exists,
             | but a lot of open source is also side projects, work
             | subsidized by companies for one reason or another, projects
             | with healthy revenue streams or donations, etc.
        
         | nblgbg wrote:
          | This is absolutely true: with PyTorch, LLaMA, a ton of vision
          | models, and now this!
        
           | visarga wrote:
           | We're in upside-down world, Meta is now cool; actually I
           | always admired their open source projects, starting with
           | React and PyTorch.
        
         | sabareesh wrote:
          | What an irony!
        
         | s1k3s wrote:
         | That's really not hard to do, is it?
        
         | justapassenger wrote:
         | For OpenAI, AI is the revenue source. For Meta, it's a tool
         | used to build products.
         | 
          | If OpenAI gives their stuff away, they lose customers. If Meta
          | does it, they can build a community around it and get a joint
          | effort to improve the tools they'll then use for their
          | internal products.
         | 
         | OpenAI is modern (and most likely - very short lived) Microsoft
         | in the AI space, while Meta tries to replicate Linux in the AI
         | space.
        
           | [deleted]
        
       | rllearneratwork wrote:
       | Real _Open_ AI lab.
        
       | archon1410 wrote:
       | ASR: "Automatic Speech Recognition"; also known as "Speech to
       | Text" (STT)
       | 
       | TTS: "Text to Speech"
       | 
       | LID: "Language Identification"
       | 
       | In case anyone else was confused about what the acronyms mean.
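        | 
        | As a sketch of the data flow in each task (my own signatures,
        | not the fairseq API):
        | 
        |     def asr(audio: bytes) -> str: ...   # speech in, text out
        |     def tts(text: str) -> bytes: ...    # text in, speech out
        |     def lid(audio: bytes) -> str: ...   # speech in, ISO 639-3 code out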
        
       | lairv wrote:
        | I don't have much knowledge about TTS models. Is it
        | possible/affordable to fine-tune those models on your own voice?
        
       | egberts1 wrote:
        | Does it do American Sign Language, the US's fifth-largest
        | language?
       | 
       | I didn't think so.
        
         | crazygringo wrote:
         | Perhaps you could help by inventing a written version of ASL,
         | since one doesn't currently exist. [EDIT: I was wrong, sorry,
         | see my response below.] Seems like that would be a prerequisite
         | for a model based on written language.
         | 
          | And of course you'd also need to create an entire corpus of
          | training material in your written ASL.
        
           | egberts1 wrote:
           | It is called SignWriting for all you multiple naysayers.
           | 
           | https://en.m.wikipedia.org/wiki/SignWriting
        
             | crazygringo wrote:
             | I stand corrected, thanks. There's a lot of info on the
             | internet that says a written form of ASL doesn't exist,
             | which is what I found when I Googled it.
             | 
             | Looking into it, it seems very much at the experimental
             | stage in terms of digital representation -- while Unicode
             | symbols exist, they require being placed in 2D boxes (using
             | a drawing tool like SVG). It seems like it's only in the
             | past few years that there have been proposals for how to
             | turn it into a linear canonical _text_ encoding?
             | 
             | Is anyone actually using those linear encodings --
             | SignPuddle or Formal SignWriting, they seem to be called --
             | in the wild, outside of demonstration texts or academia?
             | Especially since they only date to 2016 and 2019.
             | 
              | Is there anything close to a corpus that Meta could train
              | on? Because it still seems like the answer is no, but I
              | also got my research wrong when Google gave no indication
              | that SignWriting existed in the first place.
        
               | egberts1 wrote:
                | SignWriting has been documented at the National Deafness
                | section of the California State University, Northridge
                | South Library since 1968, and at Gallaudet University,
                | Washington, D.C., since the 1950s.
        
             | ccooffee wrote:
             | Very fair point. I've never seen it used, but SignWriting
             | is already in the Unicode standard[0] (at U+1D800 -
             | U+1DAAF).
             | 
             | I suspect Meta/Facebook doesn't have a lot of content to
             | work off of. I've only been able to find community-
             | generated examples of SignWriting on the official
             | website[1], and none of those seem to be using Unicode
             | characters. MMS is an audio-to-text tool, so it seems
             | unlikely that it can be trivially expanded to take in
             | visual data (pictures of text or video of ASL being
             | performed).
             | 
             | I suspect the process of turning viewed ASL into
             | SignWriting text will be very difficult to automate. I
             | would not be surprised if such a project would either use a
             | different textual encoding or directly translate out to
             | English (which also sounds terribly hard, but these LLM
             | advances recently have surprised me).
             | 
             | [0] https://www.unicode.org/charts/PDF/U1D800.pdf
             | 
             | [1] https://signwriting.org/#RecentPostingsUS
        
             | yamazakiwi wrote:
              | Most hearing-impaired people have never heard of
              | SignWriting or don't care to use it. You are right about
              | its existence, for what it's worth.
        
         | ccooffee wrote:
          | American Sign Language, ISO 639-3 code 'ase', does not have a
         | formal written or spoken grammar. Obviously, the list of MMS
         | supported languages[0] does not include support for 'ase'
         | because MMS is a tool for spoken language and, as an
         | intermediate state, written language.
         | 
         | [0]
         | https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mm...
        
       | armatav wrote:
       | Imagine if we used these types of models for like 500 years and
       | it locked their vocabulary in time, disallowing any further
       | language blending; then somehow the servers turned off and nobody
       | could communicate across language barriers anymore.
       | 
        | Someone should write that down in some sort of short story
        | involving a really tall structure.
        
       | vlugorilla wrote:
        | Would it be possible to have this in something à la whisper.cpp?
        
       | ripvanwinkle wrote:
       | Anyone know what hardware it takes to run this? Asking as an
       | enthusiastic newbie
        
         | detrites wrote:
          | At a guess, based on the model size: just about anything, even
          | a Raspberry Pi 4.
          | 
          | For comparison, the GGML port of Whisper (OpenAI's equivalent)
          | runs in the browser via WASM: https://whisper.ggerganov.com/
        
       | EvgeniyZh wrote:
        | Most of the languages support only the LID (language
        | identification) task. Still impressive.
        
       | OkGoDoIt wrote:
        | According to [1] from the accompanying blog post, this brings
        | Whisper's 44.3 WER down to 18.7, although it's unclear to me how
        | much better this is at primarily-English speech recognition. I'd
        | love to see a full comparison of accuracy improvements, as well
        | as a proper writeup of how much more power it takes to run this
        | in production or on mobile vs. something like Whisper.
       | 
       | [1]: https://scontent-
       | sjc3-1.xx.fbcdn.net/v/t39.8562-6/346801894_...
        
       | cleverwebble wrote:
        | Wow, I didn't even know there were 7,000 documented languages in
        | the world!
        
         | _the_inflator wrote:
         | Even Bavarian is covered... :D
        
         | neom wrote:
         | "According to the World Atlas of Languages' methodology, there
         | are around 8324 languages, spoken or signed, documented by
         | governments, public institutions and academic communities. Out
         | of 8324, around 7000 languages are still in use."
         | 
         | https://en.wal.unesco.org/discover/languages
        
           | [deleted]
        
           | mkl wrote:
           | Most are at risk of extinction.
           | 
           | "half of the languages spoken today have fewer than 10,000
           | speakers and that a quarter have fewer than 1,000 speakers"
           | (https://en.wikipedia.org/wiki/Language_death).
           | 
           | "Today, on average, we lose one language in the world every
           | six weeks. There are approximately 6800 languages. But four
           | percent of the population speaks 96 percent of the languages,
           | and 96 percent of the population speaks four percent of the
           | languages. These four percent are spoken by large language
           | groups and are therefore not at risk. But 96 percent of the
           | languages we know are more or less at risk. You have to treat
           | them like extinct species."
           | (https://en.wikipedia.org/wiki/Language_preservation).
           | 
           | "Over the past century alone, around 400 languages - about
           | one every three months - have gone extinct, and most
           | linguists estimate that 50% of the world's remaining 6,500
           | languages will be gone by the end of this century (some put
           | that figure as high as 90%, however). Today, the top ten
           | languages in the world claim around half of the world's
           | population."
           | (https://www.bbc.com/future/article/20140606-why-we-must-
           | save...).
        
             | RheingoldRiver wrote:
             | Wow, preserving almost-dead languages sounds like something
             | that LLMs would be pretty appropriate for, right? We would
              | primarily need as large a body as possible of written text
              | translated into both a "known" language and the dying
              | language.
        
       | echelon wrote:
       | > The MMS code and model weights are released under the CC-BY-NC
       | 4.0 license.
       | 
       | Huge bummer. Prevents almost everyone from using this and
       | recouping their costs.
       | 
       | I suppose motivated teams could reproduce the paper in a clean
       | room, but that might also be subject to patents.
        
         | fulafel wrote:
          | My layman's reading is that the license does seem to allow
          | recouping of costs; I'd be interested if there is a more
          | nuanced interpretation to the contrary.
         | 
         | (tangentially, like another comment briefly mentioned, are
         | models actually copyrightable? Programs are, because they are
         | human authored creative works.)
        
         | reaperman wrote:
          | I'd imagine you could use inference output from this as
          | training data for a commercial model, as that isn't currently
          | protected under copyright.
        
           | EMIRELADERO wrote:
           | I'd go a step further and say that the models themselves
           | probably aren't copyrightable.
        
             | reaperman wrote:
             | I mean I personally feel like almost all software is math
             | and anything that isn't frontend isn't legally
             | copyrightable or patentable. But the courts disagree with
             | my interpretation.
             | 
              | And if it were up to me, all copyrights and patents would
              | have exponentially increasing annual fees to maintain, but
              | that'd require a change in statutes.
             | 
             | I think the jury is still out on your interpretation
             | though, and I suspect the courts will fall in line with
             | whatever the largest rights holders want.
        
         | Etheryte wrote:
         | I think this is a lot better than the other option, which
         | would've been not releasing it at all. No company in the
         | business of making money wants to give away their edge to
         | would-be competitors for free.
        
         | jewel wrote:
         | Trying to attach a license to model weights seems counter-
         | productive to me. If you argue they are copyrightable then
         | surely they are derivative works of the training data, which is
         | unlikely to all be public domain. Machine learning enthusiasts
         | are better off lobbying for model weights being non-
          | copyrightable, as they involve no creative input and are the
          | result of a purely mechanical process.
         | 
         | The copyright on the code, on the other hand, would definitely
         | be copyrighted and would need a clean-room implementation, as
         | you said. The community could pool its resources and do it
         | once, and license it under the AGPL to keep it and further
         | improvements available to everyone.
        
       | og_kalu wrote:
        | Meta is on a roll. Any demo of how good the text-to-speech is?
        
         | crakenzak wrote:
         | Doesn't seem like there's a demo set up by them yet, but you
         | can just download the model weights and run the inference
         | yourself and compare it to OpenAI Whisper or anything you have
         | access to.
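            | 
            | If it helps, invoking the ASR example script from Python
            | might look roughly like this (the script path matches the
            | repo, but the flags are my assumption from the README, so
            | double-check them there):
            | 
            |     import subprocess
            | 
            |     subprocess.run([
            |         "python", "examples/mms/asr/infer/mms_infer.py",
            |         "--model", "/path/to/asr/model",   # downloaded checkpoint
            |         "--lang", "eng",                   # ISO 639-3 code
            |         "--audio", "/path/to/audio.wav",   # 16 kHz mono input
            |     ], check=True, cwd="/path/to/fairseq")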
        
           | og_kalu wrote:
            | Yeah, thanks. Just saw that. Fingers crossed it's really
            | good for Korean. I'm just glad they released it at least.
            | The other SOTA TTS models by big companies won't be seeing
            | the light of day.
        
       | stanislavb wrote:
       | I still hate the name "meta". You?
        
       | pruthvishetty wrote:
       | This looks huge. Anyone know how this compares with Whisper in
       | terms of quality and speed?
        
         | crakenzak wrote:
          | According to their blog post[1], MMS achieves roughly half
          | Whisper's word error rate while supporting 11x more languages.
          | Pretty impressive.
         | 
         | [1] https://ai.facebook.com/blog/multilingual-model-speech-
         | recog...
        
           | youssefabdelm wrote:
           | I wonder what the performance is on English specifically.
           | 
           | Edit: Just checked the paper, it seems to be worse[1][2] but
           | feel free to correct me.
           | 
            | I feel like they should've just taken the Whisper
            | architecture, scaled it, and scaled the dataset as they did.
           | 
           | [1] Page: https://i.imgur.com/bq15Tno.png
           | 
           | [2] Paper: https://scontent.fcai19-5.fna.fbcdn.net/v/t39.8562
           | -6/3488279...
        
             | whimsicalism wrote:
              | My guess is wav2vec performs better on low-resource
              | languages than Whisper.
        
             | sacred_numbers wrote:
             | It's worse on English and a lot of other common languages
             | (see Appendix C of the paper). It does better on less
             | common languages like Latvian or Tajik, though.
        
       | rvz wrote:
        | So many so-called overnight AI gurus hype their snake-oil
        | products and scream 'Meta is dying' [0] or 'It is over for
        | Meta', but few of them actually do research in AI and drive the
        | field forward. This once again shows that Meta has always been
        | a consistent contributor to AI research, especially in vision
        | systems.
        | 
        | All we can do is take, take, take the code. But this time, the
        | code's license is CC-BY-NC 4.0. Which simply means:
        | 
        | Take it, but no grifting allowed.
       | 
       | [0] https://news.ycombinator.com/item?id=31832221
        
         | azinman2 wrote:
          | Unless you're in a country that doesn't care about the IP
          | restrictions, in which case you do whatever you want.
        
       | m3kw9 wrote:
       | The problem with all these model releases is they have no demos
        | or even videos of them working. It's all just "download it and
        | run it," as if it were an app.
        
         | crazygringo wrote:
          | They're intended for researchers/professionals, not consumers,
          | and I'm not sure how a video is going to be helpful?
         | 
         | And the issue with a live demo is that these are resource-
         | intensive, they're not just webpages. It's an entire project to
         | figure out how to host them, scale them to handle peaks, pay
         | for them, implement rate-limiting, and so forth.
         | 
         | For the intended audience, download-and-run-it doesn't seem
         | like an issue at all. I don't see how any questions are going
         | to be answered by a video.
        
           | tmpz22 wrote:
            | It's so weird to me that they'd do 99% of the effort and
            | just skip the 1% of work to provide a dumbed-down summary
            | and instructions for broader appeal. Clearly these are
            | released in part for public relations and industry clout.
            | 
            | Don't get me wrong: they published this, it took a ton of
            | work, and they didn't have to do it. But it's ultimately a
            | form of gatekeeping that seems to come straight out of
            | academia. And honestly, that part of academia sucks.
        
             | sdenton4 wrote:
             | A good research team is a handful of people working on a
             | focused set of questions.
             | 
             | A good product team is probably around a dozen people
             | minimum? Especially if you need to hit the quality bar
             | expected of a release from a BigCorp. You've got frontend,
             | UX, and server components to design, in addition to the
             | research part of the project. The last real app I worked on
             | also included an app backend (ie, local db and web API
             | access, separate from the display+UX logic) and product
             | team. Oh yeah, also testing+qa, logging, and data analysis.
             | 
             | And after all that investment, God help you if you ever
             | decide the headcount costs more than keeping the lights on,
             | and you discontinue the project...
             | 
             | Public app releases are incredibly expensive, in other
             | words, and throwing a model on GitHub is cheap.
        
           | barking_biscuit wrote:
           | We're all research professionals now.
        
         | Ninjinka wrote:
         | Here's the blog post with video of it working:
         | https://ai.facebook.com/blog/multilingual-model-speech-recog...
        
         | s1k3s wrote:
         | I'd argue that having a "download and run" approach is so much
         | better than videos or demos. Why do you think this is a
         | problem?
        
           | yjftsjthsd-h wrote:
            | It's a much bigger investment. It'd be nice to at least see
            | a video; that's easier than expecting people to download and
            | run something just to see it.
        
           | digging wrote:
           | Why should we have any marketing materials, ever? Why do we
           | show pictures of products? Sometimes people want to see the
           | capabilities before they completely dive in and spend their
           | time working on something.
        
           | simonw wrote:
           | Because to download and run it you need to have a laptop
           | nearby to download and run it on, with the correct operating
           | system and often additional dependencies too.
           | 
           | I do most of my research reading on my phone. I want to be
           | able to understand things without breaking out a laptop.
        
             | s1k3s wrote:
             | There's a paper associated with it that you can read on
             | your phone. And I don't think demo videos are really
             | associated with "research". I agree they could've added
              | both, but let's be honest here: you'll have demo videos of
              | this within the next 12 hours for sure.
        
               | simonw wrote:
               | That's a related complaint: everyone continues to insist
               | on releasing papers as PDFs, ignoring the fact that those
               | are still pretty nasty to read on a mobile device.
               | 
               | Sure, release a PDF (some people like those), but having
               | an additional responsive web page version of a paper
               | makes research much more readable to the majority of
               | content consumption devices. It's 2023.
               | 
               | I'll generally use https://www.arxiv-vanity.com/ to
               | generate those but that doesn't work with this specific
               | paper since it's not hosted on arXiv.
        
               | hmoodie wrote:
               | Would you like a back rub while I'm at it?
        
               | gcr wrote:
                | FWIW, I know the topic of the thread has deviated, but I
                | share the frustration about reading PDFs on my device.
                | In my case, it's an accessibility issue - I can't see
                | well, so zooming with reflow would make my life
                | materially better, since I'd be able to read research
                | papers on my morning commute.
               | 
               | Sometimes users have needs that may seem superfluous and
               | beg for a snarky reply, but there are often important
               | reasons behind them, even though they may not be
               | actionable.
               | 
               | I'd pay $x000 for an app that does some sort of
               | intelligent pdf-to-epub conversion that doesn't require
               | human-in-the-loop management/checking.
        
       | reaperman wrote:
       | I assume this "competes" directly with
       | https://sites.research.google/usm/ -- would be cool to see side-
       | by-side benchmarks sometime! Maybe I should make those. I
       | requested access to USM but have not been granted any access yet.
        
         | minhazm wrote:
          | Is there any indication that USM will be open-sourced, though?
          | This is more of a competitor to Whisper.
         | 
         | https://github.com/openai/whisper
        
       | crakenzak wrote:
       | Code:
       | https://github.com/facebookresearch/fairseq/tree/main/exampl...
       | 
       | Blog Post: https://ai.facebook.com/blog/multilingual-model-
       | speech-recog...
       | 
       | Paper: https://research.facebook.com/publications/scaling-speech-
       | te...
       | 
       | Languages coverage:
       | https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mm...
        
         | generalizations wrote:
          | With STT, TTS, and translation models available to download in
          | that GitHub repo, a real-life babelfish is 'only' some glue
          | code away. Wild times we're living in.
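          | 
          | A minimal sketch of that glue, with each stage as a
          | placeholder for the corresponding model (MMS ASR, a
          | translation model such as NLLB, MMS TTS); none of these names
          | come from the repo:
          | 
          |     def babelfish(audio, src, tgt, transcribe, translate,
          |                   synthesize):
          |         text = transcribe(audio, src)           # speech -> text
          |         translated = translate(text, src, tgt)  # text -> text
          |         return synthesize(translated, tgt)      # text -> speech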
        
           | bdidbdidhdidh wrote:
            | I hope that glue is better than the duct tape Google uses on
            | Google Translate.
            | 
            | It is still hopeless at gender/number/declensions in
            | general, and much worse than the dictionary-based
            | approaches.
            | 
            | I sometimes use it just to not think about the grammar in
            | some languages, and most times I'm doing a surprised double
            | take at something that would be completely inappropriate or
            | offensive instead of my simple phrases.
        
             | lukeschlather wrote:
             | Google Translate is still inferior to ChatGPT 3.5. I
             | suspect this style of model is significantly more expensive
             | to run and Google doesn't want to give it away for free.
             | Really, the only problem with ChatGPT is that it refuses to
             | translate things that go against its nanny programming,
             | which can make it almost worse than useless in some real-
             | life situations.
        
               | emporas wrote:
                | I tried ChatGPT 3.5 against Google Translate, translating
                | English to Greek, my native language, and they perform
                | almost the same. The text was difficult science fiction
                | and fantasy material, and the results were tolerable.
                | Roughly 50% of the text had to be manually rewritten.
                | 
                | Maybe they perform better on more casual sentences and
                | less difficult text; I haven't tried. Anyway, they are
                | both better than nothing.
        
             | gameshot911 wrote:
             | Despite any shortcomings, Google Translate is still a
             | technological marvel.
             | 
             | Modern translation apps and GPS are godsends that make
             | travel a million times easier. And they're free! It blows
             | my mind. Traveling would be so much more incredibly
             | difficult without them.
        
           | mach1ne wrote:
           | Nah, full-on babelfish is simply not possible. The meaning of
           | the beginning of a sentence can be modified retroactively by
           | the end of the sentence. This means that the Babelfish must
           | either be greatly delayed or awkwardly correct itself every
           | once in a while.
        
             | akiselev wrote:
             | That's why the babelfish translates brainwaves rather than
             | sounds, which is especially important for communicating
             | with our nonverbal alien neighbors.
        
             | reaperman wrote:
             | Close enough for practical value. Yes, big downside. But
             | personally I'd use it.
        
               | mysterydip wrote:
               | Between the options of "you can't talk to this person"
               | and "your conversation will have some delays", I know
               | which I'd choose.
        
             | foota wrote:
             | I wonder if an esolang exists that could be a universal
             | target though (if a language can handle any ambiguity by
             | appending, then it could always be output without
             | backtracking).
        
             | matsemann wrote:
              | Ah, reminds me of learning German, where you can chuck all
              | the verbs onto the end. There was this sentence we had as
              | a fun toy example, with something like half a paragraph of
              | verbs at the end, and you had to try to match them up with
              | the beginning of the sentence.
             | 
             | Edit:found a reference to it https://www.reddit.com/r/Germa
             | n/comments/ul0xgt/just_for_fun...
        
             | MauranKilom wrote:
             | And sometimes that modification is not just "we don't know
             | which verb it ends in" but "the whole structure is
             | different than expected":
             | 
             | https://en.wikipedia.org/wiki/Garden-path_sentence
        
           | littlestymaar wrote:
           | AFAIK STT is still very bad without speaker-specific fine-
           | tuning, so it's not going to be a literal babelfish
           | (translating in the ear of the receiver), but it could make
           | you _speak_ many languages.
        
             | cma wrote:
             | Your interlocutor's earpiece could beam yours a delta for a
             | finetuned model of their voice before they open their
              | mouth. Except it won't be compatible between iMessage
              | users and WhatsApp users, or some other predictable
              | Silicon Valley negative-sum power play like that.
        
         | simonw wrote:
         | I loaded the language coverage into Datasette Lite and added
         | some facets here:
         | 
         | https://lite.datasette.io/?json=https://gist.github.com/simo...
         | 
         | Here's how I did that:
         | https://gist.github.com/simonw/63aa33ec827b093f9c6a2797df950...
         | 
          | Here are the top 20 represented language families:
          | 
          |     Niger-Congo         1,019
          |     Austronesian          609
          |     Sino-Tibetan          288
          |     Indo-European         278
          |     Afro-Asiatic          222
          |     Trans-New Guinea      219
          |     Otomanguean           149
          |     Nilo-Saharan          131
          |     Austro-Asiatic        100
          |     Dravidian              60
          |     Australian             51
          |     Creole                 45
          |     Kra-Dai                43
          |     Uto-Aztecan            41
          |     Quechuan               36
          |     Language isolate       35
          |     Torricelli             32
          |     Maipurean              31
          |     Mayan                  30
          |     Sepik                  30
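          | 
          | The aggregation is just a counter over the coverage data once
          | it's JSON; something like this (assuming a "family" column --
          | the real field name in my gist may differ):
          | 
          |     import json
          |     from collections import Counter
          | 
          |     rows = json.load(open("language_coverage.json"))
          |     families = Counter(row["family"] for row in rows)
          |     for family, count in families.most_common(20):
          |         print(family, count)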
        
           | re5i5tor wrote:
           | Thanks -- great way to visualize how massive this set of
           | languages really is.
        
           | jimmySixDOF wrote:
            | All this great work and the prior SOTA translations, and
            | Meta still only accepts North American English voice control
            | in their VR equipment, lol.
        
       | freediver wrote:
        | Come to think of it, Meta is a much better name for an AI
        | company than a VR company.
        
         | Hamuko wrote:
         | It's not VR, it's Metaverse; VR might involve experiences that
         | are not bullshit.
        
           | idiotsecant wrote:
            | The metaverse, if we define it as an information layer that
            | exists in parallel with the physical 'stuff' universe and is
            | a seamless, effortless, and essential part of what we
            | experience as reality, _will_ be an enormous part of our
            | future. Meta might just be a few centuries ahead of the
            | curve, which is just as bad as being a few centuries late.
        
             | KaoruAoiShiho wrote:
             | Centuries really? I think Meta's going to be just on time.
        
               | jokethrowaway wrote:
                | I think we may skip the whole VR experience and just
                | have robots instead of virtual friends.
        
               | KaoruAoiShiho wrote:
               | That's a large overlap rather than sequential.
        
             | barking_biscuit wrote:
             | If you define it like that, it's just the internet.
        
       | alienlid wrote:
        | If I have an extra MBP 16" 2020 hanging around (16GB RAM, quad-
        | core i7), can I run this? I'd like to try the TTS capabilities!
        | LMK if you've got any guides or instructions online I can check
        | out :)
        
       | leke wrote:
       | I was kind of hoping for Interlingue, but was surprised to not
       | even see Esperanto on the list.
        
         | explodingcamera wrote:
         | Esperanto is supported in their larger model
        
         | samstave wrote:
          | Random thought: could Esperanto (as a crypto-language) be
          | turned into any sort of programming language? Could one
          | conceivably program in Esperanto in any meaningful way?
        
           | int_19h wrote:
            | There's nothing particularly special about Esperanto in that
            | regard compared to the natural languages it is modelled
            | after.
        
           | felipesabino wrote:
            | I think you are more likely to do so in a more logical and
            | structured language like Lojban.
        
       | sigstoat wrote:
        | Can any of these models be coerced into doing straight phonetic
        | transcription, like spitting out IPA?
        
       | lee101 wrote:
       | [dead]
        
       | eigenvalue wrote:
       | I just wanted to test out the TTS locally on a powerful Ubuntu
       | 22.04 machine, but the process for setting it up seems pretty
       | broken and poorly documented. After 20 minutes of trying I
       | finally gave up since I couldn't get the VITS dependency to build
       | (despite having a fully updated machine with all required
       | compilers). It seems like they never really bother to see if the
       | stuff works on a fresh machine starting from scratch. Somehow for
       | my own projects I'm always able to start from a fresh git clone
       | and then directly install everything using this block of code:
       | 
        | ```
        | python3 -m venv venv
        | source venv/bin/activate
        | python3 -m pip install --upgrade pip
        | python3 -m pip install wheel
        | pip install -r requirements.txt
        | ```
       | 
       | But whenever I try using these complicated ML models, it's
       | usually an exercise in futility and endless mucking around with
       | conda and other nonsense. It ends up not being worth it and I
       | just move on. But it does feel like it doesn't need to be like
       | this.
        
         | dragonwriter wrote:
          | Yeah, a lot of the releases from research groups are woefully
          | poorly documented.
         | 
         | Usually, some hero releases a friendly install system within a
         | few days, though.
        
       | qwertox wrote:
        | I would like to use stuff like this as a side project: buy an
        | Nvidia GeForce GPU, stick it into my 24/7 server, and play
        | around with it in my free time to see what can be done.
       | 
       | The issue with all these AI models is that there's no information
        | on which GPU is enough for which task. I'm absolutely clueless
        | about whether a single RTX 4000 SFF, with its 20GB VRAM and only
        | 70W of max power usage, will be a waste of money or really
        | something great to do experiments on: doing some ASR with
        | Whisper, images with Stable Diffusion, loading an LLM onto it,
        | or this project here from Facebook.
       | 
       | Renting a GPU in the cloud doesn't seem to be a solution for this
       | use case, where you just want to let something run for a couple
       | of days and see if it's useful for something.
        
         | nharada wrote:
          | Wait, why is renting a GPU in the cloud not a solution? You can
         | even try multiple options and see which ones are capable enough
         | for your use case.
         | 
         | Look into some barebones cloud GPU services, for example Lambda
         | Labs which is significantly cheaper than AWS/GCP but offers
         | basically nothing besides the machine with a GPU. You could
         | even try something like Vast in which people rent out their
         | personal GPU machines for cheap. Not something I'd use for
         | uhhh...basically anything corporate, but for a personal project
         | with no data security or uptime issues it would probably work
         | great.
        
           | fnord77 wrote:
           | "but offers basically nothing besides the machine with a GPU"
           | 
            | They must offer distributed storage that can accommodate
            | massive models, though? How else would you have multiple
            | GPUs working on training a single model?
        
           | nomel wrote:
            | Besides some great tooling out there if you want to roll
            | your own, you can literally rent Windows/Linux computers
            | with persistent disks. If you have good internet, you can
            | even use one as a gaming PC, as I do.
        
             | itake wrote:
              | Is there an easy way to off-board the persistent disk to
              | cheaper machines when you don't need the GPUs?
              | 
              | Like, imagine setting up and installing everything with
              | the GPU attached, but when you're not using the GPU or all
              | the CPU cores, you can disconnect them.
             | 
             | If you have docs on how to do this, please let me know.
        
             | haliskerbas wrote:
              | What sort of frame rate and cost do you get? I have the
              | highest-tier GeForce Now subscription, and it sometimes
              | drops to horrible conditions.
        
           | itake wrote:
            | My annoyance was managing state. I'd have to spend hours
            | installing tools, downloading data, and updating code; then,
            | when I wanted to go to bed, I'd have to package it up and
            | store as much as I could on S3 before shutting off the $$
            | server.
        
             | ciberado wrote:
             | I've played a lot with Stable Diffusion using AWS spot
             | instances, mostly because it is the platform with which I'm
             | more familiar. The Terraform script[0] should be easy to
             | adapt to any other project of this kind.
             | 
             | Let me know if you are interested, and maybe we can find
             | time to work on it together :).
             | 
             | [0] https://github.com/ciberado/stable-diffusion-webui-
             | terraform...
        
             | fnordpiglet wrote:
             | aws s3 sync + image snapshot
        
             | sneak wrote:
             | This is what containers solve. Don't waste time manually
             | installing things. Store state in a database via the app on
             | a different host.
        
               | machinawhite wrote:
                | Well, then you're wasting time managing your containers?
                | Have you ever used k8s? It's a full-time job, lol.
        
               | kunwon1 wrote:
               | Speaking as someone who has encountered similar
               | difficulties, this response has strong 'Draw the rest of
               | the owl' vibes
        
               | sneak wrote:
                | Speaking as someone who has solved these difficulties
                | hundreds of times: unlike "draw the rest of the owl",
                | this at least names the specific things to Google for
                | detailed examples and tutorials on how millions of
                | others have sidestepped these recurring issues.
        
               | itake wrote:
                | Yep... you spend hours messing around with Docker
                | containers and debugging all the weird build errors.
               | 
               | I am less familiar with storing data in a db (for ml
               | hosting concerns), but I'd imagine it would add overhead
               | (as opposed to accessing files on disk).
               | 
               | You also have to deal with hosting a db and configuring
               | the schema.
        
               | sneak wrote:
               | You "spend hours messing around" with everything you
               | don't know or understand at first. One could say the same
                | about writing the software itself. At their core,
                | Dockerfiles are just shell scripts with worse syntax, so
               | it's not really that much more to learn. Once you get it
               | done once, you don't have to screw around with it
               | anymore, and you have it on any box you want in seconds.
               | 
               | In either case you have to spend hours screwing around
               | with your environment. If those hours result in a
               | Dockerfile, then it's the last time. If they don't, then
                | it's every time you want it on a new host (which, as was
                | correctly pointed out, is a pain in the ass).
               | 
               | Storing data in a database vs in files on disk is like
               | application development 101 and is pretty much a required
               | skill period. It's required that you learn how to do this
               | because almost all applications revolve around storing
               | some kind of state and, as was noted, you can't
               | reasonably expect it to persist on the app server without
               | additional ops headaches.
               | 
               | Many people will host dbs for you without you having to
               | think about it. Schema is only required if you use a
               | structured db (which is advisable) but it doesn't take
               | that long.
        
               | NBJack wrote:
               | I applaud your experience, but honestly I agree with
               | parent: knowledge acquisition for a side project may not
               | be the best use of their time, especially if it
               | significantly impedes actually launching/finishing a
               | first iteration.
               | 
               | It's a similar situation for most apps/services/startup
               | ideas: you don't necessarily need a planet scale solution
               | in the beginning. Containers are great and solve lots of
               | problems, but they are not a panacea and come with their
               | own drawbacks. Anecdotally, I personally wanted to make a
               | small local 3 node Kubernetes cluster at one time on my
               | beefy hypervisor. By the time I learned the ins and outs
               | of Kubernetes networking, I lost momentum. It also didn't
               | end up giving me what I wanted out of it. Educational,
               | sure, but in the end not useful to me.
        
               | simonw wrote:
               | I'm having trouble imagining what data I would store in a
               | database as opposed to a filesystem if my goal is to
               | experiment with large models like Stable Diffusion.
        
         | syntaxing wrote:
          | I would recommend a 3090. It can handle everything a 4000
          | series can, albeit slightly slower; has enough VRAM to handle
          | most things for fun; and can be bought for around $700.
        
         | [deleted]
        
         | ttt3ts wrote:
          | You can fine-tune Whisper, Stable Diffusion, and LLMs up to
          | about 15B parameters with 24GB of VRAM.
          | 
          | Which leads to the question of what hardware to get. The best
          | bang for the buck right now is definitely a used 3090 at
          | ~$700. If you want more than 24GB of VRAM, just rent the
          | hardware, as it will be cheaper.
          | 
          | If you're not willing to drop $700, don't buy anything; just
          | rent. I have had decent luck with vast.ai.
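          | 
          | Rough arithmetic behind that ~15B ceiling (my own rule of
          | thumb, assuming 8-bit weights plus adapter-style fine-tuning;
          | full-precision fine-tuning needs several times more):
          | 
          |     def finetune_vram_gb(params_billion):
          |         weights = params_billion * 1.0  # 8-bit weights: ~1 GB per B params
          |         return weights * 1.3            # +30% activation/optimizer headroom
          | 
          |     print(finetune_vram_gb(15))  # ~19.5 GB -> fits in 24GB VRAM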
        
           | NBJack wrote:
           | There is the world of used Nvidia Teslas, like the M40. Very
           | cheap, but some assembly required.
        
       | sarabande wrote:
       | I'm trying to use this on a 3M mp3 file to test ASR with language
       | code deu, CPU only, and I keep getting this error -- are there
        | limits to the MMS inference?
        | 
        |     File "fairseq/data/data_utils_fast.pyx", line 30,
        |       in fairseq.data.data_utils_fast.batch_by_size_vec
        |         assert max_tokens <= 0 or np.max(num_tokens_vec) <= max_tokens, (
        |     AssertionError: Sentences lengths should not exceed max_tokens=4000000
        | 
        |     Traceback (most recent call last):
        |       File "/home/xxx/fairseq/examples/mms/asr/infer/mms_infer.py",
        |         line 52, in <module>
        |           process(args)
        |       File "/home/xxx/fairseq/examples/mms/asr/infer/mms_infer.py",
        |         line 44, in process
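        | 
        | If the limit is on input length (max_tokens here appears to
        | count audio samples, so 4,000,000 at 16 kHz would be ~250 s), I
        | guess the workaround is to split the file into shorter chunks
        | first -- something like this (untested):
        | 
        |     from pydub import AudioSegment  # pip install pydub; needs ffmpeg
        | 
        |     audio = AudioSegment.from_mp3("input.mp3")
        |     audio = audio.set_frame_rate(16000).set_channels(1)
        |     step = 100 * 1000  # 100 s chunks, well under the apparent limit
        |     for i, start in enumerate(range(0, len(audio), step)):
        |         chunk = audio[start:start + step]
        |         chunk.export(f"chunk_{i:03d}.wav", format="wav")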
        
       ___________________________________________________________________
       (page generated 2023-05-22 23:00 UTC)