[HN Gopher] Talk = GPT-2 and Whisper and WASM
___________________________________________________________________
Talk = GPT-2 and Whisper and WASM
Author : tomthe
Score : 171 points
Date : 2022-12-07 08:41 UTC (14 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| tomthe wrote:
| This would of course be even more fun with ChatGPT, but it is a
| nice and funny demo of their whisper.cpp library. The second
| video is worth watching: https://user-
| images.githubusercontent.com/1991296/202914175-...
| dr_kiszonka wrote:
| I think LaMDA would be really fun. If you asked ChatGPT what
| movies it likes, it would tell you that it is a large language
| model trained by OpenAI and it can't have opinions, yada yada
| yada.
| pmontra wrote:
| I understand that this limitation can be circumvented with
| prompts like
|
| Imagine there is a guy that likes watching movies. Which ones
| would he like most in 2022?
|
| That context persists for a while.
| sheeeep86 wrote:
| It's interesting that the English language model is loaded and
| it's clearly trying to pronounce things in a Spanish way.
| yuchi wrote:
| Actually, that's the Italian voice.
| ggerganov wrote:
| Correct, I had randomly loaded the "Italian" voice of the Web
| Speech API.
| Terretta wrote:
| Listening to that demo, it's incredible how far we've come!
|
| Or, not.
|
| Racter was _commercially_ released for Mac in December 1985:
|
| _Racter strings together words according to "syntax directives",
| and the illusion of coherence is increased by repeated re-use of
| text variables. This gives the appearance that Racter can
| actually have a conversation with the user that makes some sense,
| unlike Eliza, which just spits back what you type at it. Of
| course, such a program has not been written to perfection yet,
| but Racter comes somewhat close._
|
| _Since some of the syntactical mistakes that Racter tends to
| make cannot be avoided, the decision was made to market the game
| in a humorous vein, which the marketing department at Mindscape
| dubbed "tongue-in-chip software" and "artificial insanity"._
|
| https://www.mobygames.com/game/macintosh/racter
|
| https://www.myabandonware.com/game/racter-4m/play-4m
|
| It's only amazing that ChatGPT, backed by GPT-3, is the _first
| thing since then_ to do enough better that _everyone_ is engaged.
|
| I owned that in 1985, and having studied AI/ML previously I've
| been (and remain something of) an AGI skeptic. But now in 2022, I
| finally think _"this changes everything"_ ... not because it's
| AI, but because it's making the application of matching
| probabilistic patterns across mass knowledge practical and useful
| for everyday work, particularly as a structured synthesis
| assistant.
| Centigonal wrote:
| Well, the AI Winter happened in the intervening years, so that
| might help explain it.
|
| https://en.wikipedia.org/wiki/AI_winter
| make3 wrote:
| GPT-2 is massively stronger than anything from 1985. I suggest
| that you try https://chat.openai.com/chat
| rozularen wrote:
| OpenAI's chat uses GPT-3, which, as another user already
| pointed out, GPT-2 isn't even close to in terms of generating
| text.
| stevenhuang wrote:
| Technically it's GPT-3.5, a newer version:
| https://openai.com/blog/chatgpt/
| Rickvst wrote:
| I implemented Whisper + ChatGPT + pyttsx3 and it worked. But
| then suddenly the ChatGPT wrapper that I found on GitHub
| stopped working.
|
| edit: whisper is awesome
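|
| A rough sketch of what such a pipeline can look like
| (illustrative only, not the actual code; the ChatGPT step is
| stubbed out since there was no official API at the time):
|
|   import whisper   # openai-whisper package
|   import pyttsx3
|
|   # speech -> text with the small English Whisper model
|   stt = whisper.load_model("tiny.en")
|   heard = stt.transcribe("question.wav")["text"]
|
|   # placeholder for whatever ChatGPT wrapper / LLM you have
|   def generate_reply(prompt: str) -> str:
|       return "(the language model's answer would go here)"
|
|   # text -> speech with pyttsx3 (offline TTS)
|   tts = pyttsx3.init()
|   tts.say(generate_reply(heard))
|   tts.runAndWait()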
| localhost wrote:
| It looks like the ChatGPT APIs that work well are the ones
| implemented as a browser extension, reusing the bearer token
| that you get by signing into ChatGPT in the same browser. I'm
| guessing, since you're using pyttsx3, that you wrote a Python
| app rather than something running in the browser?
| lhuser123 wrote:
| Cool. Would like to see that.
| hanoz wrote:
| What are some good things to try? I can't get any sense out of it
| at all so far.
| ggerganov wrote:
| This is the smallest GPT-2 model so it usually generates
| gibberish. Maybe some better prompting could improve the
| results.
|
| Currently, the strategy is to simply prepend 8 lines of text
| (prompt/context) and keep appending every new transcribed line
| at the end:
|
| https://github.com/ggerganov/whisper.cpp/blob/master/example...
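|
| In Python terms the idea is roughly this (the real code is the
| C++ linked above; this is only an illustrative sketch):
|
|   # fixed 8-line prompt that sets up the conversation (made up)
|   PROMPT_LINES = [
|       "A conversation between a person and a friendly AI.",
|       # ... seven more lines of example dialogue ...
|   ]
|   transcript = []
|
|   def next_gpt2_input(new_line: str) -> str:
|       # every new transcribed line is appended at the end and
|       # the whole thing becomes the next GPT-2 prompt
|       transcript.append(new_line)
|       return "\n".join(PROMPT_LINES + transcript)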
| swyx wrote:
| The total data that the page will have to load on startup
| (probably using the Fetch API) is:
|
|   - 74 MB for the Whisper tiny.en model
|   - 240 MB for the GPT-2 small model
|   - Web Speech API is built into modern browsers
|
| Cool, but I'm now wondering what it would take to bring this
| down enough to put it in real apps? Anyone talking about this?
| justanotheratom wrote:
| Perhaps it will be built into browsers soon.
| make3 wrote:
| I don't see why they would ever package GPT-2 (the bigger
| model) in the browser.
|
| Speech-to-text has a better chance though; that's an
| interesting idea, as they already package text-to-speech.
| neltnerb wrote:
| To be honest, I expect that in 10 years people will
| regularly use these sorts of text generation tools in the
| way text prediction and thesauruses and grammar checkers
| and spellcheckers are used today but for bigger blocks of
| text.
|
| I can't really see why not. As more things move into the
| browser, it makes sense to me to integrate the ability to "AI
| check" your text, like a grammar or spell checker, to improve
| your writing along whatever dimensions you like.
|
| It's not honest, but in kind of the same way that a
| spellchecker isn't honest. Since it's going to be possible
| anyway, I don't see what extra harm it causes to make it
| accessible to everyone, so that we can actually see an upside
| and also begin to recognize that text we read is, at this
| point, likely to be at least partially AI-generated and
| potentially factually incorrect.
|
| Even better if things like Firefox reader mode, one of my
| favorite tools, can also do text summarization. Just imagine
| the adversarial interaction between a tool designed to
| generate confident-sounding fluff and one designed to
| summarize it. Honestly, it seems like an inevitable future
| path.
|
| It may as well be part of the browser, where it stands a
| better chance of keeping people's long-term attention on the
| ease of using these tools. Spammers will be able to do it,
| fake journalists and such will be able to do it; better if we
| can do it too, so that at least we are aware of the potential
| for abuse.
| visarga wrote:
| We need much better models in browsers. The main reason is to
| pass everything through the language model and get polite and
| helpful responses. You never have to see Google, the website,
| or the ads ever again if you don't want to. The QA model
| should be able to detect most undesirable parts: spam, ads,
| fakes, factually incorrect data. Something like ChatGPT
| running locally. This is important for privacy. If we run the
| model, we have a safe creative space. If they run the model,
| they get everything spilled out.
| petercooper wrote:
| Given Whisper is open source, I'd be surprised if it's not.
| It would be cool for Web Speech API's SpeechRecognition to
| simply use it, though that would make browser downloads a
| little beefier.
| globalise83 wrote:
| It could easily be downloaded separately in the background
| once the browser application is already up and running.
| Would be great to have it in the browser though for sure.
| CGamesPlay wrote:
| Unfortunately these smaller models also perform terribly; the
| GPT-2 small model in particular is really unsuitable for the
| task of generating text. The largest models publicly
| available, which are nowhere near GPT-3 Davinci level, are
| tens of GBs.
|
| We may be able to reduce the size without sacrificing
| performance, but that's an area of active research still.
| addandsubtract wrote:
| We can bring back pre-loading screens for webpages from the Web
| 2.0 era.
| unnouinceput wrote:
| Isn't the Web 2.0 era the current era? I mean, the Web 3.0 era
| relates only to blockchains, not the rest. The proponents of
| "everything on blockchain" actually do want that for
| everything (not that it will ever work, but that's beyond our
| discussion).
| agolio wrote:
| I really liked how the page tells you the size it is planning
| to download, and prompts you before downloading.
|
| Coming from a limited-bandwidth contract, I hate it when I
| click a link and it instantly starts downloading a huge file.
|
| Great work OP!
| fulafel wrote:
| Lots of web-based apps load more data than this. The ~300 MB
| is only about 3 seconds on a gigabit connection.
| make3 wrote:
| In real life the models are hosted on a server: you send the
| text and audio and receive the model's output.
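|
| Something like this on the client side, where the endpoint and
| response fields are made up for illustration:
|
|   import requests
|
|   # hypothetical server that runs Whisper + the language model
|   API_URL = "https://example.com/api/talk"
|
|   with open("recording.wav", "rb") as f:
|       resp = requests.post(API_URL, files={"audio": f})
|   resp.raise_for_status()
|
|   result = resp.json()
|   print("heard:", result["transcription"])
|   print("reply:", result["reply"])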
| arcturus17 wrote:
| ~314 MB is a lot for a web app but small for a desktop or even
| a mobile app.
| dormento wrote:
| > ~314 MB is a lot for a web app but small for a desktop or
| even a mobile app.
|
| Every day we stray further from God's light :/
| tjoff wrote:
| Those 314 MB are justified though, which can hardly be said
| for the typical app/homepage.
| simonw wrote:
| Anyone found a sentence that GPT-2 returns a good response to?
| My experiments have not been great so far.
|
| (LOVE this demo.)
| bilater wrote:
| I've been thinking of doing something like this but hooked up
| to ChatGPT/GPT-3 davinci-003. Obviously the model will not
| load in the browser, but we can call the API. Could be a neat
| way to interact with the bot.
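|
| With the openai Python package that's only a few lines. A
| sketch using text-davinci-003, since ChatGPT itself had no
| official API at the time:
|
|   import openai
|
|   openai.api_key = "sk-..."  # your API key
|
|   def reply(transcribed_text: str) -> str:
|       response = openai.Completion.create(
|           model="text-davinci-003",
|           prompt=transcribed_text,
|           max_tokens=150,
|           temperature=0.7,
|       )
|       return response.choices[0].text.strip()
|
|   # e.g. feed it whatever Whisper transcribed
|   print(reply("What should I watch this weekend?"))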
| atum47 wrote:
| > whisper: number of tokens: 2, 'Hello?'
| > gpt-2: I want to have you on my lap.
|
| this GPT-2 better chill
| iandanforth wrote:
| Technically this seems to work, and mad props to the author for
| getting to this point. On my computer (MacBook Pro) it's very
| slow but there are enough visual hints that it's thinking to make
| the wait ok. I have plenty of complaints about the output but
| most of that is GPT-2's problem.
| boredemployee wrote:
| Off-topic, but what are the real limitations of GPT-2 vs
| GPT-3? (I know that GPT-2 is free.)
| zwaps wrote:
| It's almost the same model architecture, but GPT-3 is much
| better trained. GPT-3 is coherent, while GPT-2 is prone to
| generating gibberish or getting stuck in a loop. The advantage
| is pretty significant for longer generations.
|
| That being said, neither GPT-3 nor GPT-2 is an "efficient" model.
|
| On the one hand, they use inefficient architectures: starting
| with using a BPE tokenizer, to having dense attention without
| any modifications, to being a decoder-only architecture, etc.
| Research has come up with many more fancy ideas on how to make
| all this run better and with less compute. But there is a
| reason why GPT-2/3 are architecturally simple and inefficient:
| we know how to train these models reliably (more or less) on
| thousands of GPUs, whereas the same might not be true for more
| modern and efficient implementations. For instance, when
| training OPT, Facebook started out using fancier ideas but
| finally ended up going back to GPT-3-esque basics, simply
| because training on thousands of machines is a lot harder than
| it seems in theory.
|
| On the other hand, these models have far too many parameters
| compared to the data they were trained on. You might say they
| are undertrained, or that they lean heavily on available
| compute to make up for missing data. In any case, much smaller
| models (like Chinchilla by DeepMind) match their performance
| with fewer parameters (and hence less compute and model size)
| by using more and better data.
|
| In closing, there are better models for edge devices. This
| includes GPT clones like GPT-J in 8-bit, or distilled versions
| thereof. Similarly, there are still a lot of gains to be had
| when all the numerous efficiency improvements get implemented
| in a model that operates at the data/parameter efficiency
| frontier.
|
| Still, even when considering efficient models like Chinchilla,
| and then even more architecturally efficient versions thereof,
| we are still talking about a lot of $$$ to train these models.
| And so we are further from having open-source implementations
| of these models than we are from someone (like DeepMind)
| having them...
|
| With time, you can expect to run coherent models on your edge
| device. But not quite yet.
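|
| To put rough numbers on the "undertrained" point (figures are
| approximate, from the GPT-3 and Chinchilla papers):
|
|   # Chinchilla's rule of thumb: ~20 training tokens / parameter
|   params = 175e9          # GPT-3 parameter count
|   tokens_seen = 300e9     # tokens GPT-3 was trained on
|   tokens_optimal = 20 * params          # ~3.5e12 tokens
|   print(tokens_optimal / tokens_seen)   # ~11.7x more data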
| boredemployee wrote:
| Thank you. Do you know of any open-source model that can
| generate code from natural language? I tried Salesforce's
| Codex and it sucks big time.
| zwaps wrote:
| Interestingly, code models are constrained even more by
| difficulties of tokenization and, most crucially, by us not
| actually having that much code to train on (we already train
| on all of GitHub, and it doesn't "saturate" the model).
|
| At this stage, we are back to improving model efficiency, I
| think, especially for code models. But not there yet.
|
| Sorry for the rambling; the actual answer is no, I do not know
| of a really good Codex-type model in open source... yet.
| boredemployee wrote:
| I see. The OpenAI code generator gave me really impressive
| results for basic to intermediate questions in the data
| analytics space. I think it's a function of the context you
| give about the problem (i.e. the literal meaning of the
| columns in the business context) and how objective your
| question to the model is, plus some other internal model
| variables that I'm completely unaware of. But it's nice to
| have your input so I can understand a little bit of what
| happens under the hood!
| mcbuilder wrote:
| Size of the model is a big one: GPT-3 has over 100x as many
| parameters (175B vs 1.5B), for example. Training data would be
| another huge one. Architecturally, they aren't that different
| if I recall correctly; both are a decoder stack of
| transformer-style self-attention. In terms of real-world
| capability, GPT-3 gives much better answers; it was a big step
| up from GPT-2.
| namrog84 wrote:
| So how 'big' is GPT-3?
|
| Is it anywhere near being able to be run on local consumer
| hardware?
|
| How long until we can have a GPT-3 or 3.5 chatbot locally,
| like we have Stable Diffusion locally for image generation?
|
| I've been spoiled by having it accessible offline and with
| community-built support/modifications. GPT-3 is super neat,
| but it feels like there are too many guardrails, or the custom
| playground is too pricey.
| boredemployee wrote:
| Got it, thanks! Is there any application for which GPT-2 would
| be enough and could work as well as GPT-3?
| rahimnathwani wrote:
| I'm curious how they chose between:
|
| A) ggml
| https://github.com/ggerganov/ggml/tree/master/examples/gpt-2
|
| B) Fabrice Bellard's gpt2tc https://bellard.org/libnc/gpt2tc.html
| ggerganov wrote:
| Hey author here - I implemented `ggml` as a learning exercise.
| It allows me to easily port it to WebAssembly or iOS for
| example.
| rahimnathwani wrote:
| Oops - I didn't spot it was your own library! Kudos!
___________________________________________________________________
(page generated 2022-12-07 23:01 UTC)