[HN Gopher] Talk = GPT-2 and Whisper and WASM
       ___________________________________________________________________
        
       Talk = GPT-2 and Whisper and WASM
        
       Author : tomthe
       Score  : 171 points
       Date   : 2022-12-07 08:41 UTC (14 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | tomthe wrote:
       | This would of course be even more fun with ChatGPT, but it is a
       | nice and funny demo of their whisper.cpp library. The second
       | video is worth watching: https://user-
       | images.githubusercontent.com/1991296/202914175-...
        
         | dr_kiszonka wrote:
          | I think LaMDA would be really fun. If you asked ChatGPT what
          | movies it likes, it would tell you that it is a large language
          | model trained by OpenAI and that it can't have opinions, yada
          | yada yada.
        
           | pmontra wrote:
           | I understood that this limitation is circumvented with
           | prompts like
           | 
           | Imagine there is a guy that likes watching movies. Which ones
           | would he like most in 2022?
           | 
           | That context persists for a while.
        
         | sheeeep86 wrote:
          | It's interesting that the English language model is loaded and
          | it's clearly trying to pronounce things in a Spanish way.
        
           | yuchi wrote:
            | Actually, that's the Italian voice.
        
             | ggerganov wrote:
              | Correct, I had randomly loaded the "Italian" voice of the
              | Web Speech API.
        
       | Terretta wrote:
       | Listening to that demo, it's incredible how far we've come!
       | 
       | Or, not.
       | 
       | Racter was _commercially_ released for Mac in December 1985:
       | 
       |  _Racter strings together words according to "syntax directives",
       | and the illusion of coherence is increased by repeated re-use of
       | text variables. This gives the appearance that Racter can
       | actually have a conversation with the user that makes some sense,
       | unlike Eliza, which just spits back what you type at it. Of
       | course, such a program has not been written to perfection yet,
       | but Racter comes somewhat close._
       | 
       |  _Since some of the syntactical mistakes that Racter tends to
       | make cannot be avoided, the decision was made to market the game
       | in a humorous vein, which the marketing department at Mindscape
       | dubbed "tongue-in-chip software" and "artificial insanity"._
       | 
       | https://www.mobygames.com/game/macintosh/racter
       | 
       | https://www.myabandonware.com/game/racter-4m/play-4m
       | 
        | It's only amazing that ChatGPT, backed by GPT-3, is the _first
        | thing since then_ to do enough better that _everyone_ is engaged.
        | 
        | I owned that in 1985, and having studied AI/ML previously I've
        | been (and remain something of) an AGI skeptic. But now in 2022, I
        | finally think _"this changes everything"_ ... not because it's
        | AI, but because it's making the application of matching
        | probabilistic patterns across mass knowledge practical and useful
        | for everyday work, particularly as a structured synthesis
        | assistant.
        
         | Centigonal wrote:
          | Well, the AI winter happened in the intervening years, so that
          | might help explain it:
         | 
         | https://en.wikipedia.org/wiki/AI_winter
        
         | make3 wrote:
          | GPT-2 is by far massively stronger than anything from 1985. I
          | suggest that you try using https://chat.openai.com/chat
        
           | rozularen wrote:
            | OpenAI's chat uses GPT-3 which, as some other user already
            | pointed out, GPT-2 is not even close to in terms of
            | generating text.
        
             | stevenhuang wrote:
             | Technically GPT-3.5, it's a newer version
             | https://openai.com/blog/chatgpt/
        
       | Rickvst wrote:
        | I implemented Whisper + ChatGPT + pyttsx3 and it worked. But then
        | suddenly the ChatGPT wrapper that I found on GitHub stopped
        | working.
       | 
       | edit: whisper is awesome
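        | 
        | A rough sketch of that kind of pipeline, for illustration
        | (whisper and pyttsx3 are real packages; `ask_chatgpt` is just a
        | placeholder for whatever wrapper or API you end up using):
        | 
        |   import whisper   # pip install openai-whisper
        |   import pyttsx3   # pip install pyttsx3
        | 
        |   stt = whisper.load_model("base.en")   # local speech-to-text
        |   tts = pyttsx3.init()                  # local text-to-speech
        | 
        |   def ask_chatgpt(prompt: str) -> str:
        |       # Placeholder: plug in your ChatGPT wrapper or an
        |       # official API call here.
        |       raise NotImplementedError
        | 
        |   text = stt.transcribe("question.wav")["text"]
        |   reply = ask_chatgpt(text)
        |   tts.say(reply)
        |   tts.runAndWait()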
        
         | localhost wrote:
          | It looks like the ChatGPT APIs that work well are the ones
          | implemented as browser extensions, reusing the bearer token
          | that you get by signing into ChatGPT from the same browser. I'm
          | guessing, since you're using pyttsx3, that you wrote a Python
          | app rather than something in the browser?
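          | 
          | Stripped down, those wrappers amount to something like this
          | (the endpoint and payload here are placeholders, not an
          | official API; the token is copied out of the browser's dev
          | tools after logging in):
          | 
          |   import requests
          | 
          |   BEARER_TOKEN = "eyJ..."  # session token, not an API key
          | 
          |   resp = requests.post(
          |       # unofficial endpoint, may change or break at any time
          |       "https://chat.openai.com/backend-api/conversation",
          |       headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
          |       json={"prompt": "Hello"},  # real payload is more involved
          |   )
          |   print(resp.status_code, resp.text)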
        
         | lhuser123 wrote:
         | Cool. Would like to see that.
        
       | hanoz wrote:
       | What are some good things to try? I can't get any sense out of it
       | at all so far.
        
         | ggerganov wrote:
         | This is the smallest GPT-2 model so it usually generates
         | gibberish. Maybe some better prompting could improve the
         | results.
         | 
         | Currently, the strategy is to simply prepend 8 lines of text
         | (prompt/context) and keep appending every new transcribed line
         | at the end:
         | 
         | https://github.com/ggerganov/whisper.cpp/blob/master/example...
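          | 
          | In other words, a rolling prompt along these lines (just a
          | sketch of the idea, not the actual code behind the link):
          | 
          |   # Fixed 8-line primer (prompt/context), with every newly
          |   # transcribed line appended at the end.
          |   PRIMER = [
          |       "Hello, how are you today?",
          |       "I am doing great, thank you for asking.",
          |       # ...the demo uses 8 lines of canned dialogue here...
          |   ]
          |   history = []
          | 
          |   def on_transcribed(line):
          |       history.append(line)                 # keep appending
          |       return "\n".join(PRIMER + history)   # the GPT-2 prompt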
        
       | swyx wrote:
        | The total data that the page will have to load on startup
        | (probably using the Fetch API) is:
        | 
        |   - 74 MB for the Whisper tiny.en model
        |   - 240 MB for the GPT-2 small model
        |   - Web Speech API is built into modern browsers
        | 
        | Cool, but I'm now wondering what it would take to bring this down
        | enough to put it in real apps? Anyone talking about this?
        
         | justanotheratom wrote:
          | Perhaps it will be built into browsers soon.
        
           | make3 wrote:
            | I don't see why they would ever package GPT-2 (the bigger
            | model) in the browser.
            | 
            | Speech-to-text has a higher chance though; that's an
            | interesting idea, as they already package text-to-speech.
        
             | neltnerb wrote:
             | To be honest, I expect that in 10 years people will
             | regularly use these sorts of text generation tools in the
             | way text prediction and thesauruses and grammar checkers
             | and spellcheckers are used today but for bigger blocks of
             | text.
             | 
              | I can't really see why not, anyway. As more things move
              | into the browser, it makes sense to me to integrate the
              | ability to "AI check" your text, like a grammar or spell
              | checker, to improve your writing along whatever dimensions
              | you like.
             | 
              | It's not honest, but in kind of the same way that a
              | spellchecker isn't honest. Since it's going to be possible
              | anyway, I don't see what extra harm it causes to make it
              | accessible to everyone, so that we can both actually see an
              | upside and also begin to recognize that the text we read
              | is, at this point, likely to be at least partially AI
              | generated and potentially factually incorrect.
             | 
             | Even better if things like Firefox reader mode, one of my
             | favorite tools, can also do text summarization. Just
             | imagine the adversarial interaction between a tool designed
             | to generate confident sounding fluff and one to summarize
             | confident sounding fluff. Honestly it seems like a likely
             | inevitable future path.
             | 
              | It may as well be part of the browser, where it stands a
              | better chance of keeping people's long-term attention on
              | the ease of using these tools. Spammers will be able to do
              | it, fake journalists and such will be able to do it; better
              | if we can do it too, so that at least we are aware of the
              | potential for abuse.
        
               | visarga wrote:
               | We need much better models in browsers. The main reason
               | is to pass everything through the language model and get
               | polite and helpful responses. You never have to see
                | Google, the website, or the ads ever again if you don't
                | want to. The QA model should be able to detect most
                | undesirable parts: spam, ads, fakes, factually incorrect
                | data. Something like ChatGPT running locally. This is
               | important for privacy. If we run the model, we have a
               | safe creative space. If they run the model, they get
               | everything spilled out.
        
           | petercooper wrote:
           | Given Whisper is open source, I'd be surprised if it's not.
           | It would be cool for Web Speech API's SpeechRecognition to
           | simply use it, though that would make browser downloads a
           | little beefier.
        
             | globalise83 wrote:
             | It could easily be downloaded separately in the background
             | once the browser application is already up and running.
             | Would be great to have it in the browser though for sure.
        
         | CGamesPlay wrote:
          | Unfortunately these smaller models also perform terribly; in
          | particular, the GPT-2 small model is really unsuitable for the
          | task of generating text. The largest publicly available models,
          | which are nowhere near GPT-3 Davinci level, are tens of GBs.
          | 
          | We may be able to reduce the size without sacrificing
          | performance, but that's still an area of active research.
        
         | addandsubtract wrote:
         | We can bring back pre-loading screens for webpages from the Web
         | 2.0 era.
        
           | unnouinceput wrote:
            | Isn't the Web 2.0 era the current era? I mean, the Web 3.0
            | era relates to blockchains only, not the rest. The proponents
            | of "everything on blockchain" actually do want that for
            | everything (not that it will ever work, but that's beyond our
            | discussion).
        
         | agolio wrote:
         | I really liked how the page tells you the size it is planning
         | to download, and prompts you before downloading.
         | 
         | Coming from a limited bandwidth contract, I hate when I click a
         | link and it instantly starts downloading a huge file.
         | 
         | Great work OP!
        
         | fulafel wrote:
          | Lots of web-based apps load more data than this. The ~300 MB is
          | only about 3 seconds on a gigabit connection.
        
         | make3 wrote:
          | In real life the models are hosted on a server; you send the
          | text and audio and receive the model's output.
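          | 
          | On the client side that boils down to something like this (the
          | URL and response shape are made up purely for illustration):
          | 
          |   import requests
          | 
          |   # Send the recorded audio to a hypothetical inference server...
          |   audio = open("question.wav", "rb").read()
          |   r = requests.post("https://example.com/api/talk",
          |                     files={"audio": ("question.wav", audio)})
          | 
          |   # ...and get back the transcription plus the model's reply,
          |   # e.g. {"transcript": "...", "reply": "..."}
          |   print(r.json())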
        
         | arcturus17 wrote:
          | ~314 MB is a lot for a web app but small for a desktop or even
          | a mobile app.
        
           | dormento wrote:
            | > ~314 MB is a lot for a web app but small for a desktop or
            | even a mobile app.
           | 
           | Everyday we stray further from god's light :/
        
             | tjoff wrote:
             | Those 314 MB are justified though, which can hardly be said
             | for the typical app/homepage.
        
       | simonw wrote:
        | Anyone found a sentence that GPT-2 returns a good response for?
        | My experiments have not been great so far.
       | 
       | (LOVE this demo.)
        
       | bilater wrote:
        | I've been thinking of doing something like this but hooked up to
        | ChatGPT/GPT-3 text-davinci-003. Obviously the model will not load
        | in the browser, but we can call the API. Could be a neat way to
        | interact with the bot.
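        | 
        | Presumably something along these lines, using the openai Python
        | package as it existed at the time (model name per the above; the
        | prompt text is just an example):
        | 
        |   import openai  # pip install openai
        | 
        |   openai.api_key = "sk-..."  # your API key
        | 
        |   resp = openai.Completion.create(
        |       model="text-davinci-003",
        |       prompt="Transcribed speech from Whisper goes here.",
        |       max_tokens=150,
        |   )
        |   print(resp["choices"][0]["text"])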
        
       | atum47 wrote:
        |   > whisper: number of tokens: 2, 'Hello?'
        |   > gpt-2: I want to have you on my lap.
       | 
       | this GPT-2 better chill
        
       | iandanforth wrote:
       | Technically this seems to work, and mad props to the author for
       | getting to this point. On my computer (MacBook Pro) it's very
       | slow but there are enough visual hints that it's thinking to make
       | the wait ok. I have plenty of complaints about the output but
       | most of that is GPT-2's problem.
        
       | boredemployee wrote:
        | Off-topic, but what are the real limitations of GPT-2 vs GPT-3?
        | (I know that GPT-2 is free.)
        
         | zwaps wrote:
          | It's almost the same model architecture, but GPT-3 is much
          | better trained. GPT-3 is coherent, while GPT-2 is prone to
          | generating gibberish or getting stuck in a loop. The advantage
          | is pretty significant for longer generations.
          | 
          | That being said, neither GPT-3 nor GPT-2 are "efficient" models.
         | 
          | On the one hand, they use inefficient architectures - starting
          | with using a BPE tokenizer, to having dense attention without
          | any modifications, to being a decoder-only architecture, etc.
          | Research has come up with many more fancy ideas on how to make
          | all this run better and with less compute. But there is a
          | reason why GPT-2/3 are architecturally simple and inefficient:
          | we know how to train these models reliably (more or less) on
          | thousands of GPUs, whereas the same might not be true for more
          | modern and efficient implementations. For instance, when
          | training OPT, Facebook started out using more fancy ideas but
          | finally ended up going back to GPT-3-esque basics, simply
          | because training on thousands of machines is a lot harder than
          | it seems in theory.
         | 
          | On the other hand, these models have far too many parameters
          | compared to the data they were trained on. You might say they
          | are undertrained - or that they lean heavily on available
          | compute to make up for missing data. In any case, much smaller
          | models (like Chinchilla by DeepMind) match their performance
          | with fewer parameters (and hence less compute and model size)
          | by using more and better data.
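          | 
          | To put rough, commonly cited ballpark figures on that (for
          | illustration only):
          | 
          |   # Tokens seen per parameter, using publicly cited figures.
          |   models = {
          |       "GPT-3":      (175e9, 300e9),   # ~175B params, ~300B tokens
          |       "Chinchilla": (70e9, 1.4e12),   # ~70B params, ~1.4T tokens
          |   }
          |   for name, (params, tokens) in models.items():
          |       print(f"{name}: ~{tokens / params:.1f} tokens per param")
          |   # Chinchilla's ~20 tokens/param vs GPT-3's ~1.7 is the
          |   # "undertrained" point above.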
         | 
          | In closing, there are better models for edge devices. This
          | includes GPT clones like GPT-J in 8-bit, or distilled versions
          | thereof. Similarly, there are still a lot of gains to come when
          | all the numerous efficiency improvements get implemented in a
          | model that operates at the data/parameter efficiency frontier.
         | 
          | Still, even when considering efficient models like Chinchilla -
          | and then even more architecturally efficient versions thereof -
          | we are talking about a lot of $$$ to train these models. And so
          | we are yet further from having open-source implementations of
          | these models than we are from someone (like DeepMind) having
          | them...
         | 
         | With time, you can expect to run coherent models on your edge
         | device. But not quite yet.
        
           | boredemployee wrote:
            | Thank you. Do you know of any open source model that works
            | for generating code from natural language? I tried
            | Salesforce's CodeGen and it sucks big time.
        
             | zwaps wrote:
              | Interestingly, code models are constrained even more by
              | difficulties of tokenization and - most crucially - by the
              | fact that we don't actually have that much code to train on
              | (we already train on all of GitHub, and it doesn't
              | "saturate" the model).
             | 
             | At this stage, we are back to improving model efficiency, I
             | think, especially for code models. But not there yet.
             | 
              | Sorry for the rambling; the actual answer is no, I do not
              | know of a really good Codex-type model in open source...
              | yet.
        
               | boredemployee wrote:
                | I see. The OpenAI code generator gave me really
                | impressive results for basic to intermediate questions in
                | the data analytics space. I think it's a function of the
                | context you give about the problem (i.e. the literal
                | meaning of the columns in the business context) and how
                | objective your question to the model is, plus some other
                | internal model variables that I'm completely unaware of.
                | But it's nice to have your input so I can understand a
                | little bit of what happens under the hood!
        
         | mcbuilder wrote:
          | Size of the model is a big one. GPT-3 has over 100x as many
          | parameters (1.5B vs. 175B), for example. Training data would be
          | another huge one. Architecturally, they aren't that different
          | if I recall correctly; both are decoder stacks of transformer
          | self-attention. In terms of real-world capability, GPT-3 gives
          | much better answers; it was a big step up from GPT-2.
        
           | namrog84 wrote:
            | So how 'big' is GPT-3?
            | 
            | Is it anywhere near being able to run on local consumer
            | hardware?
            | 
            | How long until we can have the GPT-3 or 3.5 chatbot locally,
            | like we have Stable Diffusion locally for image generation?
            | 
            | I've been spoiled by having it accessible offline and with
            | community-built support/modifications. GPT-3 is super neat
            | but feels like it has too many guard rails, or the custom
            | playground is too pricey.
        
           | boredemployee wrote:
            | Got it, thanks! Is there any application where GPT-2 would be
            | enough and could work as well as GPT-3?
        
       | rahimnathwani wrote:
       | I'm curious how they chose between:
       | 
       | A) ggml
       | https://github.com/ggerganov/ggml/tree/master/examples/gpt-2
       | 
       | B) Fabrice Bellard's GPT2C https://bellard.org/libnc/gpt2tc.html
        
         | ggerganov wrote:
         | Hey author here - I implemented `ggml` as a learning exercise.
         | It allows me to easily port it to WebAssembly or iOS for
         | example.
        
           | rahimnathwani wrote:
            | Oops - I didn't spot it was your own library! Kudos!
        
       ___________________________________________________________________
       (page generated 2022-12-07 23:01 UTC)