Post ATcHCvJPrm1hClxFey by simon@fedi.simonwillison.net
 (DIR) Post #ATaAdrbye4F1YEcEEq by simon@fedi.simonwillison.net
       2023-03-13T19:20:41Z
       
       1 like, 2 repeats
       
       Wrote about how the world of large language models you can run on your own personal devices continues to accelerate at break-neck speed:

       Stanford Alpaca, and the acceleration of on-device large language model development

       https://simonwillison.net/2023/Mar/13/alpaca/
       
 (DIR) Post #ATaBxtvJrJj2T2mUV6 by adr@mastodon.social
       2023-03-13T19:35:18Z
       
       0 likes, 0 repeats
       
       @simon buddy, thank you for your reporting on this! I would never have seen (well, not this quickly, anyway) llama.cpp without you. I *love* whisper.cpp and the fact that that dude is doing llama is just lovely. Now this Stanford thing! Gosh.
       
 (DIR) Post #ATaC90nKeVjCxnLLCy by TedUnderwood@sigmoid.social
       2023-03-13T19:39:22Z
       
       0 likes, 0 repeats
       
       @simon Your interpretation of these fast-moving events is very helpful: thanks!
       
 (DIR) Post #ATaCino6Pvpr6llmz2 by simon@fedi.simonwillison.net
       2023-03-13T19:44:00Z
       
       0 likes, 0 repeats
       
       Here are what I think are the most important takeaway points
       
 (DIR) Post #ATaFFSQB9MzeFIHKhE by moof@cupoftea.social
       2023-03-13T20:12:18Z
       
       0 likes, 0 repeats
       
       @simon I think that the other thing that’s exciting about this stuff is that it’s an AI model for programmers who haven’t boned up on how to write these models. I don’t need to know how to code and train LLaMA, etc. to use it, and it has or will have things like API endpoints that I can send data to and get useful examples back from. It’s a new paradigm, yes, but it’s akin to learning a new tool, rather than a whole new way of thinking.
       
 (DIR) Post #ATaFg8tRrXBTV2h3x2 by koen_hufkens@mastodon.social
       2023-03-13T20:17:08Z
       
       0 likes, 0 repeats
       
       @simon Yep, and with this: if you fine-tune it on your own writing, can you then use it as such, as your own writing? :thinkerguns:
       
 (DIR) Post #ATaGLYRDnCSwb9EmTw by ummjackson@mastodon.social
       2023-03-13T20:22:13Z
       
       0 likes, 0 repeats
       
       @simon The velocity of this stuff is impressive, wow. Each day, something new. Interested to see Alpaca running locally with llama.cpp.
       
 (DIR) Post #ATaQXickkK4xCq4Pwm by numist@xoxo.zone
       2023-03-13T22:18:52Z
       
       0 likes, 0 repeats
       
       @simon I saw 19GiB RAM usage with the 30B model (laptop only has 24GiB so I didn't try 65B). Pretty impressive!
       
 (DIR) Post #ATaShlWl0SX90yBTxQ by jason@logoff.website
       2023-03-13T22:43:11Z
       
       0 likes, 0 repeats
       
       @simon how is this possible? All of the commentators and naysayers claim this takes more power than crypto

       Or maybe that’s just FUD?
       
 (DIR) Post #ATaSuoYV3jcaAeDnUW by simon@fedi.simonwillison.net
       2023-03-13T22:45:39Z
       
       0 likes, 1 repeats
       
       @jason yeah, comparing power usage of AI models to crypto never made a lot of sense to me

       Big AI models do take a LOT of energy to train - but once trained, anyone with a copy can run them in perpetuity on much less expensive hardware (apparently now even on a phone)

       Crypto mining wastes energy by design: it's a competition, so no matter how much energy you burn you always have to keep increasing that to beat the other miners you are competing against
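
       As a back-of-the-envelope sketch of that amortization argument (all numbers below are invented for illustration):

           # Illustrative numbers only - invented for this sketch, not real measurements
           training_energy_kwh = 1_000_000    # large one-time cost to train the model
           energy_per_query_kwh = 0.001       # small marginal cost per inference

           def average_energy_per_query(total_queries: int) -> float:
               """Training energy amortized across queries, plus the marginal cost."""
               return training_energy_kwh / total_queries + energy_per_query_kwh

           for n in (1_000, 1_000_000, 1_000_000_000):
               print(f"{n:>13,} queries -> {average_energy_per_query(n):.6f} kWh/query")

       The more queries a trained model serves, the smaller the training cost's share of each one - the opposite dynamic from mining.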
       
 (DIR) Post #ATaTIfntFXjByfnTJg by jason@logoff.website
       2023-03-13T22:47:46Z
       
       0 likes, 0 repeats
       
       @simon the framing I saw was per-query, which made no sense to me, unless they know how often these things are retrained, and how many queries they’re getting.

       Steady-state it seems like a crazy claim.
       
 (DIR) Post #ATaTSUsuhWbT0bpaCG by simon@fedi.simonwillison.net
       2023-03-13T22:47:50Z
       
       0 likes, 0 repeats
       
       @jason likewise, AI models reward optimization: as new techniques come along they get cheaper to train and run

       Crypto mining has technological efficiency gains too... but because proof of work is really a competition to burn as much energy as possible they don't actually result in reduced energy consumption, just temporarily higher mining yields until other miners catch up
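
       A toy model of why those efficiency gains get absorbed rather than saving energy - the retargeting rule below is a simplification and every number is invented:

           # Toy proof-of-work model - every number is invented
           TARGET_BLOCK_TIME = 600.0          # target seconds between blocks, Bitcoin-like
           power_watts = 1e8                  # energy miners collectively burn
           hashes_per_joule = 1e9             # mining hardware efficiency

           def required_difficulty() -> float:
               """Difficulty retargets so blocks keep arriving every TARGET_BLOCK_TIME."""
               hashrate = power_watts * hashes_per_joule    # hashes per second
               return hashrate * TARGET_BLOCK_TIME          # expected hashes per block

           before = required_difficulty()
           hashes_per_joule *= 2              # hardware gets twice as efficient...
           after = required_difficulty()      # ...and difficulty doubles to match

           print(after / before)              # 2.0: the gain is absorbed
           print(power_watts)                 # unchanged - no energy was saved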
       
 (DIR) Post #ATaTbiBmbZyWsU3DNo by jason@logoff.website
       2023-03-13T22:48:15Z
       
       0 likes, 0 repeats
       
       @simon and yeah that characteristic difference was enough for me. In one, the waste is the point. In the other, optimization is a huge concern.
       
 (DIR) Post #ATajqptLqbAwNTfBmC by simon@fedi.simonwillison.net
       2023-03-14T01:55:10Z
       
       0 likes, 0 repeats
       
       Updated my post about Alpaca with one more bonus section: those 52,000 examples they used to fine-tune their model? They generated them using a prompt to GPT-3 (and $500 of OpenAI credit spending)!

       https://simonwillison.net/2023/Mar/13/alpaca/#bonus-training-data
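
       Very roughly, the generation loop could look like the sketch below - the seed prompt and call_gpt3 helper are hypothetical stand-ins, not the Alpaca team's actual prompt or code:

           import json

           # Hypothetical stand-ins: SEED_PROMPT and call_gpt3 are illustrative,
           # not the Alpaca team's actual prompt or code
           SEED_PROMPT = (
               "Come up with 20 diverse task instructions. For each, give the "
               "instruction, an optional input, and the expected output, as JSON."
           )

           def call_gpt3(prompt: str) -> str:
               """Placeholder for a GPT-3 completion call."""
               raise NotImplementedError("wire up your own model API here")

           def generate_training_data(n_batches: int) -> list[dict]:
               examples: list[dict] = []
               for _ in range(n_batches):
                   examples.extend(json.loads(call_gpt3(SEED_PROMPT)))
               return examples

           # ~2,600 batches of 20 examples each would yield the 52,000 pairs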
       
 (DIR) Post #ATam5eklk3RI5IqMqm by stemcoding@mastodon.social
       2023-03-14T02:20:23Z
       
       0 likes, 0 repeats
       
       @simon @jason Except for proof of stake crypto like Ethereum
       
 (DIR) Post #ATazrrynAY9iPPit2O by smy20011@m.cmx.im
       2023-03-14T04:54:48Z
       
       0 likes, 0 repeats
       
       @simon The cost of creating ChatGPT went from x millions to x hundreds in a month.
       
 (DIR) Post #ATbuEGt8pm3bSyL3ho by simon@fedi.simonwillison.net
       2023-03-14T15:26:25Z
       
       0 likes, 0 repeats
       
       Today in language models on your personal device news... llama.cpp has now been seen running the 7B model (the same sized model that Alpaca is based on) at 1s per token on a Pixel 5 phone!

       https://twitter.com/ggerganov/status/1635605532726681600

       Yesterday it was 26s/token on a Pixel 6
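
       Back-of-the-envelope on those reported figures:

           before = 26   # seconds per token reported on a Pixel 6 yesterday
           after = 1     # seconds per token reported on a Pixel 5 today
           print(f"speedup: {before / after:.0f}x")            # 26x in a day
           print(f"60-token reply: ~{60 * after} seconds")     # roughly a minute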
       
 (DIR) Post #ATbuPQNrs0DQ6N1r28 by nevali@troet.cafe
       2023-03-14T15:28:04Z
       
       0 likes, 0 repeats
       
       @simon next-gen SoCs will presumably make mincemeat of it
       
 (DIR) Post #ATbubMtKgH4ykFRQoa by matt@toot.cafe
       2023-03-14T15:29:04Z
       
       0 likes, 0 repeats
       
       @simon Wow, 1s per token is close to fast enough for real-time speech. Except, of course, that text-to-speech engines want a full sentence at a time.
       
 (DIR) Post #ATbxsVTGKqxE8h6xIu by memeticist@jorts.horse
       2023-03-14T16:06:54Z
       
       0 likes, 0 repeats
       
       @simon so i started putting cows into my cow feed

       what could possibly go wrong
       
 (DIR) Post #ATbyEMDVQGT3oDA0Mi by marcel@waldvogel.family
       2023-03-14T16:09:57Z
       
       0 likes, 0 repeats
       
       @simon The "instrucitons" is in the original prompt or a copy/paste problem? (I doubt that it was introduced to add to the diversity, as requested in item 1 🤔)
       
 (DIR) Post #ATbyPMgU1csaRZxmDY by simon@fedi.simonwillison.net
       2023-03-14T16:11:21Z
       
       0 likes, 0 repeats
       
       @marcel yeah I spotted that - turns out spelling doesn't matter very much in longer prompts
       
 (DIR) Post #ATcD30LHuSx25qqRyC by piccolbo@toot.community
       2023-03-14T18:55:09Z
       
       0 likes, 0 repeats
       
       @simon The Alpaca demo seems to run in 27s independent of the number of tokens I provide, from 10 to 1000. How's that possible? It also failed badly at each test I tried (factorization, bio, summarization)
       
 (DIR) Post #ATcEL2P55KJpeWmXmS by simon@fedi.simonwillison.net
       2023-03-14T19:11:43Z
       
       0 likes, 0 repeats
       
       @piccolbo I think the time it takes to calculate each output token is fixed - the context window defines the number of calculations it has to make for each token, and if you only give it 10 tokens it still runs calculations for the full window size but with the equivalent of "null" for the remainder of tokens you didn't provide
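
       A minimal sketch of that idea in Python - the window size and pad token here are invented for illustration, not the demo's actual implementation:

           # Sketch of the "fixed window" idea - window size and pad token are
           # invented for illustration, not the demo's actual implementation
           CONTEXT_SIZE = 512   # assumed context window
           PAD_TOKEN = 0        # stand-in for "null" padding

           def pad_to_window(tokens: list[int]) -> list[int]:
               """Right-pad (or truncate) the prompt to the full window size."""
               padded = tokens[:CONTEXT_SIZE]
               return padded + [PAD_TOKEN] * (CONTEXT_SIZE - len(padded))

           # A 10-token prompt and a 500-token prompt trigger the same amount of
           # downstream work, because both are processed as 512 positions:
           assert len(pad_to_window(list(range(10)))) == CONTEXT_SIZE
           assert len(pad_to_window(list(range(500)))) == CONTEXT_SIZE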
       
 (DIR) Post #ATcEiBvHrFjTpxt1hw by piccolbo@toot.community
       2023-03-14T19:14:44Z
       
       0 likes, 0 repeats
       
       @simon OK, and if I provide like 500 tokens? Why doesn't the time go up?
       
 (DIR) Post #ATcHCurlWZuNp11AYq by piccolbo@toot.community
       2023-03-14T19:12:42Z
       
       0 likes, 0 repeats
       
       @simon I keep getting bad answers from Alpaca demo. For example "The prime factorization of 16 is" "16 = 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1 × 1  ..."
       
 (DIR) Post #ATcHCvJPrm1hClxFey by simon@fedi.simonwillison.net
       2023-03-14T19:42:34Z
       
       0 likes, 0 repeats
       
       @piccolbo Language models are bad at math generally, and I'd expect Alpaca to be particularly poor there since it's only the 7B model size
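
       For reference, the correct answer is 16 = 2 × 2 × 2 × 2, which a few lines of trial division recover:

           def prime_factors(n: int) -> list[int]:
               """Trial division: divide out the smallest factor until n is exhausted."""
               factors, d = [], 2
               while d * d <= n:
                   while n % d == 0:
                       factors.append(d)
                       n //= d
                   d += 1
               if n > 1:
                   factors.append(n)
               return factors

           print(prime_factors(16))   # [2, 2, 2, 2] - i.e. 16 = 2**4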