Post ATXfl9uZWFrtRIOMQS by danieldekay@masto.ai
 (DIR) Post #ATW1LYDEM7P3xBOTa4 by simon@fedi.simonwillison.net
       2023-03-11T19:17:47Z
       
       0 likes, 1 repeats
       
       OK, I'm calling it: Large language models are having their Stable Diffusion moment right now: https://simonwillison.net/2023/Mar/11/llama/
       
 (DIR) Post #ATW1WOZ6Z2BvDcnQyO by simon@fedi.simonwillison.net
       2023-03-11T19:18:36Z
       
       0 likes, 0 repeats
       
       This post is a little bit breathless, and I won't apologize for that: the ability to run a GPT-3 class large language model on a personal laptop is a HUGE deal. If you thought this AI stuff was weird already, it's about to get so much weirder
       
 (DIR) Post #ATW1kdXXgWbXiTr9d2 by danhon@dan.mastohon.com
       2023-03-11T19:22:13Z
       
       0 likes, 0 repeats
       
       @simon not *entirely* yet, because you'd still have to pirate the LLaMA models?
       
 (DIR) Post #ATW1xbvR01RGTPDaUa by admin@mastodon.adtension.com
       2023-03-11T19:24:34Z
       
       0 likes, 0 repeats
       
       @simon GPT-4 is about to get launched, keep up buddy
       
 (DIR) Post #ATW2A8lMjg43nBU1Am by simon@fedi.simonwillison.net
       2023-03-11T19:26:07Z
       
       0 likes, 0 repeats
       
       @danhon Yeah you won't be able to legally build commercial stuff on top of the models (unlike Stable Diffusion) - but I'm more excited about:

       - We know it's possible to run a GPT-3 class language model on a MacBook now
       - Hackers can start tinkering. llama.cpp is just the start of this - I want to see what the LLM equivalent of ControlNet for Stable Diffusion is going to be
       
 (DIR) Post #ATW2LUQSXBe6AD7GKG by mull@mas.to
       2023-03-11T19:26:14Z
       
       0 likes, 0 repeats
       
       @simon Enjoying your LLM experimentation journey! Very informative posts.
       
 (DIR) Post #ATW2WyzrDF1EAB9aLo by simon@fedi.simonwillison.net
       2023-03-11T19:26:47Z
       
       0 likes, 0 repeats
       
       @admin Not as exciting as LLaMA on a laptop. Stable Diffusion wasn't the best image generation model when it came out, but the fact that people could actually tinker with its inner workings is what made it so interesting.
       
 (DIR) Post #ATW2kc3zH7PH5gaJbU by nat@ruby.social
       2023-03-11T19:30:07Z
       
       0 likes, 0 repeats
       
       @simon The big thing I'm worried about personally is that for a while a bunch of services are going to be hot trash while companies cram poorly-considered AI-driven features into them. But overall, yeah, I'm with you that these things are going to be very good, specifically because of how much leverage they can provide when used deliberately as tools, by folks who intend to be using them, to build stuff. So this is very exciting!
       
 (DIR) Post #ATW2kdqwcaA6doUvYG by nat@ruby.social
       2023-03-11T19:31:33Z
       
       0 likes, 0 repeats
       
       @simon Hmmmm... and this might actually enable me to build a thing that I've been wanting... that uses a model to, basically, search man pages... hmmmmmm...
       
 (DIR) Post #ATW36B1OUqkpaoecgS by simon@fedi.simonwillison.net
       2023-03-11T19:31:54Z
       
       0 likes, 0 repeats
       
       @danhon I also have a hunch that the LLaMA paper has enough information in it that organizations outside of Facebook would be able to train their own LLaMA alternatives for maybe less than $1m in training costs
       
 (DIR) Post #ATW3HGv8u6YF4f1LSS by admin@mastodon.adtension.com
       2023-03-11T19:33:11Z
       
       0 likes, 0 repeats
       
       @simon good answer. I thought next week is the new launch
       
 (DIR) Post #ATW3WJLcA9VKweZe76 by shiftingedges@hachyderm.io
       2023-03-11T19:40:09Z
       
       0 likes, 0 repeats
       
       @simon I really wonder how social media will survive this. But I applaud the work you’re doing to educate folks and promote healthy use cases.
       
 (DIR) Post #ATW5MvYPSQ04S7DCcK by nielsa@mas.to
       2023-03-11T20:00:17Z
       
       0 likes, 0 repeats
       
       @simon Very cool! I think you're right that we'll see all sorts of new innovation start to take place in this space - this does feel like a significant inflection point
       
 (DIR) Post #ATW5aPThVmyD1bRFLM by simon@fedi.simonwillison.net
       2023-03-11T20:02:36Z
       
       0 likes, 0 repeats
       
       @nielsa Yes! Inflection point is the term I couldn't quite put my finger on
       
 (DIR) Post #ATW9assqa9Sy2cgLmy by simon@fedi.simonwillison.net
       2023-03-11T20:50:00Z
       
       0 likes, 0 repeats
       
       Added this section: https://simonwillison.net/2023/Mar/11/llama/#what-to-look-for-next
       
 (DIR) Post #ATW9oc1o2zHXoqQ1Oy by dahukanna@mastodon.social
       2023-03-11T20:52:43Z
       
       0 likes, 0 repeats
       
       @simon now we are talking! Personally trained LLMs.
       
 (DIR) Post #ATWAkVbM6DEsgPeaae by corbin@defcon.social
       2023-03-11T21:02:41Z
       
       0 likes, 0 repeats
       
       @simon I think that your reaction is reasonable, but e.g. https://huggingface.co/bigscience/bloom is already licensed appropriately. Perhaps the real innovation here is the use of 4-bit weights; I don't think anybody anticipated that a large model would still be cogent at reduced fidelity.
       
 (DIR) Post #ATWAwxvwaSPu1pDvrU by simon@fedi.simonwillison.net
       2023-03-11T21:04:41Z
       
       0 likes, 0 repeats
       
       @corbin Have you been able to run Bloom on personal hardware? I've not tried
       
 (DIR) Post #ATWCi4EF6E6X7Jo3Si by lukasb@hachyderm.io
       2023-03-11T21:23:01Z
       
       0 likes, 0 repeats
       
       @simon wowwww. Fuel on the fire. Excited to see what gets built on Stability's presumed open LLM. This will really incentivize moving to software using open protocols that can be hooked into new LLM-based apps. The giants will try to build their own versions to keep people in the garden. Will they be able to keep up?
       
 (DIR) Post #ATWO99BprRDxonnOnw by ummjackson@mastodon.social
       2023-03-11T23:33:20Z
       
       0 likes, 0 repeats
       
       @simon How are you finding the output quality of the 13B model? Considering getting it spun up on my Windows machine with some PRs I just saw popping up on the llama.cpp repo.
       
 (DIR) Post #ATWOiVb8XXuXS2UYZE by simon@fedi.simonwillison.net
       2023-03-11T23:39:43Z
       
       0 likes, 0 repeats
       
       @ummjackson Not great so far - the problem is it hasn't been instruction tuned, so you need to work a lot harder at prompting it. There are some useful tips in the FAQ: https://github.com/facebookresearch/llama/blob/main/FAQ.md#2-generations-are-bad
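
       A minimal sketch of what "working harder at prompting" means in practice (these prompts are illustrative, not taken from the FAQ): a base model that hasn't been instruction tuned continues text rather than following commands, so it helps to frame the prompt as the beginning of the output you want.

           # Illustrative only: base (non-instruction-tuned) models complete text
           # rather than obey commands, so phrase the prompt as a continuation.
           instruction_style = "Write a poem about the moon."
           completion_style = "Here is a short poem about the moon:\n\n"
           # Feeding completion_style to the model is far more likely to yield a poem.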
       
 (DIR) Post #ATWQ4E5OQhv7SoZ0HQ by corbin@defcon.social
       2023-03-11T23:54:38Z
       
       0 likes, 0 repeats
       
       @simon Yep. I have a small harness (<100 lines of Python using Hugging Face's Transformers) and I can choose from about half a dozen working models, including BLOOM. My current platform is a last-gen Intel CPU with about 16GiB RAM free. There's also Petals, for the specific case of BLOOM, but I prefer my privacy. I think that Petals buy-in starts at 10GiB RAM.
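
       For readers curious what such a harness looks like, here is a minimal sketch using Hugging Face's Transformers - the model name and generation settings are illustrative assumptions, not the exact setup described above:

           from transformers import AutoModelForCausalLM, AutoTokenizer

           # bigscience/bloom-560m is a small BLOOM variant that fits easily in
           # 16GiB RAM; larger checkpoints work the same way given enough memory
           model_name = "bigscience/bloom-560m"
           tokenizer = AutoTokenizer.from_pretrained(model_name)
           model = AutoModelForCausalLM.from_pretrained(model_name)

           prompt = "Large language models are"
           inputs = tokenizer(prompt, return_tensors="pt")
           # Sample up to 50 new tokens as a plain text continuation
           outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
           print(tokenizer.decode(outputs[0], skip_special_tokens=True))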
       
 (DIR) Post #ATWUkHVkU9Bc48CTWi by ummjackson@mastodon.social
       2023-03-12T00:47:06Z
       
       0 likes, 0 repeats
       
       @simon Very interesting. I guess this is why the instruction tuned models will likely remain proprietary and hidden behind APIs - that's the secret, monetizable sauce.
       
 (DIR) Post #ATWUvXj0XiX1XNlv6G by simon@fedi.simonwillison.net
       2023-03-12T00:47:57Z
       
       0 likes, 0 repeats
       
       @ummjackson yeah I suspect OpenAI have a huge number of important tricks that they haven't shared in any papers yet
       
 (DIR) Post #ATWWmA5NGOgRWWt7WS by smy20011@m.cmx.im
       2023-03-12T01:09:51Z
       
       0 likes, 0 repeats
       
       @simon Top of HN again! Congrats!
       
 (DIR) Post #ATX3UsB1tLTxXqtfyi by djvdq@mastodon.social
       2023-03-12T07:16:31Z
       
       0 likes, 0 repeats
       
       @simon IMO BigScience BLOOM was the SD moment of LLMs. LLaMA isn't really free, it's just leaked. And even if it's better than BLOOM, BLOOM is still truly open source.
       
 (DIR) Post #ATXDXOJ0PMvjJLT1sW by koen_hufkens@mastodon.social
       2023-03-12T09:09:03Z
       
       0 likes, 0 repeats
       
       @simon Thanks for sharing. I was looking forward to this moment, and not, at the same time. I also wonder if we will see many more leaks in the future, and if and how these models will fit on consumer hardware. A lot of #VC is betting on big players cornering the market, not on users having free access.
       
 (DIR) Post #ATXK5Oxn4IJevE92CO by wehlutyk@mastodon.social
       2023-03-12T10:22:27Z
       
       0 likes, 0 repeats
       
       @simon "That furious typing sound you can hear is thousands of hackers around the world starting to dig in and figure out what life is like when you can run a GPT-3 class model on your own hardware."New game opened here, finally in a space available beyond highly funded academic groups. While it makes frightful uses easier, it actually also opens all the positive uses outside of OpenAI control!
       
 (DIR) Post #ATXTVgpWA8YPS0swjY by troed@ioc.exchange
       2023-03-12T12:08:03Z
       
       0 likes, 0 repeats
       
       @simon Did you get consistent results on par with your "man on the moon" example from the 7B model? On average I've seen _much_ more hilarious output.
       
 (DIR) Post #ATXfl9uZWFrtRIOMQS by danieldekay@masto.ai
       2023-03-12T14:25:20Z
       
       0 likes, 0 repeats
       
       @simon @danhon Essentially it brings the LLM to a "built into my operating system" level of quality. Siri 2.0, anyone?
       
 (DIR) Post #ATZt9pENEYM4G8fMMS by simon@fedi.simonwillison.net
       2023-03-13T16:04:24Z
       
       0 likes, 0 repeats
       
       Since I wrote that on Saturday the LLaMA large language model has been shown running on a 4GB Raspberry Pi, and this morning on a Pixel 6 phone! Added an "It's happening" section to the post here: https://simonwillison.net/2023/Mar/11/llama/#its-happening
       
 (DIR) Post #ATZtNsI2elp1nk55VI by AaronNGray@fosstodon.org
       2023-03-13T16:05:52Z
       
       0 likes, 0 repeats
       
       @simon how much battery and storage does this use on a phone?
       
 (DIR) Post #ATZtayqVaprttmS4xc by mkarliner@mastodon.modern-industry.com
       2023-03-13T16:06:21Z
       
       0 likes, 0 repeats
       
       @simon OMG! Where's the repo?
       
 (DIR) Post #ATZtoF0snKaJnzcyFk by simon@fedi.simonwillison.net
       2023-03-13T16:07:52Z
       
       0 likes, 0 repeats
       
       @AaronNGray Don't know about battery. The 7B LLaMA model compresses down to a 3.9GB binary file when you apply the 4-bit quantization that llama.cpp uses
       
 (DIR) Post #ATZtyq1I3mHxTVs4bQ by simon@fedi.simonwillison.net
       2023-03-13T16:08:35Z
       
       0 likes, 0 repeats
       
       @mkarliner Both those examples are using https://github.com/ggerganov/llama.cpp - I don't think they've shared the incantations they used to compile and run it on those devices though
       
 (DIR) Post #ATZuAE1NADqb6DL6kS by whynothugo@fosstodon.org
       2023-03-13T16:08:56Z
       
       0 likes, 0 repeats
       
       @simon I'd been wondering if it would be feasible to run it on Linux phones. It would be really interesting to see what UIs can be made with this kind of thing. Like a voice-driven shell where commands are dictated while holding Power+VolUp or something like that.
       
 (DIR) Post #ATZuNR382It5GpSZgu by AaronNGray@fosstodon.org
       2023-03-13T16:13:17Z
       
       0 likes, 0 repeats
       
       @simon decompression power usage too
       
 (DIR) Post #ATZuYhqtu7RKZK7BB2 by simon@fedi.simonwillison.net
       2023-03-13T16:16:01Z
       
       0 likes, 0 repeats
       
       @AaronNGray I used "compression" clumsily there - this isn't compression like gzip, it's using 4-bit instead of 16-bit floating point numbers - the model is actually faster to execute with lower-precision numbers since less data has to be loaded into memory
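
       The back-of-the-envelope arithmetic behind those numbers (a sketch; the exact on-disk size depends on llama.cpp's q4_0 block format):

           # Rough size estimate for LLaMA 7B at different weight precisions
           params = 7e9                  # ~7 billion weights
           fp16_gb = params * 2 / 1e9    # 16-bit floats: 2 bytes per weight -> ~14 GB
           q4_gb = params * 0.5 / 1e9    # 4-bit weights: 0.5 bytes per weight -> ~3.5 GB
           print(f"fp16: ~{fp16_gb:.1f} GB, 4-bit: ~{q4_gb:.1f} GB")
           # llama.cpp's q4_0 format also stores a scale factor per block of
           # weights, which is why the real file lands closer to 3.9 GB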
       
 (DIR) Post #ATZumRzXfcazQqi6vw by markus@hachyderm.io
       2023-03-13T16:19:50Z
       
       0 likes, 0 repeats
       
       @simon I really appreciate your sharing of all this information. I haven’t had a chance to try it myself yet, but I’m looking forward to it. 😊
       
 (DIR) Post #ATZv1ZLabXyqaZERt2 by mkarliner@mastodon.modern-industry.com
       2023-03-13T16:23:31Z
       
       0 likes, 0 repeats
       
       @simon I'm just trying to compile it on my i7 32GB Hackintosh. It wants to use Apple clang and the options don't work. I'm probably too excited to think straight...
       
 (DIR) Post #ATZvDWgSKhxdnrh4mu by mkarliner@mastodon.modern-industry.com
       2023-03-13T16:27:26Z
       
       0 likes, 0 repeats
       
       @simon The Makefile now has...

           ifneq ($(filter armv6%,$(UNAME_M)),)
                   # Raspberry Pi 1, 2, 3
                   CFLAGS += -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access
           endif
           ifneq ($(filter armv7%,$(UNAME_M)),)
                   # Raspberry Pi 4
                   CFLAGS += -mfpu=neon-fp-armv8 -mfp16-format=ieee -mno-unaligned-access -funsafe-math-optimizations
           endif
           ifneq ($(filter armv8%,$(UNAME_M)),)
                   # Raspberry Pi 4
                   CFLAGS += -mfp16-format=ieee -mno-unaligned-access
           endif
       
 (DIR) Post #ATa2rT9Va6eu75HBaq by troed@ioc.exchange
       2023-03-13T17:53:36Z
       
       0 likes, 0 repeats
       
       @simon tbh this CPU-only project runs too slow for most usage. The GPTQ-for-LLaMa project is a whole different matter: "any" gaming GPU can run the 7B model, and slightly higher-end cards (12GB VRAM) can run the 13B. At conversational speed.