(DIR) Post #ATUdSxpku0inNzSJI8 by simon@fedi.simonwillison.net
2023-03-11T03:15:12Z
0 likes, 0 repeats
Just ran llama.cpp - with Facebook's LLaMA 7B large language model - on my M2 64GB MacBook Pro! https://github.com/ggerganov/llama.cpp
(DIR) Post #ATUde67gQL32HP9N20 by simon@fedi.simonwillison.net
2023-03-11T03:17:04Z
0 likes, 0 repeats
It's now possible to run a genuinely interesting large language model on a consumer laptop
I thought it would be at least another year or two before we got there, if not longer
(DIR) Post #ATUdrPZBRcFKLbMTw0 by jesse@metasocial.com
2023-03-11T03:18:25Z
0 likes, 0 repeats
@simon !!!
(DIR) Post #ATUec54lRhTgPJO3JA by ryanleesipes@mastodon.social
2023-03-11T03:28:09Z
0 likes, 0 repeats
@simon what did resource usage look like?
(DIR) Post #ATUewCyXmi2ehQ4hsW by simon@fedi.simonwillison.net
2023-03-11T03:31:48Z
0 likes, 0 repeats
@ryanleesipes Activity Monitor spotted it using 744.7% of CPU with 8 threads while it was running, and 4GB of RAM
(DIR) Post #ATUf84bcjF1gDT0nKq by ryanleesipes@mastodon.social
2023-03-11T03:33:48Z
0 likes, 0 repeats
@simon these models running on laptops open up interesting opportunities for privacy-focused apps like Thunderbird. Know of any smaller ones good for summarization?
(DIR) Post #ATUfJuzmHmTdsvxRtA by numist@xoxo.zone
2023-03-11T03:34:57Z
0 likes, 0 repeats
@simon how is it at writing code?
(DIR) Post #ATUfVyFtzUEHTGFNEO by simon@fedi.simonwillison.net
2023-03-11T03:38:32Z
0 likes, 0 repeats
@numist Pretty impressive considering this is the smallest of the LLaMA models (I'm running 7B but they also released 13B, 30B and 65B)
Got this result for a prompt of "def open_and_return_content(filename):"
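For context, a correct completion of that prompt would look something like this minimal Python sketch - an illustrative reference implementation, not the model's actual output:

    def open_and_return_content(filename):
        # Read the whole file and return its contents as a string
        with open(filename) as f:
            return f.read()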
(DIR) Post #ATUfjiTrRDdebHKYHA by simon@fedi.simonwillison.net
2023-03-11T03:40:40Z
0 likes, 0 repeats
@ryanleesipes I'm trying to figure out what the smallest LLM that can do summarization is - my hunch is that it would need to be one of the bigger LLaMA ones (13B or 30B or even 65B), I don't think 7B will quite cut it
I'm not sure how to best prompt 7B for summarization though, since it hasn't been instruction tuned
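One common trick with base models that haven't been instruction tuned is a completion-style prompt, framing the summary as the natural continuation of the text - a hypothetical sketch, with article standing in for the text to summarize:

    article = "..."  # placeholder: the text to summarize
    # "TL;DR:" cues a base model to continue with a summary
    prompt = f"{article}\n\nTL;DR:"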
(DIR) Post #ATUgY1BcAHIZ5tmWMS by 22@octodon.social
2023-03-11T03:49:59Z
0 likes, 0 repeats
@simon I’m laugh-sobbing because I’ve been maxing out RAM since gosh the 2000s but this time round I was like “how long has it been since I quit math and have needed more than even a gig”
(DIR) Post #ATUjkIE8vubfHaVcUy by simon@fedi.simonwillison.net
2023-03-11T04:25:37Z
0 likes, 0 repeats
Here are detailed notes on how I got it to work, plus some examples of prompts and their responses https://til.simonwillison.net/llms/llama-7b-m2
(DIR) Post #ATUkIVuKZlc7YdYlE0 by simon@fedi.simonwillison.net
2023-03-11T04:32:06Z
0 likes, 0 repeats
@22 the torrent was 240GB so I'm deeply regretting my decision to only get the 1TB hard drive - I had to offload my photo collection to iCloud just to fit the model on my machine!
(DIR) Post #ATUlzyXO6dcYl47Gnw by ryansingel@writing.exchange
2023-03-11T04:50:42Z
0 likes, 0 repeats
@simon So cool. Keeping my eye on this for sure.
(DIR) Post #ATVURC7G7pLsfngTOS by Jackivers@mastodon.social
2023-03-11T13:08:50Z
0 likes, 0 repeats
@simon Thanks for this. I’ll be trying out your recipe on my M1 Pro Max.
(DIR) Post #ATVs7BOvTK0JZ0wETo by simon@fedi.simonwillison.net
2023-03-11T17:34:18Z
0 likes, 0 repeats
Thanks to an update from llama.cpp author Georgi I have now successfully run the 13B model on my machine too! That's the one which Facebook Research claims is competitive with the original GPT-3 in benchmarks. Notes on how I did that here: https://til.simonwillison.net/llms/llama-7b-m2#user-content-running-13b
(DIR) Post #ATVt0legNiA1vmoTSq by garrett@mastodon.xyz
2023-03-11T07:45:55Z
0 likes, 0 repeats
@ryanleesipes @simon Alternatively, if you just want summarization, Open Text Summarizer (OTS) exists and is much less resource intensive: https://github.com/neopunisher/Open-Text-Summarizer/
It's not AI and is much simpler, and it supports many languages already. It ships in several Linux distros (I know for a fact Fedora ships OTS) and has a library for other apps to use. I think it's cross-platform for other OSes too.
(DIR) Post #ATVt0mjgMbkpHa9ytc by simon@fedi.simonwillison.net
2023-03-11T17:44:11Z
0 likes, 0 repeats
@garrett @ryanleesipes Having read the description of how that works I wouldn't want to use it in place of a full language model - I'm certain the results wouldn't be nearly as useful or accurate
(DIR) Post #ATVthK1F0Xeuffejs8 by ryanleesipes@mastodon.social
2023-03-11T17:51:50Z
0 likes, 0 repeats
@simon @garrett Right now I've done some early work with Facebook's bart-large-cnn model that works pretty well. But I'm pretty sure this wouldn't run on most folks' machines. https://huggingface.co/facebook/bart-large-cnn
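For reference, the usual way to try that model is through the Hugging Face transformers library; a minimal sketch, assuming transformers and a PyTorch backend are installed:

    from transformers import pipeline

    # Downloads the facebook/bart-large-cnn weights on first use
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    text = "..."  # placeholder: the article to summarize
    # max_length/min_length bound the generated summary, in tokens
    print(summarizer(text, max_length=130, min_length=30, do_sample=False))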
(DIR) Post #ATVxwYTBxh4jeMFVDs by fap@mastodon.social
2023-03-11T18:39:24Z
0 likes, 0 repeats
@simon If I understand it correctly, it's only the conversion and quantization step that needs a lot of RAM? Would it be possible for people to share those converted models? Are these platform-dependent?
(DIR) Post #ATW2vQFVXYixdR7ya0 by jpf@mastodon.social
2023-03-11T19:31:32Z
0 likes, 0 repeats
@simon thanks for posting about this! I just got the 7B model working on my i9 (after an update that fixed some issues with Intel processors, hah)
(DIR) Post #ATW5zm4Ef8FWK8ccXQ by basil@hci.social
2023-03-11T20:09:51Z
0 likes, 0 repeats
@simon Beavers Are Friends But They Are Not Friends With Cat Coffee might not have been what you were after but it is exactly what I was after
(DIR) Post #ATWAIO1GhBiF3kMequ by is@hachyderm.io
2023-03-11T20:58:00Z
0 likes, 0 repeats
@simon I’m just diving back into AI/ML. How RAM-constrained are these models? I’ve an M1 Pro Mac with 32 GB RAM and a Linux machine with 32 GB as well. Wondering whether I should even attempt this.
(DIR) Post #ATWCWS5qXtZpemLBPk by simon@fedi.simonwillison.net
2023-03-11T21:22:42Z
0 likes, 0 repeats
@is Running the 7B model only seems to use about 4GB of RAM on my M2 MacBook Pro, 32GB should be easily enough
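That figure lines up with back-of-envelope arithmetic for llama.cpp's 4-bit quantized weights; a rough Python sketch, ignoring activation and context overhead:

    # ~0.5 bytes per parameter at 4-bit quantization, weights only
    for name, params in [("7B", 7e9), ("13B", 13e9), ("30B", 30e9), ("65B", 65e9)]:
        print(f"LLaMA {name}: ~{params * 0.5 / 1e9:.1f} GB")
    # 7B -> ~3.5 GB, 13B -> ~6.5 GB, 30B -> ~15.0 GB, 65B -> ~32.5 GB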
(DIR) Post #ATWCuaqLfHffkDbRaq by osma@sigmoid.social
2023-03-11T21:25:05Z
0 likes, 0 repeats
@simon This is truly awesome, thanks a lot for sharing your experiences! Can I ask two stupid n00b questions:
1. Is this exclusive to Mac hardware, or could it work on a PC laptop with enough RAM?
2. Is there any chance of fine-tuning these models on custom prompts and completions, like you can do with the OpenAI API? What would it take?
(DIR) Post #ATWD7jZWWVTIa3011E by simon@fedi.simonwillison.net
2023-03-11T21:28:31Z
0 likes, 0 repeats
@osma People are definitely running LAmDA on PC hardware, but I don't know anything about what that requires or if you can do it without a top-spec GPU. Lots of hints about PC stuff in the Facebook repo's issues: https://github.com/facebookresearch/llama/issues
I presume fine-tuning is possible but I have no idea how you would do it I'm afraid!
(DIR) Post #ATWDv3XT4kpyhDYYN6 by osma@sigmoid.social
2023-03-11T21:36:19Z
0 likes, 0 repeats
@simon Thanks! I presume you meant LLaMA not LaMDA?
(DIR) Post #ATaRWRxVK0m7BTjeVM by neil@mastodon.nz
2023-03-13T22:29:20Z
0 likes, 0 repeats
@simon I was also able to quantise and run the #LLaMA 30B model using a very similar procedure (4 files needed). Takes around 20 GB RAM when running and 350 ms per token on my 32GB M1 Pro MacBook Pro.
(DIR) Post #ATaaQK3EhxuOVdZMoK by Jackivers@mastodon.social
2023-03-14T00:09:39Z
0 likes, 0 repeats
@simon It lives! “The first president of the USA was 38 years old when he became a “citizen” and then later a President. He had been raised in New York, but his father had moved to Virginia before George Washington’s birth (1746) because there were too few farms for all the children they wanted – …”