(DIR) Post #ATUdSxpku0inNzSJI8 by simon@fedi.simonwillison.net
2023-03-11T03:15:12Z
0 likes, 0 repeats
Just ran llama.cpp - with Facebook's LLaMA 7B large language model - on my M2 64GB MacBook Pro! https://github.com/ggerganov/llama.cpp
(DIR) Post #ATUde67gQL32HP9N20 by simon@fedi.simonwillison.net
2023-03-11T03:17:04Z
0 likes, 0 repeats
It's now possible to run a genuinely interesting large language model on a consumer laptop
I thought it would be at least another year or two before we got there, if not longer
(DIR) Post #ATUdrPZBRcFKLbMTw0 by jesse@metasocial.com
2023-03-11T03:18:25Z
0 likes, 0 repeats
@simon !!!
(DIR) Post #ATUec54lRhTgPJO3JA by ryanleesipes@mastodon.social
2023-03-11T03:28:09Z
0 likes, 0 repeats
@simon what did resource usage look like?
(DIR) Post #ATUewCyXmi2ehQ4hsW by simon@fedi.simonwillison.net
2023-03-11T03:31:48Z
0 likes, 0 repeats
@ryanleesipes Activity Monitor spotted it using 744.7% of CPU with 8 threads while it was running, and 4GB of RAM
(DIR) Post #ATUf84bcjF1gDT0nKq by ryanleesipes@mastodon.social
2023-03-11T03:33:48Z
0 likes, 0 repeats
@simon these models running on laptops open up interesting opportunities for privacy-focused apps like Thunderbird. Know of any smaller ones good for summarization?
(DIR) Post #ATUfJuzmHmTdsvxRtA by numist@xoxo.zone
2023-03-11T03:34:57Z
0 likes, 0 repeats
@simon how is it at writing code?
(DIR) Post #ATUfVyFtzUEHTGFNEO by simon@fedi.simonwillison.net
2023-03-11T03:38:32Z
0 likes, 0 repeats
@numist Pretty impressive considering this is the smallest of the LLaMA models (I'm running 7B but they also released 13B, 30B and 65B)
Got this result for a prompt of "def open_and_return_content(filename):"
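For context, a correct completion of that prompt would look something like this minimal Python sketch - an illustrative reference implementation, not the model's actual output:

    def open_and_return_content(filename):
        # Read the whole file and return its contents as a string
        with open(filename) as f:
            return f.read()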
(DIR) Post #ATUfjiTrRDdebHKYHA by simon@fedi.simonwillison.net
2023-03-11T03:40:40Z
0 likes, 0 repeats
@ryanleesipes I'm trying to figure out what the smallest LLM that can do summarization is - my hunch is that it would need to be one of the bigger LLaMA ones (13B or 30B or even 65B), I don't think 7B will quite cut it
I'm not sure how to best prompt 7B for summarization though, since it hasn't been instruction tuned
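One common trick with base models that haven't been instruction tuned is a completion-style prompt, framing the summary as the natural continuation of the text - a hypothetical sketch, with article standing in for the text to summarize:

    article = "..."  # placeholder: the text to summarize
    # "TL;DR:" cues a base model to continue with a summary
    prompt = f"{article}\n\nTL;DR:"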
(DIR) Post #ATUgY1BcAHIZ5tmWMS by 22@octodon.social
2023-03-11T03:49:59Z
0 likes, 0 repeats
@simon I’m laugh-sobbing because I’ve been maxing out RAM since gosh the 2000s but this time round I was like “how long has it been since I quit math and have needed more than even a gig”
(DIR) Post #ATUjkIE8vubfHaVcUy by simon@fedi.simonwillison.net
2023-03-11T04:25:37Z
0 likes, 0 repeats
Here are detailed notes on how I got it to work, plus some examples of prompts and their responses https://til.simonwillison.net/llms/llama-7b-m2
(DIR) Post #ATUkIVuKZlc7YdYlE0 by simon@fedi.simonwillison.net
2023-03-11T04:32:06Z
0 likes, 0 repeats
@22 the torrent was 240GB so I'm deeply regretting my decision to only get the 1TB hard drive - I had to offload my photo collection to iCloud just to fit the model on my machine!
(DIR) Post #ATUlzyXO6dcYl47Gnw by ryansingel@writing.exchange
2023-03-11T04:50:42Z
0 likes, 0 repeats
@simon So cool. Keeping my eye on this for sure.
(DIR) Post #ATVURC7G7pLsfngTOS by Jackivers@mastodon.social
2023-03-11T13:08:50Z
0 likes, 0 repeats
@simon Thanks for this. I’ll be trying out your recipe on my M1 Pro Max.
(DIR) Post #ATVs7BOvTK0JZ0wETo by simon@fedi.simonwillison.net
2023-03-11T17:34:18Z
0 likes, 0 repeats
Thanks to an update from llama.cpp author Georgi I have now successfully run the 13B model on my machine too! That's the one which Facebook Research claims is competitive with the original GPT-3 in benchmarks. Notes on how I did that here: https://til.simonwillison.net/llms/llama-7b-m2#user-content-running-13b
(DIR) Post #ATVt0legNiA1vmoTSq by garrett@mastodon.xyz
2023-03-11T07:45:55Z
0 likes, 0 repeats
@ryanleesipes @simon Alternatively, if you just want summarization, Open Text Summarizer (OTS) exists and is much less resource intensive: https://github.com/neopunisher/Open-Text-Summarizer/
It's not AI and is much simpler, and it supports many languages already. It ships in several Linux distros (I know for a fact Fedora ships OTS) and has a library for other apps to use. I think it's cross-platform for other OSes too.
(DIR) Post #ATVt0mjgMbkpHa9ytc by simon@fedi.simonwillison.net
2023-03-11T17:44:11Z
0 likes, 0 repeats
@garrett @ryanleesipes Having read the description of how that works I wouldn't want to use it in place of a full language model - I'm certain the results wouldn't be nearly as useful or accurate
(DIR) Post #ATVthK1F0Xeuffejs8 by ryanleesipes@mastodon.social
2023-03-11T17:51:50Z
0 likes, 0 repeats
@simon @garrett Right now I've done some early work with Facebook's bart-large-cnn model that works pretty well. But I'm pretty sure this wouldn't run on most folks' machines. https://huggingface.co/facebook/bart-large-cnn
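For reference, the usual way to try that model is through the Hugging Face transformers library; a minimal sketch, assuming transformers and a PyTorch backend are installed:

    from transformers import pipeline

    # Downloads the facebook/bart-large-cnn weights on first use
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    text = "..."  # placeholder: the article to summarize
    # max_length/min_length bound the generated summary, in tokens
    print(summarizer(text, max_length=130, min_length=30, do_sample=False))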
(DIR) Post #ATVxwYTBxh4jeMFVDs by fap@mastodon.social
2023-03-11T18:39:24Z
0 likes, 0 repeats
@simon If I understand it correctly, it's only the conversion and quantization step that needs a lot of RAM? Would it be possible for people to share those converted models? Are these platform-dependent?
(DIR) Post #ATW2vQFVXYixdR7ya0 by jpf@mastodon.social
2023-03-11T19:31:32Z
0 likes, 0 repeats
@simon thanks for posting about this! I just got the 7B model working on my i9 (after an update that fixed some issues with Intel processors, hah)
(DIR) Post #ATW5zm4Ef8FWK8ccXQ by basil@hci.social
2023-03-11T20:09:51Z
0 likes, 0 repeats
@simon Beavers Are Friends But They Are Not Friends With Cat Coffee might not have been what you were after but it is exactly what I was after
(DIR) Post #ATWAIO1GhBiF3kMequ by is@hachyderm.io
2023-03-11T20:58:00Z
0 likes, 0 repeats
@simon I’m just diving back into AI/ML. How RAM-constrained are these models? I’ve an M1 Pro Mac with 32 GB RAM and a Linux machine with 32 GB as well. Wondering whether I should even attempt this.
(DIR) Post #ATWCWS5qXtZpemLBPk by simon@fedi.simonwillison.net
2023-03-11T21:22:42Z
0 likes, 0 repeats
@is Running the 7B model only seems to use about 4GB of RAM on my M2 MacBook Pro, 32GB should be easily enough
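That figure lines up with back-of-envelope arithmetic for llama.cpp's 4-bit quantized weights; a rough Python sketch, ignoring activation and context overhead:

    # ~0.5 bytes per parameter at 4-bit quantization, weights only
    for name, params in [("7B", 7e9), ("13B", 13e9), ("30B", 30e9), ("65B", 65e9)]:
        print(f"LLaMA {name}: ~{params * 0.5 / 1e9:.1f} GB")
    # 7B -> ~3.5 GB, 13B -> ~6.5 GB, 30B -> ~15.0 GB, 65B -> ~32.5 GB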
(DIR) Post #ATWCuaqLfHffkDbRaq by osma@sigmoid.social
2023-03-11T21:25:05Z
0 likes, 0 repeats
@simon This is truly awesome, thanks a lot for sharing your experiences! Can I ask two stupid n00b questions:
1. Is this exclusive to Mac hardware, or could it work on a PC laptop with enough RAM?
2. Is there any chance of fine-tuning these models on custom prompts and completions, like you can do with the OpenAI API? What would it take?
(DIR) Post #ATWD7jZWWVTIa3011E by simon@fedi.simonwillison.net
2023-03-11T21:28:31Z
0 likes, 0 repeats
@osma People are definitely running LAmDA on PC hardware, but I don't know anything about what that requires or if you can do it without a top-spec GPU. Lots of hints about PC stuff in the Facebook repo's issues: https://github.com/facebookresearch/llama/issues
I presume fine-tuning is possible but I have no idea how you would do it I'm afraid!
(DIR) Post #ATWDv3XT4kpyhDYYN6 by osma@sigmoid.social
2023-03-11T21:36:19Z
0 likes, 0 repeats
@simon Thanks! I presume you meant LLaMA not LaMDA?
(DIR) Post #ATaRWRxVK0m7BTjeVM by neil@mastodon.nz
2023-03-13T22:29:20Z
0 likes, 0 repeats
@simon I was also able to quantise and run the #LLaMA 30B model using a very similar procedure (4 files needed). Takes around 20 GB RAM when running and 350 ms per token on my 32GB M1 Pro MacBook Pro.
(DIR) Post #ATaaQK3EhxuOVdZMoK by Jackivers@mastodon.social
2023-03-14T00:09:39Z
0 likes, 0 repeats
@simon It lives! “The first president of the USA was 38 years old when he became a “citizen” and then later a President. He had been raised in New York, but his father had moved to Virginia before George Washington’s birth (1746) because there were too few farms for all the children they wanted – …”