Post AVvJrS6n08L3KlqkXw by nick@amok.recoil.org
 (DIR) Post #AVvFVEIBI0W8IQKHGS by simon@fedi.simonwillison.net
       2023-05-22T19:08:25Z
       
       0 likes, 0 repeats
       
       My notes on getting RedPajama-INCITE-Chat-3B running on my Mac using mlc-chat https://til.simonwillison.net/llms/mlc-chat-redpajama
       
 (DIR) Post #AVvGn93Hbxm5D6SqTA by ericflo@mastodon.xyz
       2023-05-22T19:20:55Z
       
       0 likes, 0 repeats
       
       @simon This is my favorite model - so good for its size!
       
 (DIR) Post #AVvIw6kXjxE0aJ6HoW by nick@amok.recoil.org
       2023-05-22T19:46:57Z
       
       0 likes, 0 repeats
       
      @simon do you know the rough token generation speed? I’ve managed to get about 8-9 tokens/s from the recent updates to llama.cpp with BLAS offloading on my PC.
       
 (DIR) Post #AVvJcw8R54tK7FhMA4 by simon@fedi.simonwillison.net
       2023-05-22T19:54:38Z
       
       0 likes, 0 repeats
       
       @nick On an M2 I got "prefill: 116.9 tok/s, decode: 29.2 tok/s" for vicuna-v1-7b-q3f16_0 and "prefill: 59.0 tok/s, decode: 36.9 tok/s" for RedPajama-INCITE-Chat-3B-v1-q4f16_0
       
 (DIR) Post #AVvJrS6n08L3KlqkXw by nick@amok.recoil.org
       2023-05-22T19:57:21Z
       
       0 likes, 0 repeats
       
       @simon wow that’s considerably speedier!
       
 (DIR) Post #AVvNiQZRkUfZm7CqBs by nick@amok.recoil.org
       2023-05-22T20:40:29Z
       
       0 likes, 0 repeats
       
      @simon I got it working on my PC, and it’s definitely faster than llama.cpp! It reported prefill: 8.0 tok/s, decode: 38.7 tok/s. Very nice project, thanks for talking about it!