Post AVvJrS6n08L3KlqkXw by nick@amok.recoil.org
(DIR) Post #AVvFVEIBI0W8IQKHGS by simon@fedi.simonwillison.net
2023-05-22T19:08:25Z
0 likes, 0 repeats
My notes on getting RedPajama-INCITE-Chat-3B running on my Mac using mlc-chat https://til.simonwillison.net/llms/mlc-chat-redpajama
(DIR) Post #AVvGn93Hbxm5D6SqTA by ericflo@mastodon.xyz
2023-05-22T19:20:55Z
0 likes, 0 repeats
@simon This is my favorite model - so good for its size!
(DIR) Post #AVvIw6kXjxE0aJ6HoW by nick@amok.recoil.org
2023-05-22T19:46:57Z
0 likes, 0 repeats
@simon do you know the rough token generation speed? I’ve managed to get about 8-9/s from the recent updates to llama.cpp with BLAS offloading on my PC.
(DIR) Post #AVvJcw8R54tK7FhMA4 by simon@fedi.simonwillison.net
2023-05-22T19:54:38Z
0 likes, 0 repeats
@nick On an M2 I got "prefill: 116.9 tok/s, decode: 29.2 tok/s" for vicuna-v1-7b-q3f16_0 and "prefill: 59.0 tok/s, decode: 36.9 tok/s" for RedPajama-INCITE-Chat-3B-v1-q4f16_0
(DIR) Post #AVvJrS6n08L3KlqkXw by nick@amok.recoil.org
2023-05-22T19:57:21Z
0 likes, 0 repeats
@simon wow that’s considerably speedier!
(DIR) Post #AVvNiQZRkUfZm7CqBs by nick@amok.recoil.org
2023-05-22T20:40:29Z
0 likes, 0 repeats
@simon I got it working on my PC, and it's definitely faster than llama.cpp! I got prefill: 8.0 tok/s, decode: 38.7 tok/s. Very nice project, thanks for talking about it