[HN Gopher] PowerInfer: Fast Large Language Model Serving with a Consumer-Grade GPU
___________________________________________________________________
PowerInfer: Fast Large Language Model Serving with a Consumer-Grade
GPU [pdf]
Author : georgehill
Score : 21 points
Date : 2023-12-19 21:24 UTC (1 hour ago)
(HTM) web link (ipads.se.sjtu.edu.cn)
(TXT) w3m dump (ipads.se.sjtu.edu.cn)
| LoganDark wrote:
| > PowerInfer's source code is publicly available at
| https://github.com/SJTU-IPADS/PowerInfer
|
| ---
|
| Just curious - PowerInfer seems to market itself on running very
| large models (40B, 70B) on something like a 4090. If I have, say,
| a 3060 12GB and want to run something like a 7B or 13B, can I
| expect the same speedup of around 10x? Or does this only help
| that much for models that wouldn't already fit in VRAM?
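|
| Napkin math for context (my own numbers, not from the paper;
| this assumes weight memory dominates and ignores the KV cache):
|
|     def weights_gb(n_params_b, bits):
|         # approximate weight memory in GB for n_params_b billion
|         # parameters stored at `bits` bits per weight
|         return n_params_b * 1e9 * bits / 8 / 1e9
|
|     weights_gb(7, 4)    # ~3.5 GB: a 7B at 4-bit fits a 3060 12GB
|     weights_gb(40, 16)  # ~80 GB: a 40B at fp16 dwarfs a 4090's 24GB
|
| So a 7B/13B already fits entirely in VRAM on my card, which is
| why I'm wondering whether the CPU/GPU offloading trick even
| applies there.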
___________________________________________________________________
(page generated 2023-12-19 23:00 UTC)