[HN Gopher] Making AMD GPUs competitive for LLM inference
___________________________________________________________________
Making AMD GPUs competitive for LLM inference
Author : djoldman
Score : 51 points
Date : 2023-08-09 18:15 UTC (4 hours ago)
(HTM) web link (blog.mlc.ai)
(TXT) w3m dump (blog.mlc.ai)
| tails4e wrote:
| Being 90% of a 4090 makes the 7900 XTX very attractive from a
| cost-per-compute perspective, since it's about 65% of the price,
| and its power draw is significantly lower too.
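| Rough numbers, treating launch MSRPs as the cost (an assumption;
| street prices vary): $999 for the 7900 XTX vs $1599 for the 4090
| is ~62% of the price, so 0.90 / 0.62 ~= 1.45x the tokens per
| dollar in the 7900 XTX's favor, before even counting power draw.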
| nkingsy wrote:
| A brutal slandering of AMD's marketing tactics disguised as a
| benchmark summary:
|
| https://gpu.userbenchmark.com/AMD-RX-7900-XT/Rating/4141
| zdw wrote:
| There's a reason this site is banned from most hardware
| forums.
| GaggiX wrote:
| It has been a while since I saw anyone post a UserBenchmark
| link. UserBenchmark does not have a good reputation; you can
| find many explanations online of why that is, but I guess the
| site's moderators just dismiss the criticism as "Advanced
| Marketing". Meanwhile, even AMD's CPU competitor has banned the
| site from its subreddit. One of my favorite explanations is
| https://youtu.be/RQSBj2LKkWg, but there are of course more
| recent ones.
| jacoblambda wrote:
| Really? You are referring to UserBenchmark's assessment of AMD?
|
| Look at literally any of their assessments of AMD products and
| they'll be dragging them through the mud while pointing to
| worse-performing Intel products to flex how much better those
| are because they aren't AMD.
| superkuh wrote:
| > AMD GPUs using ROCm
|
| Oh great. The AMD RX 580 was released in April 2017. AMD had
| already dropped ROCm support for it by 2021. They only supported
| the card for about four years. Four years. It's so lame it's
| bordering on fraudulent, even if not legally fraud. Keep this in
| mind when reading this news. The support won't last long,
| especially if you don't buy at launch. Then you'll be stuck in
| the dependency hell of trying to use an old driver stack.
| crowwork wrote:
| There is also Vulkan support, which should be more universal
| (it's also covered in the post); for example, the post shows
| running an LLM on a Steam Deck APU.
| junrushao1994 wrote:
| TBH I'm not sure what AMD's plan is for ROCm support on consumer
| devices, but I don't really think AMD is being fraudulent or
| anything.
|
| Both ROCm and Vulkan are supported in MLC LLM, as mentioned in
| our blog post. We are aware that ROCm is not sufficient to cover
| consumer hardware, and in that case Vulkan is a nice backup!
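| Picking the backend is just a constructor argument in the Python
| package, by the way. A minimal sketch (the mlc_chat / ChatModule
| names, the device strings, and the model id below are from
| memory and may not match the current package exactly):
|
|     from mlc_chat import ChatModule
|
|     # device="rocm" on a 7900 XTX with ROCm installed,
|     # device="vulkan" on hardware ROCm doesn't cover
|     # (e.g. the Steam Deck APU)
|     cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1",
|                     device="rocm")
|     print(cm.generate(prompt="What is the capital of Canada?"))
|     print(cm.stats())  # prefill/decode tokens per second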
| zorgmonkey wrote:
| How does the performance with Vulkan compare to the ROCm
| performance on the same hardware?
| junrushao1994 wrote:
| One of the authors here. Glad it's on HackerNews!
|
| There are two points I personally wanted to make through this
| project:
|
| 1) With a sufficiently optimized software stack, AMD GPUs can be
| cost-efficient enough to use for LLM serving; 2) ML compilation
| (MLC) techniques, through the underlying TVM Unity software
| stack, are the best fit for performance optimizations that
| generalize across hardware while quickly delivering
| time-to-market value.
|
| So far, to the best of our knowledge, MLC LLM delivers the best
| performance across NVIDIA and AMD GPUs in single-batch inference
| on quantized models, and batched/distributed inference is on the
| horizon too.
| JonChesterfield wrote:
| Did the ROCm 5.6 toolchain work for you out of the box? If not,
| what sort of hacking / hand holding did it need?
|
| I don't know whether there's an LLM inference benchmark in the
| CI suite; if not, perhaps something like this should be included
| in it.
| crowwork wrote:
| Yes, it works out of the box, and the blog includes a prebuilt
| Python package that you can try out.
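| For reference, installing it is a single pip command pointed at
| the prebuilt wheels, roughly "pip install --pre
| mlc-ai-nightly-rocm mlc-chat-nightly-rocm -f
| https://mlc.ai/wheels" (check the post for the exact wheel
| names, which I may be misremembering).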
| junrushao1994 wrote:
| ROCm has improved a lot over the past few months, and now ROCm
| 5.6 seems to work out of the box by just following this
| tutorial:
| https://rocm.docs.amd.com/en/latest/deploy/linux/installer/i...
| TVM Unity, the underlying compiler MLC LLM uses, seems to work
| out of the box on ROCm 5.6 too - per Bohan Hou, who set up the
| environment.
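| The short version of that tutorial on Ubuntu: grab the
| amdgpu-install package for your distro from repo.radeon.com, run
| "sudo amdgpu-install --usecase=rocm", add your user to the
| render and video groups ("sudo usermod -aG render,video $USER",
| then log back in), and check that the GPU shows up in
| "rocminfo".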
| JonChesterfield wrote:
| Awesome. I'm going to paste that into the rocm dev channel.
| Actual positive feedback on HN, novel and delightful. Thank
| you for the blog post too!
| tails4e wrote:
| When you say best performance on NVIDIA, do you mean against any
| other method of running this model on an NVIDIA card?
| junrushao1994 wrote:
| Yeah, we tried out popular solutions like ExLlama and llama.cpp,
| among others that support inference with 4-bit quantized models.
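| For anyone wanting to reproduce the llama.cpp side of that
| comparison: build with "make LLAMA_CUBLAS=1" and run something
| like "./main -m <4-bit ggml model> -p <prompt> -n 128 -ngl 100"
| to offload all layers to the GPU; it reports prompt-eval and
| eval tokens per second at the end. (The model path there is a
| placeholder; the flags are the ones llama.cpp had at the time.)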
___________________________________________________________________
(page generated 2023-08-09 23:00 UTC)