[HN Gopher] Porting HPC Applications to AMD Instinct MI300A Usin...
___________________________________________________________________
Porting HPC Applications to AMD Instinct MI300A Using Unified
Memory and OpenMP
Author : arcanus
Score : 53 points
Date : 2024-05-04 16:47 UTC (6 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
  | bee_rider wrote:
  | APUs for HPC are going to be a wild ride. Accelerated computing
  | in shared memory. CPU-focused folks will actually get access to
  | high-throughput compute on the sort of timescales that we can
  | actually reason about (the GPU is so far away).
| djmips wrote:
| All video game consoles use APUs and it does make memory
| related operations potentially faster but at least for video
| games it's not the bottleneck. I suppose for HPC it might have
| more significance.
| bayindirh wrote:
| If you're doing simulations, or poking big matrices
| continuously on CPUs, you can saturate the memory controller
| pretty easily. If you know what you're doing, your FPU or
| vector units are saturated at the same time, so "whole
| system" becomes the bottleneck while it tries to keep itself
| cool.
|
  | Games move that kind of data at the beginning and don't
  | stream much new data after the initial texture and model
  | data. If you are working on HPC with GPUs, you may need to
  | constantly stream new data into the GPU while streaming out
  | the results. This is why datacenter/compute GPUs have
  | multiple independent DMA engines.
| crest wrote:
  | Afaik those unified memory architectures are mostly neither
  | cache coherent nor do they support virtual addresses
  | efficiently (you have to trap into privileged code to
  | pin/unpin the mappings), which means that the relative cost
  | is lower than that of a dedicated GPU accessed via PCIe
  | slots, but still too high. Only the "boring" old Bobcat-based
  | AMD APUs supported accessing unpinned virtual memory from the
  | L3 (aka system level) cache, and nobody bothered with porting
  | code to them.
| JonChesterfield wrote:
| APUs are very cool for GPU programming in general. Explicitly
| copying data to/from GPUs is a definite nuisance. I'm hopeful
| that the MI300A will have a positive knock on effect on the low
| power APUs in laptops and similar.
| imtringued wrote:
| >Explicitly copying data to/from GPUs is a definite nuisance.
|
| CXL allows fine grained shared memory, but people look at the
| shiny high bandwidth NVLink and talk about how much better it
| is for... AI.
| curt15 wrote:
| I was talking with a friend in HPC lately who said that AMD is
| actually quite competitive in the HPC space these days. For
| example, Frontier
| (https://docs.olcf.ornl.gov/systems/frontier_user_guide.html) is
| an all-AMD installation. Do scientists actually use ROCm in their
| code or does AMD have another programming framework for their
| Instinct chips?
| kkielhofner wrote:
| I currently have a project with ORNL OLCF (on Frontier). The
| short answer is yes. Happy to answer any questions I can.
| ysleepy wrote:
| ROCm or HIP? Does it start out with porting a lot from CUDA
| etc. or starting fresh on top of the AMD APIs?
|
| How much of the project time is spent on that compute API
| stuff in comparison to "payload" work?
| almostgotcaught wrote:
| National labs sign "cost-effective" deals. NVIDIA isn't cost-
| effective. Aurora (at Argonne) is all Intel GPU. Aurora is also
| a clusterfuck so that just tells you these decisions aren't
| made by the most competent people.
  | jfkfif wrote:
  | nvidia absolutely gives deals to national labs and
  | universities. See Crossroads @ LANL, Isambard in the UK,
  | Perlmutter @ LBL. While AMD is being deployed at LLNL and
  | ORNL, Nvidia isn't done with their HPC game. Maybe not at the
  | leadership level, but we'll see how Oak Ridge and LANL decide
  | their next round of procurements.
| wmf wrote:
| Both Frontier and Aurora bet on unproven future chips.
| Sometimes it pays off and sometimes it doesn't.
| Dalewyn wrote:
| They are competent people, just not in the fields techies
| want.
|
| When you're a national laboratory and your wallet is taxes
| from fellow Americans, it is very important that you find a
| balance between bang and buck. Lest you get your budget
| slashed or worse.
| mathiasgredal wrote:
  | Having looked briefly at the code, I still think C++17 parallel
  | algorithms are more ergonomic than OpenMP:
| https://rocm.blogs.amd.com/software-tools-optimization/hipst...
| mgaunard wrote:
| funny how we only get LoC between the different versions, but
| not the performance...
|
| Of course the parallel algorithms are shorter, it's a more
| high-level interface. But being explicit gives you more control
| and potentially more performance.
| bee_rider wrote:
| Is language support why people like OpenMP?
|
| I think it is nice because it supports both C and Fortran, and
| they use the same runtime, so you can do things like pin
| threads to cores or avoid oversubscription. Stuff like calling
| a Fortran library that uses OpenMP, from a C code that also
| uses OpenMP, doesn't require anything clever.
| jltsiren wrote:
| OpenMP has been around for a long time. People know how to
| use it, and it has gained many features that are useful for
| scientific computing.
|
| The consortium behind OpenMP consists mostly of hardware
| companies and organizations doing scientific computing.
| Software companies are largely missing. That may contribute
| to the popularity of OpenMP, as the interests of scientific
| computing and software development are often different.
___________________________________________________________________
(page generated 2024-05-04 23:00 UTC)