[HN Gopher] oneAPI - A cross-industry, open, standards-based uni...
___________________________________________________________________
oneAPI - A cross-industry, open, standards-based unified
programming model
Author : wbryce
Score : 66 points
Date : 2021-03-01 14:42 UTC (8 hours ago)
(HTM) web link (www.oneapi.com)
(TXT) w3m dump (www.oneapi.com)
| stonogo wrote:
| I can't help but assume this idea seemed much more compelling
| when Intel was still selling Omnipath.
| jdc wrote:
| There's also OCCA which does JIT compilation to C++, OpenMP,
| CUDA, HIP, OpenCL and Metal. Originally built at Virginia Tech
| and now maintained by the Center for Efficient Exascale
| Discretizations of the DOE.
|
| And it seems to have decent Python bindings.
|
| https://libocca.org
| brianberns wrote:
| Classic relevant xkcd: https://xkcd.com/927/
| harles wrote:
| Exactly what I was thinking of. I'm having a lot of trouble
| understanding the value proposition to vendors here.
| BadInformatics wrote:
| It's better to think of this as a more friendly (i.e. open
| source development model) first party compute stack than some
| kind of pan-vendor standard. For example, anyone using MKL is
| now nominally using oneAPI libraries. They also went to the
| trouble of implementing/pushing existing standards instead of
| baking their own thing: SPIR-V for the IR format, SYCL for
| the high level programming interface, an OpenCL 3
| implementation (AIUI they have the most complete
| implementation of 2.x and 3), etc.
| dyingkneepad wrote:
| So "Open Source development model" (developing in the open)
| is not the same thing as "Open Source Code" (developing in
| closed doors, then throwing code to the other side of the
| wall once in a while). Intel has a history of doing both
| depending on the project and internal groups involved,
| where only really the Open Development projects are
| actually successful and long-term. OneAPI uses a lot of
| components and I am not entirely sure they all follow the
| Open Development model. There's a lot of Open Source stuff
| out there that you basically can't contribute to: your
| contributions are ignored because source code is open but
| development isn't. Does anybody here know about how this is
| done for the OneAPI-related projects?
| someguydave wrote:
| The value is that Intel wants to compete with Nvidia.
| rizzir wrote:
| The name alone makes me think of the xkcd about
| standards.
| [deleted]
| taylorlapeyre wrote:
| Situation: there are now n+1 competing standards.
|
| https://xkcd.com/927/
| mhd wrote:
| My Rational Unified Process PTSD is triggered.
| touisteur wrote:
| I'm having a hard time trusting Intel with anything AI in the
| mid to long term. OpenVINO was great until they removed
| support for Intel FPGAs, and they've been closing and
| deprecating low-level APIs that were necessary for low-
| latency, low-power work on ex-Movidius hardware (and KeemBay
| always seems to be 'next semester'...). We already have huge
| lock-in with Nvidia, but at least we get broad expertise and
| a huge perf boost. What does oneAPI bring except
| 'portability' and 'runs as fast as tf/torch/TVM'? OpenCL?
| ROCm? Vulkan targets? Who's going to debug and support that?
|
| I'm struggling to see what's enticing in this for Python
| people who already have heavily optimised torch and tf CPU
| and GPU backends, especially for batch work. And for latency-
| sensitive inference, I thought the 'industry' was going for
| TVM or other 'target all the things at the code level'
| approaches.
|
| I'm thinking: gimme your network in ONNX format, everyone
| exposes a C inference/compilation API, and let everyone
| optimize /behind/ that... Xilinx, AMD, whoever-Google-
| bought-last-week...
| dljsjr wrote:
| Something similar that has been around for ages and isn't
| controlled by Intel is MAGMA:
| https://bitbucket.org/icl/magma/src/master/
| oivey wrote:
| Isn't MAGMA more like a canned set of linear algebra routines
| rather than a general interface for executing arbitrary code on
| GPUs? That's pretty different.
| jdc wrote:
| _MAGMA Users' Guide
|
| Univ. of Tennessee, Knoxville
|
| Univ. of California, Berkeley
|
| Univ. of Colorado, Denver
|
| Date October 2020
|
| The goal of the MAGMA project is to create a new generation of
| linear algebra libraries that achieves the fastest possible
| time to an accurate solution on heterogeneous architectures,
| starting with current multicore + multi-GPU systems. To address
| the complex challenges stemming from these systems'
| heterogeneity, massive parallelism, and the gap between compute
| speed and CPU-GPU communication speed, MAGMA's research is
| based on the idea that optimal software solutions will
| themselves have to hybridize, combining the strengths of
| different algorithms within a single framework. Building on
| this idea, the goal is to design linear algebra algorithms and
| frameworks for hybrid multicore and multi-GPU systems that can
| enable applications to fully exploit the power that each of the
| hybrid components offers.
|
| Designed to be similar to LAPACK in functionality, data
| storage, and interface, the MAGMA library allows scientists to
| easily port their existing software components from LAPACK to
| MAGMA, to take advantage of the new hybrid architectures. MAGMA
| users do not have to know CUDA in order to use the library.
|
| There are two types of LAPACK-style interfaces. The first one,
| referred to as the CPU interface, takes the input and produces
| the result in the CPU's memory. The second, referred to as the
| GPU interface, takes the input and produces the result in the
| GPU's memory. In both cases, a hybrid CPU/GPU algorithm is
| used. Also included is MAGMA BLAS, a complement to the
| CUBLAS routines._
|
| http://icl.utk.edu/projectsfiles/magma/doxygen/index.html
| RcouF1uZ4gsC wrote:
| Seems like yet another CUDA alternative. Thing is, even if
| you wanted to move away from CUDA, the churn of the
| alternative models would be scary. Do you want to develop
| for CUDA, which has been stable and actively developed for
| quite some time, or do you want to try a new standard that,
| for all you know, will be obsolete in a couple of years?
| gbrown_ wrote:
| > Seems like yet another CUDA alternative.
|
| How many real alternatives are there though aside from OpenCL?
| Not to mention if you're interested in targeting something that
| isn't an Nvidia GPU.
| BadInformatics wrote:
| > do you want to try a new standard that you don't known if it
| will be obsolete in a couple of years
|
| Some incarnation of oneAPI is bound to exist as long as Intel
| has a foothold in the HPC market. For example, MKL and MKL-DNN
| have been rebranded with a oneAPI prefix. So no, the long term
| stability argument doesn't hold water.
|
| I'd also note that they appear to have added support for
| cuBLAS and cuDNN as backends in their respective oneAPI
| libraries. It would be hilarious if that led to more people
| running oneAPI on non-Intel hardware than on first-party
| hardware.
| jcelerier wrote:
| OTOH, Intel has over the years developed a few parallel
| libraries/extensions that were later deprecated, for
| instance Cilk Plus...
| volta83 wrote:
| and ISPC, and... and...
| celrod wrote:
| Has ispc been deprecated? Last GitHub commit was three
| days ago.
| moonbug wrote:
| it's a pet project.
| hctaw wrote:
| The problem with heterogeneity in GPGPU is that AMD and
| Intel can't make products that are competitive with Nvidia.
| A "programming model" for "cross industries" (whatever that
| means; the whole naming of this project is weird) wouldn't
| be a particularly deep moat if competitive solutions
| existed at lower prices.
| shmerl wrote:
| AMD GPUs are better than Nvidia for that - they had async
| compute way longer. I thought the problem was CUDA lock-in
| and the lack of a nice programming model using something
| modern like Rust, maybe.
| volta83 wrote:
| > AMD GPUs are better than Nvidia for that
|
| Better in what sense?
|
| > they had async compute way longer
|
| What is that?
| shmerl wrote:
| Something like this: http://developer.amd.com/wordpress/media/2012/10/Asynchronou...
| Jhsto wrote:
| Then again, CUDA is a pretty accessible language for
| programming SIMD machines. I'm a bit skeptical that most
| people would recognise a subjectively well-done SIMD
| language (let's say from a correctness and performance
| standpoint). The programming model for SIMD is inherently
| parallel, with exceptions for sequential code, which is
| pretty much the exact opposite of the programming you do on
| the CPU side. I'd imagine most people would not find that
| modern at all, or alternatively, it would not be the
| language with "product market fit", since it's less likely
| to catch on. AMD or Intel may have a "better" language in
| some sense, but it seems most people prefer the familiarity
| of what they already know.
| marmaduke wrote:
| > I'm a bit skeptical that most people would recognise a
| subjectively well-done SIMD language (let's say from a
| correctness and performance standpoint)
|
| I found ISPC (https://ispc.github.io) a little easier to
| work with than CUDA for SIMD on CPUs, as it sits somewhere
| between the CUDA and regular C models. Interop works via
| the C ABI, i.e. it's trivial; it generates regular object
| files, and so on.
| shmerl wrote:
| CUDA is still stuck with Nvidia only. So improving on that
| and on the language itself is clearly an open item.
| Jhsto wrote:
| Well, oneAPI can be used to do that. This submission is
| most likely in reference to a post from a few days ago [1],
| where PTX, which CUDA targets, is retargeted to SPIR-V,
| which is run atop oneAPI [2].
|
| [1]: https://news.ycombinator.com/item?id=26262038
|
| [2]: https://spec.oneapi.com/level-zero/latest/index.html
| moonbug wrote:
| but almost no one codes to the CUDA driver API.
| nynx wrote:
| rust-gpu (https://shader.rs) is a SPIR-V backend for Rust.
| It's early days, but it's already really promising. In my
| opinion, applying a good package manager (cargo) and a good
| language (Rust) to GPGPU and rendering could result in
| great things.
| oivey wrote:
| By async compute, do you mean something different than what
| CUDA streams expose?
|
| AMD's compute cards have generally been worse than Nvidia's
| as far as I know.
| shmerl wrote:
| I mean actual hardware capabilities. AMD GPUs were ahead in
| that.
|
| See: http://developer.amd.com/wordpress/media/2012/10/Asynchronou...
| oivey wrote:
| This only seems to be applicable to shaders? GPGPU in most
| contexts refers to code divorced from any graphics
| pipeline, like CUDA or ROCm. CUDA has had asynchronous and
| concurrent kernel calls for a long time. How are
| asynchronous compute shaders relevant in that context?
| monocasa wrote:
| Compute shaders are divorced from the graphics pipeline
| as well. As far as the hardware is concerned, CUDA/ROCm
| and compute shaders are the same thing.
| Jasper_ wrote:
| Compute shaders and CUDA have very different execution
| and driver models. Just take a look at the way memory is
| managed in each.
| my123 wrote:
| Were. AMD totally squandered the advantage they had from
| launching GCN early through an absolutely awful software
| stack.
| shmerl wrote:
| RDNA 2 only improved on the above. Nvidia were and remain
| a dead end with CUDA lock-in, but I agree that there
| should be more combined efforts to break it.
| my123 wrote:
| Lol? ROCm is a mess.
|
| It's awful. And it is _not_ even supported on RDNA and
| RDNA2. They even dropped Polaris support last month, so
| it's only supported on Vega now.
|
| And that's very much not a high quality GPU compute
| stack, sorry...
|
| CUDA isn't a dead end and isn't going away any time soon.
| shmerl wrote:
| I was talking about hardware, not about ROCm. I said
| above, GPU compute could benefit from a nice programming
| model with a modern language. That will dislodge CUDA
| lock-in for good.
|
| CUDA isn't dead, but there should be a bigger effort to
| get rid of it because it's not a proper GPU programming
| tool but rather a market manipulation tool by Nvidia.
| my123 wrote:
| Nowadays, modern Nvidia hardware (Volta onwards) handles
| compute workloads better, notably through independent
| thread scheduling.
|
| RDNA2 with its cache and low main memory bandwidth didn't
| help either, because while the cache is there, you're
| going to exceed what it can absorb by a lot in compute
| workloads...
| RocketSyntax wrote:
| Is "oneAPI Deep Neural Network Library (oneDNN)" an alternative
| to ONNX?
| DethNinja wrote:
| Why would someone use this instead of plain old OpenCL(or CUDA)
| with C++?
|
| What is the value proposition here?
| jpsamaroo wrote:
| OpenCL and various other solutions basically require that one
| writes kernels in C/C++. This is an unfortunate limitation, and
| can make it hard for less experienced users (researchers
| especially) to write correct and performant GPU code, since
| neither language lends itself to writing many mathematical and
| scientific models in a clean, maintainable manner (in my
| opinion).
|
| What oneAPI (the runtime) and AMD's ROCm (specifically the
| ROCR runtime) do that's new is enable packages like
| oneAPI.jl [1] and AMDGPU.jl [2] (both Julia packages) to
| exist without having to go through OpenCL or C++
| transpilation (which we've tried before, and it's quite
| painful). This is a great thing, because now users of an
| entirely different language can still utilize their GPUs
| effectively and with near-optimal performance (optimal
| w.r.t. what the device can reasonably attain).
|
| [1] https://github.com/JuliaGPU/oneAPI.jl [2]
| https://github.com/JuliaGPU/AMDGPU.jl
| my123 wrote:
| Which is something that CUDA has provided since the very
| beginning (with PTX).
| pjmlp wrote:
| No one will take over CUDA's dominance until they realize
| that one reason most researchers flocked to it was its
| polyglot capabilities and graphical debuggers.
| moonbug wrote:
| in other words, Nvidia's product execution was spot-on.
| jpsamaroo wrote:
| Yes, and that's why Julia gained CUDA support first. My
| point was to respond to "Why would someone use this instead
| of plain old OpenCL(or CUDA) with C++?", and my answer was,
| "you can use something other than OpenCL C or C++". I'm not
| trying to say that CUDA is any lesser of a platform because
| of this; instead, other vendor's GPUs are now becoming
| easier to use and program.
| gbrown_ wrote:
| The success of both Intel's oneAPI and AMD's ROCm/HIP
| stacks is obviously tied to (and part of) the success of
| each company's future products. And today I'd bet AMD will
| come off better. They have multiple contracts for HPC
| systems using their future GPU products, as well as an
| existing presence in consumer gaming products.
|
| Intel's effort seems to be banking on the Aurora system at
| ANL. Beyond that, I don't know who's lining up to buy Xe
| GPUs. Though I guess oneAPI will fit into whatever they do
| further down the line.
___________________________________________________________________
(page generated 2021-03-01 23:01 UTC)